The FiveThirtyEight Senate forecast model launched earlier this month. Right now, it shows Republicans with about a 53 percent chance of picking up the Senate next year. We owe you a lot more detail about how that forecast is calculated and how it might change between now and Nov. 4 — and how our model differs from some of the others out there.
This article, which outlines the model’s methodology, is going to be on the detailed side. I’ve tried to keep the descriptions in plain language as often as possible (the footnotes get somewhat more technical). But it’s meant to be a reasonably comprehensive reference guide rather than breezy bedtime reading.
First, however, I want to describe the principles behind the model. Some of these are more philosophical and abstract — they describe what I think of as best practices for applied statistical modeling. I can get passionate about this stuff — but somewhat contrary to the media portrayal of election forecasters as wizards who conjure up spells from their spreadsheets, our goal is not to divine some magic formula that miraculously predicts every election. Instead, it’s to make sense of publicly available information in a rigorous and disciplined way.
Principle 1: A good model should be probabilistic, not deterministic.1
The FiveThirtyEight model produces probabilistic forecasts as opposed to hard-and-fast predictions. In the same sense a weather forecaster might tell you there’s a 20 percent chance of rain tomorrow, the FiveThirtyEight model might estimate the Democrat has a 20 percent chance of winning the Senate race in Kentucky.
My view is that this is often the most important part of modeling — and often the hardest part. Predictions of the most likely outcome (“The Democrat will win the race by 3 percentage points”) are sometimes relatively immune to changes in methodology. But probabilistic forecasts can be very sensitive to them. Does that 3-point lead translate into a 60 percent chance of winning? Or a 95 percent chance? Or what?
This can be tricky. Quick-and-dirty assumptions, like that the margin of error expressed in a poll is a complete reflection of its accuracy, sometimes don’t hold up well in the real world. In our database of polls conducted in the final three weeks of campaigns since 1998, the actual results fell outside of the poll’s reported margin of error2 almost 25 percent of the time.3 So it’s important to model the error empirically — based on how well the polls have done in past races — instead of taking shortcuts.
In fact, while FiveThirtyEight’s forecasts are sometimes seen as extremely bold when compared with news media coverage of campaigns,4 they are fairly conservative compared with some other forecasting models. For a variety of reasons,5 statistical models are prone to overconfidence unless they’re designed carefully.
The best test of a probabilistic forecast is whether it’s well calibrated. By that I mean: Out of all FiveThirtyEight forecasts that give candidates about a 75 percent shot of winning, do the candidates in fact win about 75 percent of the time over the long run? It’s a problem if these candidates win only 55 percent of the time. But from a statistical standpoint, it’s just as much of a problem if they win 95 percent of the time.
Fortunately, FiveThirtyEight’s Senate forecasts have historically been well calibrated. We’ve posted the data on GitHub so that you can check them out for yourself. For example, out of the 12 instances6 where we gave a candidate between an 85 percent and a 95 percent chance of winning on Election Day, the favored candidate won in 11 cases, or 92 percent of the time.7 We also have a track record of being well calibrated in other types of election forecasts and sports forecasts.
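A calibration check of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the published data: the function name `calibration_table`, the ten-bin layout and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def calibration_table(forecasts, n_bins=10):
    """Group (probability, won) pairs into equal-width bins and compare
    the average forecast in each bin with the actual win rate."""
    bins = defaultdict(list)
    for prob, won in forecasts:
        # A forecast of exactly 1.0 is clamped into the top bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((prob, won))
    table = []
    for index in sorted(bins):
        rows = bins[index]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(1 for _, w in rows if w) / len(rows)
        table.append((round(mean_forecast, 3), round(win_rate, 3), len(rows)))
    return table

# Illustrative data only: 12 forecasts of 90 percent, 11 of which came true.
sample = [(0.90, True)] * 11 + [(0.90, False)]
print(calibration_table(sample))
```

A well-calibrated set of forecasts shows the second number in each row (the actual win rate) close to the first (the average forecast), at least in bins with enough cases to be meaningful.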
Principle 2: A good model ought to be empirical.
Simplicity can be a virtue in model-building, but every model must be reasonably consistent with the evidence. It’s one thing to have stress-tested your model, determined that only a few things really matter, and removed all the superfluous bits. But if you don’t account for a certain variable or some statistical property in your model, you may be making an implicit assumption that it doesn’t matter much — when sometimes it does.
For instance, it’s probably wrong to treat registered voter polls the same as likely voter polls when there’s reasonably clear historical evidence that registered voter polls tend to overrate the standing of Democrats. As I mentioned, it’s almost certainly wrong to assume the error in a poll is fully captured by its reported margin of error. It’s also wrong to assume that the error in one poll is independent from the next — about as often as not, the polls all miss in the same direction, which can lead to late-breaking “waves” toward one or the other party.
It’s also important to check whether modeling choices have a sound basis in theory. One variable our model uses is an ideology score for each candidate, which seeks to estimate how liberal or conservative he or she is relative to voters in his or her state. This variable is highly statistically significant in predicting election outcomes — but just as important is that it has a strong basis in the political science literature (in this case, see the median voter theorem). By contrast, if we’d found that past election outcomes had been well predicted by the number of consonants a candidate had in her middle name, we’d strongly suspect this was a statistical fluke and we wouldn’t include it in our model.
Principle 3: A good model ought to respond sensibly to changes in inputs.
Models that gyrate around wildly with the slightest provocation should be viewed skeptically. So should those unmoved by even important-seeming information. The FiveThirtyEight Senate model tends to produce fairly stable forecasts. That can make things a bit dull from day to day; most of the time, the new polls and data we collect have little effect on the bottom line. But now and then almost all the polls on a particular day favor one or the other party, and the overall forecast for Senate control moves by a few percentage points in that party’s direction.
There will sometimes be more volatility in the forecast toward the end of the campaign because late changes in the polls can’t be reversed before Election Day. A sports analogy is helpful here: An NFL team that kicks a field goal with two minutes to play in the first quarter becomes only a 59 percent favorite to win,8 but one that does so with two minutes to play in the fourth quarter becomes an 83 percent favorite. Likewise, a 2-point shift in the polls can produce a larger change in win probabilities late in an election.
Principle 4: A good model ought to avoid changing its rules in midstream.
We’re sometimes guilty of talking about the FiveThirtyEight model as though it has a mind of its own. It doesn’t. It’s just a computer program — and we wrote the program.
However, we don’t “tweak” the forecast in a given state just because we don’t like the outcome. And we avoid changing the program once we’ve launched the forecasts.9 Nor do we change it that much from year to year; the Senate model is perhaps 80 percent to 90 percent the same as when we launched it in 2008, and is also largely similar to our presidential forecast model. All versions of these models have used polling along with non-polling data, have been probabilistic rather than deterministic, and so forth.
Speaking of that model — what does it do, exactly? There are seven major steps.
Step 1: Weighted polling average
We start by collecting polls — lots of polls. There are only a few types of polls we discard. First are those from pollsters that we know or suspect to have faked their results or to have engaged in other gross ethical misconduct — for instance, Strategic Vision and Research 2000.10 Next, we exclude internal polls conducted directly on behalf of candidates or party organizations like the Democratic Senatorial Campaign Committee and the Republican National Committee, which tend to be inaccurate and biased.11
We err strongly on the side of inclusiveness; the threshold for excluding a poll is high.12 The model has a lot of other defense mechanisms, particularly in minimizing the effect of polls that show signs of partisan bias through our “house effects” adjustment (which I’ll describe in Step 2). For more about our philosophy on this, see this discussion.
A poll is weighted based on three factors:
- How recently it was conducted. Older polls are penalized through an exponential decay formula. The penalty becomes stiffer — that is, more emphasis is placed on recency — the closer we get to the election.13 Our research suggests that the news media place too much emphasis on recency and would be better off looking at a broader range of polls.
- The poll’s sample size. Polls that sample more voters receive a larger weight, although there are diminishing returns. In particular, we’ve found that the improvement in accuracy from sampling more voters is not as large in practice as it’s supposed to be in theory.14 The reasons for this are interesting — we’ll discuss them more when we release our pollster ratings. But the implication is that one should be careful about weighting an otherwise dubious poll heavily just because it took a large sample. This reflects a slight refinement from previous years, when the model placed more emphasis on sample size.15
- The pollster rating. We’ve released an entirely new set of pollster ratings for 2014 and explain the process for calculating them in much more depth in a separate article. The method is similar to one we used in 2010, however, in that pollsters are rated on the basis of both their past accuracy16 and two easily measurable proxies for methodological quality. The first is whether the polling firm is a member of industry groups and initiatives like the AAPOR Transparency Initiative.17 The second is whether the firm regularly calls cellphones in addition to landlines.18 These factors don’t tell you everything you need to know about a poll, but they tend to be correlated with other strong methodological practices. More importantly, these methodological variables are strong predictors of more accurate polling results going forward.
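To make the three factors concrete, here is a rough sketch of how such a weight could be assembled. The functional forms are assumptions chosen for illustration: a 30-day half-life for the exponential decay, a square-root rule for the diminishing returns on sample size, and a simple multiplier for the pollster rating. None of these constants come from the model itself; only the 600-voter reference poll appears in the article.

```python
import math

# All constants are illustrative assumptions, not the model's actual values.
BASELINE_SAMPLE = 600    # reference sample size that maps to a weight of 1.00
HALF_LIFE_DAYS = 30.0    # hypothetical half-life for the recency penalty

def poll_weight(age_days, sample_size, rating_multiplier,
                half_life_days=HALF_LIFE_DAYS):
    """Combine recency, sample size and pollster rating into one weight."""
    # Exponential decay: the weight halves every `half_life_days`.
    recency = 0.5 ** (age_days / half_life_days)
    # Square root gives diminishing returns: 4x the sample, 2x the weight.
    size = math.sqrt(sample_size / BASELINE_SAMPLE)
    return recency * size * rating_multiplier

def weighted_average(polls):
    """polls: iterable of (margin, weight) pairs."""
    polls = list(polls)
    total = sum(w for _, w in polls)
    return sum(m * w for m, w in polls) / total
```

Shortening `half_life_days` as the election approaches would reproduce the stiffer recency penalty described above.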
Truth be told, the poll weights don’t always make a huge impact — although this year could be an exception given how many close races there are. The main way to go wrong is probably in placing too much emphasis on the most recent polls, which can lead to unwarranted volatility.
A few bits of housekeeping on other polling situations that come up from time to time:
- Pollsters routinely poll the same races multiple times. In these cases, we don’t “throw out” the old poll, but it gets a lower weight; see here for how that works.
- If a pollster lists both likely voter and registered voter results, we use the likely voter version.19
- In other cases where the pollster releases multiple versions of the same poll — for instance, results drawn from two different turnout models, or results with and without a minor candidate included20 — we simply average all applicable versions together.
- Tracking polls, which contain overlapping dates in their samples, are weighted based on the number of new respondents in each edition of the poll.21
Step 2: Adjustments to the polling average
The FiveThirtyEight model performs three sets of adjustments to the polls: a likely voter adjustment, a house effects adjustment and (usually the least important of the three) a trend line adjustment.
The rationale for the likely voter adjustment is explained at some length here. Polls of likely voters are almost always more favorable to Republicans than polls of broader samples, like registered voters. But polls of likely voters also tend to be more accurate and less biased, especially in midterm years.
So as a default, the FiveThirtyEight model shifts registered voter polls (and polls of all adults) toward Republicans to make them more comparable to likely voter surveys. In particular, the model defaults to shifting polls of registered voters toward Republicans by 2.7 percentage points, which is the historical average difference between likely voter and registered voter polls in midterm years. However, the magnitude of the shift is updated based on polls like this one that list both registered voter and likely voter results in the same survey.22 So far this year, the average23 gap has been just above 3 points.
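As a minimal sketch of the default shift, assume margins are expressed as the Republican lead in percentage points. The function name and the population labels are hypothetical, and the size of the shift for polls of all adults is not given here, so the sketch handles only registered voter and likely voter polls.

```python
DEFAULT_RV_TO_LV_SHIFT = 2.7  # points toward Republicans, per the text

def adjust_to_likely_voters(rep_margin, population,
                            shift=DEFAULT_RV_TO_LV_SHIFT):
    """Make a poll comparable to likely voter surveys.

    rep_margin: Republican lead in points (negative means the Democrat leads).
    population: "LV" for likely voters, "RV" for registered voters.
    """
    if population == "LV":
        return rep_margin
    if population == "RV":
        return rep_margin + shift
    # The shift for polls of all adults is not stated, so it is not guessed.
    raise ValueError(f"unsupported population: {population!r}")

# A registered voter poll with the Democrat up 1 becomes roughly Republican +1.7.
print(round(adjust_to_likely_voters(-1.0, "RV"), 1))
```

Passing a different `shift`, such as the gap of just above 3 points observed so far this year, stands in for the updating described above.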
The house effects adjustment accounts for the tendency of some polling firms to consistently show more favorable results for one or the other party.24 It works by means of a regression analysis on all Senate and generic congressional ballot polls.25 This is one of a number of reasonable ways of comparing a poll against others of the same state.26
However, one or two (or even a few) polls may not tell you that much about a pollster’s house effect; variation from other pollsters may just be statistical noise instead. Furthermore, a model without some tolerance for differences of opinion among pollsters may deprive itself of the benefits of aggregating polls together in the first place. The FiveThirtyEight model handles this by calculating a “buffer zone”27 based on the number of polls a firm has released. For instance, a firm with relatively few polls might have a buffer zone of 2 percentage points. Any house effect beyond that buffer zone is subtracted from the poll, so a polling firm with a 5-point Republican house effect and a 2-point buffer zone will have its results adjusted toward Democrats by 3 points.28
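The buffer-zone arithmetic can be written out as follows. This is a sketch of the rule as described in the paragraph above, with hypothetical function names and with house effects expressed as points in the Republican direction (negative values lean Democratic). How the buffer itself is derived from the number of polls is not shown.

```python
import math

def house_effect_correction(house_effect_rep, buffer_zone):
    """Return the part of a house effect that exceeds the buffer zone.

    The sign is preserved: a positive result is a Republican lean to
    remove, a negative result is a Democratic lean to remove.
    """
    excess = max(abs(house_effect_rep) - buffer_zone, 0.0)
    return math.copysign(excess, house_effect_rep)

def adjust_for_house_effect(rep_margin, house_effect_rep, buffer_zone):
    """Subtract the excess house effect from a poll's Republican margin."""
    return rep_margin - house_effect_correction(house_effect_rep, buffer_zone)

# The example from the text: 5-point Republican house effect, 2-point buffer.
print(house_effect_correction(5.0, 2.0))  # 3.0 points, removed from the GOP margin
```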
A new feature in the model this year is that house effects from past years29 are used to help calibrate the house effects adjustment.30 However, their influence is relatively minor.31 House effects are generally fairly consistent from election to election, but there are exceptions; for instance, the firm Rasmussen Reports, which had a strong Republican house effect in the past, has little house effect so far this year. The main case where this is helpful is when a firm with a long history of partisan polling drops in to poll a few races after having been dormant for most of the cycle.
Another question is how to calculate the baseline that other polls are compared against. We use a weighted average, where the weight is based on the number of polls a firm has released and its pollster rating. This means that the baseline is determined mostly by what the stronger polling firms are saying.32 In 2012, this worked to Democrats’ benefit — the higher-rated polling firms tended to show stronger results for them — but we’ve observed no such consistent pattern this year.
The trend line adjustment is an important part of the FiveThirtyEight presidential model, but not so important in the Senate model. In forecasting the presidential race, you can make accurate inferences about how the polls are changing in one state based on how they’re changing in other states. For instance, if Barack Obama had gained several points relative to Mitt Romney in both Michigan and Minnesota, you could be almost certain that he’d also gained ground in Wisconsin even if Wisconsin hadn’t been polled recently.
In Senate elections, however, there are different candidates on the ballot in each state — so the inferences are much weaker. Instead, the FiveThirtyEight trend line adjustment is calculated solely based on generic congressional ballot polls. It works by looking for changes in the generic congressional ballot as tracked by the same polling firms over the same sample populations — for instance, Quinnipiac polls of registered voters — and then backing out a time trend by means of a lowess smoothing regression. See here for more detail.
The trend line adjustment currently detects some Republican movement on the generic ballot. However, the adjustment is applied conservatively in the Senate model.33 It currently shifts the polling average in each state toward Republicans by an average of only 0.2 percentage points.
Let’s interrupt here to draw some probability distributions. One narrative holds that there are big differences between those Senate models that look only at polls and those that look at polls along with other factors. But it’s more complicated than that. In recent days, running our model based on the adjusted polling average alone (after Step 2) would reduce the GOP’s chances of controlling the Senate by about 5 percent — it doesn’t make a huge difference.
What’s a more complete story? Small differences matter this year because both individual states and the overall Senate race are so close. Most election models (including ours) work in something like the following way: First, they calculate the most likely outcome in a particular state (“The Republican wins by 1 point”) and then they determine the degree of uncertainty around that estimate. Most models do this by means of a normal distribution or something similar to it. In this type of statistical distribution, not all outcomes within the margin of error are equally likely; instead, those closer to the mean of the distribution are more probable.
The graphic below, for example, illustrates a normal distribution with a mean of +1 (as in, a candidate is ahead by 1 point in the polls) and a standard deviation of 5. In this example, we’ll take positive values to mean the Republican wins the race and negative values to mean the Democrat does. According to the normal distribution, the Republican will win 58 percent of the time.
But if we shift the center of the distribution by just 1 point toward the Republican — say, our model averages the polls together a little differently than someone else’s, and it projects her to win by 2 points instead of 1 — it has a noticeable effect on the probabilities. Not huge, but noticeable: She’s gone from being a 58 percent favorite to a 66 percent favorite.
By contrast, if we’d given the Republican an additional point when she was already well ahead, it wouldn’t make much difference. If she were up by 10 points in the polls, for instance, she’d already be a 97.7 percent favorite according to the normal distribution; putting her up by 11 points instead would only increase that chance to 98.6 percent.
However, there’s another way we can affect the candidate’s win probability: by changing the standard deviation. In the example below, I’ve kept the Republican’s lead at 2 points. But I’ve reduced the standard deviation to 2 points instead of 5. Now, with that mere 2-point lead she’s suddenly an 84 percent favorite to win.
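The win probabilities in these examples can be reproduced with Python's standard library. The sketch assumes a plain normal distribution, as the examples do; the function name `win_probability` is made up for illustration.

```python
from statistics import NormalDist

def win_probability(lead, standard_deviation):
    """Chance that the final margin is above zero, given a projected lead
    and a normally distributed error around it."""
    return 1.0 - NormalDist(mu=lead, sigma=standard_deviation).cdf(0.0)

# (lead, standard deviation) pairs from the examples above.
for lead, sd in [(1, 5), (2, 5), (10, 5), (11, 5), (2, 2)]:
    print(f"lead {lead:>2}, sd {sd}: {win_probability(lead, sd):.1%}")
```

The same 1-point shift moves the probability by about 8 points near a tie and by less than 1 point when the candidate is already far ahead, while shrinking the standard deviation from 5 to 2 changes a 2-point lead from about 66 percent to about 84 percent.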
In my view, far too little attention is paid to those questions. What is the uncertainty in the forecast, as opposed to the most likely result?
I don’t like to call out other forecasters by name unless I have something positive to say about them — and we think most of the other models out there are pretty great. But one is in so much perceived disagreement with FiveThirtyEight’s that it requires some attention. That’s the model put together by Sam Wang, an associate professor of molecular biology at Princeton.
That model is wrong — not necessarily because it shows Democrats ahead (ours barely shows any Republican advantage), but because it substantially underestimates the uncertainty associated with polling averages and thereby overestimates the win probabilities for candidates with small leads in the polls. This is because instead of estimating the uncertainty empirically — that is, by looking at how accurate polls or polling averages have been in the past — Wang makes several assumptions about how polls behave that don’t check out against the data.34
There’s a rich record of those assumptions failing and resulting in highly overconfident forecasts. In 2010, for example, Wang’s model made Sharron Angle the favorite in Nevada against Harry Reid; it estimated she was 2 points ahead in the polls, but with a standard error of just 0.5 points. If we drew a graphic based on Wang’s forecast like the ones we drew above,35 it would have Angle winning the race 99.997 percent of the time, meaning that Reid’s victory was about a 30,000-to-1 long shot. To be clear, the FiveThirtyEight model had Angle favored also, but it provided for much more uncertainty. Reid’s win came as a 5-to-1 underdog in our model instead of a 30,000-to-1 underdog in Wang’s; those are very different forecasts.
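The long-shot figure follows from the same normal-distribution arithmetic. The sketch below assumes a normal error with the stated 2-point lead and 0.5-point standard error; the helper names are hypothetical.

```python
from statistics import NormalDist

def underdog_probability(lead, standard_error):
    """Chance that the trailing candidate wins under a normal error model."""
    return NormalDist(mu=lead, sigma=standard_error).cdf(0.0)

def odds_against(probability):
    """Convert a win probability into 'N-to-1 against' odds."""
    return (1.0 - probability) / probability

p_reid = underdog_probability(2.0, 0.5)  # the lead is 4 standard errors from zero
print(f"favorite wins {1 - p_reid:.3%} of the time")
print(f"underdog is roughly a {odds_against(p_reid):,.0f}-to-1 long shot")
```

By comparison, a 5-to-1 underdog has a win probability of 1 in 6, or about 17 percent.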
There are a number of other examples like this. Wang projected a Republican gain of 51 seats in the House in 2010, but with a margin of error of just plus or minus two seats. His forecast implied that the odds against Republicans picking up at least 63 seats (as they actually did) were trillions-and-trillions-to-1.36 If you want a “polls only” model that estimates the uncertainty more rigorously, I’d recommend The Huffington Post’s or Drew Linzer’s.
I wanted to get that out of the way before proceeding to the state fundamentals calculation, which is one of the more complicated and “controversial” parts of the FiveThirtyEight model — but also one that ultimately doesn’t have that much influence on the forecast.
Step 3: Calculate state fundamentals
In presidential elections, as I mentioned earlier, you can take advantage of the fact that the same two candidates are on the ballot in each state. This makes it much easier to make comparisons from one state to the next.37 That isn’t true for Senate races, where the state fundamentals are a rough guide; they miss the final margin in the race by an average of something like 9 percentage points.
Why bother at all? One reason is that you sometimes have no alternative; the occasional Senate race gets literally no polling. Or it gets very limited polling.38 In states like Alaska and Kansas this year, we have little idea of what’s going on from the polls alone. It helps to have some backstop, like knowing that both states are extremely Republican-leaning.
Another reason to look beyond polls is to prevent abrupt shifts in the forecast. For instance, the recent strong polling for Republicans in Kentucky or for Democrats in Michigan put those races more in line with how our fundamentals calculation has them.
In any event, the state fundamentals estimate is based on a series of non-polling indicators that have historically shown some predictive power in Senate races; their relative importance is determined by regression analysis. The indicators are as follows:
The generic congressional ballot. This provides an indication of the overall partisan mood in the country. As of this writing, the FiveThirtyEight model has the generic ballot favoring the Republicans by about 3 percentage points.39
Congressional approval ratings.40 This is the other national indicator. It doesn’t work toward the benefit of either party — instead, it informs the model about the overall amount of antipathy toward incumbents regardless of their party. Right now, congressional approval ratings remain near their historic lows, which mitigates some of the incumbency advantage.41
Fundraising totals. Fundraising data is a useful indicator for a number of reasons: It can reflect the grassroots support for a candidate, or a candidate’s overall level of organization — and money can be exchanged for goods and services like advertisements and a better turnout operation. Our model specifies this variable as the proportion of funds raised by each major-party candidate. For instance, if the Democrat has raised $3 million and the Republican has raised $1 million, the Democrat has raised 75 percent of the money. This definition accounts for the diminishing returns associated with additional fundraising.42 The FiveThirtyEight model looks only at the sum of individual public contributions — as opposed to funds raised through PACs or “Super PACs,” funds donated by the parties, or funds contributed by the candidates themselves. So far this year, this is one of the reasons for Democrats to be optimistic — they’ve outraised Republicans by our definition in almost all of the most important Senate races.
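A short sketch shows why expressing fundraising as a share builds in diminishing returns. The function name is hypothetical and the dollar amounts beyond the $3 million and $1 million example are illustrative.

```python
def fundraising_share(dem_raised, rep_raised):
    """Democrat's share of the two-party individual contributions."""
    return dem_raised / (dem_raised + rep_raised)

# The example from the text: $3 million against $1 million.
print(fundraising_share(3_000_000, 1_000_000))            # 0.75

# Doubling the Democrat's haul moves the share by only about 11 points.
print(round(fundraising_share(6_000_000, 1_000_000), 3))  # 0.857
```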
Highest elected office held. This is among the less important variables43 but it has some influence. We rate candidates on a 4-point scale based on the highest office they’ve been elected to:
- 3 points for current or former governors or senators — by definition including all elected incumbent senators44;
- 2 points for members of the House of Representatives, candidates holding statewide elected office (like state attorneys general and lieutenant governors) and mayors of large cities45;
- 1 point for other nontrivial elected offices, such as state senator or state representative;
- 0 points for candidates who have never been elected to any substantive position.46
Margin of victory in most recent Senate election. This variable applies to elected incumbents only.47 Past victory margin is not a terribly reliable indicator — the political mood can shift a lot in six years — but it does tell you something. Victory margins are adjusted relative to the national climate48 in the re-election year. That hurts this year’s crop of Democratic incumbents, since most of them were last elected in 2008, a high-water mark for the party. For instance, Sen. Kay Hagan of North Carolina won her race by an impressive 8.5 percentage points against Elizabeth Dole in 2008 — but that came in an environment when Democrats won the national popular vote for the U.S. House by 10.6 percentage points. That implies Hagan might not have won her election in a neutral political environment. Candidates who did not face major-party opposition in their last re-election bid, such as Mark Pryor of Arkansas, are treated as having won re-election by about 40 percentage points.
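The Hagan example works out as below, assuming the adjustment is a simple subtraction of the national House margin from the candidate's own margin. The text says only that margins are adjusted relative to the national climate, so the exact form here is an assumption.

```python
def climate_adjusted_margin(victory_margin, national_margin_for_party):
    """Victory margin net of the national environment, in percentage points.

    Both inputs are measured from the point of view of the candidate's party.
    Simple subtraction is an assumed form of the adjustment.
    """
    return victory_margin - national_margin_for_party

# Hagan in 2008: won by 8.5 points when Democrats won the House vote by 10.6.
print(round(climate_adjusted_margin(8.5, 10.6), 1))  # -2.1
```

The negative result is what lies behind the statement that Hagan might not have won in a neutral environment.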
Candidate ideology and state partisanship. You can think of these as two variables or as one; the FiveThirtyEight model links them together. It estimates the conservative-liberal ideology of a candidate and then compares it against the estimated ideology of voters in the state. The larger the difference between them, the worse the candidate is expected to perform. Candidate ideology is estimated from three scoring systems:
- DW-Nominate scores, which reflect a candidate’s voting record in Congress;
- CFscores — created by Adam Bonica of Stanford University — which estimate left-right ideology based on the identity of a candidate’s donors;
- OnTheIssues.org scores, which reflect public statements made by the candidate on a series of policy issues ranging from gay marriage to tax policy.50
The score from each system is normalized such that each has the same average and standard deviation — this allows for a direct comparison among them.
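Putting scores on a common scale is typically done by standardizing them. The sketch below rescales each system to a mean of 0 and a standard deviation of 1; the target values, the candidate labels and the raw scores are all invented for illustration and are not real ratings.

```python
from statistics import mean, pstdev

def normalize(scores, target_mean=0.0, target_sd=1.0):
    """Rescale a {candidate: score} mapping to a common mean and spread."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    return {name: target_mean + target_sd * (value - mu) / sd
            for name, value in scores.items()}

# Hypothetical raw scores on two systems that use different scales.
system_a = {"Candidate A": -0.40, "Candidate B": 0.10, "Candidate C": 0.60}
system_b = {"Candidate A": 20.0, "Candidate B": 55.0, "Candidate C": 90.0}

for label, raw in (("A", system_a), ("B", system_b)):
    scaled = normalize(raw)
    print(label, {name: round(score, 2) for name, score in scaled.items()})
```

After rescaling, a candidate's position on one system can be compared directly with his or her position on another.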
We in turn estimate the ideology of voters in each state based on two variables:
- Presidential results relative to the national average in 2012 and 2008;
- The winners of recent past congressional races in the state — as measured by the average DW-Nominate score of the state’s congressional delegation over the past four Congresses.51 This helps to account for states — for example, Arkansas — that vote very Republican for president but sometimes still elect Democrats to Congress.
This variable can make some difference. In a purple state that votes exactly in line with the national average, a “mainstream” Republican52 would be expected to perform a net of 4 percentage points better than a more conservative, so-called tea party Republican.53 This variable, for instance, helped to predict Sen. Claire McCaskill’s victory over the conservative Republican Todd Akin in Missouri in 2012. However, the Republican nominees this year are more moderate.
Among the more important Senate races, the state fundamentals estimate slightly hurts Democrats in Alaska, Kentucky, Louisiana, Minnesota and North Carolina, and slightly helps them in Arkansas, Georgia and Iowa. Its most important effect is in Kansas, where a center-left independent candidate, Greg Orman, is polling slightly ahead of the Republican incumbent, Pat Roberts, but where the fundamentals calculation has Roberts as a heavy favorite. However, the polling average and the fundamentals calculation have some tendency to converge toward one another, as has already happened in some states, such as Michigan.54
Step 4: Now-cast/snapshot
This part is pretty simple. The adjusted polling average (Step 2) and the state fundamentals estimate (Step 3) are combined into a single number that projects what would happen in an election held today. We’ve sometimes referred to this as the “now-cast” or “snapshot.”
This works by treating the state fundamentals estimate as equivalent to a “poll” with a weight of 0.35. What does that mean? Our poll weights are designed such that a 600-voter poll from a firm with an average pollster rating gets a weight of 1.00 (on the day of its release55; this weight will decline as the poll ages). Only the lowest-rated pollsters will have a weight as low as 0.35. So the state fundamentals estimate is treated as tantamount to a single bad (though recent) poll. This differs from the presidential model, where the state fundamentals estimate is more reliable and gets a considerably heavier weight.
In states with abundant recent polling, the state fundamentals calculation makes almost no difference. As of this writing, for instance, it gets only 6 percent of the overall weight in Kentucky and 7 percent in Iowa. However, it has more influence in states with less polling; the state fundamentals currently get 15 percent of the weight in Alaska, for example, and 23 percent in Delaware. As the election approaches, the state fundamentals tend to get less and less weight because the volume of polling increases.56
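The blending step amounts to a weighted average in which the fundamentals enter as one extra "poll" with a weight of 0.35. The sketch below assumes that simple form; the function name and the example polls are hypothetical.

```python
FUNDAMENTALS_WEIGHT = 0.35  # weight of the fundamentals "poll", per the text

def now_cast(polls, fundamentals_margin):
    """Blend adjusted polls with the state fundamentals estimate.

    polls: list of (adjusted_margin, weight) pairs.
    Returns (blended margin, share of total weight held by fundamentals).
    """
    poll_weight_total = sum(weight for _, weight in polls)
    total = poll_weight_total + FUNDAMENTALS_WEIGHT
    weighted_sum = sum(margin * weight for margin, weight in polls)
    weighted_sum += fundamentals_margin * FUNDAMENTALS_WEIGHT
    return weighted_sum / total, FUNDAMENTALS_WEIGHT / total

# Hypothetical sparsely polled state: two polls, fundamentals pointing the other way.
margin, share = now_cast([(2.0, 1.0), (4.0, 0.65)], fundamentals_margin=-6.0)
print(round(margin, 2), round(share, 3))
```

With more and fresher polls, the total poll weight grows and the fundamentals' share shrinks, which matches the pattern described for heavily polled states.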
Step 5: Election Day forecast
The FiveThirtyEight model is explicitly meant to be a forecast of how the election will turn out on Nov. 4, 201457 — rather than an estimate of what would happen in an election held today.
Looking toward the future means there’s more uncertainty in the forecast — we’ll discuss that in Step 6. In addition, the model might anticipate an overall shift toward one party or another, although this has only a minor influence on the forecast at this point in the race.
In our presidential model, we calculate a projection of the national popular vote based on an economic index. This estimate also reflects the benefit of incumbency.58 You might think of this as an estimate of the “national fundamentals” (as opposed to the state fundamentals). In 2012, it suggested that Obama would win the national popular vote by about 2 percentage points.59 Whenever Obama established a polling lead of more than 2 percentage points, such as after the Democratic convention, this calculation subtracted something from his lead in the polls to forecast the Election Day result. And whenever Obama’s lead fell below 2 points, such as after the first presidential debate in Denver, it added something to it. Put more technically, the model assumed that Obama’s result would revert toward the mean of how past incumbents had performed under similar economic conditions.
However, the model was designed such that the weight placed on the national fundamentals declined over the course of the campaign — until it was zero by Election Day. If the polls had shown Obama with an 11-point lead by Election Day, or Romney with a 9-point lead, that’s what the model would have shown — even though such a result would have been inconsistent with how economic conditions had affected presidential races in the past.
We’ve introduced a similar step into our Senate model this year in order to make it more consistent with our presidential model. However, the Senate version is quite a bit simpler and has less overall influence on the forecast. Specifically, the model assumes the generic congressional ballot will revert toward a mean of favoring the opposition party — in this case, Republicans — by about 5 percentage points as of Election Day, which is the ballot’s historic average performance in midterm elections since 1990.60 It does not account for economic conditions, presidential approval ratings, the favorability of the parties or any other factor.
As in the presidential model, however, the weight placed on this historic average decreases as the election year goes on — it’s already quite small, and it will be zero by Election Day. Furthermore, the Democrats’ position on the generic ballot has already declined to show a deficit with Republicans of about 3 percentage points, not much different from the long-term average.61 For these reasons, Step 5 barely changes the results — currently, it shows Republicans doing only about 0.2 percentage points better in each state than they would otherwise.62 It would make more difference earlier in the election year, or if the generic ballot were way out of line with historical trends.
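Here is a minimal sketch of that kind of reversion, for readers who like to see the arithmetic. The linear decay schedule and the `horizon_days` and `max_weight` values are assumptions made up for illustration; the only properties taken from the description above are that the weight on the historic average shrinks over the year and is zero on Election Day. It is not meant to reproduce the model's actual adjustment.

```python
def fundamentals_weight(days_left, horizon_days=300, max_weight=0.3):
    """Hypothetical weight on the historic average.

    Linear decay is an assumption; the only property taken from the text
    is that the weight reaches zero when days_left is zero.
    """
    clipped = min(max(days_left, 0), horizon_days)
    return max_weight * clipped / horizon_days


def projected_generic_ballot(current_margin, historic_margin, days_left):
    """Blend the current generic-ballot margin with the historic average.

    Margins are Republican minus Democrat, in percentage points.
    """
    w = fundamentals_weight(days_left)
    return (1 - w) * current_margin + w * historic_margin


# Figures from the text: Republicans lead by about 3 now; the historic
# midterm average favors the opposition party by about 5.
today = projected_generic_ballot(3.0, 5.0, days_left=48)
election_day = projected_generic_ballot(3.0, 5.0, days_left=0)
print(f"projection with 48 days left: R+{today:.2f}")
print(f"projection on Election Day:   R+{election_day:.2f}")
```

Whatever schedule is used, the projection can never land outside the range between the current margin and the historic average, and it collapses to the polls alone on Election Day.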
Step 6: Estimate margin of error
If you’ve gotten this far, you’ll know that I think this step is important. As I’ve said, the goal of the FiveThirtyEight model is not to “call” races but instead to estimate the probability that each candidate will win. And some races are associated with more uncertainty than others.
This is nicely illustrated by our interactive Senate forecast — the picture below reflects how it looked as of Tuesday evening. The dots in the chart represent the most likely outcomes. The gray bars indicate the uncertainty (more precisely, the 90 percent prediction interval). In Colorado, for instance, while the most likely outcome is a win for the Democratic candidate by about 3 percentage points, the prediction interval runs all the way from a 12-point Democratic win to a 6-point win for the Republican. That’s already a fairly wide range — and remember, it captures only 90 percent of the cases. There’s also a 5 percent chance that the Democrat will win by more than 12 points and a 5 percent chance that the Republican will win by more than 6.
However, the prediction interval is even wider in some states, especially Kansas, Alaska and Louisiana. How are these intervals determined?
We’ve done it by looking at which factors are historically correlated with larger errors in the forecast63 — and we’ve identified six important ones. Most of these ought to be pretty intuitive.
- Uncertainty is larger when there are more days to go until the election.64 This means the model will become more confident as Election Day approaches.
- Uncertainty is larger when there are fewer polls. This year’s election has featured fewer polls than elections in the recent past (and also lower-quality polls65), which makes the uncertainty higher than it was in 2008, 2010 or 2012. Note that the uncertainty is estimated on a state-by-state basis, so relatively well-polled states like Georgia are associated with less uncertainty than thinly polled ones like Alaska.
- Uncertainty is larger when the polls disagree more with one another. Take one state where two polls have the Democrat ahead by 5 points and another where the Democrat is tied with the Republican in one poll but 10 points ahead in another. The Democrat has a 5-point lead in the polling average in both states. But there is considerably more uncertainty, we’ve found, in the state where the polls disagree with one another.
- Uncertainty is larger when the polling average disagrees more with the state fundamentals. Another reason for calculating the “state fundamentals” estimate — however much weight you place on it — is to get a sense for whether it tells a consistent story with the polls. We’ve found that in states where there is more divergence between polls and fundamentals — as in Kansas this year — the uncertainty is much higher.
- Uncertainty is larger when there are more undecideds or third-party voters in the polls. This is another intuitive assumption that checks out in the data. A lot of Senate races this year feature high numbers of undecided voters, something that contributes to high uncertainty about their outcomes. Races with viable third-party candidates are also associated with very high volatility.
- Uncertainty is larger when the race is more lopsided. This is the one counterintuitive-seeming finding; isn’t the outcome more in doubt when the polls show a close race? Of course it is — if you’re concerned about who will win. But as measured by the difference between the polled and the actual margin in the race, the error tends to be larger when there is a bigger gap separating the candidates. It’s fairly common, for instance, for a candidate up by 40 points in the polls to win her race by 30 points or 50 points instead.
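The point about polls disagreeing with one another is easy to check with the two hypothetical states from the list above. This small sketch uses the population standard deviation as the measure of disagreement; that choice is an assumption for illustration, since the text does not say how the model measures it.

```python
from statistics import mean, pstdev

# Democratic lead in each poll, in percentage points.
state_a = [5.0, 5.0]    # both polls show the Democrat ahead by 5
state_b = [0.0, 10.0]   # one poll shows a tie, the other a 10-point lead

for name, polls in (("State A", state_a), ("State B", state_b)):
    print(name, "average lead:", mean(polls), "spread:", pstdev(polls))
```

Both states have the same 5-point average, but the spread is zero in the first and 5 points in the second, which is the kind of difference the model treats as a sign of extra uncertainty.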
Step 7: Simulate outcomes and estimate the probability of each party controlling the Senate
Once we’ve completed Step 6, we’ve calculated what amounts to a mean (“The Republican is ahead by 2 points”) and a standard deviation (“plus or minus 5 points”) for the forecast in each state. As I mentioned, you can use a normal distribution to calculate a candidate’s win probability from these two factors alone. So why not just stop there?
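For reference, the normal-distribution shortcut takes only a few lines. This sketch uses the example figures above (a 2-point lead with a standard deviation of 5 points) and a plain normal distribution; the function name is just illustrative.

```python
import math


def win_probability(mean_margin, sd):
    """Chance the margin ends up above zero under a normal distribution."""
    return 0.5 * (1.0 + math.erf(mean_margin / (sd * math.sqrt(2.0))))


p = win_probability(2.0, 5.0)
print(f"Win probability for a 2-point lead, SD of 5: {p:.1%}")
```

With these inputs the leader comes out at roughly a 66 percent favorite, which shows how much a modest lead is discounted when the uncertainty is several points wide.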
One minor reason is that the FiveThirtyEight model does not quite use a normal distribution; instead it uses a transformation of the normal distribution with slightly fatter tails.66 The transformation gives extreme long-shot candidates slightly shorter odds; it might mean, for example, that we would have a candidate with a 0.5 percent chance to win his race instead of a 0.05 percent chance. But this process is not complicated and makes little difference. Instead, we run simulations to deal with a couple of more important problems.
One is that the error in the polls is not independent from state to state. In a number of recent elections, one party has either gained considerable ground in the closing stages of the race (as Democrats did in 2006), or the polls have had a strong overall bias toward one party or another on Election Day itself (as in 1994, 1998 and 2012). This property is not as pronounced as in presidential races, where the same two candidates are on the ballot in each state. But it happens often enough to worry about.
As I mentioned in Step 6, the model estimates the overall amount of error in each state based on the number of days until the election, the volume of polling there, the number of undecided voters and other factors. Before running the simulations, the model breaks the error down into two subcomponents: national error and state-specific error.67 National error68 affects every state in the same way; state-specific error, as its name implies, affects one state at a time.
In each simulation, the program draws a series of random numbers.69 The first number it draws represents national error; in one simulation, for instance, the draw might be “Republicans +2.” This means that Republicans will outperform their forecasts by 2 percentage points in every state in that simulation. Then it draws another number in each state. Perhaps in Arkansas, for instance, it comes up with “Democrats +5.”
These numbers are then added together to produce a simulated result in each state. In Arkansas in this example, it would mean the Democrat, Mark Pryor, outperformed his projection by 3 percentage points (despite Democrats having a poor night nationally). If Pryor had trailed his opponent by fewer than 3 points (or led by any margin), that would be enough to win him the race in that simulation. If he’d been behind more than 3 points, he’d still come up short.70
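The two-part draw can be sketched as follows. This is a simplified illustration, not the model's code: it uses plain normal draws rather than the fatter-tailed distribution, and the standard deviations and the Arkansas forecast margin are hypothetical values chosen only to make the example run.

```python
import random


def simulate_once(forecasts, national_sd, state_sd, rng):
    """Simulate one election night.

    Margins are Democrat minus Republican, in percentage points.
    The national error is drawn once and shared by every state;
    the state-specific error is drawn separately for each state.
    """
    national_error = rng.gauss(0.0, national_sd)
    results = {}
    for state, margin in forecasts.items():
        state_error = rng.gauss(0.0, state_sd)
        results[state] = margin + national_error + state_error
    return results


# Worked example from the text: "Republicans +2" nationally and
# "Democrats +5" in Arkansas net out to the Democrat beating his
# projection by 3 points.
national_error = -2.0
arkansas_error = 5.0
net = national_error + arkansas_error

# Hypothetical inputs: Arkansas at R+2.5; Colorado at D+3 as in Step 6.
forecasts = {"Arkansas": -2.5, "Colorado": 3.0}
rng = random.Random(538)
sims = [simulate_once(forecasts, national_sd=3.0, state_sd=4.0, rng=rng)
        for _ in range(10000)]
pryor_win_share = sum(s["Arkansas"] > 0 for s in sims) / len(sims)
print(f"Democrat wins Arkansas in {pryor_win_share:.1%} of simulations")
```

Because the national error is shared, simulations in which one party beats its forecast in Arkansas tend to be ones in which it also beats its forecast in Colorado, which is exactly the correlation that a state-by-state calculation would miss.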
In this way, we can estimate not only each candidate’s chance of winning but also the overall number of seats each party will control, specifically accounting for the possibility that it could be a year like 1994 or 2012 when almost all of the races broke in the same direction.
The simulation is also helpful for handling viable third-party candidates, such as Larry Pressler of South Dakota,71 who present a couple of additional challenges.
One is that the range of outcomes for third-party candidates is not symmetric. A third-party candidate polling at 15 percent with some time to go in the race has some chance (not a lot) of gaining 20 points and finishing at 35 percent, in which case he could win. (As I’ve mentioned, vote shares for third-party candidates can be volatile.72) But he has no chance of losing 20 points and finishing at -5 percent.73 So we model the vote shares for third-party candidates based on a log-normal distribution, which accounts for this type of asymmetry.
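A short sketch shows why the log-normal shape fits. The `log_sd` value of 0.4 is a hypothetical volatility chosen for illustration, and centering the distribution so that its median equals the polled share is likewise an assumption; the text says only that a log-normal distribution is used. The sketch also makes no attempt to cap shares at 100 percent.

```python
import math
import random


def simulate_third_party_share(polled_share, log_sd, rng):
    """Draw a vote share (in percent) from a log-normal distribution
    whose median equals the polled share. The draw is always positive."""
    return rng.lognormvariate(math.log(polled_share), log_sd)


rng = random.Random(2014)
draws = [simulate_third_party_share(15.0, 0.4, rng) for _ in range(20000)]

share_at_35_or_more = sum(d >= 35.0 for d in draws) / len(draws)
print("lowest simulated share:", round(min(draws), 2))
print(f"simulations at 35 percent or higher: {share_at_35_or_more:.1%}")
```

The draws can stretch well above 15 percent in a small fraction of simulations, but none can fall below zero, which is the asymmetry described above.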
Also, third-party candidates are often closer to one of the major-party candidates ideologically and therefore are more likely to “trade” votes with that candidate. In the Maine gubernatorial election in 2010, for example, the independent, Eliot Cutler, was left of center and much closer to the Democrat, Libby Mitchell, than to the conservative Republican candidate, Paul LePage. When Cutler suddenly began to gain ground late in the race, almost all of his votes were taken from Mitchell rather than from LePage. The model accounts for this by using the ideology scores we calculated in Step 4. In South Dakota, for example, Pressler’s vote share (Pressler is a former Republican) is more correlated with the Republican, Mike Rounds, than the Democrat, Rick Weiland. So in those simulations where Pressler does well, he takes more of his votes from Rounds. Likewise, when Pressler does poorly, he gives back most of his votes to Rounds.74
The very last step is simply counting up the number of seats won in each simulation, and adding them to the baseline of 34 Democratic seats and 30 Republican seats that are not up for grabs this year.75 The model assigns any third-party winners to one of the major parties; see here for more on how we do that. Greg Orman of Kansas has said he’ll caucus with whichever party is “clearly in the majority.” (This produces a kink in our probability distribution.) But based on Orman’s ideology score, he’s assigned a 75 percent probability of caucusing with Democrats in the event his vote would determine majority control.
Finally, we can tally the results across thousands76 of simulations to estimate the likelihood of a party finishing with a given number of seats. That produces a probability distribution that looks like this:
Simulations with 50-50 ties are resolved as producing Democratic control because of the tiebreaking vote of Vice President Joe Biden. So we count up the percentage of simulations in which Democrats finished with at least 50 seats; that represents their chances of retaining the Senate. The remaining cases go to the Republicans.
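The final tally amounts to a few lines of bookkeeping. In this sketch the baseline seat counts come from the text, the function names are illustrative, and the sample simulation results are invented. It also ignores the separate handling of third-party winners described above.

```python
BASE_DEM = 34   # Democratic seats not up this year (from the text)
BASE_REP = 30   # Republican seats not up this year (from the text)
CONTESTED = 100 - BASE_DEM - BASE_REP   # seats decided in each simulation


def democrats_hold_senate(dem_wins_among_contested):
    """Democrats keep control with at least 50 seats, because the
    vice president breaks a 50-50 tie in their favor."""
    return BASE_DEM + dem_wins_among_contested >= 50


def control_probability(dem_wins_per_simulation):
    """Share of simulations in which Democrats retain the Senate."""
    held = sum(democrats_hold_senate(w) for w in dem_wins_per_simulation)
    return held / len(dem_wins_per_simulation)


# Invented results: Democratic wins among contested seats in four simulations.
sample = [15, 16, 17, 14]
print("contested seats:", CONTESTED)
print("Democratic control probability:", control_probability(sample))
```

With 34 seats in hand, Democrats need to win at least 16 of the contested races to reach 50, so two of the four invented simulations count as Democratic control.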
CORRECTION (Sept. 17, 9:23 a.m.): An earlier version of this story incorrectly referred to the Congress serving in the years 2007-08 as the 109th Congress. It was the 110th.
CORRECTION (Sept. 17, 11:34 a.m.): An earlier version of a footnote to this story gave the wrong state for where the Republican Bill Cassidy is running for Senate. He is running in Louisiana, not Kentucky.
CORRECTION (Sept. 17, 3:50 p.m.): An earlier version of this article misstated the percentage of cases in which a candidate we favored between 85 percent and 95 percent actually won. The article correctly stated that it was 11 in 12 cases, but that percentage is 92 percent, not 89 percent.