Soon after the midterm elections, we began our regular process of evaluating how FiveThirtyEight’s forecasts performed. We quickly discovered an error: We were using out-of-date data for one important source used in the Deluxe version of our forecast. Although this had little impact on the topline numbers for each party’s chance of controlling a chamber of Congress, it had modest-to-medium-sized effects on some individual races in the Deluxe forecast. It had no effect on the Lite or Classic forecasts.
The Deluxe forecast differs from the Classic and Lite forecasts in that it accounts for race ratings published by three groups: The Cook Political Report, Sabato’s Crystal Ball and Inside Elections. After adding new Inside Elections ratings for House races in late September, we noticed what we thought was an anomaly in the forecast. To investigate, we disabled automatic updates for that site’s House ratings. We determined that the election model was running correctly, but we neglected to re-enable automatic updates from Inside Elections. As a result, Inside Elections ratings for House races were frozen in time as of late September. (To be clear, this was FiveThirtyEight’s error and there is no fault whatsoever with Inside Elections or their ratings.)
If we had run the model with the updated ratings, the final forecast would still have shown Republicans with an 84 percent chance of winning the House, the same as our final forecast with the out-of-date ratings. And Republicans would have had a 55 percent chance of winning the Senate, instead of 59 percent. (Even though Inside Elections ratings for Senate and gubernatorial races were being updated, because of the way that the model works, there were some very minor, indirect effects on Senate and gubernatorial Deluxe forecasts as well.)
No individual race forecast shifted by more than one category as a result of the error (e.g., a race shifting from “lean Republican” to “lean Democrat,” skipping over “toss-up”), but a number did have a one-category shift, as listed in the table below.
|Forecast||Race||Rating||Dem odds||Rating||Dem odds||Diff in Dem odds|
|House||AZ-02||Lean R||34.2||Likely R||22.2||-12.0|
|House||MN-02||Likely D||80.0||Lean D||68.8||-11.2|
|House||CA-49||Likely D||81.8||Lean D||71.4||-10.4|
|House||NJ-07||Lean R||28.4||Likely R||18.2||-10.2|
|House||NY-04||Likely D||77.7||Lean D||70.5||-7.2|
|House||CA-47||Likely D||79.7||Lean D||72.6||-7.1|
|House||TX-28||Likely D||75.9||Lean D||70.3||-5.6|
|House||OH-09||Likely D||77.8||Lean D||72.3||-5.5|
|House||CA-41||Solid R||5.3||Likely R||6.0||+0.7|
|House||NY-02||Solid R||3.6||Likely R||6.6||+3.1|
|House||AZ-01||Solid R||5.4||Likely R||10.7||+5.3|
|House||CA-45||Likely R||19.3||Lean R||27.4||+8.1|
|House||NY-01||Likely R||22.6||Lean R||31.7||+9.1|
|House||OH-01||Likely R||16.1||Lean R||29.9||+13.8|
|House||NM-02||Likely R||22.4||Lean R||37.2||+14.7|
|House||OH-13||Likely R||18.6||Lean R||33.9||+15.3|
|House||NC-13||Likely R||23.4||Lean R||39.1||+15.8|
Not listed in that table is the House race in Washington’s 3rd Congressional District, which did not see a change in its categorization. It was won by Democrat Marie Gluesenkamp Perez, who was listed with only a 2 percent chance in the forecast. If updated Inside Elections ratings had been used, she would have had a 4 percent chance instead. So the race was a major upset either way — although one should keep in mind that when a model issues forecasts for 435 House districts, some low-probability upsets are to be expected if the model is calibrated properly.
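To make the calibration point concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, a set of 50 races in which the underdog is given a 2 percent chance, and it treats the races as independent. Both the count of 50 and the independence assumption are hypothetical simplifications, not figures or methods from our model; real race outcomes are correlated.

```python
def expected_upsets(p_upset: float, n_races: int) -> float:
    """Expected number of upsets across n_races, each with upset probability p_upset."""
    return p_upset * n_races


def prob_at_least_one_upset(p_upset: float, n_races: int) -> float:
    """Chance of at least one upset, ASSUMING the races are independent."""
    return 1.0 - (1.0 - p_upset) ** n_races


# Hypothetical inputs: 50 long-shot races, each underdog at 2 percent.
P_UPSET = 0.02
N_RACES = 50

print(f"Expected upsets: {expected_upsets(P_UPSET, N_RACES):.2f}")
print(f"Chance of at least one: {prob_at_least_one_upset(P_UPSET, N_RACES):.1%}")
```

Under these assumed inputs, a well-calibrated model would expect about one such upset, and seeing none at all would be the less likely outcome.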
We are reviewing our internal processes for how to better identify errors of this nature. One lesson is that smaller errors are sometimes harder to detect than larger ones. If our forecast in a high-profile race such as Pennsylvania’s U.S. Senate election had differed dramatically from the consensus, we would quickly have investigated it. Small anomalies in a series of mostly low-profile House races are harder to detect with the “eye test,” however. We also strongly appreciate reader feedback, including alerting us to potentially anomalous forecasts. While our models are fairly complex, the forecasts should still follow logically from the inputs. If a given forecast is hard to explain, it may reflect a problem with the underlying data or with the way that we’re processing it.
In evaluating how FiveThirtyEight’s forecasts did (for example, comparing our performance against other forecasts), we would recommend that you use the original, as-published forecasts, even though they were using outdated Inside Elections ratings. We of course would have preferred to use the updated ratings, but we don’t think we should get credit for correcting a mistake that we only identified after the fact. In conducting our own assessment of our forecast once all race calls are finalized, we will show you four versions instead of our usual three: Lite, Classic, Deluxe (as published) and Deluxe (corrected).
A complete set of files showing what our final Deluxe forecast would have shown given updated Inside Elections ratings can be found here.
FiveThirtyEight regrets the error. We appreciate the time you spend on the site, and we hope that you found our midterm elections coverage valuable despite it.