Welp, this is never fun. We discovered an issue with how our primary model was making state-by-state and district-by-district forecasts. Specifically, the model was not properly calculating the demographic regressions that we use as a complement to the polls.
The top-line effect of fixing the error was not very large — for instance, Sen. Bernie Sanders’s chances of getting a delegate majority fell by around 3 percentage points, while former Vice President Joe Biden’s fell by around 1.5 points. In fact, it mainly helped the chances that no one will get a majority of pledged delegates (those chances rose about 3.5 percentage points), at the expense of any one candidate’s chances of getting a majority. The one candidate who did benefit was former New York Mayor Michael Bloomberg, whose chances increased from 7 percent to 9 percent. However, the changes are more notable in some individual states.
This problem was introduced into the model’s code on Feb. 5 (just after the Iowa caucuses); these functions had been working properly before then. The error was fixed as of the forecast we published just after noon on Feb. 19.
Here’s a look at what went wrong and what changed:
First, what is a demographic regression? To infer a candidate’s standing in all 57 states and territories at any given time, our model calculates a series of demographic regressions based on (i) the results of states that have voted so far and (ii) the polls in states where we have abundant polling. For instance, the regressions can figure out that Biden is strong in states with large African American populations, and that Sanders is strong in liberal states. These demographic regressions are then combined with a geographic prior based on candidates’ home states and regions (for example, Sanders is assumed to be strong in New England). The result is then used as a substitute for polling in states where there are no polls and as a complement to the polls in states where there isn’t much polling. Nevada and South Carolina, for instance, have a fair amount of polling but not as much as the model would like, so the regression gets a small amount of weight in our forecasts.
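As a rough illustration of how that blending could work, here is a minimal Python sketch. The 70/30 fundamentals mix, the 10-poll saturation point and every function name here are assumptions made for illustration, not our model’s actual formulas:

```python
def blended_forecast(poll_avg, n_polls, regression_est, prior_est):
    """Blend a state's polling average with a demographic regression
    and a geographic prior. All weights here are illustrative."""
    # Fold the regression and the geographic prior into a single
    # non-polling ("fundamentals") estimate; the 70/30 split is assumed.
    fundamentals = 0.7 * regression_est + 0.3 * prior_est
    # Polls get more weight as polling volume grows; with no polls,
    # the forecast rests entirely on the fundamentals estimate.
    w_polls = min(n_polls / 10.0, 1.0)  # hypothetical saturation point
    return w_polls * poll_avg + (1.0 - w_polls) * fundamentals

# A state with a fair amount of polling, like Nevada, leans mostly on
# its polls, with the regression getting only a small residual weight.
print(blended_forecast(poll_avg=25.0, n_polls=8,
                       regression_est=22.0, prior_est=20.0))
```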
What was wrong with the regressions? Basically, the regressions weren’t being calculated at all, so the model was just defaulting to the geographic prior. (If you want to get very technical, when programming in Stata, please remember that local macros aren’t stored in the program’s memory when you execute another do-file from within the shell of a master do-file.)
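For readers who don’t write Stata, the general shape of the failure translates roughly as follows. This is a hypothetical Python sketch; the function names, data structures and 50/50 blend are all invented for illustration, not the model’s actual code:

```python
def apply_regression(state, coefs, features):
    """Dot a state's demographic features with fitted coefficients."""
    return sum(coefs[k] * features[state][k] for k in coefs)

def state_projection(state, coefs, features, geo_prior):
    # `coefs` was supposed to be populated by an upstream script. Because
    # the values never made it across (the Stata analogue: a local macro
    # that expands to nothing inside a called do-file), it arrived empty.
    if not coefs:
        # Failsafe: quietly fall back to the geographic prior alone.
        # The output still looked plausible, which is what hid the bug.
        return geo_prior[state]
    # Normal path: blend the regression with the geographic prior
    # (the even weighting here is just a placeholder).
    return 0.5 * apply_regression(state, coefs, features) + 0.5 * geo_prior[state]
```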
What problems was this causing? By just defaulting to the geographic prior rather than also using demographics, the state-by-state forecast distributions were too compressed, or underdispersed. In other words, they weren’t recognizing fairly obvious demographic strengths and weaknesses of each candidate.
While relying solely on the geographic prior isn’t a terrible approach, especially for candidates such as former South Bend, Indiana, Mayor Pete Buttigieg who are strong in their home regions, it doesn’t capture some of the variations elsewhere in the Democratic electorate. Let’s take the presence of African Americans and moderate white voters in the South as an example. Because these two demographic groups are so important in that region, Biden is stronger there and weaker elsewhere than you’d guess from geography alone. The same holds true, though to a lesser extent, for Bloomberg. Conversely, Sanders has more strength in the West than you’d guess based on geography alone, because Western electorates tend to be liberal and to have a high number of Hispanic voters, and Sanders does well with both of those voting blocs.
As a result, the model was, for instance, underrating Biden’s chances in states like South Carolina and Alabama and underrating Sanders’s chances in Nevada and California. Since the demographic regressions are phased out in states with a lot of polling, the effect was larger in states with less polling, like Colorado. On average, candidates’ projected vote shares changed by about 1 percentage point (e.g., from 18 percent to 19 percent) as a result of the bug fix.
What was that about district-by-district forecasts? The model also uses the demographic regressions to forecast the results in individual districts. For instance, although Sen. Elizabeth Warren might not be strong in South Carolina overall, she could be poised to do fairly well in a South Carolina district with a lot of college-educated voters.
Without the demographic regression, though, the model defaulted toward using the statewide forecast in each district. It did still account for random variation between districts — so there might be simulations where, say, Warren got 12 percent of the vote in South Carolina but 17 percent in a particular district, thereby earning delegates there. (Democratic rules generally require that a candidate receive at least 15 percent of the vote to qualify for delegates in a state or district.)
Still, with the demographic regression not working properly, the model wasn’t accounting for enough district-by-district variation. As a result, it tended to underestimate the number of district delegates a candidate could expect to earn when finishing with less than 15 percent of the vote statewide. This matters because about two-thirds of Democratic delegates are awarded by district rather than statewide. The ability of candidates to earn these district delegates makes it slightly harder for front-runners (e.g. Sanders) to accumulate runaway delegate margins, which in turn makes a “no majority” scenario more likely.
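To see why that variation matters, here is a toy simulation. The district-to-district standard deviations are made-up numbers chosen only to show the direction of the effect:

```python
import random

def districts_cleared(statewide_share, n_districts, district_sd,
                      threshold=15.0):
    """Count districts where a candidate clears the delegate threshold,
    given normally distributed district-to-district variation."""
    return sum(
        random.gauss(statewide_share, district_sd) >= threshold
        for _ in range(n_districts)
    )

random.seed(0)
# A candidate at 12 percent statewide in a state with seven districts:
# with demographically driven variation (an assumed sd of 4 points),
# they clear the threshold in some districts; with the bug's
# compressed variation (an assumed sd of 1.5 points), almost never.
print(districts_cleared(12.0, n_districts=7, district_sd=4.0))
print(districts_cleared(12.0, n_districts=7, district_sd=1.5))
```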
How can you prevent something like this from happening again? I don’t know. We’ve gotten our share of forecasts wrong and had models that, in retrospect, we wish we’d designed differently. But I usually have a good eye for when a code change or new data creates output that doesn’t look right because either the data or the code was incorrect. I didn’t catch it this time, perhaps because the code changes were introduced at the same time that we input initial Iowa results into the model, and Iowa had a much larger impact on the model than the bug we introduced.
But as a note to myself and to other people who program statistical models: It may not always be a good idea to introduce failsafes into your program rather than letting it break. For instance, our model had a failsafe to default to the geographic prior if it couldn’t calculate the demographic regressions. But because the geographic prior by itself produces reasonable enough (but far from ideal) answers, the error was harder to notice than it would have been if the program had stopped executing or had produced self-evidently ridiculous forecasts.
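In code, that lesson might look like replacing the silent fallback from the earlier sketch with a hard stop. A crash on Feb. 5 would have been annoying, but it would have surfaced the problem immediately rather than two weeks later. Again, this is a hypothetical pattern, not the model’s actual code:

```python
def state_projection(state, coefs, features, geo_prior):
    # Fail loudly: an empty coefficient set means an upstream step broke,
    # so stop the run rather than quietly substituting the prior.
    if not coefs:
        raise ValueError(
            f"regression coefficients missing for {state}; "
            "refusing to silently fall back to the geographic prior"
        )
    regression = sum(coefs[k] * features[state][k] for k in coefs)
    return 0.5 * regression + 0.5 * geo_prior[state]  # placeholder blend
```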
So if you see something that looks weird in the forecasts, please let us know! Reader feedback alerts us to a lot of small issues (like polls that were entered incorrectly) and occasionally helps us to catch some bigger ones, too. My sincere apologies for not catching this bug sooner.