Construction Season Over (Technical)

This afternoon, I completed a series of refinements to both the trendline adjustment that was implemented two weeks ago, and the mean-reversion adjustment that was implemented earlier this week. I am hopeful that these will be the last significant changes to our methodology. The refinements are described in more detail below.

Changes to Trendline Adjustment

The most noticeable change is that the trendline curve has been retooled to be considerably more sensitive to changes in the polling data. For example, compare the curve we’re using now (this is the top graph) to the one we had in place a couple of days ago (the bottom graph):

The more sensitive curve does a much more intuitive job of pinpointing Obama’s post-primary bounce. Rather than showing a leisurely jaunt upward for Obama in the polls over the course of the past month, it instead has his numbers improving much more steeply right as the primaries end, but then leveling off. In fact, the new curve thinks that Obama’s numbers peaked shortly after Hillary Clinton’s concession speech and that he’s lost perhaps half a point in the polls since then.

The other problem with the curve we had been using before is that, by being so slow to respond to changes in the polling data, it was causing us to adjust some of the previous polling results incorrectly. For example, it might have been taking a poll conducted 14 days ago and giving Obama a bonus point or two from it, when the more sensitive version of the trendline reveals that Obama’s numbers have been flat since then. In other words, the slower-moving trendline, which was intended to be more conservative, was actually being too liberal about adjusting upward those polls taken after most of Obama’s post-primary bounce had been realized.
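To make the problem concrete, here is a minimal sketch of how an older poll might be shifted by the movement in the trendline since it was taken. This is an illustration rather than the actual code behind the model: the function name `adjust_poll` and all of the numbers are hypothetical, and the simple additive shift is an assumption.

```python
def adjust_poll(poll_margin, trend_at_poll_date, trend_today):
    """Shift an older poll by the trendline movement since it was conducted.

    Assumption: the adjustment is a simple additive shift equal to the
    difference between the trendline today and on the poll's date.
    """
    return poll_margin + (trend_today - trend_at_poll_date)


# Hypothetical poll from 14 days ago showing Obama +3.0.
poll = 3.0

# Slow curve: still climbing, so it thinks 1.5 points of bounce came after the poll.
slow = adjust_poll(poll, trend_at_poll_date=2.0, trend_today=3.5)

# Sensitive curve: the bounce was already complete 14 days ago, so no bonus.
fast = adjust_poll(poll, trend_at_poll_date=4.0, trend_today=4.0)

print(slow, fast)
```

With these made-up inputs, the slow curve hands the old poll a 1.5-point bonus while the sensitive curve leaves it untouched, which is the kind of over-adjustment described above.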

A second, more technical adjustment to the trendline is that it now weights the daily datapoints based on the number of polls that were conducted that day. Before, a day on which just one poll came out had just as much influence on the curve as a day like 2/27, when SurveyUSA released polling in all 50 states. This idiosyncrasy has now been resolved.
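One way to picture the weighting is a local linear fit in which each day's kernel weight is multiplied by the number of polls released that day. The sketch below is an assumption about the general approach, not the model's own code: the tricube kernel, the `bandwidth` parameter, and the function names are illustrative choices. A smaller bandwidth corresponds to a more sensitive curve.

```python
def tricube(u):
    """Standard tricube kernel: weight falls to zero at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def weighted_local_fit(days, values, counts, x0, bandwidth):
    """Local linear estimate of the trend at day x0.

    Each day's weight is its kernel weight times the number of polls
    conducted that day, so a 50-poll day outweighs a 1-poll day.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for d, v, n in zip(days, values, counts):
        w = tricube((d - x0) / bandwidth) * n
        dx = d - x0
        sw += w
        swx += w * dx
        swy += w * v
        swxx += w * dx * dx
        swxy += w * dx * v
    if sw == 0:
        raise ValueError("no datapoints inside the bandwidth")
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        # All weight sits on a single day: fall back to a weighted mean.
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    # The intercept of the fit centered on x0 is the estimate at x0.
    return (swy - slope * swx) / sw


# Hypothetical data: day 1 is an outlier relative to its neighbors.
days, values = [0, 1, 2], [0.0, 3.0, 0.0]
equal = weighted_local_fit(days, values, [1, 1, 1], x0=1, bandwidth=2)
heavy = weighted_local_fit(days, values, [1, 50, 1], x0=1, bandwidth=2)
print(equal, heavy)
```

In the toy data, giving day 1 a count of 50 pulls the fitted value at that day much closer to its own reading than when every day counts equally.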

The third adjustment is in the way that the trendline adjustment is attributed to particular states. The formula that we were using before was causing problems because the values of the dummy variables used to calculate our trendline adjustment are arbitrary except when taken relative to one another. The new procedure for calculating the state-by-state trendline adjustment is as follows:

1. For each state in which at least 5 polls have been conducted, we perform a regression of the polling results in that state relative to the LOESS trendline curve. Recent polls are weighted more heavily to place the emphasis on the current movement in the numbers. The coefficient produced by each state’s regression tells us how sensitive that state is relative to changes in the national numbers. For example, in New Hampshire the polls have been about three times as sensitive to national trendline changes as has the nation as a whole, whereas in Iowa there has been essentially no relationship between the polling in that state and the overall national trend.

2. We then take the coefficients produced in each state and regress those against a series of demographic and political variables to determine what exactly is triggering the changes. For example, right now the changes are mostly related to (1) states in which Hillary Clinton had a lot of support in the primaries; (2) states that have a lot of independent voters; (3) states with a high number of voters who identify their ancestry as ‘American’, which means states in Appalachia and parts of the South.

The results of this regression give us our ‘m’ parameter that tells us how to scale the trendline adjustment in each state. As before, m is constrained to fall between 0.0 and 2.0.
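The two steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the model's actual implementation: the exponential recency weights, the `half_life` value, the ordinary least squares second stage, and every function name here are hypothetical stand-ins. Only the 5-poll minimum and the 0.0 to 2.0 bounds on m come from the description above.

```python
MIN_POLLS = 5  # states with fewer polls get no first-stage coefficient


def weighted_slope(x, y, w):
    """Slope of a weighted simple regression of y on x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx


def state_sensitivity(polls, today, half_life=14.0):
    """Step 1: regress a state's polls on the national trendline.

    polls: list of (day, state_margin, national_trend_on_that_day).
    Recent polls get more weight (assumed exponential decay).
    """
    if len(polls) < MIN_POLLS:
        return None
    w = [0.5 ** ((today - d) / half_life) for d, _, _ in polls]
    return weighted_slope([t for _, _, t in polls], [s for _, s, _ in polls], w)


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(features, targets):
    """Step 2: regress state coefficients on demographic variables."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def scale_factor(beta, feature_row, lo=0.0, hi=2.0):
    """Predicted m for a state, constrained to the range [lo, hi]."""
    raw = beta[0] + sum(b * f for b, f in zip(beta[1:], feature_row))
    return min(hi, max(lo, raw))
```

Because the second stage predicts m from demographics alone, a state with no recent polling still receives a scale factor based on how demographically similar states have moved.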

The spirit of the adjustment is exactly the same as it was before, but the results of the calculation appear to be more robust and intuitive than they were before. Obama’s numbers are adjusted upward sharply in states like Connecticut and West Virginia, which have not been polled since the primaries ended, because he has seen big movement toward him in similar states like New Jersey and Kentucky, respectively. But he isn’t assigned much of a bounce in, say, the Dakotas, because his polling in the Upper Midwest has been much flatter.

One implication of being able to do this calculation more precisely is that the model now sees Obama as having a slight excess of popular votes relative to electoral votes. He has gotten a big bounce in large, Clinton-leaning Democratic states like California and New York; perhaps he’ll now win these states by 20 points rather than 15. While that will help with his nationwide popular vote total, it will do little for him in terms of the electoral math.

Change to Mean-Reversion Adjustment

The mean-reversion adjustment, which takes points away from whichever candidate is leading in the national polls because there is a strong historical tendency for the polls to tighten before Election Day, had previously been taking an equal number of points away in each state. If it had calculated that Obama is likely to lose 2 points between now and November, for instance, it was simply lopping 2 points off his margin in each state.

The mean reversion is now state-specific, based on a variant of the procedure used to assign the trendline adjustment to individual states. In other words, we see which types of states and demographics have been most sensitive to movement in the national polling thus far, and use that to infer which states might be most sensitive going forward. In fact, the procedure used to calculate the state-by-state mean-reversion adjustment is identical to the one used to calculate the state-by-state trendline adjustment, with two exceptions: (i) the mean-reversion model does not weight recent movement more heavily, instead looking at the overall sensitivity of each state’s polling since February; and (ii) because it does not necessarily follow that the states that have been most sensitive to national polling momentum in the past will continue to be so in the future, we hedge our bets by assigning only half of the mean-reversion adjustment on a state-by-state basis, with the other half being assigned equally to all 50 states.
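The half-and-half hedge can be written out in a few lines. This is only a sketch: the function name and the idea of multiplying the state-specific half by a sensitivity factor m are assumptions made for illustration, and the 2-point national figure is the hypothetical one used earlier.

```python
def state_mean_reversion(national_shift, m_state, state_share=0.5):
    """Points removed from the leader's margin in one state.

    Half of the national shift is applied uniformly; the other half is
    scaled by the state's sensitivity factor (assumed multiplicative).
    """
    uniform_part = (1 - state_share) * national_shift
    specific_part = state_share * national_shift * m_state
    return uniform_part + specific_part


# Hypothetical 2-point national tightening.
insensitive = state_mean_reversion(2.0, m_state=0.0)
typical = state_mean_reversion(2.0, m_state=1.0)
sensitive = state_mean_reversion(2.0, m_state=2.0)
print(insensitive, typical, sensitive)
```

Under these assumptions, a state with average sensitivity still loses the full 2 points, while the least and most sensitive states lose 1 and 3 points respectively, so no state escapes the adjustment entirely.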

Lastly, I have slightly tuned down the vote share assigned to third-party candidates by rerunning the regression used to determine this figure while excluding the 1992 and 1980 elections, the two years in which a third-party candidate was invited to participate in a nationally televised debate. We are now assigning about 3.8 percent of the nationwide popular vote to third-party candidates rather than the almost 5 percent that we had assigned before.

FiveThirtyEight
