A Technical Note

Some of you may want to bypass this post, but in the interests of full disclosure:

I’ve made a couple of improvements to the regression model that underlies the analysis. The first adjustment is to weight the regression based on the depth of polling data that we have in a given state. Without this weighting, the regression would treat a state like Wyoming, where we have just one poll, as having as much influence over the model as a state like Pennsylvania, where we are already approaching a dozen. Among other things, this should allow the model to “read-and-react” more quickly to new polling data.
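For readers who want to see the mechanics, here is a minimal sketch of a poll-count-weighted regression. It assumes the weight for each state is simply its number of polls, which the note above does not specify, and every number in it is made up for illustration. The function name `weighted_least_squares` is hypothetical, not part of the actual model.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Fit y ~ X by weighted least squares; returns coefficients, intercept first."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Prepend an intercept column.
    A = np.column_stack([np.ones(len(y)), X])
    # Scaling each row by sqrt(weight) turns the weighted problem
    # into an ordinary least-squares problem.
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical illustration: four states, one predictor.
kerry_margin = [[-40.0], [2.5], [-2.1], [9.0]]  # made-up values
obama_margin = [-35.0, 3.0, -10.0, 8.0]         # made-up values
polls = [1, 12, 3, 6]                           # poll counts used as weights (assumption)

coef = weighted_least_squares(kerry_margin, obama_margin, polls)
```

With integer weights, a state with twelve polls pulls on the fit exactly as if it appeared twelve times in an unweighted regression, which is why a one-poll state like Wyoming stops counting as much as Pennsylvania.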

The second improvement is to consider a couple of new variables in the analysis: the shares of the 2004 electorate that identified themselves as Democrat, Independent, and Republican, according to CNN exit poll data. Obama does comparatively worse in states where a larger share of John Kerry’s vote came from self-reported Democrats, and better where more of Kerry’s vote came from Republicans and Independents. This is consistent with a finding from the recent Pew Poll, which shows Obama losing more self-identified Democrats to McCain than Clinton does, but getting a larger fraction of the vote from Republicans and Independents. The new variables tend to give the model more confidence in Obama’s polling lead in a state like New Hampshire, which has a huge number (44% of the electorate) of self-reported Independents, while hurting him in a state like West Virginia, where just 18% of the electorate identify as Independent (but 50% identify as Democrat).

The overall effect of these adjustments is to slightly hurt Obama’s win percentage, as he loses a few percentage points in industrial states like Pennsylvania that have relatively few independents (20% Independent, 41% Democrat in 2004). Clinton’s numbers have moved up a tiny bit.

To get even more technical, the regression model is programmed to consider eight potential variables:

  1. The Kerry-Bush margin in 2004.
  2. The percentage of Baptists (all Southern Baptists, plus 1/2 of non-Southern Baptists)
  3. Obama fundraising (dollars raised per 2004 general election voter)
  4. Clinton fundraising (same measure)
  5. McCain fundraising (same measure)
  6. The percentage of African-Americans in the population
  7. The percentage of self-identified Democrats
  8. The percentage of self-identified independents
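Two of these variables are derived rather than read straight from a table. A small sketch of the arithmetic, with hypothetical function names and made-up inputs:

```python
def baptist_share(southern_pct, other_baptist_pct):
    """All Southern Baptists plus half of non-Southern Baptists, in percent."""
    return southern_pct + 0.5 * other_baptist_pct

def dollars_per_voter(dollars_raised, voters_2004):
    """Fundraising in a state divided by its 2004 general election turnout."""
    return dollars_raised / voters_2004

# Hypothetical state: 20% Southern Baptist, 6% other Baptist,
# $1.5 million raised, 3 million voters in 2004.
example_baptist = baptist_share(20.0, 6.0)
example_dollars = dollars_per_voter(1_500_000, 3_000_000)
```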

The model discards any variables that are not statistically significant at the 85% confidence level and retains the rest. The variables presently included in the regression model are as follows:
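One way to implement that kind of screening is backward elimination: fit the weighted regression, drop the variable with the smallest absolute t-score if it falls below the cutoff, and refit until everything left clears the bar. The sketch below assumes that procedure, which may not match the model's actual one (it could just as well screen all variables in a single pass). It also approximates the two-sided 85% cutoff with the normal distribution (about 1.44) rather than the exact t distribution. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def wls_t_scores(X, y, w):
    """Weighted least squares; returns (coefficients, t-scores), constant last."""
    A = np.column_stack([X, np.ones(len(y))])
    sw = np.sqrt(w)
    Aw, yw = A * sw[:, None], y * sw
    coef, *_ = np.linalg.lstsq(Aw, yw, rcond=None)
    resid = yw - Aw @ coef
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(Aw.T @ Aw)
    return coef, coef / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, w, names, confidence=0.85):
    """Return the names of the variables that survive the screening."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Two-sided cutoff from the normal approximation (assumption).
    cutoff = NormalDist().inv_cdf(0.5 + confidence / 2)
    keep = list(range(X.shape[1]))
    while keep:
        _, t = wls_t_scores(X[:, keep], y, w)
        t_vars = np.abs(t[:-1])  # the constant is always retained
        worst = int(np.argmin(t_vars))
        if t_vars[worst] >= cutoff:
            break  # every remaining variable is significant
        del keep[worst]
    return [names[i] for i in keep]
```

Dropping one variable at a time matters when predictors are correlated, since removing one can change the t-scores of the others.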

                   Obama              Clinton
Variable        Coeff.  t-score     Coeff.  t-score
Kerry             .549     6.20       .714    13.65
Baptist          -.261    -2.74       .381     4.00
$_Obama          6.708     3.60    DROPPED
$_Clinton      DROPPED               4.627     3.82
$_McCain        -9.421    -2.92     -6.236    -2.34
Af-American    DROPPED               -.173    -1.56
Dem%             -.560    -3.09    DROPPED
Constant        24.561     3.78     -3.282    -2.77

Nate Silver is the founder and editor in chief of FiveThirtyEight.