Pursuant to the tease I put out this morning, I’ve been doing some further investigation into the regression model, and identified a variable that has some pretty interesting effects: a state’s educational level, as measured by the average number of years of completed schooling per adult, according to the US Census Bureau.
As you can see from the graphs, there is essentially no relationship between a state’s educational level and Clinton’s performance in the polls; there is a rather strong relationship for Barack Obama. But it’s actually Clinton’s graph, rather than Obama’s, that’s unusual relative to how things have gone in the recent past. If you drew John Kerry’s graph in 2004, it would look much more like Obama’s than Clinton’s.
Put differently, relative to John Kerry, Clinton performs worse in highly educated states and better in poorly educated ones. This turns out to be one of the more significant variables in her regression model; in fact, if you predict Clinton’s polls from JUST the Kerry vote and education levels, you do very nearly as well as if you consider all the other variables that our model evaluates.
Electorally, this is a bit of a wash for Clinton, but it might require a somewhat different allocation of resources than a Democrat like Obama would use. Following are the most and least educated states:
Average Years of Educational Attainment
Adults >=25
1. Colorado 13.46
2. Massachusetts 13.39
3. Maryland 13.37
4. New Hampshire 13.36
5. Vermont 13.33
6. Connecticut 13.31
7. Washington 13.28
8. Minnesota 13.25
9. Alaska 13.18
10. Montana 13.14
50. West Virginia 12.16
49. Kentucky 12.21
48. Mississippi 12.24
47. Arkansas 12.25
46. Louisiana 12.32
45. Texas 12.38
44. Alabama 12.42
43. Tennessee 12.43
42. South Carolina 12.52
41. Nevada 12.60
So this helps to explain why, for instance, Clinton has struggled in the polls in Colorado and New Hampshire, and why McCain has been close to her in a couple of polls of Connecticut. The regression model now has more confidence in the polls in those states. On the other hand, we can also see why Clinton has polled relatively well in states like West Virginia and Tennessee, and there may be opportunities for her there.
By the way, once we include the education variable, the Southern Baptist variable (which is changing to ‘Evangelicals’; I’ll explain in a moment) drops out of Clinton’s equation. It appears that Clinton does not have a particular advantage (relative to your usual Democrat) either in the South or among evangelicals. Instead, she does better among low-education voters than Democrats usually do, and there tend to be more of these voters in the South. But in a state like Georgia, which is significantly better educated than most of its neighbors, Clinton has performed quite badly in the polls.
I also tested whether it’s income levels, rather than educational levels, that are the driving force behind this. It isn’t. When both variables are included, the education variable remains highly statistically significant for Clinton, while the income variable drops out.
—
So long as I was including the education variable, I did a little bit of additional maintenance on the regression model:
1. Firstly, the variable for Southern Baptists was replaced with a variable for evangelicals. The Southern Baptist variable was always a little bit of a mess, as it was a hybrid of two different estimates of religious population. But I came across some very good, reliable-looking data on the number of evangelicals from the Association of Religion Data Archives (it’s worth a few minutes of your time to explore the site). In addition to being a bit ‘cleaner’, this variable also turns out to have a slightly stronger relationship with Obama’s polls than the old Southern Baptist variable (as I stated above, neither variable is significant in Clinton’s model once we account for educational levels). Of note: the ARDA does not consider predominantly black churches in its definition of Evangelicals; these are white evangelical Protestants.
2. The two variables ‘Democrat’ and ‘Independent’, which represented party identification in the 2004 CNN exit polls, were replaced with one variable, ‘Partisan’, which is the percentage of Democrats less the percentage of Republicans (so in Arkansas, where 41% of the electorate identified as Democrats and 31% as Republicans, the partisan index is 41 - 31 = +10 points). This is a cleaner way to do things, as the two old variables were somewhat intercorrelated; there doesn’t appear to be any such thing as an ‘independent spirit’ that causes states with a high proportion of independents to behave differently from their overall party leanings. Hillary’s support is oriented slightly more strongly around this partisan axis than John Kerry’s; Obama’s is oriented significantly less so.
3. (Very, very technical.) The program now performs a proper stepwise regression instead of wiping out a whole bunch of variables at once; there was no reason not to do this before, but it took me a bit of time to figure out the programming. Also, we’ve slightly lowered the threshold for including a variable from 85% statistical significance to 80%.
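For the curious, here is a rough sketch of what dropping variables one at a time can look like. It is not the site's actual program: it assumes backward elimination (refit, drop the single weakest variable, repeat), it approximates the 80% cutoff with a normal distribution rather than a t distribution, and the data and the names `strong_a`, `strong_b` and `noise` are synthetic placeholders made up for the illustration.

```python
import random
from statistics import NormalDist

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def ols_t_scores(columns, y):
    """Fit y on an intercept plus the given columns; return {name: t-score}."""
    names = list(columns)
    n, k = len(y), len(names) + 1
    X = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    return {name: beta[j + 1] / (sigma2 * inv[j + 1][j + 1]) ** 0.5
            for j, name in enumerate(names)}

def backward_stepwise(columns, y, confidence=0.80):
    """Drop the weakest variable, refit, and repeat until all pass the cutoff."""
    # Two-sided cutoff under a normal approximation (an assumption of this sketch).
    cutoff = NormalDist().inv_cdf(0.5 + confidence / 2)
    kept = dict(columns)
    while kept:
        t = ols_t_scores(kept, y)
        weakest = min(t, key=lambda name: abs(t[name]))
        if abs(t[weakest]) >= cutoff:
            break
        del kept[weakest]  # remove only one variable per pass
    return list(kept)

# Synthetic data: two real effects and one pure-noise column.
rng = random.Random(0)
n = 50
data = {
    "strong_a": [rng.gauss(0, 1) for _ in range(n)],
    "strong_b": [rng.gauss(0, 1) for _ in range(n)],
    "noise": [rng.gauss(0, 1) for _ in range(n)],
}
y = [3 * a - 2 * b + rng.gauss(0, 1) for a, b in zip(data["strong_a"], data["strong_b"])]
selected = backward_stepwise(data, y)
print(selected)
```

The point of refitting after each removal is that t-scores shift once a correlated variable leaves the equation, so a variable that looked weak alongside its neighbor can survive once the neighbor is gone. Under the normal approximation, an 80% threshold works out to a cutoff of roughly |t| = 1.28.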
The site will be updated momentarily with the revised regression model included. Clinton appears to do very slightly better with the new model; there is no real change for Obama.
UPDATE:
For Obama, the parameters currently included in the regression are as follows:
Variable      Coeff   t-score
$_Obama        8.13   4.35
Kerry          0.59   4.22
Evangelical   -0.34   3.47
Partisan      -0.36   2.41
$_McCain      -7.11   2.17
$_Clinton     -2.52   1.65
Constant       3.17   1.63
Dropped: Education (highly correlated with Obama fundraising), AfAmerican.
For Clinton, the parameters are:
Variable      Coeff   t-score
Kerry          0.63   7.47
Education     -7.37   3.61
$_McCain      -3.60   1.54
$_Clinton      1.33   1.36
Partisan       0.14   1.33
Constant      94.63   3.59
Dropped: AfAmerican, Evangelical, $_Obama.
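To get a feel for the size of Clinton's Education coefficient, here is a quick back-of-the-envelope calculation that holds every other variable fixed and compares the most and least educated states from the list above. The variable names are just labels for this sketch, and "points" is an assumption about the units of the model's output, which the tables don't spell out.

```python
# Clinton's Education coefficient from the table above.
EDUCATION_COEFF = -7.37

# Average years of schooling for adults 25 and over, from the state list.
colorado_years = 13.46
west_virginia_years = 12.16

# Difference in predicted Clinton performance attributable to education alone,
# holding every other variable in the equation constant.
gap_years = colorado_years - west_virginia_years
effect = EDUCATION_COEFF * gap_years

print(f"{gap_years:.2f} more years -> {effect:+.2f} points")
# prints: 1.30 more years -> -9.58 points
```

So the gap between the top and bottom of the education list is worth a bit under ten points in her equation, all else being equal.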