I’m going to be making a small change to the model for the polling update that we’ll run in an hour or two today. Specifically, I’m going to increase the premium that the model places on the recentness of a poll. If this is the kind of detail that makes your eyes glaze over, just skip ahead to Sean’s Georgia overview. Otherwise, read on.
When I originally designed the model back in the spring, I designed it to be fairly tolerant of “old” polls — far more so than other aggregation sites like Real Clear Politics or Electoral-Vote.com. This was not an arbitrary decision; on the contrary, it was dictated by some empirical work I had done on state-level polling in the 2004 and 2000 elections, which suggested that including some comparatively “old” polls produced a more accurate result than an RCP-type calculation in which polls are dumped from the average fairly quickly.
There are a couple of reasons why I feel compelled to hedge a bit on this now:
Firstly, 2004 was an unusually stable election, relative to other elections in the past … the numbers just did not move that much, and when they moved, they did not move quickly. (2000 had roughly average volatility; this year appears to have about average volatility as well. To find a highly volatile election, look at something like 1992 or 1976). If the numbers are stable from period to period in a certain election, then recentness will not be all that important. However, since there is reason to believe that the 2004 election was in some ways atypical, and since at least half of my state-level datapoints were from the 2004 election, this presents an argument for ramping up the premium on recency.
Secondly, it seems likely that one of the reasons why my analysis of the 2000 and 2004 elections found it useful to include some “older” polls was house effects. Take, for instance, Quinnipiac and Mason-Dixon. Both of these are strong, smart polling firms. However, Quinnipiac polls have been on average about 4 points more favorable to Barack Obama than Mason-Dixon polls. So if Quinnipiac sees Pennsylvania as, say, an Obama +10 state, Mason-Dixon will probably see it as an Obama +6.
This can create problems if you’re using an RCP-type average — one which places a heavy premium on recency — because the numbers can float upward or downward based on which polling firms happen to cycle into and out of the average. For instance, say that a Mason-Dixon poll comes online one day, and the same day a Quinnipiac poll drops out. It may look like a state has “moved” toward McCain when in fact you’re just seeing an Obama-leaning pollster replaced with a McCain-leaning pollster. This is not optimal. If, however, we can detect and adjust for house effects — and we do — this is less of a concern.
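As a rough illustration of how a house-effect adjustment defuses this problem, suppose we subtract each firm's estimated lean from its reported margin before averaging. The ±2-point split and the function names below are hypothetical illustrations of the idea, not the model's actual implementation:

```python
def adjust_for_house_effect(margin, house_effect):
    """Remove a pollster's average partisan lean from its reported margin.

    margin: reported Obama-minus-McCain margin, in points.
    house_effect: the firm's estimated average lean (positive = Obama-leaning).
    """
    return margin - house_effect

# Hypothetical example using the Quinnipiac/Mason-Dixon figures from the text:
# a 4-point gap between the firms, split here (an assumption) as +2 / -2.
polls = [("Quinnipiac", 10.0, 2.0), ("Mason-Dixon", 6.0, -2.0)]
adjusted = [adjust_for_house_effect(margin, lean) for _, margin, lean in polls]
# Both polls now read +8, so swapping one firm for the other in the
# average no longer produces phantom "movement" in the state.
```

The point is simply that once the firm-level lean is removed, which pollsters happen to cycle into and out of the average matters much less.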
Thirdly, this adjustment “feels” right, and when you’re dealing with a complex system like our model of the election, I’m a big believer in trusting your gut when in doubt.
The formula that I’m using to adjust for the recentness of a poll is the one that I developed for our senate polling averages. When analyzing senate data, I found that you did in fact do best by increasing the premium on recency as you got closer and closer to the election. (It may not be surprising that this effect turned up in senate data but not in presidential data, since in any year of senate polling, you have a couple dozen races to look at which behave fairly independently from one another, whereas in a presidential election, you really have just one election to look at with 50 manifestations across the different states. The senate data is probably the more robust data set, in other words). The specific formula that I developed for the senate data is as follows…
H = 14 + D * .188
…where ‘H’ is the half-life of a poll in days, and ‘D’ is the number of days remaining until the election. As we approach the election, H will approach 14, meaning that polls will have a two-week half-life; that is, a poll conducted two weeks ago will be given half the weight of a poll conducted today. (Keep in mind that this is not the only way that we keep our data fresh; we also adjust ‘old’ polls on the basis of our trendline adjustment).
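In code, the half-life formula and the weight it implies for a given poll might look like the sketch below. The exponential-decay form is my reading of the half-life language above (a poll H days old gets exactly half the weight of a poll from today), and the function names are made up for illustration:

```python
def poll_half_life(days_to_election):
    """H = 14 + D * 0.188, the half-life (in days) from the post."""
    return 14 + days_to_election * 0.188

def poll_weight(poll_age_days, days_to_election):
    """Weight of a poll under exponential decay with half-life H.

    Assumption: the half-life language implies weight = 0.5 ** (age / H),
    so a brand-new poll gets weight 1 and a poll exactly H days old gets 0.5.
    """
    h = poll_half_life(days_to_election)
    return 0.5 ** (poll_age_days / h)
```

On election day (`days_to_election = 0`) the half-life bottoms out at 14 days, so a two-week-old poll carries half the weight of one released that morning; 100 days out, H is about 32.8 days, and older polls decay much more gently.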
As you’ll see in a bit, it turns out that this adjustment does not make all that much difference. It brings John McCain a little bit closer in Pennsylvania (but certainly not a lot closer), and Obama a little closer in a couple of states like North Dakota. Still, every little bit of accuracy helps, especially when we’re dealing with highly sensitive calculations like our tipping point states.