We know more than we think (Big Change #2)

The other major change to our methodology (which I am surprised nobody guessed in the teaser thread) is that we are now making adjustments to the results of all states based on a time trend.

One of the problems with our previous way of doing things is that polling data tends to roll in at different times in different states. Both state and national polls conducted since the conclusion of the Democratic nomination process have reflected a bounce of a few points for Barack Obama. For example, we know that Barack Obama has experienced a bounce in his polling results in states like Wisconsin, Michigan and New Jersey, as well as in both the Rasmussen and Gallup national tracking polls. It would be naive to assume that Obama won’t also experience a bounce in other states like Pennsylvania and Ohio where new polling data has yet to come out. However, we’ve had no way to account for these changes in states where the polling data is not fresh.

Our objective, then, is to infer what is likely to happen in states where we don’t have fresh polling data based on those states where we do. To make such an inference, I apply a four-step process. A version of this process was suggested to me by Professor Robert Erikson of Columbia University, who has spent his lifetime studying polling and public opinion, and who is also a family friend.

Step 1: All polls are placed into groups based on (i) the week in which the poll was conducted; and (ii) the state-pollster unit. A state-pollster unit is a combination of a particular state and a particular pollster; for example “Alabama-SurveyUSA” or “New York-Quinnipiac”. The current week is defined as having begun seven days before the current date, with weeks progressing backward from there to the start of the calendar year 2008. One very important note: we treat national polls as a “state”. For example, there are units for “USA-Rasmussen Tracker” and “USA-Gallup Tracker”. One of the most useful elements of national polls, and particularly national tracking polls, is that they provide a robust baseline for measuring changes in candidate support. We do not include national polls directly in our averages. We do use them, however, to help infer trends, which in turn can inform our state-by-state projections.
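The grouping in Step 1 can be sketched in a few lines of Python. This is only an illustration, not the code behind the site: the function names, the tuple layout, and the sample polls are all made up, and I am assuming the current week covers the seven days ending on the current date.

```python
from collections import defaultdict
from datetime import date


def week_index(poll_date, today):
    """Return 0 for the current week, 1 for the week before, and so on.

    Assumption: the current week covers the seven days ending today.
    """
    days_ago = (today - poll_date).days
    if days_ago < 0:
        raise ValueError("poll date is after the current date")
    return days_ago // 7


def unit_key(state, pollster):
    """Build a state-pollster unit label such as 'Alabama-SurveyUSA'."""
    return f"{state}-{pollster}"


def group_polls(polls, today):
    """Group Obama-minus-McCain margins by (week index, state-pollster unit)."""
    groups = defaultdict(list)
    for state, pollster, poll_date, margin in polls:
        key = (week_index(poll_date, today), unit_key(state, pollster))
        groups[key].append(margin)
    return dict(groups)


# Made-up polls for illustration only; national trackers use "USA" as the state.
example_polls = [
    ("Minnesota", "Rasmussen", date(2008, 6, 10), 9.0),
    ("USA", "Gallup Tracker", date(2008, 6, 11), 4.0),
    ("USA", "Gallup Tracker", date(2008, 6, 3), 2.0),
]
grouped = group_polls(example_polls, today=date(2008, 6, 12))
```

Treating the national trackers as just another "state" means they need no special handling here; they simply become units like any other.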

Step 2: We run a linear regression with a large number of dummy variables. Specifically, we include one dummy variable for each week, and one dummy variable for each state-pollster unit. The coefficients of the weekly dummy variables give us an inkling of a time trend. Specifically, the time trend looks like this:

Let me explain exactly what is going on here. Suppose that in Week 15, Rasmussen shows Barack Obama 6 points ahead in Minnesota. Then, in Week 22, it shows him 9 points ahead in Minnesota. This is a piece of information implying that Obama’s standing was 3 points better in Week 22 than in Week 15. If we apply this process to all state-pollster units, we get quite a lot of information about which way the polls are changing. That’s all that this process is doing. It’s taking the changes that we see in each poll where we have a baseline for comparison, and inferring an overall time trend based on those changes.
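For readers who want to see the mechanics, here is a rough Python sketch of a dummy-variable regression of this kind. It is a simplified stand-in rather than the actual model: it fits one dummy per state-pollster unit and one per week (leaving out a reference week so the system can be solved), by solving the least-squares normal equations directly. The names fit_week_effects and solve, and the Ohio numbers, are hypothetical; the Minnesota numbers are the ones from the example above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: not enough overlapping polls")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_week_effects(polls, reference_week):
    """Least-squares fit of margin = unit effect + week effect.

    polls is a list of (unit, week, margin) tuples. Returns a dict mapping
    each week to its effect relative to reference_week (whose effect is 0).
    """
    units = sorted({unit for unit, _, _ in polls})
    weeks = sorted({week for _, week, _ in polls if week != reference_week})
    columns = [("unit", u) for u in units] + [("week", w) for w in weeks]
    index = {col: i for i, col in enumerate(columns)}
    k = len(columns)
    xtx = [[0.0] * k for _ in range(k)]  # X'X for the dummy design matrix
    xty = [0.0] * k                      # X'y
    for unit, week, margin in polls:
        active = [index[("unit", unit)]]  # dummies equal to 1 for this poll
        if week != reference_week:
            active.append(index[("week", week)])
        for i in active:
            xty[i] += margin
            for j in active:
                xtx[i][j] += 1.0
    beta = solve(xtx, xty)
    effects = {reference_week: 0.0}
    for w in weeks:
        effects[w] = beta[index[("week", w)]]
    return effects


# Minnesota figures from the text; the Ohio figures are invented.
polls = [
    ("Minnesota-Rasmussen", 15, 6.0),
    ("Minnesota-Rasmussen", 22, 9.0),
    ("Ohio-SurveyUSA", 15, -2.0),
    ("Ohio-SurveyUSA", 22, 1.0),
]
trend = fit_week_effects(polls, reference_week=15)
```

Because each unit gets its own dummy, a pollster's house effect or a state's partisan lean is absorbed by the unit term, and only the within-unit change over time feeds the weekly coefficients.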

Step 3: The time trend is smoothed by means of a LOESS regression. You probably don’t think you know what a LOESS regression is, but if you’ve ever been over to Pollster.com, you have seen one. A LOESS regression is a way to create smooth curves through time series data. In our case, that curve looks like this:

When running a LOESS regression, one may choose a “smoothing parameter” that determines how sensitive the regression line is to changes in the data. I use a fairly conservative smoothing parameter, tending toward a smoother rather than a jerkier curve. Nevertheless, we can make out a few fairly clear trends. Obama’s numbers surged in February, when he was winning one primary after another. They slumped in March and early April, as stories like bittergate and Jeremiah Wright dominated the landscape. They have since been gradually improving, but particularly so in the last two weeks since he wrapped up the nomination.
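To give a feel for what the smoothing parameter does, here is a bare-bones LOESS-style smoother in Python. It is a sketch under my own assumptions (a local linear fit with tricube weights, no robustness iterations), not the routine used to draw the curve above. The parameter frac is the share of data points used in each local fit; larger values give a smoother curve.

```python
import math


def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at u >= 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess(xs, ys, frac=0.5):
    """Smooth ys against xs with a weighted local linear fit at each x."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbours per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        ws = [tricube(abs(x - x0) / h) if h > 0 else 1.0 for x in xs]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to weighted mean
        smoothed.append(my + slope * (x0 - mx))
    return smoothed


# A perfectly straight trend should pass through the smoother unchanged.
weeks = list(range(10))
straight = [2.0 * w + 1.0 for w in weeks]
smoothed_straight = loess(weeks, straight, frac=0.5)
```

Raising frac toward 1 corresponds to the more conservative choice described above: each point on the curve borrows from more of the surrounding weeks, so a single noisy week moves it less.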

Step 4: Polls from previous weeks are adjusted to match the LOESS estimate from the current week. For example, our LOESS regression line tells us that an average poll in the current week has been about 2.5 points stronger for Barack Obama than a poll in the week ending 5/17. Thus, the Quinnipiac poll of Florida taken on 5/17, which showed John McCain ahead by 4 points, is treated as though it had shown McCain ahead by 1.5 points (i.e. 2.5 points better for Obama). The idea, simply put, is to make all old data match the current polling landscape.
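The arithmetic in Step 4 is simple enough to write down directly. In this small Python sketch, margins are expressed as Obama minus McCain, and the function name and the trend values are placeholders of mine; only the 2.5-point gap and the Florida numbers come from the example above.

```python
def adjust_margin(margin, trend_at_poll, trend_now):
    """Shift an old poll's margin by the change in the smoothed trend.

    margin, trend_at_poll and trend_now are all Obama-minus-McCain points.
    """
    return margin + (trend_now - trend_at_poll)


# Quinnipiac, Florida, 5/17: McCain +4, i.e. an Obama margin of -4.
# The trend values are placeholders chosen so that the gap is 2.5 points.
adjusted = adjust_margin(-4.0, trend_at_poll=0.0, trend_now=2.5)
# adjusted is -1.5, i.e. McCain ahead by 1.5
```

Only the difference between the two trend values matters, so it makes no difference which week the trend is measured against.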

* * *

From there, everything proceeds as it always has. We still run a demographic regression, although it is based on the trend-adjusted polls rather than the original ones. (Also, I am now referring to our result in each state as a “projection” rather than an “average”, as that nomenclature is more consistent with our process.)

This adjustment presently results in an increase of about 2 points in Barack Obama’s projected popular vote margin. Because a large number of states in this election are very close, this results in a somewhat dramatic-seeming change in Obama’s win percentage and electoral vote projection. Interestingly, Obama’s current win percentage of 64.7 percent almost exactly matches the price of Democratic contracts on Intrade, which also has the Democrats with a 64 percent chance of winning the election.

FiveThirtyEight
