We know even less than we think (a methodological note)

Way back when I started this, I identified three sources of error that might make election returns different from the polls: sampling error, state-specific movement, and national movement. The latter two types of error can be grouped together, and might not really be thought of as error at all; they’re an inevitable consequence of surveying an election days, weeks or months before it occurs.

Then there is sampling error, which is intrinsic to the business of surveying. The problem is that actual errors are never as small as the official margin of error advertises. There is a fourth (or, if you prefer, second) source of error: methodological error introduced by the pollster. Conceptually, a poll’s actual error combines its sampling error with this methodological error.

We are of course aware of methodological error; that is why some pollsters get higher ratings than others. However, we had not previously been accounting for it in our simulations; we were accounting for it in our averages, but not in the probability distributions around those averages. Beginning now, I am modeling the error term we use in the simulations based on real, historical data. This means the simulations will account for methodological error in addition to sampling error.
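To make the idea concrete, here is a minimal sketch of how a wider error term changes a win probability. It is not the actual simulation code: it assumes the two kinds of error are independent and normally distributed, and the function name `win_probability` and every number in it (a 4-point lead, a 3-point sampling error, a 2.5-point methodological error) are made up for illustration.

```python
import math

def win_probability(lead, sampling_sd, method_sd=0.0):
    """Chance that the candidate ahead by `lead` points actually wins.

    Assumes (for illustration only) that sampling error and methodological
    error are independent and normal, so their variances add.
    """
    total_sd = math.sqrt(sampling_sd ** 2 + method_sd ** 2)
    # Normal CDF evaluated at lead / total_sd, written with math.erf.
    return 0.5 * (1.0 + math.erf(lead / (total_sd * math.sqrt(2.0))))

# Hypothetical inputs, not taken from any real poll.
lead = 4.0
sampling_only = win_probability(lead, sampling_sd=3.0)
with_method = win_probability(lead, sampling_sd=3.0, method_sd=2.5)

print(f"sampling error only:        {sampling_only:.3f}")
print(f"plus methodological error:  {with_method:.3f}")
```

Under these assumptions the leader's win probability moves closer to 50 percent once the methodological term is included, and the trailing candidate's probability rises by the same amount.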

In plain English: we have introduced more uncertainty into our simulations. The practical upshot of this is that the state-by-state win percentages are regressed more toward the mean, particularly in states with limited polling data (although this issue doesn’t go away even if we have a large amount of polling, because the error terms tend to be correlated with one another across different surveys). The overall effect of this adjustment is negligible, but it does boost the win percentage by a point or two for the trailing candidate in each state. (So Obama’s win percentage goes up a little bit in Alaska, for instance, but goes down in Delaware.)
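The point about correlated errors can be illustrated with a rough sketch as well. Assume, purely for illustration, that each poll's sampling error is independent of every other poll's, while the methodological errors of any two polls share a correlation `rho`. The function name `sd_of_poll_average` and all of the numbers below are hypothetical.

```python
import math

def sd_of_poll_average(n_polls, sampling_sd, method_sd, rho):
    """Standard deviation of the error in a simple average of n polls.

    Assumes independent sampling errors and methodological errors that are
    equally correlated (correlation rho) between every pair of polls.
    """
    # Independent sampling errors average out as 1 / n.
    sampling_var = sampling_sd ** 2 / n_polls
    # Correlated errors only partly average out; rho = 1 means not at all.
    method_var = method_sd ** 2 * (1.0 + (n_polls - 1) * rho) / n_polls
    return math.sqrt(sampling_var + method_var)

# Hypothetical values: 3-point sampling error, 2.5-point methodological error.
for n in (1, 5, 25, 1000):
    independent = sd_of_poll_average(n, 3.0, 2.5, rho=0.0)
    correlated = sd_of_poll_average(n, 3.0, 2.5, rho=0.5)
    print(f"{n:5d} polls: uncorrelated {independent:.2f}, correlated {correlated:.2f}")
```

With uncorrelated errors the uncertainty keeps shrinking as polls are added. With correlated errors it levels off near `method_sd * sqrt(rho)`, which is why piling up more polling does not make the problem go away.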

FiveThirtyEight
