What’s Wrong With This Picture? (a.k.a. Nate the Poll Nazi Strikes Again)

From the latest IBD/TIPP poll:

That’s right … IBD/TIPP has John McCain ahead 74-22 among 18-24 year olds. Who knew the kids were groovin’ on J-Mac these days?

IBD/TIPP puts an asterisk by this result, stipulating that “Age 18-24 has much fluctuation due to small sample size”.

Indeed, there may be some fluctuations when looking at small subgroups like these. That’s why I generally don’t pick on a poll if, say, it has John McCain winning 18 percent of the black vote when he’s only “supposed” to be winning 7 percent or whatever. Fluctuations of that magnitude are going to be relatively common, mathematically speaking. In fact, they’re entirely unavoidable, if you’re taking enough polls and breaking out the results amongst enough subgroups.

But fluctuations of this magnitude are an entirely different matter.

Suppose that the true distribution of the 18-24 year old vote is a 15-point edge for Obama. This is a very conservative estimate; most pollsters show a gap of anywhere from 20 to 35 points among this age range.

About 9.3 percent of the electorate was between the ages of 18 and 24 in 2004. Let’s assume that the percentage is also 9.3 percent this year. Again, this is a highly conservative estimate. The IBD/TIPP poll has a sample size of 1,060 likely voters, which would imply that about 98 of those voters are in the 18-24 age range.

What are the odds, given the parameters above, that a random sample of 98 voters aged 18-24 would split 74% for McCain and 22% for Obama?

Using a binomial distribution, the odds are 54,604,929,633-to-1 against. That is, about 55 billion to one.
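For anyone who wants to check this kind of arithmetic, here is a minimal Python sketch of a binomial tail calculation. It is an illustration under stated assumptions, not necessarily the exact computation behind the figure above: it treats the race as two-way, so a 15-point Obama edge means McCain's true share is 42.5 percent, and it asks how likely it is that at least 74 percent of 98 respondents pick McCain. The function name `binomial_tail` and the two-way simplification are assumptions made for this sketch; handling undecideds differently will move the result.

```python
from math import ceil, comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_young = 98                      # 18-24 year olds implied by 9.3% of 1,060
p_mccain = 0.425                  # assumption: two-way race, Obama +15 (57.5 to 42.5)
k_mccain = ceil(0.74 * n_young)   # fewest respondents that give McCain 74%

tail = binomial_tail(n_young, k_mccain, p_mccain)
print(f"P(McCain >= {k_mccain} of {n_young}) = {tail:.3e}")
print(f"Roughly {1 / tail:,.0f}-to-1 against")
```

Under any reasonable set of inputs the answer comes out vanishingly small, which is the point.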

So, there is a 0.000000002% chance that IBD/TIPP just got really unlucky. Conversely, there is a 99.999999998% chance that one of the following things is true:

(i) They’re massively undersampling the youth vote. If you only have, say, 30 young voters when you should have 100 or so in your sample, then a freak occurrence like this becomes significantly more likely.
-or-
(ii) Something is dramatically wrong with their sampling or weighting procedures, or their likely voter model.
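To see how much the subgroup size matters for point (i), the sketch below compares the chance of a 74 percent McCain result among 30 young voters versus 98. As before, the 42.5 percent true McCain share (a two-way race with Obama up 15) and the helper name `binomial_tail` are assumptions for illustration, not figures from IBD/TIPP.

```python
from math import ceil, comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p_mccain = 0.425  # assumption: two-way race, Obama +15 among 18-24 year olds

# Probability that McCain reaches 74% or more, for each subgroup size.
tails = {}
for n in (30, 98):
    k = ceil(0.74 * n)  # fewest respondents that give McCain 74%
    tails[n] = binomial_tail(n, k, p_mccain)
    print(f"n = {n:3d}: P(McCain >= {k}) = {tails[n]:.3e}")

print(f"The small sample is about {tails[30] / tails[98]:,.0f} times more likely to misfire")
```

A 30-person subgroup is still unlikely to produce a result this lopsided by chance alone, but it is many orders of magnitude more likely than with 98 people, which is why undersampling is a plausible part of the story.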

My guess is that it’s some combination of the two — that, for instance, IBD/TIPP is applying a very stringent likely voter model that removes you from the sample if you haven’t voted in the past two elections, which would rule a great number of 18-24 year olds out.

A pollster could get away with a turnout model like that in 2004 (when IBD/TIPP did well in estimating the national popular vote), when the split in the youth vote was relatively small between John Kerry and George W. Bush. They can’t get away with that this year, when the split is much larger.

But the basic takeaway is this: you should absolutely not assume that just because someone has published a poll, they have any particular idea what they’re doing. Pollsters should be treated as guilty until proven otherwise.

FiveThirtyEight
