Let me comment at a bit more length on the so-called “cellphone problem”: the fact that many voters are unreachable by pollsters whose samples consist only of landline numbers. This may help explain today’s Rasmussen results in Ohio, which showed John McCain with a fairly large lead.
The basic issue with cellphone-only households is that their incidence is not distributed evenly throughout the population. Minorities are more likely to be cellphone-only than whites, and men are more likely to be cellphone-only than women. But the most important differences are in terms of the age of the voter.
The data below, compiled by the Centers for Disease Control, show the share of adults in each age cohort who are cellphone-only. Actually, they cover more than just cellphone-only adults: the CDC also tracks another category, which I call “cellphone-mostly” adults. These are people who have a landline but also have a mobile phone, and who use the mobile phone to receive most or all of their calls. I personally know a lot of people in this category. They may use their landlines only to make local calls, only to connect to the Internet, or only as a backup in case their cellphone service is down, and they may have the service only because it came bundled with their cable or wireless package. If their friends and family are in the habit of calling them on their cellphones, they may be very suspicious of calls coming into their landlines, assuming those are likely to be from telemarketers, and not make a practice of answering them.
Table 1. Cellphone-Only and Cellphone-Mostly Adults by Age Cohort
As you can see, fully half of all adults under the age of 30 fall into the cellphone-only or cellphone-mostly buckets, and the number is growing every day. About a third of adults aged 30-44 are cellphone-only or cellphone-mostly, and then the numbers trail off once adults pass the midpoint of their lives.
Obviously, if polling firms did not weight by age, this would be an utter disaster for any election in which preferences vary significantly by age. Suppose, for example, that the following represented the true distribution of the likely voter population in Big Industrial State:
Age    % of likely voters   Obama %   McCain %
18-24                  10        69         31
25-29                  10        60         40
30-44                  30        50         50
45-64                  35        46         54
65+                    15        40         60
Total                 100        50         50
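The topline in the table is just an average of each age group’s vote, weighted by that group’s share of likely voters. Here is a minimal Python sketch that checks the arithmetic. The dictionary layout and the `topline` function name are illustrative only, not taken from any polling package.

```python
# Hypothetical "Big Industrial State" electorate from the table above:
# age group -> (share of likely voters in %, Obama %, McCain %)
TRUE_ELECTORATE = {
    "18-24": (10, 69, 31),
    "25-29": (10, 60, 40),
    "30-44": (30, 50, 50),
    "45-64": (35, 46, 54),
    "65+": (15, 40, 60),
}


def topline(electorate):
    """Return (Obama %, McCain %) as a share-weighted average of the age groups."""
    total_share = sum(share for share, _, _ in electorate.values())
    obama = sum(share * o for share, o, _ in electorate.values()) / total_share
    mccain = sum(share * m for share, _, m in electorate.values()) / total_share
    return obama, mccain


obama, mccain = topline(TRUE_ELECTORATE)
print(f"Obama {obama:.1f}, McCain {mccain:.1f}")  # Obama 50.0, McCain 50.0
```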
These numbers have been ‘rigged’ so that Obama and McCain each receive exactly 50 percent of the vote. Suppose, however, that we exclude cellphone-only and cellphone-mostly voters from our sample in proportion to their incidence in the CDC data. What you’d wind up with instead is the following:
Age    % of likely voters   Obama %   McCain %
18-24                   7        69         31
25-29                   6        60         40
30-44                  28        50         50
45-64                  39        46         54
65+                    20        40         60
Total                 100      48.5       51.5
What ought to have been a tie instead turns into a 3-point lead for John McCain. (Keep in mind that the numbers in this example are hypothetical, but the real ones probably look something like this.)
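The same weighted average can be applied to the landline-only composition in the second table. This minimal sketch assumes, as the example does, that each age group’s split is unaffected by who gets excluded; only the age mix changes.

```python
# Landline-reachable composition from the second table:
# age group -> (share of sample in %, Obama %, McCain %).
# Within-group splits are assumed identical to the true electorate.
LANDLINE_SAMPLE = {
    "18-24": (7, 69, 31),
    "25-29": (6, 60, 40),
    "30-44": (28, 50, 50),
    "45-64": (39, 46, 54),
    "65+": (20, 40, 60),
}


def topline(electorate):
    """Return (Obama %, McCain %) as a share-weighted average of the age groups."""
    total_share = sum(share for share, _, _ in electorate.values())
    obama = sum(share * o for share, o, _ in electorate.values()) / total_share
    mccain = sum(share * m for share, _, m in electorate.values()) / total_share
    return obama, mccain


obama, mccain = topline(LANDLINE_SAMPLE)
print(f"Obama {obama:.1f}, McCain {mccain:.1f}")  # Obama 48.4, McCain 51.6
```

With the rounded shares shown, the result is about 48.4 to 51.6. The table’s 48.5 to 51.5 presumably reflects unrounded shares. Either way, the tie becomes a McCain lead of roughly three points.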
Pollsters can get around this problem by weighting groups that are likely to be cellphone-only more heavily — in particular younger voters. This is what nearly all smart pollsters do, and it is considerably better than the alternative of not weighting at all. However, it creates a couple of additional problems.
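In its simplest form, weighting by age (often called post-stratification) gives every respondent in a group a weight equal to that group’s target share divided by its share of the sample. The sketch below uses the two hypothetical tables above and makes the strong assumption that reachable and unreachable voters in the same age group vote alike. The variable names are illustrative.

```python
# Post-stratification by age: weight = target share / share of landline sample.
TARGET_SHARE = {"18-24": 10, "25-29": 10, "30-44": 30, "45-64": 35, "65+": 15}
SAMPLE_SHARE = {"18-24": 7, "25-29": 6, "30-44": 28, "45-64": 39, "65+": 20}
# Obama % within each age group; ASSUMED the same for reachable and
# unreachable voters, which is what makes the correction work here.
OBAMA_PCT = {"18-24": 69, "25-29": 60, "30-44": 50, "45-64": 46, "65+": 40}

weights = {g: TARGET_SHARE[g] / SAMPLE_SHARE[g] for g in TARGET_SHARE}

# Weighted average over the sample: each group's sample share times its weight.
weighted_total = sum(SAMPLE_SHARE[g] * weights[g] for g in weights)
weighted_obama = (
    sum(SAMPLE_SHARE[g] * weights[g] * OBAMA_PCT[g] for g in weights) / weighted_total
)

for g, w in weights.items():
    print(f"{g}: weight {w:.2f}")
print(f"Weighted Obama share: {weighted_obama:.1f}")  # 50.0
```

Young respondents end up counting about 1.4 to 1.7 times as much as they would unweighted, and the weighted topline returns to the 50-50 tie. The assumption flagged in the comment is exactly the one the first problem below calls into question.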
The first and more commonly discussed problem is that cellphone-only voters may differ from their landline counterparts even once we control for age and other variables like race and gender. Urban voters, for instance, are about 50 percent more likely than rural voters to be cellphone-only, and while some pollsters weight by geography, others do not. Thus, you may wind up with a biased sample.
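Here is a sketch of why weighting by age cannot fix this kind of bias. It assumes, purely for illustration, that the young voters a landline poll can reach are somewhat less pro-Obama than their cohort as a whole. The 60 and 54 figures below are hypothetical, not from any survey.

```python
TARGET_SHARE = {"18-24": 10, "25-29": 10, "30-44": 30, "45-64": 35, "65+": 15}
# True Obama % by age group, from the hypothetical electorate.
TRUE_OBAMA = {"18-24": 69, "25-29": 60, "30-44": 50, "45-64": 46, "65+": 40}
# HYPOTHETICAL: reachable young voters lean less toward Obama than their cohort
# (e.g. because the unreachable cellphone-only voters skew urban).
REACHABLE_OBAMA = {"18-24": 60, "25-29": 54, "30-44": 50, "45-64": 46, "65+": 40}


def weighted_topline(target_share, obama_pct):
    """Obama % after weighting each age group to its target share."""
    total = sum(target_share.values())
    return sum(target_share[g] * obama_pct[g] for g in target_share) / total


print(weighted_topline(TARGET_SHARE, TRUE_OBAMA))  # 50.0
print(weighted_topline(TARGET_SHARE, REACHABLE_OBAMA))  # 48.5
```

The age mix is exactly right in both cases, yet the second topline is still 1.5 points off. Weighting can only scale up the young voters who were reached; it cannot make them resemble the ones who weren’t.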
But even if the sample is unbiased (say the pollster is smart enough to figure out how to balance all the weights properly), what you are still doing, in effect, is magnifying the importance of sampling error. Suppose that a pollster wants to sample 500 likely voters in a state. About 20 percent of these, or 100 of them, are likely to fall into the 18-29 age range. But about half of those voters can’t be reached because they are cellphone-only or cellphone-mostly. So your effective sample size for this subgroup is 50 voters, which carries a margin of error of roughly +/- 14 points at 95 percent confidence. Sometimes the luck of the draw will come through for you and you’ll wind up with a pretty good sample, but other times you’ll be pretty far off.
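The +/- 14 figure follows from the usual normal-approximation margin of error for a proportion at 95 percent confidence, 1.96 × sqrt(p(1 - p)/n), with p = 0.5 as the worst case. This quick sketch compares the full sample with the subgroup sizes from the example. It assumes simple random sampling and ignores any extra error introduced by weighting.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a proportion
    estimated from a simple random sample of size n (p=0.5 is the worst case)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (500, 100, 50):
    print(f"n = {n:3d}: +/- {margin_of_error(n):.1f} points")
# n = 500: +/- 4.4 points
# n = 100: +/- 9.8 points
# n =  50: +/- 13.9 points
```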
If you don’t wind up with a good sample, you compound the problem, because you have to weight the young voters you do reach more heavily to make up for the ones you can’t reach because they depend on cellphones.
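One standard way to put a number on this effect, though not one used in the original post, is Kish’s approximate effective sample size: n_eff = (Σw)² / Σw². The sketch below assumes a hypothetical 500-person landline sample with the age mix of the second table, weighted back to the first table’s shares.

```python
# Kish's approximation: n_eff = (sum of weights)^2 / (sum of squared weights).
SAMPLE_SHARE = {"18-24": 7, "25-29": 6, "30-44": 28, "45-64": 39, "65+": 20}
TARGET_SHARE = {"18-24": 10, "25-29": 10, "30-44": 30, "45-64": 35, "65+": 15}
N = 500  # hypothetical nominal sample size


def kish_effective_n(counts, weights):
    """Effective sample size for groups of respondents sharing a common weight."""
    total_w = sum(counts[g] * weights[g] for g in counts)
    total_w2 = sum(counts[g] * weights[g] ** 2 for g in counts)
    return total_w**2 / total_w2


# Respondents per age group and the post-stratification weight each one gets.
counts = {g: N * SAMPLE_SHARE[g] / 100 for g in SAMPLE_SHARE}
weights = {g: TARGET_SHARE[g] / SAMPLE_SHARE[g] for g in SAMPLE_SHARE}

n_eff = kish_effective_n(counts, weights)
print(f"Nominal n = {N}, effective n ~ {n_eff:.0f}")  # effective n ~ 473
```

The overall loss is modest in this example. The unequal weights come entirely from the age correction, though, and the young respondents are the ones carrying weights above 1. Any bad luck in that small group therefore gets scaled up rather than diluted.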
So, where the information is available, you should get in the habit of checking the cross-tabs for groups that are known to have problems with non-response bias. In practice, that means checking younger voters, because of the cellphone-only problem. If the pollster was unlucky and wound up with an unrepresentative sample of such voters, it may skew the overall results, since those responses are weighted more heavily.
Is this an issue with the Rasmussen poll in Ohio? Actually, it may be. The poll has McCain leading 50-39 among voters aged 18-29, and 67-33 among voters aged 30-39. Obama leads 55-36 among voters in their 40s, and then McCain leads by single-digit margins among voters aged 50 and up. Such an age distribution is inconsistent with most other polling that we have seen in this election.
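A cross-tab check like this can be made slightly more systematic. One way is to ask whether a subgroup result differs from what other polling suggests by more than the subgroup’s own margin of error. In the sketch below, the Rasmussen 18-29 figures come from the text. The subgroup sample size (50) and the benchmark (a 60 percent Obama two-party share) are HYPOTHETICAL placeholders, since neither is given here.

```python
import math


def two_party_share(obama, mccain):
    """Obama's share of the two-candidate vote, in percent."""
    return 100 * obama / (obama + mccain)


def flag_crosstab(observed_pct, benchmark_pct, n, z=1.96):
    """Return True if a subgroup result differs from a benchmark by more than the
    subgroup's approximate 95% margin of error (simple random sampling assumed)."""
    p = benchmark_pct / 100
    moe = 100 * z * math.sqrt(p * (1 - p) / n)
    return abs(observed_pct - benchmark_pct) > moe


# Rasmussen's Ohio 18-29 cross-tab from the text: McCain 50, Obama 39.
observed = two_party_share(39, 50)
# HYPOTHETICAL inputs: subgroup n and benchmark are placeholders, not real data.
print(f"Observed Obama two-party share: {observed:.1f}%")  # 43.8%
print("Flag:", flag_crosstab(observed, benchmark_pct=60, n=50))  # Flag: True
```

A flag like this does not show that a poll is wrong. It only marks a subgroup worth a second look before trusting the topline.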
This does not mean that Rasmussen screwed up. The problem is not specific to Rasmussen; it is common to all pollsters that don’t include a cellphone supplement, which means all pollsters except Gallup and Selzer. These pollsters are doing everything they can to work around a vexing problem: about half of the young voters they might want to sample can’t be reached, and they are stuck with small samples of such voters as a result. But it does mean that if there is greater error in their sample of young voters, there will be greater error in their poll as a whole.