On Wednesday, Pew Research issued a study suggesting that the failure to include cellphones in a survey sample — and most pollsters don’t include them — may bias the results against Democrats. Pew has addressed this subject a number of times before, and in their view, the problem seems to be worsening. Indeed, this is about what you might expect, since the fraction of voters who rely on cellphones is steadily increasing: about 25 percent of the adult population now has no landline phone installed at all.
Clearly, this is a major problem in survey research — and one that, sooner or later, every polling firm is going to have to wrestle with. What isn’t as clear is how much of a problem it is right now.
I have written about this in the past, and I encourage you to review those articles. But let me try to come at it from a couple of fresh directions.
1. The results of the Pew study — which suggests that the effect of the failure to include cellphones may result in a 4-point bias against Democrats on the generic ballot — cannot necessarily be extrapolated to other polling firms. It should go without saying that the failure to include cellphones in your sample will lead you to miss certain types of voters. The characteristics of the cellphone-only population — young, urban, often people of color — are also characteristics of voters who tend to have more liberal political views. Therefore, a sample that relies solely on calling landlines will usually find too few Democrats.
This fact alone, however, will not necessarily bias the survey, because pollsters generally do not show their findings in raw form — instead, they publish them after they’ve been through the wringer, and some form of demographic weighting has been applied. If you don’t have enough young voters in your sample, for instance — which you won’t if you don’t include cellphones — you can count the ones you do get at two or three times the usual weight.
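The arithmetic behind this kind of cell weighting is simple enough to sketch. The numbers below are invented for illustration: a hypothetical landline-only sample that reaches too few voters under 30, reweighted so that each age group counts in proportion to its assumed population share.

```python
# Toy example of demographic (cell) weighting. All numbers are made up.
# A landline-only sample reaches too few voters under 30, so each
# under-30 respondent is counted at a heavier weight.

population_share = {"18-29": 0.20, "30+": 0.80}   # assumed true shares
sample = {
    "18-29": {"n": 50,  "dem_share": 0.60},       # underrepresented group
    "30+":   {"n": 450, "dem_share": 0.45},
}

total_n = sum(cell["n"] for cell in sample.values())

# Unweighted Democratic share of the raw sample
raw = sum(cell["n"] * cell["dem_share"] for cell in sample.values()) / total_n

# Weighted share: each cell counts at its population share instead of
# its sample share (here, under-30s get weight 0.20 / 0.10 = 2x)
weighted = sum(
    population_share[age] * cell["dem_share"]
    for age, cell in sample.items()
)

print(f"unweighted: {raw:.3f}")      # 0.465
print(f"weighted:   {weighted:.3f}")  # 0.480
```

With these invented numbers, weighting moves the topline a point and a half toward the Democrats — which is the whole point of the technique, so long as the respondents you do reach resemble the ones you miss.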
Indeed, pollsters had to contend with problems like these long before the advent of cellphones. It’s always been harder to get men on the phone than women, younger people than older people, blacks and Hispanics than whites. Up until now, demographic weighting has usually been up to the task: we have not seen any overall deterioration yet in the accuracy of horserace polls.
What Pew is suggesting, however, is that bias remains even after demographic weighting is applied. Even when one controls for the usual demographic variables, cellphone users still tend to be more Democratic-leaning than landline users who share most of the same characteristics. This is potentially a very problematic conclusion for the polling industry.
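To see why this is so troubling, consider a toy calculation — again with invented numbers. Weighting can correct *how many* under-30 voters are in the sample, but if the cellphone-only members of that group lean more Democratic than the landline members, no amount of upweighting the landline respondents recovers the true figure.

```python
# Toy illustration of Pew's finding: weighting fixes the size of a
# demographic group in the sample, not differences within the group.
# All shares below are invented for illustration.

# Among voters under 30, suppose:
cell_only_share = 0.40   # fraction reachable only by cellphone
dem_landline = 0.55      # Dem support among landline under-30s
dem_cell_only = 0.70     # Dem support among cellphone-only under-30s

# True Democratic support among all under-30s:
true_dem = (1 - cell_only_share) * dem_landline \
           + cell_only_share * dem_cell_only

# A landline-only poll, however heavily it weights this group,
# can only ever observe the landline respondents:
observed_dem = dem_landline

print(round(true_dem, 2))      # 0.61
print(round(observed_dem, 2))  # 0.55
```

Under these assumptions, the poll understates Democratic support among under-30s by 6 points even after perfect demographic weighting — the bias lives entirely inside the weighting cell.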
The caveat, however, is that weighting techniques vary a lot from pollster to pollster. Some weight based on party identification, but most don’t; some weight by economic characteristics like income, and others don’t; some calibrate their samples to exit polls, and some to Census Bureau data; some use cluster sampling or geographic weighting, while others take a pass.
What seems reasonably clear is this: If you use the types of weighting techniques that Pew uses, you’ll have a lot of problems if you don’t include cellphones in your sample. But this won’t necessarily hold if you use a different set of weighting methods. SurveyUSA, for instance, which conducts surveys by automated script (i.e. “robopolls”), attempted a similar study and found little evidence of bias.
With the stipulation that SurveyUSA — which, unlike Pew, does not ordinarily include cellphones in their samples — has spent considerably less time studying the issue, here’s what I think might be going on. Firms like SurveyUSA — and certainly Rasmussen Reports, which takes every imaginable shortcut to produce polling as cheaply as possible — are the equivalent of junk-ball pitchers in baseball. Their “stuff” might not be much, but they’re pretty used to working with it. Pew, by contrast, is like the classic fastball/curveball pitcher with perfect mechanics. They do everything by the book, and they do it very well.
Sometimes, however, it’s the classically trained pitchers — Cubs fans might recall Mark Prior — whose performance suffers the most when they are injured: they aren’t used to working under adverse circumstances, whereas the junk-ball pitcher is accustomed to getting by on guile and gumption. The failure to include cellphones should in theory be problematic for a firm like Rasmussen Reports — but then again, so, in theory, are a great number of other things that they do. So far, Rasmussen has found a way to produce decent results anyway. Pew, by contrast, whose by-the-book approach has more upside potential when things are going well, may be less well adapted to dealing with a deterioration in data quality.
Now, this analogy only goes so far — and besides, in the long run you’d certainly rather have a pitching staff full of Mark Priors than Jamie Moyers. But if Rasmussen Reports’ weighting procedures weren’t already attuned to handling poor raw data, they’d have been out of business a long time ago.
This is not to endorse the practices that Rasmussen Reports uses. But the failure to include cellphones may affect that firm in a different way than it would a firm like Pew. So one cannot assume that the 4-point effect that Pew found would hold across the board.
2. The other natural way to analyze the “cellphone problem” — comparing results from firms that include cellphones with those that don’t — is also problematic. I’ve made some back-of-the-envelope attempts to address this issue in the past, and have uncovered some hints of bias. A proper academic-quality study, however, would be more difficult than it might seem.
The issue is that the firms that include cellphones in their samples — like Pew, Gallup, the Washington Post, and The New York Times — also tend to have other characteristics that differentiate them. For instance, all use live interviewers, not robopoll technology. They tend to conduct surveys over several days and to work to “convert” people who initially decline to respond; that costs more, but it improves response rates. They may also use different likely-voter models.
A proper study would need to isolate the impact of including cellphones from the impact of these other factors — and that could be quite difficult. For instance, different likely-voter models are evidently producing some very different results this year. One set of techniques — employed by Gallup and some other companies like Pew and The Washington Post — assumes that overall turnout in the electorate is roughly fixed. (I’m not a big fan of this assumption, but that’s a subject for another day.) That approach seems to be projecting an especially large discrepancy in turnout between the major parties, sometimes 10 points or more in Republicans’ favor. Other models — like the one that The New York Times uses — imply a smaller turnout gap, more like 4 or 5 points, which would be fairly standard for a midterm election.
Perhaps you see the problem here: If the likely-voter model you choose can make a 5-point difference in the results you obtain — while having cellphones in or out might make, say, a 2-point difference — it will be hard to isolate one effect from the other. Gallup, for instance, does include cellphones, but so far their likely-voter model has shown horrible results for Democrats anyway.
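The underlying statistical difficulty is confounding: if every firm that dials cellphones also uses live interviewers, the two factors never vary independently, and any split of the data attributes the same gap to both. A toy demonstration, with invented margins:

```python
# Toy illustration of the confounding problem. If cellphone inclusion
# and live interviewing are perfectly correlated across firms, their
# effects cannot be separated. All numbers are invented.

firms = [
    # (includes_cellphones, live_interviewers, Dem margin in points)
    (1, 1, +2.0),
    (1, 1, +1.0),
    (0, 0, -1.0),
    (0, 0, -2.0),
]

# The cellphone firms run better for Democrats on average...
cell_avg = sum(m for c, l, m in firms if c) / 2
nocell_avg = sum(m for c, l, m in firms if not c) / 2
gap = cell_avg - nocell_avg  # 3.0 points

# ...but exactly the same gap appears if we split by interviewer type,
# because the two variables move in lockstep in this data.
live_avg = sum(m for c, l, m in firms if l) / 2
auto_avg = sum(m for c, l, m in firms if not l) / 2
assert (live_avg - auto_avg) == gap

print(gap)  # 3.0 -- attributable to either factor, or any mix of the two
```

Disentangling the factors requires firms that mix and match methodologies — say, a robopollster that adds a cellphone supplement — and those are rare.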
I don’t mean to suggest that a study along these lines would be fruitless — but it would need to be constructed carefully to account for these effects.
3. The cellphone issue is part and parcel of some broader and more overarching matters — like deteriorating response rates and the efficacy of “robopolls.” As I’ve mentioned, firms that use automated scripts almost never include cellphones — but they also follow any number of other protocols that tend to lower response rates. So far they haven’t seemed to be much affected, and some of these companies, like SurveyUSA, have produced highly accurate results in the past.
This year, however, the robopoll companies’ results are somewhat different from the live-interviewer polls, something that had not been true in the past. This probably requires a fuller study, but from what I can tell, the effect is somewhere on the order of 2 to 4 points over all, with the robopoll results tending to favor Republicans. That could reflect the cellphone problem, or it could reflect a number of other things.
What we don’t know yet is whether that 2-to-4-point effect indeed reflects bias — in other words, whether it is making the robopollsters’ results worse or better. We can’t know that until Nov. 2. It could be that the robopollsters are getting it right and the traditionalists wrong; it could also be that the robopollsters end up with the “right” results, but for the wrong reasons. This will need to be tracked carefully over the next election cycle or two.
4. The polling in this election could underestimate the performance of Democrats as easily as it could Republicans. Sometimes you get the impression from other analysts that the polls will either be on the money on Nov. 2, or will err low on the Republican side. But there are also reasons to think that the polls may be overstating Republican performance, and the cellphone issue could be one of those reasons.
Do FiveThirtyEight’s forecasting models account for this? To some extent. Our models assume there is a chance that the polling will be uniformly biased in one or another direction, as it has been in some elections in the past. If you look at the probability distributions that we generate, some of the more “extreme” outcomes — say, Republicans winning 75 or 80 seats in the House — result from these sorts of effects, rather than the party just happening to eke out that many victories based on local factors.
We also have procedures for calculating what we call a “house effect” for each pollster. One thing we’ve found this year is that the more prolific polling firms tend to show more Republican-leaning results than those who poll more sporadically, like local and regional pollsters — or like companies that poll only the generic ballot and not individual races. (Some of the regional pollsters are very good, and some of them are now dialing cellphones.)
Our polling averages account for this property, since they are calibrated based on a very broad consensus of pollsters, rather than deferring to firms that release 20 surveys a week. Of late, the net impact of this technique has been to produce a very small hedge toward the Democrats relative to the raw polling average in a typical state — on average, something like a single point — although the impact varies from race to race.
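A minimal sketch of the house-effect idea, with invented polls: estimate each firm’s house effect as its average deviation from the all-firm consensus in the races it polls, then shift its results by that amount. (FiveThirtyEight’s actual procedure is considerably more elaborate; this only illustrates the concept.)

```python
# Toy house-effect calculation. All polls are invented; margins are
# Republican leads in points.

polls = {  # race -> {firm: Republican margin}
    "race1": {"FirmA": +5, "FirmB": +2, "FirmC": +1},
    "race2": {"FirmA": +7, "FirmB": +4, "FirmC": +4},
}

# Consensus margin per race: simple average across firms
consensus = {race: sum(m.values()) / len(m) for race, m in polls.items()}

# House effect per firm: mean deviation from the consensus
# in the races that firm polled
firms = {f for m in polls.values() for f in m}
house = {
    f: sum(polls[r][f] - consensus[r] for r in polls if f in polls[r])
       / sum(1 for r in polls if f in polls[r])
    for f in firms
}

# Adjusted polls subtract each firm's house effect
adjusted = {
    race: {f: m - house[f] for f, m in results.items()}
    for race, results in polls.items()
}

print(house)  # FirmA leans Republican relative to the field; B and C lean Democratic
```

By construction, the house effects of the firms sum to roughly zero: the adjustment redistributes polls toward the consensus rather than declaring any one firm “correct.”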
The intent of this procedure is not to serve as a “cellphone adjustment” per se — I don’t think there is sufficient evidence, yet, for doing something quite so explicit. But we feel our projections are made more robust by looking toward the broader universe of polling firms, rather than just those that happen to be active in a particular state.
I would expect us to become more aggressive in future years, however, if further evidence emerges that the failure to include cellphones is systematically biasing polling results. Pew’s study should at the very least wake up those in the industry who think the problem can simply be ignored.