I mentioned in passing in last night’s post that surveys that use automated scripts rather than live interviewers — what are sometimes called ‘robopolls’ — have shown more Republican-leaning results this year. Given how sensitive forecasts can be to fairly minor variations in the polling, it is worth going into more detail on this.
One of the various adjustments that our models make is to identify and correct for “house effects” — that is, persistent differences in the partisan lean of polls issued by a particular research firm. If, for instance, a particular company’s polls are, on average, 4 points more favorable to Democrats than the consensus of pollsters in the states that they’ve surveyed, we’ll pull most (although not all) of those 4 points back out of their survey in calculating our polling average.
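As a rough sketch, that pull-back amounts to shrinking a firm’s apparent lean before subtracting it from its polls. Everything below is illustrative: the 0.75 shrinkage factor is purely an assumption, since the post says only that most, but not all, of the lean is removed.

```python
# Sketch of the house-effect pull-back described above. The shrinkage
# factor is an assumption: the post says only that "most (although not
# all)" of a firm's lean is removed, not the exact fraction.
def adjust_poll(dem_margin, house_lean_dem, shrinkage=0.75):
    """Subtract most of a firm's Democratic-leaning house effect."""
    return dem_margin - shrinkage * house_lean_dem

# A poll from a firm that runs 4 points more Democratic than consensus:
print(adjust_poll(10.0, 4.0))  # prints 7.0
```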
Our process for calculating house effects evolves somewhat over the course of an election cycle: there are things we can do now, with almost 4,000 polls in our database, that the data was not robust enough to support earlier in the year. But the basic method is to fit a regression model with a large set of dummy variables, one set representing each pollster and another representing each election in a given month (for instance, polls of the Connecticut Senate race in September 2010). Different voter universes (e.g. likely voters and registered voters) are treated as separate polls; so, for instance, we estimate one house effect for Zogby’s registered-voter polls and a separate one for its likely-voter polls. All House, Senate, gubernatorial and generic ballot polls are included; races that have been surveyed by more polling firms receive a larger weight in the calculation.
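The dummy-variable regression can be sketched on a toy dataset. Everything here is an assumption for illustration: the pollster names, margins, and the plain least-squares fit are made up, and the real model also weights races by how many firms surveyed them.

```python
import numpy as np

# Hypothetical mini-dataset: each row is one poll's Democratic margin,
# tagged with a pollster and a race-month (e.g. the CT Senate race in
# September). All names and numbers are fabricated for illustration.
polls = [
    ("PollsterA", "CT-Sen-Sep", 5.0),
    ("PollsterB", "CT-Sen-Sep", 1.0),
    ("PollsterA", "NV-Sen-Sep", -2.0),
    ("PollsterB", "NV-Sen-Sep", -6.0),
    ("PollsterA", "CT-Sen-Oct", 4.0),
    ("PollsterB", "CT-Sen-Oct", 0.0),
]

pollsters = sorted({p for p, _, _ in polls})
races = sorted({r for _, r, _ in polls})

# Design matrix: one dummy per race-month, plus one dummy per pollster
# (dropping the first pollster as the baseline to avoid collinearity).
X = np.zeros((len(polls), len(races) + len(pollsters) - 1))
y = np.zeros(len(polls))
for i, (p, r, margin) in enumerate(polls):
    X[i, races.index(r)] = 1.0
    if p != pollsters[0]:
        X[i, len(races) + pollsters.index(p) - 1] = 1.0
    y[i] = margin

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
house_effect_B = beta[len(races)]  # PollsterB's lean relative to PollsterA
print(round(house_effect_B, 1))   # B runs about 4 points more Republican
```

Because PollsterB runs exactly 4 points more Republican than PollsterA in every shared race here, the regression recovers a house-effect coefficient of about -4 for it.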
The other tricky bit is in figuring out what the “right” answer should be. For instance, say that Rasmussen Reports polls are 5 points more favorable to Republicans, on average, than polls from Siena College. Do we adjust the Rasmussen polls to match the Siena ones, or the other way around? The answer, of course, is somewhere in between. Specifically, we calculate a weighted average from all the polling firms in our universe, where the weights are based mostly on pollster quality. The idea is to estimate what our polling average would show in a particular state or district if every polling firm in our universe conducted an infinite number of polls there.
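In code, that consensus idea might look like the toy calculation below. The firm names, leans, and quality weights are all made up, and the real model estimates the consensus jointly within the regression rather than in a separate pass.

```python
# Illustrative only: three imaginary firms, their average leans relative
# to a naive mean (Dem-positive, in points), and assumed quality weights.
leans = {"FirmA": +2.0, "FirmB": -3.0, "FirmC": +1.0}
quality = {"FirmA": 1.0, "FirmB": 0.5, "FirmC": 2.0}

# The "right" answer is taken to be the quality-weighted mean of all
# firms; each firm's house effect is its deviation from that consensus.
consensus = sum(leans[f] * quality[f] for f in leans) / sum(quality.values())
house_effects = {f: round(leans[f] - consensus, 2) for f in leans}
print(house_effects)
```

Note that because FirmC carries the most weight, the consensus sits closer to its lean, and the lightly weighted FirmB ends up with the largest house effect.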
Looking specifically at the universe of likely voter polls (not polls of registered voters or adults), what sort of house effect do the automated polling firms have?
First, let’s look at the results for nonpartisan polls only. By “nonpartisan,” I mean excluding polls conducted on behalf of campaigns, campaign committees, or interest groups.
The chart below lists the house effect for all robopoll firms with at least 10 surveys in our database, plus the average house effect for both automated and live-interviewer polling firms. (These averages are weighted by the square root of the number of polls that each company has conducted. These averages include firms with fewer than 10 surveys, even though I haven’t listed their results in the chart.)
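The square-root weighting of those averages can be sketched like this, with made-up firms and numbers; the point of the square root is that prolific firms count for more without completely swamping the average.

```python
import math

# Assumed example: (firm, number of polls, house effect in points,
# negative = Republican lean). All values are fabricated.
firms = [("RoboA", 100, -2.4), ("RoboB", 25, -1.6), ("RoboC", 4, -0.5)]

# Weight each firm by the square root of its poll count.
weights = [math.sqrt(n) for _, n, _ in firms]
avg = sum(w * e for w, (_, _, e) in zip(weights, firms)) / sum(weights)
print(round(avg, 2))
```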
Most of the automated polling firms have a Republican-leaning house effect. For instance, it’s about 2 points for Rasmussen Reports (our estimate for Rasmussen includes polls conducted by its subsidiary, Pulse Opinion Research) and 4 points for SurveyUSA. Public Policy Polling, another automated firm, has almost no house effect. And some of the smaller robopoll firms, like Magellan and Merriman River Group, also lean Republican.
On average, the robopoll firms have a 2-point Republican-leaning house effect, whereas the live interviewer polls have a 0.7-point Democratic-leaning house effect. The difference between the two, then, is 2.7 points.
Even though there’s a fair amount of noise in our calculation of house effects, this difference is statistically significant at the 99.9 percent confidence level. As a robustness check, I excluded the two most prolific robopoll firms, Rasmussen and SurveyUSA, from the calculation; the effect was still significant at about the 95 percent confidence level.
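The post doesn’t specify the exact test used, but a two-sample comparison along these lines conveys the idea. The house-effect numbers below are fabricated, each firm is treated as one unweighted observation, and the Welch-style z statistic is my assumption rather than the actual procedure.

```python
import math

# Fabricated per-firm house effects (positive = Democratic lean);
# the real calculation uses many more firms and weights them.
robo = [-2.5, -1.8, -3.0, -0.2, -2.2, -1.9]
live = [0.9, 0.5, -0.3, 1.2, 0.8, 0.4, 1.1, 0.6]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):  # sample variance (n - 1 denominator)
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Welch-style statistic for the difference in mean house effects.
diff = mean(robo) - mean(live)
se = math.sqrt(var(robo) / len(robo) + var(live) / len(live))
z = diff / se
print(round(diff, 2), round(z, 1))
```

A z statistic well beyond 2 in absolute value corresponds to the kind of high-confidence result described above.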
We can also perform a version of this calculation that includes the partisan and campaign polls.
This produces broadly similar answers, but the ‘robopoll effect’ becomes slightly larger: a difference of about 3.4 points between the automated surveys and the live ones.
This is problematic, frankly — particularly given that these effects were not very apparent in past years. Last year, for instance, while Rasmussen Reports had a slight Republican lean (as it often does), Public Policy Polling had a slight pro-Democratic lean and SurveyUSA played it right up the middle.
I also spoke with a Democratic official, who asked not to be identified, who told me that he had largely discounted automated polls after discovering systematic differences between them and polls conducted by live interviewers. The differences had emerged in mid-2009, he said, and were on the order of 1 to 5 points, depending on the state and the weighting techniques that were applied.
It could turn out that the robopolls are right and the traditional polling firms are wrong. Some of the automated polling firms, like SurveyUSA, have quite strong track records. And some traditional polling firms with good track records, like Quinnipiac and Gallup, have shown results similar to those of the robopolling firms.
Our forecasting models are essentially agnostic on this question. Our house effects adjustment, as I mentioned before, is calibrated to the consensus view of the broader universe of pollsters (although we give a larger voice in the consensus to the polling firms that have been most accurate in the past). It assumes that the “right” answer is somewhere in between what the automated surveys and the live interview polls are showing.
This results in a slight adjustment toward the Democratic candidate in most (although not all) states, because the robopolling firms poll much more frequently and our models are designed to reward quality rather than quantity.
The adjustment would be more severe if we assumed that the live polling firms were right and the robopolls were wrong. I don’t think there is any basis for such an assumption until we have a chance to observe how the various firms perform this year.
If the robopolls turn out to be wrong, it will probably be because of some combination of response bias (for instance, the robopolls may be reaching only the most enthusiastic respondents, who are almost certainly Republican this year, and thereby overcompensating for the ‘enthusiasm gap’) and the failure of most automated polling firms to include cellphones in their samples.
Overall, this is another reason to treat with caution anyone giving you overly confident answers about exactly how next Tuesday’s election is going to turn out.