UPDATE (Sept. 10, 2:15 p.m.): After conversations with SurveyMonkey, we’re convinced that the critique you’ll read below doesn’t apply well to their recent 50-state poll, which instead is more analogous to 50 separate surveys in each state. Here’s what we mean by that. First, SurveyMonkey weighted each state separately, using only data from that state. And second, they took several measures to verify their respondents’ location, such as asking for their ZIP code in addition to the state in which they’re registered to vote. If a survey passes these two tests, we’ll treat it as we would a regular state poll. If not, we’ll still include it in our averages but with a lower weight, as described below.
It sounds like a riddle of sorts: Is one giant poll of all 50 states the same thing as 50 small polls, one for each state, added together?
If this seems like an odd question, it’s because it hadn’t really come up before this year. Sure, technically speaking, any national poll is composed of interviews from all 50 states. For instance, we’d expect a 1,000-person national poll to include about 100 respondents from California, 30 from Virginia, and 5 from Idaho, assuming that the number of people interviewed in each state was roughly proportional to turnout in 2012. But pollsters almost never report those state-by-state breakouts in the same way they do other sorts of demographic splits. That’s probably for good reason: The margins of error on those subsamples would be astronomical for all but the most populous states.
But what if instead of using a sample size of 1,000, your poll interviewed 50,000 people? Now you’d have around 5,000 respondents from California and 1,500 from Virginia, more than enough to go around. Even your Idaho sample size (about 250 people) is semi-respectable.
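To put rough numbers on those subsamples, here is a minimal sketch using the textbook 95 percent margin-of-error formula for a simple random sample. That formula is our assumption for illustration, not something these pollsters necessarily report; online polls aren’t simple random samples, and weighting typically widens the real margin. The function name is ours, and the state counts are the approximate ones used above.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a
    share p estimated from n respondents (simple random sample assumed)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Approximate state subsample sizes from the text.
subsamples = {
    "1,000-person national poll": {"California": 100, "Virginia": 30, "Idaho": 5},
    "50,000-person national poll": {"California": 5000, "Virginia": 1500, "Idaho": 250},
}

for poll, states in subsamples.items():
    print(poll)
    for state, n in states.items():
        print(f"  {state}: n={n}, margin of error about +/-{margin_of_error(n):.1f} points")
```

Under these assumptions, five Idaho respondents give a margin of error of more than 40 points, while 250 give roughly 6 points.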
Several online pollsters are now doing this, interviewing tens of thousands of people nationally per week or over the course of several weeks, as part of their national polling. And they’re increasingly reporting their results on a state-by-state basis. SurveyMonkey, Ipsos and Morning Consult have all released 50-state surveys, projecting the outcome in each state along with the overall Electoral College result. Google Consumer Surveys, which interviews around 20,000 people per week, has a crosstab showing their state-by-state results.
FiveThirtyEight has been using the state-by-state results from SurveyMonkey and Ipsos in its forecasts, and we’re in the midst of incorporating the data from Morning Consult and Google. (This has already attracted a fair amount of attention; Donald Trump’s campaign erroneously attributed Ipsos polls of Ohio and Iowa to FiveThirtyEight.) Which brings me back to my earlier question: Is a 500-person subsample of Colorado voters from a 20,000-person national poll the same thing as a 500-person poll that was dedicated to Colorado, specifically?
After thinking and researching my way through the problem, my answer is that these polls aren’t quite the same. The Colorado-specific poll is likely to provide a more reliable estimate of what’s going on in that particular state. And it deserves a higher weight in our model as a result.
One reason to give the 50-state technique a lower weight is that it hasn’t really been empirically tested. There have been cases in the past where pollsters commissioned simultaneous polls of all 50 states (surveying 600 voters in each state, for example), but for reasons I’ll explain in a moment, that’s potentially different from commissioning a huge national poll and reporting the results of state-by-state subsamples.
One potential source of error has to do with demographic weighting. Polls of all kinds engage in extensive demographic weighting because people aren’t equally likely to respond to polls. Typically, for instance, white voters are more likely to respond to telephone polls than black voters. Pollsters attempt to counteract this by giving extra weight to the black voters they reach until the demographics of their poll match those of Census data or other reliable sources.
But establishing these weights is not easy because voters are not monolithic within these demographic groups. White voters in Oregon are much more likely to vote Democratic than white voters in Mississippi, for instance. If you’re taking a poll just of Oregon or Mississippi, you’ll optimize your demographic weights to match the makeup of those states specifically. But if you’re conducting a national poll that includes interviews from Oregon and Mississippi along with the other 48 states, you might not pay as much attention to how the results shake out in individual states. Perhaps you’ll overestimate the Democratic vote in Mississippi, where whites are especially conservative, and underestimate it in Oregon — but those differences will likely cancel out in the national result.
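Here is a minimal sketch of how that can play out. Every number in it is hypothetical: the state subsample, the state’s demographic targets and the "national" weights are invented for illustration, and the function names are ours. It uses simple one-variable post-stratification, which is far cruder than what real pollsters do. The direction and size of the gap depend entirely on the numbers chosen.

```python
def poststrat_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n) for g in sample_counts}

def weighted_dem_share(cells, weights):
    """cells maps group -> (respondents, respondents backing the Democrat)."""
    total = sum(weights[g] * n for g, (n, _) in cells.items())
    dem = sum(weights[g] * d for g, (_, d) in cells.items())
    return dem / total

# Hypothetical subsample of 100 respondents from one state.
state_cells = {"white": (80, 16), "black": (20, 18)}

# Hypothetical true electorate in that state.
state_targets = {"white": 0.60, "black": 0.40}

# Weights tuned to this state's own demographics.
state_weights = poststrat_weights(
    {g: n for g, (n, _) in state_cells.items()}, state_targets
)

# Hypothetical weights tuned to the national sample, where black
# respondents were only mildly underrepresented.
national_weights = {"white": 0.95, "black": 1.30}

unweighted = weighted_dem_share(state_cells, {"white": 1.0, "black": 1.0})
with_national = weighted_dem_share(state_cells, national_weights)
with_state = weighted_dem_share(state_cells, state_weights)

print(f"Unweighted:       {unweighted:.1%}")
print(f"National weights: {with_national:.1%}")
print(f"State weights:    {with_state:.1%}")
```

With these made-up inputs, the national weights move the state estimate only part of the way toward the figure you get from weights built for that state: roughly 34, 38 and 48 percent, respectively.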
Another potential problem is misidentifying the state a poll respondent votes in. With online polls, the problem is that IP addresses aren’t 100 percent reliable — for instance, a website would think I’m in Connecticut right now because that’s where ESPN’s internet connection is based, even though I’m writing from the FiveThirtyEight office in New York City. Someone filling out an online survey at their office in Washington, D.C., might actually live in Virginia or Maryland. With telephone polls, the issue is that people carry their mobile phone numbers around when they move from state to state, making it harder to identify a voter’s residence based on her phone number alone.
Pollsters spend a lot of time thinking about problems like these when they’re conducting surveys of a particular state, and they can employ some good workarounds (for instance, asking the voter where they’re registered to vote). But in a national poll, the pollster doesn’t need to be as precise. If you misidentify me as a Connecticut voter when I’m really registered in New York, that won’t affect the topline margin in the national poll, even though it could skew the Connecticut and New York results.
We’ve noticed, anecdotally, that the 50-state polls sometimes produce weird results in cases like these, either in states that are demographically idiosyncratic (such as Mississippi) or in small states (such as New Hampshire) where the sample is potentially contaminated by voters from another state. But this is FiveThirtyEight, and we’re not big on anecdotes. So we looked at the best predecessor for the 50-state polls that we could find: results from the 2012 Cooperative Congressional Election Study (CCES), a project conducted jointly by the online pollster YouGov and a consortium of universities. In 2012, the study surveyed around 50,000 voters, asking them about their presidential vote along with a long battery of demographic and political questions.
This is an incredibly useful dataset that we use all the time at FiveThirtyEight. But what if you use the CCES to estimate the presidential vote in each state? To be clear, this is not a use the authors of the CCES necessarily intended or would recommend. But it’s the closest approximation I could think of for what Ipsos or SurveyMonkey are doing with their 50-state surveys.
It turns out that the CCES didn’t produce very reliable estimates of the state-by-state results, even when applying the demographic weights recommended by the survey. For example, it had Barack Obama beating Mitt Romney by 10 percentage points in Florida (Obama actually won by just 1 point) and narrowly winning Georgia (he lost by 8 points), while it had Romney easily beating Obama in New Hampshire (Obama won there by nearly 6 points). Overall, the poll missed the final result in each state by an average of 7.3 percentage points, a much higher margin than state-specific election polls. That includes 3.1 percentage points of what we call pollster-induced error, meaning that which can’t be explained by sampling error or the difference in timing between the poll and the election.
As a result of this analysis, we’ll continue to use the state-by-state breakouts from Ipsos and other pollsters, but we will significantly lower the weight assigned to them in our polling averages. These polls will continue to have some influence around the margin, since we don’t want to discard their data entirely, but state-specific polls will have more influence on the forecast. Note that this penalty won’t apply when a pollster such as Ipsos decides to survey one state specifically, only to state subsamples from their national polls.
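To make the idea of a lower weight concrete, here is a toy weighted average. It is not our actual model: the function name, the poll margins, the base weights and the 0.5 penalty are all hypothetical, chosen only to show how a down-weighted subsample still nudges the average without dominating it.

```python
def polling_average(polls, subsample_penalty=0.5):
    """Weighted mean of poll margins.

    polls: list of (margin, base_weight, is_subsample) tuples.
    subsample_penalty: hypothetical multiplier applied to the weight of
    state subsamples drawn from national polls.
    """
    numerator = 0.0
    denominator = 0.0
    for margin, base_weight, is_subsample in polls:
        weight = base_weight * (subsample_penalty if is_subsample else 1.0)
        numerator += weight * margin
        denominator += weight
    return numerator / denominator

# Hypothetical polls of one state: two dedicated state polls and one
# subsample from a 50-state survey (margins in percentage points).
polls = [
    (4.0, 1.0, False),
    (2.0, 1.0, False),
    (9.0, 1.0, True),
]

print(polling_average(polls, subsample_penalty=1.0))  # no penalty
print(polling_average(polls, subsample_penalty=0.5))  # subsample down-weighted
```

In this made-up case the average drops from 5.0 to 4.2 points once the subsample is penalized, moving toward the dedicated state polls while still reflecting the subsample.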
The overall result of this change is minimal for now. It very slightly lowers Hillary Clinton’s projected popular vote margin over Trump (by about two-tenths of a percentage point) and very slightly increases her projected chances of winning the Electoral College (by about 1 percentage point). Even if this change in our algorithm hasn’t had a major effect so far, it’s intended mostly as a preventive measure against allowing our polling averages to be flooded by subsamples from the 50-state polls. This is useful data that we’re pleased to have at our disposal, but not quite the same thing as getting 50 separate surveys from every state.