This is the fifth article in a series that reviews news coverage of the 2016 general election, explores how Donald Trump won, and explains why his chances were underrated by most of the American media.
Some statistical indicators are more useful than others. Data on early voting, for instance, usually doesn’t provide much predictive insight. Historically, the relationship between early voting in a state and the final voting totals there has been weak, and attempts to make inferences from early voting data have made fools of otherwise smart people. In the 2014 midterms, Democrats used early-vote numbers to claim that the polls were underrating their chances. Instead, it was Republicans who substantially beat the polls.
None of this deterred reporters and analysts from frequently citing early vote data in the closing weeks of last year’s presidential campaign, very often taking it to be a favorable indicator for Hillary Clinton. On Oct. 23, for instance, The New York Times argued that because Clinton had banked votes in North Carolina and Florida, it might already be too late for Donald Trump to come back in those states:
Hillary Clinton moved aggressively on Sunday to press her advantage in the presidential race, urging black voters in North Carolina to vote early and punish Republican officeholders for supporting Donald J. Trump, even as Mr. Trump’s party increasingly concedes he is unlikely to recover in the polls.
Aiming to turn her edge over Mr. Trump into an unbreakable lead, Mrs. Clinton has been pleading with core Democratic constituencies to get out and vote in states where balloting has already begun. By running up a lead well in advance of the Nov. 8 election in states like North Carolina and Florida, she could virtually eliminate Mr. Trump’s ability to make a late comeback.
Initially, these reports on early voting were at least consistent with the polls: Clinton had led in most polls of North Carolina and Florida in mid-October, for instance. But when the race tightened after James B. Comey’s letter went to Congress on Oct. 28, early voting data was increasingly cited in opposition to the polls, with pundits and reporters criticizing sites such as FiveThirtyEight and RealClearPolitics for not incorporating early voting data into their forecasts. (It can be easy to forget now, but we spent a lot of time arguing with people who thought our forecast was too generous to Trump.)
So what happened? In North Carolina, Clinton won the early vote by 2.5 percentage points, or about 78,000 votes. Furthermore, about two-thirds of votes were cast early. But Trump won the Election Day vote by almost 16 percentage points. That was enough to bring him a relatively healthy 3.6-point margin of victory over Clinton overall.
| North Carolina, 2016 | Trump votes | Trump % | Clinton votes | Clinton % |
|---|---|---|---|---|
| Early (mail or in-person) | 1,474,296 | 47.1% | 1,552,203 | 49.6% |
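As a rough sanity check on that arithmetic, the overall result is just a turnout-weighted average of the early-vote and Election Day margins. This sketch uses the approximate shares and margins cited above rather than exact certified totals:

```python
# Illustrative check of the North Carolina 2016 arithmetic described above.
# Figures are the approximate ones cited in the text, not exact totals.

early_share = 2 / 3          # about two-thirds of N.C. votes were cast early
early_margin = -2.5          # Clinton won the early vote by 2.5 points (negative = Trump behind)
election_day_margin = 16.0   # Trump won the Election Day vote by almost 16 points

# The overall margin is a turnout-weighted average of the two components.
overall_margin = early_share * early_margin + (1 - early_share) * election_day_margin
print(f"Estimated overall Trump margin: {overall_margin:+.1f} points")
# → about +3.7, consistent with the 3.6-point margin reported above given rounding
```

In other words, even a clear early-vote deficit can be swamped by a lopsided Election Day vote, as long as the Election Day electorate is large enough.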
The Election Day surge for the GOP wasn’t anything new in the Tar Heel State, however. In 2012, President Obama had built a 129,000 early vote lead over Mitt Romney — substantially larger than Clinton’s over Trump — but had lost the Election Day vote by a huge margin, costing him the state:
| North Carolina, 2012 | Romney votes | Romney % | Obama votes | Obama % |
|---|---|---|---|---|
| Early (mail or in-person) | 1,297,067 | 47.2% | 1,426,129 | 51.9% |
So Clinton was running behind Obama’s early voting pace in North Carolina — which obviously wasn’t a good sign, given that Obama had lost the state. Why, then, had people taken the North Carolina numbers as good news for her? Actually, not everybody did. A few news outlets had pointed out that Clinton was running behind Obama’s pace there, and the Clinton campaign itself was worried about its North Carolina numbers, according to comments it made in December at the Harvard Institute of Politics conference.
Still, early voting data can be easy to misinterpret. Early voting is a relatively new innovation. Traditions and turnout patterns vary from state to state, and they can change whenever new laws are passed, or depending on how much the campaigns emphasize early voting.2 Meanwhile, early voting numbers are reported from lots of different states at once. Many news outlets focused on a supposed turnout surge for Clinton among Hispanic voters while giving less attention to signs of decline in African-American turnout (there was at least one excellent article on declining black turnout numbers, although it didn’t figure much into outlets’ final analyses of the race). The latter was actually more important than the former because black voters are more likely than Hispanic voters to be concentrated in swing states.
Furthermore, early voting data doesn’t necessarily provide reason to doubt the polls, because early voting is already accounted for by the polls. For instance, some North Carolina polls had shown Clinton losing the state despite winning among early voters, just as actually occurred.
When there are multiple interpretations of the data but not much empirical guidance on which one works best, that’s a recipe for confirmation bias. The Times, for instance, was exceptionally confident in Clinton’s chances from the start of the campaign onward, and early voting data tended to reinforce its pre-existing views of the race.
There’s also a broader point to be made about the use and abuse of data in campaign coverage. After the election, some of the pundits who had touted Clinton’s early voting numbers as an alternative to polls claimed that “the data” was wrong and had led them astray. And the Times, which had spent a lot of time reassuring its readers that Clinton would win, wrote an article entitled “How Data Failed Us in Calling an Election.”
Whenever I see phrasing like this, I mentally substitute the near-synonym “information” for “data” and reconsider the sentence. Would the Times have published a headline that read “How Information Failed Us in Calling an Election”? Probably not, because that sounds like the ultimate dog-ate-my-homework excuse. Isn’t it the job of journalists to sort through information and uncover the real story behind it?
But the thing is, blaming “the data” usually is a dog-ate-my-homework excuse. The problem is often in assuming that because you’ve cited a number, you’ve relieved yourself of the burden of interpreting the evidence. And as we’ve described in the first few installments of this series, news outlets referenced lots of data during the general election but often misinterpreted it, almost always reading it as good news for Clinton even when there were conflicting signals. They touted early voting as favorable for Clinton, even though it hadn’t been very predictive in the past and showed problems for her in states such as North Carolina. They asserted that the Electoral College was a boon for her, even though the data showed it was Trump’s voters and not Clinton’s who were overrepresented in swing states. They highlighted Clinton’s numbers in Arizona, but downplayed data showing Clinton struggling in Ohio and Iowa, which had traditionally been bellwether states. They mostly ignored data showing an unusually high number of undecided voters, which made Clinton’s polling lead much less secure.
I don’t mean to suggest that one should have gone to the other extreme and confidently predicted a Trump victory.4 Nor do I mean to imply that interpreting election data correctly is easy; it usually isn’t. (This goes for us too: FiveThirtyEight got itself in one heck of a mess in assessing Trump’s chances in the Republican primary.) But political journalism circa 2016 was in a place where there was a lot of fetishization of “data,” but not a lot of experience with or appreciation for the tools needed to interpret it — namely, probability, statistics and the empirical method. (“Statistics” has two common meanings. There’s statistics as in nuggets of quantified information, e.g., “Tom Brady threw for 28 touchdowns this season” and “there were 17 unprovoked shark attacks in Australia in 2016.” And then there’s statistics as in a branch of science devoted to the analysis and interpretation of data, e.g., “there’s no correlation between shark attacks in Australia and Tom Brady’s passer rating.” At FiveThirtyEight, we’re mostly interested in the latter definition — that is to say, in statistical analysis — since statistical factoids cited without context are mostly just noise.) That made for a high risk of overconfidence in extracting meaning from the data.