The Uncanny Accuracy of Polling Averages*, Part I: Why You Can’t Trust Your Gut

I get a lot of e-mails like this one. I’m going to withhold the reader’s name, because I didn’t ask his permission to use it — so we’ll call him Skeptical Sam.

Nate – I admire your methodology for figuring out who might win Senate contests, but I don’t always buy the results.

For example, your forecast shows that there is a 92 percent chance that Sestak loses to Toomey. I grant that Toomey should be rated the favorite, but I don’t believe that there is a 92 percent chance of that happening. In fact, if anyone were to give me 9.2:1 odds in a bet, I would gladly take it. But that is what your numbers show.

Similarly, most polls show Johnson ahead of Feingold. But your numbers show Johnson has an 80 percent chance of winning. I don’t believe that. I might bet on Johnson winning, but if you gave me 4:1 odds (80 percent), I would gladly bet on Feingold.

Or down the ledger, Murray has a 76 percent chance of holding her Senate seat? I rate her a slight favorite, but not at 3:1 odds.

Carnahan has a 7 percent chance of winning? Laughable.

In short, your numbers are clever but in many instances the formula overstates the chances of one candidate winning over another. They are at variance with common sense. You can’t quantify common sense, but you can question your numbers.

Sam’s argument is pretty simple: he thinks we’re underestimating the amount of uncertainty in our projections.

This is such a pivotal (and common) question that I’m going to take three articles to address it. This is Part I, in which I’ll explain why I don’t think that intuition — what Sam calls common sense — is liable to be an especially precise guide when it comes to estimating the likelihood of particular candidates winning their elections. In Part II, I’ll explain why, conversely, a look at the numbers from recent elections enables us to make such confident-seeming predictions. And in Part III, I’ll explain why there’s an asterisk in the title — why past performance can’t necessarily guarantee future results — and why there are some things that keep me up at night.

Part I

I spend a lot of time thinking about uncertainty. The key difference in Pecota, the forecasting system that I developed eight years ago to predict the performance of baseball players, was not that it did better than its competition, on average (it did in most years, but only by a tiny bit). Rather, it was that it looked at the uncertainty in the forecast as a feature rather than a bug.

For example, it didn’t just tell you how many home runs Derek Jeter would hit on average, but what a best-case scenario looked like and what a worst-case scenario looked like. This not only made the forecasting system more honest, but also provided a lot more information to the reader.

People often forget that there are essentially two parts to any forecast: what we can think of as the mean forecast (“our best guess is that Sarah Palin will win by 7 points”) and the confidence interval (“the margin of error on my guess is plus or minus 9 points”). Taken together, these two figures allow us to calculate the chance of a candidate winning her race (“Ms. Palin is a 93 percent favorite”). This is basically how we get the probabilities you see in all our forecasting products.
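As a rough illustration of how those two figures combine, here is a minimal sketch. It assumes the forecast error is normally distributed and that the quoted margin of error is the half-width of a 95 percent interval; neither assumption is spelled out above, and the function name `win_probability` is purely illustrative rather than anything from the actual model.

```python
from statistics import NormalDist

def win_probability(mean_margin, margin_of_error, z_critical=1.96):
    """Chance that the true margin is above zero.

    Assumptions (ours, not the article's): the error is normally
    distributed, and margin_of_error is the half-width of a 95 percent
    interval, so the standard deviation is margin_of_error / 1.96.
    """
    sigma = margin_of_error / z_critical
    # Probability mass of the margin distribution that lies above zero.
    return 1.0 - NormalDist(mu=mean_margin, sigma=sigma).cdf(0.0)

# The example from the text: ahead by 7 points, plus or minus 9.
palin = win_probability(7.0, 9.0)
print(f"{palin:.1%}")
```

Under these assumptions the result lands in the low-to-mid 90s, in the neighborhood of the 93 percent quoted above. A different error distribution, or a different reading of "margin of error," would shift the figure somewhat.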

The latter part of the forecast — calculating the confidence interval — is frequently taken for granted, despite the fact that it is often the more important part. Certainly that is so for elections, which are binary affairs: the outcome is that one candidate defeats the other(s), whether by 1 vote or 1 million.

I’m guessing we won’t get too much flak if we predict that Tom Coburn is going to win his Senate race in Oklahoma by 50 points, and he actually wins by 70 points. But if some candidate whom we rate as a 10:1 underdog wins his race, I’m sure we’ll hear a lot about it. It’s the latter forecast that requires a confidence interval, that requires us to think probabilistically.

(By the way: not only will we get some of those 10:1 calls wrong, but we should get some of them wrong. If, over the course of several political cycles, every candidate whom we rate as a 10:1 favorite in fact wins his race, that means that our model is not calibrated properly, and we’re overestimating the amount of uncertainty — perhaps these candidates are really 100:1 favorites.)
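One way to check calibration of this kind is to group past forecasts by their stated probability and compare each group's average forecast with how often those candidates actually won. The sketch below assumes forecasts are stored as (probability, won) pairs; the function name and the sample records are invented for illustration and are not real results.

```python
from collections import defaultdict

def calibration_table(forecasts, n_buckets=10):
    """Group (probability, won) pairs into equal-width probability buckets.

    Returns a list of (mean_forecast, observed_win_rate, count) tuples,
    one per non-empty bucket, ordered from lowest to highest probability.
    """
    buckets = defaultdict(list)
    for prob, won in forecasts:
        # A probability of exactly 1.0 falls into the top bucket.
        idx = min(int(prob * n_buckets), n_buckets - 1)
        buckets[idx].append((prob, won))

    table = []
    for idx in sorted(buckets):
        rows = buckets[idx]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(1 for _, w in rows if w) / len(rows)
        table.append((round(mean_forecast, 3), round(win_rate, 3), len(rows)))
    return table

# Hypothetical records: eleven 10:1 favorites (probability 10/11), one of whom lost.
sample = [(10 / 11, True)] * 10 + [(10 / 11, False)]
table = calibration_table(sample)
print(table)
```

In a well-calibrated set of forecasts, the first two numbers in each row should be close to each other. If the 10:1 favorites won every single time over many cycles, the observed rate would sit well above the forecast, which is the overestimated uncertainty described above.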

With due respect to our reader, Skeptical Sam, I’m not sure that people’s intuitions are all that good when it comes to estimating confidence intervals. Most people probably know, almost to the minute, how long their commutes to work take them on average. But if I asked you to tell me how often your commute takes 10 minutes longer than average — something that requires some version of a confidence interval — you’d have to think about that a little bit, and you might wind up being pretty far off. Calculating the average amount you expect your family to spend on groceries in a month, likewise, is easier than estimating the risk of some catastrophic event that will cause you to go bankrupt.

Now, these problems are by no means insoluble. Suppose you know that in a typical N.B.A. game, Kobe Bryant scores 30 points. And suppose I ask you to estimate the likelihood of his scoring 50 points in a game. This is a confidence-interval type of problem (and a slightly tricky one, since Mr. Bryant’s scoring performance probably isn’t quite normally distributed). But you could get a pretty decent answer by looking up how often it has happened in the past (about 3 percent of his games, it turns out).

Well, in politics, we can do essentially the same thing. How often does a candidate trailing by 10 points in the polls a month before an election come back to win? We could look that up.

And in Part II of this series, that’s what we’ll do.
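The lookup itself is simple to express. Here is a minimal sketch, assuming past races are recorded as (deficit in points a month out, whether the trailing candidate won) pairs. The function name and the handful of records below are made up for illustration; they are not historical data.

```python
def comeback_rate(races, deficit_low, deficit_high):
    """Share of trailing candidates within a deficit band who went on to win.

    races: iterable of (deficit_points, won) pairs, where deficit_points is
    how far the candidate trailed in the polls and won is True or False.
    Returns None when no race falls inside the band.
    """
    matching = [won for deficit, won in races
                if deficit_low <= deficit <= deficit_high]
    if not matching:
        return None
    return sum(matching) / len(matching)

# Hypothetical records, for illustration only.
past_races = [(10.0, False), (9.5, False), (10.5, True), (3.0, True), (11.0, False)]
rate = comeback_rate(past_races, 9.0, 11.0)
print(rate)
```

With only a few races in a band the estimate is very noisy, so in practice the band has to be wide enough, or the record long enough, to contain a meaningful number of cases.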

But also, in politics, there are lots of people working to distract us from this sort of data, and play to (and play with) our intuitions. For example: politicians.

I’ll tell you who doesn’t want you to know how often a candidate like Joe Sestak, who is trailing by 7 or 8 points in the polls, comes back to win his race: Joe Sestak. Because those numbers aren’t very promising for him.

When a candidate appears to trail in a race, he’s going to give you a story about how he’s going to come back (or if he’s a little more enterprising, why he isn’t really trailing in the first place). He’ll talk about how the dynamics of the race are exceptional, about how his internal polls, which are printed on really nice letterhead, show the race to be a dead heat. And he’ll give you some tidbits: Union workers in Wilkes-Barre are breaking 2 to 1 for him, you know, and wait ’til you see his September fund-raising numbers, because people are getting energized, just now getting energized, about this campaign, they’re really getting energized, and that was the plan all along, don’cha know.

Politicians — the ones worth their salt, anyway — are exceptionally skilled at making believers out of people, and they’ll try to make a believer out of you. Some of the time, they’ll make a strong enough argument to persuade even the most seasoned observers. But a much smaller fraction of the time will they actually turn out to be right. That’s what the data says, and it says so pretty clearly.

Politicians may also find willing accomplices in the news media, which can also have an interest in exaggerating the competitiveness of a race. If the kicker to your horse-race story is “Blanche Lincoln is going to lose by about 20 points, and everyone’s just kind of going through the motions here” — well, then it’s not much of a horse-race story.

Instead, ticking off the reasons why so-and-so could still win is sometimes taken as a form of journalistic balance. “Despite the odds,” the more candid of reporters might include as a qualifier. But expressing too much skepticism about one campaign’s claims might risk a reprimand from an editor.

The (very) few of us who make some sort of living forecasting elections also have perverse incentives. “Experts” are expected to be right — otherwise, they may quickly lose their reputations as experts. Believe me, it’s not doing me any good to say that Carl P. Paladino has only a 1-in-50 chance of becoming the next governor of New York. Even if those are really the odds — and to the best of my abilities, I think that’s about what they are — I’ve got a couple of bad spots on the roulette wheel that could make me look like a total idiot, whereas there’s essentially no up side for me if the “obvious” happens and Andrew Cuomo wins.

If you’re a consumer of political news, a lot of the people you encounter are going to err on the side of exaggerating the amount of uncertainty in a particular race. And a lot of the rest may do just the opposite: they’ll make wild, irresponsible predictions that sound good on television, where nobody remembers 99 percent of what you say half an hour later, even in the YouTube era.

Particularly cunning pundits may even do both at once: make a ridiculous prediction, and hedge it to the hilt. On the eve of the 2008 election, Dick Morris labeled states like Tennessee and Louisiana, where there was no evidence suggesting they were close, as being “tossups,” which implied a landslide victory for Barack Obama was in the offing. But in the very same article, Mr. Morris also said the election was “in flux” and that there could be a “razor-thin margin going either way.” That’s called covering your bases!

All that spin can make it difficult for the reader to get his bearings — to process all the information about the election in a way that is conducive to making a good gut estimate of a confidence interval.

And when we do cut through the clutter and look to the historical record, we may do so in a selective way. We can probably all remember cases when the polling was way off (like the New Hampshire Democratic primary in 2008) or when it moved very sharply in the late stages of the race (as in the Delaware Republican primary this month). We’re less likely to remember the dozens upon dozens of times when a candidate was supposed to win by 15 points, and that’s exactly what she did, or when the outcome of the election was preordained months in advance.

What the Delaware and New Hampshire examples also have in common is that they were primary elections, and polling in primaries can be both highly unreliable and highly volatile — which is what you get when essentially everyone is a potential swing voter. But it’s rarely so in general elections, when only 10 or 20 percent of the electorate really has some sort of decision to make, and the rest vote down the party line each year.

A lot of the lessons that are important for understanding primary polling are important to un-learn for general elections. (Since we’re just coming out of a primary cycle, we may be especially vulnerable to this problem now.)

Finally, there’s some evidence from behavioral economics that human beings are bad at estimating probabilities out at the tail ends of the bell curve. We’re pretty decent at telling a favorite from an underdog, but we’re not so good at telling an 8:1 underdog from an 80:1 underdog or an 8,000:1 underdog, even though those are huge differences statistically.
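To see how far apart those underdogs really are, it helps to convert the odds to probabilities. This is a small sketch using the standard convention that an N:1 underdog wins once for every N losses, a probability of 1 / (N + 1); the function name is illustrative.

```python
def underdog_probability(odds_against):
    """Win probability for an underdog quoted at odds_against:1.

    An N:1 underdog is expected to win once for every N losses,
    so the win probability is 1 / (N + 1).
    """
    return 1.0 / (odds_against + 1.0)

for odds in (8, 80, 8000):
    print(f"{odds}:1 underdog: {underdog_probability(odds):.4%}")
```

That works out to roughly 11 percent, 1.2 percent and 0.012 percent: each step is a factor of 10 or 100, even though all three tend to register simply as "long shots."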

All of these are good reasons not to trust your gut.

I’m tempted to paraphrase Charles Barkley: Any knucklehead can calculate an average, but it takes brains to calculate a confidence interval. But that’s not really right: it doesn’t take any special kind of intelligence to calculate a confidence interval. It just takes data, and a willingness to trust it.

Fortunately, we have quite a bit of data, which is what we’ll get to in Part II.

FiveThirtyEight
