When Good Enough Isn’t Good Enough: On House Forecasts and the Generic Ballot

Emory’s Alan Abramowitz, by way of Pollster.com’s Brendan Nyhan, has a reaction of sorts to our post from Tuesday in which I talked about some of the ambiguities of the generic Congressional ballot. Here’s Abramowitz:

Nate provides a lot of excellent analysis. But there are two pretty silly statements here. First, the generic ballot is a pretty good predictor of both the national popular vote and the national seat results. Second, the national popular vote is a very good predictor of the overall seat results. It definitely is not “relatively irrelevant” to those results.

Let me warn you that some of the ensuing discussion boils down to semantics. What do we mean by a “pretty good” predictor and a “very good” predictor, for instance? There are a lot of times when a particular equation might have a reasonably high R-squared, for example, but would not be all that useful to us in terms of our ability to make predictions about the sort of questions that are most interesting to us, such as what the likelihood is of a Republican takeover of the House.

If we wanted to take the generic ballot today, August 19th, and use it to project out the number of seats that Republicans will gain (or lose, I suppose) in the House, there are basically three sorts of uncertainties that we face.

1. The generic ballot today is an imperfect predictor of the generic ballot on Election Day.

As I wrote on Tuesday, and as others like Columbia’s Robert Erikson have found, this is not actually all that big a concern, as the generic ballot is relatively stable as compared with most political indicators. Still, this does produce some additional uncertainty. The generic ballot moved several points toward the Republicans by Election Day 2004 as compared with how it was polling during the summer, and several points toward the Democrats in 2006.

But suppose that we ignore this, for now. Suppose that we take Pollster’s current trendline estimate of the generic ballot, which has Republicans winning by 5.6 points, and assume that this is exactly the margin that will separate the two parties on Election Morning. There are still two additional problems that we face.

2. The Generic Ballot on Election Day is an imperfect predictor of the national House popular vote.

This is probably the most significant source of error. In addition to the normal ambiguities surrounding any poll (the consensus of polls is often wrong in one direction or the other), we also have the fact that, with very rare exception, the generic ballot is framed as presenting “the Republican candidate in your District” against “the Democratic candidate”, or some variation of this, rather than naming the candidates specifically. Some voters might react differently if they knew the names of the candidates. Also, some voters literally won’t have the chance to vote for the candidate from their preferred party, because there are usually several dozen Congressional districts (and there have been as many as 100 in some past election cycles) in which one or the other of the major parties doesn’t nominate a candidate. Finally, some states like Florida don’t even bother to tally the results when a candidate runs uncontested, so their votes won’t count toward the national popular vote at all.

Despite all these issues, the generic ballot certainly tells you something about the House popular vote, particularly if you make certain adjustments to it, like recognizing the difference between registered voter and likely voter polls. But each of these issues contributes a significant amount of uncertainty.

3. The national House popular vote is an imperfect predictor of the seat count.

It’s true that if we knew exactly what the popular vote were, we could come up with a not-bad estimate of the seat count. From the chart that Abramowitz posted, it looks like the average error in projecting the seat count from the popular vote is something just a wee bit north of 10 seats, which would imply a 95 percent confidence interval of about X ± 20 or 25 seats. Again (semantics!) I would call that “pretty good” rather than “very good”: saying, for example, that the Republicans will with 95 percent confidence gain somewhere between 25 and 65 seats doesn’t seem to impart all that much knowledge. The distribution of votes into seats is also somewhat uneven. For example, Democratic districts tend to have lower turnout, which somewhat offsets the fact that the generic ballot tends to overestimate their standing in the national popular vote; on the other hand, Democratic voters tend to be more concentrated into particular Congressional districts than Republican ones, usually in urban centers. Even so, this is less problematic than Step #2.
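As a rough check on that arithmetic, here is a minimal sketch of how an average miss converts into a 95 percent interval. It assumes the forecast errors are normally distributed around zero, which the post does not state; the function name `ci_half_width_from_mae` is made up for illustration.

```python
import math

def ci_half_width_from_mae(mae, z=1.96):
    """Approximate 95% half-width implied by a mean absolute error.

    ASSUMPTION: errors are normal with mean zero, in which case
    MAE = sigma * sqrt(2 / pi), so sigma = MAE * sqrt(pi / 2).
    """
    sigma = mae * math.sqrt(math.pi / 2)
    return z * sigma

# An average miss of about 10 seats, as read off the chart in the post.
half_width = ci_half_width_from_mae(10)
print(round(half_width, 1))  # about 24.6 seats
```

Under that assumption, an average miss of 10 seats lands at the upper end of the "20 or 25" range quoted above.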

But of course, we don’t have any knowledge of what the popular vote is in advance; instead, it has to be estimated from the generic ballot (and perhaps other factors). What happens in the real world, where we have to go directly from the generic ballot to projecting a seat count, processing steps #2 and #3 in one fell swoop? Well, we get a big mess. Here is the direct translation from the parties’ generic ballot standing, as inferred from the trendlines that Charles Franklin has generated, into the number of Democratic-held seats in the House, for all elections since 1946.

The forecast misses on average by about 20 seats, which translates into a 95 percent confidence interval of about ±48 (!) seats. At a generic ballot reading of Republican +5.6, for example, where Pollster.com has it now, the regression line above projects a Republican gain of 41 seats, which sounds reasonable, I suppose. But the 95 percent confidence interval runs between a gain of 90 seats and a loss of 7 seats. Not very helpful! And of course, this assumes that we know what the standing of the generic ballot will be on Election Day, which we don’t, since there are still 75 or so potentially volatile political days to occur before then.
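To see why an interval that wide says so little about a House takeover, here is a small sketch that treats the projection as a normal distribution and asks how likely a takeover-sized gain would be. The 41-seat point estimate and ±48 half-width come from the figures above; the 39-seat threshold is an assumption added for illustration (the net gain commonly cited as needed for a Republican majority in 2010), and a continuous normal curve is only an approximation for whole seats.

```python
from statistics import NormalDist

point_estimate = 41   # projected Republican gain, from the post
half_width_95 = 48    # 95 percent half-width, from the post
seats_needed = 39     # ASSUMPTION: net gain needed for a majority

# Back out the standard deviation implied by the 95 percent interval.
sigma = half_width_95 / NormalDist().inv_cdf(0.975)
forecast = NormalDist(mu=point_estimate, sigma=sigma)

low = point_estimate - half_width_95
high = point_estimate + half_width_95
p_takeover = 1 - forecast.cdf(seats_needed)

print(f"95% interval: {low} to {high} seats")
print(f"P(gain >= {seats_needed}) = {p_takeover:.2f}")
```

Because the threshold sits so close to the point estimate and the spread is so large, the implied probability comes out only modestly above one half.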

(As an aside, if you were to apply the same technique only to data from 1994 onward, the regression equation would project Republican gains of between 36 and 107 seats; if you were to use data from 1980 onward, it would project somewhere between a 14-seat loss and a 104-seat gain.)

I don’t mean to slam all macro-level attempts at Congressional forecasting: there are significantly more sophisticated versions of macro-level forecasts that political scientists like Abramowitz have worked on. But the generic ballot alone is a very blunt instrument. It basically tells us, “okay, things are probably going to be pretty bad for Democrats, and they could be really bad,” something which any sentient observer of politics would already have known.

That’s why we’re going through the trouble of building a ground-up projection of the House, which attempts to predict the outcome of individual seats, while also understanding that the outcomes and uncertainties in different congressional districts are correlated. This has required collecting lots and lots of data: essentially, every district-level poll, every fundraising record, and every independent forecast since 1998. Although I’m not quite ready to tease at the results, it does seem reasonably clear that taking into account a multiplicity of indicators is helpful — for example, the generic ballot does have some influence, even if you have lots of other information about the races you’re attempting to forecast, but the same is true of each of the other indicators I just mentioned.
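The point about correlated outcomes can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not the model described above: the 60 district margins, the error sizes, and the function name `simulate_seat_totals` are all hypothetical. Each district gets the same total error (a standard deviation of 5 points) in both runs; the only difference is whether part of that error is a national swing shared by every district.

```python
import random
import statistics

def simulate_seat_totals(margins, national_sd, local_sd, n_sims=2000, seed=0):
    """Simulate how many districts a party wins across many elections.

    Each simulated election draws one national shift shared by all
    districts, plus an independent local error for each district.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        national_shift = rng.gauss(0, national_sd)  # common to all districts
        wins = 0
        for margin in margins:
            result = margin + national_shift + rng.gauss(0, local_sd)
            if result > 0:
                wins += 1
        totals.append(wins)
    return totals

# HYPOTHETICAL expected margins (in points) for 60 competitive districts.
margins = [-6 + 0.2 * i for i in range(60)]

# Same per-district error (sd of 5 points), split two different ways.
independent = simulate_seat_totals(margins, national_sd=0.0, local_sd=5.0)
correlated = simulate_seat_totals(margins, national_sd=3.0, local_sd=4.0)

print("spread, independent errors:", statistics.pstdev(independent))
print("spread, correlated errors: ", statistics.pstdev(correlated))
```

With independent errors, misses in one district tend to cancel misses in another, so the seat total looks deceptively precise. Once a shared national swing is included, the districts tend to miss in the same direction and the range of plausible seat totals widens considerably.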

I’m sure that even a perfectly-constructed model (and ours won’t be) would still be subject to a significant amount of error — and anecdotally, I suspect that this is a very tricky election to forecast. But given that a lot of this data — which granted, took weeks and weeks to compile and is not exactly sitting at our fingertips — has essentially gone unexamined, I hope you’ll appreciate my desire to demand a greater degree of precision.

FiveThirtyEight
