Thus far, the trendline adjustment that we implemented last week has been quite successful. It has correctly anticipated bounces for Obama in states ranging from Florida to Ohio to Tennessee. It has allowed the model to fall more intuitively into line with changes in the momentum of the race, and to correct some of the timing bias associated with different states being polled at different times.
The model believes that if the election were held today, Obama would win by approximately 6 points. That’s very close to his current lead in the national polling. Intuitively, it feels just about right to me.
However, our goal is not to predict what would happen if the election were held today. Our goal is to predict what will happen in November. In an earlier article on this subject, I framed the question this way: Suppose we are correct that Obama would win an election held today by 6 points. Is a 6-point Obama win therefore the best prediction of the outcome in November? Up until now, our model has always assumed that it was.
That assumption turns out to be incorrect. There is a fairly strong tendency for national polling to tighten as election day approaches. National polls are not equally likely to move upward or downward at any given time; they are more likely to move in the direction of the candidate who is trailing in the race.
This tendency is actually fairly easy to eyeball if you look at some historical polling data. Below is a table containing the largest lead held by each candidate in any public poll in my database released within 200 days of that year’s election. For 1952-1984 and 1996, the database consists of Gallup polling only; for the other years, it consists of a variety of national polls.
Largest leads for each candidate in public poll
released within 200 days of general election.

        Biggest           Biggest
Year    GOP Lead          DEM Lead        Result
----------------------------------------------------------
1952    Eisenhower +28    None*           Eisenhower +11
1956    Eisenhower +27    None*           Eisenhower +15
1960    Nixon +6          Kennedy +4      Kennedy +0.2
1964    None*             Johnson +59     Johnson +23
1968    Nixon +16         Humphrey +6     Nixon +0.7
1972    Nixon +34         None*           Nixon +23
1976    Ford +1           Carter +33      Carter +2
1980    Reagan +16        Carter +8       Reagan +10
1984    Reagan +21        None*           Reagan +18
1988    Bush +17          Dukakis +18     Bush +8
1992    Bush +16          Clinton +30     Clinton +6
1996    None*             Clinton +23     Clinton +9
2000    Bush +14          Gore +17        Gore +0.5
2004    Bush +13          Kerry +11       Bush +2
* In 1952, 1956, 1964, 1972, 1984 and 1996, one candidate
led in all public polls in my database taken within 200
days of the election. The *closest* that the trailing
candidate came in those years was as follows: Stevenson
(1952), 2 points; Stevenson (1956), 10 points; Goldwater
(1964), 28 points; McGovern (1972), 16 points; Mondale
(1984), 1 point; Dole (1996), 11 points.
Look at some of those numbers! LBJ at one point had a 59-point lead over Barry Goldwater. Bill Clinton once polled 30 points ahead of George Bush (and Bush once polled 16 points ahead of Clinton). Jimmy Carter once held a 33-point lead on Gerald Ford.
Of course, if you go about looking for the largest leads you can find, you are naturally going to expect to see some regression to the mean. But even if we look at this data more systematically, we still find a fairly robust tendency for a lead in the national polling to diminish by election day. The extent to which it diminishes is a function of two things: the magnitude of the lead — the larger the lead, the more it needs to be discounted — and the number of days until the election. We can specify a regression equation to project the November outcome based on a candidate’s present polling lead as follows:
PROJECTION
= 0.909 * MARGIN
- 0.0475 * MARGIN * ROOTDAYS
+ 0.0604 * SQRT(MARGIN) * ROOTDAYS

where
ROOTDAYS = square root of the number of days until the election
MARGIN = size of the lead, in points, for the leading candidate
Visually, that looks about like this:
This chart is perhaps a little confusing, but it’s exhibiting the two essential features that I talked about before: the larger the lead, the more it needs to be discounted (both proportionately and absolutely), and the closer we get to election day, the less it needs to be discounted. In particular, a lead starts to become significantly more meaningful once we get within about 30 days of the election, although it’s also the case that presidential elections have tended to tighten within the last 30 days.
So, for instance, a 20-point lead in a poll 300 days before the election projects to only a 6-7 point victory in November. A 15-point lead in a poll taken 100 days before the election projects to a 9-point victory. And so forth. These are very significant corrections; big leads held a long way before the election must be discounted quite heavily.
As for Barack Obama’s lead right now, the correction required is not quite as dramatic. The regression equation specifies that a 5.9-point lead held 130 days before the election should be discounted by about one-third — to 3.8 points to be exact. That is our new projection for Obama’s margin of victory.
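The figures quoted above can be checked with a minimal sketch of the regression equation. The function name and argument names are mine, chosen for illustration; the coefficients are the ones given in the formula, and the margin is assumed to be the leading candidate's (non-negative) lead in points.

```python
import math


def project_margin(margin, days_until_election):
    """Discount a current polling lead into a projected November margin.

    margin: the leading candidate's lead in points (assumed >= 0).
    days_until_election: number of days between the poll and election day.
    """
    root_days = math.sqrt(days_until_election)
    return (0.909 * margin
            - 0.0475 * margin * root_days
            + 0.0604 * math.sqrt(margin) * root_days)


# The three cases discussed in the text: (lead in points, days before election)
for margin, days in [(20, 300), (15, 100), (5.9, 130)]:
    projected = project_margin(margin, days)
    print(f"{margin}-point lead, {days} days out -> {projected:.1f}-point win")
```

Run as written, the three cases come out to roughly 6.4, 8.8 and 3.8 points, in line with the numbers cited in the text.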
Specifically, the model now calibrates the trend adjustment to a candidate’s discounted lead in the polls. The process is to run the numbers once without the discount (just as we had run them before), take the difference between the candidate’s current lead and his projected winning margin under the discount formula, and subtract that number of points from the candidate’s margin in each state. Put less fancily, we are subtracting 2.1 points from Obama’s present trend-adjusted estimate in every state, because all else being equal, we expect McCain to gain 2.1 points between now and November. This lowers Obama’s win percentage from 76 percent to 69 percent, a figure that squares a lot better with my intuition about this election.
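As a rough illustration of that last step, here is a sketch of the uniform state-by-state shift. The state margins below are hypothetical placeholders, not the model's actual estimates, and the function name is my own; margins are assumed to be expressed from the national leader's point of view (Obama minus McCain).

```python
def apply_national_discount(state_margins, snapshot_lead, projected_lead):
    """Subtract the national discount from every state's margin.

    state_margins: dict of state -> trend-adjusted margin for the national leader.
    snapshot_lead: the leader's current national lead, in points.
    projected_lead: the discounted national lead projected for November.
    """
    shift = snapshot_lead - projected_lead  # 5.9 - 3.8 = 2.1 in the text
    return {state: margin - shift for state, margin in state_margins.items()}


# Hypothetical trend-adjusted Obama margins, for illustration only
snapshot = {"Ohio": 3.0, "Florida": -1.0, "Tennessee": -12.0}

projection = apply_national_discount(snapshot, snapshot_lead=5.9, projected_lead=3.8)
for state, margin in projection.items():
    print(f"{state}: {margin:+.1f}")
```

Because the same number of points comes off every state, the ordering of the states does not change; only the states sitting within about 2 points of even can flip.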
*-*
I’m sure that people are sick and tired of all these changes, but this really ought to be the last missing piece of the puzzle, and it’s something that we absolutely must do if our goal is to predict the November outcome rather than merely give a snapshot of the current polling. This is something, frankly, that I should have looked at before, although since the election had been so close until recently, it would not have mattered very much.
You’ll also notice one other, less important change. Our projection now allocates the undecideds in each state 50:50 to the two major candidates, after making an allocation for third-party votes. The third-party allocation differs slightly from state to state depending on the other + undecided vote in that state’s polling. The model had implicitly been allocating the undecideds this way before, but now I’m doing it explicitly, as I want to make it absolutely clear that our projection in each state is in fact a projection of the final outcome rather than some kind of supercharged polling average.
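A minimal sketch of that allocation follows. The article does not spell out how the third-party share is set in each state, so here it is simply passed in as a hypothetical input; the function name and the example figures are also assumptions of mine.

```python
def allocate_undecideds(dem_pct, rep_pct, third_party_pct):
    """Turn a polling average into a projected final vote share.

    dem_pct, rep_pct: current polling shares for the two major candidates.
    third_party_pct: share set aside for third parties (hypothetical input;
        the model's actual state-by-state rule is not given in the text).
    """
    leftover = 100.0 - dem_pct - rep_pct          # the "other + undecided" vote
    if not 0.0 <= third_party_pct <= leftover:
        raise ValueError("third-party share must fit within the leftover vote")
    undecided = leftover - third_party_pct
    # Split whatever remains 50:50 between the two major candidates
    return dem_pct + undecided / 2, rep_pct + undecided / 2, third_party_pct


# Hypothetical state: 46-42 in the polls, with 2 points set aside for third parties
dem, rep, other = allocate_undecideds(46.0, 42.0, 2.0)
print(dem, rep, other)   # the three shares sum to 100
print(dem - rep)         # the margin is the same as in the polling average
```

Note that an even split leaves the margin between the two major candidates untouched, which is why making the allocation explicit does not change who is projected to win a state.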
Acknowledgments: I again want to thank Robert Erikson of Columbia University, who has performed similar calculations in the past and gave me the idea for this one, and Andrew Gelman, also of Columbia, who lent me use of his historical polling database.
EDIT: Per some early feedback in the comments, I have changed the way I present the polling detail chart. What we formerly called our projection is now presented as before and described as the “Snapshot”. The Snapshot is our best estimate of what the election would look like if it were held today.
In contrast to the Snapshot is the Projection, which discounts current national polling leads through the process described herein, and also allocates out the undecided vote. This is our best guess at what the election will look like in November.