Cracking a Tracking Poll: Theory and Practice

At a couple of points over the past week, I have posted estimates of the individual daily results as derived from rolling tracking polls. This evidently is a fairly common hobby, as many of my readers have done the same thing.

After examining the problem in more detail, I have come to the conclusion that this is a rather difficult exercise that inherently involves a large margin for error. But first, let me talk about how one might theoretically go about trying to ‘crack’ a tracking poll, and then we’ll cover some of the difficulties one inevitably encounters.

Suppose that we have a tracking poll that meets the following criteria:

1. We know the exact results of each day’s rolling average for some significant, uninterrupted period. By “exact” results I mean down to the decimal place, etc. The fact that in practice tracking poll results are almost always rounded off to the nearest whole number is a fairly large problem, as we’ll discuss in a moment.

2. The same number of interviews is conducted on each day of the sample.

3. The one-day samples are compiled and weighted independently of one another, rather than being weighted as a joint, three-day sample.

If these conditions are met, then we can most likely do a reasonably satisfactory job of estimating daily numbers.

Specifically, we are attempting to solve for n-1 unknowns for each candidate, where n represents the number of days included in the tracking poll. For instance, if we are trying to solve for John McCain’s numbers in the Gallup tracking poll, which uses a 3-day rolling average, we might call these unknowns M1 and M2, which are John McCain’s numbers over some consecutive two-day period. If we know (or can estimate) M1 and M2, we can solve for (or can estimate) McCain’s numbers on any other day of the tracking poll.

This part of the process itself is not at all complicated. If McCain’s three-day average is 46 points in polling taken Monday through Wednesday, and he’s polled at a 44 on Monday (M1) and a 46 on Tuesday (M2), we can see that he’d need a 48 on Wednesday to produce the 46-point average. Then, to solve for Thursday’s numbers, we’d simply repeat the process, using our estimate for his Tuesday number and our newly-derived estimate for Wednesday to solve for Thursday’s result.
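As a minimal sketch of that arithmetic, assuming a 3-day window and exact (unrounded) averages, the rollover can be written in a few lines of Python. The function name `unroll_daily` is invented here for illustration.

```python
def unroll_daily(averages, m1, m2):
    """Recover daily values from 3-day rolling averages.

    averages: the rolling averages, the first of which covers days 1-3.
    m1, m2:   the assumed results for the first two days.
    """
    daily = [m1, m2]
    for avg in averages:
        # The three days in the window sum to 3 * avg, so the newest day
        # is whatever is left after subtracting the two days already known.
        daily.append(3 * avg - daily[-1] - daily[-2])
    return daily


# Monday 44, Tuesday 46, and a Monday-Wednesday average of 46:
print(unroll_daily([46], 44, 46))  # [44, 46, 48]
```

Passing a longer list of averages simply repeats the step, each time reusing the two most recent daily values.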

What’s tricky, of course, is estimating those initial values of M1 and M2. For any given sequence of tracking poll numbers, there are an infinite number of mathematically valid values of M1 and M2. So what we have to do is guess, and then have some sort of criterion for evaluating which guess is better than another.
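To see why the starting values cannot be pinned down from the averages alone, here is a small sketch. The averages and starting guesses are invented for illustration and are not taken from any real poll; the helper names are likewise hypothetical.

```python
def unroll_daily(averages, m1, m2):
    """Recover daily values from 3-day rolling averages, given days 1 and 2."""
    daily = [m1, m2]
    for avg in averages:
        daily.append(3 * avg - daily[-1] - daily[-2])
    return daily


def rolling3(daily):
    """Recompute the 3-day rolling averages from a list of daily values."""
    return [sum(daily[i:i + 3]) / 3 for i in range(len(daily) - 2)]


def max_swing(daily):
    """Largest day-to-day change in a list of daily values."""
    return max(abs(b - a) for a, b in zip(daily, daily[1:]))


averages = [46, 46, 47, 47]               # invented tracking averages
calm = unroll_daily(averages, 45, 46)     # one guess at the first two days
wild = unroll_daily(averages, 36, 46)     # a very different guess

# Both sequences reproduce exactly the same published averages...
assert rolling3(calm) == averages and rolling3(wild) == averages

# ...but one is far more volatile than the other.
print(max_swing(calm), max_swing(wild))  # 4 20
```

Any pair of starting values produces a sequence that matches the averages, which is why some outside criterion is needed to prefer one guess over another.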

For example, let’s take a hypothetical sequence of tracking numbers, and take a couple of guesses at what M1 and M2 might be in order to produce them:

The red, green and blue columns each represent mathematically satisfactory solutions to the daily tracking poll results. But we can probably regard some of these guesses as being better than others. The blue sequence is relatively stable from day to day. The red sequence is slightly less stable, but not terrible. The green sequence, on the other hand, fluctuates by as many as 20 points from day to day — theoretically possible, but not very likely:

What my process does is to take a large number (specifically, 40,000) of different guesses at M1 and M2, as well as O1 and O2 (Obama’s tracking results over the same period). It then scores these guesses over a 60-day window of the tracking poll according to a couple of different criteria:

a. All else being equal, we prefer the day-to-day fluctuations in each candidate’s daily results to be as small as possible.

b. In addition, when we add the results for the two candidates together, we want the fluctuations for unknown/other to be as small as possible. We also want to avoid the two candidates’ results adding up to implausible numbers, i.e., the Obama and McCain numbers should not add up to more than 100 on any given day.

c. We also do not want to see any sort of periodicity in the results. For instance, in the green pattern above, we see a very large result every third day (Wednesday, Saturday, Tuesday); this is typically a signature of a bad guess. To check for this, we break the data up into three interweaving sequences separated by three days each…

Sequence A: August 1, August 4, August 7
Sequence B: August 2, August 5, August 8
Sequence C: August 3, August 6, August 9

…and then take the average result of each sequence. Over the long run, the average result of the three sequences should be roughly equal to one another. Therefore, guesses in which the averages of the different sequences are closer to one another receive better scores.
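The three criteria can be combined into a single score in many ways. The sketch below is one possible version and not necessarily the one used for the estimates in this post: the equal weighting of the terms, the size of the penalty, and the name `score_guess` are all assumptions made for illustration. Lower scores are better.

```python
from statistics import mean


def score_guess(mccain, obama, over_100_penalty=100.0):
    """Score one guess; inputs are equal-length lists of daily estimates.

    Each list needs at least three days so that all three interleaved
    sequences are non-empty.
    """
    def swings(series):
        # Average absolute day-to-day change.
        return mean(abs(b - a) for a, b in zip(series, series[1:]))

    # (a) Prefer small day-to-day fluctuations for each candidate.
    score = swings(mccain) + swings(obama)

    # (b) Prefer a stable unknown/other share, and penalize (with an
    #     assumed, arbitrary weight) any day where the two candidates
    #     add up to more than 100.
    other = [100 - m - o for m, o in zip(mccain, obama)]
    score += swings(other)
    score += over_100_penalty * sum(1 for x in other if x < 0)

    # (c) Penalize three-day periodicity: the averages of the three
    #     interleaved sequences (days 1, 4, 7...; 2, 5, 8...; 3, 6, 9...)
    #     should be close to one another.
    for series in (mccain, obama):
        seq_means = [mean(series[k::3]) for k in range(3)]
        score += max(seq_means) - min(seq_means)

    return score
```

A guess that swings wildly, sums to more than 100, or repeats on a three-day cycle will score much worse than a stable one under this scheme.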

*-*

From among the 40,000 guesses, we take the 1 percent (400) that receive the best scores according to these criteria. These 400 guesses are averaged together, producing our daily estimates.
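A stripped-down sketch of this guess, score and average loop follows. It handles a single candidate and scores only on day-to-day stability, so it is a simplification of the process described above. The sampling range (`spread`), the scoring function and all names are assumptions for illustration.

```python
import random


def unroll_daily(averages, d1, d2):
    """Recover daily values from 3-day rolling averages, given days 1 and 2."""
    daily = [d1, d2]
    for avg in averages:
        daily.append(3 * avg - daily[-1] - daily[-2])
    return daily


def roughness(daily):
    """Total absolute day-to-day movement; lower means more stable."""
    return sum(abs(b - a) for a, b in zip(daily, daily[1:]))


def estimate_daily(averages, n_guesses=40_000, keep_fraction=0.01,
                   spread=10.0, seed=0):
    rng = random.Random(seed)
    center = averages[0]
    scored = []
    for _ in range(n_guesses):
        # Guess the first two days somewhere near the first average
        # (the +/- spread range is an arbitrary assumption).
        d1 = rng.uniform(center - spread, center + spread)
        d2 = rng.uniform(center - spread, center + spread)
        daily = unroll_daily(averages, d1, d2)
        scored.append((roughness(daily), daily))

    # Keep the best-scoring fraction of guesses and average them day by day.
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_guesses * keep_fraction))
    best = [daily for _, daily in scored[:n_keep]]
    return [sum(day) / len(best) for day in zip(*best)]
```

Because the relationship between daily values and rolling averages is linear, the average of several valid guesses is itself a valid solution that reproduces the same rolling averages.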

If the assumptions I outlined above were valid, this process would produce some fairly definitive results — all of the guesses end up within a percentage point or so of one another. In fact, it would probably be possible to ‘solve’ for the optimal values (either through iteration, algorithm, or some sort of brute force method) that maximize one or more of the scoring criteria.

In practice, unfortunately, these assumptions are not valid:

1. We only see the rounded results for each day’s tracking average, rather than the exact one.

This is a far bigger problem than you might think. Suppose that the tracking average for a candidate is 44 on Wednesday, and 46 on Thursday. Both tracking averages have Tuesday and Wednesday’s results in common, but the latter replaces Monday’s results with Thursday’s.

Since the candidate’s numbers moved up by 2 points, and since two-thirds of the data is common to both samples, we can say that the one-third of the sample that was changed was responsible for the entirety of the movement. That is, we can say that Thursday’s results were 6 points better for the candidate than Monday’s.

Except that, since the figures are rounded, we actually aren’t sure that the change in the tracking poll was 2 points. It could be as small as 1 point, if Wednesday’s average was 44.4999 (rounded down to 44) and Thursday’s was 45.5001 (rounded up to 46). If this were the case, Thursday’s results were only about 3 points better than Monday’s. On the other hand, Wednesday’s average might have been as low as 43.5001 (rounded up to 44) and Thursday’s as high as 46.4999 (rounded down to 46). In this case, the tracking poll increased by almost 3 points in one day, meaning that Thursday’s results were almost 9 points better than Monday’s.

So simply because of this rounding issue, there is a 3-point margin of error built into our daily estimates. Depending on how the figures were rounded, in other words, the daily number could be as many as 3 points higher or 3 points lower than it appears to be.
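The rounding arithmetic can be checked with a short sketch, assuming a 3-day window and conventional rounding to the nearest whole number. The function name is invented for illustration.

```python
def daily_change_bounds(avg_prev, avg_next, window=3):
    """Range of (newest day - dropped day) consistent with two rounded averages."""
    # A figure reported as k could be anywhere from k - 0.5 to k + 0.5,
    # so the true change in the average lies between these two extremes.
    smallest_change = (avg_next - 0.5) - (avg_prev + 0.5)
    largest_change = (avg_next + 0.5) - (avg_prev - 0.5)
    # Only one day out of `window` was swapped, so the difference between
    # the new day and the dropped day is `window` times the change.
    return window * smallest_change, window * largest_change


# Reported averages of 44 and then 46:
print(daily_change_bounds(44, 46))  # (3.0, 9.0)
```

The nominal 6-point difference sits in the middle of that range, with roughly 3 points of slack on either side.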

The way I attempt to adjust for this problem is that in each of the 40,000 simulation runs, I make a random guess at the “true” result for each day’s tracking average, removing the rounding. In some simulations, a tracking number of 44 may be treated as though it’s actually a 43.50, and in others a 44.49. This is a marginally more robust procedure, but it doesn’t really reduce the intrinsic uncertainty due to rounding.
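One simple way to make that kind of random guess is to add uniform noise of up to half a point to each reported figure. The uniform distribution and the name `unround` are assumptions for illustration, since the text does not say how the guesses are drawn.

```python
import random


def unround(reported, rng):
    """Return one random guess at the unrounded value behind each reported figure."""
    # Each whole-number figure could have come from anywhere within
    # half a point of it, so perturb it by up to 0.5 in either direction.
    return [value + rng.uniform(-0.5, 0.5) for value in reported]


rng = random.Random(0)            # seeded only so the sketch is repeatable
reported = [44, 46, 46, 47]       # invented rounded tracking averages
guess = unround(reported, rng)    # a fresh draw would be made in every simulation run
```

Each simulation run would then work from its own `guess` rather than from the published whole numbers.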

2. The number of interviews may vary from day to day.

Although Rasmussen conducts exactly 1,000 interviews for its tracking poll each day, the numbers for other trackers like Gallup and Hotline can vary slightly from day to day. This is not really a mission-critical concern, but it does contribute to the uncertainty.

3. The daily samples may not be truly independent.

Rasmussen’s process — I am not sure about Gallup’s — is to take the three-day tracking sample and treat it as one collective whole for purposes of weighting and processing its results. For this reason, the daily samples are not truly independent of one another, as the demographic composition of one day’s sample may affect the way the next day’s results are weighted. It is hard to say exactly how much more uncertainty this contributes to the model, but it is probably a bigger problem for a poll like Rasmussen, which uses a ‘fancier’ weighting process involving party ID.

*-*

For all these reasons, attempts to extract daily tracking poll results should be treated at best as rough guesses, subject to margins of error of 5 points or higher.

Since you’ve come this far, I’ll provide you with my current estimates for the Gallup tracking poll, but I’m going to try to avoid doing too much of this going forward:

Saturday 9/6: McCain 50.6, Obama 44.3; McCain +6.3
Friday 9/5: McCain 47.8, Obama 45.2; McCain +2.6
Thursday 9/4: McCain 45.5, Obama 45.5; TIE
Wednesday 9/3: Obama 50.3, McCain 41.7; Obama +8.6
Tuesday 9/2: Obama 48.2, McCain 44.8; Obama +3.4
Monday 9/1: Obama 48.5, McCain 39.5; Obama +9.0
Sunday 8/31: Obama 50.3, McCain 44.8; Obama +5.5

Nate Silver is the founder and editor in chief of FiveThirtyEight.
