Nonrandomness in Research 2000’s Presidential Tracking Polls

This is one of the things I pointed out to Mark Blumenthal as odd-seeming about Research 2000’s polling:

Likewise, take a look at their Presidential tracking numbers from 2008 (http://www.dailykos.com/dailypoll/2008/11/4). They published their daily results in addition to their three-day rolling average ... and the daily results were remarkably consistent from day to day.  At no point, for instance, in the two months that they published daily results did Obama's vote share fluctuate by more than a net of 2 points from day to day (to reiterate, this is for the daily results (n=~360) and not the rolling average).  That just seems extremely unlikely -- there should be more noise than that.

Let’s put some flesh on them bones.

In 2008, Research 2000 published the results of its daily samples in its Presidential tracking poll. To clarify, this means that if they had a tracking poll that ran from Wednesday through Friday, they’d tell you what the individual results were for Wednesday, Thursday and Friday respectively, in addition to the aggregated numbers. I for one appreciated this and actually used the daily numbers rather than the multi-day tracking averages in our forecasting models.

A lot of pollsters would have been reluctant to do this because the sample sizes were quite small — on average, about 360 persons for each daily sample — and presumably would have revealed rather striking variation from day to day simply due to sampling error. The margin of error on a sample size of 360 is +/- 5.2 points, so it would be fairly normal for Barack Obama’s numbers to careen (for example) from 54 points one day, to 48 points the next, to 52 the day afterward.
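As a quick check on that margin of error, here is a minimal sketch using the standard 95 percent formula for a simple random sample. The function name and the choice of z = 1.96 are my own assumptions for illustration; the only input taken from the text is the sample size of 360.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error, in percentage points, for a simple random
    sample of size n when the true share is p (assumed 50 percent here)."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(360), 1))  # 5.2
```

Note that this is the margin on a single day's number. The gap between two independent daily samples is noisier still, since the sampling error from both days contributes to it.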

But this didn’t happen. In fact, their daily samples showed barely any movement at all. In the 55 days of their tracking poll, Barack Obama’s figure never increased by more than 2 points from one day to the next, nor declined by more than 2 points.

In contrast, we can run a simulation to see how much movement there “should” have been in the daily samples based on the following assumptions:

* Since the average performance of Barack Obama over the course of Research 2000’s tracking poll was 50 percent, we assume that a voter has a 50 percent chance of choosing Obama and a 50 percent chance of not choosing Obama (i.e. either McCain or undecided; it doesn’t matter which). Note that we assume the true, underlying level of Obama support is constant at 50 percent over the course of the tracking period. In fact, it of course would have varied some, owing to events on the campaign trail. But this should have resulted in more day-to-day variation rather than (as we see in Research 2000’s polling) less. So, this assumption is actually favorable to them.

* We assume that the sample size for Research 2000’s daily sample was some random number between 350 and 370 persons.

* We take the number of Obama “voters” from our random sample and divide by the sample size, and then round to the nearest whole number, to produce that day’s result.

* We then measure the change in Obama’s performance from one day to the next, repeating this process about 30,000 times to create a robust sample.
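The steps above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the original simulation code: I am assuming the daily sample size is drawn uniformly from 350 to 370, that .5 rounds up, and that changes are measured between consecutive simulated days. All function names and the seed are hypothetical.

```python
import math
import random

def round_half_up(x):
    """Round to the nearest whole number, with .5 rounding up (an assumption)."""
    return math.floor(x + 0.5)

def simulate_daily_results(true_support=0.50, n_days=30_000,
                           n_min=350, n_max=370, seed=2008):
    """Simulate one rounded daily topline per day at a constant true support."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_days):
        n = rng.randint(n_min, n_max)  # assumed uniform between 350 and 370
        supporters = sum(rng.random() < true_support for _ in range(n))
        results.append(round_half_up(100 * supporters / n))
    return results

def day_to_day_changes(results):
    """Absolute change, in points, between each pair of consecutive days."""
    return [abs(today - yesterday)
            for yesterday, today in zip(results, results[1:])]

def share_at_least(changes, points):
    """Fraction of day-to-day changes that are at least `points` in size."""
    return sum(c >= points for c in changes) / len(changes)

changes = day_to_day_changes(simulate_daily_results())
for points in (3, 5, 7):
    print(f"change of {points}+ points: {share_at_least(changes, points):.1%} of days")
```

Passing a different `true_support` (for instance 0.42) reruns the same experiment for a candidate polling at a different level.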

The simulation found that Obama’s daily numbers should have moved by at least 3 points from one day to the next about half of the time, given the sample sizes that Research 2000 was using. In fact, they never moved by as many as 3 points, not even once. This behavior is exceptionally nonrandom. Indeed, we should also have seen a fluctuation of 5 or more points about once every four or five days, and a change of 7 or more points about once every two weeks — this obviously never happened, either.

We can run the same experiment on John McCain’s numbers, the only difference being that we assume his true level of support is 42 percent (the average that he polled at during this period) rather than 50 percent. Research 2000 did show McCain’s numbers change by more than 2 points on two occasions: he improved to 44 percent from 41 percent on 10/26, and to 46 percent from 43 percent on 11/1. Those were the only instances, however, and overall the results were just as nonrandom.

Even Bob Barr’s numbers were unusually behaved. Research 2000 had Barr at exactly 2 percent for 50 consecutive days from 9/10 to 10/29; he then fluctuated for a few days before eventually settling in at 1 percent.
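To get a feel for how unlikely a 50-day streak like Barr's is, here is a rough sketch. It assumes things the text does not state: that Barr's true support was a constant 2 percent, that each daily sample was an independent simple random sample of exactly 360 people, and that .5 rounds up. The function name is hypothetical.

```python
import math
from math import comb

def prob_rounds_to(target_pct, n, p):
    """Exact binomial probability that k supporters out of n, drawn with
    true support p, produce a share that rounds (half up) to target_pct."""
    total = 0.0
    for k in range(n + 1):
        if math.floor(100 * k / n + 0.5) == target_pct:
            total += comb(n, k) * p**k * (1 - p)**(n - k)
    return total

one_day = prob_rounds_to(2, n=360, p=0.02)  # chance of printing "2" on one day
fifty_days = one_day ** 50                  # chance of 50 such days in a row

print(f"one day: {one_day:.3f}")
print(f"50 consecutive days: {fifty_days:.2e}")
```

Under these assumptions even the most favorable case for a streak, a true level sitting right at 2 percent, leaves the 50-day probability vanishingly small.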

You only get results like these if something is orders of magnitude upon orders of magnitude divergent from random. Now, just because Research 2000’s polling is extremely nonrandom, does that necessarily indicate that it is fraudulent? I suppose there are alternate hypotheses, although the jury is out (or soon will be) on how compelling they might be.

They might cite their weighting procedures, but the weighting techniques that pollsters ordinarily use would not cause this kind of underdispersion. In fact, the normal weighting methods have the effect of essentially reducing the sample size (since you’re effectively double-counting some voters while throwing out others), so they would increase, rather than decrease, the amount of variance relative to the sample. (** see note) But perhaps Research 2000 is using some really avant-garde techniques that have the effect of stripping a lot of variance out of the sample, e.g. a statistical model which uses polling as one of its inputs, along with making certain other fairly strong assumptions. This could also be a bug of some kind rather than a feature. Either way, it would take a lot of explaining on their part — but it’s possible.

As a slight variation of this, it’s possible that Research 2000 actually did the polling, but that Del Ali puts his finger on the scale to an unusually large degree, i.e. he decides what he thinks the numbers “should” be, and then works backward (such as through his assumptions about likely voters) in order to achieve it.

And it’s possible that Research 2000 actually did the polling, but for some reason were too lazy to do the cross-tabs and decided to reverse-engineer them, including the daily figures that were printed alongside the tracking averages.

None of these alternate hypotheses exactly speaks well for Research 2000, as all would imply significant departures from what we ordinarily think of as sound and scientific polling practice. Nor, even if they were plausible explanations for this particular anomaly, would they necessarily account for others. But this is another oddity that begs an explanation from them, and none has been forthcoming.

** Actually, having thought about this more (and read a few comments), this won’t necessarily be true, since while it’s true that you’re reducing the effective sample size, you’re also introducing additional a priori information, i.e., that you know the true distribution of party weights in your sample. In different circumstances, this could either increase or decrease the degree of dispersion.

Based on some additional simulations that I’ve done, the average absolute change in Obama’s day-to-day numbers with a party ID weighting scheme should be about 2.3 or 2.4 points, rather than 3.0 points, and a shift of 3 or more points in either direction should occur about 40 percent of the time, rather than 50 percent of the time. Still, this would not explain why Obama’s numbers moved by an average of only 1.0 point each day in Research 2000’s daily samples, or why they never moved by as many as 3 points.
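A party ID weighting comparison of this kind might look something like the sketch below. To be clear, this is not the simulation described above: the party shares and the within-party Obama support rates are invented for illustration (chosen only so that overall support comes to 50 percent), the weighting is a plain post-stratification to known party shares, and every name in the code is hypothetical.

```python
import math
import random

# HYPOTHETICAL electorate: party -> (share of electorate, Obama support in party).
# These figures are made up for illustration; overall support works out to 50%.
PARTIES = {"Dem": (0.35, 0.90), "Rep": (0.30, 0.08), "Ind": (0.35, 0.46)}

def one_day(rng, n, weighted):
    """Simulate one daily sample and return Obama's rounded topline."""
    names = list(PARTIES)
    shares = [PARTIES[p][0] for p in names]
    counts = {p: 0 for p in names}
    obama = {p: 0 for p in names}
    for party in rng.choices(names, weights=shares, k=n):
        counts[party] += 1
        if rng.random() < PARTIES[party][1]:
            obama[party] += 1
    if weighted:
        # Post-stratify: weight each party's observed Obama share by its
        # assumed true share of the electorate (skip any empty party cell).
        est = sum(PARTIES[p][0] * obama[p] / counts[p] for p in names if counts[p])
    else:
        est = sum(obama.values()) / n
    return math.floor(100 * est + 0.5)  # round half up

def mean_abs_change(weighted, n_days=5000, n=360, seed=2008):
    """Average absolute day-to-day change in the simulated topline."""
    rng = random.Random(seed)
    days = [one_day(rng, n, weighted) for _ in range(n_days)]
    return sum(abs(b - a) for a, b in zip(days, days[1:])) / (len(days) - 1)

print(f"unweighted: {mean_abs_change(False):.2f} points per day")
print(f"party-weighted: {mean_abs_change(True):.2f} points per day")
```

The point of the sketch is only the direction of the effect: when party ID strongly predicts the vote, fixing the party mix removes some of the day-to-day noise, but sampling error within each party remains.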

Nate Silver is the founder and editor in chief of FiveThirtyEight.
