Comparison Study: Unusual Patterns in Strategic Vision Polling Data Remain Unexplained

The biggest complaint I received in response to yesterday’s article, “Strategic Vision Polls Exhibit Unusual Patterns, Possibly Indicating Fraud”, is that I had not provided for an adequate control group. Sure, perhaps Strategic Vision’s polls exhibit apparently highly nonrandom behavior (this is almost irrefutably true, insofar as it goes). But perhaps this is true of all pollsters, rather than Strategic Vision specifically?

To provide for a more apples-to-apples comparison, I’ve decided to compare Strategic Vision against the Quinnipiac Poll. Why Quinnipiac?

— Like Strategic Vision, Quinnipiac tends to concentrate on certain states and regions, rather than the entire country. In fact, they survey many of the exact same states as Strategic Vision. Quinnipiac regularly polls Florida, Ohio, Pennsylvania, Connecticut, New Jersey and New York, and somewhat less regularly, Colorado, Michigan, Minnesota, and Wisconsin. Of these states, Florida, Ohio, Pennsylvania, New Jersey, Michigan and Wisconsin are all among those routinely polled by Strategic Vision. Strategic Vision does poll some states like Georgia that Quinnipiac doesn’t, and Quinnipiac polls some states like Connecticut that Strategic Vision is not engaged in. But generally speaking, the overlap is quite strong.
— Like Strategic Vision, Quinnipiac tends to produce somewhat long survey instruments that ask a variety of questions, not just “horse race” numbers but also approval ratings and questions on various dimensions of public policy.
— Quinnipiac and Strategic Vision also tend to poll at broadly similar time scales, issuing new data in a region perhaps every month or every couple of months, with some acceleration in frequency as an election nears.

Quinnipiac and Strategic Vision, in other words, are asking many of the same questions of many of the same people. If there are unusual statistical patterns evident in Strategic Vision’s polls, and these features are “normal” parts of the survey landscape, then they are likely to be replicated to a large degree by Quinnipiac.

For the comparison, I looked at all Quinnipiac polls conducted since the date of November 12, 2007. This cut-off point was selected because it yields 5,535 data points, almost exactly matching the 5,544 data points we got by looking at all Strategic Vision polls since 2005.

The ground rules are otherwise the same. There is no fancy math here really — the exercise simply counts the trailing digits in the survey data (for example, if a certain poll is Obama 42, Clinton 38, the trailing digits are ‘2’ and ‘8’). I do not include “non-response responses” like “other” or “undecided” in the count; categories like “about the same” (where the alternatives might be “better” or “worse”) are also considered “non-responses”. Nor did I include a tally for third-party candidates in races between the two major parties. I also excluded party primaries in which more than two candidates were listed, and approval and policy questions for which more than two affirmative choices were provided.
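For anyone who wants to replicate the count, the tallying step can be sketched in a few lines of Python. This is only an illustration: the function name `trailing_digit_counts` is hypothetical, and it assumes the non-responses and other excluded questions described above have already been filtered out, leaving a flat list of whole-number percentages.

```python
from collections import Counter

def trailing_digit_counts(percentages):
    """Tally the last digit of each whole-number poll percentage.

    Returns a list of ten counts, indexed by digit 0 through 9.
    Assumes the caller has already dropped excluded categories.
    """
    counts = Counter(p % 10 for p in percentages)
    return [counts[d] for d in range(10)]

# The article's example: Obama 42, Clinton 38 gives trailing digits 2 and 8.
example = trailing_digit_counts([42, 38])
```

Run over a full set of hand-coded results, the returned list is the kind of ten-bin distribution discussed in the rest of the piece.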

Quinnipiac also conducts a small amount of polling at the city level (New York City, specifically), and at the national level. I exclude these; only the state-level polls are included. They also conduct a very small amount of polling on sports questions (“do you like the Red Sox or the Yankees?”). I exclude these polls too; only the questions on politics and policy are used.

Here, then, is the distribution of trailing digits for Quinnipiac:

These results appear to be slightly nonrandom. For example, there are a few too many 2’s and 3’s, and somewhat too few 7’s and 9’s. The worst discrepancies are about 2.4 standard deviations (σ) from what you’d expect from a truly random, uniform distribution.

There also appears to be some tendency for the smaller values (like 0, 1, 2, and 3) to occur more frequently than the larger ones (like 7, 8, and 9). This would be consistent with a distribution that at least partially observes Benford’s Law, in which smaller digits are more likely to occur.

By contrast, here’s what we had for Strategic Vision.

These differences from random are much, much larger. Whereas, for the Quinnipiac data, the gap between the smallest value (505, for the digit 9) and the largest (608, for the digit 2) is 20 percent, for Strategic Vision the gap (676 versus 431) is 57 percent.
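The two percentages are simple ratios of the most common digit's count to the least common digit's count, measured relative to the smaller one. A quick sketch of the arithmetic, using only the four counts quoted above (the helper name `relative_gap` is made up for illustration):

```python
def relative_gap(largest, smallest):
    """Gap between two counts as a fraction of the smaller count."""
    return (largest - smallest) / smallest

# Counts quoted in the article.
quinnipiac_gap = relative_gap(608, 505)        # roughly 0.20
strategic_vision_gap = relative_gap(676, 431)  # roughly 0.57
```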

In addition, the pattern of the discrepancies is different. Whereas, for Quinnipiac, the smaller digits may have been occurring somewhat more frequently — something that would be consistent with a quasi-“Benfordian” distribution — in Strategic Vision’s case it’s the largest digits that are associated with the highest frequencies. Although the mathematics here are actually fairly complex, there is no recognized mathematical process that I am aware of that would produce a distribution like Strategic Vision’s.

Here is an alternate illustration of the same data, measured in terms of the deviation of the actual values from a uniform distribution, first as raw numbers and then in terms of σ.
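The article does not spell out how σ was calculated. One plausible reading, offered here purely as an assumption, is the binomial standard deviation for a single digit: with N data points and a 1-in-10 chance per digit, the expected count is N/10 and σ is the square root of N × 0.1 × 0.9. The sketch below applies that to the two Quinnipiac counts quoted above (608 and 505 out of 5,535); the function name is hypothetical.

```python
import math

def digit_z_score(observed, total, p=0.1):
    """Deviation of one digit's count from uniform, in standard deviations.

    Assumes each data point independently lands on this digit with
    probability p, so the count is binomial. That is an assumption,
    not necessarily the formula used in the article.
    """
    expected = total * p
    sigma = math.sqrt(total * p * (1 - p))
    return (observed - expected) / sigma

z_high = digit_z_score(608, 5535)  # digit 2, the most frequent
z_low = digit_z_score(505, 5535)   # digit 9, the least frequent
```

Under this assumption the largest Quinnipiac deviation comes out near 2.4 σ, which is consistent with the figure cited in the text.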

As I mentioned, the worst discrepancies for the Quinnipiac data are about 2.4 σ (standard deviations) from the norm, something that will occur through chance alone in about 1 out of every 60 cases, assuming a two-tailed probability. This is not the same as saying that the entire distribution has only 1-in-60 odds of occurring by chance, since if you’re looking at ten digits, you have ten opportunities to get unlucky and have an aberrant result. Still, the distribution is probably not completely random relative to an assumption of uniformity, although it appears potentially quite random relative to a more “Benfordian” distribution.

By contrast, the worst discrepancies in the Strategic Vision data are 5.7 and 5.3 standard deviations from the norm. Deviations of that magnitude will occur by chance alone only about once per 83,000,000 occasions, and once per 8,600,000 occasions, respectively.
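These odds can be checked against a normal approximation. The sketch below is a rough illustration, not the article's own calculation: it converts a deviation in σ to a two-tailed probability with the standard library's `math.erfc`, and it also estimates the "ten opportunities to get unlucky" effect by treating the ten digits as independent, which they are not quite, since their counts must sum to the total. The function names are hypothetical.

```python
import math

def two_tailed_p(z):
    """Two-tailed probability of a deviation at least |z| sigma (normal approx.)."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_in(z):
    """Express the two-tailed probability as 'once per this many occasions'."""
    return 1 / two_tailed_p(z)

def chance_any_of(k, z):
    """Chance that at least one of k digits deviates by |z| sigma or more.

    Simplifying assumption: the k digits are treated as independent.
    """
    return 1 - (1 - two_tailed_p(z)) ** k

odds_quinnipiac = one_in(2.4)       # about 1 in 60
odds_sv_worst = one_in(5.7)         # tens of millions to one
odds_sv_second = one_in(5.3)        # millions to one
any_of_ten = chance_any_of(10, 2.4) # roughly 15 percent
```

On this rough reckoning, a 2.4 σ outlier somewhere among ten digits is not especially surprising, while deviations above 5 σ remain extremely unlikely even after allowing for ten chances.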

***

To recap, it is not clear that the distribution of trailing digits in polling data is, or should be, entirely uniform or random. For a relatively heterogeneous set of polling data (many different questions from many different states), the most likely hypothesis seems to be that the distribution is somewhat uniform, and somewhat “Benfordian”, with some concentration toward the lower digits.

For a more homogeneous set of data (if we were looking only at McCain versus Obama polling in New Hampshire, for instance), these assumptions very well might not hold at all. However, both the Quinnipiac and Strategic Vision data sets are in fact quite heterogeneous. Moreover, they are about as heterogeneous as one another, so if we saw deviations of a certain magnitude in one sample, we’d probably expect to see deviations of a broadly similar magnitude in the other.

But that’s not what we see at all. The Strategic Vision data is much, much, much more nonrandom than the Quinnipiac data, as compared to a uniform distribution. If the comparison is to a fully or partially “Benfordian” distribution instead, then the discrepancy is even worse.

Bottom line: It is highly unlikely, in my opinion, that the distribution of the results from the Strategic Vision polls is reflective of any sort of ordinary, organic mathematical process.

That does not necessarily mean that they simply made these numbers up.

As the brilliant Mark Grebner pointed out to me, for instance, some systematic deviations from uniformity could plausibly occur as a result of rounding. If Strategic Vision’s standard polling sample were 750 people, say, and they followed any of the typical rounding procedures (i.e., rounding to the nearest whole number, always rounding down, or always rounding up), then the odd-numbered digits would occur about 14 percent more often than the even-numbered ones. However, Strategic Vision’s samples all consist of exactly 600, 800 or 1,200 respondents. These particular values are divisible by 100, which means that the raw counts should map uniformly onto whole percentages upon rounding.
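The rounding effect can be illustrated with a toy model. The sketch below makes several assumptions that are not from the article: every raw respondent count from 0 to the sample size is treated as equally likely, percentages are rounded to the nearest whole number (halves up), and only percentages from 10 to 89 are tallied so that each trailing digit gets the same number of chances. The function name is hypothetical. Which parity comes out ahead may depend on the rounding rule; only round-to-nearest is shown here.

```python
from collections import Counter

def digit_counts_after_rounding(sample_size, lo=10, hi=89):
    """Count how many raw tallies map to each trailing digit.

    Toy model: every raw count k in 0..sample_size is equally likely,
    and 100 * k / sample_size is rounded to the nearest whole percent
    (halves round up). Only percentages in [lo, hi] are tallied.
    """
    counts = Counter()
    for k in range(sample_size + 1):
        # Integer form of floor(100 * k / sample_size + 0.5).
        pct = (200 * k + sample_size) // (2 * sample_size)
        if lo <= pct <= hi:
            counts[pct % 10] += 1
    return [counts[d] for d in range(10)]

counts_750 = digit_counts_after_rounding(750)
counts_600 = digit_counts_after_rounding(600)
counts_800 = digit_counts_after_rounding(800)
counts_1200 = digit_counts_after_rounding(1200)
```

In this model a sample of 750 gives odd digits 64 raw tallies each against 56 for even digits, a ratio of 8 to 7, or about 14 percent. Samples of 600, 800 and 1,200 give every digit the same number of tallies.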

Another possibility is that these results are an artifact of Strategic Vision’s weighting procedures. Maybe their weighting algorithm is oddly or poorly designed, and so these irregularities are introduced only after their raw data has been massaged. I don’t think this is particularly likely. But perhaps if David Johnson at Strategic Vision could take the time to carefully explain his weighting procedures, we could explore this possibility.

***

Instead, Mr. Johnson has been busy telling reporters that he’s going to sue me.

I am well aware of Strategic Vision’s history of litigiousness. As a result, I have been fairly circumspect about exactly what I’ve said. There is a lot of “hearsay” and circumstantial evidence about Strategic Vision’s practices that I could introduce, but I have not done so (although I absolutely assert the right to engage in responsible speculation at a later point in time). We are simply taking a good and honest look at the numbers — reporting verifiable facts — and providing a number of possible interpretations of them, all of which are entirely legally, morally, and statistically responsible.

I would encourage other researchers, including the members of Strategic Vision’s team, to critique, examine and replicate my studies. There are undoubtedly some assumptions I have made that can reasonably be debated or altered. In addition, there are almost certainly also some transcription errors, since all of this data was hand-coded. There are also a lot of people who are much more versed in probability theory than I am, and who could probably place more precise estimates on the magnitude of the discrepancies.

However, I would emphasize that these appear to be extremely robust findings. I believe they would hold up, and would do so somewhat vigorously, even with fairly significant changes in assumptions or methods, and even if some errors were detected.

Mr. Johnson may be right that the implication that his data may have been forged could be difficult to categorically disprove. Had the statistical evidence been only marginally compelling, I would not have made it. With that said, I would also tend to treat — and would encourage those in the media to treat — “alternate hypotheses” raised by Strategic Vision with some greater-than-usual amount of sympathy. So far, Johnson has not offered any.

FiveThirtyEight
