Seen Through Sharper Statistical Lens, Anomalies in Strategic Vision Polling Remain

Note: the below is fairly technical, but since the discussions of Strategic Vision’s polling had become quite technical in the comments, I thought it was worth giving Michael Weissman, a retired physics professor at the University of Illinois and a frequent commenter at this website, a guest column in this space. Using a robust and fairly elegant statistical technique known as Fourier analysis, Weissman has found that Strategic Vision’s polls indeed contain unusual statistical artifacts that are highly unlikely to have arisen by chance alone and which differ substantially from those of comparable pollsters. I have given Weissman’s words a light, non-technical edit, with his permission, from the version he originally sent to me. –Nate Silver
____

Fourier visits Strategic Vision
by Michael Weissman

Three weeks ago, a polling association censured Strategic Vision LLC (“SV”) as the only pollster that refused to answer repeated requests for routine information on its methodology; twenty other pollsters had complied. Nate Silver followed up by checking whether there was anything statistically odd about SV’s results, finding that the distribution of trailing digits in their reported percentages for the two major candidates showed much larger deviations from uniformity than would be expected from pure chance draws from a uniform distribution. Some digits, such as 8, appeared much more often than others, such as 1. A closely matched comparison group of polls from Quinnipiac also showed larger-than-expected deviations from uniformity, but not nearly as extreme. A commenter on this site, “steve”, sent in the results from a comparable collection of SurveyUSA polls, showing no unusual non-uniformities at all. A discussion immediately ensued on this blog and others as to whether the strongly non-uniform SV results could easily arise from normal causes or whether they constituted evidence suggesting that the results had been obtained by other-than-normal polling methods.

One potential source of non-random non-uniformity, pointed out here and elsewhere, could be some rounding method that systematically favored evens or odds. It turns out, however, that evens and odds appeared with nearly the same frequency in the SV results. In addition, the sample sizes that SV typically uses are divisible by 100, making rounding errors unlikely.

A more challenging objection was as follows: There is no a priori reason to expect the distribution of trailing digits to be uniform, even in a large sample. We know that the full percentage poll results are not uniformly distributed from 0 to 100. Polls, rather, are generally taken in races where the leading candidates each have some major chunk of the vote. There are usually a few undecideds as well. So you might expect the distribution of ideal poll results to have a broad peak somewhere roughly in the vicinity of perhaps 45, trailing off smoothly on either side. That’s not uniform, although if the results were routinely spread throughout the 30s, 40s, 50s and 60s (as is the case with Strategic Vision’s polls), you would expect a pretty smooth distribution of trailing digits. The distribution should also be cyclic, in that 0 (as in the number 50) is just as ‘close’ to 9 (49) as it is to 1 (51).

The problem is that “pretty smooth” isn’t definitive enough to say whether the extra variance (the average of the squared differences from uniformity) should be considered alarming or not. Nate tentatively explored some other possible non-uniform distributions, but there were some justified objections that these were arbitrary. What’s needed is a way to remove the variability due to the non-uniform distribution without pretending to know just what the distribution is. Fortunately, we have a tool, called Fourier analysis, that solves what might sound like an intractably subjective problem.

Figure: Fourier waves of different frequencies combining to form another, seemingly complex wave.

First, regardless of the true underlying distribution of results, the actual poll results cannot show any major non-random variations between adjacent digits. The reason is that SV’s polls are taken of relatively small numbers of subjects (generally 600, 800 or 1200), leaving random uncertainties in each result of about 2 percentage points. If, for example, there were some (wildly implausible) real tendency of the true values to cluster on even digits as opposed to odd, the poll results wouldn’t show it very much because the random errors would smear it out too much.
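The size of that random uncertainty can be checked with the usual binomial formula. The sketch below is only illustrative: the function name is made up here, and the 45 percent share is an assumed typical value rather than a figure from SV's polls.

```python
import math

def poll_standard_error(share, sample_size):
    """Standard error, in percentage points, of a reported share (given as a fraction)."""
    return 100.0 * math.sqrt(share * (1.0 - share) / sample_size)

# Sample sizes named in the text; the 45% share is an illustrative assumption.
for n in (600, 800, 1200):
    print(n, round(poll_standard_error(0.45, n), 2))
```

Under these assumptions the standard error runs from roughly 1.4 to 2.0 points, comparable to the spacing between adjacent trailing digits.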

Second, we have a standard mathematical tool called Fourier analysis to describe our ten digit counts in terms of components. The decomposition can be arranged so that any non-random non-uniformity is concentrated in a few of the components, leaving the others random. This is a big advantage over the initial form, in which the non-uniformity might be distributed among all ten numbers.

One of the Fourier components is completely flat and just represents the average value. The other nine Fourier components are sinusoidal waves on our plots of occurrence rates for the ten digits. These include a range of broad and narrow waves. The wave with the highest frequency is the period-2 even-odd cycle, but I have chosen to ignore this because it might plausibly arise from rounding methods. The slowest-varying wave has period 10. There are two such period-10 components, with different peak locations. These are the components where some plausible non-uniform distribution could show up, even after smoothing by the random sampling error. So we can remove them too, without bothering with arguments about what we think they should be.
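To make the decomposition concrete, here is a minimal sketch of a discrete Fourier transform of ten digit counts. The counts are the ones entered in the author's listing at the end of this column; the function and variable names are invented for illustration, and this is not the author's own code.

```python
import cmath

# Trailing-digit counts for digits 0-9, copied from the author's listing.
counts = [562, 431, 472, 490, 526, 599, 533, 639, 676, 616]

def harmonic_power(counts):
    """Return {k: contribution of harmonic k to the variance} for k = 1..N-1."""
    n = len(counts)
    mean = sum(counts) / n
    power = {}
    for k in range(1, n):
        coeff = sum((c - mean) * cmath.exp(-2j * cmath.pi * k * j / n)
                    for j, c in enumerate(counts))
        power[k] = abs(coeff) ** 2 / n ** 2
    return power

power = harmonic_power(counts)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / len(counts)

# Harmonics k and 10 - k describe the same real wave with period 10 / k digits.
period_10 = power[1] + power[9]   # the two slow components set aside in the text
even_odd = power[5]               # the period-2 even-odd cycle
remaining = sum(power[k] for k in (2, 3, 4, 6, 7, 8))

# The pieces add back up to the total variance of the counts.
print(round(variance, 2), round(period_10 + even_odd + remaining, 2))
```

The nine non-flat harmonics split into the two period-10 terms, the single even-odd term, and six others, which is the bookkeeping the rest of the argument relies on.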

Now comes the fortunate part: the smoothing of the distribution by random sampling error effectively wipes out all the non-random shorter-period components. This reduction in the shorter-period components can be calculated with great quantitative precision using the known width and Gaussian shape of the sampling error distribution, so what remains in those components is essentially pure random sampling variation.
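A rough version of that calculation: smearing a sinusoid with a Gaussian of standard deviation sigma multiplies its amplitude by exp(-(2*pi*k*sigma/10)^2 / 2) for harmonic k over ten digits. The sketch below assumes sigma is about 2 points (the figure quoted above) and ignores the effect of rounding to whole percentages; the function name is hypothetical.

```python
import math

def attenuation(harmonic, sigma_points, period=10):
    """Factor by which Gaussian smearing of width sigma_points (percentage points)
    shrinks the amplitude of the given harmonic of a `period`-digit cycle."""
    angular = 2.0 * math.pi * harmonic / period
    return math.exp(-0.5 * (angular * sigma_points) ** 2)

# Harmonic 1 is the period-10 wave; harmonic 5 is the even-odd cycle.
for k in range(1, 6):
    print(k, f"{attenuation(k, 2.0):.2e}")
```

With these assumptions the period-10 wave keeps a bit under half its amplitude, the period-5 wave drops to a few percent, and anything shorter is suppressed far more strongly, which is why only the period-10 terms need to be treated as possibly non-random.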

Does this leave us enough statistics to work with? There were originally ten Fourier coefficients, and we’ve thrown out the irrelevant mean, the two that could reflect non-uniformity, and the one that could come from rounding. That leaves six with random amplitudes. It would be nice to have more than six, but that’s enough to catch extreme cases. We know how big the coefficients should be on average because standard simple statistics tell us precisely how big the random variations in our numbers are on average.
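One way to check that expectation is by simulation. If trailing digits are drawn uniformly at random, each of the nine non-flat components should carry on average one tenth of the mean count, so the six retained components should carry 0.6 times the mean count. The sketch below is a seeded Monte Carlo check under that uniform-draw assumption; the function names, trial count and draw count are arbitrary choices made for illustration.

```python
import cmath
import random

def filtered_ratio(counts):
    """Variance in harmonics 2, 3, 4, 6, 7, 8 divided by its expected value, 0.6 * mean."""
    n = len(counts)  # assumed to be the ten digits
    mean = sum(counts) / n
    kept = 0.0
    for k in (2, 3, 4, 6, 7, 8):
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * j / n)
                    for j, c in enumerate(counts))
        kept += abs(coeff) ** 2 / n ** 2
    return kept / (0.6 * mean)

def simulate_mean_ratio(trials=300, draws=2000, seed=0):
    """Average filtered ratio over simulated sets of uniformly random digits."""
    rng = random.Random(seed)
    total = 0.0
    for _trial in range(trials):
        counts = [0] * 10
        for _draw in range(draws):
            counts[rng.randrange(10)] += 1
        total += filtered_ratio(counts)
    return total / trials

# Should come out close to 1 if the 0.6 * mean expectation is right.
print(round(simulate_mean_ratio(), 2))
```

A ratio near 1 is therefore the benchmark for purely random digits, and values far above it indicate more digit-to-digit variation than sampling noise alone would produce.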

Now we can ask: how big is the variation of the numbers, after all the suspect components of the variation are filtered out, compared to the statistical expectation? Remember, this filtering is done precisely in response to the serious objections, largely by defenders of SV, which were made to Nate’s original post, removing the potentially innocent components which could have made SV’s statistics look suspect. Here, then, is the ratio of the filtered variance to the statistically expected filtered variance:

SurveyUSA: 0.46
Quinnipiac: 0.30
Strategic Vision: 4.40

How unlikely are those results? The SUSA result is pretty much typical. Quinnipiac actually has notably low variance, but random chance would result in variances that low or lower about 5 percent of the time. The Strategic Vision result, on the other hand, or something more extreme, would occur by chance with probability only 0.00019. That’s not as low a p-value as the results obtained without filtering the non-uniform components, but it’s still very low: less than one chance in 5000 of arising by chance alone. For statistical sophisticates, this is a genuine relevant p-value, testing a well-specified prior hypothesis, not the sort of misleading p-value obtained when one screens many data sets looking for anything unusual.
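These tail probabilities can be approximated from the three ratios above. The sketch assumes that six times the ratio follows a chi-square distribution with six degrees of freedom, which is the form used in the author's listing, and uses the closed-form tail for an even number of degrees of freedom. The function name is made up, and because the tabulated ratios are rounded, the outputs only approximate the figures quoted above.

```python
import math

def upper_tail(ratio, components=6):
    """P(ratio this high or higher), assuming components * ratio is chi-square
    with `components` degrees of freedom (closed form for even `components`)."""
    half = components // 2
    x = half * ratio  # the chi-square value divided by 2
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(half))

ratios = {"SurveyUSA": 0.46, "Quinnipiac": 0.30, "Strategic Vision": 4.40}
for name, r in ratios.items():
    high = upper_tail(r)
    print(f"{name}: P(this high or higher) = {high:.2g}, "
          f"P(this low or lower) = {1 - high:.2g}")
```

For Quinnipiac the relevant number is the lower tail, since its ratio is unusually small; for Strategic Vision it is the upper tail.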

I’d like to thank Nate for getting this started and various commenters for helping keep the discussion lively: especially ecarlson, Mark Grebner, MarkinIL, steve, shma, and loner. Finally, since the core issue here is transparency, I’ve included the code by which the p-value was calculated. Anybody who writes real programs will get a kick out of this, since I used a baby language (Basic) to handle a fairly basic statistical problem.

>list
10 dim d(10)
15 d(0) = 562
16 d(1) = 431
17 d(2) = 472
18 d(3) = 490
19 d(4) = 526
20 d(5) = 599
21 d(6) = 533
22 d(7) = 639
23 d(8) = 676
24 d(9) = 616
30 for i = 0 to 9
40 sum = sum+d(i)
41 sumcos = sumcos+(d(i)-554.4)*cos(0.6283*i)
42 sumsin = sumsin+(d(i)-554.4)*sin(0.6283*i)
43 sumdif = sumdif+d(i)*(-1)^i
45 sumsq = sumsq+d(i)*d(i)
50 next i
60 ave = sum/10
70 print ave
80 var = sumsq/10-ave^2
90 print var
100 lowf = 0.02*(sumcos^2+sumsin^2)+0.01*sumdif^2
110 print var-lowf
120  dev = (var-lowf)/(0.6*ave)
130 print dev
140 p = exp(-3*dev)*(1+3*dev+4.5*dev^2)
150 print p
>run
554.4
5521.44
1463.071532
4.398363
1.882962E-04
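For readers without a Basic interpreter, the listing above can be transcribed into Python roughly as follows. This is a line-by-line port rather than the author's code; the function and variable names are invented, and it uses the exact value of 2*pi/10 where the listing uses 0.6283, so the last few digits differ slightly from the run shown above.

```python
import math

# Trailing-digit counts for digits 0-9, as entered in the Basic listing.
counts = [562, 431, 472, 490, 526, 599, 533, 639, 676, 616]

def filtered_p_value(counts):
    """Port of the listing: returns (mean, variance, filtered variance, ratio, p)."""
    n = len(counts)  # ten digits
    mean = sum(counts) / n
    variance = sum(c * c for c in counts) / n - mean ** 2
    step = 2.0 * math.pi / n  # the listing rounds this to 0.6283
    # Projections onto the two period-10 waves and the even-odd wave.
    sum_cos = sum((c - mean) * math.cos(step * i) for i, c in enumerate(counts))
    sum_sin = sum((c - mean) * math.sin(step * i) for i, c in enumerate(counts))
    sum_alt = sum(c * (-1) ** i for i, c in enumerate(counts))
    # 2 / n^2 and 1 / n^2 are the 0.02 and 0.01 factors in the listing.
    low_freq = (2.0 / n ** 2) * (sum_cos ** 2 + sum_sin ** 2) + sum_alt ** 2 / n ** 2
    filtered = variance - low_freq
    ratio = filtered / (0.6 * mean)  # six remaining components, 0.1 * mean each
    # Upper tail of a chi-square with six degrees of freedom, evaluated at 6 * ratio.
    p = math.exp(-3 * ratio) * (1 + 3 * ratio + 4.5 * ratio ** 2)
    return mean, variance, filtered, ratio, p

for value in filtered_p_value(counts):
    print(value)
```

The five printed values correspond, in order, to the five lines of output from the Basic run.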

Michael Weissman is a retired physics professor (University of Illinois) whose research has focused on using random noise to characterize disordered materials. He is a Fellow of the American Physical Society, was once nominated for the Nobel Peace Prize by Barbara Boxer, and was born and raised a St. Louis Cardinals fan.

FiveThirtyEight
