Is Research 2000 Merely Mangling Its Data — Rather Than Fabricating It?

Talking Points Memo received a long and somewhat rambling e-mail from Research 2000’s Del Ali concerning the accusations from Daily Kos that it has fabricated some or all of its polling data. In the e-mail, Ali suggests another mechanism by which some of the unusual patterns observed in Research 2000’s polling data might have occurred.

To you so-called polling experts, each sub grouping, gender, race, party ID, etc must equal the top line number or come pretty darn close. Yes we weight heavily and I will, using the margin of error adjust the top line and when adjusted under my discretion as both a pollster and social scientist, therefore all sub groups must be adjusted as well.

[Note: Some of Mr. Ali’s typographical errors have been cleaned up and the emphasis is mine.]

Although it is not crystal-clear what Ali is suggesting, one interpretation is that he feels he has the liberty “under my discretion as both a pollster and social scientist” to adjust his topline results anywhere within the margin of error. Thus, if his raw data had the Democrat at 46 percent and the Republican at 44 percent, and had a margin of error of +/- 4 percent, the Democrat’s number could presumably be adjusted by Ali to be anywhere from 42 percent to 50 percent, or the Republican’s anywhere from 40 percent to 48 percent.
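The arithmetic behind that example can be sketched in a few lines. This is only an illustration of the interpretation described above, not a description of Research 2000’s actual procedure; the function name and the two scenarios (moving one candidate’s number versus moving both) are assumptions made for the sake of the example.

```python
def adjustment_range(raw_pct, margin_of_error):
    """Return the (low, high) band a topline could be moved to
    if it is allowed to land anywhere within the margin of error."""
    return raw_pct - margin_of_error, raw_pct + margin_of_error


# Figures from the example in the text.
dem_raw, rep_raw, moe = 46, 44, 4

dem_low, dem_high = adjustment_range(dem_raw, moe)
rep_low, rep_high = adjustment_range(rep_raw, moe)

# Assumption 1: only one candidate's number is moved at a time.
# The Democrat's lead is (Democrat minus Republican); negative means a Republican lead.
one_at_a_time = [
    dem_high - rep_raw,  # Democrat moved up
    dem_low - rep_raw,   # Democrat moved down
    dem_raw - rep_low,   # Republican moved down
    dem_raw - rep_high,  # Republican moved up
]

# Assumption 2: both numbers are moved independently, in opposite directions.
both_moved = [dem_high - rep_low, dem_low - rep_high]

print(f"Democrat band: {dem_low} to {dem_high}")
print(f"Republican band: {rep_low} to {rep_high}")
print(f"One number moved: lead from {min(one_at_a_time):+d} to {max(one_at_a_time):+d}")
print(f"Both numbers moved: lead from {min(both_moved):+d} to {max(both_moved):+d}")
```

Under the first assumption the reported margin runs from a 2-point Republican lead to a 6-point Democratic lead; under the second it runs from Republican +6 to Democrat +10. Either way the raw 2-point Democratic edge is only one of many results that could be reported.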

As you can see, this would give Ali quite a bit of discretion: he could adjust his poll to show essentially anything that he wanted, from a decent-sized lead for the Democrat to a modest one for the Republican. Needless to say, this is not what they teach you in Polling 101. There are differences of opinion about whether pollsters should just report their numbers “as is” no matter what, or whether, if the result “feels wrong” to them, they should have the liberty to re-examine their assumptions, such as by applying a different likely voter model. Now, in practice, a pollster will usually have enough knobs to twist between likely voter screens, weighting and sampling assumptions, etc., that they could back into almost any result they wanted more often than not. But there would usually be some scientific pretense for it. Ali’s attitude seems to be considerably more cavalier and his process considerably more ad hoc.

In another context, this would not be a flattering admission for a pollster to make. But it does offer something of a defense against some of the statistical evidence that has been presented against Research 2000. For instance, Grebner et al. have discussed the “missing zeroes” problem: the fact that Research 2000’s polls have rarely shown Obama’s favorability rating staying the same from week to week, even though statistically this should occur fairly often:

So far as we are aware, no such algorithm shows too few changes of zero, i.e. none has an aversion to outputting the same whole number percent in two successive weeks. On the other hand, it has long been known that when people write down imagined random sequences they typically avoid repetition, i.e. show too few changes of zero.

Grebner et al. argue that this phenomenon could only be a reflection of human intervention — no naturally occurring statistical process could produce it. In my view, that conclusion is correct beyond the shadow of a doubt. This does not necessarily imply, however, that the human being is making up the numbers entirely. He could also be manipulating real data in a scientifically unsound way. Perhaps it feels abnormal to Ali when his raw data has not shown some change in Obama’s ratings: he would therefore tweak the numbers upward or downward by a point, as he feels he has license to. In essence, he could be using real data and making it look fake.
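A small simulation can make the “real data made to look fake” scenario concrete. It is a sketch under loudly stated assumptions: the sample size, the number of weeks, the constant true favorability, and the `nudge_repeats` rule are all hypothetical choices for illustration, not anything taken from Research 2000’s data or from the Grebner et al. analysis.

```python
import random


def simulate_weekly_polls(true_share, sample_size, weeks, rng):
    """Simulate independent weekly samples and report each as a whole-number percent."""
    results = []
    for _ in range(weeks):
        favorable = sum(1 for _ in range(sample_size) if rng.random() < true_share)
        results.append(round(100 * favorable / sample_size))
    return results


def share_of_zero_changes(series):
    """Fraction of week-to-week changes that are exactly zero."""
    changes = [b - a for a, b in zip(series, series[1:])]
    return sum(1 for c in changes if c == 0) / len(changes)


def nudge_repeats(series, rng):
    """Hypothetical tweak: whenever a week would repeat the previously
    reported number, move it up or down by one point."""
    reported = [series[0]]
    for value in series[1:]:
        if value == reported[-1]:
            value += rng.choice([-1, 1])
        reported.append(value)
    return reported


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed parameters: 50% true favorability, 1,000 respondents, 400 weeks.
raw = simulate_weekly_polls(true_share=0.5, sample_size=1000, weeks=400, rng=rng)
tweaked = nudge_repeats(raw, rng)

print(f"Zero changes in raw data:     {share_of_zero_changes(raw):.1%}")
print(f"Zero changes after the tweak: {share_of_zero_changes(tweaked):.1%}")
```

Under these assumptions the honest samples repeat the previous week’s number a meaningful share of the time, while the tweaked series never does. The underlying interviews are real in both cases; only the reporting step differs.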

The other major set of statistical claims against Ali concerns the odd patterns in his cross-tabular data, which produce results inconsistent with any naturally occurring statistical process. But it doesn’t seem out of the question that you could wind up with some very unusual-looking crosstabs through some horrible buggy mess of a spreadsheet model, particularly if your attitude was that you generate the topline number first and then back into the cross-tabs mostly as a formality.

Long story short, the line between a pollster who is fabricating data and one who mutilates real data beyond recognition is rather blurry, perhaps even intractably so. While I wouldn’t want to hire either one of them, it might make quite a bit of difference from a legal standpoint.

This is why looking at the non-statistical details of the case seems to be essential. Mr. Ali’s dog-ate-my-homework excuses for not releasing either his raw data or the names of his call centers to the public are unpersuasive, to say the least, and, from a Bayesian standpoint, make the hypothesis of fraud much more likely than it otherwise would be. Still, there is nothing in Ali’s long statement to TPM that would inspire much confidence in his facility with statistics, and there remains a theoretical possibility that he is guilty of nothing other than having a cavalier and scientifically unsound attitude toward the sanctity of his data.
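The Bayesian point can be illustrated with a quick application of Bayes’ rule. Every number below is a made-up placeholder chosen only to show the direction of the update; none is an estimate of the actual probabilities in this case, and the function and variable names are likewise hypothetical.

```python
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule: probability of a hypothesis after observing the evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Purely illustrative inputs (assumptions, not estimates):
prior_fraud = 0.5          # belief in fabrication before considering the refusal
p_refuse_if_fraud = 0.9    # chance a fabricator would withhold raw data
p_refuse_if_honest = 0.3   # chance a sloppy but honest pollster would withhold it

posterior_fraud = posterior_probability(prior_fraud, p_refuse_if_fraud, p_refuse_if_honest)

print(f"Prior:     {prior_fraud:.2f}")
print(f"Posterior: {posterior_fraud:.2f}")
```

With these placeholder inputs the probability of fabrication rises from 0.50 to 0.75. The refusal to release data raises the odds of fraud whenever it is more likely under fabrication than under honest sloppiness, but it does not push them to certainty, which leaves room for the alternative described above.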

Nate Silver is the founder and editor in chief of FiveThirtyEight.