On Wednesday, we ran an item about the hypothesis, put forward by certain conservative blogs, that the Obama administration was systematically more likely to close Chrysler dealerships whose owners were significant contributors to Republican political candidates. As we noted, it’s absolutely true that the owners of the closed dealerships donated disproportionately to Republicans. However, as one could have learned through a few minutes of searching FEC disclosure records, this characteristic was not unique to the owners of the closed dealerships. Rather, auto dealers in general, whether or not their dealerships had been closed, donate disproportionately to Republican candidates, as one might reasonably expect from a group of (mostly) wealthy, older men in suburban areas.

This morning, Marla Singer at the blog Zero Hedge provided a more sophisticated take on the subject. Instead of comparing the list of closed dealerships to the entire universe of car dealers, Singer went to the trouble of looking up campaign finance data for the Chrysler dealers who had been allowed to remain open, as well as for those whose businesses were closed. She then ran a series of regression analyses based on this data, which produced the following results:

The key thing to look at in this table is the p-values, which are the probabilities of results at least this strong occurring due to chance alone. When social scientists look at p-values in order to test the validity of a hypothesis, they are generally looking for a figure of .05 or lower, meaning there is no more than a 1 in 20 chance that an outcome like this could have occurred due to pure randomness. That is, loosely speaking, they want to be at least 95 percent confident that the result is not a fluke.

There is a lot of debate in academic circles, which I mostly won’t bore you with now, about whether the choice of 95 percent is the “right” number for tests of statistical significance. The choice of a statistical significance threshold may depend on the particular application as well as things like Bayesian priors. It’s important to emphasize that no statistical analysis exists in a vacuum. There are times — such as when I’m building a predictive model rather than trying to evaluate the “truth” behind a particular hypothesis — when I’ll include a variable even if its statistical significance is less than 95 percent. There are other times, such as when a hypothesis lacks a clear explanatory mechanism, and/or conflicts with other evidence, when I’ll treat even a 95 percent positive finding quite skeptically, and would want a statistical significance threshold of 99 percent or even higher. But for better or for worse, the 95 percent threshold represents the default; if someone claims that something is “statistically significant”, you can assume that they are referring to the 95 percent threshold unless they state otherwise. And if they claim that something is “highly” statistically significant, they are usually referring to a 99 percent likelihood of a positive finding or greater.

As you can see from Singer’s data set, while there are some intriguing relationships in the data, none of them are particularly close to being statistically significant using the 95 percent test. The nearest “hit” is that for Hillary Clinton donors, who, Singer found, were slightly more likely to have their dealerships remain open. However, the associated p-value of .125 for the Clinton donors does not imply statistical significance at the 95 percent or even the 90 percent level.

In spite of this, Singer reports that “there [is] a significant and highly positive correlation between dealer survival and Clinton donors”. Although she hedges her conclusion a bit later on, this is a fairly irresponsible sentence to have written. Most people, in looking at this exact same set of data, would not only have avoided the implication that it proves the dealergate hypothesis, but would probably have come to something of the opposite conclusion: it argues strongly against the dealergate hypothesis. After all, there is no positive relationship whatsoever in the data on Democratic, Republican, Obama or McCain donations, which, until Singer’s analysis was posted approximately 10 hours ago, had been the focus of the dealergate hypothesis. In fact, in several cases, such as the data on Republican donations, the coefficient has the opposite sign from the one that the purveyors of the dealergate hypothesis were hoping for. Republican donors were incrementally less, rather than more, likely to have their dealerships shuttered, according to Singer’s analysis, although the pattern is nowhere in the ballpark of being “statistically significant” as most of us would define it.

Predictably, this has not prevented people like Michelle Malkin and Doug Ross from claiming that Singer’s data confirms their hypothesis. Of course, it does not confirm their original hypothesis, which was that donors to Republican candidates were more likely to have their dealerships closed. Instead, a new hypothesis has evolved (it’s all about those dirty, rotten Clintons!), the sole reed of evidence for which is Singer’s overstated conclusion (but not really her underlying data).

Whenever you see a Magical Mystery Hypothesis like this one, one which constantly transforms itself to fit the (lack of) available evidence, you should be skeptical. Suppose I wanted to prove that some people are skilled at a game of chance like roulette. At the Bellagio in Las Vegas on a busy Friday evening, there are (I don’t know) probably something like 300 people playing roulette at any given time. If I tracked their performance over the course of the evening, I would find that some of them were doing improbably well: there would be “evidence” that about 15 of them were in fact “skilled” roulette players at a 95 percent degree of confidence. I’d also find that about 15 of them were “coolers”; that is, they were doing worse than one might expect through chance alone.

Would I claim that these results are evidence that roulette is in fact a game of skill? I would certainly hope not. Instead, I’d find that the same people who were “skilled” at roulette one night were, as a group, doing no better than the average player the next night (unless they were cheating).
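For anyone who wants to try the roulette thought experiment at home, here is a rough simulation sketch. Everything specific in it is my own assumption rather than anything measured at the Bellagio: each of the 300 players makes 100 even-money bets on an American wheel (18 winning pockets out of 38), nobody has any skill, and a player counts as “skilled” if their win total would come up no more than 5 percent of the time by chance.

```python
import random
from math import comb

P_WIN = 18 / 38  # assumed: an even-money bet (e.g. red) on an American wheel


def upper_tail(n, k, p):
    """Exact probability of k or more wins in n independent bets."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lucky_cutoff(n, p, alpha=0.05):
    """Smallest win count whose upper-tail probability is at most alpha."""
    for k in range(n + 2):
        if upper_tail(n, k, p) <= alpha:
            return k


def play_night(rng, players, spins):
    """Win counts for every player on one night; nobody has any skill."""
    return [sum(rng.random() < P_WIN for _ in range(spins)) for _ in range(players)]


def simulate(seed=0, players=300, spins=100):
    rng = random.Random(seed)
    cutoff = lucky_cutoff(spins, P_WIN)
    night1 = play_night(rng, players, spins)
    night2 = play_night(rng, players, spins)
    # Players who look "skilled" after night one.
    lucky = [i for i, wins in enumerate(night1) if wins >= cutoff]
    # How that same group did on night two, expressed as a win rate.
    repeat_rate = sum(night2[i] for i in lucky) / (len(lucky) * spins) if lucky else None
    return cutoff, len(lucky), repeat_rate


cutoff, n_lucky, repeat_rate = simulate()
print(f"wins needed to look 'skilled': {cutoff} of 100")
print(f"players who cleared the bar on night one: {n_lucky} of 300")
print(f"their win rate on night two: {repeat_rate} (chance alone gives {P_WIN:.3f})")
```

Because win counts come in whole numbers, the cutoff is slightly stricter than 5 percent, so the expected number of “skilled” players in this setup is a bit under 15. The night-two win rate of that group should hover around the ordinary 18-in-38, give or take some noise.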

The way this data is being used is almost the same. Singer ran six sets of regression analysis: one each for Obama, McCain, Clinton, Democratic and Republican donors, and another for those dealers who had made no political contributions at all. She was therefore testing six hypotheses. If these hypotheses were independent of one another (which, to be clear, in this case they aren’t), the odds that at least one of the six would return a p-value of .125 or lower are better than 50:50 (about 55 percent)! Not only are false positives possible; they are practically inevitable if you test enough hypotheses and tolerate a loose enough threshold for statistical significance.
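The arithmetic behind the “better than 50:50” figure is short enough to check by hand or in a few lines of Python. This sketch leans on the same simplifying assumption of independence, which, again, does not actually hold for Singer’s six regressions; the function name is mine.

```python
def chance_of_any_hit(tests, threshold):
    """Chance that at least one of `tests` independent tests, none reflecting
    a real effect, returns a p-value at or below `threshold`."""
    return 1 - (1 - threshold) ** tests


print(round(chance_of_any_hit(6, 0.125), 3))  # 0.551
print(round(chance_of_any_hit(6, 0.05), 3))   # 0.265
```

Even at the conventional .05 threshold, six independent tries would turn up at least one spurious “hit” about a quarter of the time.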

Why, after all, stop at Clinton donors, who until this morning had never been central to the dealergate hypothesis? Why not look at John Edwards donors, or Ron Paul donors, or donations to any of various political action committees, or donations to members of the Senate Banking Committee, or donations to Congressmen who voted for the auto bailout plan? If you looked at enough of these, you would eventually come up with a few positive results — and then you could work backward to formulate your own conspiracy theory around it. There is a name for this sort of practice: data dredging.

At the end of the day, people are going to believe what they want to believe: some people believe that the moon landing was faked, that 9/11 was a grand conspiracy, and that Barack Obama was born in Indonesia. There is no evidence for any of these claims, but that doesn’t stop tens of millions of people from believing them! Dealergate, particularly in its original formulation (that Obama was punishing Republican donors with the Chrysler closings), is largely in the same category.