Benford’s law is an amusing mathematical pattern: the first digits of randomly sampled numbers tend to follow a distribution in which 1 is the most common first digit, followed by 2, then 3, and so forth. It’s the distribution of first digits that arises when numbers are sampled uniformly on a logarithmic scale.
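To make the "uniform on a logarithmic scale" statement concrete, here is a minimal sketch in Python. It computes the Benford probabilities, P(d) = log10(1 + 1/d), and compares them with first digits of simulated numbers drawn log-uniformly. The function names, the six-decade range, and the seed are illustrative assumptions, not anything taken from the analyses discussed below.

```python
import math
import random


def benford_probabilities():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero decimal digit of a positive number."""
    # Scientific notation always starts with the leading digit.
    return int(f"{x:.15e}"[0])


def simulate_first_digits(n, decades=6, seed=0):
    """First-digit frequencies for n draws that are uniform on a log scale.

    Each draw is 10**u with u uniform on [0, decades]; the range and seed
    are arbitrary choices for illustration.
    """
    rng = random.Random(seed)
    counts = {d: 0 for d in range(1, 10)}
    for _ in range(n):
        x = 10 ** rng.uniform(0, decades)
        counts[first_digit(x)] += 1
    return {d: c / n for d, c in counts.items()}


if __name__ == "__main__":
    expected = benford_probabilities()
    observed = simulate_first_digits(100_000)
    for d in range(1, 10):
        print(f"digit {d}: Benford {expected[d]:.4f}, simulated {observed[d]:.4f}")
```

With a whole number of decades, the simulated frequencies should track the Benford probabilities closely, with 1 appearing as the first digit about 30% of the time.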
In our Teaching Statistics book, Deb and I describe a classroom demonstration where we show how Benford’s law applies to street addresses sampled randomly from the telephone book. In a more serious vein, Walter Mebane has written about the application of Benford’s law to vote counts.
In the past several days, a few people have asked me about applying these ideas to the recent Iranian election. Today, someone pointed me to an article by Boudewijn Roukema, which states:
The results of the 2009 Iranian presidential election presented by the Iranian Ministry of the Interior (MOI) are analysed based on Benford’s Law and an empirical variant of Benford’s Law. The null hypothesis that the vote count distributions satisfy these distributions is rejected at a significance of p < 0.007, based on the presence of 41 vote counts for candidate K that start with the digit 7, compared to an expected 21.2-22 occurrences expected for the null hypothesis. A less significant anomaly suggested by Benford's Law could be interpreted as an overestimate of candidate A's total vote count by several million votes. Possible signs of further anomalies are that the logarithmic vote count distributions of A, R, and K are positively skewed by 4.6, 5.8, and 2.5 standard errors in the skewness respectively, i.e. they are inconsistent with a log-normal distribution with p ~ 4 × 10^−6, 7 × 10^−9, and 1.2 × 10^−2 respectively. M's distribution is not significantly skewed.
I don’t buy it. First off, the whole first-digit-of-7 thing seems irrelevant to me. Second, the sample size is huge, so a p-value of 0.007 isn’t so impressive. After all, we wouldn’t expect the model to really be true with actual votes. It’s just a model! Finally, I don’t see why we should be expecting distributions to be lognormal.
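The general point about sample size and p-values can be sketched numerically. Suppose, purely as a hypothetical, that the true first-digit distribution is a mixture of 95% Benford and 5% uniform digits (the 5% figure is an arbitrary assumption, not an estimate from any election data). The part of the expected Pearson chi-square statistic that comes from this misfit is n times a fixed discrepancy, so with enough counts even a small departure from the model crosses any conventional cutoff.

```python
import math

# Benford first-digit probabilities, P(d) = log10(1 + 1/d), for d = 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

# Hypothetical "true" distribution: 95% Benford, 5% uniform over digits.
# The mixing weight is an arbitrary assumption for illustration only.
MIX = 0.05
TRUE = [(1 - MIX) * p + MIX / 9 for p in BENFORD]

# Per-observation discrepancy: sum over digits of (q - p)^2 / p.
DISCREPANCY = sum((q - p) ** 2 / p for q, p in zip(TRUE, BENFORD))


def systematic_chi_square(n):
    """Misfit contribution to the expected Pearson chi-square for n counts."""
    return n * DISCREPANCY


# Upper 1% point of a chi-square distribution with 8 degrees of freedom
# (standard table value; 9 digit categories minus 1).
CRITICAL_1PCT_DF8 = 20.09

if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000):
        stat = systematic_chi_square(n)
        print(f"n={n:>7}: misfit contribution = {stat:8.2f}, "
              f"exceeds 1% cutoff: {stat > CRITICAL_1PCT_DF8}")
```

This only tracks the systematic part of the statistic and ignores sampling noise; it is meant to show the scaling with n, not to reproduce Roukema's test.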
Maybe there’s something I’m missing here, but that’s my quick take. This is not to say that I think the election was fair, or rigged, or whatever (I have absolutely zero knowledge on that matter), just that I don’t find this analysis convincing of anything. I will say, though, that Roukema deserves credit for presenting the analysis clearly.
P.S. In response to comments: let me emphasize that I’m not saying that I think nothing funny was going on in the election. As I wrote, I’m commenting on the statistics; I don’t know the facts on the ground. To move my comments in a more constructive direction (I hope), let me pull out this useful comment from Roukema’s article: “One possible method to test whether this is just an odd fluke would be to check the validity of the vote counts for candidate K in the voting areas where the official number of votes for K starts with the digit 7.” Further investigation could be a good thing here.
I did not find Roukema’s argument convincing; that does not mean that I consider it a bad thing that the article was written. The article is a first draft of an analysis; it might end up leading to nothing, or it might be unconvincing as it stands now but lead to some important breakthroughs. We can see what further analysis turns up. Again, my verdict is not a Yes or a No, it’s an “I’m not convinced.”
P.P.S. A commenter on our other blog pointed out this analysis of the Iran vote counts by Walter Mebane, who’s the expert in this area.