Since the recent presidential election, I have seen a number of claims about supposed election fraud. The Trump campaign and its allies have filed a number of lawsuits alleging voter fraud and other voting irregularities, trying to get judges to stop several battleground states from certifying their vote tallies. So far, almost all of these suits have been essentially laughed out of court, because the allegations were too nebulous or the specific testimony was not seen as particularly credible. For someone who believes President Trump’s allegations of election fraud, this has got to be frustrating, because once the vote counts are certified by the states, any outcome other than a Biden presidency becomes almost impossible.

The next line of defense involves statistical arguments, based on fraud detection techniques used, for instance, by credit card companies. Have you ever gone out of town and tried to use your credit card, only to find that it is being declined? This has happened to me a few times, and the reason for it was that some computer algorithm had determined that the transaction I was trying to make was “unusual,” in some way (e.g., I was in a different state or country than usual), so it was flagged and denied until I called in to confirm that it was really me trying to use the card. The argument goes that we can use these same statistical techniques to show that various election results should be flagged as possibly fraudulent, so we should take some extra time to investigate where these “anomalies” come from.

Is there any merit to this kind of argument, with respect to the presidential election? The short answer is, “No.” For the longer version, I’ll give a couple of examples to show why I don’t think these types of arguments should be taken seriously.

What we need to remember when evaluating arguments like these is that they rely on techniques for identifying when something statistically “unusual” is going on, not specifically for identifying “fraud.” For instance, the credit card transactions I mentioned were flagged because I WAS doing something unusual: traveling outside my home state, which I typically only do a couple of times per year. In other words, sometimes “unusual” things happen for completely innocent reasons. Therefore, if we know in advance that something unusual is going to happen, the computerized flags that pop up don’t worry us. That’s why, nowadays, when I travel out of state (or especially out of the country) I call my credit card company and inform them what I’m doing. They put a note on my account, and when transactions are flagged for being outside my usual geographic area, the company doesn’t refuse them.

One type of argument about the election results I’ve seen a lot goes like this. “The 10,000 votes they counted overnight in State X were about 90% for Biden. Given that Biden’s total fraction of the vote in State X was about 50%, there is only a 0.00… (typically lots of zeros here)… 0007% chance that any batch of 10,000 randomly chosen votes would be 90% for any candidate.” Here’s the main problem with arguments like this. They are predicated on the assumption that the votes were a “random selection” from the pool of all votes, but they were not randomly selected. They were almost all (at least in the cases I’ve seen) mail-in ballots. What’s more, President Trump had spent the last several months telling his supporters to vote in person, not by mail, whereas VP Biden had been encouraging his supporters to vote in whatever way they could. Polls before the election showed that many more Democrats than Republicans were planning on voting by mail. In other words, it’s as if the country had called the credit card company beforehand, and told them to expect that mail-in votes would skew heavily in favor of the Democrats.
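To put rough numbers on this, here is a minimal Python sketch of the calculation behind that kind of claim. It computes the chance that at least 9,000 of 10,000 votes go to one candidate, first assuming the batch is a random draw from a 50/50 electorate, and then assuming it is drawn from a pool that already leans 90% one way. The 90% pool share is a made-up figure for illustration, not a measured mail-in result, and the function name is my own.

```python
import math

def log10_binom_tail(n, k, p):
    """Return log10 of P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    log_p, log_q = math.log(p), math.log1p(-p)
    # Natural log of each term C(n, i) * p^i * (1 - p)^(n - i), for i = k..n.
    terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * log_p + (n - i) * log_q
        for i in range(k, n + 1)
    ]
    # Log-sum-exp keeps the tiny probabilities from underflowing to zero.
    biggest = max(terms)
    total = biggest + math.log(sum(math.exp(t - biggest) for t in terms))
    return total / math.log(10)

batch, for_candidate = 10_000, 9_000

# Assumption made by the "fraud" argument: the batch is a random sample of all votes.
random_sample = log10_binom_tail(batch, for_candidate, 0.5)

# Hypothetical alternative: the batch comes from a pool that leans 90% one way.
skewed_pool = log10_binom_tail(batch, for_candidate, 0.9)

print(f"random 50/50 sample: about 10^{random_sample:.0f}")
print(f"90% skewed pool:     about {10 ** skewed_pool:.2f}")
```

The arithmetic in the first case is correct as far as it goes. What changes the answer from "essentially impossible" to "about a coin flip" is only the assumption about where the batch came from.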

Another type of argument I’ve seen relies on something called “Benford’s Law.” (There’s an excellent Wikipedia page for Benford’s Law that you should check out if you are interested in learning more about it.) The idea behind Benford’s Law is as follows. Suppose you randomly choose an upper limit for a range of numbers starting at zero; let’s say it turns out to be 145. Next, you randomly select a batch of integers within that range. So the numbers we can choose are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12… 20, 21, 22… etc., etc., 99, 100, 101, 102, 103, 104, 105… 145. If we look at the first digit of every one of those numbers, it turns out that there are more numbers with a first digit of “1” than with any other first digit. That is, the lower the digit, the more probable it is to be the first digit of a number within some randomly chosen range. Wikipedia explains:

The law states that in many naturally occurring collections of numbers, the leading digit is likely to be small. In sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time.

https://en.wikipedia.org/wiki/Benford’s_law

If you take the first digits from all the numbers in the collection, the distribution will look like this.
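For readers who want to check the numbers themselves, here is a small Python sketch. It counts the first digits of the integers 1 through 145 from the example above, and prints them next to the share Benford’s Law predicts for each digit, which is log10(1 + 1/d) for first digit d. The helper names are my own.

```python
import math
from collections import Counter

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

def benford_share(d):
    """Share of values expected to start with digit d under Benford's Law."""
    return math.log10(1 + 1 / d)

# First digits of every integer in the example range 1..145.
counts = Counter(leading_digit(n) for n in range(1, 146))

for d in range(1, 10):
    share = counts[d] / 145
    print(f"{d}: {counts[d]:3d} of 145 ({share:.1%})   Benford: {benford_share(d):.1%}")
```

A single range like 1 to 145 favors the digit 1 even more heavily than Benford’s Law does. The smooth Benford curve is what you tend to get when the numbers span several orders of magnitude, rather than one fixed range.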

“Forensic accountants,” for instance, will routinely take the first digits from massive lists of financial transactions, to see if they follow a Benford’s Law type of distribution. From long experience, they have determined that such lists usually do follow Benford’s Law, unless something out of the ordinary is going on, so they can use this tool to flag companies to investigate, etc.

It turns out that there are a number of Trump-supporting forensic accountants, and the like, who are trying to apply Benford’s Law to things like vote totals in different counties, just like they would to numbers in a hedge fund’s account books. Watch this YouTube video to see a forensic accountant, Robert A. Bonavito, CPA, explain how he is using Benford’s Law to supposedly detect “fraud” from the vote totals in Georgia counties.

The basic problem with Robert Bonavito’s reasoning is that he hasn’t asked any of the fundamental questions that must be answered before knowing whether it’s even reasonable to apply Benford’s Law to a data set like this. For instance, many collections of numbers follow Benford’s Law, but many do not. Are vote totals from different counties one of those sets of numbers that *should* tend to follow Benford’s Law? Bonavito never demonstrates this before drawing the conclusion that there must have been “fraud” because said vote totals do not follow the law. Also, there are only 159 counties in Georgia. Is this a large enough sample for a problem like this? For instance, if there were only 10 counties, there would be no way their vote totals could possibly follow Benford’s Law, since each county alone would account for 10% of the first digits.
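One rough way to get a feel for the sample-size question is to ask how many values we would expect to start with each digit if Benford’s Law held exactly, and how much that count would wobble by chance alone. The Python sketch below does this for 10 and for 159 values. It treats each value’s first digit as an independent draw, which is a simplifying assumption of mine, not something established for county vote totals.

```python
import math

def benford_share(d):
    """Share of values expected to start with digit d under Benford's Law."""
    return math.log10(1 + 1 / d)

def expected_and_sd(n):
    """Map each first digit to (expected count, binomial standard deviation) for n values."""
    result = {}
    for d in range(1, 10):
        p = benford_share(d)
        result[d] = (n * p, math.sqrt(n * p * (1 - p)))
    return result

for n in (10, 159):
    print(f"n = {n}")
    for d, (expected, sd) in expected_and_sd(n).items():
        print(f"  digit {d}: expect {expected:5.1f} values, give or take {sd:.1f}")
```

With 159 values, only about 7 are expected to start with a 9, so a swing of two or three in either direction is ordinary chance variation. With 10 values, the expected count for a leading 9 is less than one.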

We can take a first stab at answering these questions by looking at the distribution of first digits in the populations of the 159 counties in Georgia. Here it is.

Clearly, the populations of the 159 counties in Georgia fail the Benford’s Law test just as profoundly as the vote totals from those counties. And if the populations don’t follow Benford’s Law, why would we assume that the vote totals should? In any case, even if county populations and vote totals over the entire country **do** follow Benford’s Law, 159 counties might be too small a sample to tell if anything unusual is really going on in Georgia.
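Anyone who wants to repeat this kind of check on their own list of numbers could start from something like the Python sketch below. It computes a standard chi-square statistic comparing observed first digits to the Benford shares. I have not included the Georgia figures here. Instead, the sketch is tried out on two purely mathematical lists of 159 values: the powers of 2, which are known to follow Benford’s Law closely, and an even spread of three-digit numbers, which does not. The function and variable names are my own, and 15.507 is the usual 5% cutoff for a chi-square statistic with 8 degrees of freedom.

```python
import math
from collections import Counter

CRITICAL_5PCT_DF8 = 15.507  # chi-square 5% cutoff for 9 digit bins (8 degrees of freedom)

def leading_digit(n):
    """First decimal digit of a nonzero integer."""
    return int(str(abs(int(n)))[0])

def chi_square_vs_benford(values):
    """Chi-square statistic comparing first-digit counts to Benford's Law."""
    values = [v for v in values if int(v) != 0]  # zero has no leading digit
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Two test lists with 159 entries each, the same size as the Georgia county list.
powers_of_two = [2 ** k for k in range(159)]
flat_spread = [100 + (900 * i) // 159 for i in range(159)]  # evenly spread over 100..999

for name, data in (("powers of 2", powers_of_two), ("flat spread", flat_spread)):
    stat = chi_square_vs_benford(data)
    verdict = "consistent with" if stat < CRITICAL_5PCT_DF8 else "departs from"
    print(f"{name}: chi-square = {stat:.1f}, {verdict} Benford's Law at the 5% level")
```

A large statistic only says the list departs from Benford’s Law. As with the county populations above, it says nothing about why.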

None of this proves there is no massive conspiracy to fix the presidential election. However, it does prove that conspiracy theorists like Bonavito have a lot more work to do to make the case that there is any reason for suspicion.