What does Benford’s Law have to do with the Presidential Election?

I have been traveling this week and unable to follow much of the discussion about specific voter fraud allegations. I have VERY mixed feelings about the seriousness of alleging voter fraud that could be significant enough to warrant a change in the outcome of a Presidential election amidst our already divided nation. But tonight, a Twitter-scrolling friend asked me “what is Benford’s Law?” while thrusting a phone screen full of histograms into my view.

This is the work I do for a living. Professionally speaking, I am a fraud examiner, a data analyst, a nerd who loves data sets and the stories hidden within them. My jaw nearly fell into the family bonfire when I saw THESE pictures pertaining to the 2020 election:

I’m going to try to take my time to explain what these mean. First, I want to be clear that I was so excited to talk about this topic that I haven’t vetted the source of these histograms, or the many others like them floating around on Twitter. I’m going to find time to look into the source data and perform my own analysis to verify that they’re legit, and I’ll put that in my next blog. However, make no mistake about it: if these Benford’s Law histograms are based on actual precinct vote totals, the Biden/Harris data has been doctored by humans for Chicago IL, Milwaukee WI, and Allegheny PA. I included the Miami-Dade data because it shows what undoctored election data could realistically look like under the same analysis that looks so MESSED UP in those other three localities.

Let’s back up a minute into the life of a data analyst. There are many ways to analyze large data sets, and different methods work better for different types of data, but it is VERY easy for a data analyst to determine when a “shape” (as I’ll call it for simplicity) of a histogram can be similarly applied to different categories of the same type of data. Once a trend or pattern is clear, significant deviations from that pattern are either explained by a specific, well-known factor OR a sign that something is seriously wrong in that data set. Abnormalities, anomalies, or outliers must be sampled and investigated to figure out the nature and extent of what is going on.

In the case of these histograms, the red line represents the expected “shape” predicted by Benford’s Law, and the blue bars graph each candidate’s raw precinct data. If that shape (the red line) fits multiple candidates on the same ballot, it becomes a pattern, and it should hold true for all of the candidates. If any single candidate’s data (blue bars) deviates significantly from that shape, something is wrong.
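To put a number on “significant deviation,” one common approach is a chi-square goodness-of-fit test comparing a candidate’s leading-digit counts to Benford’s expected counts. That choice of test is my assumption here; the histograms themselves don’t say what test, if any, was run. Below is a minimal Python sketch. The function name `chi_square_vs_benford` and the flat 50-per-digit counts are hypothetical illustrations, not data from any election.

```python
import math

# Critical value of the chi-square distribution with 8 degrees of freedom
# (9 possible leading digits minus 1) at the 5% significance level.
CHI2_CRITICAL_DF8_5PCT = 15.507

def chi_square_vs_benford(observed):
    """Chi-square statistic comparing observed leading-digit counts to Benford.

    `observed` maps each digit 1..9 to the number of precincts (or other
    records) whose total starts with that digit.
    """
    n = sum(observed.get(d, 0) for d in range(1, 10))
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford's expected count
        stat += (observed.get(d, 0) - expected) ** 2 / expected
    return stat

# Hypothetical "boxy" counts: 50 records for every leading digit.
flat_counts = {d: 50 for d in range(1, 10)}
stat = chi_square_vs_benford(flat_counts)
print(f"chi-square = {stat:.1f}; deviates at 5% level: {stat > CHI2_CRITICAL_DF8_5PCT}")
```

A statistic above about 15.5 means the shape differs from Benford by more than chance alone usually explains. With hundreds of precincts the test is sensitive enough to flag small deviations too, so a flag is a reason to look closer rather than a conclusion on its own.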

Straight from Wikipedia, Benford’s law is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. Benford’s law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.
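Benford’s Law gives those frequencies directly: the expected share of values with leading digit d is log10(1 + 1/d), and the expected share with second digit d is the sum of log10(1 + 1/(10k + d)) over first digits k = 1 through 9. A short Python sketch of both (the function names are my own, for illustration):

```python
import math

def benford_first_digit_probs():
    """Expected share of values whose leading digit is d, for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_second_digit_probs():
    """Expected share of values whose second digit is d, for d = 0..9.

    Sums over every possible first digit k = 1..9.
    """
    return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
            for d in range(10)}

for digit, p in benford_first_digit_probs().items():
    print(f"first digit {digit}: {p:.1%}")
```

Digit 1 should lead about 30.1% of the time and digit 9 only about 4.6%. Second digits are much flatter, ranging from about 12.0% for 0 down to about 8.5% for 9.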

When fraud is suspected in any data set, Benford’s Law is a great starting analysis because when the human brain tries to type in “random” numbers, it simply can’t do it. We tend to repeat our own preferred numbers or what “feels” like it will be random if someone else checks our work. But we aren’t actually random. Random isn’t even the statistically correct word here, but it’s the best way for a non-data person to think about the difference between “naturally occurring” data and “human-created” data. Benford’s law simply shows statistically whether data is “naturally occurring” or whether a human created it, TRYING their very hardest to make it appear “naturally occurring.”
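A quick simulation shows the kind of “naturally occurring” data Benford’s Law describes. As an assumption for illustration, I use values spread evenly on a logarithmic scale across six orders of magnitude as a stand-in for naturally occurring data and, for contrast, whole numbers drawn evenly from 1 to 999. This is only a sketch: neither list is real election data, and neither models how a person invents numbers.

```python
import math
import random
from collections import Counter

def first_digit_shares(values):
    """Fraction of positive integers whose leading digit is 1..9."""
    counts = Counter(int(str(v)[0]) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

random.seed(2020)  # fixed seed so the run is repeatable
# Values spread evenly on a log scale from 1 to 1,000,000.
log_spread = [int(10 ** random.uniform(0, 6)) for _ in range(100_000)]
# Whole numbers spread evenly from 1 to 999: each leading digit covers 111 values.
flat_range = [random.randint(1, 999) for _ in range(100_000)]

log_shares = first_digit_shares(log_spread)
flat_shares = first_digit_shares(flat_range)
print("digit  Benford  log-spread  flat 1-999")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d:>5}  {benford:7.3f}  {log_shares[d]:10.3f}  {flat_shares[d]:10.3f}")
```

The log-spread column lands close to Benford (about 0.301 for digit 1), while the flat column sits near 0.111 for every digit. Benford’s pattern emerges when values spread across several orders of magnitude.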

I have performed, summarized, and presented exactly this kind of analysis, and I have been able to show specifically where a deviation from a histogram’s expected shape was a clear and provable pattern of fraud. My presentations looked exactly like these graphs. As an example, let’s talk specifically about Milwaukee. Here’s a reminder picture.

Check out Trump’s graph in the top middle. His actual data (the blue bars) are pretty close to the Benford shape (red line). We see a general adherence of the blue bars to the red line, with two offsetting outliers. Benford’s Law predicted that around 140 of his precincts would have a vote total starting with “1”, but the actual number was lower, around 115. It predicted around 60 precincts with a total starting with “3”, but the actual number was closer to 85. Other than those two digits (1 and 3), Trump’s precinct totals were quite close to Benford’s Law. To clarify, the “leading” digit is only loosely related to the number of votes a candidate received. A leading digit of 1 could indicate that a candidate received 12 votes, 129 votes, 1,910 votes, 15,743 votes, 1,247,112 votes, etc. Similarly, a leading digit of 3 could refer to 3 votes or 3 million votes. So anyone who simply says that Joe Biden had excess leading digits of “4”, “5”, or “6” because he got more votes would be missing the point of Benford’s Law.
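The leading-digit idea, and the expected counts behind a red line like the one in these charts, can be sketched in a few lines of Python. The helper names are hypothetical, and I’m assuming the expected count for each digit is simply the number of precincts times Benford’s proportion, skipping precincts with zero votes since zero has no leading digit. I haven’t verified how the charts circulating on Twitter were actually built.

```python
import math

def leading_digit(count):
    """Leading digit of a positive whole-number vote count."""
    if count <= 0:
        raise ValueError("zero or negative counts have no leading digit")
    return int(str(count)[0])

def expected_counts(n_precincts):
    """Benford's expected number of precincts for each leading digit 1..9."""
    return {d: n_precincts * math.log10(1 + 1 / d) for d in range(1, 10)}

# Very different vote totals can share the same leading digit.
print([leading_digit(v) for v in [12, 129, 1910, 15743, 1247112]])  # [1, 1, 1, 1, 1]
print(leading_digit(3), leading_digit(3_000_000))                    # 3 3

# Hypothetical precinct count, chosen only to illustrate the arithmetic.
for digit, expected in expected_counts(465).items():
    print(digit, round(expected))
```

Working backward, an expected count of about 140 for a leading “1” implies roughly 465 precincts with nonzero totals (140 / 0.301), and that same precinct count predicts about 58 for a leading “3”, in line with the “around 60” read off Trump’s Milwaukee chart. That is my reconstruction from the numbers above, not something stated on the chart.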

Similarly to the Trump histogram, Jo Jorgensen’s histogram in the bottom left shows actual totals incredibly close to the predicted totals. And in most of the other candidates’ graphs, you can see that the shape of the blue bars stays reasonably close to the red line. The only exception? The Biden/Harris ticket in the top left. The blue bars (actual data) not only fail to approximate the predicted shape (red line), they bear no resemblance whatsoever to it. The “boxier” trend of the Biden/Harris data is exactly the type of outlier a fraud examiner would look for to support a hypothesis of suspected fraud. A human entering vote totals accidentally started more of the totals with a 4, 5, or 6 than a naturally occurring data set would produce. An examiner would then dive into the greatest anomalies and take a large sample from those specific values (in this case, precincts where the leading digit of the vote total was a “4”, “5”, or “6”) for a more detailed audit. In this way, an election auditor could zero in on the values most likely to involve fraud and determine whether these anomalies are justifiable.
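The targeting step described above (find the leading digits that are most over-represented, then pull a sample of precincts with those digits for audit) could look something like this Python sketch. Ranking digits by raw excess count is my own simplifying choice, and the function names and synthetic precinct totals are hypothetical; they are not real election data.

```python
import math
import random
from collections import Counter

def leading_digit(n):
    """Leading digit of a positive whole number."""
    return int(str(n)[0])

def over_represented_digits(totals, top_k=3):
    """Up to `top_k` digits whose observed count most exceeds Benford's expectation."""
    nonzero = [t for t in totals if t > 0]  # zero has no leading digit
    counts = Counter(leading_digit(t) for t in nonzero)
    n = len(nonzero)
    excess = {d: counts[d] - n * math.log10(1 + 1 / d) for d in range(1, 10)}
    ranked = sorted(excess, key=excess.get, reverse=True)
    return [d for d in ranked[:top_k] if excess[d] > 0]

def audit_sample(precinct_totals, digits, sample_size, seed=None):
    """Random sample of precinct IDs whose total starts with one of `digits`."""
    flagged = [p for p, t in precinct_totals.items()
               if t > 0 and leading_digit(t) in digits]
    rng = random.Random(seed)
    return rng.sample(flagged, min(sample_size, len(flagged)))

# Synthetic, deliberately "boxy" totals: every precinct between 400 and 699 votes.
gen = random.Random(1)
totals = {f"P{i:03d}": gen.randint(400, 699) for i in range(300)}
digits = over_represented_digits(totals.values())
sample = audit_sample(totals, digits, sample_size=25, seed=7)
print(sorted(digits), sample[:5])
```

Each sampled precinct would then be checked against its underlying records to see whether the anomaly has an ordinary explanation.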

I hope this makes sense. It is NOT possible that every candidate other than Biden/Harris generally follows a predicted trend but Biden/Harris do not. In other words, either these histograms are completely made up OR the Biden/Harris vote totals by precinct represented in these histograms are completely made up.

I’d like to repeat: I’m going to look for source data to perform my own analysis as time allows, but if these histograms are real, there is 0% chance that these election results for the Biden/Harris ticket in Milwaukee WI, Chicago IL, or Allegheny PA are valid or naturally occurring. And there are countless data analysts and fraud examiners who will be verifying this for any court that dares to ask them.

Happy 2020 election aftermath!

PS: I wrote a follow-up to this blog that I hope you’ll read. It clarifies some things, tones down some things, and is generally helpful for understanding what could happen next.

9 thoughts on “What does Benford’s Law have to do with the Presidential Election?”

  1. Hi Jennifer, I’m watching this process from the UK and trying to get a handle on what is going on. In the discussions I’ve seen on these graphs on Twitter, the two main criticisms are:

    1) That the OP was doing some sort of cherry picking. I assume your analysis will be able to discover that if that is the case.
    2) It’s not appropriate to apply Benford’s Law to election results; something to do with the act of districting that makes your inputs artificial to begin with.

    It would be useful if you could shed some light on these.

    1. Thank you for the feedback! I threw this together in about 30 minutes late last night and will definitely address these two criticisms.

      In short, on your #2: any data pattern or analysis type either does or does not work for a given set of data. It can’t work for Trump and Jorgensen data but NOT work for Biden data. That’s why the graphs were so astonishing at first glance.

      I’m looking forward to doing some serious analysis on this and appreciate you reading!

  2. Just did a quick analysis of the dataset (https://county.milwaukee.gov/EN/County-Clerk/Off-Nav/Election-Results/Election-Results-Fall-2020; you can copy-paste the table from the website into Excel). The histograms I got are very similar to the ones above, so that is quite worrying. Would love to see a proper investigation into this, although it feels like the entire world has decided that doesn’t matter. At least that’s how it feels here in Europe.
    Greetings from probably the last European to still think the election isn’t settled yet 🙂

  3. “we find that conformity with and deviations from Benford’s Law follow no pattern. It is not simply that the Law occasionally judges a fraudulent election fair or a fair election fraudulent. Its ‘‘success rate’’ either way is essentially equivalent to a toss of a coin, thereby rendering it problematical at best as a forensic tool and wholly misleading at worst”

    Source: 206427437.pdf

    1. Hi John, thanks for the comment. I’ve seen this argument on Twitter, but not from anyone with experience using a Benford analysis in investigating fraud.

      My next blog will cover how Benford is primarily used to assess probable areas of risk for purposes of targeted sampling. Benford’s Law alone does not determine whether there is fraud, and there are some data sets where it does not apply at all. However, once data DOES conform to Benford (as is the case with Trump and other candidates as shown), I’ve NEVER seen a scenario where a parallel category from the same data set (like Biden, in this case) with complete non-conformity to Benford turns out to be ordinary upon further investigation. The abnormalities shown here would be the primary targets for further audit or investigation.

    1. Thanks for reading; I really enjoyed your blog on the subject, too. These are great conversations. My next blog does a better job (I think) of explaining that fraud investigators use Benford to launch or direct their investigations because anomalies are, duh, anomalies. If there’s fraud, that’s where you’ll find it. If there’s a valid explanation for the anomalies, you’ll find that too. It’s how we assess risk in every fraud investigation to make our investigation most efficient.
