I have been traveling this week and unable to follow much of the discussion about specific voter fraud allegations. I have VERY mixed feelings about the seriousness of alleging voter fraud that could be significant enough to warrant a change in the outcome of a Presidential election amidst our already divided nation. But tonight, a Twitter-scrolling friend asked me “what is Benford’s Law?” while thrusting a phone screen full of histograms into my view.
This is the work I do for a living. Professionally speaking, I am a fraud examiner, a data analyst, a nerd who is a lover of data sets and the stories hidden within them. My jaw nearly fell into the family bonfire when I saw THESE pictures pertaining to the 2020 election:
I’m going to try to take my time to explain what these mean. First, I want to be clear that I was so excited to talk about this topic that I haven’t vetted the source of these histograms or the many others like them that are floating around on Twitter. I’m going to find time to look into the source data and perform my own analysis to verify that they’re legit. I’ll put that in my next blog. However, make no mistake about it: if these Benford’s Law histograms are based on actual precinct vote total counts, the Biden/Harris data has been doctored by humans for Chicago IL, Milwaukee WI, and Allegheny PA. I included the Miami Dade data because it shows a more realistic possibility of what undoctored election data could look like using the same analysis that is so MESSED UP in those other three localities.
Let’s back up a minute into the life of a data analyst. There are many ways to analyze large data sets, and different methods work better for different types of data, but it is VERY easy for a data analyst to determine when the “shape” of a histogram (I’ll call it that for simplicity) applies across different categories of the same type of data. Once a trend or pattern is clear, a significant deviation from that pattern is either consistent with a specific, well-known, explainable factor OR there is something seriously wrong going on in that data set. Abnormalities, anomalies, or outliers must be sampled and investigated to figure out the nature and extent of what is going on.
In the case of these histograms, the red line represents the “shape” of the election data trends that were graphed in their raw forms by the blue bars. If that shape (the red line) fits multiple candidates on the same ballot, it becomes a pattern, and it should hold true for all of the candidates. If any single candidate’s data (blue bars) exhibits significant deviation from that shape, something is wrong.
Straight from Wikipedia, Benford’s law is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. Benford’s law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.
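For anyone who wants to see the numbers behind that definition, here is a minimal Python sketch of the expected first-digit shares. It uses the standard Benford formula, P(d) = log10(1 + 1/d), which is not spelled out above; the function name is just my own label for illustration.

```python
import math

def benford_first_digit_probs():
    """Return {digit: expected share} for leading digits 1 through 9."""
    # Standard first-digit form of Benford's law: P(d) = log10(1 + 1/d)
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for digit, p in probs.items():
    print(f"{digit}: {p:.1%}")
```

The shares fall steadily from roughly 30% for a leading “1” to under 5% for a leading “9”, which is the downward-sloping shape that the red line traces in histograms like these.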
When fraud is suspected in any data set, Benford’s Law is a great starting analysis because when the human brain tries to type in “random” numbers, it simply can’t do it. We tend to repeat our own preferred numbers or what “feels” like it will be random if someone else checks our work. But we aren’t actually random. Random isn’t even the statistically correct word here, but it’s the best way for a non-data person to think about the difference between “naturally occurring” data and “human-created” data. Benford’s law simply shows statistically whether data is “naturally occurring” or whether a human created it, TRYING their very hardest to make it appear “naturally occurring.”
I have performed, summarized, and presented on exactly this kind of trend and been able to show specifically where a deviation from a histogram shape trend was a very clear and provable pattern of fraud. My presentations looked exactly like the graphs you can see. As an example, let’s talk specifically about Milwaukee. Here’s a reminder picture.
Check out Trump’s graph in the top middle. His actual data (the blue bars) are pretty close to the Benford shape (red line). We see a general adherence of the blue bars to the red line, with two offsetting outliers: the number of precincts where the first digit of his vote total is “1” was expected to be around 140, but it was actually lower, around 115. Benford’s Law would have predicted the number of precincts where his vote total began with a “3” to be around 60, but his actual number was somewhere closer to 85. Other than those two values (1 and 3), Trump’s precinct totals were quite close to Benford’s Law. To clarify, the “leading” digit is only loosely related to the number of votes a candidate received. A leading digit of 1 could indicate that a candidate received 12 votes, 129 votes, 1,910 votes, 15,743 votes, 1,247,112 votes, etc. Similarly, a leading digit of 3 could refer to 3 votes or 3 million votes. So anyone who would simply say that Joe Biden had excessive leading digits of “4”, “5”, or “6” because he got more votes would be missing the proven point of Benford’s Law.
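To make the leading-digit idea concrete, here is a minimal Python sketch of how the blue bars (observed counts) and the red line (expected counts) could be computed from a list of precinct totals. The function names are my own, and the sample values are only the illustrative numbers from the paragraph above, not real election data.

```python
import math
from collections import Counter

def leading_digit(n):
    """First digit of a positive integer vote total."""
    if n <= 0:
        raise ValueError("a leading digit is only defined for positive counts")
    return int(str(n)[0])

def observed_vs_expected(totals):
    """Map each digit 1..9 to (observed count, Benford-expected count)."""
    # Assumption: precincts with zero votes are skipped, since they have no leading digit.
    positive = [t for t in totals if t > 0]
    observed = Counter(leading_digit(t) for t in positive)
    n = len(positive)
    return {d: (observed.get(d, 0), n * math.log10(1 + 1 / d)) for d in range(1, 10)}

# Illustrative values only: the example totals mentioned in the text.
examples = [12, 129, 1910, 15743, 1247112, 3, 3000000]
digits = [leading_digit(v) for v in examples]
table = observed_vs_expected(examples)

for d, (obs, exp) in table.items():
    print(f"digit {d}: observed {obs}, expected {exp:.1f}")
```

Notice that 12 and 1,247,112 land in the same bar, which is why the size of a candidate’s vote total by itself says little about which leading digit it produces.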
Similar to the Trump histogram, Jo Jorgensen’s histogram in the bottom left shows actual totals incredibly close to the predicted totals. And in most of the other candidates’ graphs, you can see that the shape of the blue bars always stays reasonably close to the red line. The only exception? The Biden/Harris ticket in the top left. The blue bars (actual data) not only fail to approximate the predicted shape (red line), but they bear zero resemblance whatsoever to it. The “boxier” trend of the Biden/Harris data is exactly the type of outlier that a fraud examiner who suspected fraud would look for to prove their hypothesis. A human entering vote totals accidentally started more of the totals with a 4, 5, or 6 than they realized would happen in a naturally occurring data set. An examiner would then dive into the greatest anomalies and take a large sample from those specific values (in this case, precincts where the leading digit of the vote total was a “4”, “5”, or “6”) for a more detailed audit. In this way, an election auditor could most efficiently skip straight to the most likely fraud values to determine whether these anomalies are justifiable.
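The audit step described above can be sketched in Python as well. This is a rough illustration under my own assumptions, not a standard audit procedure: it ranks leading digits by how far the observed count exceeds the Benford expectation, then draws a reproducible random sample of precincts from the most over-represented digits. The function names, the precinct labels, and the vote totals are all hypothetical, and the data is deliberately constructed to be “boxy”.

```python
import math
import random
from collections import Counter

def digit_excess(totals_by_precinct):
    """Observed minus Benford-expected count for each leading digit 1..9."""
    leading = {p: int(str(v)[0]) for p, v in totals_by_precinct.items() if v > 0}
    counts = Counter(leading.values())
    n = len(leading)
    excess = {d: counts.get(d, 0) - n * math.log10(1 + 1 / d) for d in range(1, 10)}
    return excess, leading

def audit_sample(totals_by_precinct, top_digits=3, sample_size=5, seed=0):
    """Pick the most over-represented digits and sample precincts from them."""
    excess, leading = digit_excess(totals_by_precinct)
    flagged = sorted(excess, key=excess.get, reverse=True)[:top_digits]
    pool = sorted(p for p, d in leading.items() if d in flagged)
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return flagged, rng.sample(pool, min(sample_size, len(pool)))

# Hypothetical, made-up totals: 60 precincts spread evenly over 400..695
# and 10 precincts in the 100s. This is NOT real election data.
made_up = {f"precinct-{i:03d}": 400 + (i * 5) % 300 for i in range(60)}
made_up.update({f"precinct-{i:03d}": 100 + (i - 60) * 7 for i in range(60, 70)})

flagged, sample = audit_sample(made_up)
print("digits to audit first:", flagged)
print("sampled precincts:", sample)
```

On this made-up data the flagged digits are 4, 5, and 6, so the sample is drawn only from precincts whose totals start with one of those digits. A flag like this only tells the examiner where to look first; the detailed audit of the sampled precincts is what determines whether the anomaly has an explanation.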
I hope this makes sense. It is NOT possible for every candidate other than Biden/Harris to generally follow a predicted trend while Biden/Harris does not. In other words, either these histograms are completely made up OR the Biden/Harris vote totals by precinct represented in these histograms are completely made up.
I’d like to repeat: I’m going to look for source data to perform my own analysis as time allows, but if these histograms are real, there is a 0% chance that these election results for the Biden/Harris ticket in Milwaukee WI, Chicago IL, or Allegheny PA are valid or naturally occurring. And there are countless data analysts / fraud examiners who will be verifying this for any court that dares to ask them.
Happy 2020 election aftermath!
PS I wrote a follow up to this blog that I hope you’ll read here. It clarifies some things, tones down some things, and is generally helpful for understanding what could happen next.