More Benford’s Law, How to Investigate Fraud, and Milwaukee

My last blog, “What does Benford’s Law have to do with the Presidential Election?”, took off far more than I expected. I saw some shocking charts on Twitter (that’s before they started getting censored), got excited, and wrote it quickly. The overwhelmingly positive feedback came with some questions that I knew I’d need to answer pretty quickly.

I acknowledged in my original blog that it was based on a slate of histograms that I hadn’t generated myself, built on source data whose accuracy I hadn’t tested. I always intended to shore that up, and that’s my actual purpose for this follow-up blog. But before I get to that, I want to address some of the questions that have come up along the way and some points that need more clarity.

  1. Histograms, best fit lines, and data distributions/correlations of all sorts (not just Benford) are very useful for determining outliers. Benford is different in that its predicted line (or “shape,” as I called it before) is always the same; that’s why it’s called “Benford’s Law.” It says that naturally occurring data, on the whole, has 30.1% prevalence of leading digit 1, 17.6% prevalence of leading digit 2, 12.5% prevalence of leading digit 3, and so on, diminishing down to 4.6% prevalence of leading digit 9.
  2. MOST fraud has unusual/manual characteristics that make it stand out in some way. Accordingly, if you analyze data in enough different ways, you will see it POP out from the ordinary. Benford’s Law or Benford Analysis is an excellent example of an analysis to determine whether or not there are anomalies worth investigating. There are countless others, and which one works best just depends on the nature of the fraud and of the data.
  3. Not all anomalies are fraud. However, a fraud examiner would be reckless not to check the anomalies first. By definition, anomalies are pieces of data that look different from the norm. Upon audit/investigation, the anomaly is either explainable by valid evidence (some unusual occurrence, but not fraudulent) or by invalid evidence (fraudulent).
  4. There’s some “research” floating around that a few people have pointed out to me that says Benford is basically a coin toss on its ability to detect fraud. I haven’t evaluated the qualifications of the people who put out that article or seen their methods in arriving at that conclusion, so I’m not going to discuss specifically where or if they’re going wrong. But my points 2 & 3 above basically provide my response to that. It is true that some anomalies turn out to have a valid explanation. However, it is also true that anomalies are where fraud is concentrated. This makes fraud within those anomalies far more likely than “a coin toss.” This concentration of fraud becomes thicker and thicker the more that “ordinary data” adheres to a pattern and the more that fraud actually exists within the data.
  5. I’ve seen commentary that Benford is worthless in election data, specifically. This is not realistic when you see data set after data set of election data adhering so closely to Benford. The only reason I was excited about the charts in the first place was because so many candidates DID conform to the Benford distribution. It’s reasonable to conclude that when one candidate did not, it’s a violation of a pattern that is beyond worthy of exploration.
  6. You can probably gather from all of the above points that Benford is an investigative starting point. If you’re looking for needles in a haystack, Benford (once you have established the fact that it “works” for your data) can point you to which part of the haystack to look in. In a bed of rocks, it points you toward which rocks to overturn. Benford analysis can’t stand alone as a conclusion, but it stands as a well-respected fraud detection methodology for launching or directing an investigation.
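
For anyone who wants to check the percentages in point 1: the standard statement of Benford’s Law is that leading digit d appears with probability log10(1 + 1/d). Here is a minimal Python sketch of that formula. The function and variable names are made up for illustration and are not from any particular library.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Hypothetical lookup table: leading digit -> expected share.
BENFORD = {d: benford_probability(d) for d in range(1, 10)}

for d, p in BENFORD.items():
    print(f"leading digit {d}: {p:.1%}")
```

The nine shares add up to 100%, and the first and last come out to the 30.1% and 4.6% quoted above.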

Moving on. Pictures of histograms were flying all over Twitter, and I based my last blog on them, explaining that IF those histograms were real, then they’re significant and there’s definitely something to see there. I realized that’s a weak starting point, because if they’re fake, then the entire blog was pointless. So I owe you a follow-up, and I intend to provide it. I also have a day job, so it will be in pieces.

In preparing my own analysis, I’m only willing to use “source data” that comes directly from the source. That is, Milwaukee County, Wisconsin ward-level granular vote totals can only come from Milwaukee County, Wisconsin, not from a spreadsheet that some person uploaded to the internet. I found it here; you have to select Federal, and then Expand/Collapse Totals by Wards. That provides a table of results for all 478 wards, with vote count totals for all the presidential candidates (3 wards have no data and 1 has only 1 vote; these are also technically “anomalies” that probably have an explanation that would make sense to someone familiar with Milwaukee County). I downloaded the data, made my own histograms, and got the same results as the histograms in my original blog. Whew. That’s great starting evidence that whoever prepared the original histograms did them correctly. To present them somewhat differently, and to avoid a separate chart for every single candidate, I’ve summarized them like this.

This shows the same anomaly on the Biden/Harris vote totals, just in a different way. The thick red line is the Benford distribution; the white Biden/Harris line strays more from the Benford shape than any other candidate’s, by far. It’s an anomaly. Tying this back to the fraud investigation points that kicked this blog off, examiners would want to take a heavy sample of the wards where the Biden/Harris vote total leading digit is a “5” (“6” and “4” are of secondary interest, since they also are elevated, but “5” is the MOST significant deviation). Benford’s Law predicted 37 wards with a leading digit vote count of “5”, but Biden/Harris vote totals had a leading digit of “5” in 79 wards. If there’s fraud in Milwaukee County, they’re likely to find it in those wards. Remember, 37 were predicted, so if these results are due to fraud, we’d expect to find fraud in roughly 53% of these 79 wards (the 42 excess wards divided by 79). The wards that Trump investigators would want to check for unusual activity are:

City of Milwaukee Ward 15, City of Milwaukee Ward 19, City of Milwaukee Ward 27, City of Milwaukee Ward 30, City of Milwaukee Ward 31, City of Milwaukee Ward 37, City of Milwaukee Ward 39, City of Milwaukee Ward 40, City of Milwaukee Ward 41, City of Milwaukee Ward 49, City of Milwaukee Ward 62, City of Milwaukee Ward 63, City of Milwaukee Ward 66, City of Milwaukee Ward 74, City of Milwaukee Ward 75, City of Milwaukee Ward 76, City of Milwaukee Ward 77, City of Milwaukee Ward 82, City of Milwaukee Ward 100, City of Milwaukee Ward 107, City of Milwaukee Ward 108, City of Milwaukee Ward 112, City of Milwaukee Ward 118, City of Milwaukee Ward 119, City of Milwaukee Ward 124, City of Milwaukee Ward 137, City of Milwaukee Ward 141, City of Milwaukee Ward 142, City of Milwaukee Ward 148, City of Milwaukee Ward 150, City of Milwaukee Ward 160, City of Milwaukee Ward 163, City of Milwaukee Ward 173, City of Milwaukee Ward 175, City of Milwaukee Ward 187, City of Milwaukee Ward 198, City of Milwaukee Ward 205, City of Milwaukee Ward 208, City of Milwaukee Ward 212, City of Milwaukee Ward 213, City of Milwaukee Ward 217, City of Milwaukee Ward 218, City of Milwaukee Ward 227, City of Milwaukee Ward 245, City of Milwaukee Ward 258, City of Milwaukee Ward 272, City of Milwaukee Ward 279, City of Milwaukee Ward 284, City of Milwaukee Ward 295, City of Milwaukee Ward 298, City of Milwaukee Ward 304, City of Milwaukee Ward 305, City of Milwaukee Ward 308, City of Milwaukee Ward 312, City of Milwaukee Ward 317, Village of Bayside Ward 5, City of Franklin Ward 8, City of Franklin Ward 9, City of Franklin Ward 11, City of Franklin Ward 13, City of Franklin Ward 16, City of Franklin Ward 19, City of Greenfield Ward 1, City of Greenfield Ward 4, City of Greenfield Ward 5, City of Greenfield Ward 9, City of Greenfield Ward 10, City of Wauwatosa Ward 13, City of West Allis Ward 1, City of West Allis Ward 3, City of West Allis Ward 5, City of West Allis Ward 8, City of West Allis Ward 12, City of West Allis Ward 14, City of West Allis Ward 15, City of West Allis Ward 20, City of West Allis Ward 22, City of West Allis Ward 24, V. West Milwaukee Wards 1,2,5
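
For readers who want to retrace the arithmetic behind the 37 predicted wards and the 53% figure, here is a rough Python sketch. It makes one assumption of mine: that about 475 wards have usable data (478 minus the 3 with no data); the exact count in my spreadsheet may differ slightly. The function names are hypothetical, and `leading_digit_counts` is only a generic way to tally first digits from a list of vote totals, not the exact steps used to build the chart.

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """First digit of a positive integer, e.g. 512 -> 5."""
    return int(str(n)[0])

def leading_digit_counts(totals):
    """Tally leading digits, skipping wards with zero votes."""
    return Counter(leading_digit(t) for t in totals if t > 0)

def benford_expected_count(digit: int, n_values: int) -> float:
    """Number of values Benford's Law predicts for a leading digit."""
    return n_values * math.log10(1 + 1 / digit)

# ASSUMPTION: roughly 475 wards with data (478 wards minus 3 empty ones).
N_WARDS = 475
OBSERVED_FIVES = 79   # Biden/Harris wards with leading digit 5 (from the post)
PREDICTED_FIVES = 37  # Benford prediction as quoted in the post

expected_fives = benford_expected_count(5, N_WARDS)
excess_share = (OBSERVED_FIVES - PREDICTED_FIVES) / OBSERVED_FIVES

print(f"Benford-expected wards with leading digit 5: {expected_fives:.1f}")
print(f"Share of the 79 wards above the prediction: {excess_share:.0%}")
```

Under that ward-count assumption the expected figure lands between 37 and 38, in line with the 37 quoted above, and the 42 excess wards are about 53% of the 79.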

I haven’t had time yet to do Allegheny or Chicago from the original post or a whole host of additional heavily populated counties from swing states that would be valuable to review. I’ve had a specific request for Minnesota, and clearly key counties from Arizona, Nevada, Georgia, Michigan, North Carolina, Ohio, and Florida are warranted. I also want to analyze the same counties’ 2016 results to see whether we see anomalies in 2020 that are different from 2016.

Stay tuned.

4 thoughts on “More Benford’s Law, How to Investigate Fraud, and Milwaukee”

  1. The average total number of votes in those wards was about 960 votes. The average number of votes for Biden for those wards was about 660. That’s consistent with known levels of support for Biden. In that circumstance you’d expect the leading digit to be close to the mean i.e. around 5 or 6 for Biden…and sure enough it is.

    Trump was less popular (again quite consistent with US voting patterns) and on average got 280 votes per ward. So Trump having more leading digits clustered near 1 and 2 is not surprising.

    There’s not a way mathematically for a popular candidate to match Benford’s Law with these wards.


  2. For example, if you put the City of Milwaukee wards aside and only look at the others (Franklin, Glendale etc.), then Trump’s first-digit figures no longer have 1 as the most common:
    1 16
    2 15
    3 31
    4 30
    5 18
    6 17
    7 13
    8 8
    9 3
    That’s not fraud either or a red flag. It’s just a natural outcome of the rough size & distribution of wards and the proportion of the vote he got in them.
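
The argument in these two comments can be illustrated with a small simulation. This is only a sketch under invented assumptions: ward sizes are drawn around the 960-vote average mentioned above, the candidate’s share is drawn around 0.69 (about 660 out of 960), and the spreads (250 votes, 10 points) are made-up values for illustration. None of it is real Milwaukee data.

```python
import random
from collections import Counter

def simulate_leading_digits(n_wards=5000, mean_size=960, size_sd=250,
                            mean_share=0.69, share_sd=0.10, seed=0):
    """Leading-digit tally for simulated ward totals of one candidate.

    All parameters are illustrative assumptions, not measured values.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_wards):
        # Ward turnout clustered around the mean (floored at 100 votes).
        size = max(100.0, rng.gauss(mean_size, size_sd))
        # Candidate share clustered around the mean (kept within 5%-95%).
        share = min(0.95, max(0.05, rng.gauss(mean_share, share_sd)))
        votes = round(size * share)
        counts[int(str(votes)[0])] += 1
    return counts

counts = simulate_leading_digits()
total = sum(counts.values())
for d in range(1, 10):
    print(f"leading digit {d}: {counts[d] / total:.1%}")
```

With totals bunched in a narrow range like this, the most common leading digits sit in the middle of the range and digit 1 falls well short of Benford’s 30.1%, with no fraud anywhere in the simulated data.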

