How to Spot Bad Data: Benford’s Law
![Page 1: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/1.jpg)
How to Spot Bad Data
Peter O’Reilly
![Page 2: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/2.jpg)
![Page 3: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/3.jpg)
Theory & application
![Page 4: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/4.jpg)
![Page 5: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/5.jpg)
![Page 6: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/6.jpg)
Simon Newcomb
![Page 7: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/7.jpg)
Frank Benford
![Page 8: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/8.jpg)
“mere curious observation”
![Page 9: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/9.jpg)
![Page 10: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/10.jpg)
![Page 11: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/11.jpg)
Chuck Zlotnick/Warner Brothers Pictures
![Page 12: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/12.jpg)
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
![Page 13: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/13.jpg)
| First non-zero digit, d | Probability according to Benford’s Law, P(d) |
| --- | --- |
| 1 | 0.3010 |
| 2 | 0.1761 |
| 3 | 0.1249 |
| 4 | 0.0969 |
| 5 | 0.0792 |
| 6 | 0.0669 |
| 7 | 0.0580 |
| 8 | 0.0512 |
| 9 | 0.0458 |
| Total | 1.0000 |
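As a quick check, the table above can be regenerated directly from the formula. This is a minimal Python sketch; the name `benford_probability` is illustrative and does not come from the slides.

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of values whose first significant digit is d (1-9)."""
    if d not in range(1, 10):
        raise ValueError("d must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)

# Rebuild the table: digit -> probability rounded to four places.
benford_table = {d: round(benford_probability(d), 4) for d in range(1, 10)}

for d, p in benford_table.items():
    print(d, f"{p:.4f}")

# The nine probabilities sum to 1 (up to floating-point error).
total = sum(benford_probability(d) for d in range(1, 10))
print("Total", f"{total:.4f}")
```

The sum is exactly 1 in theory because the nine logarithms telescope to log10(10).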
![Page 14: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/14.jpg)
First Significant Digit Law
a.k.a. Benford's Law
![Page 15: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/15.jpg)
SIGNIFICANT DIGIT
• All non-zero digits are significant: 1, 2, 3, 4, 5, 6, 7, 8, 9
• Zeros between non-zero digits are significant: 305, 6002, 70008
• Leading zeros are never significant: 0.01, 0.000424
• In a number with a decimal point, trailing zeros are significant: 1.01000, 2.200, 36.5400
![Page 16: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/16.jpg)
Red digits are significant
4210
505
2190.30
0.09
0.23
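For Benford's Law only the first significant digit matters. One way to pull it out of the five numbers above is to scan each number's text form for the first non-zero digit. This is a sketch under that assumption; the helper name `first_significant_digit` is hypothetical.

```python
from typing import Optional

def first_significant_digit(value) -> Optional[int]:
    """Return the first non-zero digit of a number, or None if there is none.

    Scanning the text form left to right skips signs, currency symbols,
    leading zeros, and the decimal point.
    """
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    return None

# The five numbers from the slide, kept as text so trailing zeros survive.
for text in ["4210", "505", "2190.30", "0.09", "0.23"]:
    print(text, "->", first_significant_digit(text))
```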
![Page 17: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/17.jpg)
Best suited to data with
• Random sampling
• Large sample size
• Sufficient variability
• No bounded maximum value
• Numbers that come from counting or measuring
![Page 18: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/18.jpg)
No–Go for
• Sequentially assigned numbers: e.g. check numbers, invoice numbers, purchase order numbers
• Numbers influenced by human thought: e.g. psychological price-setting thresholds ($9.99)
• Accounts with a large number of firm-specific numbers: e.g. accounts set up to record $10 refunds
• Accounts with a minimum or maximum
![Page 19: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/19.jpg)
![Page 20: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/20.jpg)
=LEFT(text,[num_chars])
LEFT returns the first character or characters in a text string, based on the number of characters you specify.
![Page 21: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/21.jpg)
![Page 22: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/22.jpg)
=COUNTIF(range, criteria)
The COUNTIF function counts the number of cells within a range that meet a single criterion that you specify.
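The two spreadsheet steps (take the first character, then count how often each digit occurs) can be sketched in Python as follows. The amounts are made up for illustration, and the cell references in the comments (`A2`, `B:B`) are hypothetical, not taken from the slides.

```python
from collections import Counter

# Hypothetical amounts, for illustration only (not data from the slides).
amounts = ["1250.00", "310.75", "1899.10", "42.00", "173.20", "960.00", "118.40"]

# Step 1, like =LEFT(A2,1): take the first character of each value.
first_chars = [amount[:1] for amount in amounts]

# Step 2, like =COUNTIF(B:B,"1") through =COUNTIF(B:B,"9"): count each digit.
tally = Counter(first_chars)
digit_counts = {str(d): tally.get(str(d), 0) for d in range(1, 10)}

print(digit_counts)
```

Like LEFT, the slice takes the first character as-is, so a value below 1 (such as 0.09) would yield "0" rather than its first significant digit; such values need separate handling.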
![Page 23: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/23.jpg)
![Page 24: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/24.jpg)
| First non-zero digit, d | Probability according to Benford’s Law, P(d) |
| --- | --- |
| 1 | 0.3010 |
| 2 | 0.1761 |
| 3 | 0.1249 |
| 4 | 0.0969 |
| 5 | 0.0792 |
| 6 | 0.0669 |
| 7 | 0.0580 |
| 8 | 0.0512 |
| 9 | 0.0458 |
| Total | 1.0000 |
![Page 25: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/25.jpg)
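| Digit | Count | Actual Frequency | Expected Frequency |
| --- | --- | --- | --- |
| 1 | 1,402 | 29.40% | 30.10% |
| 2 | 909 | 19.06% | 17.61% |
| 3 | 587 | 12.31% | 12.49% |
| 4 | 459 | 9.63% | 9.69% |
| 5 | 382 | 8.01% | 7.92% |
| 6 | 285 | 5.98% | 6.69% |
| 7 | 281 | 5.89% | 5.80% |
| 8 | 258 | 5.41% | 5.12% |
| 9 | 205 | 4.30% | 4.58% |
| Totals | 4,768 | 100% | 100% |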
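The frequency columns can be recomputed from the counts alone. The sketch below copies the counts from the table above, derives the actual and expected shares, and adds a gap column (actual minus expected) that is not on the slide.

```python
import math

# Digit counts copied from the slide's table.
counts = {1: 1402, 2: 909, 3: 587, 4: 459, 5: 382, 6: 285, 7: 281, 8: 258, 9: 205}
total = sum(counts.values())  # 4,768 values in all

rows = []
for d, n in counts.items():
    actual = n / total                # observed share of digit d
    expected = math.log10(1 + 1 / d)  # Benford's Law share of digit d
    rows.append((d, n, actual, expected, actual - expected))

for d, n, actual, expected, gap in rows:
    print(f"{d}  {n:>5,}  {actual:7.2%}  {expected:7.2%}  {gap:+.2%}")
```

In this data the largest gap is at digit 2, which appears about 1.45 percentage points more often than expected.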
![Page 26: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/26.jpg)
![Page 27: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/27.jpg)
values generated using Excel’s RAND() function
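Values drawn uniformly between 0 and 1, as RAND() does, make a handy counterexample: their first digits come out roughly equal rather than Benford-distributed. This is a minimal sketch of that experiment, assuming Python's `random.random()` as a stand-in for RAND(); the sample size and seed are arbitrary choices, not from the slides.

```python
import random
from collections import Counter

random.seed(2019)  # fixed seed so the run is repeatable; the value is arbitrary

def first_digit(x: float) -> int:
    """First significant digit of a positive float, read from scientific notation."""
    return int(f"{x:.17e}"[0])

samples = [random.random() for _ in range(90_000)]
samples = [x for x in samples if x > 0]  # random() can return exactly 0.0

tally = Counter(first_digit(x) for x in samples)
shares = {d: tally[d] / len(samples) for d in range(1, 10)}

for d, share in shares.items():
    print(d, f"{share:.2%}")
```

Each digit should land near 1/9 (about 11.1%), far from the 30.1% that Benford's Law expects for the digit 1.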
![Page 28: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/28.jpg)
SQL Example (database query)
SELECT
    LEFT(deposit_amount, 1) AS Digit,
    COUNT(LEFT(deposit_amount, 1)) AS Digit_Count
FROM revenue_tax_collection
GROUP BY LEFT(deposit_amount, 1)
ORDER BY 1;
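To try a query of this shape without a database server, one option is Python's built-in `sqlite3` module. This sketch keeps the table and column names from the slide, but the deposit values are hypothetical, and it uses SQLite's `substr` with an explicit `CAST` in place of `LEFT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revenue_tax_collection (deposit_amount REAL)")

# Hypothetical deposits, for illustration only.
conn.executemany(
    "INSERT INTO revenue_tax_collection VALUES (?)",
    [(1250.00,), (310.75,), (1899.10,), (42.00,), (173.20,), (960.00,)],
)

# substr(text, 1, 1) plays the role of LEFT(text, 1); the CAST makes the
# number-to-text conversion explicit.
query = """
    SELECT substr(CAST(deposit_amount AS TEXT), 1, 1) AS digit,
           COUNT(*) AS digit_count
    FROM revenue_tax_collection
    GROUP BY digit
    ORDER BY digit
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

As with the spreadsheet version, amounts below 1 or negative amounts would need extra handling, because the first character would be "0" or "-".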
![Page 29: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/29.jpg)
Recap
1. =LEFT()
2. =COUNTIF()
3. Plot bar chart
![Page 30: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/30.jpg)
Further considerations
• 2nd significant digit
• Chi-square (χ²) test
• Not absolute proof
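As an illustration of the chi-square point, the sketch below computes Pearson's chi-square statistic for the first-digit counts from the comparison table earlier in the deck. The choice of a 5% significance level is an assumption made here, not something the slides specify.

```python
import math

# Observed first-digit counts from the comparison table earlier in the deck.
observed = {1: 1402, 2: 909, 3: 587, 4: 459, 5: 382, 6: 285, 7: 281, 8: 258, 9: 205}
n = sum(observed.values())

# Pearson's chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = 0.0
for d, obs in observed.items():
    expected = n * math.log10(1 + 1 / d)  # expected count under Benford's Law
    chi_square += (obs - expected) ** 2 / expected

# 15.507 is the 5% critical value for 8 degrees of freedom (9 digits - 1).
CRITICAL_5_PERCENT_8_DF = 15.507
print(f"chi-square = {chi_square:.2f}")
print("within 5% threshold:", chi_square < CRITICAL_5_PERCENT_8_DF)
```

A statistic below the threshold means the deviations are no larger than chance alone could plausibly produce at that level. Consistent with the last bullet, it does not prove the data are clean, and a value above the threshold does not prove fraud.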
![Page 31: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/31.jpg)
Peter O’Reilly, MBA, CMFO, CTC, QPA
Red Bank CFO, former Jersey City Treasurer, Pension Actuary, Finance I.T.
Definitive Guide to Local Public Finance in New Jersey, 2019 publication, available at: njcmfo.com
![Page 32: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/32.jpg)
![Page 33: How to Spot Bad Data Benford’s Law](https://reader034.fdocuments.in/reader034/viewer/2022042421/62617a9923a795158c0f596a/html5/thumbnails/33.jpg)
References for copyrighted and public-domain images, to comply with the respective terms of public use:
{source, image description (slide deck page)}
• pixabay.com
  • “FAKE” (2), fraud (2), file cabinets (4), thumbs up (17, 25), thumbs down (18, 26), curved arrow (28), left arrow (19), finger counting (21)
• wikipedia.com
  • Islamic Republic of Iran flags and presidential candidates (8), Simon Newcomb (6)
• s9.com
  • Frank Benford (7)
• Microsoft.com (https://www.microsoft.com/en-us/legal/intellectualproperty/permissions/default.aspx)
  • Microsoft Excel logo (5, 19, 21)
• Chuck Zlotnick/Warner Brothers Pictures, https://www.thewrap.com/accountant-adds-up-real-review-ben-affleck/
  • The Accountant movie screenshot (11)