
©2012

TECHNOLOGY YOU CAN USE AGAINST THOSE

WHO USE TECHNOLOGY

BENFORD’S LAW: THE FUN, THE FACTS, AND THE FUTURE

Benford’s Law is named after physicist Frank Benford, who discovered that there were

predictable patterns to the frequencies of the digits in tabulated data. Using real-world fraud and

bankruptcy-related examples, this session will review the foundations of this interesting

statistical phenomenon and its place in forensic analytics. The session will also cover other

recent Benford applications, including the analysis of bankruptcy filings, financial statement

numbers, and the analysis of Ponzi scheme numbers.

MARK NIGRINI, PH.D.

Professor

The College of New Jersey

Pennington, NJ

Mark J. Nigrini, Ph.D., is a professor at The College of New Jersey, where he teaches

managerial accounting and forensic accounting. His current research involves advanced

theoretical work on Benford’s Law and the legal process surrounding fraud convictions.

Nigrini is the author of Forensic Analytics (Wiley 2011), which describes tests to detect

fraud, errors, estimates, and biases in financial data. Nigrini is also the author of Benford’s Law,

published by Wiley in April 2012. His next book, Losing the War Against Fraud, will be

published in March 2013. His work has been featured in national media, including The Financial

Times, The New York Times, and The Wall Street Journal, and he has published papers on

Benford’s Law in accounting academic journals, scientific journals, and pure mathematics

journals, as well as professional publications such as Internal Auditor and Journal of

Accountancy. His radio interviews have included the BBC in London and NPR in the United

States. His television interviews have included an appearance on NBC’s Extra. He regularly

presents professional seminars for accountants and auditors in North America, Europe, and Asia,

with recent events in Singapore, Switzerland, and New Zealand.

“Association of Certified Fraud Examiners,” “Certified Fraud Examiner,” “CFE,” “ACFE,” and the

ACFE Logo are trademarks owned by the Association of Certified Fraud Examiners, Inc. The contents of

this paper may not be transmitted, re-published, modified, reproduced, distributed, copied, or sold without

the prior consent of the author.


23rd Annual ACFE Fraud Conference and Exhibition ©2012

Introduction

Benford’s Law gives the expected patterns of the digits in

tabulated data. The law is named after Frank Benford, who

noticed that the first few pages of his tables of common

logarithms were more worn than the later pages (Benford,

1938). From this he hypothesized that people were looking

up the logs of numbers with low first digits (e.g., 1, 2, and

3) more often than the logs of numbers with high first digits

(e.g., 7, 8, and 9) because there were more numbers in the

world with low first digits. The first digit of a number is the

leftmost digit, and 0 is inadmissible as a first digit. The first

digits of 2,204, 0.0025, and 20 million are all 2. Benford

empirically tested the first digits of 20 diverse lists of

numbers and noticed a marked skewness in favor of the low

digits that approximated a logarithmic pattern. He then

made some assumptions related to the geometric pattern of

natural phenomena—despite the fact that some of his

datasets were not related to natural phenomena—and

formulated the expected patterns for the digits in tabulated

data. These expected frequencies are shown below, with D1

representing the first digit and D1D2 representing the first-

two digits of a number:

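$$P(D_1 = d_1) = \log\left(1 + \frac{1}{d_1}\right), \quad d_1 \in \{1, 2, \ldots, 9\} \tag{1}$$

$$P(D_1D_2 = d_1d_2) = \log\left(1 + \frac{1}{d_1d_2}\right), \quad d_1d_2 \in \{10, 11, 12, \ldots, 99\} \tag{2}$$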

P indicates the probability of observing the event in

parentheses, and log refers to the log to the base 10. For

example, the expected probability of the first digit 2 is

log(1 + ½), which equals 0.1761.
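The expected proportions in equations (1) and (2) can be tabulated directly. The following is a minimal sketch; the function and variable names are illustrative and do not come from the paper.

```python
import math


def first_digit_prob(d1: int) -> float:
    """Expected proportion for first digit d1 in 1..9, per equation (1)."""
    if not 1 <= d1 <= 9:
        raise ValueError("d1 must be in 1..9")
    return math.log10(1 + 1 / d1)


def first_two_digits_prob(d1d2: int) -> float:
    """Expected proportion for first-two digits d1d2 in 10..99, per equation (2)."""
    if not 10 <= d1d2 <= 99:
        raise ValueError("d1d2 must be in 10..99")
    return math.log10(1 + 1 / d1d2)


# Full tables of expected proportions (names are illustrative).
FIRST_DIGIT = {d: first_digit_prob(d) for d in range(1, 10)}
FIRST_TWO = {d: first_two_digits_prob(d) for d in range(10, 100)}

if __name__ == "__main__":
    for digit, prob in FIRST_DIGIT.items():
        print(f"{digit}: {prob:.4f}")
```

Each table sums to 1, and the entry for a first digit of 2 reproduces the 0.1761 in the example above.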

Durtschi, Hillison, and Pacini (2004) review the types of

accounting data that are likely to conform to Benford’s

Law and the conditions under which a “Benford Analysis”

is likely to be useful. Benford’s Law as a test of data


authenticity has not been limited to internal audit and the

attestation functions. Hoyle et al. (2002) apply Benford’s

Law to biological findings, and Nigrini and Miller (2007)

apply it to earth science data. The mathematical theory

supporting Benford’s Law is still evolving. Examples

include: Berger, Bunimovich, and Hill (2005); Kontorovich

and Miller (2005); Berger and Hill (2007); Miller and

Nigrini (2008); and Jang et al. (2009).

A set of numbers that closely conforms to Benford’s Law is

called a Benford Set in Nigrini (2012). The link between a

geometric sequence and a Benford Set is well known in the

literature, and is discussed in Raimi (1976). The link was

also evident to Benford, who titled a part of his paper

“Geometric Basis of the Law” and declared, “Nature

counts geometrically and builds and functions accordingly”

(Benford 1938, 563). Raimi (1976) relaxes the tight

restriction that the sequence should be perfectly geometric,

and states that a close approximation to a geometric

sequence will also produce a Benford Set. Raimi further

relaxes the geometric requirement and notes that “the

interleaving of a finite number of geometric sequences”

will also produce a Benford Set. A mixture of approximate

geometric sequences will therefore also produce a Benford

Set.

The digits of a geometric sequence will form a Benford Set

if two requirements are met. First, N, the number of records,
should be large. This requirement is necessarily vague because
even a perfect geometric sequence with 1,000 records cannot fit
Benford’s Law perfectly. For example, for the first-two
digits from 90 to 99, the expected proportions range from
0.0048 down to 0.0044. Since any actual count must be an
integer, the actual counts (probably either 4 or 5) translate
to actual proportions of either 0.004 or 0.005. As N increases,
the actual proportions are able to tend toward the exact
expected proportions of Benford’s Law. Second, the difference
log(b)–log(a) should be an integer, where a and b are the lower
and upper bounds of the range spanned by the sequence. The
geometric sequence needs to span a large

enough range to allow each of the possible first digits to

occur with the expected frequency of Benford’s Law. A

geometric sequence over the range [20, 82] will be clipped

short with no numbers beginning with either a 1 or a 9, and

very few numbers with a first digit of 8.
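The two requirements can be checked numerically. The sketch below builds one geometric sequence over [10, 100), where log(b)–log(a) = 1, and one over the clipped range [20, 82), and compares their first-digit proportions with Benford’s Law. The helper names, the choice of 10,000 terms, and the way the common ratio is derived from the range are assumptions made for illustration.

```python
import math
from collections import Counter


def first_digit(x: float) -> int:
    """Leftmost nonzero digit, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])


def geometric_sequence(a: float, b: float, n: int) -> list[float]:
    """n terms a * r**k for k = 0..n-1, with r chosen so the terms span [a, b)."""
    ratio = b / a
    return [a * ratio ** (k / n) for k in range(n)]


def first_digit_proportions(values: list[float]) -> dict[int, float]:
    """Actual proportion of each first digit 1..9."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts.get(d, 0) / len(values) for d in range(1, 10)}


def max_abs_deviation(proportions: dict[int, float]) -> float:
    """Largest gap between actual and Benford first-digit proportions."""
    return max(abs(proportions[d] - math.log10(1 + 1 / d)) for d in range(1, 10))


full = first_digit_proportions(geometric_sequence(10, 100, 10_000))
clipped = first_digit_proportions(geometric_sequence(20, 82, 10_000))

if __name__ == "__main__":
    print("full range, max deviation:", max_abs_deviation(full))
    print("clipped range, proportions:", clipped)
```

Under these assumptions the full-range sequence stays very close to the expected proportions, while the clipped sequence has no first digits of 1 or 9 and only a small share of 8s.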

The Nigrini Cycle

In Forensic Analytics, Nigrini (2011) introduces the Nigrini

Cycle, which is a series of tests that should be run on every

dataset as a start to the data analysis phase. The Nigrini

Cycle comprises six tests made up of the periodic graph,

the data profile, the first-two digit tests, the number

duplication test, the summation test, and the second-order

test.

The Nigrini Cycle is demonstrated on purchasing card data

of a government entity. The specific entity was the victim

of fraud in the prior year and management wanted an

analysis of the current transactions to give some assurance

that the current year’s data was free of further fraud. The

focus was on fraud as opposed to waste and abuse. The first

test was the data profile, shown in Figure 1-01.



Figure 1-01 shows the data profile with the dollar amounts

and counts for the card purchases.

The data profile in Figure 1-01 shows that there were

approximately 95,000 transactions totaling $39 million.

The total should be compared to the payments made to the

card issuer. It is puzzling that there are no credits. This

might be because there is a field in the data table indicating

whether the amount is a debit or credit that was deleted

before the analysis. It might also signal that cardholders

aren’t too interested in getting credits where credits are due.

The data profile also shows that about one-third of the

charges are for amounts of $50 and under. Card programs

are there to make it easy for employees to pay for small

business expenses. The data profile shows one large

invoice for $3,102,000. The review showed that this

amount was actually in Mexican pesos, making the

transaction worth about $250,000. This transaction was

investigated and was a special circumstance where the

Mexican vendor needed to be paid with a credit card. This


finding showed that the Amount field was in the source

currency, not in U.S. dollars. Another query showed that

very few other transactions were in other currencies, and so

the Amount field was still used “as is.” The second high-

level overview was a periodic graph, shown in Figure 1-02.

Figure 1-02 shows the monthly totals for card purchases.

The monthly totals are shown in Figure 1-02. Because the

“$3,102,000” purchase was an abnormal event, this number

was excluded from this graph. The graph shows that

August and September had the largest transaction totals.

The entity’s fiscal year ends on September 30. The

August/September spike might be the result of employees

making sure that they’re spending money that is “in the

budget.” The average monthly total is $3 million. The two


spikes averaged $1.18 million, which is a significant

amount of money. An earlier example of abuse was a

cardholder buying unnecessary helmets in one fiscal year,

only to return them the next fiscal year and use the funds

for other purchases. The transactions for 2011 should be

reviewed for this type of scheme. In another card analysis,

a utility company found that it had excessive purchases in

December, right around the holiday season. This suggested

that cardholders might be buying personal items with their

corporate cards.

The data profile and the periodic graph are high-level tests

that are well-suited to purchasing cards. The high-level

overview could also include a comparative analysis of the

descriptive statistics, which would require data for two

consecutive years.
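The two high-level tests can be sketched as simple aggregations over (date, amount) pairs. The strata boundaries, the field layout, and the sample records below are hypothetical; the paper does not list the strata used in Figure 1-01.

```python
from collections import defaultdict
from datetime import date


def stratum(amount: float) -> str:
    """Assign an amount to a data-profile stratum (boundaries are assumed)."""
    if amount < 0:
        return "credits"
    if amount == 0:
        return "zero"
    if amount <= 50:
        return "0.01 to 50.00"
    if amount <= 2500:
        return "50.01 to 2,500.00"
    return "above 2,500.00"


def data_profile(transactions):
    """Count and total of the amounts in each stratum."""
    profile = defaultdict(lambda: {"count": 0, "total": 0.0})
    for _, amount in transactions:
        row = profile[stratum(amount)]
        row["count"] += 1
        row["total"] += amount
    return dict(profile)


def monthly_totals(transactions, exclude=()):
    """Total per (year, month), skipping any amounts listed in exclude."""
    totals = defaultdict(float)
    for when, amount in transactions:
        if amount in exclude:
            continue
        totals[(when.year, when.month)] += amount
    return dict(sorted(totals.items()))


# Hypothetical sample records, not taken from the card data.
sample = [
    (date(2010, 8, 3), 25.00),
    (date(2010, 8, 15), 2450.00),
    (date(2010, 9, 1), 3_102_000.00),
    (date(2010, 9, 20), 49.99),
    (date(2010, 12, 5), 999.95),
]
profile = data_profile(sample)
totals = monthly_totals(sample, exclude={3_102_000.00})
```

A stratum that never appears, such as credits here, is simply absent from the profile, which is the kind of gap the data profile is meant to surface.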

The Benford’s Law tests work well on card transactions. It

would seem that the upper limit of $2,500 on card

purchases would make the test invalid, but this isn’t the

case because most of the purchases are below $1,000, and

the $1,000-plus stratum is dwarfed by the under $1,000

purchases. Also, the $2,500 limit can be breached if the

purchase is authorized. The purchase might also be in

another currency, and the analysis can be run on the

“transaction currency” as opposed to the amounts after

converting to U.S. dollars. The first-order test results are

shown in Figure 1-03.
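A first-order test of this kind compares the actual first-two digit proportions with those of equation (2). The sketch below is one possible implementation; the function names, the minimum-amount parameter, and the sample amounts are illustrative assumptions, and no significance statistic is computed.

```python
import math
from collections import Counter


def first_two_digits(amount: float) -> int:
    """First-two digits read from scientific notation, e.g. 2450.00 -> 24."""
    s = f"{abs(amount):.15e}"
    return int(s[0] + s[2])


def first_order_test(amounts, minimum=10.0):
    """Rows of (digits, count, actual, expected, difference) for amounts >= minimum."""
    kept = [a for a in amounts if a >= minimum]
    if not kept:
        raise ValueError("no amounts at or above the minimum")
    counts = Counter(first_two_digits(a) for a in kept)
    rows = []
    for d in range(10, 100):
        expected = math.log10(1 + 1 / d)
        actual = counts.get(d, 0) / len(kept)
        rows.append((d, counts.get(d, 0), actual, expected, actual - expected))
    return rows


def largest_spikes(rows, top=5):
    """Digit combinations with the largest excess of actual over expected."""
    return sorted(rows, key=lambda r: r[4], reverse=True)[:top]


# Hypothetical amounts, chosen only to exercise the functions.
sample = [2450.00] * 5 + [12.50, 130.00, 99.95, 3.62]
rows = first_order_test(sample)
spikes = largest_spikes(rows, top=1)
```

Raising the minimum to $10, as in the second panel, drops small amounts such as the $3.62 charge before the digits are counted.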



Figure 1-03 shows the results of (a) all the card purchases.

The first-order results in the first panel of Figure 1-03 show

a large spike at 36. A review of the number duplication

results (by peeking ahead) shows a count of 5,903 amounts

in the $3.60 to $3.69 range. These transactions were almost

all for FedEx charges, and it seems that FedEx was used as

the default mail carrier for all documents larger than a

standard first-class envelope. While this was presumably

not fraud, it might be wasteful because U.S. Postal Service

first-class mail is cheaper for small documents. It is also

noteworthy that a government agency would prefer a

private carrier over the USPS. Another test was run on all

purchases of $10 and higher, and the results are in the

second panel of Figure 1-03.



Figure 1-03 also shows the results of (b) card purchases

that are $10 and higher.

The first-order test in the second panel in Figure 1-03

shows a reasonably good fit to Benford’s Law. There is,

however, a large spike at 24, which is the largest spike on

the graph. Also, there is a relatively large spike at 99, in

that the actual proportion is about double the expected

proportion. The spike at 24 is there because card users are

buying with great gusto for amounts that are just less than

the maximum allowed for the card. The first-order test

allows us to conclude that there are excessive purchases in

this range because we can compare the actual proportion to an

expected proportion. The number duplication test will look

at the 24 purchases in some more detail. The 99 purchases

showed many payments for seminars delivered over the


Internet (i.e., webinars), and it seemed reasonable that the

seminars would be priced just below a psychological

boundary. There were also purchases of computer and

electronic goods priced at just under $100. This pricing

pattern is normal for the computer and electronics industry.

The purchases also included a payment to a camera store

for $999.95. This might be an abusive purchase. The

procurement rules stated that a lower priced good should be

purchased when it will perform essentially the same task as

an expensive item. The camera purchase was made in

August, which was in the two-month window preceding the

end of the fiscal year.

The summation test sums the amounts for each first-two digit
combination (10, 11, 12, … 99). The test identifies amounts with the same

first-two digits that are large relative to the rest of the

population. The results so far have highlighted the large

$3.102 million purchase, and the fact that there is an excess

of transactions just below the $2,500 threshold. The

summation graph is shown in Figure 1-04.



Figure 1-04 shows the results of the summation test applied

to the card data.

The summation test in Figure 1-04 shows that there is a
single record, or a group of records with the same first-two
digits, that is large when compared to the other numbers.
The spike is at 31. An Access query was used to select all

31 records and to sort the results by Amount descending.

The query identified the 3,102,000 pesos transaction.

The summation test was run on the Amounts greater than or

equal to $10. The summation test could be run on all the

positive amounts in a dataset. The expected sum for each

digit combination was $433,077 ($38,976,906/90). The 24
sum is $2.456 million. The difference is about $2 million.
The drill-down query showed that there were eight
transactions for about $24,500, and about 850 transactions
for about $2,450, together summing to about $2,250,000.
There is a large group of numbers that are relatively large
and have first-two digits of 24 in common. So, not only is

the spike on the first-order graph significant, but these

transactions are also larger than expected with respect to

their magnitudes.
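The summation test can be sketched as a sum per first-two digit combination, compared against an equal share (one-ninetieth) of the grand total. The function names and sample amounts are illustrative assumptions; only the $38,976,906 total and the division by 90 come from the text above.

```python
def first_two_digits(amount: float) -> int:
    """First-two digits read from scientific notation, e.g. 24500.00 -> 24."""
    s = f"{abs(amount):.15e}"
    return int(s[0] + s[2])


def summation_test(amounts, minimum=10.0):
    """Return (sums per first-two digits, expected sum per digit combination)."""
    kept = [a for a in amounts if a >= minimum]
    sums = {d: 0.0 for d in range(10, 100)}
    for a in kept:
        sums[first_two_digits(a)] += a
    expected = sum(kept) / 90  # equal share across the 90 combinations
    return sums, expected


# Expected sum from the total reported in the text.
expected_from_paper = round(38_976_906 / 90)

# Hypothetical amounts: one very large record and a cluster near 2,450.
sample = [3_102_000.00, 2450.00, 2450.00, 24_500.00, 15.00, 5.00]
sums, expected = summation_test(sample)
```

Sorting the sums in descending order and drilling into the largest ones mirrors the query step described above.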

The last-two digits test is usually only run as a test for

number invention. The number invention tests are usually

not run on accounts payable data or other types of

payments data because any odd last-two digits results will

be noticeable from the number duplication test. For

purchase amounts this test will usually simply show that

many numbers end with 00. This should also be evident

from the number duplication test. The results are shown in

Figure 1-05.



Figure 1-05 shows the results of the last-two digits test

applied to the card data.

The result of the last-two digits test is shown in Figure 1-

05. There is a large spike at 00, which is as expected. The

00 occurs in amounts such as $10.00 or $25.00. An

interesting finding is the spike at 95. This was the result of

2,600 transactions with the cents amounts equal to 95 cents,

as in $99.95.

The last-two digits test was run on the numbers equal to or

larger than $10. If the test had been run on all the amounts, there
would have been large spikes at 62 and 67 from the FedEx
charges for $3.62 and $3.67. The large spike at 36 in the first
panel of Figure 1-03 was for amounts of $3.62 and $3.67, which
have last-two digits of 62 and 67, respectively. The 62 and
67 spikes would not be there because of fraud, but rather
because of the abnormal duplications of one specific type

of transaction.
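For currency amounts the last-two digits are the cents, so the test can be sketched as a count of cents values from 00 to 99. The sketch assumes a uniform expected proportion of 1/100 for each value, which the paper does not state explicitly; the names and sample amounts are also illustrative.

```python
from collections import Counter


def last_two_digits(amount: float) -> int:
    """Cents portion of a currency amount, e.g. 99.95 -> 95."""
    return round(abs(amount) * 100) % 100


def last_two_digits_test(amounts, minimum=10.0):
    """Map each value 0..99 to (actual proportion, assumed uniform expectation)."""
    kept = [a for a in amounts if a >= minimum]
    if not kept:
        raise ValueError("no amounts at or above the minimum")
    counts = Counter(last_two_digits(a) for a in kept)
    expected = 1 / 100  # assumption: last-two digits are uniformly distributed
    return {d: (counts.get(d, 0) / len(kept), expected) for d in range(100)}


# Hypothetical amounts, chosen only to exercise the functions.
sample = [10.00, 25.00, 99.95, 19.95, 3.62, 3.67]
ten_and_up = last_two_digits_test(sample)
all_amounts = last_two_digits_test(sample, minimum=0.0)
```

With the $10 minimum the small charges are excluded, so the 62 and 67 values only appear when the test is run on all the amounts.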

The second-order test looks at the relationships and patterns

found in data and is based on the digits of the differences

between amounts that have been sorted from smallest to

largest (ordered). These digit patterns are expected to

closely approximate the expected frequencies of Benford’s

Law. The second-order test gives few, if any, false

positives in that if the results are not as expected (close to

Benford’s Law), then the data does have some

characteristic that is rare and unusual, abnormal, or

irregular. The second-order results are shown in Figure 1-

06.


Figure 1-06 shows the second-order results of the card

purchases amounts.

The result of the second-order test is shown in Figure 1-06.

The graph has a series of prime spikes (10, 20, … 90) that

have a Benford-like pattern and a second series of minor

spikes (11–19, 21–29, …) that follow another Benford-like

pattern. The prime spikes are relatively large. These results

are as expected for a large dataset with numbers that are

tightly clustered in a small ($1 to $2,500) range. The

second-order test doesn’t indicate any anomaly here, and

this test usually doesn’t indicate any anomaly except in

rare, highly anomalous situations.
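The second-order test can be sketched by sorting the amounts, taking the differences between adjacent values, and tabulating the first-two digits of those differences. Working in whole cents and dropping zero differences (which have no first digit) are assumptions of this sketch, as are the names and the sample amounts.

```python
import math
from collections import Counter


def to_cents(amount: float) -> int:
    """Convert a currency amount to whole cents to avoid float noise."""
    return round(amount * 100)


def first_two_digits_of_int(n: int) -> int:
    """First-two digits of a positive integer; a single digit d is read as d0."""
    return int((str(n) + "0")[:2])


def second_order_digits(amounts):
    """First-two digits of the nonzero differences between ordered amounts."""
    cents = sorted(to_cents(a) for a in amounts)
    diffs = [b - a for a, b in zip(cents, cents[1:])]
    return [first_two_digits_of_int(d) for d in diffs if d > 0]


def second_order_test(amounts):
    """Map each first-two digit combination to (actual, expected) proportions."""
    digits = second_order_digits(amounts)
    if not digits:
        raise ValueError("no nonzero differences to analyze")
    counts = Counter(digits)
    return {
        d: (counts.get(d, 0) / len(digits), math.log10(1 + 1 / d))
        for d in range(10, 100)
    }


# Hypothetical amounts, chosen only to exercise the functions.
sample = [3.62, 3.67, 3.67, 10.00, 10.01, 12.01]
digits = second_order_digits(sample)
result = second_order_test(sample)
```

In tightly clustered data many differences are a few cents, such as 1, 2, or 5 cents, and these are read as 10, 20, or 50, which is consistent with the prime spikes described above.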

The number duplication test analyzes the frequencies of the

numbers in a dataset. This test indicates which numbers

were causing the spikes in the first-order test. This test has

had good results when run against bank account numbers,

and has also been used with varying levels of success on

inventory counts, temperature readings, health care claims,

airline ticket refunds, airline flight liquor sales, electricity

meter readings, and election counts. The results are shown

in Figure 1-07.



Figure 1-07 shows the results of the number duplication

test.

The number duplication results in Figure 1-07 show four

amounts, all below $4.00, in the first four positions.

Another query showed that 99.9 percent of these amounts

were for FedEx charges. While the charges might be

wasteful, they were presumably not fraudulent. A second

number duplication test was run on the numbers below

$2,500. This would give some indication as to how

“creative” the cardholders were when trying to stay at or

below the $2,500 maximum allowed. Purchases could

exceed $2,500 if authorized. The “just below $2,500” table

is shown in Figure 1-08.
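The number duplication test amounts to a frequency table of the exact amounts, sorted by count. The sketch below keys the table on whole cents; the function names, the tie-breaking rule, and the sample amounts are illustrative assumptions.

```python
from collections import Counter


def number_duplication(amounts):
    """List of (amount, count), most frequent first; ties sorted by amount."""
    counts = Counter(round(a * 100) for a in amounts)  # key on whole cents
    table = [(cents / 100, n) for cents, n in counts.items()]
    return sorted(table, key=lambda row: (-row[1], row[0]))


def in_range(table, low, high):
    """Rows of the duplication table with low <= amount <= high."""
    return [(amount, n) for amount, n in table if low <= amount <= high]


# Hypothetical amounts, chosen only to exercise the functions.
sample = [3.62] * 3 + [3.67] * 2 + [2500.00] * 2 + [2499.99, 2497.04]
table = number_duplication(sample)
just_below_limit = in_range(table, 2495.00, 2500.00)
```

Filtering the table to a narrow band, as in the second call, gives the kind of listing used to examine amounts at or just below a card limit.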



Figure 1-08 shows the purchase amounts in the $2,495 to

$2,500 range.

The $2,495 to $2,500 transactions in Figure 1-08 show

some interesting patterns. The large count of “at the

money” purchases of $2,500 shows that this number has

some real financial implications. Either suppliers are

marginally reducing their prices so that the bill can be paid

easily and quickly, or some other factors are at play.


Another possible reason is that cardholders are splitting

their purchases and the excessive count of $2,500

transactions includes partial payments for other larger

purchases. Card transaction audits should select the $2,500

transactions for a close perusal. Also of interest in Figure 1-

08 is the set of five transactions for exactly $2,499.99 and

the 42 transactions for exactly $2,499. There are also 21

other transactions in the $2,499.04 to $2,499.97 range. It is

surprising that people think that they are the only ones that

are capable of gaming the system. The review of the eight

transactions of $2,497.04 showed that these were all items

purchased from GSA Global Supply, a purchasing program

administered by the General Services Administration. It

seems that even the federal government itself takes the card

limit into account when setting prices.

Conclusions

The Nigrini Cycle is the start of the suite of risk-based

auditing tests designed to detect fraud, and to test both the

effectiveness of controls and transaction accuracy. The use

of audit techniques that cover the entire population reduces

detection risk and increases the chances of finding unusual

transactions.

Recent applications of Benford’s Law include:

• Testing the accuracy of census data
• The link to the Fibonacci sequence
• Changing numbers to a different base
• Tax evasion caused by the tax tables
• Financial statement fraud
• Authenticity of ledger balances
• The accuracy of bankruptcy filing data
• Identification of Ponzi schemes
• Invention of charitable gift amounts


It is surprising that people are still surprised by the

discovery of fraud. The financial press and popular press

regularly report on only the largest cases. It seems that

when people are given the opportunity to commit fraud,

many do indeed commit the act. A few general comments

to take note of include:

• Forensic analytics is only one part of the forensic
investigations process. An entire investigation cannot
be completed with the computer alone. The
investigation would usually include a review of paper
documents, interviews, reports and presentations, and
concluding actions.

• It is best to collect and analyze the data at the start of
the investigation, and long before the suspect suspects
that an investigation is underway. In proactive fraud
detection, the data is automatically analyzed before the
suspect catches any wind of an investigation.

• Incomplete and inaccurate data might give rise to
incorrect and incomplete insights. Data should be
checked for completeness and accuracy before being
analyzed.


References

Benford, F., “The Law of Anomalous Numbers,”

Proceedings of the American Philosophical Society 78

(1938), pages 551–572.

Berger, A., Bunimovich, L.A., and Hill, T.P., “One-

Dimensional Dynamical Systems and Benford’s Law,”

Transactions of the American Mathematical Society 357 (1)

(2005), pages 197–219.

Berger, A., and Hill, T.P., “Newton’s Method Obeys

Benford’s Law,” American Mathematical Monthly 114,

(August/September 2007), pages 588–601.

Durtschi, C., Hillison, W., and Pacini, C., “The Effective

Use of Benford’s Law in Detecting Fraud in Accounting

Data,” Journal of Forensic Accounting 5 (2004), pages 17–

33.

Hoyle, D., Rattray, M., Jupp, R., and Brass, A., “Making

Sense of Microarray Data Distributions,” Bioinformatics 18

(4) (2002), pages 576–584.

Jang, D., Kang, J., Kruckman, A., Kudo, J., and Miller,

S.J., “Chains of Distributions, Hierarchical Bayesian

Models, and Benford’s Law,” Journal of Algebra, Number

Theory: Advances and Applications: Forthcoming (2009).

Kontorovich, A.V., and Miller, S.J., “Benford’s Law,

Values of L-Functions, and the 3x+1 Problem,” Acta

Arithmetica 120 (2005), pages 269–297.

Miller, S. J., and Nigrini, M.J., “The Modulo 1 Central

Limit Theorem and Benford’s Law for Products,”

International Journal of Algebra 2 (3) (2008), pages 119–

130.


Nigrini, M.J., Forensic Analytics: Methods and Techniques

for Forensic Accounting Investigations (New Jersey: John

Wiley & Sons, 2011).

Nigrini, M.J., Benford’s Law: Applications for Forensic

Accounting, Auditing, and Fraud Detection (New Jersey:

John Wiley & Sons, 2012).

Nigrini, M.J., and Miller, S.J., “Benford’s Law Applied to

Hydrology Data—Results and Relevance to Other

Geophysical Data,” Mathematical Geology 39 (5) (2007),

pages 469–490.

Raimi, R., “The First Digit Problem,” American

Mathematical Monthly 83 (August/September 1976), pages

521–538.
