Detecting the dubious digits: Benford's law in forensic accounting


June 2007

Kuldeep Kumar and Sukanto Bhattacharya introduce an unexpected quality in natural numbers that can help to detect faked data

In 1881, an American mathematician by the name of Simon Newcomb noticed, rather to his surprise, that the first few pages of a book of log-tables were much dirtier and more dog-eared than the later pages. The first pages contained the logarithms of numbers beginning with 1, such as 15 792, 14 289, 19 621. The last pages held the logarithms of numbers like 95 712 and 98 124. The first half of the book, corresponding to numbers beginning 2, 3 and 4, was also dirtier than the second half, where the logarithms of numbers beginning with 6, 7 and 8 were found. Obviously people were using the first pages more than they were the last ones [1]. This flew in the face of common sense. Surely numbers were equally and evenly distributed across the spectrum, so people should be looking them up equally evenly. The chance that a number picked at random starts with 1, the chance that it starts with 9, and the chance that it starts with any digit in between should all be the same: one-ninth, or approximately 11%. But Newcomb found that this did not seem to be true. People were looking up the 1s far more frequently. Is it in the nature of things that numbers beginning with 1 are much more common? Newcomb noted his bizarre observation, but got no further.

More than half a century after Newcomb’s discovery, Frank Benford, a mathematical physicist, happened to stumble upon the same phenomenon while going through a large assemblage of numerical data from disparate sources. Unlike his predecessor, Benford vigorously pursued this phenomenon and published his findings in a number of scholarly papers. Thus the phenomenon came to be known as “Benford’s law” and Newcomb’s name sank into obscurity.

Benford collected numerical data on a wide variety of subjects including the area of rivers, population concentration, physical and mathematical constants, birth and death rates, and even numbers which appeared in articles in the Reader’s Digest, all of which appeared to corroborate the phenomenon. More numbers began with 1 than with anything else. The probability that the first digit was less than 5 was significantly greater than the probability that it was greater than 5, and the probability of occurrence diminished logarithmically from the digits 1 to 9. Having gone through a vast volume of data, Benford derived a logarithmic expression for the probability distribution function and estimated the individual probabilities that each of the nine digits was the first significant digit in a randomly observed naturally occurring number, as in Table 1 and Figure 1 [2].
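The article does not print Benford’s logarithmic expression. The form usually quoted is P(d) = log10(1 + 1/d) for a first digit d from 1 to 9, and the short Python sketch below assumes that form. The function name `benford_probability` is made up for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Probability that `digit` (1-9) is the first significant digit,
    assuming the commonly quoted form P(d) = log10(1 + 1/d)."""
    if not 1 <= digit <= 9:
        raise ValueError("first significant digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Tabulate the nine probabilities, rounded to three decimals.
table = {d: round(benford_probability(d), 3) for d in range(1, 10)}

for d, p in table.items():
    print(f"{d}  {p:.3f}")

# Probability that the first digit is less than 5 (digits 1 to 4).
print(f"P(first digit < 5) = {sum(benford_probability(d) for d in range(1, 5)):.3f}")
```

Under that assumption the rounded values agree with the figures listed in Table 1, and the nine probabilities sum to one.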

The defining characteristic of Benford’s law is that it is observable only in naturally occurring numbers, not in numbers that have been artificially concocted. Thus, Benford’s law may be regarded as a veritable signature of Nature, something that cannot be replicated manually. The human brain assumes that natural numbers come distributed linearly. The human brain is wrong.

Apart from its mathematical nicety, this makes Benford’s law extremely useful in detecting fraudulent data, in finance or in science.

Mark Nigrini has pioneered the application of Benford’s law to cases of tax evasion and other types of financial fraud. One of his examples from the stock market is illustrated in Figure 2 [3]. Assume that, in the course of a bull run, a market index starts at 1000 and grows at 20% a year. For the next 15 years the market index values will be as shown in Table 2.

As may be seen, 40% of the values start with the digit 1. Although the observed relative frequencies do not correspond exactly to those predicted by Benford’s law, it is quite obvious that they are strongly biased towards the lower digits.
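The stock market example is easy to reproduce. The sketch below compounds an index from 1000 at 20% a year for 15 years and counts the first digits. The variable names are illustrative. Because it rounds only at the end, one or two of the later values may differ by a unit from those printed in Table 2, but the first digits are not affected.

```python
from collections import Counter

START = 1000      # opening index value
GROWTH = 1.20     # 20% growth per year
YEARS = 15

# Index value at the end of each year, rounded to the nearest whole number.
values = [round(START * GROWTH ** year) for year in range(1, YEARS + 1)]

# First significant digit of each value, and its relative frequency.
first_digits = [int(str(v)[0]) for v in values]
counts = Counter(first_digits)
frequencies = {d: counts.get(d, 0) / YEARS for d in range(1, 10)}

for d in range(1, 10):
    print(f"{d}  {counts.get(d, 0):2d}  {frequencies[d]:.3f}")
```

Six of the fifteen values begin with 1, which is the 40% quoted in the text. No value begins with 9, a reminder that so short a series can only roughly approximate the predicted distribution.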

In the context of financial fraud detection, the more an observed set of accounting data deviates from the pattern predicted by Benford’s law, the higher the chance that the data have been manipulated.

The St Lawrence river: 1900 miles and 25th-longest river in the world. River lengths obey Benford’s law

Interesting research work in financial fraud detection has applied Benford’s law with soft computing technologies such as artificial neural networks to classify numerical data into manipulated and non-manipulated categories [4]. Another published research paper has sought to design a computational algorithm based on Benford’s law for financial investigators and auditors with the aim of simplifying the audit sampling process [5]. In terms of actual application, forensic accountants have successfully used Benford’s law to track down shady transactions in one of the biggest financial frauds of recent times, which ultimately led to the collapse of a major international bank with losses running into millions [6].

According to Benford, whereas humans are accustomed to counting arithmetically as 1, 2, 3, 4, …, Nature counts geometrically as $e^{0}, e^{x}, e^{2x}, e^{3x}, e^{4x}, \ldots$ where e is the mathematical constant approximately equal to 2.718 that acts as the base of the natural logarithm. This leads to skewing of the first-digit frequencies in naturally occurring numbers but not in numbers that have been made up. Moreover, Benford’s distribution displays the useful mathematical characteristic known as base invariance [7]. A base invariant distribution is unaffected by the form in which the data are expressed. In other words, the law holds irrespective of whether the data are expressed in binary, decimal, exponential or any other form. It also holds no matter what units the data are expressed in (scale invariance): it is as valid in miles as in kilometres as in rods, poles or perches. If the lengths of the world’s hundred longest rivers are tabulated in kilometres, 47 of them begin with the digit 1; if they are tabulated in miles, 57 of them begin with the digit 1. In both cases the digit 1 leads by a wide margin, as Benford’s law predicts. Benford found that the law also applied to atomic weights and to the house-numbers in street addresses. Both are essentially naturally occurring.
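Both claims in this paragraph can be checked numerically: geometric counting skews the first digits, and a change of units leaves the pattern intact. The sketch below builds the sequence e^0, e^x, e^2x, … and compares its first-digit frequencies with log10(1 + 1/d), before and after a miles-to-kilometres conversion. The step x = 0.02, the sequence length, and the function names are arbitrary choices for illustration, not values from the article.

```python
import math
from collections import Counter

X = 0.02                 # hypothetical growth step (an assumption)
N = 10_000               # number of terms in the geometric sequence
MILES_TO_KM = 1.609344   # change of units used to probe scale invariance


def first_digit(value: float) -> int:
    """First significant digit of a positive number.

    Scientific notation always begins with the leading digit,
    e.g. 3583.0 -> '3.583000000000000e+03'.
    """
    return int(f"{value:.15e}"[0])


def benford(d: int) -> float:
    """Benford probability for first digit d, assuming log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)


def digit_frequencies(values):
    """Relative frequency of each first digit 1-9 in `values`."""
    counts = Counter(first_digit(v) for v in values)
    return {d: counts.get(d, 0) / len(values) for d in range(1, 10)}


geometric = [math.exp(X * k) for k in range(N)]    # e^0, e^x, e^2x, ...
rescaled = [v * MILES_TO_KM for v in geometric]    # same data in other units

freq_raw = digit_frequencies(geometric)
freq_km = digit_frequencies(rescaled)

print("digit  Benford  geometric  rescaled")
for d in range(1, 10):
    print(f"{d}      {benford(d):.3f}    {freq_raw[d]:.3f}      {freq_km[d]:.3f}")
```

Both frequency columns should track the Benford column closely. An arithmetic sequence such as 1, 2, 3, … truncated at a round number would not.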

When it comes to fraudulent manipulation of accounts, an employee making up a series of false figures to boost his expenses will try to make them look “natural” by spreading the first digits evenly between 1 and 9. His figures will not, of course, fit Benford’s law, and so may be detected. Concocted numbers used to “balance the books” in more complex frauds are also essentially non-natural, because the ultimate objective is to correct for the irregularities created by misappropriation of funds. Figures made up to “plug the gaps” caused by fraud are likely to produce first-digit frequencies that differ from those predicted by Benford’s law.

In financial or in scientific data, failure to fit Benford’s law does not necessarily imply fraud; it merely provides some statistical evidence that the data may have been manipulated. Benford’s law may be a guide but it can also mislead. Data that have been rounded up or down may not conform to Benford’s law. The data must range over at least one order of magnitude, and preferably three or four. Thus, height data of patients, even if genuine, will not follow Benford’s law. All adult humans have heights which, in the metric system, begin with the digit 1 or 2. Humans 3 m tall, or 0.3 m tall, simply do not exist.

Benford’s law is a consequence of randomness, so data that cluster around one value as, for example, in a normal distribution, will not necessarily obey it.

Table 1. Probability distribution of the first significant digit in naturally occurring numbers as predicted by Benford’s law

First digit   Probability
1             0.301
2             0.176
3             0.125
4             0.097
5             0.079
6             0.067
7             0.058
8             0.051
9             0.046

[Figure 1: chart of probability of occurrence (vertical axis, 0 to 0.4) against first significant digit (horizontal axis, 1 to 9)]

Table 2. Observed first digit frequencies of the hypothetical market index value for Nigrini’s stock market example

Year   Index value
1      1200
2      1440
3      1728
4      2074
5      2488
6      2986
7      3583
8      4300
9      5160
10     6192
11     7430
12     8916
13     10700
14     12840
15     15408

Figure 2. Relative frequencies of first digits in the hypothetical market index value for Nigrini’s stock market example
[Chart titled “Nigrini’s stock market example”: observed relative frequency (vertical axis, 0.00 to 0.45) against first significant digit of stock market index value (horizontal axis, 1 to 9)]

Benford’s law is a signature of numbers that occur in nature.

Benford’s law also flows from this invariance: the same degree of randomness must exist no matter what units the data are expressed in. It does not apply to data which have no units, that is, dimensionless numbers. Lottery draws, for example, do not follow Benford’s law. If winning numbers are selected truly at random from a range of 1 to 99, the winning tickets will be evenly distributed, with as many beginning with 9 as with 1. Similarly, random-number generators, whether electronic or consisting of numbered balls falling out of a container, do not follow Benford’s law.
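The lottery claim can be verified by counting rather than by simulation. The sketch below lists every possible winning number from 1 to 99 and tallies the first digits. It assumes each number is equally likely to be drawn, as the text stipulates.

```python
from collections import Counter

# Every possible winning number in a draw from 1 to 99, each equally likely.
tickets = range(1, 100)

# Tally of first significant digits across all possible outcomes.
counts = Counter(int(str(n)[0]) for n in tickets)

for d in range(1, 10):
    share = counts[d] / len(tickets)
    print(f"{d}  {counts[d]:2d}  {share:.3f}")
```

Each digit from 1 to 9 leads exactly 11 of the 99 numbers (the digit itself and the ten numbers in its decade), so every first digit has probability one-ninth and none of the logarithmic skew appears.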

Since the initial studies of the first-digit phenomenon by Benford, statisticians and mathematicians have been studying it at length and have established similar laws for second and subsequent digits. However, in terms of practical use in financial fraud detection to date, it is Benford’s original first-digit law that clearly dominates the others.

In conclusion, Benford’s first-digit law presents a counter-intuitive but nevertheless easy-to-implement data mining technique where the primary objective is to examine the authenticity or otherwise of a set of accounting data from an auditor’s or financial investigator’s perspective. A goodness-of-fit test applied to an observed accounting data set in order to unearth statistical evidence of non-conformity with Benford’s law cannot be considered conclusive, but it is one of several investigation tools that a forensic accountant needs in detecting fraud.
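The article does not say which goodness-of-fit test the authors have in mind. Pearson’s chi-square test is one common choice, and the sketch below assumes it, comparing observed first-digit counts with the counts expected under log10(1 + 1/d). Both data sets are synthetic and the function names are hypothetical. The threshold 15.507 is the 5% critical value of the chi-square distribution with 8 degrees of freedom.

```python
import math
from collections import Counter

# 5% critical value of chi-square with 8 degrees of freedom (9 digits - 1).
CRITICAL_5PCT_DF8 = 15.507


def first_digit(value: float) -> int:
    """First significant digit of a non-zero number (sign ignored)."""
    return int(f"{abs(value):.15e}"[0])


def benford_chi_square(values) -> float:
    """Pearson chi-square statistic of first digits against Benford's law."""
    values = [v for v in values if v != 0]   # zero has no first significant digit
    n = len(values)
    observed = Counter(first_digit(v) for v in values)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (observed.get(d, 0) - expected) ** 2 / expected
    return statistic


# Synthetic data, for illustration only.
growth_like = [10 ** (k / 225) for k in range(900)]   # geometric, spans four decades
evenly_spread = list(range(100, 1000))                # first digits spread evenly

for name, data in [("growth_like", growth_like), ("evenly_spread", evenly_spread)]:
    stat = benford_chi_square(data)
    verdict = "flag for review" if stat > CRITICAL_5PCT_DF8 else "no evidence of non-conformity"
    print(f"{name:14s} chi-square = {stat:8.2f}  -> {verdict}")
```

A large statistic only marks a data set for closer inspection. As the text stresses, it is not proof of manipulation, and a small statistic does not certify the data as genuine.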

References

1. Newcomb, S. (1881) Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4, 39–40.

2. Benford, F. (1938) The law of anomalous numbers. Proceedings of the American Philosophical Society, 78, 551–572.

3. Nigrini, M. (1999) I’ve got your number. Journal of Accountancy, 187, 79–83.

4. Busta, B. and Weinberg, R. (1998) Using Benford’s law and neural networks as a review procedure. Managerial Auditing Journal, 13, 356–366.

5. Kumar, K. and Bhattacharya, S. (2003) Benford’s Law and its application in financial fraud detection. Advances in Financial Planning and Forecasting, 11, 57–70.

6. Dalal, C. (2000) Numbers that do not add up. Business India, Jan., 10–23.

7. Hill, T. P. (1998) The first digit phenomenon. American Scientist, 86, 358–363.

Kuldeep Kumar is at the Faculty of Business, Bond University, Australia. Sukanto Bhattacharya is at the Department of Business Administration, Alaska Pacific University, USA.

River Ganges: 1560 miles and 39th-longest river in the world

River Danube: 1771 miles and 29th-longest river in the world
