Posted on 19-Jul-2015
How to Lie with Statistics
by
Darrell Huff
Summary by Chapters (1-5)
H.G. Wells once remarked, "Statistical thinking will one day be as necessary for
efficient citizenship as the ability to read and write." Wells' observation might
have emanated from the significance and the consequent prevalence that
Statistics and its offshoots had started to gain in the early nineteenth century.
While some people were enamored of statistical thinking, others always looked
at published numbers and painfully derived correlations with a skeptical eye.
In 1954, Darrell Huff decided to equip ordinary folks with handy information
about statistical pitfalls. 'How to Lie with Statistics' is Huff's compass for
the layman, to help him keep number-wielding charlatans at bay. In Huff's own
words, the crooks already know these tricks; honest men must learn them in
self-defense.
Chapter 1
The sample with a built-in bias: the origin of most statistical problems is
the sample. Any statistic is based on some sample (because the whole population
can’t be tested), and every sample has some sort of bias, even if the person
wanting the statistic tries hard not to create any. A few of the biases that
creep in when building a statistic: respondents not replying honestly, the
market researcher picking a sample that gives better numbers, answers being
colored by the respondent’s perception of the market researcher, and data not
being available for a certain past time. Any figure that is a nice round number
or suspiciously specific is unlikely to be scientifically accurate. Those who
quote such precise figures often haven’t drawn an appropriate sample, and
samples can go bad in all kinds of ways. If the sample is large enough and
selected properly, it will represent the whole well; if the sample is too small
or its creator too biased, the conclusion will be false but appear scientific.
Assume that there is always some bias in the sample.
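To see how a badly selected sample shifts a figure, here is a minimal sketch in Python. Everything in it is made up for illustration: the "population" is just the numbers 1 to 10,000, and the hypothetical `cutoff` stands in for any selection rule that only reaches part of the population.

```python
import random
from statistics import mean

def compare_samples(seed=0, pop_size=10_000, n=2_000, cutoff=3_000):
    """Return (population mean, fair-sample mean, biased-sample mean)."""
    rng = random.Random(seed)
    # Hypothetical population: the values 1..pop_size.
    population = list(range(1, pop_size + 1))
    # Fair sample: every member has the same chance of being picked.
    fair = rng.sample(population, n)
    # Biased sample: only members above the cutoff can be reached at all.
    reachable = [x for x in population if x > cutoff]
    biased = rng.sample(reachable, n)
    return mean(population), mean(fair), mean(biased)

if __name__ == "__main__":
    pop_mean, fair_mean, biased_mean = compare_samples()
    print("population:", pop_mean)
    print("fair sample:", fair_mean)
    print("biased sample:", biased_mean)
```

Both samples are the same size, so the biased one looks just as "scientific", yet its mean sits well above the true one.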
Chapter 2
The well-chosen average: the word “average” has a loose meaning. People
use averages to trick and influence public opinion or sell products. Not many
people know that there are three common kinds of average:
Arithmetic Mean – sum of quantities / number of quantities
Median – the middle value when the data is sorted, which splits the data into
two halves
Mode – the data point that occurs the most in a given set of data
Some averages fall so close together that it isn’t vital to distinguish among
them, but when they differ the mode can be the most revealing because it shows
the most common occurrence in a data set.
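A small sketch of how far apart the three averages can land, using Python's standard `statistics` module. The salary figures are invented for illustration and do not come from the book.

```python
from statistics import mean, median, mode

# Hypothetical salaries: many low earners and one very high earner.
salaries = [2000] * 5 + [3000] * 3 + [5000] * 2 + [45000]

def three_averages(values):
    """Return the arithmetic mean, median and mode of the values."""
    return mean(values), median(values), mode(values)

if __name__ == "__main__":
    avg_mean, avg_median, avg_mode = three_averages(salaries)
    print("mean:  ", round(avg_mean, 2))
    print("median:", avg_median)  # 3000
    print("mode:  ", avg_mode)    # 2000
```

The single large salary pulls the mean to more than three times the mode, so whoever quotes "the average" here can choose the figure that suits the story.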
Chapter 3
The little figures that are not there: This chapter is about how sample data
can be picked to prove a desired result. People often use inadequate samples
and, instead of writing an honest headline, omit the size of their sample.
Picking the sample "right" can mean picking a sample size, or a small number of
trials, that gives the kind of results we are looking for. Huff warns us about
the data that is missing from the report. He says that the law of averages is
useful for descriptions and predictions only when enough cases are involved.
The sample size needed depends on how large the population is and how varied
it is. Sometimes the sample size itself can be deceptive. To avoid being
fooled, we should ask how significant the result is. When important figures are
missing from an average or graph, it can’t be trusted. If the creator doesn’t
explain the numbers, give the range, or show any data that deviates, then they
are playing dirty.
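The effect of a small number of trials can be sketched with simulated coin tosses. This is an illustrative simulation, not a calculation from the book; the function name, the seed and the trial counts are all assumptions chosen for the example.

```python
import random
from statistics import pstdev

def head_fraction_spread(tosses_per_trial, trials=200, seed=0):
    """Spread (standard deviation) of the fraction of heads across trials."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(trials):
        # Count heads in one trial of a simulated fair coin.
        heads = sum(rng.random() < 0.5 for _ in range(tosses_per_trial))
        fractions.append(heads / tosses_per_trial)
    return pstdev(fractions)

if __name__ == "__main__":
    print("10 tosses per trial:  ", head_fraction_spread(10))
    print("1000 tosses per trial:", head_fraction_spread(1000))
```

With only 10 tosses per trial the fraction of heads swings widely from trial to trial, so it is easy to find one trial that "proves" almost anything; with 1000 tosses the results cluster tightly around one half.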
Chapter 4
Much ado about practically nothing: Here Huff states that any product of a
sampling method carries statistical error. A sample can be taken to represent
the whole field being measured, and the result can be expressed in figures,
but only within a margin of error. There are two common measures of this
error: the probable error and the standard error. The probable error gives the
margin within which a measurement falls half the time. For example, if a
measuring method lands within 3 units of the true value in half of the trials,
a reading from it should be quoted as +/- 3. This kind of margin becomes
important when business decisions are taken based on a positive or negative
result. Most statisticians use the standard error, which takes in about
two-thirds of the cases. We can only calculate the standard error by knowing
the sample’s size. Sometimes people make a big ado about a difference that is
demonstrable but tiny and unimportant.
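As a rough sketch of the two error measures, the code below uses the usual textbook formulas: the standard error of a mean is the sample standard deviation divided by the square root of the sample size, and, assuming normally distributed errors, the probable error is about 0.6745 times the standard error. The data and the helper `ranges_overlap` are hypothetical.

```python
import math
from statistics import stdev

# Assuming normal errors, half of all cases fall within +/- 0.6745
# standard errors of the mean.
PROBABLE_ERROR_FACTOR = 0.6745

def standard_error(values):
    """Standard error of the mean: sample std dev / sqrt(sample size)."""
    return stdev(values) / math.sqrt(len(values))

def probable_error(values):
    """Margin that contains the true mean about half the time."""
    return PROBABLE_ERROR_FACTOR * standard_error(values)

def ranges_overlap(a, b, margin):
    """True if a +/- margin and b +/- margin share any values."""
    return abs(a - b) <= 2 * margin

if __name__ == "__main__":
    readings = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements
    print("standard error:", standard_error(readings))
    print("probable error:", probable_error(readings))
    # Two scores three points apart, each quoted as +/- 3.
    print("98 vs 101 overlap:", ranges_overlap(98, 101, 3))  # True
```

When the two ranges overlap, the difference between the figures may be nothing more than measurement error, which is the "much ado" of the chapter title.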
Chapter 5
The Gee-Whiz graph: This one is something that we see quite often: a graph
manipulated so that it shows an inflated or deflated picture of whatever is
being plotted. Some tricks include leaving out the units of measure on an axis,
or not labelling the axis at all and showing only numbers, letting the reader
make his/her own assumptions.
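One common way a graph inflates a change, not spelled out in the summary above, is to start the value axis above zero. The sketch below works out how large a change looks when bars are drawn from a raised baseline. The function name and the numbers are illustrative assumptions.

```python
def apparent_ratio(old, new, axis_start=0.0):
    """Ratio of drawn bar heights when the value axis starts at axis_start."""
    if axis_start >= min(old, new):
        raise ValueError("axis_start must lie below both values")
    # Each bar is drawn only from axis_start up to its value.
    return (new - axis_start) / (old - axis_start)

if __name__ == "__main__":
    # A rise from 20 to 22 is a 10% increase.
    print("axis from 0: ", apparent_ratio(20, 22))      # bars differ by 10%
    print("axis from 18:", apparent_ratio(20, 22, 18))  # second bar is twice as tall
```

The data is identical in both cases; only the starting point of the axis changes. If the axis is also left unlabelled, the reader has no way to notice.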