How to Lie with Statistics

Post on 19-Jul-2015

How to Lie with Statistics

by

Darrell Huff

Summary by Chapters (1-5)

H.G. Wells once remarked, "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." Wells's observation probably reflected the significance, and the consequent prevalence, that statistics and its offshoots had started to gain in the early twentieth century. While some people were enamored of statistical thinking, others always looked at published numbers and painfully strained correlations with a skeptical eye. In 1954, Darrell Huff decided to equip ordinary folks with handy information about statistical pitfalls. 'How to Lie with Statistics' is Huff's compass for the layman, to help him keep number-wielding charlatans at bay. In Huff's own words, the crooks already know these tricks; honest men must learn them in self-defense.

Chapter 1

The sample with a built-in bias: the origin of most statistical problems is the sample. Any statistic is based on some sample (because the whole population can’t be tested), and every sample has some sort of bias, even if the person wanting the statistic tries hard not to create any. A few of the biases that creep in when building a statistic are respondents not replying honestly, the market researcher picking a sample that gives better numbers, respondents shading their answers based on their perception of the market researcher, and data not being available for a certain past period. A figure that is a nice round number, or one that is suspiciously specific, is unlikely to be scientifically accurate. Those who quote such precise figures often have not drawn an appropriate sample, and bad samples arise in all kinds of ways. If the sample is large enough and selected properly, it will represent the whole well; if the sample is too small or its creator too biased, the conclusion will be false but appear scientific. Assume that there is always a bias in the sample.
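
As a rough illustration of a sample that "gives better numbers", the sketch below compares the mean of a whole population with the mean of a sample that only reaches its higher values. The income figures, the threshold, and the function name biased_sample are all hypothetical and chosen only for illustration; they do not come from the book.

```python
from statistics import mean

# Hypothetical annual incomes (in thousands) for a ten-person population.
population = [18, 22, 25, 27, 30, 34, 40, 55, 80, 150]


def biased_sample(values, threshold):
    """Keep only values at or above the threshold.

    This mimics a selection process that favors higher numbers,
    for example a survey that only better-off people answer.
    """
    return [v for v in values if v >= threshold]


sample = biased_sample(population, threshold=40)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"population mean:    {population_mean}")
print(f"biased sample mean: {sample_mean}")
```

The sample mean comes out well above the population mean even though every individual number in the sample is genuine, which is why the selection method matters as much as the figures themselves.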

Chapter 2

The well-chosen average: the word “average” has a loose meaning. People use averages to trick and influence public opinion or to sell products. Not many people know that there are three kinds of average:

Arithmetic Mean – sum of quantities / number of quantities

Median – the middle value when the data is sorted, with half of the data on either side

Mode – the data point that occurs most often in a given set of data

Some averages fall so close together that it isn’t vital to distinguish among them, but when they diverge the mode can be the most revealing because it shows the most common occurrence in a data set.
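
The three averages can be compared on a small skewed data set. This is a minimal sketch using Python's standard statistics module; the salary figures are hypothetical and chosen so that one large value pulls the mean away from the other two.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands); one large value skews the data.
salaries = [15, 15, 15, 20, 25, 30, 45, 135]

mean_salary = mean(salaries)      # sum of quantities / number of quantities
median_salary = median(salaries)  # midpoint of the sorted data
mode_salary = mode(salaries)      # most frequently occurring value

print(f"mean:   {mean_salary}")
print(f"median: {median_salary}")
print(f"mode:   {mode_salary}")
```

All three results can honestly be called "the average salary", yet they differ widely, so the first question to ask of any quoted average is which kind it is.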

Chapter 3

The little figures that are not there: this chapter is about how sample data can be picked so as to prove a desired result. People often use inadequate samples and, instead of writing an honest headline, omit the size of the sample. Picking the sample data "right" can mean picking a sample size, or a small number of trials, that gives the kind of result being looked for. Huff warns us about the data that is missing from the sample. He notes that the law of averages is useful for description and prediction only when a substantial number of trials is involved. The sample size needed depends on how large and how varied the population is. Sometimes the size of a sample can be deceptive. To avoid being fooled, we should ask how significant a result is. When important figures are missing from an average or a graph, it should not be trusted. If the creator doesn’t explain the numbers, give the range, or show any data that deviates, then they are playing dirty.
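
To see why a small number of trials makes a striking result easy to obtain, the sketch below computes the exact chance of getting at least 80% heads from a fair coin for two sample sizes. The coin-toss setting, the sample sizes, and the function name prob_at_least are assumptions made for illustration.

```python
from math import comb


def prob_at_least(heads, tosses):
    """Exact probability of at least `heads` heads in `tosses` fair coin tosses."""
    # Count the outcomes with `heads` or more heads, then divide by all 2**tosses outcomes.
    favorable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favorable / 2 ** tosses


small = prob_at_least(8, 10)       # 80% heads or more in 10 tosses
large = prob_at_least(800, 1000)   # 80% heads or more in 1000 tosses

print(f"10 tosses:   {small:.4f}")
print(f"1000 tosses: {large:.3g}")
```

With 10 tosses the lopsided result turns up roughly once in every 18 attempts, so anyone willing to repeat a small trial a few times can report it; with 1000 tosses it is practically impossible.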

Chapter 4

Much ado about practically nothing: here Huff states that any product of a sampling method carries statistical error. A sample can be taken to represent the whole field of what is being measured, but only within a margin of error that can be expressed in figures. There are two measures of this error: probable error and standard error. The probable error reflects how far off the measuring device is. For example, if we use a measuring scale that is 3 inches off per foot, then each measurement is good only to within +/- 3 inches. This kind of difference becomes important when business decisions are taken based on a positive or negative result. Most statisticians use the standard error, which takes in about two-thirds of the cases. We can only calculate the standard error by knowing the sample’s size. Sometimes people make a big ado about a difference that is demonstrable but tiny and unimportant.
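
A minimal sketch of why the sample size is needed: it uses the common textbook formula for the standard error of a mean (sample standard deviation divided by the square root of the sample size), which is an assumption here since the summary does not spell out a formula. The scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(sample size)."""
    return stdev(sample) / sqrt(len(sample))


# Hypothetical measurements.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(scores)
se = standard_error(scores)

# Under a normal approximation, about two-thirds of repeated sample means
# would be expected to fall within one standard error of the true mean.
low, high = center - se, center + se

print(f"mean: {center}, standard error: {se:.3f}")
print(f"one-standard-error range: {low:.3f} to {high:.3f}")
```

A difference between two reported figures that is smaller than this range is the kind of "demonstrable but tiny" gap that deserves little ado.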

Chapter 5

The Gee-Whiz graph: this is something we see quite often, a graph manipulated so that it shows an inflated or deflated picture of whatever is being plotted. Some tricks include leaving the unit of measure off the axis, or not labeling the axis at all and leaving only numbers, so that readers make their own assumptions.
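
One way a graph can inflate a picture is by starting the vertical axis above zero. The sketch below, with hypothetical values and a hypothetical helper apparent_ratio, compares how large a change looks on an honest axis and on a truncated one; it is an illustration of the general idea rather than a reconstruction of any chart in the book.

```python
def apparent_ratio(first, second, baseline=0.0):
    """Ratio of the drawn heights of two bars when the axis starts at `baseline`."""
    if first <= baseline or second <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (second - baseline) / (first - baseline)


# Hypothetical values: a 10% rise from 20 to 22.
honest = apparent_ratio(20, 22)                    # axis starts at zero
truncated = apparent_ratio(20, 22, baseline=18)    # axis starts at 18

print(f"axis from 0:  second bar looks {honest:.2f}x the first")
print(f"axis from 18: second bar looks {truncated:.2f}x the first")
```

The data is identical in both cases; only the unlabeled starting point of the axis changes how dramatic the rise appears.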