Torturing numbers - Descriptive Statistics for Growers (2013)

Post on 12-Apr-2017

Torturing Numbers

Dr. Jason S.T. Deveau

Application Technology Specialist

OMAFRA, Simcoe Station

A Grower’s Guide to Descriptive Statistics

"If you torture the data long enough, it will confess" – Ronald Harry Coase, Economist

why do we need statistics?

• Descriptive statistics are math tools we use to:

Describe data

Find trends in data against variation

Determine if a sample represents a population

Draw conclusions about data

describing data

• In 1950, 25 university graduates were asked what they earned in their first year of work

$45,000

$15,000

$10,000

$5,700

$5,000

$3,700

$3,000

$2,000

$2,000

$2,000

$10,000

$5,000

$2,000

$2,000

$5,000

$2,000

$3,700

$3,700

$3,700

$2,000

$2,000

$2,000

$2,000

$2,000

$2,000

• What do these data tell you?

describing data

• Here is the same data ordered from greatest to least and weighted to show how many times each value occurs in the data set

• Now what do the data tell you?

• What is the average income?

[Chart: the incomes ordered from greatest to least ($45,000, $15,000, $10,000, $5,700, $5,000, $3,700, $3,000, $2,000), each weighted by how many times it occurs]

describing data

• BEWARE! The reported ‘average’ might depend on what you are meant to see. Which would you use on your taxes?

MEAN (arithmetic average)

MEDIAN (middle value when the data are ordered)

MODE (most frequent)

• So, to really understand the data set you need more than just the ‘average’
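As a quick check of the three ‘averages’, here is a minimal Python sketch using the 25 incomes listed earlier. The variable names are illustrative and not from the talk; only the standard library is assumed.

```python
from statistics import mean, median, mode

# The 25 first-year incomes from the slide, in dollars.
incomes = [
    45000, 15000, 10000, 5700, 5000,
    3700, 3000, 2000, 2000, 2000,
    10000, 5000, 2000, 2000, 5000,
    2000, 3700, 3700, 3700, 2000,
    2000, 2000, 2000, 2000, 2000,
]

income_mean = mean(incomes)      # arithmetic average
income_median = median(incomes)  # middle value of the sorted list
income_mode = mode(incomes)      # most frequent value

print(f"mean   = ${income_mean:,.0f}")
print(f"median = ${income_median:,.0f}")
print(f"mode   = ${income_mode:,.0f}")
```

The same 25 numbers support $5,700, $3,000 or $2,000 as ‘the average’, depending on which measure is reported.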

spread and variability

• You need to know the spread of the data

• This histogram shows the ages of smart people that attend spray demos

• Is it typical for 90-year-olds to attend spray demos?

spread and variability

• When the data form a symmetrical bell shape, with the mean and median the same, you have a special situation called a ‘normal’ curve

• On this symmetrical curve, the variability can be described using standard deviations (SD)

spread and variability

• SD is a way to determine how far a data point is from the mean

• You can now say that 90-year-olds fall more than 2 SD from the mean, or that they make up less than 2.5% of the data set
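The link between "more than 2 SD from the mean" and "less than 2.5% of the data set" can be checked with a short sketch. It assumes the data really follow a normal curve; the function names and the example mean and SD for attendee age are hypothetical, since the talk's histogram values are not given in the text.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_above(z):
    """Fraction of a normal data set lying more than z SD above the mean."""
    return 1 - standard_normal.cdf(z)

def z_score(value, mean, sd):
    """How many SD a value sits from the mean."""
    return (value - mean) / sd

print(f"share beyond +2 SD: {share_above(2):.2%}")

# Hypothetical numbers for illustration only (not from the talk):
# mean attendee age 55, SD 15.
print(f"z-score of a 90-year-old: {z_score(90, mean=55, sd=15):.2f}")
```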

spread and variability

• If we collapse the whole data set to one bar, we can show the mean with some measure of variability (std dev, std error, etc.)

• Without some indication of variability, you cannot effectively compare two data sets
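A minimal sketch of what "the mean with some measure of variability" looks like in numbers. It borrows two columns of wheat yields from the table in this talk; the helper name `summarize` is made up for this example, and the sample (n - 1) form of the standard deviation is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Wheat yields (kg/plot) for two of the fertilizers in this talk's table.
fertilizer_1 = [64.8, 60.5, 63.4, 48.2, 55.5]
fertilizer_3 = [65.8, 73.2, 59.5, 66.3, 70.2]

def summarize(values):
    """Return (mean, sample SD, standard error of the mean)."""
    sd = stdev(values)           # sample standard deviation (n - 1)
    se = sd / sqrt(len(values))  # standard error of the mean
    return mean(values), sd, se

for name, values in [("Fertilizer 1", fertilizer_1), ("Fertilizer 3", fertilizer_3)]:
    m, sd, se = summarize(values)
    print(f"{name}: mean {m:.1f}, SD {sd:.1f}, SE {se:.1f} kg/plot")
```

Note that the standard error is always smaller than the standard deviation, so a chart should say which one its error bars show.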

spread and variability

• Often, data sets are skewed. Here is the effect of a new herbicide on quackgrass.

• Means and standard deviations don’t help here…

spread and variability

• Perhaps the best way to describe any data set is with five numbers: Minimum, Q1, Median, Q3, Maximum. This helps when comparing data sets, and when there are oddities called outliers

[Box plot: Min, Q1, Median, Q3 and Max marked, with 25% of the data in each of the four sections and an outlier marked]
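The five-number summary can be sketched with the income data from earlier in the talk. Two things here are assumptions rather than part of the talk: quartile conventions differ slightly between software packages (this uses the Python standard library's default), and the 1.5 × IQR rule for flagging outliers is one common convention among several.

```python
from statistics import quantiles

# The 25 first-year incomes from earlier in the talk, in dollars.
incomes = [
    45000, 15000, 10000, 5700, 5000,
    3700, 3000, 2000, 2000, 2000,
    10000, 5000, 2000, 2000, 5000,
    2000, 3700, 3700, 3700, 2000,
    2000, 2000, 2000, 2000, 2000,
]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4)  # three cut points -> four quarters
    return min(values), q1, med, q3, max(values)

def outliers(values, k=1.5):
    """Values more than k * IQR outside the quartiles (assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(v for v in values if v < low or v > high)

print(five_number_summary(incomes))
print(outliers(incomes))
```

For skewed data like these, the summary shows at a glance that half the graduates earned $3,000 or less, which the mean alone hides.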

a sample study

• Researchers want to know which of three fertilizers produces the highest wheat yield in kg/plot

a sample study

• They design a study with three treatments and five replications for each treatment

[Diagram: plot layout labelled "3 Treatments (Fertilizers 1, 2 and 3)" and "5 Replicates"]

a sample study

• Could a nearby forest or river be a confounding variable?

• Variables like soil type and other local influences may have unexpected impacts…

a sample study

• This is why a good study is randomized, to defeat potentially confounding variables

• Does the sample plot in our study represent all the wheat in all the world?

[Diagram: POPULATION and SAMPLE]
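One simple way to randomize the study above is to shuffle the treatments across all 15 plots. The sketch below assumes a completely randomized layout; the talk does not say which scheme the researchers used, and grouping plots into blocks by field position is another common option. The function name and the seed are made up for this example.

```python
import random

def randomize_plots(treatments, replicates, seed=None):
    """Return a shuffled list assigning one treatment to each plot."""
    plots = [t for t in treatments for _ in range(replicates)]
    random.Random(seed).shuffle(plots)  # seed only makes the layout repeatable
    return plots

fertilizers = ["Fertilizer 1", "Fertilizer 2", "Fertilizer 3"]
layout = randomize_plots(fertilizers, replicates=5, seed=2013)

for plot_number, treatment in enumerate(layout, start=1):
    print(f"plot {plot_number:2d}: {treatment}")
```

Because chance decides which plot gets which fertilizer, a nearby forest or river is unlikely to favour one treatment consistently.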

uncertainty

• With all the unknown variables, there will always be a degree of uncertainty about whether our sample represents the population

• That’s why the more samples we have, the more confident we are that our study represents the population
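The effect of sample size can be sketched with the usual margin-of-error formula for a mean, z × SD / √n. This uses the normal approximation, which is only rough for very small samples, and the SD of 5 kg/plot is a hypothetical value chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sd, n, confidence=0.95):
    """Approximate half-width of a confidence interval for a mean.

    Uses the normal approximation, which is only rough for small n.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sd / sqrt(n)

# Hypothetical plot-to-plot SD of 5 kg/plot, for illustration only.
for n in (5, 20, 80):
    print(f"n = {n:2d}: +/- {margin_of_error(5.0, n):.2f} kg/plot")
```

Quadrupling the number of samples halves the margin of error, so extra samples help, but with diminishing returns.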

confidence

• Any confidence level could be used, but 95% is often chosen

• This means that if the study were repeated many times, about 95% of the intervals calculated this way would capture the true population value

• BEWARE reports with no confidence interval
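As a worked illustration, here is a sketch of 95% confidence intervals for the mean yield of each fertilizer, using the wheat data from the table in this talk. It assumes the yields are roughly normal and uses Student's t with 4 degrees of freedom (critical value 2.776 from a standard t-table); the talk itself does not say how its intervals were calculated.

```python
from math import sqrt
from statistics import mean, stdev

# Wheat yields (kg/plot) from the table in this talk.
yields = {
    "Fertilizer 1": [64.8, 60.5, 63.4, 48.2, 55.5],
    "Fertilizer 2": [56.5, 53.8, 59.4, 61.1, 58.8],
    "Fertilizer 3": [65.8, 73.2, 59.5, 66.3, 70.2],
}

# Two-sided 95% critical value of Student's t for 4 degrees of freedom
# (5 replicates - 1), taken from a standard t-table.
T_95_DF4 = 2.776

def confidence_interval_95(values):
    """95% confidence interval for the mean of five replicates."""
    if len(values) != 5:
        raise ValueError("T_95_DF4 only applies to samples of five")
    centre = mean(values)
    half_width = T_95_DF4 * stdev(values) / sqrt(len(values))
    return centre - half_width, centre + half_width

for name, values in yields.items():
    low, high = confidence_interval_95(values)
    print(f"{name}: {low:.1f} to {high:.1f} kg/plot")
```

In this sketch the intervals for Fertilizers 1 and 3 overlap, so a chart of means alone would hide how uncertain the comparison is with only five replicates.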

two ways to present data

Wheat yield (kg/plot)

Fertilizer 1    Fertilizer 2    Fertilizer 3
64.8            56.5            65.8
60.5            53.8            73.2
63.4            59.4            59.5
48.2            61.1            66.3
55.5            58.8            70.2

• Tables are the preferred way to show data, but graphs paint a quick, easy and seductive picture

drawing conclusions

• A presenter may want you to see a relationship between two variables

• Fertilizer 3 appears to increase the average yield of wheat – but what kind of average is this? How big was the sample? Where is the indication of variability? Where is the confidence interval?

• Bad stats and bad experimental design may lead to bad conclusions

[Chart label: 2 SD]

drawing conclusions

• Correlation does not imply causation

The more firemen fighting a fire, the bigger the fire is observed to be. Therefore more firemen cause an increase in the size of a fire.

• Often, a presenter wants to lead you to a conclusion. Newspapers, TV and online articles should be scrutinized!

• BEWARE:

• “This is not a scientific poll…”

• “These results may not be representative of the population”

• “…based on a list of those that responded”

• “Data showed a trend but was not statistically significant” (I’ve used this one!!!)

it’s all in how you show it

it’s all in how you show it

• Pies are for eating, and possibly throwing…

• It’s very hard to see differences

• BEWARE CHARTJUNK!

it’s all in how you show it

• Amusing graphics are nothing but distractions

• Again, it’s very hard to see differences

• BEWARE CHARTJUNK!

it’s all in how you show it

• Here is the same population growth data shown on two scales. Which would you use to demonstrate rapid growth?

• BEWARE tricky scales!

it’s all in how you show it

• BEWARE statements with no context. Here’s a made-up example, but it’s no worse than other ‘factoids’ I’ve encountered

Did you know that even speaking to someone that once sprayed pesticides DOUBLES your chance of getting cancer?!

• Your odds go from 0.000000001:1 to 0.000000002:1
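The arithmetic behind the made-up factoid can be sketched as follows: the relative change is large, while the absolute change is tiny. The variable and function names are illustrative only.

```python
from fractions import Fraction

# Odds from the made-up example, written as "x:1".
odds_before = Fraction(1, 1_000_000_000)  # 0.000000001:1
odds_after = Fraction(2, 1_000_000_000)   # 0.000000002:1

def odds_to_probability(odds):
    """Convert odds of x:1 into a probability."""
    return odds / (1 + odds)

relative_change = odds_after / odds_before
absolute_change = odds_to_probability(odds_after) - odds_to_probability(odds_before)

print(f"relative change: {float(relative_change):.0f}x")
print(f"absolute change in probability: {float(absolute_change):.1e}")
```

"DOUBLES" describes the relative change; the absolute change is less than one in a billion, which is the context the headline leaves out.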

conclusion

• We started by stating that descriptive statistics are tools

• Like any tool, stats can be misused (intentionally or unintentionally)

• Maintain a healthy scepticism and question charts, tables and conclusions where insufficient information is provided

Three statisticians were hunting when they came across a big buck. The first statistician fired, but missed by a meter to the left. The second statistician fired, but missed by a meter to the right.

The third statistician threw down his rifle and cheered “We got it!”

…one last joke

- The Cartoon Guide to Statistics (1993), Larry Gonick and Woollcott Smith

references

- How to Lie with Statistics (1954), Darrell Huff

Tom Wolf (@nozzle_guy)

Jason Deveau (@spray_guy)

Learn more about spraying

www.sprayers101.com