Chapter 1 Why Statistics?. 2 Learning can result from: Critical thinking Asking an authority...


Chapter 1

Why Statistics?

Learning can result from:
– Critical thinking
– Asking an authority
– Religious experience

However, collecting DATA is the surest way to learn about the world

Data in the Sciences are messy

At first glance, data often look like an incoherent jumble of numbers

How do we make sense of data?

Statistical procedures are tools for learning about the world by Learning from Data.

Real Data!

To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets:
– “The Smoking Study”
– “The Maternity Study”

The Smoking Study

From the University of Wisconsin Center for Tobacco Research and Intervention

608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking

The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book

The Maternity Study

From the Wisconsin Maternity Leave and Health Project

244 families provided data on marital satisfaction, child-rearing styles, and other household events

The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book

Variability

Why are data messy? Consider a concrete example:

Depression scores (“CESD”) for participants in the Smoking Study

Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 11, or 7, or some other value

These data are messy in that the scores are different from one another

Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.

Sources of Variability

It is easy to see that depression scores are variable, but why?
– Individual differences
  Some people are more depressed than others
  Some people have difficulty reading and understanding the questions on the test
  Some people answer the questions more honestly than others
– Procedure
  Differences in the ways the data were collected
– Conditions or Treatments
  The conditions that are imposed on the participants of the study

Populations and Samples

Statistical Population – a collection or set of measurements of a variable that share some common characteristic

Sample – a subset of measurements from a population

Random sample – a sample selected such that every score in the population has an equal chance of being included
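
The definition of a random sample can be illustrated with a short Python sketch. The population below is a made-up list of scores (an assumption for illustration, not data from either study), and `draw_random_sample` is a hypothetical helper name. It relies on the standard library's `random.sample`, which gives every position in the list the same chance of being chosen.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Draw a simple random sample of n scores without replacement.

    Every score in the population has an equal chance of being included.
    """
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population of 20 scores (illustrative values only).
population = [3, 7, 0, 11, 2, 5, 9, 4, 6, 1, 8, 10, 2, 7, 0, 3, 5, 12, 4, 6]

sample = draw_random_sample(population, 5, seed=1)
print(sample)
```

Passing a seed is only a convenience for reproducing the same sample; leaving it out gives a different sample on each run.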

Chapter 2

Frequency Distributions and Percentiles

Variability (revisited)

Collecting Data means measuring a variable

Those measurements differ (vary) from one another

One way to organize and summarize a set of measurements is to construct a frequency distribution

These methods can be applied to both populations and samples

Example

5 13 17 20 19 35 21 28 3 22

26 13 30 30 30 32 40 27 14 4

27 33 28 45 29 25 38 35 33 39

5 4 20 24 25 27 16 25 38 9

36 20 18 11 12 23 22 27 32 49

22 30 0 32 4 23 9 29 22 23

YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

Example

0 3 4 4 4 5 5 9 9 10

11 13 13 14 16 17 18 19 20 20

20 21 22 22 22 22 23 23 23 24

25 25 25 26 27 27 27 27 28 28

29 29 30 30 30 30 32 32 32 33

33 35 35 36 38 38 39 40 45 49

YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

A Better Summary?

Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Proportion
 0 - 4            5          .083                  5                     .083
 5 - 9            4          .067                  9                     .150
10 - 14           5          .083                 14                     .233
15 - 19           4          .067                 18                     .300
20 - 24          12          .200                 30                     .500
25 - 29          12          .200                 42                     .700
30 - 34           9          .150                 51                     .850
35 - 39           6          .100                 57                     .950
40 - 44           1          .017                 58                     .967
45 - 49           2          .033                 60                    1.000
Total (n)        60         1.000

YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
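
A minimal Python sketch of how a grouped frequency distribution like this one could be built. It uses the 60 YRSMK values exactly as listed in the unsorted example above. The function name `frequency_table` and the choices of a class width of 5 and a starting point of 0 are assumptions made to match the class intervals on the slide.

```python
# YRSMK values for the first 60 participants, copied from the unsorted example.
yrsmk = [
    5, 13, 17, 20, 19, 35, 21, 28, 3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14, 4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
    5, 4, 20, 24, 25, 27, 16, 25, 38, 9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30, 0, 32, 4, 23, 9, 29, 22, 23,
]


def frequency_table(scores, width=5):
    """Return one row per class interval:
    (lower, upper, frequency, relative frequency,
     cumulative frequency, cumulative proportion).

    Assumes non-negative scores and class intervals starting at 0.
    """
    n = len(scores)
    rows = []
    cumulative = 0
    lower = 0
    while lower <= max(scores):
        upper = lower + width - 1
        freq = sum(lower <= s <= upper for s in scores)  # scores in this class
        cumulative += freq
        rows.append((lower, upper, freq, freq / n, cumulative, cumulative / n))
        lower += width
    return rows


table = frequency_table(yrsmk)
for lower, upper, freq, rel, cum, cum_prop in table:
    print(f"{lower:2d} - {upper:2d}  {freq:3d}  {rel:.3f}  {cum:3d}  {cum_prop:.3f}")
```

Changing `width` gives a coarser or finer summary of the same scores, which is a quick way to see how the choice of class interval affects the picture.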

Graphing Distributions

Percentiles

We have been focusing on distributions rather than individual scores

Sometimes, individual scores are of great importance

Computing Percentiles, when n = 608:

The 50th percentile is the “middle” score. It is the 304th sorted score (608 × 0.50 = 304).

The 32nd percentile is at position 608 × 0.32 = 194.56, i.e., the 195th sorted score.
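
The position arithmetic above can be sketched in Python. The helper name `percentile_position` is hypothetical, and rounding a fractional position up to the next whole position is an assumption: it agrees with the two worked cases on the slide (194.56 becomes 195), but the slide does not state the rounding rule.

```python
import math


def percentile_position(n, percentile):
    """1-based position, in the sorted scores, of the given percentile.

    Assumption: a fractional position is rounded up to the next whole position.
    """
    return math.ceil(n * percentile / 100)


print(percentile_position(608, 50))  # → 304
print(percentile_position(608, 32))  # → 195
```

To get the percentile itself, sort the scores and read off the score at that position (position 1 is the smallest score).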

Percentile Rank

The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value

Computing percentile rank for YRSMK:

Sort the variable; call the result YRSMK_sorted

The percentile rank of 9 is 50/608 = 0.082 as a proportion, or about 8 percent, so 9 is the 8th percentile

The percentile rank of 21 is 246/608 = 0.4046053 as a proportion, or about 40 percent, so 21 is the 40th percentile
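
A minimal Python sketch of the percentile rank definition. The full 608-participant data set is not reproduced in these slides, so the sketch uses only the 60 YRSMK values from the earlier example; its results therefore differ from the 608-participant figures above. The function name `percentile_rank` is hypothetical.

```python
# YRSMK values for the first 60 participants, copied from the unsorted example.
yrsmk = [
    5, 13, 17, 20, 19, 35, 21, 28, 3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14, 4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
    5, 4, 20, 24, 25, 27, 16, 25, 38, 9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30, 0, 32, 4, 23, 9, 29, 22, 23,
]


def percentile_rank(scores, value):
    """Percent of the measurements that fall below the given score value."""
    below = sum(s < value for s in scores)  # count scores strictly below value
    return 100 * below / len(scores)


# Among these 60 scores, 21 scores are below 21, so the rank is 35 percent.
print(percentile_rank(yrsmk, 21))  # → 35.0
# Seven of the 60 scores are below 9 (about 11.7 percent).
print(round(percentile_rank(yrsmk, 9), 1))
```

Counting scores strictly below the value follows the definition on the slide; some textbooks also count half of the tied scores, which would give slightly different ranks.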

Graphing Distributions

Graphing distributions is a very valuable tool for highlighting features of the data
– Shape
– Range
– Central Tendency
– Variability

Shape

We classify the shape of distributions in three ways:
– Symmetry – is one half a mirror image of the other half?
– Skew – are there high/low frequencies of low/high scores?
– Modality – how many humps or modes?

Symmetry

Is one half of the distribution a mirror image of the other (along a vertical axis)?

Three examples of symmetrical distributions:

Skew

Positive – high frequencies of low values and low frequencies of high values

Negative – low frequencies of low values and high frequencies of high values

Modality

How many humps (or modes)?

Unimodal Bimodal

Characterizing Shape

Asymmetric, Negatively Skewed

Bimodal

Asymmetric, Positively Skewed

Unimodal

Central Tendency and Variability

In addition to shape, distributions differ in terms of:
– Central Tendency – scores near the center of the distribution; where the scores “tend” to be
– Variability – the degree to which scores differ from one another; the “spread” of the scores
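
As a rough illustration of these two ideas, the Python sketch below summarizes the 60 YRSMK values from the earlier example. The slide does not name specific measures. The median and mean (for central tendency) and the range (for variability) are common choices used here as an assumption, not the author's stated method.

```python
import statistics

# YRSMK values for the first 60 participants, copied from the unsorted example.
yrsmk = [
    5, 13, 17, 20, 19, 35, 21, 28, 3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14, 4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
    5, 4, 20, 24, 25, 27, 16, 25, 38, 9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30, 0, 32, 4, 23, 9, 29, 22, 23,
]

# Central tendency: where the scores "tend" to be.
median_years = statistics.median(yrsmk)
mean_years = statistics.mean(yrsmk)

# Variability: how much the scores differ from one another.
score_range = max(yrsmk) - min(yrsmk)

print("median:", median_years)
print("mean:", mean_years)
print("range:", score_range)
```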

Comparing Distributions

It is very useful to be able to compare and contrast distributions (name their similarities and differences)

Distributions can differ in terms of shapes, central tendencies, and variability

Comparing Distributions

How do these distributions differ?