Ex St 801 Statistical Methods Introduction. Basic Definitions STATISTICS : Area of science concerned...

Ex St 801 Statistical Methods

Introduction

Basic Definitions

STATISTICS: Area of science concerned with extraction of information from numerical data and its use in making inferences about a population from data that are obtained from a sample.

Basic Definitions (cont.)

POPULATION: set representing all measurements of interest to the investigator.

PARAMETER: an unknown population characteristic of interest to the investigator.

Basic Definitions (cont.)

SAMPLE: subset of measurements selected from the population of interest.

STATISTIC: a sample characteristic of interest to the investigator.

Some Frequently Used Statistics and Parameters

                      SAMPLE     POPULATION
MEAN                  ȳ          μ
VARIANCE              s²         σ²
STANDARD DEVIATION    s          σ
PROPORTION            π̂          π

Basic Definitions (cont.)

STATISTICAL INFERENCE: making an "INFORMED GUESS" about a parameter based on a statistic.

(This is the main objective of statistics.)

STATISTICAL INFERENCE

[Diagram: GATHER DATA from the POPULATION to obtain a SAMPLE; the SAMPLE STATISTICS (ȳ, s², s, π̂, etc.) are then used to MAKE INFERENCES about the population PARAMETERS (μ, σ², σ, π, etc.).]

More Basic Definitions

• A VARIABLE is a characteristic of an individual or object that may vary from one observation to another.

• A QUANTITATIVE VARIABLE is measured on a numerical scale.

• A QUALITATIVE VARIABLE classifies each observation into one of several categories.

RAISIN BRAN EXAMPLE

• A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops.

• A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.

Components of the Problem

• Identify the population

• Identify the sample

• Identify the symbol for the parameter

• Identify the symbol for the statistic

• Is the variable quantitative or qualitative?

ASPIRIN AND HEART ATTACKS 1

• Twenty thousand doctors participated in a study to determine if taking an aspirin every other day would result in a reduction of heart attacks.

ASPIRIN AND HEART ATTACKS 2

• The physicians were randomly divided into two groups. The first group (called the treatment group) received an aspirin every other day, while the other group (called the control group) received a placebo.

ASPIRIN AND HEART ATTACKS 3

• At the end of the study, there had been 104 heart attacks in the treatment group and 189 heart attacks in the control group.

Identifying Components of the Problem

• Identify the population

• Identify the sample

• Identify the symbol for the parameter

• Identify the symbol for the statistic

• Is the variable quantitative or qualitative?

Five Steps in a Statistical Study:

1. Stating the problem

2. Gathering the data

3. Summarizing the data

4. Analyzing the data

5. Reporting the results

Stating the Problem

• Specifically identifying the population to be sampled

• Identifying the parameter(s) being studied

Stating the Problem Example

• A researcher wanted to determine if a vitamin supplement would reduce the rate of certain cancers.

• A large study was conducted in China, and the results indicated that people who took the vitamin supplement had a significantly lower cancer rate.

• Do the results of this study apply to Americans? Why or why not?

Gathering the Data

• SURVEYS

–Random Sampling

–Stratified Sampling

–Cluster Sampling

–Systematic Sampling
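As a rough illustration of how these four survey designs differ, the Python sketch below draws each kind of sample from a list-based sampling frame. The function names, the fixed random seed, and the numbered frame of cereal boxes are assumptions made for the example, not part of the course material; a real survey also needs a carefully built frame, which this sketch takes for granted.

```python
import random

def simple_random_sample(frame, n, rng):
    """Random sampling: every subset of n units is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: a separate simple random sample within each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: randomly pick whole clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(frame, k, rng):
    """Systematic sampling: random start among the first k units, then every kth unit."""
    start = rng.randrange(k)
    return frame[start::k]

# Hypothetical frame: 100 boxes numbered in production-line order.
rng = random.Random(801)                 # fixed seed so the draw is repeatable
boxes = list(range(1, 101))
print(simple_random_sample(boxes, 5, rng))
print(systematic_sample(boxes, 20, rng))  # 5 boxes, 20 apart
```

The stratified and cluster versions take a dictionary mapping a (hypothetical) stratum or cluster name to its list of units.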

Gathering the Data

• EXPERIMENTS

–Completely Randomized Design

–Randomized Block Design

–Factorial Design
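The randomization step behind these three designs can be sketched in a few lines of Python. This is a hedged illustration, not the course's procedure: the unit labels, block names, and factor levels are hypothetical (the two treatments echo the aspirin example), and the sketch assumes equal replication and, for the randomized block design, blocks whose size equals the number of treatments.

```python
import random
from itertools import product

def completely_randomized(units, treatments, rng):
    """Completely randomized design: shuffle all units, then deal them out
    to the treatments in turn, so each treatment gets an equal share."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomized block design: every treatment appears once in each block,
    in a separately randomized order (assumes len(units) == len(treatments))."""
    plan = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)
        for unit, treatment in zip(units, order):
            plan[unit] = (block, treatment)
    return plan

def factorial_treatments(*factor_levels):
    """Factorial design: the treatments are all combinations of factor levels."""
    return list(product(*factor_levels))

rng = random.Random(801)
print(completely_randomized(range(1, 11), ["aspirin", "placebo"], rng))
print(factorial_treatments(["low dose", "high dose"], ["morning", "evening"]))
```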

More Definitions

DESCRIPTIVE STATISTICS: Organizing and describing sample information.

(Descriptive Statistics describe how things are.)

Graphical Displays for Qualitative Data

• PIE CHART

• BAR CHART

Major Volcanoes in the World

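[Pie chart: percentage of major volcanoes on each continent (Africa, Antarctica, Asia, Europe, North America, South America); the slice labels are 30%, 13%, 11%, 35%, 3%, and 8%.]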

Major Volcanoes in the World

[Bar chart: one bar per continent (Africa, Antarctica, Asia, Europe, North America, South America) on an axis running from 0 to 50.]

Graphical Displays for Quantitative Data

• HISTOGRAM

• STEM AND LEAF DISPLAY

Histogram of Major Volcanoes in the World

[Histogram: Frequency (0 to 30) versus Elevation (axis labels 2500 to 20000 in steps of 2500).]

Life Expectancies in 33 Developed Nations

Country          Life Expectancy   Country           Life Expectancy
Australia        76.3              Italy             75.5
Austria          75.1              Japan             79.1
Belgium          74.3              Luxembourg        74.1
Britain          75.3              Malta             74.8
Bulgaria         71.5              The Netherlands   76.5
Canada           76.5              New Zealand       74.2
Czechoslovakia   71.0              Norway            76.3
Denmark          74.9              Poland            71.0
East Germany     73.2              Portugal          74.1
West Germany     75.8              Rumania           69.9
Finland          74.8              Soviet Union      69.8
France           75.9              Spain             76.6
Greece           76.5              Sweden            77.1
Hungary          69.7              Switzerland       77.6
Iceland          77.4              United States     75.0
Ireland          73.5              Yugoslavia        71.0
Israel           75.2

Histogram of Life Expectancies in 33 Developed Nations

[Histogram: Frequency (0 to 10) versus Life Expectancy (axis labels 71.20 to 79.20 in steps of 1.60).]

Stem-Leaf Display for Elevation

KEY: UNIT = 1000; 1 | 2 REPRESENTS 12000

STEM | LEAF
0 | 001111
0 | 222333
0 | 444444444455555555
0 | 6666667777777
0 | 8888888999999999999
1 | 0000000000000111111
1 | 22222222333333
1 | 44555
1 | 67777
1 | 8889999

Construction of a Stem-Leaf Display

• List the stem values, in order, in a vertical column

• Draw a vertical line to the right of the stem values

• For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem

• Reorder the leaves from the lowest to highest within each stem row

Construction of a Stem-Leaf Display (cont.)

• If the number of leaves appearing in each stem is too large, divide the stems into two groups, the first corresponding to leaves 0 through 4 and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)

• Provide a key to your stem and leaf coding, so the reader can reconstruct the actual measurements.
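The construction steps can be sketched in Python for non-negative integer data, assuming the last digit is the leaf and the leaf unit is 1 (for a key such as UNIT = 1000, the data would first be rounded or truncated to that unit). The function name and its `split` option are hypothetical, and the sketch implements only the two-way split of each stem, not the five-way split.

```python
from collections import defaultdict

def stem_and_leaf(data, split=False):
    """Return stem-and-leaf rows for non-negative integers.

    The last digit of each value is its leaf and the remaining digits are its
    stem (key: leaf unit = 1). With split=True every stem gets two rows,
    leaves 0-4 first and leaves 5-9 second.
    """
    rows = defaultdict(list)
    for y in data:
        stem, leaf = divmod(y, 10)
        half = 1 if split and leaf >= 5 else 0
        rows[(stem, half)].append(leaf)

    halves = (0, 1) if split else (0,)
    first = min(y // 10 for y in data)
    last = max(y // 10 for y in data)
    lines = []
    for stem in range(first, last + 1):          # list every stem, even empty ones
        for half in halves:
            leaves = "".join(str(leaf) for leaf in sorted(rows[(stem, half)]))
            lines.append(f"{stem} | {leaves}".rstrip())

    # Drop empty rows before the first and after the last observation.
    while lines and lines[0].endswith("|"):
        lines.pop(0)
    while lines and lines[-1].endswith("|"):
        lines.pop()
    return lines

sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]
for line in stem_and_leaf(sample, split=True):
    print(line)
```

Sorting the leaves inside each row is the "reorder from lowest to highest" step; keeping empty rows between the first and last stems preserves gaps in the data.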

Numerical Measures for Summarizing Data

TYPES:

1. Measures of CENTRAL TENDENCY

2. Measures of VARIABILITY

3. Measures of RELATIVE LOCATION

The Arithmetic Mean

The ARITHMETIC MEAN of a set of n measurements (y1, y2, ..., yn) is equal to the sum of the measurements divided by n.

The mathematical notation for the ARITHMETIC MEAN is:

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

The Median

The MEDIAN of a set of n measurements (y1, y2, ..., yn) is the value that falls in the middle position when the measurements are ordered from the smallest to the largest.

RULE FOR CALCULATING THE MEDIAN

1. Order the measurements from the smallest to the largest.

2. A) If the sample size is odd, the median is the middle measurement.

   B) If the sample size is even, the median is the average of the two middle measurements.
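The two-step rule translates directly into code. This is a minimal sketch with a hypothetical function name; Python's standard `statistics.median` follows the same odd/even convention.

```python
def median(values):
    """Median by the two-step rule: order the data, then take the middle
    value (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(values)                      # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:                                # step 2A: odd sample size
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 2B: even sample size

print(median([7, 1, 10, 8, 4, 12]))   # even n
print(median([3, 1, 2]))              # odd n
```

With the six-value example that follows, `median([7, 1, 10, 8, 4, 12])` returns 7.5; `median([3, 1, 2])` returns 2.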

Example

A random sample of six values was taken from a population. These values were:

y1=7, y2=1, y3=10, y4=8, y5=4, and y6=12.

What are the sample mean and sample median for these data?

Sample Mean

$$\bar{y} = \frac{y_1 + y_2 + y_3 + y_4 + y_5 + y_6}{n} = \frac{7 + 1 + 10 + 8 + 4 + 12}{6} = \frac{42}{6} = 7$$

CALCULATIONS FOR THE SAMPLE MEDIAN

(Ordered Sample)

y2=1, y5=4, y1=7, y4=8, y3=10, y6=12

MEDIAN = (7 + 8) / 2 = 7.5

Consider the following sample:

4 18 36 39 41 42 43 44 44 45 46 47 48 49 49 50 51 53 54 60

Which measure of central tendency best describes the central location of the data: THE SAMPLE MEAN OR SAMPLE MEDIAN?

STEM | LEAF
0 | 4
0 |
1 |
1 | 8
2 |
2 |
3 |
3 | 69
4 | 12344
4 | 567899
5 | 0134
5 |
6 | 0
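One way to explore this question is to compute both measures for the sample with Python's standard `statistics` module, as in the short sketch below.

```python
import statistics

sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]

mean = statistics.mean(sample)       # sum of the values divided by n
median = statistics.median(sample)   # average of the 10th and 11th ordered values
print(f"mean = {mean}, median = {median}")
```

It prints mean = 43.15 and median = 45.5. The two unusually small values, 4 and 18, pull the mean below the median, while the median is barely affected by them.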

MEASURES OF VARIABILITY

• RANGE

• VARIANCE

• STANDARD DEVIATION

Deviation

The DEVIATION of an observation y_i from the sample mean is equal to:

$$y_i - \bar{y}$$

Deviations to the left of the sample mean are negative and deviations to the right of the sample mean are positive.

Also, notice that the larger the squared deviation, the further away the observation is from the mean.

Formula for the Sample Variance

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n-1}$$

Obs.    y     (y - ȳ)    (y - ȳ)²
1       7      0          0
2       1     -6         36
3       10     3          9
4       8      1          1
5       4     -3          9
6       12     5         25
Total                    80

ȳ = 7

Obs.    y     y²
1       7     49
2       1     1
3       10    100
4       8     64
5       4     16
6       12    144
Total   42    374

Calculation of Sample Variance

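$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} = \frac{80}{5} = 16$$

$$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n-1} = \frac{374 - \frac{42^2}{6}}{5} = \frac{80}{5} = 16$$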
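The two variance formulas can be checked against each other in a few lines of Python, using the six sample values from the worked example. This is only a sketch for checking the arithmetic; the standard library's `statistics.variance` and `statistics.stdev` compute the same n - 1 versions. The last step takes the square root of s² to get the sample standard deviation s.

```python
import math

y = [7, 1, 10, 8, 4, 12]
n = len(y)
ybar = sum(y) / n                                   # sample mean: 42 / 6

# Definitional formula: sum of squared deviations divided by n - 1.
ss_dev = sum((yi - ybar) ** 2 for yi in y)          # 0 + 36 + 9 + 1 + 9 + 25
s2_def = ss_dev / (n - 1)

# Shortcut formula: (sum of y^2 - (sum of y)^2 / n) / (n - 1).
ss_short = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n   # 374 - 42**2 / 6
s2_short = ss_short / (n - 1)

s = math.sqrt(s2_def)                               # sample standard deviation
print(s2_def, s2_short, s)
```

Both formulas give 16.0, and s = 4.0. The shortcut form avoids computing each deviation, which was handy for hand calculation.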

THE EMPIRICAL RULE

Given a large set of measurements possessing a mound-shaped histogram:

• the interval $\bar{y} \pm s$ contains approximately 68% of the measurements.

• the interval $\bar{y} \pm 2s$ contains approximately 95% of the measurements.

• the interval $\bar{y} \pm 3s$ contains approximately 99.7% of the measurements.
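The sketch below makes two points, under stated assumptions. First, the 68%, 95%, and 99.7% figures are the areas a normal (bell-shaped) curve places within 1, 2, and 3 standard deviations of its mean, which the standard library's `statistics.NormalDist` can confirm. Second, a hypothetical helper, `share_within`, counts the fraction of a data set that actually falls inside ȳ ± ks, the same kind of comparison reported for the volcano elevations in the Major Volcanoes table.

```python
from statistics import NormalDist, mean, stdev

def share_within(data, k):
    """Fraction of observations inside the interval ybar - k*s to ybar + k*s."""
    ybar, s = mean(data), stdev(data)
    inside = sum(1 for y in data if ybar - k * s <= y <= ybar + k * s)
    return inside / len(data)

# Where 68%, 95%, and 99.7% come from: the area under a normal curve
# within 1, 2, and 3 standard deviations of its mean.
z = NormalDist()                          # standard normal: mean 0, sd 1
for k in (1, 2, 3):
    print(k, round(z.cdf(k) - z.cdf(-k), 4))
```

The loop prints 0.6827, 0.9545, and 0.9973. On the six-value example used for the variance, `share_within([7, 1, 10, 8, 4, 12], 1)` gives 4/6 (about 0.67); a sample that small is not expected to follow the rule closely.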

Percent of Observations Included between Certain Values of the Standard Deviation

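[Figure: mound-shaped curve over an axis marked from -4s to 4s; the regions within 1, 2, and 3 standard deviations of the mean are labeled 68%, 95%, and 99.7%.]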

Major Volcanoes in the World

Empirical Rule Interval | Percentage of Observations Expected to Fall within the Interval | Actual Percentage of Observations Found within the Interval

4912 to 14058 | 68% | 66.6%

339 to 18630 | 95% | 95.7%

-4232 to 23202 | 99.7% | 100%

TWO MEASURES OF RELATIVE STANDING

• Percentile

• Quartile

The pth Percentile is the value Xp such that p% of the measurements fall below that value and (100-p)% of the measurements fall above that value.

[Diagram: Xp divides the measurements into a lower p% and an upper (100-p)%.]

Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (Lower Quartile) is denoted by Q1, the second by Q2, and the third (Upper Quartile) by Q3.

[Diagram: Q1, Q2, and Q3 divide the ordered measurements into four parts, each containing 25% of the measurements.]
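Textbooks and software packages use several slightly different rules for locating percentiles and quartiles, so results can differ a little from one method to another. The sketch below assumes the "exclusive" rule in Python's `statistics.quantiles` (which interpolates at position p(n + 1) in the ordered data) and applies it to the 33 life expectancies listed in the table of developed nations; the `percentile` helper is a hypothetical name.

```python
import statistics

# Life expectancies for the 33 developed nations (left column, then right).
life_expectancy = [
    76.3, 75.1, 74.3, 75.3, 71.5, 76.5, 71.0, 74.9, 73.2, 75.8, 74.8,
    75.9, 76.5, 69.7, 77.4, 73.5, 75.2,
    75.5, 79.1, 74.1, 74.8, 76.5, 74.2, 76.3, 71.0, 74.1, 69.9,
    69.8, 76.6, 77.1, 77.6, 75.0, 71.0,
]

def percentile(data, p):
    """pth percentile (integer p from 1 to 99) under the 'exclusive' rule."""
    return statistics.quantiles(data, n=100, method="exclusive")[p - 1]

# The three cut points that split the ordered data into four 25% parts.
q1, q2, q3 = statistics.quantiles(life_expectancy, n=4, method="exclusive")
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Under this convention Q1 is about 73.35, Q2 (the median) is 75.0, and Q3 is about 76.4; `percentile(life_expectancy, 50)` agrees with Q2.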

Box and Whisker Plot: Life Expectancies in 33 Developed Nations

[Figure: box-and-whisker plot of Life Expectancy on an axis running from 68 to 80.]

Calculating Fence Values (IQR = Q3 - Q1, the interquartile range)

Lower Inner Fence: Q1 - 1.5 (IQR)

Upper Inner Fence: Q3 + 1.5 (IQR)

Lower Outer Fence: Q1 - 3 (IQR)

Upper Outer Fence: Q3 + 3 (IQR)

EXAMPLE: Construct a Box-and-Whisker Plot for the elevations of volcanoes in Africa

1,650 5,981 7,745 9,281 10,023 11,400 12,198 13,451 19,340

Median =    Q1 =    Q3 =    IQR =

Lower Inner Fence =    Upper Inner Fence =    Lower Outer Fence =    Upper Outer Fence =
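A minimal sketch of the fence calculations follows, assuming Python's `statistics.quantiles` for the quartiles. The function names are hypothetical, and the quartile rule is left as a parameter because the choice of rule visibly changes the answer for these nine elevations; the course's own quartile rule may differ from both options shown.

```python
import statistics

# Elevations of major volcanoes in Africa, from the example.
africa = [1650, 5981, 7745, 9281, 10023, 11400, 12198, 13451, 19340]

def box_plot_summary(data, method="exclusive"):
    """Quartiles, IQR, and inner/outer fences; `method` picks the quartile rule."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return {
        "Median": q2, "Q1": q1, "Q3": q3, "IQR": iqr,
        "Lower Inner Fence": q1 - 1.5 * iqr, "Upper Inner Fence": q3 + 1.5 * iqr,
        "Lower Outer Fence": q1 - 3 * iqr, "Upper Outer Fence": q3 + 3 * iqr,
    }

def beyond_inner_fences(data, method="exclusive"):
    """Observations outside the inner fences (candidate outliers)."""
    f = box_plot_summary(data, method)
    return [y for y in data
            if y < f["Lower Inner Fence"] or y > f["Upper Inner Fence"]]

for name, value in box_plot_summary(africa).items():
    print(f"{name}: {value:,.2f}")
print(beyond_inner_fences(africa), beyond_inner_fences(africa, "inclusive"))
```

With the "exclusive" rule, Q1 = 6,863, Q3 = 12,824.5, and IQR = 5,961.5, so no elevation lies outside the inner fences. With `method="inclusive"`, Q1 = 7,745 and Q3 = 12,198, the upper inner fence drops to 18,877.5, and 19,340 is flagged.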

BOX AND WHISKER PLOT: MAJOR VOLCANOES IN AFRICA

[Figure: box-and-whisker plot of Elevation on an axis running from 0 to 20000.]

Ex St 801 Statistical Methods

The End