3.1 Measure of Centerjga001/Chapter 3.pdf · 3.1 - 21 Mode Mode is the only measure of central...

3.1 - 1

3.1 Measure of Center

Calculate the mean for a given data set

Find the median, and describe why the median is sometimes preferable to the mean

Find the mode of a data set

Describe how skewness affects these measures of center

3.1 - 2

Measure of Center

Measure of center: the value at the center or middle of a data set

The three common measures of center are the mean, the median, and the mode.

3.1 - 3

Mean

the measure of center obtained by adding the data values and then dividing the total by the number of values

This is what most people call an average; it is also called the arithmetic mean.

3.1 - 4

Notation

Σ (the Greek capital letter sigma) is used to denote the sum of a set of values.

x is the variable usually used to represent the data values.

n represents the number of data values in a sample.

3.1 - 5

Example of summation

• If there are n data values that are denoted as $x_1, x_2, \ldots, x_n$

Then:

$\sum x = x_1 + x_2 + \cdots + x_n$

3.1 - 6

Example of summation

• Data: 21, 25, 32, 48, 53, 62, 62, 64

Then:

$\sum x = 21 + 25 + 32 + 48 + 53 + 62 + 62 + 64 = 367$

3.1 - 7

Sample Mean

$\bar{x} = \dfrac{\sum x}{n}$

$\bar{x}$ is pronounced "x-bar" and denotes the mean of a set of sample values.

3.1 - 8

Example of Sample Mean

• Data: 21, 25, 32, 48, 53, 62, 62, 64

Then:

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{21 + 25 + 32 + 48 + 53 + 62 + 62 + 64}{8} = \dfrac{367}{8} = 45.875 \approx 45.9$
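A minimal Python sketch can check the arithmetic above for Σx and x̄ = Σx/n. The variable names are illustrative only.

```python
# Sample data from the example above
data = [21, 25, 32, 48, 53, 62, 62, 64]

total = sum(data)      # Σx
n = len(data)          # number of sample values
x_bar = total / n      # sample mean x̄ = Σx / n

print(total, n, x_bar)  # 367 8 45.875
```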

3.1 - 9

Notation

μ (Greek letter mu) is used to denote the population mean

N represents the number of data values in a population.

3.1 - 10

Population Mean

$\mu = \dfrac{\sum x}{N}$

Note: here x represents the data values in the population

3.1 - 11

Mean

Advantages

Is relatively reliable: means of samples drawn from the same population don't vary as much as other measures of center

Takes every data value into account

3.1 - 12

Mean

Disadvantage

Is sensitive to every data value: one extreme value can affect it dramatically, so the mean is not a resistant measure of center

Example:

21, 25, 32, 48, 53, 62, 62, 64 → $\bar{x} \approx 45.9$

21, 25, 32, 48, 53, 62, 62, 300 → $\bar{x} \approx 75.4$
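This short Python sketch uses the standard-library `statistics.mean` to reproduce the two means above. It shows how far a single extreme value moves the mean.

```python
from statistics import mean

original = [21, 25, 32, 48, 53, 62, 62, 64]
with_outlier = [21, 25, 32, 48, 53, 62, 62, 300]  # 64 replaced by an extreme value

# Replacing one value shifts the mean by (300 - 64) / 8 = 29.5
print(round(mean(original), 1))      # 45.9
print(round(mean(with_outlier), 1))  # 75.4
```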

3.1 - 13

Median

Median: the measure of center that is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude

3.1 - 14

Finding the Median

First sort the values (arrange them in order), then follow one of these rules:

1. If the number of data values is odd, the median is the number located in the exact middle of the list. Its position in the list is:

$\left(\dfrac{n+1}{2}\right)^{\text{th}}$

3.1 - 15

Finding the Median

2. If the number of data values is even, the median is the mean of the two middle numbers. These are the values on either side of the position:

$\left(\dfrac{n+1}{2}\right)^{\text{th}}$

For example, with n = 6 this position is 3.5, so the median is the mean of the 3rd and 4th values.
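The two rules above can be sketched as a small Python function. The name `median` is a hypothetical helper written for illustration. Python's standard library also provides `statistics.median`, which follows the same odd/even rule.

```python
def median(values):
    """Median via the sort-then-middle rule described above."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    s = sorted(values)           # first sort the values
    n = len(s)
    mid = n // 2                 # 0-based index; for odd n this is position (n + 1)/2
    if n % 2 == 1:
        return s[mid]            # odd n: the exact middle value
    # even n: positions n/2 and n/2 + 1 sit on either side of (n + 1)/2
    return (s[mid - 1] + s[mid]) / 2

print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))        # n = 6: mean of 0.73 and 1.10
print(median([5.40, 1.10, 0.42, 0.73, 0.48, 1.10, 0.66]))  # 0.73
```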

3.1 - 16

Example of Median

• 6 data values:

5.40 1.10 0.42 0.73 0.48 1.10

• Sorted data:

0.42 0.48 0.73 1.10 1.10 5.40

(even number of values, no exact middle)

$\text{median} = \dfrac{0.73 + 1.10}{2} = 0.915$

3.1 - 17

Example of Median

• 7 data values:

5.40 1.10 0.42 0.73 0.48 1.10 0.66

• Sorted data:

0.42 0.48 0.66 0.73 1.10 1.10 5.40

(odd number of values, the exact middle is the 4th value)

median = 0.73

3.1 - 18

Median

The median is not affected by an extreme value; it is a resistant measure of center

Example:

21,25,32,48,53,62,62,64

21,25,32,48,53,62,62,300

Median is 50.5 for both data sets.
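For comparison with the earlier mean example, this sketch applies Python's `statistics.median` and `statistics.mean` to the same two data sets.

```python
from statistics import mean, median

original = [21, 25, 32, 48, 53, 62, 62, 64]
with_outlier = [21, 25, 32, 48, 53, 62, 62, 300]

# The median depends only on the middle of the sorted list, so changing
# the largest value leaves it unchanged, while the mean shifts noticeably.
print(median(original), median(with_outlier))  # 50.5 50.5
print(mean(original), mean(with_outlier))      # 45.875 75.375
```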

3.1 - 19

Median

From Example 3.3, page 91

3.1 - 20

Mode

the value that occurs with the greatest frequency

A data set can have one mode, more than one mode, or no mode

3.1 - 21

Mode

Mode is the only measure of central tendency that can be used with nominal data

Bimodal: two data values occur with the same greatest frequency

Multimodal: more than two data values occur with the same greatest frequency

No mode: no data value is repeated

3.1 - 22

Mode - Examples

a) 5.40 1.10 0.42 0.73 0.48 1.10 → Mode is 1.10

b) 27 27 27 55 55 55 88 88 99 → Bimodal: 27 & 55

c) 1 2 3 6 7 8 9 10 → No mode
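The conventions above (one mode, several modes, or no mode when nothing repeats) can be sketched in Python as follows. `modes` is a hypothetical helper name. The standard-library `statistics.multimode` returns every value when no value repeats, so an extra check is needed to match the "no mode" definition.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the greatest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty data set is undefined")
    top = max(counts.values())
    if top == 1:
        return []  # no data value is repeated: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))  # [1.1]
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))  # [27, 55]  (bimodal)
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))             # []  (no mode)
```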

3.1 - 23

• These data values represent weight gain or loss in kg for a random sample of 18 college freshmen (negative data values indicate weight loss)

11 3 0 -2 3 -2 -2 5 -2 7 2 4 1 8 1 0 -5 2

• Do these values support the legend that college students gain 15 pounds (6.8 kg) during their freshman year? Explain.

3.1 - 24

• Sample mean

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{34}{18} \approx 1.9$ kg

• Median

-5 -2 -2 -2 -2 0 0 1 1 2 2 3 3 4 5 7 8 11

$\text{median} = \dfrac{1 + 2}{2} = 1.5$ kg

• Mode: -2 kg
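Python's `statistics` module can reproduce the three measures above. This sketch assumes the 18 values are entered exactly as listed.

```python
from statistics import mean, median, multimode

# Weight change (kg) for the 18 freshmen in the example
change = [11, 3, 0, -2, 3, -2, -2, 5, -2, 7, 2, 4, 1, 8, 1, 0, -5, 2]

print(len(change))             # 18
print(round(mean(change), 1))  # 1.9   (34 / 18)
print(median(change))          # 1.5   (mean of the 9th and 10th sorted values)
print(multimode(change))       # [-2]  (-2 occurs four times)
```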

3.1 - 25

• All of the measures of center are below 6.8 kg (15 pounds)

• Based on measures of center, these data values do not support the idea that college students gain 15 pounds (6.8 kg) during their freshman year

CONCLUSION

3.1 - 26

Mean/Median with Graphing Calculator

• First, enter the list of data values

• Then select 2nd STAT (LIST), arrow right to MATH, and choose option 3:mean( or 4:median(

• Input the desired list

3.1 - 27

Example of Computing the Mean Using Calculator

Sorted amounts of Strontium-90 (in millibecquerels) in a simple random sample of baby teeth obtained from Philadelphia residents born after 1979

Note: these data are related to the 1979 accident at the Three Mile Island nuclear power plant.

$\bar{x} = \dfrac{\sum x}{n} = 149.2$

3.1 - 28

Example of Computing the Mean Using Calculator

Median is 150

3.1 - 29

Skewed and Symmetric

Symmetric: a distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half

Skewed: a distribution of data is skewed if it is not symmetric and extends more to one side than the other

3.1 - 30

Skewed Left or Right

Skewed to the left (also called negatively skewed): the distribution has a longer left tail, and the mean and median are typically to the left of the mode

Skewed to the right (also called positively skewed): the distribution has a longer right tail, and the mean and median are typically to the right of the mode

3.1 - 31

Distribution Skewed Left

• Mean is smaller than the median.

3.1 - 32

Symmetric Distribution

• Mean, median, mode approximately equal.

3.1 - 33

Distribution Skewed Right

• Mean is larger than the median.

3.1 - 34

Example data set

5 5 5 5 5 10 10 10 10 10 10 15 15 15 15 15

20 20 20 20 25 25 25 30 30 30 35 35 40 45

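• Mean: $\bar{x} = \dfrac{\sum x}{n} = \dfrac{560}{30} \approx 18.7$

• Median: $\dfrac{15 + 15}{2} = 15$ (the mean of the 15th and 16th sorted values)

Distribution is skewed right.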
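A short Python sketch confirms the mean and median for this data set. The list is built from the repeated values shown on the slide.

```python
from statistics import mean, median

# Example data set from the slide (30 values)
data = ([5] * 5 + [10] * 6 + [15] * 5 + [20] * 4 + [25] * 3
        + [30] * 3 + [35] * 2 + [40, 45])

x_bar = mean(data)   # 560 / 30
med = median(data)   # mean of the 15th and 16th sorted values

print(len(data), sum(data))  # 30 560
print(round(x_bar, 1), med)  # 18.7 15.0
print(x_bar > med)           # True: consistent with a right-skewed shape
```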

3.1 - 35

3.2 Measures of Variability

The range

What is a deviation?

The standard deviation and the variance.

3.1 - 36

Why is it important to understand variation?

• A measure of the center by itself can be misleading

• Example:

Two nations with the same median family income are very different if one has extremes of wealth and poverty and the other has little variation among families (see the following table).

3.1 - 37

Example of variation

          Data Set A   Data Set B
          50,000       10,000
          60,000       20,000
          70,000       70,000
          80,000       120,000
          90,000       130,000
MEAN      70,000       70,000
MEDIAN    70,000       70,000

Data set B has more variation about the mean
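This Python sketch shows that data sets A and B share a mean and median but differ in spread. It uses the range and the sample standard deviation (`statistics.stdev`), both of which are defined later in this section.

```python
from statistics import mean, median, stdev

a = [50_000, 60_000, 70_000, 80_000, 90_000]
b = [10_000, 20_000, 70_000, 120_000, 130_000]

for name, data in (("A", a), ("B", b)):
    spread = max(data) - min(data)  # range
    print(name, mean(data), median(data), spread, round(stdev(data)))
```

Both data sets report a mean and median of 70,000. The range, however, is 40,000 for A versus 120,000 for B, and the sample standard deviation is roughly 15,800 versus 55,200.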

3.1 - 38

Histograms: example of variation

Data set B has more variation about the mean (Target).

3.1 - 39

How do we quantify variation?

3.1 - 40

Definition

The range of a set of data values is the difference between the maximum data value and the minimum data value.

Range = (maximum value) – (minimum value)

3.1 - 41

Example of range.

Data:

27 28 25 6 27 30 26

Range = 30 - 6 = 24

3.1 - 42

Range (cont.)

Ignoring the outlier of 6 in the previous data set gives the data 27 28 25 27 30 26

Range = 30 - 25 = 5

This shows that the range is very sensitive to extreme values; therefore it is not as useful as other measures of variation.
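Here is a minimal Python sketch of the range and its sensitivity to the outlier. `data_range` is a hypothetical name, chosen to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = (maximum value) - (minimum value)."""
    return max(values) - min(values)

full = [27, 28, 25, 6, 27, 30, 26]
without_outlier = [v for v in full if v != 6]  # drop the outlier 6

print(data_range(full))             # 24
print(data_range(without_outlier))  # 5
```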

3.1 - 43

Deviation

The deviation for a given data value is a signed distance between the data value and the mean: unlike a distance, which is never negative, a deviation can be negative.

3.1 - 44

Deviation

A deviation for a given data value is the difference between the data value and the mean of the data set. If x is the data value:

1. For a sample, the deviation of x is $x - \bar{x}$

2. For a population, the deviation of x is $x - \mu$

3.1 - 45

Deviation

The deviation can be positive, negative, or zero.

1. If the data value is larger than the mean, the deviation will be positive.

2. If the data value is smaller than the mean, the deviation will be negative.

3. If the data value equals the mean, the deviation will be zero.

3.1 - 46

Example:

• Data: 8, 5, 12, 8, 9, 15, 21, 16, 3

• Mean:

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{8 + 5 + 12 + 8 + 9 + 15 + 21 + 16 + 3}{9} = \dfrac{97}{9} \approx 10.78$

3.1 - 47

Data Value   Deviation (x - x̄)
8            8 - 10.78 = -2.78
5            5 - 10.78 = -5.78
12           12 - 10.78 = 1.22
8            8 - 10.78 = -2.78
9            9 - 10.78 = -1.78
15           15 - 10.78 = 4.22
21           21 - 10.78 = 10.22
16           16 - 10.78 = 5.22
3            3 - 10.78 = -7.78
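A few lines of Python can generate the deviation table above. The sketch also checks a general fact: the deviations from the mean sum to zero (up to floating-point rounding). That is one reason measures of variation are built from squared deviations rather than raw ones.

```python
data = [8, 5, 12, 8, 9, 15, 21, 16, 3]
x_bar = sum(data) / len(data)           # 97 / 9 ≈ 10.78

deviations = [x - x_bar for x in data]  # x - x̄ for each value
for x, d in zip(data, deviations):
    print(f"{x:>3} {d:+.2f}")

# Positive and negative deviations cancel exactly (apart from rounding error)
print(abs(sum(deviations)) < 1e-9)      # True
```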

3.1 - 48

Population Variance

$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

The population variance is the mean of the squared deviations in the population

3.1 - 49

Population Standard Deviation

$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

The population standard deviation is the square root of the population variance.

3.1 - 50

Sample Variance

The sample variance is

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

3.1 - 51

Sample Variance

Note that the sample variance is only approximately the mean of the squared deviations in the sample because we use n-1 instead of n.

3.1 - 52

Sample Variance

A statistic is an unbiased estimator of a parameter if its mean value, taken over all possible samples, equals the parameter it is trying to estimate.

Using n-1 instead of n makes the sample variance an unbiased estimator of the population variance.
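The unbiasedness claim can be checked exactly on a tiny example. This sketch assumes a hypothetical population {1, 2, 3} and samples of size n = 2 drawn with replacement, so all 9 ordered samples are equally likely. It averages the sample variance over every possible sample using exact fractions.

```python
from fractions import Fraction
from itertools import product

def sum_sq_dev(xs):
    """Sum of squared deviations from the mean, computed exactly."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs)

population = [1, 2, 3]                              # hypothetical tiny population
sigma2 = sum_sq_dev(population) / len(population)   # population variance: divide by N

n = 2
samples = list(product(population, repeat=n))       # all 9 ordered samples, with replacement

avg_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2, avg_n_minus_1, avg_n)  # 2/3 2/3 1/3
```

Averaged over all samples, the n - 1 version equals the population variance 2/3 exactly. Dividing by n gives 1/3, which underestimates it.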

3.1 - 53

Sample Standard Deviation

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

The sample standard deviation is the square root of the sample variance.

3.1 - 54

Steps to calculate the sample standard deviation

1. Calculate the sample mean

2. Find the squared deviation from the sample mean for each sample data value: $(x - \bar{x})^2$

3. Add the squared deviations

4. Divide the sum in step 3 by n-1

5. Take the square root of the quotient in step 4
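The five steps above map directly onto a short Python function. `sample_std` is a hypothetical name, and the data are those of the worked example that follows.

```python
import math

def sample_std(values):
    """Sample standard deviation following the five steps above."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n                       # 1. sample mean
    sq_devs = [(x - x_bar) ** 2 for x in values]  # 2. squared deviations (x - x̄)²
    total = sum(sq_devs)                          # 3. add them
    variance = total / (n - 1)                    # 4. divide by n - 1
    return math.sqrt(variance)                    # 5. take the square root

data = [8, 5, 12, 8, 9, 15, 21, 16, 3]
print(round(sample_std(data), 2))  # 5.74
```

The result should agree with `statistics.stdev(data)` from Python's standard library, which also divides by n - 1.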

3.1 - 55

Example: Standard Deviation

Given the data set:

8, 5, 12, 8, 9, 15, 21, 16, 3

Find the standard deviation

3.1 - 56

Example: Standard Deviation

• Find the mean

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{8 + 5 + 12 + 8 + 9 + 15 + 21 + 16 + 3}{9} = \dfrac{97}{9} \approx 10.78$

3.1 - 57

Data Value   Squared Deviation from the Mean
8            (8 - 10.78)² = 7.73
5            (5 - 10.78)² = 33.41
12           (12 - 10.78)² = 1.49
8            (8 - 10.78)² = 7.73
9            (9 - 10.78)² = 3.17
15           (15 - 10.78)² = 17.81
21           (21 - 10.78)² = 104.45
16           (16 - 10.78)² = 27.25
3            (3 - 10.78)² = 60.53

3.1 - 58

Example: Standard Deviation

• Add the squared deviations (last column in the table above)

$7.73 + 33.41 + 1.49 + 7.73 + 3.17 + 17.81 + 104.45 + 27.25 + 60.53 = 263.57$

3.1 - 59

Example: Standard Deviation

• Divide the sum by 9 - 1 = 8: $263.57 / 8 \approx 32.95$

• Take the square root: $\sqrt{32.95} \approx 5.74$

$s \approx 5.7$

3.1 - 60

Sample Standard Deviation (Computational Formula)

$s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}}$

3.1 - 61

Example: Standard Deviation

Data:

2 1 1 1 1 1 1 4 1 2 2 1 2 3

3 2 3 1 3 1 3 1 3 2 2

Determine the standard deviation using the previous formula.

3.1 - 62

Example: Standard Deviation

• We need to find each of the following: $n$, $\sum x^2$, and $\sum x$

3.1 - 63

Data Table (25 data values)

TOTALS: $\sum x = 47$, $\sum x^2 = 109$

3.1 - 64

Example: Standard Deviation

• Thus:

$n = 25$

$\sum x^2 = 109$

$\sum x = 47$

3.1 - 65

Example: Standard Deviation

• And:

$s = \sqrt{\dfrac{\sum x^2 - (\sum x)^2 / n}{n - 1}} = \sqrt{\dfrac{109 - 47^2 / 25}{24}} = \sqrt{0.86} \approx 0.9$
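This Python sketch applies the computational formula to the 25 values above. It cross-checks the result against the definitional formula (sum of squared deviations divided by n - 1).

```python
import math

data = [2, 1, 1, 1, 1, 1, 1, 4, 1, 2, 2, 1, 2, 3,
        3, 2, 3, 1, 3, 1, 3, 1, 3, 2, 2]

n = len(data)                      # 25
sum_x = sum(data)                  # Σx = 47
sum_x2 = sum(x * x for x in data)  # Σx² = 109

# Computational (shortcut) formula: no individual deviations needed
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(variance)

# Definitional formula for comparison
x_bar = sum_x / n
definitional = sum((x - x_bar) ** 2 for x in data) / (n - 1)

print(n, sum_x, sum_x2)                          # 25 47 109
print(round(variance, 2), round(s, 2))           # 0.86 0.93
print(math.isclose(variance, definitional))      # True
```

Both formulas give s² = 0.86 here. When values are very large relative to their spread, the shortcut formula can lose precision in floating point, because it subtracts two large, nearly equal numbers. The definitional form is generally safer in code.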

3.1 - 66

Standard Deviation - Important Properties

The standard deviation is a measure of variation of all values from the mean.

The value of the standard deviation s is never negative. It is zero only when all data values are identical, so it is usually not zero.

The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others).

Unlike the variance, which is in squared units, the standard deviation s has the same units as the original data values.

3.1 - 67

Example: page 116

3.1 - 68

ANSWER

Colony A: range = 134 - 61 = 73

Colony B: range = 158 - 67 = 91

3.1 - 69

28(b) Which colony has the greater variability according to the range?

Example: page 116

ANSWER: colony B

3.1 - 70

Use the previous example and calculate the standard deviation for each colony with a calculator.

3.1 - 71

Data is stored in lists. Locate and press the STAT button on the calculator. Choose EDIT. The calculator will display the first three of six lists (columns) for entering data. Simply type your data and press ENTER. Use your arrow keys to move between lists.

3.1 - 72

• Press STAT, then arrow right to CALC and choose 1:1-Var Stats

• Then press ENTER

3.1 - 73

Calculator Example

When 1-Var Stats appears on the home screen, enter the name of the list containing the data. You can do this by entering List (= 2nd STAT) and choosing which list has the desired data.

3.1 - 74

• 1-Var Stats

• NOTE: Previous example data will give different values than these.

3.1 - 75

Colony A

standard deviation = 21.9

Colony B

standard deviation = 26.4

ANSWER

3.1 - 76

• Compare histograms (SPSS)

3.1 - 77

3.3 Working with grouped data

• Calculate the weighted mean for a given data set

• Estimate the mean from grouped data

• Estimate the variance and standard deviation from grouped data

3.1 - 78

Weighted Mean

When data values are assigned different weights, we can compute a weighted mean.

Data values: $x_1, x_2, x_3, \ldots, x_n$

Corresponding weights: $w_1, w_2, w_3, \ldots, w_n$

3.1 - 79

Computing the weighted mean

• Multiply each data value by its corresponding weight: $w_i x_i$

• Sum these products.

• Divide the result by the sum of the weights.

3.1 - 80

Weighted Mean

$\bar{x}_w = \dfrac{\sum_i w_i x_i}{\sum_i w_i} = \dfrac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$

3.1 - 81

Example: Weighted Mean

Suppose the homework/quiz average is weighted 10%, the 2 exams together are weighted 60% (30% each), and the final exam is weighted 30%.

If a student has a homework/quiz average of 87, exam scores of 80 and 92, and a final exam score of 85, compute the weighted average.

3.1 - 82

Example: Weighted Mean

ANSWER:

$\bar{x}_w = \dfrac{0.10(87) + 0.30(80) + 0.30(92) + 0.30(85)}{1.00} = 85.8$
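Here is a minimal Python sketch of the weighted mean formula, applied to the grade example. `weighted_mean` is a hypothetical helper. Because the formula divides by the sum of the weights, the percentages can be entered as 10, 30, 30, 30 rather than 0.10, 0.30, 0.30, 0.30 without changing the result.

```python
def weighted_mean(values, weights):
    """x̄_w = Σ(w·x) / Σw."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

scores = [87, 80, 92, 85]    # homework/quiz, exam 1, exam 2, final
weights = [10, 30, 30, 30]   # percent; each exam gets half of the 60%

print(weighted_mean(scores, weights))  # 85.8
```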

3.1 - 83

Example: Weighted Mean

3.1 - 84

Employment (weight)   Hourly Mean Wage, $ (data value)   (weight) × (data value)
12,380                60.32                              746,761.60
18,580                60.25                              1,119,445.00
9,540                 59.39                              566,580.60
35,550                57.98                              2,061,189.00
10,130                55.95                              566,773.50

$\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i} = \dfrac{5{,}060{,}749.70}{86{,}180} \approx \$58.72$
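This Python sketch repeats the computation for the wage table. It rebuilds the (weight) × (data value) column and divides that column's total by the total employment.

```python
employment = [12_380, 18_580, 9_540, 35_550, 10_130]  # weights
wage = [60.32, 60.25, 59.39, 57.98, 55.95]            # data values ($/hour)

products = [w * x for w, x in zip(employment, wage)]  # w_i · x_i column
total_weight = sum(employment)                        # Σw
x_bar_w = sum(products) / total_weight                # Σ(w·x) / Σw

print(total_weight)             # 86180
print(f"{sum(products):,.2f}")  # 5,060,749.70
print(f"${x_bar_w:.2f}")        # $58.72
```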

3.1 - 85

Estimating the mean from grouped data

• Given a frequency distribution, how do we compute the mean?

Heights of 25 Women:

HEIGHT (inches)   FREQUENCY
59-60             3
61-62             3
63-64             4
65-66             7
67-68             6
69-70             1
71-72             1

3.1 - 86

Estimating the mean from grouped data

• Assume all sample values are at the class midpoints

3.1 - 87

Estimating the mean from grouped data

Assume all sample values are at the class midpoints.

HEIGHT (inches)   FREQUENCY
59-60             3
61-62             3
63-64             4
65-66             7
67-68             6
69-70             1
71-72             1

Class midpoints:

59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5

3.1 - 88

Estimating the mean from grouped data

• Multiply each class midpoint by its corresponding frequency

• Add the results

• Divide by the sum of the frequencies (total number of data values)

3.1 - 89

Class midpoints:

59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5

Estimating the mean from grouped data

Frequencies:

3, 3, 4, 7, 6, 1, 1

Estimated mean:

$\hat{\mu} = \dfrac{3(59.5) + 3(61.5) + 4(63.5) + 7(65.5) + 6(67.5) + 1(69.5) + 1(71.5)}{25} = \dfrac{1621.5}{25} \approx 64.9$ inches
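This Python sketch computes the grouped-data estimate, assuming the class limits in the height table. Each midpoint is taken as the average of its class's lower and upper limits.

```python
# (lower, upper) class limits and frequencies from the table of 25 heights
classes = [(59, 60), (61, 62), (63, 64), (65, 66), (67, 68), (69, 70), (71, 72)]
freqs = [3, 3, 4, 7, 6, 1, 1]

# Class midpoint: halfway between the lower and upper class limits
midpoints = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)                                             # total number of values
mu_hat = sum(m * f for m, f in zip(midpoints, freqs)) / n  # Σ(m·f) / Σf

print(midpoints)            # [59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5]
print(n, round(mu_hat, 2))  # 25 64.86
```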

3.1 - 90

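Estimating the mean from a frequency distribution:

$$\hat{\mu} = \frac{\sum m_i f_i}{\sum f_i} = \frac{m_1 f_1 + m_2 f_2 + \cdots + m_n f_n}{f_1 + f_2 + \cdots + f_n}$$

where $m_i$ = class midpoint and $f_i$ = frequency.

• Note: $\hat{\mu}$ is "mu-hat"; the hat indicates that the value is an approximation of the mean $\mu$, not its exact value.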

3.1 - 91

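Estimating the variance from a frequency distribution:

$$\hat{\sigma}^2 = \frac{\sum (m_i - \hat{\mu})^2 f_i}{\sum f_i}$$

Estimating the standard deviation from a frequency distribution:

$$\hat{\sigma} = \sqrt{\frac{\sum (m_i - \hat{\mu})^2 f_i}{\sum f_i}}$$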

3.1 - 92

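Height (inches)    Frequency    Class midpoint    $(m_i - \hat{\mu})^2 f_i$
59-60              3            59.5              86.2
61-62              3            61.5              33.9
63-64              4            63.5              7.4
65-66              7            65.5              2.9
67-68              6            67.5              41.8
69-70              1            69.5              21.5
71-72              1            71.5              44.0

$\hat{\mu} \approx 64.9$ inches

$$\hat{\sigma} = \sqrt{\frac{\sum (m_i - \hat{\mu})^2 f_i}{\sum f_i}} = \sqrt{\frac{237.7}{25}} \approx 3.1 \text{ inches}$$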
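The table's last column and the square root can be reproduced in a short Python sketch. It assumes the same midpoints and frequencies, divides by the sum of the frequencies exactly as the slide's formula does, and uses an illustrative function name. Because the sketch carries the unrounded mean (64.86), its sum of squared deviations (about 237.76) differs slightly from the table's 237.7.

```python
import math

midpoints = [59.5, 61.5, 63.5, 65.5, 67.5, 69.5, 71.5]
frequencies = [3, 3, 4, 7, 6, 1, 1]


def grouped_mean_sd(midpoints, frequencies):
    """Return (mu_hat, variance_hat, sd_hat) estimated from a frequency distribution."""
    n = sum(frequencies)
    mu_hat = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    # Sum of (m_i - mu_hat)^2 * f_i, the last column of the table
    squared_dev_sum = sum((m - mu_hat) ** 2 * f for m, f in zip(midpoints, frequencies))
    variance_hat = squared_dev_sum / n  # divide by the sum of the frequencies
    return mu_hat, variance_hat, math.sqrt(variance_hat)


mu_hat, variance_hat, sd_hat = grouped_mean_sd(midpoints, frequencies)
print(round(variance_hat, 2), round(sd_hat, 1))  # 9.51 3.1
```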

3.1 - 93

3.4 Measures of Position

Find percentiles for small and large data sets

Calculate z-scores and explain why we use them

Use z-scores to detect outliers.

3.1 - 94

Percentile

Let p be any integer between 0 and 100. The pth percentile of a data set is a value for which p percent of the values in the data set are less than or equal to this value.

3.1 - 95

Steps to find the pth percentile for small data sets

Sort the data from small to large. If you are finding the pth percentile of a sample of size n, calculate

$$i = \frac{p}{100} \cdot n,$$

which is p percent of n.

3.1 - 96

Steps to find the pth percentile for small data sets (cont.)

If i is an integer, the pth percentile is the mean of the data values in positions i and i + 1. If i is not an integer, round up and use the value in that position as the pth percentile.
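The two-case rule above translates directly into code. This is a minimal sketch assuming p is an integer strictly between 0 and 100, as in the definition; the name `percentile` is illustrative, and other software (calculators, spreadsheets, SPSS) may use different percentile conventions and give slightly different answers.

```python
def percentile(data, p):
    """pth percentile (integer 0 < p < 100) using the small-data-set rule from the slides."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)  # step 1: sort from small to large
    n = len(values)
    if (p * n) % 100 == 0:
        # i = pn/100 is an integer: average positions i and i + 1 (1-based)
        i = p * n // 100
        return (values[i - 1] + values[i]) / 2
    # i is not an integer: round up and use that position (1-based)
    i = -(-p * n // 100)  # integer ceiling of pn/100
    return values[i - 1]


twelve = [36, 37, 38, 39, 44, 44, 47, 50, 53, 57, 65, 69]
seven = [36, 38, 39, 44, 47, 50, 65]
print(percentile(twelve, 25), percentile(twelve, 75))  # 38.5 55.0
print(percentile(seven, 25), percentile(seven, 75))    # 38 50
```

Integer arithmetic is used for i so that the "is i an integer?" test is exact rather than subject to floating-point rounding.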

3.1 - 97

Example of Finding Percentile

• Find the 25th and 75th percentiles of these 12 data values

36 37 38 39 44 44 47 50 53 57 65 69

3.1 - 98

Example of Finding Percentile

25% of 12: $i = \frac{25}{100} \cdot 12 = 3$

75% of 12: $i = \frac{75}{100} \cdot 12 = 9$

3.1 - 99

Example of Finding Percentile

• The data can be grouped as follows:

36 37 38 | 39 44 44 47 50 53 57 65 69

Since i = 3 is an integer, use the values in the 3rd and 4th positions. 25% of the data is below 38.5 (the mean of 38 and 39).

The 25th percentile is 38.5.

3.1 - 100

Example of Finding Percentile

• The data can be grouped as follows:

36 37 38 39 44 44 47 50 53 | 57 65 69

Since i = 9 is an integer, use the values in the 9th and 10th positions. 75% of the data is below 55 (the mean of 53 and 57).

The 75th percentile is 55.

3.1 - 101

Example of Finding Percentile

• Find the 25th and 75th percentiles of these 7 data values:

36 38 39 44 47 50 65

3.1 - 102

Example of Finding Percentile

25% of 7: $i = \frac{25}{100} \cdot 7 = 1.75$

75% of 7: $i = \frac{75}{100} \cdot 7 = 5.25$

3.1 - 103

Example of Finding Percentile

36 38 39 44 47 50 65

$i = 1.75$; round up to position 2.

The value in the 2nd position is 38, so the 25th percentile is 38.

3.1 - 104

Example of Finding Percentile

36 38 39 44 47 50 65

$i = 5.25$; round up to position 6.

The value in the 6th position is 50, so the 75th percentile is 50.

3.1 - 105

Example of Finding Percentile

Page 132

3.1 - 106

Example of Finding Percentile

Data:

2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

Note: 15 data values

3.1 - 107

Example of Finding Percentile

16(a) To find the position, take 5% of 15: $i = \frac{5}{100} \cdot 15 = 0.75$, rounded up to 1

16(b) To find the position, take 95% of 15: $i = \frac{95}{100} \cdot 15 = 14.25$, rounded up to 15

3.1 - 108

Example of Finding Percentile

Data:

2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

16(a) The 5th percentile is 2.0 million.

16(b) The 95th percentile is 14.7 million.

3.1 - 109

Z score

z score (or standardized value): the number of standard deviations that a given value x is above or below the mean.

3.1 - 110

Z score Formulas

Sample: $z = \dfrac{x - \bar{x}}{s}$

Population: $z = \dfrac{x - \mu}{\sigma}$

3.1 - 111

Interpreting Z Scores

1. A z-score has no units.

2. Whenever a value is greater than the mean, its z score is positive

3. Whenever a value is less than the mean, its z score is negative

3.1 - 112

Example of Finding Z Score

Page 132

3.1 - 113

Example of Finding Z score

Data:

2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

Using a calculator, we get $\bar{x} \approx 5.1$ and $s \approx 3.4$.

3.1 - 114

Example of Finding Z score

2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

18(a) z-score for fish oil (data value is 4.2):

$$z = \frac{4.2 - 5.1}{3.4} \approx -0.26$$

3.1 - 115

Example of Finding Z score

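2.0 2.1 2.4 2.8 3.1 3.5 3.8 4.2 4.3 4.4 5.2 7.1 7.7 8.8 14.7

19(a) z-score for ginseng (data value is 8.8):

$$z = \frac{8.8 - \bar{x}}{s} \approx 1.11$$

(using the unrounded values $\bar{x} \approx 5.07$ and $s \approx 3.36$; the rounded values 5.1 and 3.4 give about 1.09)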
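The calculator step and both z-scores can be checked with Python's standard `statistics` module. This is a sketch, assuming the 15 data values above are a sample (so `statistics.stdev`, which divides by n - 1, plays the role of s); `z_score` is an illustrative helper name.

```python
import statistics

data = [2.0, 2.1, 2.4, 2.8, 3.1, 3.5, 3.8, 4.2, 4.3, 4.4, 5.2, 7.1, 7.7, 8.8, 14.7]

x_bar = statistics.mean(data)  # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 in the denominator)


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(round(x_bar, 2), round(s, 2))          # 5.07 3.36
print(round(z_score(4.2, x_bar, s), 2))      # -0.26 (fish oil)
print(round(z_score(8.8, x_bar, s), 2))      # 1.11 (ginseng)
```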

3.1 - 116

Outliers

An outlier is an extreme data value.

We will define a data value as “extreme” if it is at least three standard deviations from the mean.

3.1 - 117

Outliers and z Scores

Data values are not unusual (extreme) if $-2 \le z \le 2$.

Data values are unusual, or outliers, if $z \le -3$ or $z \ge 3$.
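The outlier criterion above ($z \le -3$ or $z \ge 3$) can be sketched as a small filter. This assumes sample statistics (mean and `statistics.stdev`) and a cutoff of 3 as in the slide; `z_score_outliers` is a hypothetical helper name.

```python
import statistics


def z_score_outliers(data, cutoff=3.0):
    """Return the values whose sample z-score is <= -cutoff or >= cutoff."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return [x for x in data if abs((x - mean) / sd) >= cutoff]


data = [2.0, 2.1, 2.4, 2.8, 3.1, 3.5, 3.8, 4.2, 4.3, 4.4, 5.2, 7.1, 7.7, 8.8, 14.7]
print(z_score_outliers(data))  # [] : the largest value, 14.7, has z of about 2.87
```

Note that the mean and standard deviation are themselves pulled toward extreme values, so a single large value can keep its own z-score below 3.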

3.1 - 118

Interpreting Z Scores

page 131: bell shaped distribution

3.1 - 119

3.5 Chebyshev's Rule and the Empirical Rule

Calculate percentages using Chebyshev's Rule

Find percentages and data values using the Empirical Rule

3.1 - 120

Chebyshev's Rule

The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.

For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.

For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.

3.1 - 121

Example of Chebyshev's Rule

Page 139, problem 11(a)

A data distribution has a mean of 500 and a standard deviation of 100. Suppose we do not know whether the distribution is bell-shaped. (a) Estimate the proportion of data that falls between 300 and 700.

3.1 - 122

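Example of Chebyshev's Rule

Page 139, problem 11(a) ANSWER: The data values obey $300 \le x \le 700$. First compute k using the given $\bar{x} = 500$ and $s = 100$:

$$\bar{x} - ks = 300 \quad \text{and} \quad \bar{x} + ks = 700 \;\Rightarrow\; 500 + 100k = 700 \;\Rightarrow\; k = 2$$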

3.1 - 123

Example of Chebyshev's Rule

Page 139, problem 11(a) ANSWER: Since k = 2,

$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4},$$

and Chebyshev's rule says that at least 75% of the data falls between 300 and 700.
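Both steps of problem 11(a), finding k from the interval and applying $1 - 1/k^2$, fit in a short Python sketch. It assumes the interval is centered at the mean, as it is here; `chebyshev_bound` is an illustrative name.

```python
def chebyshev_bound(k):
    """Chebyshev's lower bound on the proportion of data within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule applies for k > 1")
    return 1 - 1 / k ** 2


mean, sd = 500, 100
lower, upper = 300, 700
k = (upper - mean) / sd            # interval is mean +/- k*sd, so k = 2
assert mean - k * sd == lower      # check that the interval is centered at the mean
print(k, chebyshev_bound(k))       # 2.0 0.75
print(round(chebyshev_bound(3), 3))  # 0.889
```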

3.1 - 124

The Empirical Rule

For data sets having a distribution that is approximately bell shaped, the following properties apply:

About 68% of all values fall within 1 standard deviation of the mean.

About 95% of all values fall within 2 standard deviations of the mean.

About 99.7% of all values fall within 3 standard deviations of the mean.

3.1 - 125

The Empirical Rule

For data sets having a distribution that is approximately bell shaped, the following properties apply:

About 68% of all data values obey $-1 \le z \le 1$.

About 95% of all data values obey $-2 \le z \le 2$.

About 99.7% of all data values obey $-3 \le z \le 3$.

3.1 - 126

The Empirical Rule

3.1 - 127

The Empirical Rule

3.1 - 128

The Empirical Rule

3.1 - 129

The Empirical Rule

Explain these percentages.

page 136: bell shaped distribution

3.1 - 130

The Empirical Rule

For the green regions:

1. 50% of the data lies to the left of z = 0.

2. 34% (half of 68%) of the data lies between z = -1 and z = 0. Therefore, 16% (= 50% - 34%) of the data is to the left of z = -1.

3. 47.5% (half of 95%) of the data lies between z = -2 and z = 0. Therefore, 2.5% (= 50% - 47.5%) of the data is to the left of z = -2.

4. Subtracting areas gives that 13.5% (= 16% - 2.5%) of the data lies between z = -2 and z = -1.

5. Using symmetry, 13.5% of the data also lies between z = 1 and z = 2.
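The five steps above are plain arithmetic on the empirical-rule percentages; a few lines of Python make the bookkeeping explicit. The dictionary `WITHIN` and the helper name are illustrative, and the percentages are the rule's approximations, not exact normal-curve areas.

```python
# Empirical rule: approximate % of data within k standard deviations of the mean
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def percent_left_of_minus(k):
    """Approximate % of data to the left of z = -k: 50% minus half of WITHIN[k]."""
    return 50.0 - WITHIN[k] / 2


left_of_minus1 = percent_left_of_minus(1)               # step 2: 16%
left_of_minus2 = percent_left_of_minus(2)               # step 3: 2.5%
between_minus2_minus1 = left_of_minus1 - left_of_minus2  # step 4: 13.5% (step 5 by symmetry)
print(left_of_minus1, left_of_minus2, between_minus2_minus1)  # 16.0 2.5 13.5
```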

3.1 - 131

Example of Empirical Rule

Page 139, problem 12(a) A data distribution has a mean of 500 and a standard deviation of 100. Assume that the distribution is bell-shaped. (a) Estimate the proportion of data that falls between 300 and 700.

3.1 - 132

Example of Empirical Rule

Page 139, problem 12(a) ANSWER: The data values obey $300 \le x \le 700$. As in problem 11, we are given $\bar{x} = 500$ and $s = 100$, so the data in this interval are within 2 standard deviations of the mean.

3.1 - 133

Page 139, problem 12(a) ANSWER: Here we are also given that the distribution is bell-shaped. Using the empirical rule, approximately 95% of the data lies between 300 and 700.

Example of Empirical Rule

3.1 - 134

Empirical Rule vs. Chebyshev's Rule

Note the difference between problems 11 and 12: in problem 11 we are not told that the distribution is bell-shaped, so we can only say that "at least" 75% of the data is between 300 and 700 (using Chebyshev's Rule). We cannot use the Empirical Rule in problem 11.

3.1 - 135

Example

Page 141, problem 24(a) In San Francisco, the mean and standard deviation of the wind speed in January are $\mu = 7.2$ mph and $\sigma = 7.2$ mph (note that these are population parameters). Assume that the distribution of the wind speed is bell-shaped. (a) Estimate the proportion of times that the wind speed is between 1.2 mph and 13.2 mph.

3.1 - 136

Example

Let the variable x represent wind speed, so that $1.2 \le x \le 13.2$. Note that the mean, 7.2, is the midpoint of this interval. Calculate how many standard deviations from the mean this interval represents, using the formula

$$k = \frac{\text{numerical value} - \text{mean}}{\text{standard deviation}}$$

3.1 - 137

Example

$$k = \frac{1.2 - 7.2}{7.2} \approx -0.83, \qquad k = \frac{13.2 - 7.2}{7.2} \approx 0.83$$

so the data in the interval are within 0.83 standard deviations of the mean.

ANSWER

The empirical rule implies that the wind speed is between 1.2 mph and 13.2 mph less than 68% of the time.

Note: convince yourself of this by sketching areas under the bell-shaped distribution.
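The empirical rule only tells us the answer is below 68%. As a cross-check that goes beyond the slides, and assuming the wind speeds follow an exactly normal distribution (a stronger assumption than "bell-shaped"), the proportion can be computed from the standard normal CDF, written here with Python's `math.erf`. The helper name `normal_cdf` is illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 7.2, 7.2          # population mean and SD of wind speed (mph)
k = (13.2 - mu) / sigma       # about 0.83; 1.2 mph is the same distance below the mean
proportion = normal_cdf(k) - normal_cdf(-k)
print(round(k, 2), round(proportion, 3))  # 0.83 0.595
```

The result, roughly 60%, is consistent with "less than 68%".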

3.1 - 138

Example

Page 141, problem 24(b)

(b) Estimate the proportion of times that the wind speed is less than 1.2 mph.

3.1 - 139

Example

Page 141, problem 24(b)

1. Since 0.0 mph is 1 standard deviation below the mean of 7.2 mph, the empirical rule implies that 34% (half of 68%) of the time, the wind speed is between 0.0 mph and 7.2 mph.

2. Subtracting 34% from 50% gives that 16% of the time the wind speed is less than 0.0 mph.

3. Since 1.2 mph is greater than 0.0 mph but less than 7.2 mph, we can say that the wind speed is less than 1.2 mph at least 16% of the time but no more than 50% of the time.

Note: convince yourself of this by sketching areas under the bell-shaped distribution.
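For part (b), the same cross-check applies, again under the added assumption of an exactly normal distribution (not something the problem states). The sketch shows that the normal-model proportion below 1.2 mph falls inside the 16% to 50% range derived above; `normal_cdf` is an illustrative helper.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


mu, sigma = 7.2, 7.2
below_0 = normal_cdf((0.0 - mu) / sigma)    # z = -1; the empirical rule says about 16%
below_1_2 = normal_cdf((1.2 - mu) / sigma)  # proportion of time wind speed < 1.2 mph
print(round(below_0, 3), round(below_1_2, 3))  # 0.159 0.202
```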

3.1 - 140

3.6 Robust Measures

• Find quartiles and the interquartile range

• Calculate the five number summary of a data set

• Construct a boxplot for a given data set

• Apply robust detection of outliers

3.1 - 141

Quartiles

Quartiles are measures of location, denoted Q1, Q2, and Q3, which divide a set of data into four groups with about 25% of the values in each group.

Q1 (First Quartile) is the 25th percentile

Q2 (Second Quartile) is the 50th percentile, or the median

Q3 (Third Quartile) is the 75th percentile

3.1 - 142

Example of Quartiles

• Given the 24 data values (sorted):

36 37 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 69 75

Find Q1, Q2, and Q3.

3.1 - 143

Example of Quartiles

For the first quartile (25th percentile), the position is

$$i = \frac{25}{100} \cdot 24 = 6,$$

therefore the first quartile is the mean of the data values in positions 6 and 7:

$$Q_1 = \frac{x_6 + x_7}{2} = \frac{41 + 43}{2} = 42.0$$

3.1 - 144

Example of Quartiles

For the second quartile (50th percentile), the position is

$$i = \frac{50}{100} \cdot 24 = 12,$$

therefore the second quartile is the mean of the data values in positions 12 and 13:

$$Q_2 = \frac{x_{12} + x_{13}}{2} = \frac{53 + 54}{2} = 53.5$$

3.1 - 145

Example of Quartiles

For the third quartile (75th percentile), the position is

$$i = \frac{75}{100} \cdot 24 = 18,$$

therefore the third quartile is the mean of the data values in positions 18 and 19:

$$Q_3 = \frac{x_{18} + x_{19}}{2} = \frac{59 + 61}{2} = 60.0$$

3.1 - 146

Interquartile Range

The Interquartile Range (IQR) is the difference between the third quartile and the first quartile, which measures the spread of the middle 50% of the data:

$$\text{IQR} = Q_3 - Q_1$$

It is considered a "robust" measure of variability because it is not affected by outliers in the data (the bottom 25% and top 25% of the data are ignored).

3.1 - 147

Example of IQR

• Given the 24 data values (sorted):

36 37 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 69 75

we found that $Q_1 = 42.0$ and $Q_3 = 60.0$, so

$$\text{IQR} = Q_3 - Q_1 = 60.0 - 42.0 = 18.0$$

3.1 - 148

Example of IQR

• Introduce outliers into the previous data set:

2 5 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 100 200

We still have $Q_1 = 42.0$ and $Q_3 = 60.0$, so

$$\text{IQR} = Q_3 - Q_1 = 60.0 - 42.0 = 18.0$$
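The quartile and IQR calculations above, including the claim that adding extreme values leaves the IQR unchanged, can be checked with a short Python sketch. It reuses the slides' small-data-set percentile rule (assuming an integer 0 < p < 100); the function names are illustrative.

```python
def percentile(data, p):
    """pth percentile (integer 0 < p < 100) using the slides' small-data-set rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)
    n = len(values)
    if (p * n) % 100 == 0:  # i = pn/100 is an integer
        i = p * n // 100
        return (values[i - 1] + values[i]) / 2  # mean of positions i and i + 1
    i = -(-p * n // 100)  # otherwise round i up
    return values[i - 1]


def iqr(data):
    """Interquartile range Q3 - Q1."""
    return percentile(data, 75) - percentile(data, 25)


original = [36, 37, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
            54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 69, 75]
with_outliers = [2, 5, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
                 54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 100, 200]

print(percentile(original, 25), percentile(original, 50), percentile(original, 75))  # 42.0 53.5 60.0
print(iqr(original), iqr(with_outliers))  # 18.0 18.0
```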

3.1 - 149

5-Number Summary

For a set of data, the 5-number summary consists of the minimum value; the first quartile, Q1; the median (or second quartile, Q2); the third quartile, Q3; and the maximum value.

3.1 - 150

Example of Five Number Summary

Given Data (sorted):

128 130 133 137 138 142 142 144 147 149

151 151 151 155 156 161 163 163 166

3.1 - 151

Example of Five Number Summary

• Minimum data value is 128

• First quartile location: $i = \frac{25}{100} \cdot 19 = 4.75$

• Round up to get $Q_1 = x_5 = 138$

3.1 - 152

Example of Five Number Summary

• Second quartile location: $i = \frac{50}{100} \cdot 19 = 9.5$

• Round up to get $Q_2 = x_{10} = 149$

3.1 - 153

Example of Five Number Summary

• Third quartile location: $i = \frac{75}{100} \cdot 19 = 14.25$

• Round up to get $Q_3 = x_{15} = 156$

3.1 - 154

Example of Five Number Summary

• Max data value is 166

• Five Number Summary:

min    Q1     Q2     Q3     max
128    138    149    156    166
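The five number summary is the minimum, the three quartiles, and the maximum, so it can be assembled from the same percentile rule. A minimal Python sketch, assuming the slides' small-data-set percentile rule; the function names are illustrative. The usage lines also cover the 20-value Strontium-90 data from the boxplot example.

```python
def percentile(data, p):
    """pth percentile (integer 0 < p < 100) using the slides' small-data-set rule."""
    values = sorted(data)
    n = len(values)
    if (p * n) % 100 == 0:  # i = pn/100 is an integer
        i = p * n // 100
        return (values[i - 1] + values[i]) / 2
    i = -(-p * n // 100)  # round i up
    return values[i - 1]


def five_number_summary(data):
    """(min, Q1, Q2, Q3, max) using the percentile rule above."""
    values = sorted(data)
    return (values[0], percentile(values, 25), percentile(values, 50),
            percentile(values, 75), values[-1])


data_19 = [128, 130, 133, 137, 138, 142, 142, 144, 147, 149,
           151, 151, 151, 155, 156, 161, 163, 163, 166]
data_20 = data_19 + [172]  # the 20 Strontium-90 values from the boxplot example

print(five_number_summary(data_19))  # (128, 138, 149, 156, 166)
print(five_number_summary(data_20))  # (128, 140.0, 150.0, 158.5, 172)
```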

3.1 - 155

Robust Detection of Outliers

Using a five number summary, a data value is an outlier if

1. it is located 1.5(IQR) or more below the first quartile, or

2. it is located 1.5(IQR) or more above the third quartile.

3.1 - 156

Robust Detection of Outliers

• The data set

2 5 37 39 39 41 43 44 44 47 50 53

54 55 56 56 57 59 61 61 65 69 100 200

has the five number summary:

min    Q1      Q2      Q3      max
2      42.0    53.5    60.0    200

$\text{IQR} = Q_3 - Q_1 = 60.0 - 42.0 = 18.0$

3.1 - 157

Robust Detection of Outliers

Calculate: 1.5(IQR) = 1.5(18.0) = 27

1.5(IQR) below the first quartile: 42 - 27 = 15

1.5(IQR) above the third quartile: 60 + 27 = 87

Therefore, 2, 5, 100, and 200 are outliers (2 and 5 are below 15; 100 and 200 are above 87).
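The 1.5(IQR) rule above can be sketched as a small Python function. It assumes the slides' small-data-set percentile rule for Q1 and Q3 and treats values exactly on a fence as outliers ("1.5(IQR) or more"); `iqr_outliers` is an illustrative name.

```python
def percentile(data, p):
    """pth percentile (integer 0 < p < 100) using the slides' small-data-set rule."""
    values = sorted(data)
    n = len(values)
    if (p * n) % 100 == 0:  # i = pn/100 is an integer
        i = p * n // 100
        return (values[i - 1] + values[i]) / 2
    i = -(-p * n // 100)  # round i up
    return values[i - 1]


def iqr_outliers(data):
    """Values at least 1.5(IQR) below Q1 or at least 1.5(IQR) above Q3."""
    q1 = percentile(data, 25)
    q3 = percentile(data, 75)
    step = 1.5 * (q3 - q1)
    lower_fence, upper_fence = q1 - step, q3 + step
    return [x for x in sorted(data) if x <= lower_fence or x >= upper_fence]


data = [2, 5, 37, 39, 39, 41, 43, 44, 44, 47, 50, 53,
        54, 55, 56, 56, 57, 59, 61, 61, 65, 69, 100, 200]
print(iqr_outliers(data))  # [2, 5, 100, 200] (fences at 15.0 and 87.0)
```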

3.1 - 158

Boxplot

A boxplot (or box-and-whisker diagram) is a graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile, Q1; the median; and the third quartile, Q3.

3.1 - 159

Example of Boxplot

3.1 - 160

Example of Boxplot

Sorted amounts of Strontium-90 (in millibecquerels) in a random sample of baby teeth obtained from Philadelphia residents born after 1979. Note: these data are related to the 1979 accident at the Three Mile Island nuclear power plant.

128 130 133 137 138 142 142 144 147 149

151 151 151 155 156 161 163 163 166 172

3.1 - 161

Example of Boxplot

• Five Number Summary:

min    Q1     Q2     Q3      max
128    140    150    158.5   172

• Boxplot?

Next slide: constructing a boxplot by hand (page 148), by calculator, or in SPSS.

3.1 - 162

Constructing a Boxplot

Page 148 (see example 3.41)

3.1 - 163

Example of Boxplot

• Boxplot

3.1 - 164

• Enter the data in a list:

Calculator Five Number Summary

3.1 - 165

• Go to STAT - CALC and choose 1-Var Stats

Calculator Five Number Summary

3.1 - 166

• On the HOME screen, when 1-Var Stats appears, type the list containing the data.

Calculator Five Number Summary

3.1 - 167

• Arrow down to the five number summary (last five items in the list)

Calculator Five Number Summary

3.1 - 168

• Clear out the graphs under Y= (or turn them off).

• Enter the data into the calculator lists. (choose STAT, #1 EDIT and type in entries)

Calculator Boxplot

3.1 - 169

• Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the second box-and-whisker icon is highlighted, and the list you will be using is indicated next to Xlist.

Calculator Boxplot

3.1 - 170

• To see the box-and-whisker plot, press ZOOM and #9 ZoomStat. Press the TRACE key to see on-screen data about the box-and-whisker plot.

Calculator Boxplot

3.1 - 171

Boxplot - Symmetric Distribution

Normal Distribution: Heights from a Random Sample of Women

NOTE: $Q_3 - Q_2 \approx Q_2 - Q_1$

value)data(min value)data(max 23 QQ

3.1 - 172

Boxplot - Skewed Distribution

Skewed Distribution: Salaries (in thousands of dollars) of NCAA Football Coaches