Central Tendency


Transcript of Central Tendency

Page 1: Central Tendency

1

Biostatistics

Measures of Central Tendency

Page 2: Central Tendency

2

Central Tendency


Page 3: Central Tendency

3

Measures of Central Tendency

Central Tendency – Definition: The most common value (for nominal variables) or the value around which cases tend to cluster (for ordinal and interval-ratio variables)

Central Tendency – Simplified Definition: A number that represents what is “typical”, “average”, or “in the middle”

Measures of Central Tendency: Mode, Median, Mean

Which one to use depends on the situation

Page 4: Central Tendency

4

Mode

Definition: The most frequently occurring value of a variable

Levels of Measurement: Nominal, Ordinal, Interval-ratio

Comment: The mode is a value, not a frequency!

Page 5: Central Tendency

5

Mode in a Frequency Distribution

The mode is the value with the largest frequency or percentage

Colour of eyes   Frequency (f)   Percentage (%)
Brown            247             50.2%
Black            145             29.5%
Grey             37              7.5%
Blue             63              12.8%
Total            492             100.0%

Mode = “brown”
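As a minimal sketch of reading the mode off a frequency distribution, the Python below uses the eye-colour frequencies from this slide. The names `eye_colour` and `mode_from_frequencies` are illustrative and not part of the original slides.

```python
# Frequency distribution from the slide: colour of eyes -> frequency (f)
eye_colour = {"Brown": 247, "Black": 145, "Grey": 37, "Blue": 63}


def mode_from_frequencies(freq):
    """Return the value (not the frequency) with the largest frequency."""
    return max(freq, key=freq.get)


total = sum(eye_colour.values())          # 492 cases in all
mode = mode_from_frequencies(eye_colour)  # the category with the largest f
share = 100 * eye_colour[mode] / total    # that category's percentage

print(mode, eye_colour[mode], f"{share:.1f}%")  # Brown 247 50.2%
```

Note that the function returns the category (“Brown”), not its count (247), which is the point of the comment that the mode is a value and not a frequency.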

Page 6: Central Tendency

6

Mode in a Bar Graph

The mode is the value with the tallest bar

Mode = “brown”

[Bar graph: bars for brown, black, grey, and blue; vertical axis from 0 to 300]

Page 7: Central Tendency

7

Mode in a Pie Chart

The mode is the value with the largest slice

Mode = “brown”

[Pie chart: slices for brown, black, grey, and blue]

Page 8: Central Tendency

8

Mode in a Histogram

The mode is the value with the tallest bar

Mode = 0

[Histogram: axis label “Colour”; bars labelled Brown, Black, Grey, Blue]

Page 9: Central Tendency

9

Mode: Potential Problems

Problem 1: The mode might not fall near the center of the distribution for an interval-ratio variable

[Figure: a distribution with two annotations, “The mode is here” and “We’d like the mode to be here”]

Page 10: Central Tendency

10

Mode: Potential Problems

Problem 2: There might be more than one mode

Bimodal: Two modes

Colour of eyes   Frequency (f)   Percentage (%)
Brown            155             31.5%
Grey             121             24.6%
Blue             61              12.4%
Black            155             31.5%
Total            492             100.0%

Two modes: “brown” and “black”
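A short sketch of how ties for the largest frequency could be detected, using the frequencies in the table on this slide. `modes_from_frequencies` is a hypothetical helper; `statistics.multimode` is the standard-library function (Python 3.8+) that does the same job on raw data.

```python
from statistics import multimode

# Frequencies from the table on this slide
freq = {"Brown": 155, "Grey": 121, "Blue": 61, "Black": 155}


def modes_from_frequencies(freq):
    """Return every value that shares the largest frequency."""
    top = max(freq.values())
    return [value for value, f in freq.items() if f == top]


modes = modes_from_frequencies(freq)
print(modes)  # ['Brown', 'Black']

# Cross-check: expand the table back into 492 raw cases
raw_cases = [value for value, f in freq.items() for _ in range(f)]
print(multimode(raw_cases))  # ['Brown', 'Black']
```

Both approaches report the two categories with 155 cases each, so the distribution is bimodal.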

Page 11: Central Tendency

11

Median

Definition: The middle number in the distribution of a variable when its values are placed in order

Levels of Measurement: Ordinal, Interval-ratio

Comment: The median divides the distribution of a variable in half
Half of the cases will be above the middle number
Half of the cases will be below the middle number

Page 12: Central Tendency

12

Determining the Median: Interval-Ratio Variable

Odd Number of Cases: The median is the middle number

Even Number of Cases: The median is the average of the two middle numbers

Page 13: Central Tendency

13

Median: Example Data

Table 1. Age of people

Person number   Age in years
1               43
2               57
3               45
4               68
5               51
6               26
7               13
8               1
9               24

Table 2. Duration of study per student

Student number   Hours / week
1                0
2                10
3                12
4                12
5                5
6                12
7                1
8                12
9                5
10               11

Page 14: Central Tendency

14

Median: Interval-Ratio Variable, Odd Number of Cases

Example: age of people (Table 1)

Steps (illustrated on next slide):
Step 1: Arrange the values in order from smallest to largest
Step 2: Assign “case numbers” from 1 to N
Step 3: Find the middle case by adding 1 to N and dividing by 2
Here, (N+1)/2 = (9+1)/2 = 10/2 = 5
The fifth case (not the number 5!) has the median
Step 4: Find the value corresponding to the middle case
Here the 5th case has a value of 43

Note: The median of 43 divides the distribution in half
Four cases are above the median
Four cases are below the median

Page 15: Central Tendency

15

Median: Interval-Ratio Variable, Odd Number of Cases

Step 1: Arrange the values in order from smallest to largest

Step 2: Number the values from 1 to 9

Step 3: Find the middle case: (9+1)/2 = 10/2 = 5

Step 4: Find the value for the 5th case – this is the median = 43

Note: 4 cases are above, and 4 are below, the median

Variable Value   1   13   24   26   43   45   51   57   68
Case Number      1   2    3    4    5    6    7    8    9
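The four steps for an odd number of cases can be sketched in Python as follows, using the nine ages from Table 1. Variable names are illustrative. The one detail the code adds is that Python lists count from 0, so case number 5 sits at index 4.

```python
ages = [43, 57, 45, 68, 51, 26, 13, 1, 24]   # Table 1, in original order

ordered = sorted(ages)                 # Step 1: smallest to largest
n = len(ordered)                       # Step 2: cases are numbered 1..n
middle_case = (n + 1) // 2             # Step 3: (9 + 1) / 2 = 5
median_age = ordered[middle_case - 1]  # Step 4: case 5 is index 4 in Python

# Check the note on the slide: equal numbers of cases on each side
below = sum(1 for a in ordered if a < median_age)
above = sum(1 for a in ordered if a > median_age)

print(ordered)                                 # [1, 13, 24, 26, 43, 45, 51, 57, 68]
print(middle_case, median_age, below, above)   # 5 43 4 4
```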

Page 16: Central Tendency

16

Median: Interval-Ratio Variable, Even Number of Cases

Example: duration of study per student (Table 2)

Steps (illustrated on next slide):
Step 1: Arrange the values in order from smallest to largest
Step 2: Assign “case numbers” from 1 to N
Step 3: Find the middle case by adding 1 to N and dividing by 2
Here, (N+1)/2 = (10+1)/2 = 11/2 = 5.5
The median is the average of the 5th and 6th cases
Step 4: Find the average of the two middle cases
Here the 5th case has a value of 10 and the 6th case has a value of 11
The median is (10+11)/2 = 10.5

Note: The median of 10.5 divides the distribution in half
Five cases are above the median
Five cases are below the median

Page 17: Central Tendency

17

Median: Interval-Ratio Variable, Even Number of Cases

Step 1: Arrange the values in order from smallest to largest

Step 2: Number the values from 1 to 10

Step 3: Find the middle case: (10+1)/2 = 11/2 = 5.5 (average of the 5th and 6th cases)

Step 4: Find average of the 5th and 6th cases – the median is (10+11)/2 = 10.5

Note: 5 cases are above, and 5 are below, the median

Variable Value   0   1   5   5   10   11   12   12   12   12
Case Number      1   2   3   4   5    6    7    8    9    10
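One possible way to fold the odd and even procedures into a single routine is sketched below. `median_by_cases` is a hypothetical name; the standard-library `statistics.median` is used only as a cross-check. The data are the hours from Table 2 and the ages from Table 1.

```python
from statistics import median


def median_by_cases(values):
    """Median via the slide's steps; works for odd or even N (N >= 1)."""
    ordered = sorted(values)          # Step 1: smallest to largest
    n = len(ordered)                  # Step 2: cases numbered 1..n
    position = (n + 1) / 2            # Step 3: middle case, e.g. 5.5
    lower_case = int(position)        # 5.5 -> 5th case; 5.0 -> 5th case
    if position == lower_case:        # odd N: a single middle case
        return ordered[lower_case - 1]
    # even N: average the two cases on either side of the position
    return (ordered[lower_case - 1] + ordered[lower_case]) / 2


hours = [0, 10, 12, 12, 5, 12, 1, 12, 5, 11]   # Table 2, original order
ages = [43, 57, 45, 68, 51, 26, 13, 1, 24]     # Table 1, original order

print(median_by_cases(hours))   # 10.5
print(median_by_cases(ages))    # 43
```

A fractional position such as 5.5 is what signals that two middle cases have to be averaged.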

Page 18: Central Tendency

18

Median in a Frequency Distribution

Median: Value of the variable where the cumulative percentage first reaches 50%
Here the cumulative percentage passes 50% at a value of 3 (40.8% at 2, 53.5% at 3)
So the median number of hours of study is 3

Hours of study   Frequency   Percentage   Cumulative Percentage
0                137         23.2%        23.2%
1                56          9.5%         32.7%
2                48          8.1%         40.8%
3                75          12.7%        53.5%
4                42          7.1%         60.6%
5                15          2.5%         63.1%
6                68          11.5%        74.6%
7                150         25.4%        100.0%
Total            591         100.0%
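The cumulative-percentage rule can be sketched as a running total over the table. `median_from_frequencies` is an illustrative name. The sketch assumes the cumulative percentage does not land exactly on 50%; in that edge case the median would fall between two values, which this simple version does not handle.

```python
# (hours of study, frequency) pairs from the table on this slide
hours_freq = [(0, 137), (1, 56), (2, 48), (3, 75),
              (4, 42), (5, 15), (6, 68), (7, 150)]


def median_from_frequencies(table):
    """Return the first value whose cumulative percentage reaches 50%."""
    total = sum(f for _, f in table)
    cumulative = 0
    for value, f in sorted(table):        # values in ascending order
        cumulative += f
        if 100 * cumulative / total >= 50:
            return value


median_hours = median_from_frequencies(hours_freq)
print(median_hours)  # 3
```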

Page 19: Central Tendency

19

Mean

Definition: The average obtained by summing the values of a variable and dividing by the number of cases

Level of Measurement: Interval-ratio

Comments:
It incorporates all values of a variable (unlike the mode and median)
It can be misleading when there are outlying (extreme) values

Page 20: Central Tendency

20

Mean: Formula for a Data Table

$$\bar{X} = \frac{\sum x}{n}$$

$\bar{X}$ represents the mean
$\sum$ tells us to sum or add up
$x$ represents each value of the variable
$n$ represents the number of cases

Page 21: Central Tendency

21

Mean: Calculating for a Data Table

Example: hours of study

$$\bar{X} = \frac{\sum x}{n} = \frac{0+1+5+5+10+11+12+12+12+12}{10} = \frac{80}{10} = 8.0$$

People in this sample study for an average of 8.0 hours
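As a quick check of the arithmetic, the sum-and-divide formula can be applied to the ten values listed for this example. Variable names are illustrative; `statistics.mean` is the standard-library equivalent.

```python
from statistics import mean

# The ten hours-of-study values listed in the example
hours = [0, 1, 5, 5, 10, 11, 12, 12, 12, 12]

total = sum(hours)   # sum of x
n = len(hours)       # number of cases
x_bar = total / n    # mean = (sum of x) / n

print(total, n, x_bar)  # 80 10 8.0
```

The ten listed values sum to 80, so the mean for these data is 8.0 hours.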

Page 22: Central Tendency

22

Mean: Formula for a Frequency Distribution

$$\bar{X} = \frac{\sum f \cdot x}{n}$$

$\bar{X}$ represents the mean
$\sum$ tells us to sum or add up
$f$ represents the frequency for each value of the variable
$x$ represents each value of the variable
$f \cdot x$ tells us to multiply the frequency ($f$) by the value ($x$)
$n$ represents the number of cases

Page 23: Central Tendency

23

Mean: Calculating for a Frequency Distribution

Example: hours of study

Hours of study (x)   Frequency (f)   f·x
0                    137             137·0 = 0
1                    56              56·1 = 56
2                    48              48·2 = 96
3                    75              75·3 = 225
4                    42              42·4 = 168
5                    15              15·5 = 75
6                    68              68·6 = 408
7                    150             150·7 = 1,050
                     N = 591

Page 24: Central Tendency

24

Mean: Calculating for a Frequency Distribution

Example: hours of study (continued)

$$\bar{X} = \frac{0+56+96+225+168+75+408+1050}{591} = \frac{2078}{591} \approx 3.52$$

People in this sample study for an average of 3.52 hours
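The frequency-weighted calculation can be sketched in a few lines of Python, using the hours-of-study frequency table from this example. Variable names are illustrative.

```python
# (hours of study x, frequency f) pairs from the frequency table
hours_freq = [(0, 137), (1, 56), (2, 48), (3, 75),
              (4, 42), (5, 15), (6, 68), (7, 150)]

n = sum(f for _, f in hours_freq)           # number of cases
sum_fx = sum(f * x for x, f in hours_freq)  # sum of f * x over all rows
x_bar = sum_fx / n                          # mean = sum(f * x) / n

print(n, sum_fx, round(x_bar, 2))  # 591 2078 3.52
```

Multiplying each value by its frequency gives the same total as adding up all 591 individual observations, which is why the two mean formulas agree.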

Page 25: Central Tendency

25

Outlying Value

Definition: A value that is very small or large relative to other values of the variable

Effect of Outlying Value
Mode: Usually has no effect
Median: Usually has no effect
Mean: May have an effect

Page 26: Central Tendency

26

Outlying Value: Potential Effect on Mean

Example: hours of study

Suppose one student studies for 112 (instead of 12) hours

$$\bar{X} = \frac{\sum x}{n} = \frac{0+1+5+5+10+11+12+12+12+112}{10} = \frac{180}{10} = 18.0$$

The mean is now 18.0
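To see the effect of the outlying value on all three measures at once, the sketch below recomputes the mean, median, and mode before and after replacing one 12 with 112. It uses the standard-library `statistics` module; the list names are illustrative.

```python
from statistics import mean, median, mode

original = [0, 1, 5, 5, 10, 11, 12, 12, 12, 12]
with_outlier = [0, 1, 5, 5, 10, 11, 12, 12, 12, 112]  # one 12 becomes 112

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(label, mean(data), median(data), mode(data))
# original 8 10.5 12
# with outlier 18 10.5 12
```

For these data only the mean moves (from 8 to 18); the median stays at 10.5 and the mode stays at 12.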

Page 27: Central Tendency

27

Determining Skewness: Using the Mean and Median

Procedure: Compare the mean and median, or subtract the median from the mean (Mean – Median)

Symmetric Distribution
Comparison: Mean equals median
Mean – Median: Difference is zero

Page 28: Central Tendency

28

[Figure: distribution curve labelled Mean, Median, Mode]

Page 29: Central Tendency

29

Determining Skewness: Using the Mean and Median

Positively Skewed Distribution
Comparison: Mean greater than median
Mean – Median: Difference is positive (greater than zero)

Negatively Skewed Distribution
Comparison: Mean smaller than median
Mean – Median: Difference is negative (less than zero)
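The mean-minus-median comparison can be sketched as a small function. `skew_direction` and its `tolerance` parameter are assumptions added for illustration (real data rarely give a difference of exactly zero), not part of the slides.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Classify skew from the sign of (mean - median).

    `tolerance` is a hypothetical cut-off below which the
    difference is treated as zero.
    """
    difference = mean(values) - median(values)
    if abs(difference) <= tolerance:
        return "symmetric"
    if difference > 0:
        return "positively skewed"
    return "negatively skewed"


print(skew_direction([1, 2, 3, 4, 5]))                          # symmetric
print(skew_direction([0, 1, 5, 5, 10, 11, 12, 12, 12, 112]))    # positively skewed
print(skew_direction([0, 1, 5, 5, 10, 11, 12, 12, 12, 12]))     # negatively skewed
```

The second list has a mean of 18 and a median of 10.5, so the difference is positive; the third has a mean of 8 and a median of 10.5, so the difference is negative.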

Page 30: Central Tendency

30

Choosing a Measure of Central Tendency

Nominal Variable: Mode only

Ordinal Variable: Median is best; mode is also possible

Interval-Ratio Variable
Symmetric Distribution: Any will work; mean is typically used
Positively or Negatively Skewed Distribution: Median is best
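The decision rule on this slide could be written as a small lookup. `recommended_measure` is a hypothetical helper that returns only the preferred measure for each case; it does not list the alternatives that are also possible.

```python
def recommended_measure(level, skewed=False):
    """Return the preferred measure for a level of measurement.

    `level` is "nominal", "ordinal", or "interval-ratio";
    `skewed` only matters for interval-ratio variables.
    """
    level = level.lower()
    if level == "nominal":
        return "mode"
    if level == "ordinal":
        return "median"
    if level == "interval-ratio":
        return "median" if skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")


print(recommended_measure("nominal"))                      # mode
print(recommended_measure("ordinal"))                      # median
print(recommended_measure("interval-ratio"))               # mean
print(recommended_measure("interval-ratio", skewed=True))  # median
```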