This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use...

125
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License .Your use of this material constitutes acceptance of that license and the co nditions of use of materials on this site. Copyright 2006, The Johns Hopkins University and William Brieger. All rights reserve d. Use of these materials permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independentl y review all materials for accuracy and efficacy. May contain materials owned by oth ers. User is responsible for obtaining permissions for use from third parties as nee ded. JOHNS HOPKINS BLOOMBERG SCHOOL of PUBLIC HEALTH

Transcript of This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use...

Page 1: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site.

Copyright 2006, The Johns Hopkins University and William Brieger. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Page 2: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Describing Data: Part I

John McGreadyJohns Hopkins University

Page 3: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Lecture Topics

Types of data Measures of central tendency for continuous data Sample versus population Visually describing continuous data Underlying shape of the “true distribution of continuous data

3

Page 4: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section A

Types of Data

Page 5: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

5

Steps in a Research Project

1. Planning2. Design3. Data collection4. Data analysis5. Presentation6. Interpretation

Page 6: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 6

Biostatistics Issues

Design of studies – Sample size – Role of randomization

Page 7: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 7

Biostatistics Issues

Variability – Important patterns in data are obscured by variability – Distinguish real patterns from random variation

Page 8: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

8

Biostatistics Issues

Inference – Draw general conclusions from limited data – For example: Survey Summarize

Page 9: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 9

1954 Salk Polio Vaccine Trial

School ChildrenRandomized

Vaccinated

N=200, 745

Placebo

N=201, 229Polio Cases: Vaccine 82

Placebo 162

Page 10: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

10

1954 Salk Polio Vaccine TrialReference: Meier, P. (1972), “The Biggest

Public Health Experiment Ever: The 1954 Field Trial of the Salk Poliomyelitis Vaccine,” In: Statistics: A Guide to the Unknown, J. Tanur (Editor) Holden-Day.

Page 11: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 11

DesignFeatures of the Polio Trial

Comparison group Randomized Placebo controls Double blind Objective—the groups should be equivalent except for the factor (vaccine) being investigated

Page 12: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

12

DesignFeatures of the Polio Trial

Question –Could the results be due to chance?

Page 13: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

13

Imbalance

There were almost twice as many polio cases in the placebo compared to the vaccine group

Page 14: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

14

Could We Get Such GreatImbalance by Chance?

Polio cases – Vaccine—82 – Placebo—162 – p-value = ? Statistical methods tell us how to make these probability calculations

Page 15: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 15

Types of Data

Binary (dichotomous) data

– Yes/No

– Polio—Yes/No

– Cure—Yes/No

–Gender—Male/Female

Page 16: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 16

Types of Data

Categorical data(place individuals in categories) – Race/ethnicity nominal – Country of birth (no ordering) – Degree of agreement ordinal (ordering)

Page 17: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 17

Types of Data

Continuous data (finer measurements) – Blood pressure – Weight – Height – Age

Page 18: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

18

Types of Data

There are different statistical methods for different types of data

Page 19: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

19

Example: Binary Data

To compare the number of polio cases in the two treatment arms of the Salk Polio vaccine, you could use . . . – Fisher’s Exact Test – Chi-Square test

Page 20: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

20

Example: Continuous Data

To compare blood pressures in a clinical trial evaluating two blood pressure-lowering medications, you could use . . . – 2-sample T-test – Wilcoxon Rank Sum Test (nonparametric)

Page 21: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section A

Practice Problems

Page 22: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

22

Data Types

State the data type of each of the following:

Homicide rate (deaths/100,000) High school graduate (Y/N) Hair color Hospital expenditures (yearly, in dollars) Smoking status (none, light, heavy) Coronary heart disease (Y/N)

Page 23: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section A

Practice Problem Solutions

Page 24: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 24

Solutions

Homicide rate (deaths/100,000)

Page 25: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 25

Solutions

Homicide rate (deaths/100,000)

Continuous

Page 26: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 26

Solutions

High school graduate (Y/N)

Page 27: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 27

High school graduate (Y/N)

Solutions

Dichotomous(Binary)

Page 28: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 28

Hair color

Solutions

Page 29: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 29

Hair color

Solutions

Categorical(Nominal)

Page 30: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 30

Hospital expenditures (yearly, in dollars)

Solutions

Page 31: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 31

Hospital expenditures (yearly, in dollars)

Solutions

Continuous

Page 32: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 32

Smoking status (none, light, heavy)

Solutions

Page 33: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 33

Smoking status (none, light, heavy)

Solutions

Categorical(Ordinal)

Page 34: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 34

Coronary heart disease (Y/N)

Solutions

Page 35: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

35

Coronary heart disease (Y/N)

Solutions

Dichotomous

(Binary)

Page 36: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Section B

Measures of Central TendencySample Versus Population

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Page 37: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

37

Summarizing and DescribingContinuous Data

Measures of the center of data – Mean – Median

Page 38: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

38

Sample Mean XThe Average or Arithmetic Mean

Add up data, then divide by sample size (n) The sample size n is the number of observations (pieces of data)

Page 39: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 39

Example

n = 5 Systolic blood pressures (mmHg) X1 = 120 X2 = 80 X3 = 90 X4 = 110 X5 = 95

Page 40: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

40

Example

X =120 + 80 90

5

+ 110 + 95= 99mmH

g

+

Page 41: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

41

Notes on Sample Mean X Formula

n

∑ Xi

X = i=1

n

Page 42: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

42

Summation Sign

In the formula to find the mean, we use the “summation sign” — ∑ This is just mathematical shorthand for “add up all of the observations”

n

∑i = 1

Xi

= X1+ X2

+ X3+ ......

+ Xn

Page 43: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Notes on Sample Mean

Also called sample average or arithmetic mean Sensitive to extreme values – One data point could make a great change in sample mean Why is it called the sample mean? – To distinguish it from population mean 43

Page 44: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 44

Population Versus Sample

Population—The entire group you want information about

– For example: The blood pressure of all 18- year-old male college students in the United States

Page 45: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 45

Population Versus Sample Sample—A part of the population from which we actually collect information and draw conclusions about the whole population – For example: Sample of blood pressures N=five 18-year-old male college students in the United States

Page 46: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 46

Population Versus Sample The sample mean X is not the population mean µ

Page 47: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Population Versus Sample

Continued 47

Population Sample Population mean µ Sample mean X

Page 48: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

48

Population Versus Sample

We don’t know the population mean µ but would like to know it We draw a sample from the population We calculate the sample mean X How close isX to µ? Statistical theory will tell us how close X is to µ

Page 49: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

49

Statistical Inference

Statistical inference is the process of trying to draw conclusions about the population from the sample

Page 50: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 50

Sample Median

The median is the middle number

80 90 95 110 120

Page 51: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 51

Sample Median The sample median is not sensitive to extreme values – For example: If 120 became 200, the median would remain the same, but the mean would change to 115

80 90 95 110 200

Page 52: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

52

Sample Median If the sample size is an even number

80 90 95 110 120 125

Median95 + 110 = 102.5 mmHg

2

Page 53: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section B

Practice Problems

Page 54: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 54

Practice Problems

The following data is the annual income (in 1000s of dollars) taken from nine students in the Hopkins internet-based MPH program:

37 102 34 12 111 56 72 17 33

Page 55: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

55

Practice Problems

1. Calculate the sample mean income2. Calculate the sample median income3. What population could this sample represent?4. Which would change by a larger amount— the mean or median—if the 34 were replaced by 17, and the 12 replaced by a 31?

Page 56: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section B

Practice Problem Solutions

Page 57: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 57

Solutions

1. Calculate the sample mean income Remember:

Where n=9 and x1 through x9 represent the nine observed values of income

x =

n

∑xI

i =1

n

Page 58: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 58

Solutions

x= 37 +102 + 34 +12 +111+56+72+17+33 9

x = 474 =

9 52.67

– The mean income in our sample is

$52,670

Page 59: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 59

Solutions

2. Calculate the sample median income – To calculate the sample median, we must first order our data from the lowest value to the highest value

Page 60: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 60

Solutions

– The ordered data set appears below

12 17 33 34 37 56 72 102 111

Page 61: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

61

Solutions

– Because we have nine values (an odd number), we can just pick the middle value to calculate the median

12 17 33 34 37 56 72 102 111

Median = 37

Page 62: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

62

Solutions

3. What population could this sample represent?– It could be representative off all Johns Hopkins Internet-based MPH students; it would need to be made clear that this is a random sample in order for it to be called representative

Page 63: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

63

Solutions

4. Which would change by a larger amount—the mean or the median—if the 34 were replaced by 17, and the 12 replaced by a 31? – Notice that both changes do nothing to change the position of the median; therefore, the only statistic that would change is the mean

Page 64: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section C

Visually Displaying ContinuousData: Histograms;

The Underlying “PopulationDistribution”

Page 65: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

65

Pictures of DataContinuous Variables

Histograms – Means and medians do not tell whole story – Differences in spread (variability) – Differences in shape of the distribution

Page 66: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 66

How to Make a Histogram

Consider the following data collected from the 1995 Statistical Abstracts of the United States – For each of the 50 United States, the proportion of individuals over 65 years of age has been recorded

Page 67: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 67

How to Make a HistogramState % State % State % State %

AL 12.9 IN 12.8 NE 14.1 SC 11.9

AK 4.6 IA 15.4 NV 11.3 SD 14.7

AZ 13.4 KS 13.9 NH 11.9 TN 12.7 AR 14.8 KY 12.7 NJ 13.2 TX 10.2 CA 10.6 LA 11.4 NM 11.0 UT 8.8

CO 10.1 ME 13.9 NY 13.2 VT 12.1 CT 14.2 MD 11.2 NC 12.5 VA 11.1 DE 12.7 MA 14.1 ND 14.7 WA 11.6 FL 18.4 MI 12.4 OH 13.4 WV 15.4 GA 10.1 MN 12.5 OK 13.6 WI 13.4 HI 12.1 MI 12.5 OR 13.7 WY 11.1 ID 11.6 MO 14.1 PA 15.9 IL 12.6 MT 13.3 RI 15.6

Source: Statistical Abstracts of the United States, 1995

Page 68: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 68

How to Make a HistogramState % State % State % State %

AL 12.9 IN 12.8 NE 14.1 SC 11.9

AK 4.6 IA 15.4 NV 11.3 SD 14.7

AZ 13.4 KS 13.9 NH 11.9 TN 12.7 AR 14.8 KY 12.7 NJ 13.2 TX 10.2 CA 10.6 LA 11.4 NM 11.0 UT 8.8

CO 10.1 ME 13.9 NY 13.2 VT 12.1 CT 14.2 MD 11.2 NC 12.5 VA 11.1 DE 12.7 MA 14.1 ND 14.7 WA 11.6 FL 18.4 MI 12.4 OH 13.4 WV 15.4 GA 10.1 MN 12.5 OK 13.6 WI 13.4 HI 12.1 MI 12.5 OR 13.7 WY 11.1 ID 11.6 MO 14.1 PA 15.9 IL 12.6 MT 13.3 RI 15.6

Source: Statistical Abstracts of the United States, 1995

Page 69: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 69

How to Make a HistogramState % State % State % State %

AL 12.9 IN 12.8 NE 14.1 SC 11.9

AK 4.6 IA 15.4 NV 11.3 SD 14.7

AZ 13.4 KS 13.9 NH 11.9 TN 12.7 AR 14.8 KY 12.7 NJ 13.2 TX 10.2 CA 10.6 LA 11.4 NM 11.0 UT 8.8

CO 10.1 ME 13.9 NY 13.2 VT 12.1 CT 14.2 MD 11.2 NC 12.5 VA 11.1 DE 12.7 MA 14.1 ND 14.7 WA 11.6 FL 18.4 MI 12.4 OH 13.4 WV 15.4 GA 10.1 MN 12.5 OK 13.6 WI 13.4 HI 12.1 MI 12.5 OR 13.7 WY 11.1 ID 11.6 MO 14.1 PA 15.9 IL 12.6 MT 13.3 RI 15.6

Source: Statistical Abstracts of the United States, 1995

Page 70: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 70

How to Make a HistogramCount the number of observations in

each class Here are the observations:Class Count Class Count Class

Count

4.0–5.0 1 9.0–10.0 0 14.0–15.0 7

5.0–6.0 0 10.0–11.0 5 15.0–16.0 4

6.0–7.0 0 11.0–12.0 9 16.0–17.0 0

7.0–8.0 0 12.0–13.0 12 17.0–18.0 0

8.0–9.0 1 13.0–14.0 10 18.0–19.0 1

Page 71: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

71

How to Make a Histogram

Divide range of data into intervals (bins) of equal width Count the number of observations in each class Draw the histogram Label scales

Page 72: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 72

Histogram

A histogram of the percent of residents older than 65 years in the 50 United States.

Page 73: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 73

Histogram

A histogram of the percent of residents older than 65 years in the 50 United States.

Page 74: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

74

Histogram

A histogram of the percent of residents older than 65 years in the 50 United States.

Page 75: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 75

Pictures of Data: Histograms

Recall the blood pressure data on a sample of 113 men

Page 76: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 76

Pictures of Data: Histograms

Histogram of the Systolic Blood Pressure for 113 men. Each bar spans a width of five mmHg on the horizontal axis. The height of each bar represents the number of individuals with SBP in that range.

Page 77: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 77

Pictures of Data: Histograms

Histogram of the Systolic Blood Pressure for 113 men. Each bar spans a width of five mmHg on the horizontal axis. The height of each bar represents the number of individuals with SBP in that range.

Page 78: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

78

Pictures of Data: Histograms

Yet another histogram of the same BP information on 113 men. Here, the bin width is one mmHg, perhaps giving more detail than is necessary.

Page 79: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

79

Other Examples

Another way to present the data in a histogram is to label the y-axis with relative frequencies as opposed to counts. The height of each bar represents the percentage of individuals in the sample with BP in that range. The bar heights should add to one.

Page 80: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

80

Intervals How many intervals (bins) should you have in a histogram? – There is no perfect answer to this – Depends on sample size n

– Rough rule of thumb: # Intervals ≈ n n Number of Intervals 10 About 3 50 About 7100 About 10

Page 81: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 81

Shapes of the Distribution Three common shapes of frequency distributions

Symmetrical and bell shaped

Positively skewed or skewed to the right

Negatively skewed or skewed to the left

A B C

Page 82: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

82

Shapes of the DistributionThree less common shapes of frequency distributions

A B C

Bimodal Reverse

J-shaped

Uniform

Page 83: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

83

Distribution Characteristics Mode—Peak(s) Median—Equal areas point Mean—Balancing point

Mode Media

n

Mean

Page 84: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 84

Distribution Characteristics Right skewed (positively skewed) – Long right tail – Mean > Median

Mode Media

n

Mean

Page 85: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 85

Shapes of Distributions Left skewed (negatively skewed) – Long left tail – Mean < Median

Mean

Median

Mode

Page 86: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 86

Shapes of Distributions Symmetric (right and left sides are mirror images) – Left tail looks like right tail – Mean = Median = Mode

Mean

Median

Mode

Page 87: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 87

Shapes of Distributions

Outlier – An individual observation that falls outside the overall pattern of the graph

Page 88: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

88

Shapes of Distributions

Outlier

Florida

Alaska

Page 89: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 89

The Histogram and theProbability Density

The same histogram of BP measurements from a sample of 113men. We are going to compare this to a histogram for a largersample, and for the entire population.

Page 90: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 90

The Histogram and theProbability Density

The same histogram of BP measurements from a sample of 113men. We are going to compare this to a histogram for a largersample, and for the entire population.

Page 91: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 91

The Histogram and theProbability Density

The Probability Density for BP values in the entire population of men—because the population is infinite, there are no bars,and the y-axis can not have actual counts.

Page 92: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

92

The Histogram and theProbability Density

The probability density is a smooth idealized curve that shows the shape of the distribution in the population Areas in an interval under the curve represent the percent of the population in the interval

Page 93: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section C

Practice Problems

Page 94: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 94

Practice Problems

What kind of shape do you think the distribution for the following data would have? Hospital length of stay for 1,000 randomly selected patients at Johns Hopkins

Blood pressure in all women

Page 95: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

95

Practice Problems

Annual income for all JHU Internet-based MPH students (refer to results of last Practice Problems, assume a random sample) Number of hours of sporting events watched last week by a sample of 350 men and 350 women

Page 96: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section C

Practice Problem Solutions

Page 97: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 97

Solutions

Hospital length of stay for 1,000 randomly selected patients at Johns Hopkins

Page 98: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 98

Solutions

Hospital length of stay for 1,000 randomly selected patients at Johns Hopkins

Page 99: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 99

Solutions

Blood pressure in all women

Page 100: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 100

Solutions

Blood pressure in all women

Page 101: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 101

Solutions

Annual income for all JHU Internet-based

MPH students

Page 102: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 102

Solutions

Annual income for all JHU Internet-based MPH students

Median

Mean

Page 103: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 103

Solutions

Number of hours of sporting events watched last week by a sample of 350 women and 350 men

Page 104: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 104

Solutions

Number of hours of sporting events watched last week by a sample of 350 women and 350 men

Page 105: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 105

Solutions

Number of hours of sporting events watched last week by a sample of 350 women and 350 men

Mode for

Women?

Mode for

Men?

Page 106: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

JOHNS HOPKINSBLOOMBERGSCHOOL of PUBLIC HEALTH

Section D

Stem and Leaf Plots, Box Plots

Page 107: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 107

Sample 113 Men

Suppose we took another look at our random sample of 113 men and their blood pressure measurements One tool for “visualizing” the data is the histogram

Page 108: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 108

Histogram: BP for 113 males

Page 109: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

109

Sample 113 Men: Stem and Leaf

Another common tool for visually displaying continuous data is the “stem and leaf”

plot Very similar to a histogram – Like a “histogram on its side” – Allows for easier identification of individual values in the sample

Page 110: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 110

Stem and Leaf: BP for 113 Males

Page 111: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 111

Stem and Leaf: BP for 113 Males

“Stems”

Page 112: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 112

Stem and Leaf: BP for 113 Males

“Leaves”

Page 113: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 113

Stem and Leaf: BP for 113 Males

Page 114: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 114

Stem and Leaf: BP for 113 Males

Page 115: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 115

Sample 113 Men: Stem andBoxplot

Another common visual display tool is the boxplot – Gives good insight into distribution shape in terms of skewness and outlying values – Very nice tool for easily comparing distribution of continuous data in multiple groups—can be plotted side by side

Page 116: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 116

Boxplot: BP for 113 MalesBoxplot of Systolic Blood Pressures

Sample of 113 Men

Page 117: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 117

Boxplot: BP for 113 Males

Sample MedianBlood Pressure

Boxplot of Systolic Blood Pressures Sample of 113 Men

Page 118: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 118

Boxplot: BP for 113 Males

75th Percentile

25th Percentile

Boxplot of Systolic Blood Pressures Sample of 113 Men

Page 119: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 119

Boxplot: BP for 113 Males

Largest Observation

Smallest Observation

Boxplot of Systolic Blood Pressures Sample of 113 Men

Page 120: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

120

Hospital Length of Stay for 1,000

Patients Suppose we took a sample of discharge records from 1,000 patients discharged from a large teaching hospital How could we visualize this data?

Page 121: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

121

Histogram: Length of Stay

Length of Stay (Days)

Fre

qu

ency

Page 122: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 122

Boxplot: Length of StayHo spital LO S (D ays) fo r 1,000 Patients

Page 123: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 123

Boxplot: Length of StayHo spital LO S (D ays) fo r 1,000 Patients

Page 124: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 124

Boxplot: Length of Stay

Largest Non-Outlier

Smallest Non-Outlier

Ho spital LO S (D ays) fo r 1,000 Patients

Page 125: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License.Your use of this material constitutes acceptance of that license.

Continued 125

Boxplot: Length of Stay

Large Outliers

Ho spital LO S (D ays) fo r 1,000 Patients