Biostatistics I PubH 6450 Fall 2005. 2 June 2015 2 PubH 6450 – Biostatistics I Instructor: Susan...


Page 1: Biostatistics I PubH 6450 Fall 2005. 2 June 2015 2 PubH 6450 – Biostatistics I Instructor: Susan Telke email: susant@biostat.umn.edu (office hours: 3:20pm.

Biostatistics I
PubH 6450

Fall 2005


PubH 6450 – Biostatistics I
Instructor: Susan Telke

email: susant@biostat.umn.edu (office hours: 3:20pm – 4:20pm (T and TH), location – lecture hall or by appointment, location – A349 Mayo building)

Teaching Assistants:
• Pei Li – email: [email protected]
• Xiaoxiao Kong – email: [email protected]
• Jianmin Liu – email: [email protected]
• Xiaobo Liu – email: [email protected]
• Ran Li – email: [email protected]
• Jia Xu – email: [email protected]
• Jay Pottala – email: [email protected]


Book for 6450

Introduction to the Practice of Statistics (Moore and McCabe)


Web Page

http://www.biostat.umn.edu/~susant/PH6450DESC.html

Information on the web:
1. General class information
2. Syllabus
3. Course notes (updated weekly)
4. Homework
5. Computer Help


Computer Labs

• Mayo C381 (Biostatistics Lab): Teaching Assistants will hold computer sessions in the Mayo lab to help you with your homework assignments.
• Diehl Hall (Medical Library)


PC SAS

• Primary computing environment will be PC SAS.
• PC SAS is available in computing lab MAYO C381.
• PC SAS can be purchased at the bookstore (one year agreement is about $50).
• SAS (not PC SAS) is available using the UNIX version of SAS by telnet to the biostat workstation saturn.


Exams and Homework

• There will be weekly homework assignments.
• There will be two midterms and one final exam.
• Students who get an “A” on all exams get an “A” in the course.
• For all other students the midterms account for 25% each and the final accounts for 30% of the course grade. The remaining 20% is based on homework (best 9).


Introduction to PubH 6450

The study of statistics explores the collection, organization, analysis and interpretation of numerical data.

When the focus of the analysis is on the biological and health sciences it is called Biostatistics.


Trial by Jury: A Familiar Scenario

• You have a crime.
• You have a suspect.
• A police investigation collects evidence against the suspect.
• A prosecutor presents summarized evidence to a jury.


Trial by Jury: The Process

The jury reaches a verdict based on their judgment of the evidence presented.

Rules for determining a verdict:
• The accused is innocent until proven guilty
• The evidence must be sufficient to convict beyond all reasonable doubt
• Decision must be unanimous


Trial by Jury: The Need

Why is the Trial by Jury process needed?

The truth is unknown or uncertain because of:
• Variability: Every case is different.
• Incomplete information: Some evidence may be missing.


Trial by Jury: Rationale

Trial by Jury is the way our society deals with uncertainties related to criminal justice.

Its goal is to minimize errors/mistakes within the limits of human understanding.

It is impossible to eliminate all mistakes in verdicts made based on uncertain, incomplete evidence.


Trial by Jury: Dealing with Uncertainty

A hypothesis (assumption) is stated: “Every person is innocent until proven guilty”

Data is collected: Evidence against the hypothesis – not against the suspect.

A verdict is reached based on the evidence about whether the hypothesis should be rejected. (If hypothesis rejected – verdict is guilty)


Trial by Jury: Elements of a Successful Trial

• A probable cause (a crime and a suspect).
• A thorough investigation (by police).
• An efficient presentation (by D.A.’s office attorneys – organization and summarization of evidence).
• A fair & impartial assessment by the jury.


Trial by Jury: How does this relate to Biostatistics?

• A probable cause: The crime is lung cancer & the suspect is cigarette smoking.
• A thorough investigation: A clinical trial or case-control study to gather information.
• An efficient presentation: Using biostatistics tools to organize and summarize data.
• A fair & impartial assessment by the jury: Making proper statistical inference based on data collected.


Areas of Biostatistics

Experimental Designs:
• How will the data be collected?

Descriptive Statistics:
• Organization of data
• Summary statistics of data
• Effective graphical representation of data

Statistical Inference:
• The science of drawing statistical conclusions from specific data using a knowledge of probability.


Goals …

By the end of the course you should be able to use the following aspects of statistical thinking:
• Critically read the literature in your field that makes use of statistical analysis.
• Read about new statistical techniques and understand how they may apply to your field.
• Create and analyze descriptive statistics based on data.
• Develop hypotheses and use appropriate statistics to evaluate these hypotheses.


The Language of Statistics: Definitions

• Population: The entire group of people, animals or things about which we want information. (e.g. population of the U.S.)
• Individuals (units): The objects described by a set of data. (e.g. People)
• Sample: A part of the population from which we actually collect information, used to draw conclusions about the whole population. (e.g. sample = 1000 people)


The Language of Statistics: Definitions

• Variable: Any characteristic of an individual. A variable can take different values for different individuals. Also, a variable can take different values for the same individual at different times. (e.g. Height, age, gender)


Two “Types” of Variables

• Quantitative Variable: measures that are recorded on a naturally occurring numerical scale. Operations such as adding and “averaging” make sense. (e.g. Height, time, test scores)
• Qualitative Variable (Categorical): Variables that are classified into one of a group of categories. Arithmetic operations do NOT make sense with this type of variable. (e.g. Geographical location, gender)
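As a rough illustration of the distinction above, the following Python sketch averages a quantitative variable and counts the categories of a qualitative one. The three records and the field names `height_cm` and `gender` are made up for this example and do not come from the course notes.

```python
from collections import Counter
from statistics import mean

# Hypothetical records, for illustration only (not course data).
people = [
    {"height_cm": 170.0, "gender": "F"},
    {"height_cm": 182.0, "gender": "M"},
    {"height_cm": 164.0, "gender": "F"},
]

# Quantitative variable: adding and averaging are meaningful.
mean_height = mean(p["height_cm"] for p in people)

# Qualitative (categorical) variable: an average has no meaning,
# so we count how often each category occurs instead.
gender_counts = Counter(p["gender"] for p in people)

print(mean_height)
print(dict(gender_counts))
```

Counting categories is the step that later feeds a bar graph or pie chart, while quantitative values feed stemplots and histograms.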


Examples:

• Age in years
• ID #
• Temperature in degrees
• Political party
• Smoking status
• Length in cm
• Gender
• Blood pressure


Group Work!


Two Methods for Describing Sets of Data

Exploratory Data Analysis: examining data in order to describe their main features.

• Graphical
• Numerical


Displaying Distributions with Graphs

Distribution: The distribution of a variable tells us what values it takes on and how often it takes on these values.


Describing Categorical Variables with Graphs

Bar Graphs

[Bar graph: Percent of Children Living in Crack/cocaine Households, by race; vertical axis 0% to 80%]

Race               Percent
Black              70%
White              18%
American Indian     8%
Other               4%

NOTE: 668 children living in crack/cocaine households were categorized based on race
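The original figure did not survive extraction, so here is a minimal Python sketch that redraws the bar graph as text from the percentages in the table. The function name `text_bar_graph` and the choice of one `#` per two percentage points are assumptions made for this illustration.

```python
# Percentages from the slide (668 children categorized by race).
percent = {"Black": 70, "White": 18, "American Indian": 8, "Other": 4}

def text_bar_graph(data, scale=2):
    """Return one line per category; each '#' stands for `scale` percentage points."""
    width = max(len(name) for name in data)
    lines = []
    for name, pct in data.items():
        bar = "#" * round(pct / scale)
        lines.append(f"{name:<{width}} | {bar} {pct}%")
    return lines

bar_lines = text_bar_graph(percent)
print("\n".join(bar_lines))
```

Because each bar's length is proportional to its percentage, the categories can be compared at a glance, which is the purpose of a bar graph.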


Describing Categorical Variables with Graphs

Pie Chart

[Pie chart: Percent of Children Living in Crack/cocaine Households, by race]

Race               Percent
Black              70%
White              18%
American Indian     8%
Other               4%

NOTE: 668 children living in crack/cocaine households were categorized based on race


Describing Quantitative Data

• Stemplots
• Histograms
• Time Plots
• Box Plots (section 1.2)


Stemplots

Quick, easy way to see the distribution of 40 or fewer data points.

How to make a stemplot:
• Create Leaf
• Order Data
• Arrange Stems
• Place Leaves


Stemplots: An Example

Average Monthly Temperature. Source: World Almanac 1996 p.180

              JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
Duluth          6   12   23   38   50   59   65   63   54   44   28   14
Minneapolis    11   18   29   46   59   68   73   71   61   50   33   19


Stemplots: An Example

0 | 6
1 | 12489
2 | 389
3 | 38
4 | 46
5 | 00499
6 | 1358
7 | 13

6, 11, 12, 14, 18, 19, 23, 28, 29,
33, 38, 44, 46, 50, 50, 54, 59, 59,
61, 63, 65, 68, 71, 73
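The stemplot steps can be sketched in Python as follows, using the 24 Duluth and Minneapolis temperatures from the example. The function name `stemplot` and the choice of the tens digit as stem and the ones digit as leaf are assumptions that fit this data (non-negative whole numbers below 100), not a general-purpose routine.

```python
from collections import defaultdict

# Average monthly temperatures from the example (JAN..DEC).
duluth = [6, 12, 23, 38, 50, 59, 65, 63, 54, 44, 28, 14]
minneapolis = [11, 18, 29, 46, 59, 68, 73, 71, 61, 50, 33, 19]

def stemplot(values):
    """Build stemplot rows for non-negative integers: stem = tens, leaf = ones."""
    leaves = defaultdict(list)
    for v in sorted(values):            # order the data
        stem, leaf = divmod(v, 10)      # split each value into stem and leaf
        leaves[stem].append(leaf)       # place the leaf on its stem
    rows = []
    # Arrange stems from smallest to largest, keeping any empty stems in between.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}")
    return rows

stem_rows = stemplot(duluth + minneapolis)
print("\n".join(stem_rows))
```

Sorting before splitting guarantees that the leaves on each stem come out in increasing order, so no second ordering pass is needed.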


Histograms

Histograms are useful to display the distribution of large amounts of data.

Steps for creating a histogram:
1. Divide range into classes of equal width
2. Count number of observations in each class
3. Draw histogram


Histogram: An Example

Weights of 92 Penn State Students:

Females
140 120 130 138 121 125 116 145 150 112 125 130 120 130 131 120
118 125 135 125 118 122 115 102 115 150 110 116 108  95 125 133
110 150 108

Males
140 145 160 190 155 165 150 190 195 138 160 155 153 145 170 175
175 170 180 135 170 157 130 185 190 155 170 155 215 150 145 155
155 150 155 150 180 160 135 160 130 155 150 148 155 150 140 180
190 145 150 164 140 142 136 123 155


Histogram of Weights

[Histogram: Weights of Penn State Students (All Students); horizontal axis: Weight in Lbs., classes <110, 110-119, 120-129, 130-139, 140-149, 150-159, 160-169, 170-179, 180-189, 190+; vertical axis: Number of Students, 0 to 25]


Number of Intervals

There is no clear-cut rule on the number of intervals or classes that should be used.

• Too many intervals – the data may not be summarized enough for a clear visualization of how they are distributed.
• Too few intervals – the data may be over-summarized and some of the details of the distribution may be lost.
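To see the trade-off in numbers, this Python sketch bins the 92 Penn State weights from the earlier histogram example using three different class widths. The widths 5, 10 and 50 and the helper name `bin_counts` are choices made for this illustration only.

```python
from collections import Counter

# Weights (lbs) of the 92 Penn State students from the histogram example.
females = [140, 120, 130, 138, 121, 125, 116, 145, 150, 112, 125, 130, 120, 130, 131, 120,
           118, 125, 135, 125, 118, 122, 115, 102, 115, 150, 110, 116, 108, 95, 125, 133,
           110, 150, 108]
males = [140, 145, 160, 190, 155, 165, 150, 190, 195, 138, 160, 155, 153, 145, 170, 175,
         175, 170, 180, 135, 170, 157, 130, 185, 190, 155, 170, 155, 215, 150, 145, 155,
         155, 150, 155, 150, 180, 160, 135, 160, 130, 155, 150, 148, 155, 150, 140, 180,
         190, 145, 150, 164, 140, 142, 136, 123, 155]
weights = females + males

def bin_counts(values, width):
    """Count observations per class [start, start + width), keyed by class start."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

for width in (5, 10, 50):
    counts = bin_counts(weights, width)
    print(f"width {width}: {len(counts)} non-empty classes")
    for start, n in counts.items():
        print(f"  {start:3d}-{start + width - 1:3d} {'#' * n}")
```

Running it prints a text histogram per width: the widest classes collapse the data into a handful of bars, while the narrowest spread it over many short ones.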


Pictures of Data: Histograms

Blood pressure data on a sample of 113 men

Histogram of the Systolic Blood Pressure for 113 men. Each bar spans a width of 5 mmHg on the horizontal axis. The height of each bar represents the number of individuals with SBP in that range.

[Histogram: horizontal axis: Systolic BP (mmHg), 80 to 160; vertical axis: Number of Men, 0 to 20]


Pictures of Data: Histograms

Another histogram of the blood pressure of 113 men. In this graph, each bar has a width of 20 mmHg, and there are a total of only 4 bars, making it hard to characterize the distribution of blood pressures in the sample.

[Histogram: horizontal axis: Systolic BP (mmHg), 80 to 160; vertical axis: Number of Men, 0 to 60]


Pictures of Data: Histograms

Yet another histogram of the same BP information on 113 men. Here, the bin width is 1 mmHg, perhaps giving more detail than is necessary.

[Histogram: horizontal axis: Systolic BP (mmHg), 80 to 160; vertical axis: Number of Men, 0 to 6]


Width of Intervals

Without some specific reason (e.g. showing infant death) the intervals should all be the same width.

Common width = W = R / k

R = range of the data
k = the number of intervals


Considerations when Determining Width

• Width should be chosen so that it is convenient to use or easy to recognize (multiples of 5 or 1).
• The beginning of the first interval must be low enough so that the first interval includes the smallest observation.
• If the data have x decimal places, the interval limits should also have x decimal places.


Data Example

Weight in pounds of 57 school children at a day-care center:

68 63 42 27 30 36 28 32 79 27
22 23 24 25 44 65 43 23 74 51
36 42 28 31 28 25 45 12 57 51
12 32 49 38 42 27 31 50 38 21
16 24 69 47 23 22 43 27 49 28
23 19 46 30 43 49 12


Data Example – Step 1

From the data we have:
• Minimum = 12
• Maximum = 79
• R = 79 - 12 = 67

If we use k = 5 and k = 15 we get:
• W = 67/5 = 13.4
• W = 67/15 ≈ 4.5

Since the dataset is not large, we will choose w = 10 to have fewer intervals.
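The Step 1 arithmetic can be checked with a short Python sketch over the 57 day-care weights. The helper name `common_width` is an assumption introduced for this example; it simply applies W = R / k.

```python
# Weights (lbs) of the 57 day-care children from the data example.
weights = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
           22, 23, 24, 25, 44, 65, 43, 23, 74, 51,
           36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
           12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
           16, 24, 69, 47, 23, 22, 43, 27, 49, 28,
           23, 19, 46, 30, 43, 49, 12]

def common_width(data_range, k):
    """Common interval width W = R / k for k equal-width intervals."""
    return data_range / k

smallest, largest = min(weights), max(weights)
data_range = largest - smallest        # R = maximum - minimum

for k in (5, 15):
    print(f"k = {k}: W = {common_width(data_range, k):.1f}")
```

Neither computed width is a convenient number, which is why a round value such as 10 is preferred for the actual intervals.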


Data Example – Step 2

Next we have to construct the intervals. With w = 10 and minimum = 12, choose the first interval to start at 10.

INTERVALS (in lbs): 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79


Data Example – Step 3

Examine the values one at a time and tally the number in each interval.


Data Example – Step 4

Calculate Relative Frequencies:

Relative freq. = (frequency in interval) / (# obs in dataset)


Frequency Table

Weight Interval (lbs)   Frequency   Relative Frequency (%)
10-19
20-29
30-39
40-49
50-59
60-69
70-79
Total


Frequency Table

Weight Interval (lbs)   Frequency   Relative Frequency (%)
10-19                        5
20-29                       19
30-39                       10
40-49                       13
50-59                        4
60-69                        4
70-79                        2
Total                       57


Frequency Table

Weight Interval (lbs)   Frequency   Relative Frequency (%)
10-19                        5         8.8
20-29                       19        33.3
30-39                       10        17.5
40-49                       13        22.8
50-59                        4         7.0
60-69                        4         7.0
70-79                        2         3.5
Total                       57       100
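Steps 2 through 4 can be reproduced with a minimal Python sketch that builds the intervals, tallies the 57 weights, and computes relative frequencies. The function name `frequency_table` and its tuple layout are assumptions for this illustration; the sketch also assumes whole-number data with no value below the first interval's start.

```python
# Weights (lbs) of the 57 day-care children from the data example.
weights = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
           22, 23, 24, 25, 44, 65, 43, 23, 74, 51,
           36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
           12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
           16, 24, 69, 47, 23, 22, 43, 27, 49, 28,
           23, 19, 46, 30, 43, 49, 12]

def frequency_table(values, start, width):
    """Return rows (lower, upper, frequency, relative frequency in %)."""
    # Step 2: construct equal-width intervals that cover the largest value.
    counts = {}
    lower = start
    while lower <= max(values):
        counts[(lower, lower + width - 1)] = 0
        lower += width
    # Step 3: examine the values one at a time and tally each interval.
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[(lower, lower + width - 1)] += 1
    # Step 4: relative frequency = frequency in interval / number of observations.
    n = len(values)
    return [(lo, hi, c, round(100 * c / n, 1)) for (lo, hi), c in counts.items()]

table = frequency_table(weights, start=10, width=10)
for lo, hi, freq, rel in table:
    print(f"{lo}-{hi}: {freq:2d}  {rel:4.1f}%")
```

The rounded percentages add up to 99.9 rather than exactly 100; that small gap comes from rounding each row to one decimal place.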


Histogram

[Histogram: Weights of Daycare Children; horizontal axis: Weight Range, classes 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79; vertical axis: Number of Children, 0 to 20]

• Horizontal scale represents the value of the variable
• The vertical scale represents the frequency or relative frequency in each interval
• Rectangular bars are joined together


Consider Distributions

• If the data are homogeneous, the graphs usually show a unimodal pattern with one peak in the middle.
• The plots can be used to determine if the data are symmetric. A symmetric distribution is one in which the distribution has the same shape on both sides of the peak.


Shapes of the Distribution

Three common shapes of frequency distributions:

A. Symmetrical and bell shaped
B. Positively skewed or skewed to the right
C. Negatively skewed or skewed to the left


Shapes of the Distribution

Three less common shapes of frequency distributions:

A. Bimodal
B. Reverse J-shaped
C. Uniform


Timeplots

• Data are displayed over time.
• Data may show seasonal or yearly patterns, or changes in environment over time.
• Timeplot data can give different impressions depending on the scales used on the x and y axes.


Timeplots: An Example

Time series data can display effects of changes in government policy. The table shows the data on motor vehicle deaths in the U.S. (death rate per 100 million miles driven).


Timeplots: An Example

Year  Rate    Year  Rate    Year  Rate    Year  Rate
1960  5.1     1970  4.7     1980  3.3     1990  2.1
1962  5.1     1972  4.3     1982  2.8     1992  1.7
1964  5.4     1974  3.5     1984  2.6     1994  2.7
1966  5.5     1976  3.2     1986  2.5     1996  1.7
1968  5.2     1978  3.3     1988  2.3     1998  1.6
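A rough text timeplot of the table can be produced with the Python sketch below, which also computes the change between consecutive readings. The values are copied from the table as given; the names `rates` and `changes` and the scale of one `*` per 0.1 deaths are assumptions made for this illustration.

```python
# Motor vehicle death rate per 100 million miles driven, copied from the table.
rates = {1960: 5.1, 1962: 5.1, 1964: 5.4, 1966: 5.5, 1968: 5.2,
         1970: 4.7, 1972: 4.3, 1974: 3.5, 1976: 3.2, 1978: 3.3,
         1980: 3.3, 1982: 2.8, 1984: 2.6, 1986: 2.5, 1988: 2.3,
         1990: 2.1, 1992: 1.7, 1994: 2.7, 1996: 1.7, 1998: 1.6}

years = sorted(rates)

# Change in rate between each pair of consecutive readings (two years apart).
changes = {(a, b): round(rates[b] - rates[a], 1) for a, b in zip(years, years[1:])}

# Text timeplot: one row per year, one '*' per 0.1 deaths per 100 million miles.
for year in years:
    print(f"{year} {'*' * round(rates[year] * 10)} {rates[year]}")

print(changes[(1972, 1974)])   # change across the 1974 speed-limit reduction
```

Reading the rows from top to bottom plays the role of the time axis, so the overall trend and any abrupt steps stand out in the same way they would on a plotted line.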


Timeplots: An Example

[Timeplot: Death Rates Over Time; horizontal axis: Year, 1960 to 1996; vertical axis: Death Rate/100 Million Miles, 0 to 6]


Timeplots: An Example

1. During these years, safety requirements for motor vehicles became stricter and interstate highways replaced old roads.
2. In 1974 the national speed limit was lowered to 55 miles an hour. In the mid-1980s most states raised speed limits to 65 miles an hour. Some say lower speed limits saved lives. Is this evident in our plot?