MD 5108 Biostatistics for Basic Research
Lecturer: Dr K. Mukherjee
Office: S16-06-100
Tel: 874 2764
Email: [email protected]
Objectives
To train practitioners of the biomedical sciences in the use and interpretation of statistical data analysis.
• explore and present data using tables, charts and graphs
• ability to do simple statistical calculations with a calculator
• carry out data analysis using a statistical package such as SPSS
• pick the right procedure for analysing a set of data
• interpret results correctly and report findings
• avoid misuse and abuse of statistics
• understand statistical contents of papers in medical journals
• judge claims and statements critically
• discuss and communicate ideas in a quantitative manner
Teaching approach
• nonmathematical introduction
• explanation of concepts rather than proofs
• emphasis on methodology and procedures
• emphasis on use of a statistical package rather than manual calculation
• emphasis on choosing the right procedure
• emphasis on correct interpretation of results
• examples from clinical research literature
Topic 1: What is statistics?
“A branch of mathematics dealing with the analysis and interpretation of masses of numerical data” Merriam-Webster Dictionary
“The field of study that involves the collection and analysis of numerical facts or data of any kind” Oxford Dictionary
“The study of how information should be employed to reflect on, and give guidance for action, in a practical situation involving uncertainty” Vic Barnett
Biostatistics: Application of statistical methods to the biological, medical and health sciences
Why the need for Statistics in Biomedicine?
Two main reasons:
• Variation
– attributes differ not only among individuals but also within the same individual over time
• Sampling
– biomedical research projects mostly carried out on small numbers of study subjects
– challenging problem to project results from small sample studies to individuals at large
Biological Variation
Necessitates the use of statistical methods in biomedicine to put numerical data into a context by which we can better judge their meaning
From sample to population
Statistical methods used to produce statistical inferences about a population based on information from a sample derived from that population
[Diagram: Population and sample, linked by inductive statistical methods]
Altman (1991) Practical Statistics for Medical Research, Chapman and Hall.
Bailar & Mosteller (1986) Medical Uses of Statistics, NEJM Books.
Many studies have been done on misuse of statistics in medicine
From Altman (1991)
Schor and Karten (1966, J. Am. Med. Assoc.):
• 149 papers classed as “analytical studies” in 3 issues of the 11 most frequently read medical journals
• assessment criteria: validity with respect to
– design of experiment
– type of analysis performed
– applicability of statistical test used
Findings of Schor and Karten:
• 28% of papers acceptable
• 68% deficient but acceptable if revised
• 4% unsalvageable
Lesson: CARE must be exercised when reading scientific papers in biomedical journals!
Knowledge of basic biostatistics is required
“There are three kinds of lies: lies, damned lies and statistics” Benjamin Disraeli
“It is easy to lie with statistics, but it is easier to lie without them” Frederick Mosteller
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” H.G. Wells
Types of statistical methods
1. Descriptive statistical methods
• data collection and organization
• summarizing data and describing its characteristics
• presentation and publication
2. Exploratory data analysis
• play around and get a feel of the data
• preliminary analysis, often graphical
• looking for patterns and possible relationships
• are assumptions satisfied?
• which model and procedure to use?
3. Inductive (inferential) statistical methods
• estimation, confidence intervals
• hypothesis testing
• prediction, forecasting
• classification
Statistical inferences about a population based on information from a sample derived from that population
[Diagram: Population and sample, linked by inductive statistical methods]
Sources of data, the raw materials of statistics
• Routinely kept records, e.g., hospital medical records
• Surveys
• Experiments
• Clinical trials
• Databases
• Published reports
Topic 2: Types of data
Any characteristic that can be measured or classified into categories is called a variable
Types of variables
(1) Qualitative variables
• cannot be measured numerically
• categorical in nature, e.g., gender
• categories must not overlap and must cover all possibilities
Nominal variables (no inherent ordering of categories)
• M/F, Yes/No
• Blood group (A, B, AB, O)
• Ethnic group (Chinese, Malay, Indian, Others)
Ordinal variables (categories are ordered in some sense)
• response to treatment: unimproved, improved, much improved
• pain severity: no pain, slight pain, moderate pain, severe pain
(2) Quantitative variables
• can be measured numerically, e.g., weight, height, concentration
• can be continuous or discrete
• a continuous variable can take on any value (subject to precision of measuring instrument) within some range or interval, e.g., weight, height, blood pressure, cholesterol level
• a discrete variable is usually a count of something and hence takes on integer values only, e.g., number of admissions to NUH
Variable types and measurement types
• have implications on how data should be displayed or summarized
• determine the kind of statistical procedures that should be used
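As a rough illustration of how the variable type drives the choice of summary, the sketch below uses small made-up samples (all values and variable names are hypothetical, not course data): a frequency table for a nominal variable, a median category for an ordinal variable, and a mean for a continuous variable.

```python
from collections import Counter
import statistics

# Hypothetical samples, invented for illustration only.
blood_group = ["A", "O", "B", "O", "AB", "O", "A"]             # nominal
pain = ["no pain", "slight pain", "severe pain",
        "slight pain", "moderate pain"]                        # ordinal
weight_kg = [61.5, 70.2, 58.0, 80.3]                           # continuous

# Nominal: categories have no order, so count them and report the mode.
group_counts = Counter(blood_group)
most_common_group = group_counts.most_common(1)[0][0]

# Ordinal: categories are ordered, so a median category is meaningful.
PAIN_ORDER = ["no pain", "slight pain", "moderate pain", "severe pain"]
ranks = [PAIN_ORDER.index(p) for p in pain]
median_pain = PAIN_ORDER[statistics.median_low(ranks)]

# Continuous: arithmetic on the values themselves is meaningful.
mean_weight = statistics.mean(weight_kg)

print(dict(group_counts), most_common_group, median_pain, round(mean_weight, 1))
```

Taking a mean of blood groups would be meaningless, which is the point of matching the summary to the variable type.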
Types of variables and measurement scales: summary
Variable
• Qualitative or categorical
– Nominal (not ordered), e.g. ethnic group
– Ordinal (ordered), e.g. response to treatment
• Quantitative measurement
– Discrete (count data), e.g. number of admissions
– Continuous (real-valued), e.g. height
Topic 3: Presenting data graphically
Advantages of graphical data display
• Let data speak for itself
• Get a good feel of the data before formal analysis
• Graphs and plots are easier to understand and interpret
• Reveal patterns in data which may shed light on the appropriate model/analysis to use, e.g.,
– skewed or symmetric distribution
– multiple peaks / modes
– are there any outliers?
– relationship between variables
Graphs for categorical data
[Figure: Bar chart for world pharmaceutical spending, 1997. x-axis: Region (Africa, Australasia, Canada, Europe, Japan, Latin America, Middle East, SE Asia & China, USA); y-axis: % of world spending, 0 to 35]
[Figure: Pie chart for world pharmaceutical spending, 1997. Slice labels (% of world spending): USA 34.0%, Europe 29.0%, Japan 16.0%, Latin America 8.0%, SE Asia & China 7.0%, Canada 2.0%, Middle East 2.0%, Africa 1.0%, Australasia 1.0%]
[Figure: Segmented bar chart for world pharmaceutical spending, 1997. y-axis: % of world spending, 0 to 100; segments: Africa, Australasia, Canada, Europe, Japan, Latin America, Middle East, SE Asia & China, USA]
[Figure: World pharmaceutical spending, 1997, shown as a segmented bar chart, a pie chart and a bar chart of the same data]
Comparison of methods
• Bar charts can be read more accurately and offer better distinction between close together values
• Pie charts especially useful for showing percentage distribution
• Pie charts can display large and small % simultaneously without scale break
• A single bar chart is preferable to a single segmented bar chart
• A series of segmented bar charts is easier to read than a series of pie charts or ordinary bar charts
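To make the comparison concrete, here is a minimal sketch that takes the 1997 percentages from the pie chart labels and produces a plain-text bar chart, plus the cumulative boundaries a segmented bar chart would use. The function names are illustrative assumptions, and a real analysis would use a plotting or statistical package instead.

```python
# % of world pharmaceutical spending, 1997, as read from the pie chart labels.
spending = {
    "USA": 34, "Europe": 29, "Japan": 16, "Latin America": 8,
    "SE Asia & China": 7, "Canada": 2, "Middle East": 2,
    "Africa": 1, "Australasia": 1,
}

def text_bar_chart(shares):
    """One line per category, largest first; bar length equals the percentage."""
    rows = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [f"{region:<16} {'#' * pct} {pct}%" for region, pct in rows]

def segment_boundaries(shares):
    """Running totals: where each segment of a segmented bar chart ends."""
    total = 0
    bounds = []
    for region, pct in shares.items():
        total += pct
        bounds.append((region, total))
    return bounds

for line in text_bar_chart(spending):
    print(line)
print(segment_boundaries(spending))
```

In the bar output, close values such as 8% and 7% are easy to tell apart because every bar starts from the same baseline. In the segmented version each segment starts where the previous one ended, so only the running totals share a baseline.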
[Figure: Bar chart for number of health professionals. x-axis: Profession (Dentists, Doctors, Nurses, Pharmacists); y-axis: Number of workers, 0 to 6000]
Variation of the basic bar chart
[Figure: Stacked bar chart for number of health professionals. x-axis: Profession; y-axis: Number of workers, 0 to 6000; segments: Private, Public]
[Figure: Clustered bar chart for number of health professionals. x-axis: Profession; y-axis: Number of workers, 0 to 4000; bars: Private, Public]
[Figure: Segmented bar charts by profession. x-axis: Profession; y-axis: Percent by sector, 0 to 100; segments: Private, Public]
Plotting by sector rather than by profession
• Look at the data from a different angle
• Highlight different aspects of the data
[Figure: Clustered bar charts of number of health professionals. x-axis: Sector (Private, Public); y-axis: Number of workers, 0 to 4000; bars: Dentists, Doctors, Nurses, Pharmacists]
[Figure: Stacked bar charts by sector. x-axis: Sector (Private, Public); y-axis: Number of workers, 0 to 6000; segments: Dentists, Doctors, Nurses, Pharmacists]
[Figure: Percentage (segmented) bar charts by sector. x-axis: Sector (Private, Public); y-axis: Percent within sector, 0 to 100; segments: Dentists, Doctors, Nurses, Pharmacists]
A back to back bar chart
Source: JAMA, 1978, vol 239, no 21
Comparison of methods
• Stacked bar chart is also a bar chart for the combined data
• Some of the bars in a stacked bar chart are not aligned
• Bars in clustered bar charts are aligned but it is harder to visualize how the component bars would stack up
• Back to back bar charts are applicable when there are 2 groups only; the aggregated bars are not aligned
• Series of stacked or segmented bar charts useful in showing time trend
Time Trend
Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person
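The size of the distortion can be worked out with simple arithmetic. The sketch below assumes two hypothetical values, 9 and 11 prescriptions per person (the actual figures from the chart are not given here), and compares the ratio of drawn bar heights when the axis starts at 0 and when it starts at 8.

```python
def apparent_ratio(before, after, baseline):
    """Ratio of drawn bar heights when the axis starts at `baseline`."""
    return (after - baseline) / (before - baseline)

# Hypothetical values, chosen only to illustrate the effect.
before, after = 9, 11

honest = apparent_ratio(before, after, baseline=0)     # bars drawn from 0
truncated = apparent_ratio(before, after, baseline=8)  # bars drawn from 8

print(f"axis from 0: second bar is {honest:.2f} times as tall")
print(f"axis from 8: second bar is {truncated:.2f} times as tall")
```

With these made-up numbers a true increase of about 22% is drawn as a bar three times as tall.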
Stacked bar chart of yearly mortality rate per 1000 births
Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.
Response under two treatments

Response to treatment
Treatment | None | Partial | Complete | Total
A         | 3    | 15      | 9        | 27
B         | 22230 [the three counts are run together in the source] | 54
[Figure: A misleading bar chart. x-axis: Response to treatment (None, Partial, Complete); y-axis: Frequency, 0 to 30; bars: Treatment A, Treatment B]
By design, there are twice as many patients receiving treatment B
[Figure: Stacked bar charts for percentage figures. x-axis: Treatment (A, B); y-axis: Within-treatment percentage, 0 to 100; segments: Response to treatment (None, Partial, Complete)]
Can compare the response type percentages for the two treatments
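A minimal sketch of the within-treatment percentage calculation follows. It reads the treatment A row of the table as 3, 15 and 9 patients, which sum to the stated total of 27. The second group is hypothetical: it simply doubles every count, to mimic a treatment arm that by design has twice as many patients.

```python
def within_group_percentages(counts):
    """Convert raw counts for one treatment into percentages of that treatment's total."""
    total = sum(counts.values())
    return {response: 100 * n / total for response, n in counts.items()}

# Treatment A counts as read from the table (3 + 15 + 9 = 27).
treatment_a = {"None": 3, "Partial": 15, "Complete": 9}

# Hypothetical arm with twice as many patients and the same response pattern.
doubled_arm = {response: 2 * n for response, n in treatment_a.items()}

pct_a = within_group_percentages(treatment_a)
pct_doubled = within_group_percentages(doubled_arm)

for response in treatment_a:
    print(f"{response:<9} A: {pct_a[response]:5.1f}%   doubled arm: {pct_doubled[response]:5.1f}%")
```

Every raw frequency in the doubled arm is twice as large, so a frequency bar chart would suggest a difference between the arms, while the within-group percentages are identical.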
Graphs for quantitative data
• Histogram
• Frequency polygon
• Box plot
Histogram
• Divide the range of the data into a suitably chosen number of intervals/bins, all of the same width
• The number of observations that fall within each interval is plotted
Relative frequency histogram
• Plot the proportions of observations that fall within the class intervals
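The binning step can be sketched as below. The data values are hypothetical (the raw SysVol measurements are not reproduced in these notes), and the choice of half-open intervals that include the left edge is one common convention, assumed here.

```python
def histogram_counts(values, low, width, n_bins):
    """Count observations in n_bins equal-width intervals [left, left + width)."""
    counts = [0] * n_bins
    for x in values:
        k = int((x - low) // width)      # index of the interval containing x
        if not 0 <= k < n_bins:
            raise ValueError(f"{x} lies outside the chosen range")
        counts[k] += 1
    return counts

# Hypothetical measurements, for illustration only.
sysvol = [45, 62, 68, 75, 81, 88, 95, 102, 110, 150]

counts = histogram_counts(sysvol, low=40, width=20, n_bins=6)
rel_freq = [c / len(sysvol) for c in counts]   # heights of the relative frequency histogram

for k, (c, r) in enumerate(zip(counts, rel_freq)):
    left = 40 + 20 * k
    print(f"[{left:3d}, {left + 20:3d})  {'#' * c:<4} count={c}  proportion={r:.1f}")
```

The counts give the ordinary histogram and the proportions give the relative frequency histogram; the shape is the same and only the vertical scale changes.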
Wild & Seber (2000) Chance Encounters, Wiley.
[Figure: Histogram of End-Systolic Volume for 45 Male Heart Attack Patients. x-axis: SysVol, 40 to 220; y-axis: Frequency]
[Figure: Relative frequency polygon for SysVol. x-axis: SysVol, 40 to 220; y-axis: Percent]
Comparison of methods
Histogram
• good at revealing distributional shape such as symmetry, skewness, number of peaks etc
• difficult to superimpose or draw side by side
Frequency polygons
• can be superimposed for easy comparison
Wild & Seber (2000, p.59)
Can be superimposed
Pagano & Gauvreau (1999)
Wild & Seber (2000)
Median and quartiles
Sort the data in increasing order
The median is the middle value (if n is odd) or the average of the two middle values (if n is even); it is a measure of the “center” of the data
Quartiles: dividing the set of ordered values into 4 equal parts
first 25% | Q1 | second 25% | Q2 | third 25% | Q3 | fourth 25%
Q2 = second quartile = median
IQR = Interquartile range = Q3 − Q1
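These definitions can be sketched in a few lines. The data set is hypothetical. Note that textbooks and software packages use slightly different rules for locating Q1 and Q3; the sketch uses the "inclusive" interpolation rule of Python's `statistics.quantiles`, which is an assumption and may not match the convention used in the course or in SPSS.

```python
import statistics

def median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)             # sort the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by linear interpolation (one of several conventions)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

data = [7, 1, 3, 9, 5, 11, 13]           # hypothetical observations

q1, q2, q3 = quartiles(data)
iqr = q3 - q1                             # IQR = Q3 - Q1

print("median:", median(data))
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

For this data set the second quartile equals the median, as stated above.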
Box plot
• Draw a box from the lower quartile to the upper quartile and a line to mark the position of the median
• Extend lines from both edges of the box by 1.5 IQR, then pull back the lines until they hit an observation
• Observations more than 1.5 IQR away from the lower or upper quartile are marked out as outside values for further investigation and checking
How a boxplot is constructed (Wild & Seber, 2000, p.73)
5-Number Summary: min, lower quartile, median, upper quartile, max
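The construction steps can be sketched numerically as follows. The data are hypothetical, and the quartile rule (`method="inclusive"` in Python's `statistics.quantiles`) is an assumed convention; other packages may place the quartiles slightly differently.

```python
import statistics

def boxplot_stats(values):
    """Five-number summary plus whisker ends and outside values (1.5 IQR rule)."""
    ordered = sorted(values)
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr           # furthest a whisker may reach
    high_limit = q3 + 1.5 * iqr
    inside = [x for x in ordered if low_limit <= x <= high_limit]
    outside = [x for x in ordered if x < low_limit or x > high_limit]
    return {
        "min": ordered[0], "q1": q1, "median": med, "q3": q3, "max": ordered[-1],
        "whisker_low": inside[0],        # whisker pulled back to an observation
        "whisker_high": inside[-1],
        "outside": outside,              # flagged for checking
    }

data = [52, 60, 65, 70, 75, 80, 85, 90, 200]   # hypothetical measurements
stats = boxplot_stats(data)
for name, value in stats.items():
    print(f"{name:<13}{value}")
```

Here the upper whisker stops at 90, the largest observation within 1.5 IQR of the upper quartile, and the value 200 is reported as an outside value rather than stretching the whisker.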
[Figure: Dotplot for SysVol = End-systolic volume, a measure of the size of the heart. Axis: SysVol, 50 to 200]
[Figure: Boxplot for SysVol. Axis: SysVol, 20 to 220]
Advantages of box plot
• quick visual summary of a data set
• captures prominent features like location, spread, skewness and outliers
• can easily draw a series of box plots side by side; not so for histograms
Hotdogs dataset

Brand name | Type | Taste | $/oz | $/lb Prot | Cal | Sod | Prot/Fat
Happy Hill Supers | Beef | Bland | 0.11 | 14.23 | 186 | 495 | 1
Georgies Skinless Beef | Beef | Bland | 0.17 | 21.7 | 181 | 477 | 2
Special Market's Premium B | Beef | Bland | 0.11 | 14.49 | 176 | 425 | 1
Spike's Beef | Beef | Medium | 0.15 | 20.49 | 149 | 322 | 1
Hungry Hugh's Jumbo Beef | Beef | Medium | 0.1 | 14.47 | 184 | 482 | 1
Great Dinner Beef | Beef | Medium | 0.11 | 15.45 | 190 | 587 | 1
RJB Kosher Beef | Beef | Medium | 0.21 | 25.25 | 158 | 370 | 2
Wonder Kosher Skinless Bee | Beef | Medium | 0.2 | 24.02 | 139 | 322 | 2
Happy Fats Jumbo Beef | Beef | Medium | 0.14 | 18.86 | 175 | 479 | 1
Midwest Beef | Beef | Medium | 0.14 | 18.86 | 148 | 375 | 1
General Kosher Beef | Beef | Medium | 0.23 | 30.65 | 152 | 330 | 1
Wall's Kosher Beef Lower F | Beef | Medium | 0.25 | 25.62 | 111 | 300 | 3
Hickory Natural Smoke | Beef | Medium | 0.07 | 8.12 | 141 | 386 | 2
Smith Beef | Beef | Medium | 0.09 | 12.74 | 153 | 401 | 1
Premium Beef | Beef | Medium | 0.1 | 14.21 | 190 | 645 | 1
Family Store Skinless Beef | Beef | Medium | 0.1 | 13.39 | 157 | 440 | 1
Sam's Kosher Beef | Beef | Medium | 0.19 | 22.31 | 131 | 317 | 2
Hammer Beef | Beef | Medium | 0.11 | 19.95 | 149 | 319 | 1
Athens Beef | Beef | Medium | 0.19 | 22.9 | 135 | 298 | 2
Regents Kosher Beef | Beef | Scrumpt. | 0.17 | 19.78 | 132 | 253 | 2
Really Big | Meat | Bland | 0.12 | 14.86 | 173 | 458 | 2
Biggest Jumbo | Meat | Bland | 0.12 | 17.32 | 191 | 506 | 1
Home Made | Meat | Bland | 0.12 | 15.2 | 182 | 473 | 1
Martha's Jumbo Dinner | Meat | Bland | 0.1 | 14.01 | 190 | 545 | 1
Hammer Premium | Meat | Bland | 0.11 | 13.92 | 172 | 496 | 2
Willie's Wieners | Meat | Bland | 0.13 | 18.24 | 147 | 360 | 1
Premium Hot Dogs | Meat | Medium | 0.1 | 14.12 | 146 | 387 | 1
Airport Wieners | Meat | Medium | 0.09 | 11.83 | 139 | 386 | 2
Judy's Favorite Jumbos | Meat | Medium | 0.11 | 15.41 | 175 | 507 | 1
Stick Lean Supreme Jumbo | Meat | Medium | 0.15 | 17.4 | 136 | 393 | 3
Stick Jumbo | Meat | Medium | 0.13 | 17.32 | 179 | 405 | 1
Fat Jack Jumbo | Meat | Medium | 0.1 | 15.61 | 153 | 372 | 1
Thin Jack Veal | Meat | Medium | 0.18 | 20.4 | 107 | 144 | 3
Top Grade Hot Dogs | Meat | Medium | 0.09 | 12.65 | 195 | 511 | 1
Blended w/Chicken&Beef | Meat | Scrumpt. | 0.07 | 11.17 | 135 | 405 | 1
Heaven Made | Meat | Scrumpt. | 0.08 | 11.75 | 140 | 428 | 1
Baked and Smoked | Meat | Scrumpt. | 0.06 | 9.49 | 138 | 339 | 1
Smart Person Chicken | Poultry | Bland | 0.08 | 10.21 | 129 | 430 | 2
Woods Park Chicken | Poultry | Medium | 0.05 | 6.37 | 132 | 375 | 2
Tony Turkey | Poultry | Medium | 0.07 | 8.42 | 102 | 396 | 3
Rose Garden Turkey | Poultry | Medium | 0.08 | 9.37 | 106 | 383 | 3
Low Fat Turkey | Poultry | Medium | 0.08 | 9 | 94 | 387 | 4
Special Market's Turkey | Poultry | Medium | 0.07 | 8.07 | 102 | 542 | 5
Caloryless Turkey | Poultry | Medium | 0.09 | 9.39 | 90 | 359 | 5
Heaven Made Lower Fat | Poultry | Medium | 0.06 | 6.59 | 99 | 357 | 4
McDowell's Jumbo Chicken | Poultry | Medium | 0.07 | 8.43 | 107 | 528 | 2
Graphical Analysis of the “Hotdogs” data.
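One possible numerical companion to parallel box plots is a five-number summary per hot dog type. The sketch below uses the calorie values from the Hotdogs table, reading the third numeric column as "Cal" (an assumption about the flattened column layout). The quartile rule is the "inclusive" convention of Python's `statistics.quantiles`, which may differ slightly from other packages.

```python
import statistics

# Calories per hot dog, transcribed from the "Cal" column of the table, by type.
calories = {
    "Beef": [186, 181, 176, 149, 184, 190, 158, 139, 175, 148,
             152, 111, 141, 153, 190, 157, 131, 149, 135, 132],
    "Meat": [173, 191, 182, 190, 172, 147, 146, 139, 175, 136,
             179, 153, 107, 195, 135, 140, 138],
    "Poultry": [129, 132, 102, 106, 94, 102, 90, 99, 107],
}

def five_number_summary(values):
    """min, lower quartile, median, upper quartile, max."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)

summaries = {kind: five_number_summary(v) for kind, v in calories.items()}
for kind, s in summaries.items():
    print(f"{kind:<8} min={s[0]}  Q1={s[1]}  median={s[2]}  Q3={s[3]}  max={s[4]}")
```

The medians come out at 152.5 (Beef), 153 (Meat) and 102 (Poultry), so the poultry hot dogs sit visibly lower on the calorie scale, the kind of group difference that side by side box plots make immediately apparent.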
Parallel box plots can be quite revealing
• Reduction in concentration through time
• Higher during winter months
• Skewed toward higher values
• Spread increases with level
[Figure: Parallel box plots, 1969 to 1972]
(Parallel histograms much harder to visualise)
Rice (1995) Mathematical Statistics & Data Analysis, Duxbury Press.