Overview of Linear Models Webinar : Tuesday, May 22, 2012
Deborah Rosenberg, PhD
Research Associate Professor
Division of Epidemiology and Biostatistics
University of IL School of Public Health
Training Course in MCH Epidemiology
[Slide template graphics: density of Student's t with 10 d.f. (t from -3 to 3); chi-square densities with 1, 2, 3, 5, and 8 d.f.; and a 2x2 table of an exposure or person, place, or time variable (rows: yes/no) by disease or other health outcome (columns: yes/no), with cells a, b, c, d, row totals a + b (n1) and c + d (n2), column totals a + c (m1) and b + d (m2), and total N]
Training Course in MCH EPI, 2012
Course Topics Focusing on Multivariable Regression
• Model Building Approaches
• Modeling Ordinal and Nominal Outcomes
• Multilevel Modeling
• Trend Analysis
• Population Attributable Fraction
• Propensity Scores
• Modeling Risk Differences
We need to have some perspective ...
Introduction
So, let's keep this in mind:
"...technical expertise and methodology are not substitutes for conceptual coherence. Or, as one student remarked a few years ago, public health spends too much time on the 'p' values of biostatistics and not enough time on values."
Jonathan M. Mann, "Medicine and Public Health, Ethics and Human Rights," The Hastings Center Report, Vol. 27, No. 3 (May-Jun. 1997), pp. 6-13. Published by The Hastings Center.
Introduction
Multivariable analysis implies acknowledging and accounting for the intricacies of the real world, as reflected in the relationships among a set of variables.
Multivariable analysis is complex, particularly with observational as opposed to experimental data.
The accuracy of estimates from multivariable analysis, and therefore the accuracy of the conclusions drawn and of any public health action taken, depends on applying appropriate analytic methods.
Introduction
The challenge for an MCH epidemiologist goes beyond carrying out complex multivariable analysis to include:
• advocating for and facilitating the routine incorporation of complex multivariable methods into the work of public health agencies, and
• guiding interpretation of findings
• working to design reporting templates
• working to build dissemination strategies
• working to link findings with action plans or policy recommendations
Review of the Basics
Basic Components of Any Statistical Analysis
1. Sample statistic(s) (observed value(s)): x̄, p, r
2. Population parameter(s) (expected value(s)): the population mean, proportion, or rate
3. Sample size: n
4. Sample variance(s) / standard error(s)
5. Critical values from the appropriate probability distribution: z, t, chi-square, F
Review of the Basics
The study design and sampling strategy (cohort, case-control, cross-sectional, longitudinal, etc.) affect the statistical analysis that can be carried out:
• Which measures of occurrence can be reported
• Which measures of association can be reported
• How standard errors for confidence intervals and statistical testing will be calculated
Review of the Basics
Measures of Occurrence
Means summarize continuous variables and are assumed to follow a normal distribution.
Proportions summarize discrete variables and are assumed to follow the Binomial distribution.
Some proportions are also said to be Poisson distributed if the numerator is very small compared to the denominator.
Rates, also based on discrete variables, are typically said to be Poisson distributed.
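To illustrate the Poisson point, here is a minimal sketch with hypothetical numbers (not webinar data). When the numerator is small relative to the denominator, binomial probabilities are nearly identical to Poisson probabilities with the same expected count n*p. The function names are illustrative only.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical rare outcome: 10,000 births with a true proportion of 0.0005,
# so the expected number of cases is n * p = 5.
n, p = 10_000, 0.0005
lam = n * p

for k in range(11):
    print(f"k={k:2d}  binomial={binomial_pmf(k, n, p):.5f}  poisson={poisson_pmf(k, lam):.5f}")

# Largest pointwise gap over the counts that carry almost all of the probability.
max_abs_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21))
print(f"largest absolute difference, k = 0..20: {max_abs_diff:.1e}")
```

With a common outcome (say p = 0.3), the two sets of probabilities would diverge noticeably. That is why the Poisson description is reserved for rare events.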
Review of the Basics
Measures of Association
Difference Measures
• Between two or more means
• Between two or more proportions (attributable risk)
• Between a mean and a standard
• Between a proportion and a standard
Ratio Measures
• Relative Risk / Relative Prevalence
• Odds Ratio
• Rate Ratio / Hazard Ratio
Review of the Basics
The 2x2 table—framework for constructing the ratio measures
                                   Disease or Other Health Outcome
                                   Yes          No           Total
Exposure or Person,     Yes        a            b            a + b (n1)
Place, or Time Variable No         c            d            c + d (n2)
                        Total      a + c (m1)   b + d (m2)   a + b + c + d (N)

RR and RP: $\frac{a/(a+b)}{c/(c+d)} = \frac{a/n_1}{c/n_2} = \frac{p_1}{p_2} \text{ or } \frac{r_1}{r_2}$

OR: $\frac{a/b}{c/d} = \frac{ad}{bc}$
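The two ratio formulas can be checked with a short sketch. It assumes the cell layout of the table above (rows are exposure yes/no, columns are outcome yes/no), and the counts are hypothetical.

```python
def ratio_measures(a, b, c, d):
    """Relative risk (or relative prevalence) and odds ratio from a 2x2 table.

    Layout follows the slide: rows are exposure yes/no, columns are
    outcome yes/no, so a and c are the cells with the outcome.
    """
    n1, n2 = a + b, c + d            # exposure-group (row) totals
    p1, p2 = a / n1, c / n2          # risk or prevalence in each group
    rr = p1 / p2                     # RR / RP = p1 / p2
    odds_ratio = (a * d) / (b * c)   # OR = (a/b) / (c/d) = ad / bc
    return rr, odds_ratio


# Hypothetical cell counts, for illustration only.
rr, odds_ratio = ratio_measures(a=40, b=160, c=60, d=740)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")   # RR = 2.67, OR = 3.08
```

The outcome in this made-up table is not rare (20% among the exposed). As a result, the OR lies noticeably farther from 1 than the RR.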
Review of the Basics
Assessing the Accuracy of Statistics
We use probability distributions to evaluate how close to or far from the “truth” our statistics are. We do this by calculating a range of values that, under repeated sampling, would include the “true” population value with a given probability (e.g., 95%). This range is a confidence interval. It can be calculated around both measures of occurrence, e.g., incidence or prevalence, and measures of association, e.g., odds ratios or relative risks.
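Confidence intervals for ratio measures are usually built on the log scale and then exponentiated. The sketch below assumes large-sample (Wald) intervals, using the Woolf standard error for ln(OR) and the Katz log-method standard error for ln(RR). The counts and the helper name `log_scale_ci` are hypothetical, and statistical software may also offer exact or other interval types.

```python
import math
from statistics import NormalDist


def log_scale_ci(estimate, se_log, level=0.95):
    """CI for a ratio measure computed on the log scale and exponentiated back."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


# Hypothetical 2x2 counts (rows: exposed / unexposed; a and c have the outcome).
a, b, c, d = 40, 160, 60, 740

odds_ratio = (a * d) / (b * c)
se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)               # Woolf
rr = (a / (a + b)) / (c / (c + d))
se_ln_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))   # Katz (log method)

or_lo, or_hi = log_scale_ci(odds_ratio, se_ln_or)
rr_lo, rr_hi = log_scale_ci(rr, se_ln_rr)
print(f"OR = {odds_ratio:.2f}, 95% CI ({or_lo:.2f}, {or_hi:.2f})")
print(f"RR = {rr:.2f}, 95% CI ({rr_lo:.2f}, {rr_hi:.2f})")
```

Because each interval is symmetric on the log scale, the point estimate equals the geometric mean of the two limits rather than their arithmetic midpoint.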
Review of the Basics
Tests of Statistical Significance
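Confidence intervals around measures of association provide evidence for or against equality. For example, we can ask whether a 95% CI for a ratio measure includes 1, or whether a CI for a difference includes 0.
Statistical tests go beyond this by generating a specific probability: the probability of observing a difference at least as large as the one seen in our sample if chance alone were operating (chance imposed by the sampling process), that is, if there were no true difference in the population.
This probability is the p-value.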
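For a crude 2x2 table, a sketch of the Pearson chi-square test (without continuity correction) looks like this. The counts are hypothetical. The p-value is the probability of a chi-square statistic at least as large as the observed one when there is truly no association. It relies on the fact that, with 1 d.f., the chi-square statistic is the square of a standard normal z.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table, 1 d.f."""
    n = a + b + c + d
    n1, n2 = a + b, c + d    # row totals
    m1, m2 = a + c, b + d    # column totals
    x2 = n * (a * d - b * c) ** 2 / (n1 * n2 * m1 * m2)
    # With 1 d.f., chi-square = z**2, so P(chi-square > x2) = P(|z| > sqrt(x2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


x2, p_value = chi_square_2x2(40, 160, 60, 740)   # hypothetical counts
print(f"chi-square = {x2:.2f}, p = {p_value:.2g}")
```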
Review of the Basics
We again use probability distributions to formally test hypotheses about sample statistics.
Review of the Basics
Multivariable modeling should be the culmination of an analytic strategy that includes articulating a conceptual framework and carrying out preliminary analysis.
BEFORE any multivariable modeling:
• Select variables of interest
• Define levels of measurement, sometimes more than once, for a given variable
• Examine univariate distributions
• Examine bivariate distributions
Review of the Basics
BEFORE any multivariable modeling:
• Perform single factor stratified analysis to assess confounding and effect modification
• Rethink variables and levels of measurement
• Perform multiple factor stratified analysis with different combinations of potential confounders / effect modifiers
These steps should never be skipped!
Review of the Basics
With confounding, the association between a risk factor and a health outcome is the same (or close to the same) in each stratum, but the adjusted association differs from the crude.
With effect modification, the association between a risk factor and a health outcome varies from stratum to stratum.
Confounding                          Effect Modification
Compare crude v. adjusted OR/RR      Compare stratum-specific OR/RR
No statistical testing               Statistical testing
Review of the Basics
Assessing Effect Modification
• Stratified Analysis: Are the stratum-specific measures of association different (heterogeneous)?
• Regression Analysis: Is the beta coefficient for the interaction term (the product of two variables) large?
Regardless of the method, if the stratum-specific estimates differ, then reporting a weighted average will mask the important stratum-specific differences.
Stratum-specific differences can be statistically tested.
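As one way to test stratum-specific differences, here is a sketch of a homogeneity test for odds ratios. Software commonly reports the Breslow-Day test. The sketch swaps in the simpler Woolf test, which compares inverse-variance-weighted ln(OR)s with their pooled value. The two strata are hypothetical, and the closed-form p-value used here is valid only for two strata (1 d.f.).

```python
import math


def woolf_homogeneity(strata):
    """Woolf chi-square test of homogeneity of stratum-specific odds ratios.

    strata: list of (a, b, c, d) tuples laid out like the 2x2 table above.
    Returns the stratum-specific ORs, the chi-square statistic, and its d.f.
    """
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log((a * d) / (b * c)))
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))   # inverse variance of ln(OR)
    pooled = sum(w * lo for w, lo in zip(weights, log_ors)) / sum(weights)
    chi2 = sum(w * (lo - pooled) ** 2 for w, lo in zip(weights, log_ors))
    return [math.exp(lo) for lo in log_ors], chi2, len(strata) - 1


# Two hypothetical strata (e.g., a potential effect modifier absent / present).
ors, chi2, df = woolf_homogeneity([(20, 80, 20, 380), (20, 80, 40, 360)])
p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square tail probability for df == 1 only
print(f"stratum ORs = {[round(x, 2) for x in ors]}, chi-square = {chi2:.2f}, p = {p_value:.2f}")
```

In this example the stratum ORs (4.75 vs. 2.25) look quite different, yet the test is not significant at 0.05. With small strata, an apparent difference may be important but not statistically significant.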
Review of the Basics
Assessing Confounding
• Standardization: Does the standardized measure differ from the unstandardized measure?
• Stratified Analysis: Does the adjusted measure of association differ from the crude measure of association?
• Regression Analysis: Does the beta coefficient for a variable in a model that includes a potential confounder differ from the beta coefficient for that same variable in a model that does not include the potential confounder?
Review of the Basics
Assessing Confounding
Regardless of the method, if the adjusted estimate differs from the crude estimate of association, then confounding is present.
Determining whether a difference between the crude and adjusted measures is meaningful is a matter of judgment, since there is no formal statistical test for the presence of confounding.
By convention, epidemiologists consider confounding to be present if the adjusted measure of association differs from the crude measure by 10% or more.
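A worked sketch of the crude-versus-adjusted comparison follows. It uses the Mantel-Haenszel summary odds ratio as the adjusted measure and two hypothetical strata of a potential confounder, constructed so that the OR is 2.0 within each stratum. The percent change is expressed relative to the crude OR, which is an assumption, since some analysts use the adjusted OR as the base.

```python
def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary (adjusted) odds ratio across strata of (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical strata of a potential confounder; the OR is 2.0 in each stratum.
strata = [(40, 60, 25, 75), (10, 90, 40, 720)]
crude_table = tuple(sum(cells) for cells in zip(*strata))   # collapse over strata
crude = odds_ratio(*crude_table)
adjusted = mantel_haenszel_or(strata)

# 10% rule of thumb, here measured relative to the crude OR.
pct_change = abs(adjusted - crude) / crude * 100
print(f"crude OR = {crude:.2f}, MH-adjusted OR = {adjusted:.2f}, change = {pct_change:.0f}%")
print("confounding by the 10% convention:", pct_change >= 10)
```

The stratum-specific ORs are equal here, so there is no effect modification. The gap between the crude OR (about 4.08) and the adjusted OR (2.00) therefore reflects confounding alone.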
Review of the Basics
Moving toward Multivariable Modeling:
Jointly Assessing a Set (but which set?) of Variables
“A sufficient confounder group is a minimal set of one or more risk factors whose simultaneous control in the analysis will correct for joint confounding in the estimation of the effect of interest. Here, 'minimal' refers to the property that, for any such set of variables, no variable can be removed from the set without sacrificing validity.”
Kleinbaum, DG, Kupper, LL, Morgenstern, H. Epidemiologic Research: Principles and Quantitative Methods. Van Nostrand Reinhold Company, New York, 1982, p. 276.
Linear Models: General Considerations
The most common regression models used to analyze health data express the hypothesized association between risk or other factors and an outcome as a linear (straight line) relationship:
Dependent Var. = ------Independent Variables------
This equation is relevant to any linear model; what differentiates one modeling approach from another is
the structure of the outcome variable, and the corresponding structure of the errors.
$\text{Outcome}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
Linear Models: General Considerations
The straight line relationship includes an intercept and one or more slope parameters.
The differences between the actual data points and the regression line are the errors.
$\text{Outcome}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
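For a single predictor, the least-squares intercept and slope have closed forms. The sketch below computes them for hypothetical data points and then computes the residuals (the estimated errors). When the model includes an intercept, the fitted residuals sum to zero.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for one predictor, plus residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx                 # slope
    b0 = y_bar - b1 * x_bar          # intercept: the line passes through (x_bar, y_bar)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, residuals = ols_simple(x, y)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
print("residuals:", [round(r, 2) for r in residuals], "sum:", round(sum(residuals), 10))
```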
Linear Models: General Considerations
Regression analysis is an alternative to and an extension of simpler methods used to test hypotheses about associations:
For means, regression analysis is an extension of t-tests and analysis of variance.
For proportions or rates, regression analysis is an extension of chi-square tests from contingency tables (crude and stratified analysis).
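The "extension of t-tests" claim can be made concrete. In the sketch below, OLS regression of a continuous outcome on a single 0/1 predictor gives an intercept equal to the mean of the group coded 0 and a slope equal to the difference in means. The slope's t statistic equals the pooled (equal-variance) two-sample t statistic. The birthweight values and the function names are hypothetical.

```python
import math


def ols_binary(x, y):
    """OLS of y on a single 0/1 predictor; returns b0, b1, and the t statistic for b1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)
    return b0, b1, b1 / se_b1


def pooled_t(group1, group0):
    """Two-sample t statistic assuming equal variances (mean1 - mean0)."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m0) ** 2 for v in group0)
    sp = math.sqrt(ss / (n1 + n0 - 2))
    return (m1 - m0) / (sp * math.sqrt(1 / n1 + 1 / n0))


# Hypothetical birthweights (grams): x = 1 for smokers, 0 for non-smokers.
smokers, nonsmokers = [3000, 3200, 3100], [3400, 3300, 3500, 3200]
x = [1] * len(smokers) + [0] * len(nonsmokers)
y = smokers + nonsmokers

b0, b1, t_reg = ols_binary(x, y)
t_test = pooled_t(smokers, nonsmokers)
print(f"intercept = {b0:.1f} (non-smoker mean), slope = {b1:.1f} (difference in means)")
print(f"t from regression = {t_reg:.4f}, pooled t-test = {t_test:.4f}")
```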
Linear Models: General Considerations
Why not just do stratified analysis? Why Use Regression Modeling Approaches?
Unlike stratified analysis, regression approaches:
1. more efficiently handle many variables and the sparse data that stratification by many factors may imply
2. can accommodate both continuous and discrete variables, both as outcomes and as independent variables.
Linear Models: General Considerations
Unlike stratified analysis, regression approaches:
3. allow for examination of multiple factors (independent variables) simultaneously in relation to an outcome (dependent variable)—all variables can be considered "exposures" or "covariates" depending on the hypotheses
4. provide more flexibility in assessing effect modification and controlling confounding.
Linear Models: General Considerations
The Purpose of Modeling
Sometimes, regression modeling is carried out in order to assess one association; other variables are included to adjust for confounding or account for effect modification. In this scenario, the focus is on obtaining the ‘best’ estimate of the single association.
Sometimes, regression modeling is carried out in order to assess multiple, competing exposures, or to identify a set of variables that together predict the outcome.
Linear Models: General Considerations
The utility of regression models is their ability to simultaneously handle many independent variables.
Models may be quite complex, including both continuous and discrete measures, and measures at the individual level and/or at an aggregate level such as census tract, zip code, or county.
Interpretation of the slopes or “beta coefficients” can be equally complex, as they reflect measures of occurrence (means, proportions, rates) or measures of association (odds ratios, relative risks, rate ratios) when used singly or in combination.
Linear Models: General Considerations
The Traditional, 'Normal' Regression Model
This model has the following properties:
• The outcome "Y" is continuous and normally distributed.
• The Y values are independent.
• The errors are independent and normally distributed, with a mean of 0 and constant variance across levels of X.
• The expected value (mean) of the Y's is linearly related to X (a straight line relationship exists).
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
Linear Models: General Considerations
When the outcome variable is not continuous and normally distributed, a linear model cannot be written in the same way, and the properties listed above no longer pertain.
For example, if the outcome variable is a proportion or rate:
• The errors are not normally distributed.
• The variance across levels of X is not constant. (By definition, the binomial variance p(1-p) changes with p, and the Poisson variance changes with the rate r.)
• The expected value (proportion or rate) is not linearly related to X (a straight line relationship does not exist).
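Here is a tiny numerical illustration of the non-constant variance point. It uses the Bernoulli variance p(1-p) of a single 0/1 outcome, and the values of p are chosen arbitrarily.

```python
def binary_outcome_variance(p):
    """Variance of a 0/1 (Bernoulli) outcome with probability p."""
    return p * (1 - p)


proportions = [0.05, 0.25, 0.50, 0.75, 0.95]
variances = [binary_outcome_variance(p) for p in proportions]
for p, v in zip(proportions, variances):
    print(f"p = {p:.2f}  p(1-p) = {v:.4f}")
```

The variance peaks at p = 0.5 and shrinks toward 0 near the ends of the range. If X shifts p, it also shifts the variance, which violates the constant-variance assumption of the 'normal' model.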
Linear Models: General Considerations
When an outcome is a proportion or rate, its relationship with a risk factor is not linear.
[Figure: proportion with the outcome (y-axis, 0.0 to 1.0) plotted against x]
Linear Models: General Considerations
Generalized Linear Models
How can a linear modeling approach be applied to the many health outcomes that are proportions or rates?
The normal, binomial, Poisson, exponential, chi-square, and multinomial distributions are all in the exponential family.
Therefore, it is possible to define a “link function” that transforms the expected value of an outcome variable from any of these distributions so that it is linearly related to a set of independent variables. The error terms can also be defined to correspond to the form of the outcome variable.
Linear Models: General Considerations
Generalized Linear Models
Some common link functions:
• identity (untransformed)
• natural log
• logit
• cumulative logit
• generalized logit
The interpretation of the parameter estimates—the beta coefficients—changes depending on whether and how the outcome variable has been transformed (which link function has been used).
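The sketch below shows how the link changes the meaning of a slope. It assumes hypothetical coefficients b0 and b1 for one binary exposure and converts the linear predictor back to the outcome scale with each inverse link. With the identity link, b1 is a difference in means. With the logit link, exp(b1) is an odds ratio. With the log link, exp(b1) is a ratio of proportions or rates.

```python
import math


# Link functions g(mean) and their inverses (mean = g^-1(linear predictor)).
def logit(p):
    return math.log(p / (1 - p))


def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))


def inv_log(eta):
    return math.exp(eta)


# Hypothetical coefficients: intercept b0 and one binary exposure with slope b1.
b0, b1 = -2.0, 0.7

# Identity link: the slope is itself the difference between predicted means.
mean_diff = (b0 + b1) - b0

# Logit link: predicted proportions for exposed (x = 1) and unexposed (x = 0).
p1, p0 = inv_logit(b0 + b1), inv_logit(b0)
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

# Log link: predicted proportions (or rates) for the same linear predictor.
r1, r0 = inv_log(b0 + b1), inv_log(b0)
ratio = r1 / r0

print(f"identity link: difference in means = {mean_diff:.4f} (= b1)")
print(f"logit link: exp(b1) = {math.exp(b1):.4f}, OR from predictions = {odds_ratio:.4f}")
print(f"log link:   exp(b1) = {math.exp(b1):.4f}, RR from predictions = {ratio:.4f}")
```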
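Linear Models: General Considerations
The logit link function (logistic regression):
Linear equation: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$
Non-linear equation: $p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}$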
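In practice the logit model would be fit with statistical software. The sketch below is a hand-rolled version for illustration only. It finds the maximum likelihood estimates by Newton-Raphson on grouped data from a hypothetical 2x2 table with one 0/1 exposure. For this simple case, exp(b1) reproduces the cross-product odds ratio ad/bc, and b0 equals the log odds among the unexposed. The function name and counts are hypothetical.

```python
import math


def fit_logistic_binary_x(a, b, c, d, max_iter=50, tol=1e-12):
    """Maximum likelihood logistic regression of the outcome on one 0/1 exposure.

    Uses Newton-Raphson on grouped data from a 2x2 table laid out like the
    slide (a, b = exposed with / without outcome; c, d = unexposed).
    Returns (b0, b1) on the log-odds (logit) scale.
    """
    groups = [(1, a, a + b), (0, c, c + d)]   # (x, events, trials)
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix entries
        for x, events, n in groups:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))   # inverse logit
            resid = events - n * p
            w = n * p * (1 - p)
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det       # inverse information times score
        step1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic_binary_x(40, 160, 60, 740)   # hypothetical counts
print(f"b0 = {b0:.4f} (log odds, unexposed), b1 = {b1:.4f}")
print(f"exp(b1) = {math.exp(b1):.4f}; cross-product OR = {(40 * 740) / (160 * 60):.4f}")
```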
Linear Models: General Considerations
The natural log link function: log-binomial or Poisson regression with count data
The linear model: $\ln(r) = b_0 + b_1 X$
Non-linear model: $r = \frac{\text{count}}{n} = e^{b_0 + b_1 X}$, so $\text{count} = n\,e^{b_0 + b_1 X}$
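The log-link equations can be traced numerically with hypothetical person-time data. With a single binary X, the Poisson model is saturated, so the maximum likelihood estimates reproduce the observed group rates and can be written down directly. The sketch then confirms that the Poisson score equations are zero at those values. Here exp(b1) is the rate ratio.

```python
import math

# Hypothetical person-time data: (cases, person-years) by exposure group.
cases_1, pt_1 = 30, 10_000     # exposed (x = 1)
cases_0, pt_0 = 45, 30_000     # unexposed (x = 0)

# Linear form on the log scale: ln(rate) = b0 + b1 * x.
b0 = math.log(cases_0 / pt_0)             # ln(rate) in the unexposed
b1 = math.log(cases_1 / pt_1) - b0        # difference in ln(rates)
rate_ratio = math.exp(b1)

# Non-linear form: expected count = person-time * exp(b0 + b1 * x).
expected_1 = pt_1 * math.exp(b0 + b1)
expected_0 = pt_0 * math.exp(b0)

# Poisson score equations: observed - expected summed overall (for b0) and
# among the exposed (for b1); both are zero at the maximum likelihood estimates.
score_b0 = (cases_1 - expected_1) + (cases_0 - expected_0)
score_b1 = cases_1 - expected_1
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, rate ratio = {rate_ratio:.2f}")
print(f"expected counts: exposed {expected_1:.1f}, unexposed {expected_0:.1f}")
```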
Linear Models: General Considerations
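'Normal' Regression: Link=Identity, Dist=Normal
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
Logistic Regression: Link=Logit, Dist=Binomial
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$
Log-Binomial or Poisson Regression with Count Data: Link=Log, Dist=Binomial or Dist=Poisson
$\ln(p) \text{ or } \ln(r) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$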
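Linear Models: General Considerations
Ordinal and Nominal Models
For an ordinal outcome with k+1 categories (cumulative logit), both the numerator and the denominator change:
$\ln(\text{Odds}_1) = \ln\left(\frac{p_1}{1-p_1}\right)$
$\ln(\text{Odds}_2) = \ln\left(\frac{p_1+p_2}{1-(p_1+p_2)}\right)$
$\vdots$
$\ln(\text{Odds}_k) = \ln\left(\frac{p_1+p_2+\cdots+p_k}{1-(p_1+p_2+\cdots+p_k)}\right)$
For a nominal outcome with k+1 categories (generalized logit), the denominator is a fixed (reference) category, $p_{k+1} = 1-(p_1+p_2+\cdots+p_k)$:
$\ln(\text{Odds}_1) = \ln\left(\frac{p_1}{1-(p_1+p_2+\cdots+p_k)}\right)$
$\ln(\text{Odds}_2) = \ln\left(\frac{p_2}{1-(p_1+p_2+\cdots+p_k)}\right)$
$\vdots$
$\ln(\text{Odds}_k) = \ln\left(\frac{p_k}{1-(p_1+p_2+\cdots+p_k)}\right)$
http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html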
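The two sets of log odds can be computed directly from a hypothetical distribution of a 3-category outcome (k + 1 = 3). Cumulative logits accumulate categories in the numerator, following the direction used in the equations above. Generalized logits compare each category with the last (reference) category. The function names are illustrative only.

```python
import math


def cumulative_logits(probs):
    """ln[P(Y <= j) / P(Y > j)] for j = 1..k, for k+1 ordered categories."""
    logits, cum = [], 0.0
    for p in probs[:-1]:
        cum += p
        logits.append(math.log(cum / (1 - cum)))
    return logits


def generalized_logits(probs):
    """ln(p_j / p_{k+1}) for j = 1..k, with the last category as the reference."""
    ref = probs[-1]
    return [math.log(p / ref) for p in probs[:-1]]


# Hypothetical distribution of a 3-category outcome.
probs = [0.2, 0.3, 0.5]
print("cumulative logits: ", [round(v, 3) for v in cumulative_logits(probs)])
print("generalized logits:", [round(v, 3) for v in generalized_logits(probs)])
```

Both functions return k values for k+1 categories, which is why these models estimate k intercepts (and, for the generalized logit, k sets of slopes).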
Linear Models: General Considerations
Some Models with Correlated Errors
Mixed Models
♦ Multilevel/clustered data
♦ Repeated measures/longitudinal data
♦ Matched data
♦ Time series analysis
♦ Spatial analysis
Linear Models: General Considerations
Some Other Multivariable Statistical Approaches
● Survival Analysis: censored data
   – Parametric
   – Semi-parametric / proportional hazards
● Structural Equation Modeling / mediation analysis: exploring causal pathways
● Bayesian modeling
Linear Models: General Considerations
Regression Modeling Results
Measures of Occurrence
Predicted Values: Crude, Adjusted, or Stratum-Specific
The predicted values are points on the regression line given particular values of the set of independent variables:
• 'Normal' model yields means
• Logistic model yields ln(odds)
• Log-binomial / Poisson models yield ln(proportions / rates)
Linear Models: General Considerations
Regression Modeling Results
Measures of Association
Beta coefficients: Crude, Adjusted, or Stratum-Specific
The measures of association are comparisons of points on the regression line at differing values of the independent variables.
Linear Models: General Considerations
Regression Modeling Approaches and Measures of Association
• 'Normal' regression: differences between means
• Log-binomial or Poisson regression: differences between log proportions (Relative Risk / Relative Prevalence)
• Logistic regression (binary, cumulative, generalized): differences between log odds, giving Odds Ratio(s) for
   – a single binary outcome
   – a set of binary outcomes
   – an ordinal outcome
• Binomial regression: differences between proportions (Risk Differences / Attributable Risks)
• Poisson regression (person-time data): differences between log rates (Rate Ratio)
Linear Models: General Considerations
Regression Modeling Results
Measures of Association
General Form of Confidence Intervals and Hypothesis Testing for a Simple Comparison: a Single Beta Coefficient
$\text{Test statistic} = \frac{\text{Observed beta} - \text{Expected beta}}{\text{Standard error of observed beta}}$
$(1-\alpha)\ \text{CI} = b_{\text{diff}} \pm z_{1-\alpha/2} \times \text{s.e.}(b_{\text{diff}})$
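The general form can be applied to a real coefficient. The sketch below uses the smoking coefficient and standard error from the one-predictor OLS output for birthweight shown later in this section, with an expected (null) beta of 0. It uses normal (z) critical values, which is an assumption. With 80,806 error d.f., the t critical values SAS uses are essentially the same. The helper name is hypothetical.

```python
from statistics import NormalDist


def wald_test_and_ci(b, se, null_value=0.0, level=0.95):
    """Large-sample test statistic, two-sided p-value, and CI for one coefficient."""
    z_stat = (b - null_value) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_stat, p_value, (b - z_crit * se, b + z_crit * se)


# Smoking coefficient (grams) and its standard error from the OLS output.
z_stat, p_value, (lo, hi) = wald_test_and_ci(-196.88822, 6.28538)
print(f"test statistic = {z_stat:.2f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

The statistic matches the printed t value of -31.32. The p-value is far below 0.0001, consistent with the "<.0001" in the output.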
Common Linear Regression Models
Examples with Smoking and Birthweight
‘Normal’ Regression
Predicted Values (Means): Predicted values use the entire regression equation, including the intercept.
$\hat{Y}_{\text{Value A}} = b_0 + b_1 x_{\text{Value A}}$
$\hat{Y}_{\text{Value B}} = b_0 + b_1 x_{\text{Value B}}$
Measures of Association (Differences Between Means): When comparing two predicted values (a measure of association), the intercept terms cancel out.
$\hat{Y}_{\text{Value A}} - \hat{Y}_{\text{Value B}} = (b_0 + b_1 x_{\text{Value A}}) - (b_0 + b_1 x_{\text{Value B}}) = b_1 (x_{\text{Value A}} - x_{\text{Value B}})$
‘Normal’ Regression in SAS

/* Continuous birthweight, OLS regression */
proc reg data=one;
   model dbirwt = smoking;
run;
quit;

proc reg data=one;
   model dbirwt = smoking late_no_pnc;
run;
quit;

/* Continuous birthweight, regression using maximum likelihood (ML) */
proc genmod data=one;
   model dbirwt = smoking / link=identity dist=normal;
run;

proc genmod data=one;
   model dbirwt = smoking late_no_pnc / link=identity dist=normal;
run;
‘Normal’ Regression
Descriptive Statistics and Simple t-test for Smoking and Birthweight

The TTEST Procedure
Variable: DBIRWT (Birth Weight Detail in Grams)

smoking          N      Mean   Std Dev   Std Err      DF   t Value  Pr > |t|
yes           9259    3155.9     575.6    5.9824
no           71549    3352.7     568.2    2.1244
Diff (1-2)            -196.9     569.1    6.2854   80806    -31.32    <.0001
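As a check on how this output is assembled, the pooled (equal-variance) t statistic can be recomputed from the printed summary statistics. Because the inputs are the rounded values shown in the table, the result differs slightly from the printed t value (about -31.31 vs. -31.32).

```python
import math

# Summary statistics from the TTEST output above (rounded as printed).
n_yes, mean_yes, sd_yes = 9259, 3155.9, 575.6     # smokers
n_no, mean_no, sd_no = 71549, 3352.7, 568.2       # non-smokers

# Pooled standard deviation and the standard error of the difference in means.
df = n_yes + n_no - 2
sd_pooled = math.sqrt(((n_yes - 1) * sd_yes**2 + (n_no - 1) * sd_no**2) / df)
se_diff = sd_pooled * math.sqrt(1 / n_yes + 1 / n_no)
t_value = (mean_yes - mean_no) / se_diff

print(f"df = {df}, pooled SD = {sd_pooled:.1f}, SE = {se_diff:.4f}, t = {t_value:.2f}")
```

The pooled SD and d.f. match the "Diff (1-2)" row. The pooled SD is also close to the Root MSE of the one-predictor regression output, since the two analyses are equivalent.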
'Normal' Regression
“dbirwt” = Birthweight (grams) from vital records
                                Sum of           Mean
Source              DF         Squares         Square    F Value    Pr > F
Model                1       317799174      317799174     981.24    <.0001
Error            80806     26171015889         323875
Corrected Total  80807     26488815063

Root MSE           569.09987    R-Square    0.0120
Dependent Mean    3330.17903    Adj R-Sq    0.0120

Parameter Estimates
                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1   3352.73853      2.12758    1575.84      <.0001
smoking       1   -196.88822      6.28538     -31.32      <.0001
'Normal' Regression
model dbirwt = smoking;
Predicted value for smokers: mean birthweight = 3352.74 - 196.89(1) = 3155.85
Predicted value for non-smokers: mean birthweight = 3352.74 - 196.89(0) = 3352.74
Measure of association / comparison of predicted values: difference between means = 3155.85 - 3352.74 = -196.89
95% CI = -196.89 ± 1.96 × 6.29 = (-209.2, -184.6)
48
49
'Normal' Regression with OLS in SAS
Source            DF      Sum of Squares   Mean Square   F Value   Pr > F
Model             2       375465933        187732967     580.92    <.0001
Error             80805   26113349130      323165
Corrected Total   80807   26488815063

Root MSE         568.47605    R-Square   0.0142
Dependent Mean   3330.17903   Adj R-Sq   0.0142
Coeff Var        17.07044

Parameter Estimates
Variable      DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1    3364.04030           2.28746          1470.64   <.0001
smoking       1    -190.55737           6.29636          -30.26    <.0001
late_no_pnc   1    -71.69983            5.36744          -13.36    <.0001
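By analogy with the crude model, predicted mean birthweights now come from combinations of the two 0/1 covariates. The smoking coefficient is a difference in means holding late_no_pnc fixed. The arithmetic below uses the estimates above, rounded to two decimals.

```latex
\begin{aligned}
\hat{y}(\text{smoking}=0,\ \text{late\_no\_pnc}=0) &= 3364.04 \\
\hat{y}(\text{smoking}=1,\ \text{late\_no\_pnc}=0) &= 3364.04 - 190.56 = 3173.48 \\
\hat{y}(\text{smoking}=1,\ \text{late\_no\_pnc}=1) &= 3364.04 - 190.56 - 71.70 = 3101.78 \\
95\%\ \text{CI for } b_{\text{smoking}} &= -190.56 \pm 1.96 \times 6.30 \approx (-202.9,\ -178.2)
\end{aligned}
```

The adjusted difference for smoking (-190.56 g) is somewhat smaller in magnitude than the crude difference (-196.89 g).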
50
Logistic Regression
Predicted Values
When the outcome is a proportion with a logistic transformation, the predicted values are log odds.
Dichotomous independent variable coded 1 and 0:
$$\ln\left(\frac{p}{1-p}\right)_{x=1} = b_0 + b_1(1) = b_0 + b_1$$
$$\ln\left(\frac{p}{1-p}\right)_{x=0} = b_0 + b_1(0) = b_0$$
In general:
$$\ln\left(\frac{p}{1-p}\right)_{x=\text{A value}} = b_0 + b_1(\text{A value})$$
51
Logistic Regression
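Measures of Association: Beta Coefficients, Differences Between Log Odds, and the Odds Ratio
Dichotomous independent variable coded 1 and 0:
$$\ln\left(\frac{p}{1-p}\right)_{x=1} - \ln\left(\frac{p}{1-p}\right)_{x=0} = [b_0 + b_1(1)] - [b_0 + b_1(0)] = b_1$$
$$OR = \frac{e^{b_0 + b_1(1)}}{e^{b_0 + b_1(0)}} = e^{[b_0 + b_1(1)] - [b_0 + b_1(0)]} = e^{b_1}$$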
52
Logistic Regression
Measures of Association: Beta Coefficients, Differences Between Log Odds, and the Odds Ratio
In general, the beta coefficient is the change in the logit for every unit change in X.
For an ordinal or continuous variable, the test of the beta coefficient is a test of linear trend.
$$\ln\left(\frac{p}{1-p}\right)_{x=\text{B value}} - \ln\left(\frac{p}{1-p}\right)_{x=\text{A value}} = [b_0 + b_1(\text{B value})] - [b_0 + b_1(\text{A value})] = b_1(\text{B value} - \text{A value})$$
$$OR = \frac{e^{b_0 + b_1(\text{B value})}}{e^{b_0 + b_1(\text{A value})}} = e^{b_1(\text{B value} - \text{A value})}$$
53
Confidence Intervals for Estimated Odds Ratios from a Logistic Regression Model
For dichotomous variables coded 1 and 0:
$$95\%\ CI = e^{b_1 \pm 1.96\,SE(b_1)}$$
In general, for a single beta coefficient:
$$95\%\ CI = e^{\text{diff} \times b_1 \pm 1.96 \times \text{diff} \times SE(b_1)}$$
where "diff" is the difference of interest in the values of the independent variable being analyzed.
Logistic Regression
Logistic Regression in SAS
/* Dichotomous Birthweight, Logistic Regression */
proc logistic order=formatted data=one;
  model lbw = smoking;
run;
proc logistic order=formatted data=one;
  model lbw = smoking late_no_pnc;
run;

proc genmod data=one;
  model lbw = smoking / link=logit dist=bin;
  estimate 'Crude OR smoking' smoking 1 / exp;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=logit dist=bin;
  estimate 'AOR smoking' smoking 1 / exp;
  estimate 'AOR Late_no_pnc' late_no_pnc 1 / exp;
run;
54
55
First, looking at a contingency table using proc freq in SAS
Crude Association between Smoking and Low Birthweight
Logistic Regression

Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       938 (10.13)    8321 (89.87)     9259
no        4046 (5.65)    67503 (94.35)   71549
Total     4984           75824           80808

              Risk     ASE      (Asymptotic) 95% Confidence Limits   (Exact) 95% Confidence Limits
Row 1         0.1013   0.0031   0.0952   0.1075                      0.0952   0.1076
Row 2         0.0565   0.0009   0.0549   0.0582                      0.0549   0.0583
Total         0.0617   0.0008   0.0600   0.0633                      0.0600   0.0634
Difference    0.0448   0.0033   0.0384   0.0511

Type of Study                Value    95% Confidence Limits
Case-Control (Odds Ratio)    1.8807   1.7455   2.0264
Cohort (Col1 Risk)           1.7915   1.6743   1.9169
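The odds ratio, risk ratio, and risk difference in this output follow directly from the cell counts. The check below uses the a, b, c, d labels of the standard 2×2 layout:

- a = exposed cases
- b = exposed non-cases
- c = unexposed cases
- d = unexposed non-cases

```latex
\begin{aligned}
OR &= \frac{ad}{bc} = \frac{938 \times 67503}{8321 \times 4046} \approx 1.8807 \\
RR &= \frac{a/(a+b)}{c/(c+d)} = \frac{938/9259}{4046/71549} \approx \frac{0.1013}{0.0565} \approx 1.79 \\
RD &= \frac{a}{a+b} - \frac{c}{c+d} \approx 0.1013 - 0.0565 = 0.0448
\end{aligned}
```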
56
Output from proc logistic
Logistic Regression
$$OR = \frac{e^{-2.8144 + 0.6317(1)}}{e^{-2.8144 + 0.6317(0)}} = e^{0.6317} = 1.88$$

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -2.8144    0.0162           30236.4216        <.0001
smoking     1    0.6317     0.0381           275.5238          <.0001

Odds Ratio Estimates
Effect    Point Estimate   95% Wald Confidence Limits
smoking   1.881            1.746   2.026
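Applying the Wald interval e^{b1 ± 1.96 SE(b1)} to the rounded estimates above reproduces the limits PROC LOGISTIC prints. Small differences in the last digit come from rounding b1 and its standard error to four decimals.

```latex
\begin{aligned}
e^{b_1} &= e^{0.6317} \approx 1.881 \\
e^{b_1 - 1.96\,SE(b_1)} &= e^{0.6317 - 1.96(0.0381)} = e^{0.5570} \approx 1.745 \\
e^{b_1 + 1.96\,SE(b_1)} &= e^{0.6317 + 1.96(0.0381)} = e^{0.7064} \approx 2.027
\end{aligned}
```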
57
Logistic Regression
Table 1 of smoking by lbw, controlling for late_no_pnc = Late or No PNC
Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       289 (12.69)    1988 (87.31)     2277
no        775 (6.87)     10503 (93.13)   11278
Total     1064           12491           13555
Risk Diff 0.0582 (0.0438-0.0727)   Odds Ratio 1.9701 (1.7070-2.2738)   Cohort 1.8470 (1.6261-2.0979)

Table 2 of smoking by lbw, controlling for late_no_pnc = First Trimester PNC
Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       649 (9.30)     6333 (90.70)     6982
no        3271 (5.43)    57000 (94.57)   60271
Total     3920           63333           67253
Risk Diff 0.0387 (0.0316-0.0457)   Odds Ratio 1.7858 (1.6351-1.9503)   Cohort 1.7127 (1.5803-1.8563)

Mantel-Haenszel summary estimates:
Case-Control (OR)   1.8355 (1.7028-1.9784)
Cohort (RP)         1.7499 (1.6349-1.8731)

Is there evidence of any confounding or effect modification?
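The Mantel-Haenszel summary odds ratio can be reproduced from the two stratum tables using the usual Mantel-Haenszel weights. Here a_i, b_i, c_i, d_i are the cells of stratum i and n_i is its total. Intermediate sums are rounded to two decimals.

```latex
\begin{aligned}
OR_{MH} &= \frac{\sum_i a_i d_i / n_i}{\sum_i b_i c_i / n_i}
  = \frac{\frac{289 \times 10503}{13555} + \frac{649 \times 57000}{67253}}
         {\frac{1988 \times 775}{13555} + \frac{6333 \times 3271}{67253}} \\
  &\approx \frac{223.93 + 550.06}{113.66 + 308.02}
  = \frac{773.99}{421.68} \approx 1.8355
\end{aligned}
```

The summary OR (1.84) lies between the two stratum-specific ORs (1.97 and 1.79). It differs from the crude OR of 1.88 by about 2.5%.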
58
Logistic Regression
Output from proc logistic:

Analysis of Maximum Likelihood Estimates
Parameter     DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept     1    -2.8622    0.0176           26390.9295        <.0001
smoking       1    0.6064     0.0382           251.5499          <.0001
late_no_pnc   1    0.2739     0.0362           57.3687           <.0001

Odds Ratio Estimates
Effect        Point Estimate   95% Wald Confidence Limits
smoking       1.834            1.701   1.977
late_no_pnc   1.315            1.225   1.412
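Because the predicted values of this model are log odds, predicted probabilities come from inverting the logit, p = e^logit / (1 + e^logit). The illustration below uses the rounded estimates above for two of the four covariate patterns. For comparison, the observed percentages in the corresponding strata of the stratified tables are 5.43% and 12.69%.

```latex
\begin{aligned}
\text{smoking}=0,\ \text{late\_no\_pnc}=0:&\quad
  \text{logit} = -2.8622, \quad
  \hat{p} = \frac{e^{-2.8622}}{1 + e^{-2.8622}} \approx 0.054 \\
\text{smoking}=1,\ \text{late\_no\_pnc}=1:&\quad
  \text{logit} = -2.8622 + 0.6064 + 0.2739 = -1.9819, \quad
  \hat{p} = \frac{e^{-1.9819}}{1 + e^{-1.9819}} \approx 0.121
\end{aligned}
```

This main-effects model has no interaction term. It therefore assumes a common smoking odds ratio in both prenatal care strata, so its predictions need not match the observed stratum proportions exactly.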
Binomial and Poisson Regression
Predicted Values
When the outcome is a proportion with a natural log transformation, the predicted values are log proportions.
$$\ln(\text{rate/proportion})_{x=1} = b_0 + b_1(1) = b_0 + b_1$$
$$\ln(\text{rate/proportion})_{x=0} = b_0 + b_1(0) = b_0$$
In general:
$$\ln(\text{rate/proportion})_{x=\text{A value}} = b_0 + b_1(\text{A value})$$
59
Binomial and Poisson Regression
Measures of Association: Beta Coefficients, Differences Between Log Proportions/Rates, and the Relative Prevalence / Relative Risk
Dichotomous independent variable coded 1 and 0:
60
$$\ln(p)_{x=1} - \ln(p)_{x=0} = [b_0 + b_1(1)] - [b_0 + b_1(0)] = b_1$$
$$RP = \frac{e^{b_0 + b_1(1)}}{e^{b_0 + b_1(0)}} = e^{[b_0 + b_1(1)] - [b_0 + b_1(0)]} = e^{b_1}$$
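Binomial and Poisson Regression
In general, the beta coefficient is the change in the log proportion / rate for every unit change in X.
61
$$\ln(p)_{x=\text{B value}} - \ln(p)_{x=\text{A value}} = [b_0 + b_1(\text{B value})] - [b_0 + b_1(\text{A value})] = b_1(\text{B value} - \text{A value})$$
$$RP = \frac{e^{b_0 + b_1(\text{B value})}}{e^{b_0 + b_1(\text{A value})}} = e^{b_1(\text{B value} - \text{A value})}$$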
Binomial and Poisson Regression
The more common the outcome, the greater the difference between the binomial and Poisson standard errors.
When the outcome is rare (e.g., expressed per 10,000 or per 100,000), the binomial and Poisson standard errors are almost identical.
62
63
Binomial and Poisson Regression
For infant mortality, calculated per 1,000 live births, what difference will using the binomial or Poisson distribution make?
Suppose the IMR is 7 per 1,000, or 0.007:
$$s.e._{\text{binomial}} = \sqrt{\frac{0.007 \times 0.993}{1500}} = \sqrt{\frac{0.006951}{1500}} = 0.00215$$
$$s.e._{\text{Poisson}} = \sqrt{\frac{0.007}{1500}} = 0.00216$$
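Comparing the two formulas term by term, the binomial standard error equals the Poisson standard error multiplied by the square root of (1 - p). The gap therefore depends only on how common the outcome is. As a contrasting illustration, the last line uses the overall low-birthweight proportion (0.0617) from the PROC FREQ output.

```latex
\begin{aligned}
\frac{s.e._{\text{binomial}}}{s.e._{\text{Poisson}}}
  &= \frac{\sqrt{p(1-p)/n}}{\sqrt{p/n}} = \sqrt{1-p} \\
p = 0.007:&\quad \sqrt{0.993} \approx 0.9965 \\
p = 0.0617:&\quad \sqrt{0.9383} \approx 0.9687
\end{aligned}
```

For infant mortality, the two standard errors differ by well under 1%. For low birthweight, the binomial standard error is about 3% smaller than the Poisson one.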
Binomial and Poisson Regression in SAS
/* Dichotomous Birthweight, Log-Binomial Regression */
proc genmod data=one;
  model lbw = smoking / link=log dist=bin;
  estimate 'Crude RP smoking' smoking 1 / exp;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=log dist=bin;
  estimate 'ARP smoking' smoking 1 / exp;
  estimate 'ARP Late_no_pnc' late_no_pnc 1 / exp;
run;

/* Dichotomous Birthweight, Poisson Regression */
proc genmod data=one;
  model lbw = smoking / link=log dist=poisson;
  estimate 'Crude RP smoking' smoking 1 / exp;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=log dist=poisson;
  estimate 'ARP smoking' smoking 1 / exp;
  estimate 'ARP Late_no_pnc' late_no_pnc 1 / exp;
run;
64
Binomial and Poisson Regression
Output from proc genmod
65
Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept   1    -2.8727    0.0153           -2.9026   -2.8427            35389.4           <.0001
smoking     1    0.5831     0.0345           0.5154    0.6507             285.37            <.0001
Scale       0    1.0000     0.0000           1.0000    1.0000
NOTE: The scale parameter was held fixed.

Contrast Estimate Results
Label                   Mean Estimate   Mean Confidence Limits   L'Beta Estimate   Standard Error   Alpha
Crude RP smoking        1.7915          1.6743   1.9169          0.5831            0.0345           0.05
Exp(Crude RP smoking)                                            1.7915            0.0618           0.05
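Because the log-binomial model's predicted values are log proportions, exponentiating the coefficients gives proportions and relative prevalences directly. The check below uses the rounded estimates above:

- The intercept back-transforms to the non-smokers' LBW proportion in the PROC FREQ table (0.0565).
- e^b1 with its Wald limits reproduces the crude relative prevalence.
- The standard error printed for Exp(Crude RP smoking) appears to be the delta-method value e^b1 × SE(b1). This is inferred from the numbers, not stated in the output.

```latex
\begin{aligned}
e^{b_0} &= e^{-2.8727} \approx 0.0565 \\
RP = e^{b_1} &= e^{0.5831} \approx 1.7915 \\
95\%\ CI &= e^{0.5831 \pm 1.96(0.0345)} = (e^{0.5155},\ e^{0.6507}) \approx (1.674,\ 1.917) \\
SE\left(e^{b_1}\right) &\approx e^{b_1} \times SE(b_1) = 1.7915 \times 0.0345 \approx 0.0618
\end{aligned}
```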
66
Binomial and Poisson Regression
Table 1 of smoking by lbw, controlling for late_no_pnc = Late or No PNC
Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       289 (12.69)    1988 (87.31)     2277
no        775 (6.87)     10503 (93.13)   11278
Total     1064           12491           13555
Risk Diff 0.0582 (0.0438-0.0727)   Odds Ratio 1.9701 (1.7070-2.2738)   Cohort 1.8470 (1.6261-2.0979)

Table 2 of smoking by lbw, controlling for late_no_pnc = First Trimester PNC
Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       649 (9.30)     6333 (90.70)     6982
no        3271 (5.43)    57000 (94.57)   60271
Total     3920           63333           67253
Risk Diff 0.0387 (0.0316-0.0457)   Odds Ratio 1.7858 (1.6351-1.9503)   Cohort 1.7127 (1.5803-1.8563)

Mantel-Haenszel summary estimates:
Case-Control (OR)   1.8355 (1.7028-1.9784)
Cohort (RP)         1.7499 (1.6349-1.8731)
Binomial and Poisson Regression
Output from proc genmod
67
Analysis Of Maximum Likelihood Parameter Estimates
Parameter     DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept     1    -2.9174    0.0166           -2.9500   -2.8848            30804.5           <.0001
smoking       1    0.5593     0.0347           0.4913    0.6272             260.34            <.0001
late_no_pnc   1    0.2548     0.0333           0.1894    0.3201             58.39             <.0001
Scale         0    1.0000     0.0000           1.0000    1.0000
NOTE: The scale parameter was held fixed.

Contrast Estimate Results
Label                  Mean Estimate   Mean Confidence Limits   L'Beta Estimate   Standard Error   Alpha
ARP smoking            1.7494          1.6345   1.8724          0.5593            0.0347           0.05
Exp(ARP smoking)                                                1.7494            0.0606           0.05
ARP Late_no_pnc        1.2901          1.2085   1.3773          0.2548            0.0333           0.05
Exp(ARP Late_no_pnc)                                            1.2901            0.0430           0.05
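Predicted values from this log-binomial model back-transform to proportions by simple exponentiation. Unlike logistic regression, no logit inversion is needed. The illustration below covers two covariate patterns, using the rounded estimates above. The observed percentages in the corresponding strata were 5.43% and 12.69%.

```latex
\begin{aligned}
\text{smoking}=0,\ \text{late\_no\_pnc}=0:&\quad \hat{p} = e^{-2.9174} \approx 0.054 \\
\text{smoking}=1,\ \text{late\_no\_pnc}=1:&\quad \hat{p} = e^{-2.9174 + 0.5593 + 0.2548} = e^{-2.1033} \approx 0.122
\end{aligned}
```

The adjusted RP for smoking (1.7494) is close to the Mantel-Haenszel summary RP from the stratified tables (1.7499).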
68
Binomial and Poisson Regression
Comparison between Binomial and Poisson Results
Binomial
Poisson
Cumulative and Generalized Logit
/* vlbw, mlbw, and normal bw as an ordinal variable */
proc logistic order=formatted data=one;
  model bwcat = smoking;
run;

/* vlbw, mlbw, and normal bw as a nominal variable */
proc logistic order=formatted data=one;
  model bwcat (ref='normal bw') = smoking / link=glogit;
run;
Since this is logistic regression, the predicted values are log odds, and the measures of association (the beta coefficients) are differences between log odds, which when exponentiated are odds ratios.
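For reference, the two model forms can be written as PROC LOGISTIC parameterizes them for this three-category outcome. This sketch assumes the ordering vlbw, mlbw, normal bw listed in the PROC LOGISTIC output for bwcat. In the equations, j indexes the outcome categories, and α_j, β, and β_j are notation introduced here.

- The cumulative logit model has one intercept per cumulative split and a single smoking coefficient.
- The generalized logit model compares each category with the reference category 'normal bw' and has a separate smoking coefficient for each.

```latex
\begin{aligned}
\text{Cumulative logit:}\quad
  &\ln\left(\frac{P(Y \le j)}{1 - P(Y \le j)}\right) = \alpha_j + \beta\,\text{smoking},
  \quad j \in \{\text{vlbw},\ \text{mlbw}\} \\
\text{Generalized logit:}\quad
  &\ln\left(\frac{P(Y = j)}{P(Y = \text{normal bw})}\right) = \alpha_j + \beta_j\,\text{smoking},
  \quad j \in \{\text{vlbw},\ \text{mlbw}\}
\end{aligned}
```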
69
Cumulative and Generalized Logit
Output from proc logistic: Ordinal Birthweight
70
Value   bwcat       Frequency
1       vlbw        897
2       mlbw        4087
3       normal bw   75824

Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
13.6865      1    0.0002

Analysis of Maximum Likelihood Estimates
Parameter        DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept vlbw   1    -4.5844    0.0343           17824.9677        <.0001
Intercept mlbw   1    -2.8138    0.0162           30242.4897        <.0001
smoking          1    0.6262     0.0381           270.3336          <.0001

Odds Ratio Estimates
Effect    Point Estimate   95% Wald Confidence Limits
smoking   1.871            1.736   2.016
Cumulative and Generalized Logit
Output from proc logistic: Nominal Birthweight
71
Value   bwcat       Frequency
1       vlbw        897
2       mlbw        4087
3       normal bw   75824

Analysis of Maximum Likelihood Estimates
Parameter   bwcat   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   vlbw    1    -4.4827    0.0364           15160.6058        <.0001
Intercept   mlbw    1    -3.0234    0.0179           28618.1754        <.0001
smoking     vlbw    1    0.3540     0.0944           14.0650           0.0002
smoking     mlbw    1    0.6865     0.0410           280.0005          <.0001

Odds Ratio Estimates
Effect    bwcat   Point Estimate   95% Wald Confidence Limits
smoking   vlbw    1.425            1.184   1.714
smoking   mlbw    1.987            1.833   2.153
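Putting the ordinal and nominal outputs side by side, exponentiating each smoking coefficient gives the printed odds ratios. The nominal model's two category-specific odds ratios differ noticeably, whereas the ordinal model forces a single common odds ratio. This pattern is consistent with the significant score test for the proportional odds assumption (p = 0.0002) in the ordinal output. The check below uses the rounded estimates.

```latex
\begin{aligned}
\text{Ordinal (common slope):}\quad & e^{0.6262} \approx 1.87 \\
\text{Nominal, vlbw vs. normal bw:}\quad & e^{0.3540} \approx 1.42 \\
\text{Nominal, mlbw vs. normal bw:}\quad & e^{0.6865} \approx 1.99
\end{aligned}
```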
Risk Differences
/* Dichotomous Birthweight, Modeling Risk Differences */
/* The outcome is the 0/1 low-birthweight indicator lbw, not the continuous dbirwt */
proc genmod data=one;
  model lbw = smoking / link=identity dist=bin;
run;
proc genmod data=one;
  model lbw = smoking late_no_pnc / link=identity dist=bin;
run;
Since the outcome variable is a proportion that is not transformed in any way, the predicted values are the proportions themselves, and the measures of association (the beta coefficients) are the differences in the proportions, or "risk" differences.
72
73
Risk Differences
Table 1 of smoking by lbw, controlling for late_no_pnc = Late or No PNC
Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       289 (12.69)    1988 (87.31)     2277
no        775 (6.87)     10503 (93.13)   11278
Total     1064           12491           13555
Risk Diff 0.0582 (0.0438-0.0727)   Odds Ratio 1.9701 (1.7070-2.2738)   Cohort 1.8470 (1.6261-2.0979)

Table 2 of smoking by lbw, controlling for late_no_pnc = First Trimester PNC
Frequency (Row Pct)
smoking   lbw = yes      lbw = no        Total
yes       649 (9.30)     6333 (90.70)     6982
no        3271 (5.43)    57000 (94.57)   60271
Total     3920           63333           67253
Risk Diff 0.0387 (0.0316-0.0457)   Odds Ratio 1.7858 (1.6351-1.9503)   Cohort 1.7127 (1.5803-1.8563)

Mantel-Haenszel summary estimates:
Case-Control (OR)   1.8355 (1.7028-1.9784)
Cohort (RP)         1.7499 (1.6349-1.8731)
Risk Differences
Output from proc genmod
Crude and Adjusted Risk Differences
74
Crude:
Analysis Of Maximum Likelihood Parameter Estimates
Parameter   DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept   1    0.0565     0.0009           0.0549   0.0582              4288.51           <.0001
smoking     1    0.0448     0.0033           0.0384   0.0511              189.37            <.0001
Scale       0    1.0000     0.0000           1.0000   1.0000

Adjusted:
Analysis Of Maximum Likelihood Parameter Estimates
Parameter     DF   Estimate   Standard Error   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
Intercept     1    0.0540     0.0009           0.0522   0.0558              3506.35           <.0001
smoking       1    0.0428     0.0033           0.0364   0.0492              172.41            <.0001
late_no_pnc   1    0.0165     0.0025           0.0117   0.0213              45.21             <.0001
Scale         0    1.0000     0.0000           1.0000   1.0000
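Because the identity link leaves the outcome untransformed, predicted risks are simply sums of coefficients. The illustration below uses the rounded estimates above. The crude smoking coefficient matches the risk difference 0.1013 - 0.0565 = 0.0448 from the PROC FREQ table.

```latex
\begin{aligned}
\text{smoking}=0,\ \text{late\_no\_pnc}=0:&\quad \hat{p} = 0.0540 \\
\text{smoking}=1,\ \text{late\_no\_pnc}=0:&\quad \hat{p} = 0.0540 + 0.0428 = 0.0968 \\
\text{smoking}=1,\ \text{late\_no\_pnc}=1:&\quad \hat{p} = 0.0540 + 0.0428 + 0.0165 = 0.1133
\end{aligned}
```

The adjusted risk difference for smoking (0.0428) falls between the two stratum-specific risk differences (0.0582 and 0.0387).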
75
Linear Models: General Considerations
• Conceptual Framework
• Hypothesis formulation
• Level of measurement of the outcome variable: Continuous; Dichotomous; Polytomous-nominal; Polytomous-ordinal
• Unit of Analysis: Individual; Aggregate; Individual and aggregate
• Error Structure / Distribution: Uncorrelated; Correlated, imposed by study design or by 'natural' structure of the data
Until next week...
Again, let's keep this in mind...
"...technical expertise and methodology are not substitutes for conceptual coherence. Or, as one student remarked a few years ago, public health spends too much time on the 'p' values of biostatistics and not enough time on values."
Jonathan M. Mann, "Medicine and Public Health, Ethics and Human Rights," The Hastings Center Report, Vol. 27, No. 3 (May-Jun., 1997), pp. 6-13. Published by: The Hastings Center.
76