Introduction to SAS proc mixed (staff.pubhealth.ku.dk/~jufo/courses/rm2016/procmixed.pdf ·...)
University of Copenhagen, Department of Biostatistics
Faculty of Health Sciences
Introduction to SAS proc mixed
Analysis of repeated measurements, 2017
Julie Forman
Department of Biostatistics, University of Copenhagen
Outline
Data in wide and long format
Descriptive statistics
Analysis of response profiles (FLW section 5.8)
Reading the output from proc mixed
Baseline adjustment
Preparing data for analysis
Most often raw data are stored in the wide format (e.g. in Excel):
- one row per subject
- several columns with the outcomes for different occasions
Example:
id  sex  age  group  aix0  aix1  aix2
 1    1   57      0  10.5  17.5  25.0
 2    1   48      0  -2.5   8.0   8.5
 3    2   54      1  18.0  24.0  23.5
...
To fit a linear mixed model with any statistical software, data must be in the so-called long format ...
The long format
- Each row contains only one observation of the outcome.
- A time-variable identifies the time of measurement.
- An id-variable identifies measurements from the same subject.
Obs  id  sex  age  group  week   aix
  1   1    1   57      0     0  10.5
  2   1    1   57      0    12  17.5
  3   1    1   57      0    24  25.0
  4   2    1   48      0     0  -2.5
  5   2    1   48      0    12   8.0
  6   2    1   48      0    24   8.5
  7   3    2   54      1     0  18.0
  8   3    2   54      1    12  24.0
  9   3    2   54      1    24  23.5
 10   4    2   46      1     0  26.0
...
From wide to long format
Data is transformed from the wide to the long format with:
DATA ckd (DROP = aix1-aix2);
  SET ckd_wide;
  week = 0;  aix = aix0; OUTPUT;
  week = 12; aix = aix1; OUTPUT;
  week = 24; aix = aix2; OUTPUT;
RUN;
Note: We keep the baseline variable aix0 for the ANCOVA.
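The same reshaping can also be written with an ARRAY, which is convenient when there are many occasions. A minimal sketch, assuming the three subjects from the wide-format example as demo data; the dataset names ckd_wide_demo and ckd_long_demo and the index variable i are made up for illustration:

```sas
/* Hypothetical demo data: the three subjects shown in the wide-format example */
DATA ckd_wide_demo;
  INPUT id sex age group aix0 aix1 aix2;
  DATALINES;
1 1 57 0 10.5 17.5 25.0
2 1 48 0 -2.5 8.0 8.5
3 2 54 1 18.0 24.0 23.5
;
RUN;

/* One output row per subject and occasion; aix0 is kept, aix1 and aix2 are dropped */
DATA ckd_long_demo (DROP = aix1 aix2 i);
  SET ckd_wide_demo;
  ARRAY a{3} aix0 aix1 aix2;              /* outcomes in time order */
  ARRAY w{3} _TEMPORARY_ (0 12 24);       /* matching measurement weeks */
  DO i = 1 TO 3;
    week = w{i};
    aix  = a{i};
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=ckd_long_demo;
RUN;
```

The printed data should have nine rows (three per subject) with the columns id, sex, age, group, aix0, week and aix.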
Spaghettiplots
The spaghettiplots from the lecture were made with:
PROC SGPANEL DATA=ckd;
  PANELBY group;
  SERIES x = week y = aix / GROUP=id;
RUN;
Note: Applies to data in the long format.
Summary statistics and pairwise scatterplots
PROC SORT DATA=ckd_wide;
  BY group;
RUN;

ODS GRAPHICS ON;
PROC CORR DATA=ckd_wide PLOT=MATRIX(HISTOGRAM) NOPROB;
  BY group;
  VAR aix0-aix2;
RUN;
Note: Applies to data in the wide format.
Plotting averages over time
The plot of the group-by-time averages was made with:

PROC MEANS DATA=ckd NWAY;
  CLASS group week;
  VAR aix;
  OUTPUT OUT=ckdmeans MEAN=average;
RUN;

PROC SGPLOT DATA=ckdmeans;
  SERIES x = week y = average / GROUP = group MARKERS;
RUN;
Note: Applies to data in the long format.
Syntax: Analysis of response profiles
PROC MIXED DATA=ckd PLOTS=all;
  CLASS id week (ref='0') group (ref='0');
  MODEL aix = week group group*week
        / SOLUTION CL DDFM=KR OUTPM=ckdfit;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;

- Syntax is similar to PROC GLM, with a MODEL statement specifying the (linear) relation between outcome and covariates.
- Categorical variables must be declared with CLASS.
- The model for the covariance (UN = unstructured) is specified in a separate REPEATED-statement.
- Fitted values and residuals are saved in a dataset ckdfit.
- Use the PLOTS-option to get some residual plots.
The option DDFM=KENWARDROGERS (aka KR) (or DDFM=SATTERTHWAITE)

A technical option intended to improve the statistical performance of the t-tests and F-tests.

- It has no effect on balanced data.
- In unbalanced situations (i.e. for almost all observational studies and in case of missing observations) degrees of freedom are computed by a more complicated formula.
- The computations may require a little more time, but in most cases this will not be noticeable.

When in doubt, use it!
Estimated response profiles
Use the output data (ckdfit) to plot the estimated profiles:
PROC SORT DATA=ckdfit;
  BY group week id;
RUN;

PROC SGPLOT DATA=ckdfit;
  SERIES x = week y = pred / GROUP = group MARKERS;
RUN;
Alternative model specifications
The same model can be phrased differently to highlight differences between groups at specific time points or changes over time.

To compare change over time between groups:
- Include both main effects and the interaction term.

MODEL aix = time group time*group / SOLUTION CL;

To get mean differences between groups at each time point:
- Omit the main effect of group and the intercept.

MODEL aix = time time*group / NOINT SOLUTION CL;

To get the means for all combinations of group and time:
- Include only the interaction term and omit the intercept.

MODEL aix = time*group / NOINT SOLUTION CL;

Note: Usually combined with LSMEANS (on the next slide).
LSMEANS
Estimates the means for all time and group combinations, and all possible differences between them (DIFF-option).

PROC MIXED DATA=ckd;
  CLASS id week group;
  MODEL aix = group*week / NOINT DDFM=KR;
  LSMEANS group*week / DIFF SLICE=week CL;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;

- NOINT means that the model does not include an intercept (so there is no need to specify reference groups).
- Use SLICE=week to test for overall differences between multiple groups at each time separately (like one-way ANOVA).
Output (analysis of response profiles)
First we get a summary of what data and methods proc mixed has used (some we have specified and others are SAS' defaults).

The Mixed Procedure

Model Information

Data Set                     WORK.CKD
Dependent Variable           aix
Covariance Structure         Unstructured
Subject Effect               id
Estimation Method            REML
Residual Variance Method     None
Fixed Effects SE Method      Kenward-Roger
Degrees of Freedom Method    Kenward-Roger

Class Level Information

Class   Levels   Values
id          51   1 2 3 4 5 6 7 8 9 10 11 12 13
                 14 15 16 17 18 19 20 21 22 23
                 24 25 26 28 29 30 31 32 33 34
                 35 36 37 38 39 40 41 42 43 45
                 46 47 48 49 51 52 53 54
week         3   12 24 0
group        2   1 0
Output (analysis of response profiles)
Dimensions
Covariance Parameters     6
Columns in X             12
Columns in Z              0
Subjects                 51
Max Obs Per Subject       3

This is a summary of the mathematical model specification, which is explained in lecture 4.

Number of Observations

Number of Observations Read        153
Number of Observations Used        144
Number of Observations Not Used      9

Note: Missing data due to drop-out and failed measurements.
Output (analysis of response profiles)
In contrast to ordinary linear models, linear mixed models in general have no explicit formulae for the maximum likelihood estimates. Therefore SAS uses numerical optimisation to compute estimates of the mean and covariance parameters.

Iteration History

Iteration   Evaluations   -2 Res Log Like    Criterion
        0             1     1070.85454941
        1             2      982.86560047   0.00144735
        2             1      982.26253864   0.00009905
        3             1      982.22468047   0.00000061
        4             1      982.22445749   0.00000000

Convergence criteria met.

Always check that the numerical optimisation has converged!
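Besides reading the log and the iteration history, the convergence status can be saved to a dataset and inspected. A sketch, assuming the ODS table name ConvergenceStatus and its Status column behave as in current PROC MIXED versions; the dataset name ckdconv is made up:

```sas
/* Save the convergence status of the response profile model */
ODS OUTPUT ConvergenceStatus=ckdconv;
PROC MIXED DATA=ckd;
  CLASS id week (ref='0') group (ref='0');
  MODEL aix = week group group*week / SOLUTION DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN;
RUN;

/* Assumption: Status = 0 means that the convergence criteria were met */
PROC PRINT DATA=ckdconv;
RUN;
```

This is mainly useful when many models are fitted in a loop and the output is not read model by model.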
Output (analysis of response profiles)
Options R and RCORR make SAS print the estimated covariance and correlation matrices.

Estimated R Matrix for id 1

Row      Col1      Col2      Col3
  1    106.23   96.3802   80.1893
  2   96.3802    159.64    106.48
  3   80.1893    106.48    106.38

Estimated R Correlation Matrix for id 1

Row     Col1     Col2     Col3
  1   1.0000   0.7401   0.7544
  2   0.7401   1.0000   0.8171
  3   0.7544   0.8171   1.0000
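The two matrices carry the same information: each correlation is the covariance divided by the product of the two standard deviations. A small check using the printed values for rows 1 and 2 (the variable names are made up for illustration):

```sas
DATA _NULL_;
  var1  = 106.23;    /* R matrix, row 1, col 1 */
  var2  = 159.64;    /* R matrix, row 2, col 2 */
  cov12 = 96.3802;   /* R matrix, row 1, col 2 */
  corr12 = cov12 / SQRT(var1 * var2);
  PUT corr12= 6.4;   /* written to the log */
RUN;
```

The log should show corr12=0.7401, in agreement with the entry in row 1, column 2 of the correlation matrix.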
Output (analysis of response profiles)
Fit statistics can be used for comparison of different models*.

Fit Statistics

-2 Res Log Likelihood         982.2
AIC (smaller is better)       994.2
AICC (smaller is better)      994.9
BIC (smaller is better)      1005.8

Null Model Likelihood Ratio Test

DF   Chi-Square   Pr > ChiSq
 5        88.63       <.0001

The null model likelihood ratio test compares the fitted covariance structure with a model assuming independent observations with equal variance. It is hardly ever of interest.

* Make sure to use the PROC MIXED METHOD=ML-option if you want to use this to test nested models for the mean-structure (lecture 2).
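A likelihood ratio test of nested mean structures can be put together from two ML fits. A sketch only, assuming that both models use the same observations and that the first row of the FitStatistics table holds -2 Log Likelihood; the dataset names fit_full, fit_red and lrt are made up:

```sas
/* Full model: separate mean profiles for the two groups, fitted with ML */
ODS OUTPUT FitStatistics=fit_full;
PROC MIXED DATA=ckd METHOD=ML;
  CLASS id week group;
  MODEL aix = week group group*week;
  REPEATED week / SUBJECT=id TYPE=UN;
RUN;

/* Reduced model: one common mean profile */
ODS OUTPUT FitStatistics=fit_red;
PROC MIXED DATA=ckd METHOD=ML;
  CLASS id week group;
  MODEL aix = week;
  REPEATED week / SUBJECT=id TYPE=UN;
RUN;

/* Difference in -2 log likelihood, compared to a chi-square distribution */
DATA lrt;
  MERGE fit_full (RENAME=(Value=m2ll_full))
        fit_red  (RENAME=(Value=m2ll_red));
  IF _N_ = 1;                        /* assumed: row 1 is -2 Log Likelihood */
  chisq = m2ll_red - m2ll_full;
  df    = 3;                         /* group (1) + group*week (2) */
  p     = 1 - PROBCHI(chisq, df);
  KEEP chisq df p;
RUN;

PROC PRINT DATA=lrt;
RUN;
```

Both fits use the same unstructured covariance, so only the mean structure differs between the two models.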
Output (analysis of response profiles)
At last, what is most interesting: estimates and tests.

Solution for Fixed Effects

Effect       week   group   Estimate   StdError     DF   t Value   Pr > |t|
Intercept                    24.3431     2.0793   49.4     11.71     <.0001
week           12             1.0887     1.7694   46.2      0.62     0.5414
week           24             3.0895     1.4995   44.5      2.06     0.0452
week            0                  0          .      .         .          .
group                   1    -2.0547     2.8999   48.9     -0.71     0.4820
group                   0          0          .      .         .          .
week*group     12       1    -1.9493     2.4871   45.8     -0.78     0.4372
week*group     12       0          0          .      .         .          .
week*group     24       1    -3.6078     2.1298   45.3     -1.69     0.0971
week*group     24       0          0          .      .         .          .
week*group      0       1          0          .      .         .          .
week*group      0       0          0          .      .         .          .

(confidence intervals omitted due to lack of space)

Type 3 Tests of Fixed Effects

Effect       Num DF   Den DF   F Value   Pr > F
week              2     44.5      0.99   0.3794
group             1       47      1.84   0.1817
week*group        2     44.5      1.43   0.2490
Output (analysis of response profiles)
Standardized (aka Studentized) residuals: Normal distribution?

(Other residuals and boxplots of residuals vs time and group omitted)
Which model should I choose?
Results from the cLMM and the ANCOVA model are usually very similar.

We recommend the cLMM.
- Programming and interpretation are easier.
- It is slightly better at handling missing data.

Exception:
- If randomization was performed conditionally on baseline measurements, then the ANCOVA is a valid model while the cLMM is not.
The constrained linear mixed model (cLMM)
To fit the constrained model:
1. Define a new treatment variable by joining groups at baseline.
2. Leave out the main term treat in the model statement.

DATA ckd;
  SET ckd;
  treat = group;
  IF week = 0 THEN treat = 0;
RUN;

PROC MIXED DATA=ckd;
  CLASS id week (ref='0') treat (ref='0');
  MODEL aix = week treat*week / SOLUTION CL DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;
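Before fitting the constrained model it can be worth checking that treat was coded as intended. A quick cross-tabulation, assuming the DATA step above has been run:

```sas
/* One line per observed combination of group, week and treat */
PROC FREQ DATA=ckd;
  TABLES group*week*treat / LIST MISSING;
RUN;
```

At week 0 every row should have treat = 0 in both groups. At weeks 12 and 24 treat should equal group.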
ANCOVA
To prepare for the analysis:

- Baseline must be included as a covariate in the data.
- Only follow-up times are used when running the analysis.
- For ease of interpretation and numerical stability we center the baseline variable around its mean.
- For ease of quantification we use change-since-baseline as outcome.

DATA followup;
  SET ckd;
  IF week > 0;
  baseline = aix0 - xxxx;   /* xxxx: the mean of aix0 */
  aixchange = aix - aix0;
RUN;
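Instead of typing the baseline mean by hand in place of xxxx, it can be computed and stored in a macro variable. A sketch, assuming that ckd_wide has one row per subject and that the mean over all subjects with a baseline measurement is the intended centering value; the macro variable name mean_aix0 is made up:

```sas
/* Store the mean baseline value in a macro variable */
PROC SQL NOPRINT;
  SELECT MEAN(aix0) INTO :mean_aix0 TRIMMED
  FROM ckd_wide;
QUIT;

DATA followup;
  SET ckd;
  IF week > 0;                           /* follow-up times only */
  baseline  = aix0 - (&mean_aix0);       /* centered baseline */
  aixchange = aix - aix0;                /* change since baseline */
RUN;
```

The macro variable holds a rounded text version of the mean, which is precise enough for centering. The parentheses guard against a negative mean.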
ANCOVA
To run the analysis with proc mixed:
- Include the baseline*time interaction in the model.
- Since the analysis is based on follow-up data, the most natural reference point for time is now the last follow-up.
- The treatment effect (at last follow-up) is estimated by the group-effect.

PROC MIXED DATA=followup;
  CLASS id week (ref='24') group (ref='0');
  MODEL aixchange = group week group*week baseline*week
        / SOLUTION CL DDFM=KR;
  REPEATED week / SUBJECT=id TYPE=UN R RCORR;
RUN;