Handling Missing Data in R with MICE
Stef van Buuren (1, 2)
(1) Methodology and Statistics, FSBS, Utrecht University
(2) Netherlands Organization for Applied Scientific Research TNO, Leiden
Winnipeg, June 11, 2017
SvB
Why this course?
Missing data are everywhere
Ad-hoc fixes often do not work
Multiple imputation is broadly applicable, yields correct statistical inferences, and there is good software
Goal of the course: get comfortable with a modern and powerful way of solving missing data problems
Course materials
https://github.com/stefvanbuuren/winnipeg
Reading materials
1 Van Buuren, S. and Groothuis-Oudshoorn, C.G.M. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. https://www.jstatsoft.org/article/view/v045i03
2 Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL. Chapters 1–6, 10. http://www.crcpress.com/product/isbn/9781439868249
Flexible Imputation of Missing Data (FIMD)
R software and examples
R: Install from https://cran.r-project.org
RStudio: Install from https://www.rstudio.com
R package mice 2.30 or higher: from CRAN or from https://github.com/stefvanbuuren/mice
More examples: http://www.multiple-imputation.com
Handling Missing Data in R with MICE > Time table
Time table (morning)
Time           Session  L/P  Description
09.00 - 09.15           L    Overview
09.15 - 10.00  I        L    Introduction to missing data
10.00 - 10.30  I        P    Ad hoc methods + MICE
10.30 - 10.45                PAUSE
10.45 - 11.30  II       L    Multiple imputation
11.30 - 12.00  II       P    Boys data
12.00 - 13.15                PAUSE
Time table (afternoon)
Time           Session  L/P  Description
13.15 - 14.00  III      L    Generating plausible imputations
14.00 - 14.30  III      P    Algorithmic convergence and pooling
14.30 - 14.45                PAUSE
14.45 - 15.15  IV       L    Imputation in practice
15.15 - 15.45  IV       P    Post-processing and passive imputation
15.45 - 16.00  V        L    Guidelines for reporting
Handling Missing Data in R with MICE > I >
SESSION I
Handling Missing Data in R with MICE > I > Problem of missing data
Why are missing data interesting?
Obviously the best way to treat missing data is not to have them. (Orchard and Woodbury, 1972)
Sooner or later (usually sooner), anyone who does statistical analysis runs into problems with missing data (Allison, 2002)
Missing data problems are the heart of statistics
Causes of missing data
Respondent skipped the item
Data transmission/coding error
Drop out in longitudinal research
Refusal to cooperate
Sample from population
Question not asked, different forms
Censoring
Consequences of missing data
Less information than planned
Enough statistical power?
Different analyses, different n's
Cannot calculate even the mean
Systematic biases in the analysis
Appropriate confidence interval, P-values?
In general, missing data can severely complicate interpretation and analysis.
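Two of these consequences are easy to see in R. The sketch below uses the `airquality` data frame that ships with base R, chosen here purely for illustration: the mean of an incomplete variable is `NA` until you decide what to do with the missing values, and each variable has its own number of observed values.

```r
# Built-in data set with missing values in Ozone and Solar.R
data(airquality)

# "Cannot calculate even the mean": the default propagates NA
mean_default <- mean(airquality$Ozone)

# Dropping the missing values is a choice, not a neutral default
mean_observed <- mean(airquality$Ozone, na.rm = TRUE)

# "Different analyses, different n's": observed count per variable
n_observed <- colSums(!is.na(airquality))

print(mean_default)
print(mean_observed)
print(n_observed)
```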
Handling Missing Data in R with MICE > I > Ad-hoc methods
Listwise deletion
Analyze only the complete records
Also known as Complete Case Analysis (CCA)
Advantages
Simple (default in most software)
Unbiased under MCAR
Correct standard errors, significance levels
Two special properties in regression
Disadvantages
Wasteful
Large standard errors
Biased under MAR, even for simple statistics like the mean
Inconsistencies in reporting
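A minimal sketch of listwise deletion in R, assuming the built-in `airquality` data and an arbitrary illustrative model. It shows how many records are discarded and that `lm()` silently applies listwise deletion under R's default `na.action` setting.

```r
data(airquality)

# Keep only the records that are complete on every variable
n_all <- nrow(airquality)
complete_cases <- na.omit(airquality)
n_complete <- nrow(complete_cases)

# lm() drops incomplete records by default (na.action = na.omit),
# so both fits use the same rows for this model
fit_default <- lm(Ozone ~ Solar.R + Wind, data = airquality)
fit_complete <- lm(Ozone ~ Solar.R + Wind, data = complete_cases)

print(c(all = n_all, complete = n_complete, used_by_lm = nobs(fit_default)))
```

The gap between `n_all` and `n_complete` is the "wasteful" part: every record with a single missing value is lost to the analysis.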
Mean imputation
Replace the missing values by the mean of the observed data
Advantages
Simple
Unbiased for the mean, under MCAR
[Figure, two panels: histogram of Ozone (ppb) with frequency on the vertical axis; Ozone (ppb) plotted against Solar Radiation (lang)]
Disadvantages
Disturbs the distribution
Underestimates the variance
Biases correlations to zero
Biased under MAR
AVOID (unless you know what you are doing)
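A short base-R sketch of mean imputation, assuming the built-in `airquality` data (the variable names `Ozone` and `Solar.R` come from that data set). It lets you compare the mean, the variance, and the correlation with solar radiation before and after imputation.

```r
data(airquality)
ozone <- airquality$Ozone
is_missing <- is.na(ozone)

# Replace every missing value by the mean of the observed values
ozone_imputed <- ozone
ozone_imputed[is_missing] <- mean(ozone, na.rm = TRUE)

# The mean is unchanged by construction
mean_observed <- mean(ozone, na.rm = TRUE)
mean_imputed <- mean(ozone_imputed)

# The variance shrinks: same sum of squares, larger n
var_observed <- var(ozone, na.rm = TRUE)
var_imputed <- var(ozone_imputed)

# Correlation with Solar.R before and after imputation
cor_observed <- cor(ozone, airquality$Solar.R, use = "complete.obs")
cor_imputed <- cor(ozone_imputed, airquality$Solar.R, use = "complete.obs")

print(c(var_observed = var_observed, var_imputed = var_imputed))
print(c(cor_observed = cor_observed, cor_imputed = cor_imputed))
```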
Regression imputation
Also known as prediction
Fit model for $Y_\mathrm{obs}$ under listwise deletion
Predict $Y_\mathrm{mis}$ for records with missing $Y$'s
Replace missing values by prediction
Advantages
Unbiased estimates of regression coefficients (under MAR)
Good approximation to the (unknown) true data if explained variance is high
Prediction is the favorite among non-statisticians
[Figure, two panels: histogram of Ozone (ppb) with frequency on the vertical axis; Ozone (ppb) plotted against Solar Radiation (lang)]
Disadvantages
Artificially increases correlations
Systematically underestimates the variance
Too optimistic P-values and too short confidence intervals
AVOID. Harmful to statistical inference.
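The inflated correlation can be seen in a few lines of base R. This is a sketch under assumptions: it uses the built-in `airquality` data and a single predictor, `Solar.R`, for simplicity. Because every imputed point lies exactly on the fitted line, the correlation in the completed data cannot be weaker than in the observed data.

```r
data(airquality)

# Fit the model on the complete records (lm() drops incomplete rows)
fit <- lm(Ozone ~ Solar.R, data = airquality)

# Records with missing Ozone but an observed predictor
to_impute <- is.na(airquality$Ozone) & !is.na(airquality$Solar.R)

# Replace the missing values by their predictions
completed <- airquality
completed$Ozone[to_impute] <- predict(fit, newdata = airquality[to_impute, ])

# Correlation before and after imputation
cor_before <- cor(airquality$Ozone, airquality$Solar.R, use = "complete.obs")
cor_after <- cor(completed$Ozone, completed$Solar.R, use = "complete.obs")

print(c(before = cor_before, after = cor_after))
```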
Stochastic regression imputation
Like regression imputation, but adds appropriate noise to the predictions to reflect uncertainty
Advantages
Preserves the distribution of $Y_\mathrm{obs}$
Preserves the correlation between $Y$ and $X$ in the imputed data
[Figure, two panels: histogram of Ozone (ppb) with frequency on the vertical axis; Ozone (ppb) plotted against Solar Radiation (lang)]
Disadvantages
Symmetric and constant error is restrictive
Single imputation does not take the uncertainty of the imputed data into account, and incorrectly treats them as real
Not so simple anymore
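A minimal base-R sketch of stochastic regression imputation, assuming the built-in `airquality` data, one predictor, and normally distributed noise with the residual standard deviation of the fitted model. The seed is arbitrary.

```r
set.seed(1)
data(airquality)

# Fit on the complete records
fit <- lm(Ozone ~ Solar.R, data = airquality)

# Records with missing Ozone but an observed predictor
to_impute <- is.na(airquality$Ozone) & !is.na(airquality$Solar.R)
prediction <- predict(fit, newdata = airquality[to_impute, ])

# Add symmetric noise with constant variance (the restrictive assumption)
noise <- rnorm(sum(to_impute), mean = 0, sd = sigma(fit))

completed <- airquality
completed$Ozone[to_impute] <- prediction + noise

# Symmetric noise can produce implausible values, e.g. negative ozone
n_negative <- sum(completed$Ozone < 0, na.rm = TRUE)
print(n_negative)
```

Running this once gives a single completed data set, so the imputed values are still treated as if they were real observations.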
Single imputation methods, wrapup
Underestimate uncertainty caused by the missing data
Unbiased only under restrictive assumptions
Alternatives
Maximum Likelihood, Direct Likelihood
Weighting
Multiple Imputation
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data. Second Edition. John Wiley & Sons, New York.
Handling Missing Data in R with MICE > II >
SESSION II
Handling Missing Data in R with MICE > II > What is multiple imputation
Rising popularity of multiple imputation
[Figure: number of publications (log scale) by year, 1975 to 2010, with three series: early publications, 'multiple imputation' in abstract, 'multiple imputation' in title]
Main steps used in multiple imputation
[Diagram: Incomplete data -> Imputed data -> Analysis results -> Pooled results]
Steps in mice
incomplete data (data frame) -> mice() -> imputed data (mids) -> with() -> analysis results (mira) -> pool() -> pooled results (mipo)
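A minimal sketch of these three steps, assuming the `mice` package is installed and using its bundled `nhanes` example data. The analysis model `chl ~ bmi + age`, the number of imputations, and the seed are arbitrary choices for illustration.

```r
library(mice)

# Step 1: data frame -> mids (multiply imputed data set)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 2: mids -> mira (the same analysis repeated on each imputed data set)
fit <- with(imp, lm(chl ~ bmi + age))

# Step 3: mira -> mipo (results pooled across imputations)
est <- pool(fit)

print(summary(est))
```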
Handling Missing Data in R with MICE > II > Goal
Estimand
Q is a quantity of scientific interest in the population.
Q can be a vector of population means, population regression weights, population variances, and so on.
Q may not depend on the particular sample, thus Q cannot be a standard error, sample mean, p-value, and so on.
Goal of multiple imputation
Estimate $Q$ by $\hat{Q}$ or $\bar{Q}$, accompanied by a valid estimate of its uncertainty.
What is the difference between $\hat{Q}$ and $\bar{Q}$?
$\hat{Q}$ and $\bar{Q}$ both estimate $Q$
$\hat{Q}$ accounts for the sampling uncertainty
$\bar{Q}$ accounts for the sampling and missing data uncertainty
Handling Missing Data in R with MICE > II > Multiple imputation theory
Pooled estimate $\bar{Q}$
$\hat{Q}_\ell$ is the estimate of the $\ell$-th repeated imputation
$\hat{Q}_\ell$ contains $k$ parameters and is represented as a $k \times 1$ column vector
The pooled estimate $\bar{Q}$ is simply the average
$$\bar{Q} = \frac{1}{m} \sum_{\ell=1}^{m} \hat{Q}_\ell \tag{1}$$
Within-imputation variance
The average of the complete-data variances is
$$\bar{U} = \frac{1}{m} \sum_{\ell=1}^{m} \bar{U}_\ell, \tag{2}$$
where $\bar{U}_\ell$ is the variance-covariance matrix of $\hat{Q}_\ell$ obtained for the $\ell$-th imputation
$\bar{U}_\ell$ is the variance of the estimate, not the variance in the data
The within-imputation variance is large if the sample is small
Between-imputation variance
The variance between the $m$ complete-data estimates is given by
$$B = \frac{1}{m-1} \sum_{\ell=1}^{m} (\hat{Q}_\ell - \bar{Q})(\hat{Q}_\ell - \bar{Q})', \tag{3}$$
where $\bar{Q}$ is the pooled estimate (cf. equation 1)
The between-imputation variance is large if there are many missing data
Total variance
The total variance is not simply $T = \bar{U} + B$
The correct formula is
$$T = \bar{U} + B + B/m = \bar{U} + \left(1 + \frac{1}{m}\right) B \tag{4}$$
for the total variance of $\bar{Q}$, and hence of $(Q - \bar{Q})$ if $\bar{Q}$ is unbiased
The term $B/m$ is the simulation error
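Equations (1) to (4) can be checked by hand for a scalar parameter. The sketch below uses made-up numbers: `Q_hat` and `U_hat` are hypothetical estimates and squared standard errors from m = 5 imputed data sets, not results from any real analysis.

```r
# Hypothetical complete-data results from m = 5 imputed data sets
Q_hat <- c(10.2, 9.8, 10.5, 10.1, 9.9)    # estimates
U_hat <- c(0.50, 0.48, 0.52, 0.49, 0.51)  # squared standard errors

m <- length(Q_hat)

Q_bar <- mean(Q_hat)                    # eq. (1): pooled estimate
U_bar <- mean(U_hat)                    # eq. (2): within-imputation variance
B <- sum((Q_hat - Q_bar)^2) / (m - 1)   # eq. (3): between-imputation variance
T_var <- U_bar + (1 + 1 / m) * B        # eq. (4): total variance

print(c(Q_bar = Q_bar, U_bar = U_bar, B = B, T = T_var))
```

For these numbers the pooled estimate is 10.1, with U-bar = 0.5, B = 0.075 and T = 0.59; leaving out the B/m term would give 0.575 instead.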
Three sources of variation
In summary, the total variance T stems from three sources:
1 $\bar{U}$, the variance caused by the fact that we are taking a sample rather than the entire population. This is the conventional statistical measure of variability;
2 $B$, the extra variance caused by the fact that there are missing values in the sample;
3 $B/m$, the extra simulation variance caused by the fact that $\bar{Q}$ itself is based on finite $m$.
Variance ratios (1)
Proportion of the variation attributable to the missing data
$$\lambda = \frac{B + B/m}{T} \tag{5}$$
Relative increase in variance due to nonresponse
$$r = \frac{B + B/m}{\bar{U}} \tag{6}$$
These are related by $r = \lambda/(1-\lambda)$.
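A quick numerical check of (5), (6) and the relation between them. The values of `m`, `U_bar` and `B` are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Hypothetical variance components
m <- 5
U_bar <- 0.5    # within-imputation variance
B <- 0.075      # between-imputation variance
T_var <- U_bar + (1 + 1 / m) * B   # total variance

lambda <- (B + B / m) / T_var   # eq. (5): share of variance due to missing data
r <- (B + B / m) / U_bar        # eq. (6): relative increase in variance

print(c(lambda = lambda, r = r, check = lambda / (1 - lambda)))
```

The last two printed values agree, which is the relation stated on the slide.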
Variance ratios (2)
Fraction of information about $Q$ missing due to nonresponse
$$\gamma = \frac{r + 2/(\nu + 3)}{1 + r} \tag{7}$$
This measure needs an estimate of the degrees of freedom $\nu$.
Relation between $\gamma$ and $\lambda$
$$\gamma = \frac{\nu + 1}{\nu + 3}\lambda + \frac{2}{\nu + 3}. \tag{8}$$
The literature often confuses $\gamma$ and $\lambda$.
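The step from (7) to (8) takes a few lines of algebra. A sketch, using only the relation between r and lambda stated under (5) and (6), which implies that lambda = r/(1 + r) and 1 - lambda = 1/(1 + r):

```latex
\begin{aligned}
\gamma &= \frac{r + 2/(\nu+3)}{1+r}
        = \frac{r}{1+r} + \frac{2}{\nu+3}\cdot\frac{1}{1+r} \\
       &= \lambda + \frac{2}{\nu+3}\,(1-\lambda) \\
       &= \left(1 - \frac{2}{\nu+3}\right)\lambda + \frac{2}{\nu+3}
        = \frac{\nu+1}{\nu+3}\,\lambda + \frac{2}{\nu+3}.
\end{aligned}
```

For large degrees of freedom the two measures nearly coincide; they differ noticeably only when the degrees of freedom are small.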
Handling Missing Data in R with MICE > II > Statistical inference
Statistical inference for Q (1)
The $100(1-\alpha)\%$ confidence interval for $Q$ is calculated as
$$\bar{Q} \pm t_{(\nu, 1-\alpha/2)} \sqrt{T}, \tag{9}$$
where $t_{(\nu, 1-\alpha/2)}$ is the quantile corresponding to probability $1-\alpha/2$ of $t_\nu$.
For example, use $t_{(10, 0.975)} = 2.23$ for the 95% confidence interval for $\nu = 10$.
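In R the quantile in (9) comes from `qt()`. The sketch below reproduces the value for 10 degrees of freedom and builds an interval from a hypothetical pooled estimate and total variance (the numbers 10.1 and 0.59 are made up for illustration).

```r
# Hypothetical pooled estimate, total variance and degrees of freedom
Q_bar <- 10.1
T_var <- 0.59
nu <- 10
alpha <- 0.05

# Quantile of the t distribution with nu degrees of freedom
t_crit <- qt(1 - alpha / 2, df = nu)

# eq. (9): pooled estimate plus or minus t times the pooled standard error
ci <- Q_bar + c(-1, 1) * t_crit * sqrt(T_var)

print(round(t_crit, 2))
print(ci)
```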
Statistical inference for Q (2)
Suppose we test the null hypothesis $Q = Q_0$ for some specified value $Q_0$. We can find the p-value of the test as the probability
$$P_s = \Pr\left[ F_{1,\nu} > \frac{(Q_0 - \bar{Q})^2}{T} \right] \tag{10}$$
where $F_{1,\nu}$ is an $F$ distribution with 1 and $\nu$ degrees of freedom.
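A small sketch of (10) in R, with hypothetical values for the pooled estimate, total variance, degrees of freedom and null value. Since an F distribution with 1 and nu degrees of freedom is the square of a t distribution with nu degrees of freedom, the same p-value follows from a two-sided t-test, which the code uses as a cross-check.

```r
# Hypothetical pooled results and null value
Q_bar <- 10.1
T_var <- 0.59
nu <- 10
Q0 <- 9

# eq. (10): upper tail of F(1, nu) beyond the test statistic
statistic <- (Q0 - Q_bar)^2 / T_var
p_value <- pf(statistic, df1 = 1, df2 = nu, lower.tail = FALSE)

# Equivalent two-sided t-test on the same quantities
p_value_t <- 2 * pt(abs(Q0 - Q_bar) / sqrt(T_var), df = nu, lower.tail = FALSE)

print(c(F_test = p_value, t_test = p_value_t))
```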
Degrees of freedom (1)
With missing data, $n$ is effectively lower. Thus, the degrees of freedom in statistical tests need to be adjusted.
The 'old' formula assumes $n = \infty$:
$$\nu_{\mathrm{old}} = (m - 1)\left(1 + \frac{1}{r}\right)^2 = \frac{m - 1}{\lambda^2} \tag{11}$$
Degrees of freedom (2)
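The new formula is
$$\nu = \frac{\nu_{\mathrm{old}}\,\nu_{\mathrm{obs}}}{\nu_{\mathrm{old}} + \nu_{\mathrm{obs}}}, \tag{12}$$
where the estimated observed-data degrees of freedom that accounts for the missing information is
$$\nu_{\mathrm{obs}} = \frac{\nu_{\mathrm{com}} + 1}{\nu_{\mathrm{com}} + 3}\,\nu_{\mathrm{com}}(1 - \lambda), \tag{13}$$
with $\nu_{\mathrm{com}} = n - k$.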
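The degrees-of-freedom formulas translate directly into a small function. This is a sketch: the function name `mi_df` and the example inputs are hypothetical, and the old degrees of freedom are computed as (m - 1)/lambda^2.

```r
# Adjusted degrees of freedom for pooled inference
# m: number of imputations, lambda: proportion of variance due to missing data,
# n: sample size, k: number of parameters
mi_df <- function(m, lambda, n, k) {
  nu_old <- (m - 1) / lambda^2                                   # eq. (11)
  nu_com <- n - k                                                # complete-data df
  nu_obs <- (nu_com + 1) / (nu_com + 3) * nu_com * (1 - lambda)  # eq. (13)
  nu <- nu_old * nu_obs / (nu_old + nu_obs)                      # eq. (12)
  c(nu_old = nu_old, nu_obs = nu_obs, nu = nu)
}

# Hypothetical example: 5 imputations, lambda = 0.15, n = 100, k = 2
df_example <- mi_df(m = 5, lambda = 0.15, n = 100, k = 2)
print(df_example)
```

The combined value is always smaller than both of its components, so it can never exceed the complete-data degrees of freedom n - k.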
Handling Missing Data in R with MICE > II > How many imputations?
How large should m be?
Classic advice: m = 3, 5, 10. More recently: set m higher: 20–100.
Some advice
1 Use m = 5 or m = 10 if the fraction of missing information is low, $\gamma < 0.2$.
2 Develop your model with m = 5. Do final run with m equal to percentage of incomplete cases.
3 Repeat the analysis with m = 5 with different seeds. If there are large differences for some parameters, this means that the data contain little information about them.
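The second piece of advice is easy to turn into a number. A sketch in base R, assuming the built-in `airquality` data as the incomplete data set; rounding up and the floor of 5 imputations are choices made here, not part of the advice itself.

```r
data(airquality)

# Percentage of records with at least one missing value
pct_incomplete <- 100 * mean(!complete.cases(airquality))

# Number of imputations for the final run (rounded up, at least 5)
m_final <- max(5, ceiling(pct_incomplete))

print(c(pct_incomplete = pct_incomplete, m_final = m_final))
```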
The legacy
Introductions to multiple imputation
1 Schafer, J.L. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8(1), 3–15.
2 Sterne et al. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ, 338, b2393.
3 Van Buuren, S. (2012). Flexible Imputation of Missing Data. Chapman & Hall/CRC, Boca Raton, FL.
Handling Missing Data in R with MICE > III >
SESSION III
Handling Missing Data in R with MICE > III > Creating imputations, univariate
Relation between temperature and gas consumption
[Figure: gas consumption (cubic feet) plotted against temperature (°C)]
We delete gas consumption of observation 47
[Figure, panel a: gas consumption (cubic feet) plotted against temperature (°C), with the deleted observation marked]
Predict imputed value from regression line
[Figure, panel b: gas consumption (cubic feet) plotted against temperature (°C)]
Predicted value + noise
[Figure, panel c: gas consumption (cubic feet) plotted against temperature (°C)]
Predicted value + noise + parameter uncertainty
[Figure, panel d: gas consumption (cubic feet) plotted against temperature (°C)]
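The three ways of generating an imputation shown so far (prediction, prediction plus noise, prediction plus noise plus parameter uncertainty) can be sketched in R. Assumptions, labeled loudly: the `whiteside` data from the MASS package is taken here as the source of the gas consumption example, the model has temperature as its only predictor, and the parameter draw follows the standard Bayesian normal linear model with a noninformative prior. This is an illustration, not necessarily the code behind the slides.

```r
library(MASS)  # provides the whiteside data and mvrnorm()
set.seed(1)

# Delete gas consumption of observation 47
dat <- whiteside
dat$Gas[47] <- NA
temp_47 <- dat$Temp[47]

# Fit on the remaining records (lm() drops the incomplete row)
fit <- lm(Gas ~ Temp, data = dat)
beta_hat <- coef(fit)
sigma_hat <- sigma(fit)
df_res <- df.residual(fit)

# (b) Predict the imputed value from the regression line
imp_b <- unname(beta_hat[1] + beta_hat[2] * temp_47)

# (c) Predicted value + noise
imp_c <- imp_b + rnorm(1, mean = 0, sd = sigma_hat)

# (d) Predicted value + noise + parameter uncertainty:
# draw sigma, then the coefficients, then the imputation
sigma_star <- sigma_hat * sqrt(df_res / rchisq(1, df = df_res))
beta_star <- mvrnorm(1, mu = beta_hat,
                     Sigma = sigma_star^2 * summary(fit)$cov.unscaled)
imp_d <- unname(beta_star[1] + beta_star[2] * temp_47) +
  rnorm(1, mean = 0, sd = sigma_star)

print(c(prediction = imp_b, with_noise = imp_c, with_parameter_draw = imp_d))
```

Repeating step (d) m times, each time with fresh draws, gives m different imputations for the same missing value.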
Imputation based on two predictors
[Figure, panel e: gas consumption (cubic feet) plotted against temperature (°C); legend: before insulation, after insulation]
Predictive mean matching: Y given X
[Figure: gas consumption (cubic feet) plotted against temperature (°C); legend: before insulation, after insulation]
Add two regression lines
[Figure: gas consumption (cubic feet) plotted against temperature (°C); legend: before insulation, after insulation]
Predicted given 5 °C, 'after insulation'
[Figure: gas consumption (cubic feet) plotted against temperature (°C); legend: before insulation, after insulation]
Define a matching range $\hat{y} \pm \delta$
[Figure: gas consumption (cubic feet) plotted against temperature (°C); legend: before insulation, after insulation]
SvB
![Page 57: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/57.jpg)
Select potential donors
[Figure: gas consumption (cubic feet) against temperature (°C); legend: before insulation, after insulation.]
![Page 58: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/58.jpg)
Bayesian PMM: Draw a line
[Figure: gas consumption (cubic feet) against temperature (°C); legend: before insulation, after insulation.]
![Page 59: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/59.jpg)
Define a matching range y ± δ
[Figure: gas consumption (cubic feet) against temperature (°C); legend: before insulation, after insulation.]
![Page 60: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/60.jpg)
Select potential donors
[Figure: gas consumption (cubic feet) against temperature (°C); legend: before insulation, after insulation.]
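The predictive mean matching steps on the preceding slides (fit a line, draw a line, match, select donors) can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by mice: the data are simulated to resemble the gas consumption example, the function name `pmm_sketch` and its `donors` argument are hypothetical, and donors are chosen as the `donors` nearest predicted values rather than by a fixed range y ± δ.

```r
# Bayesian predictive mean matching for one incomplete variable (sketch).
set.seed(123)

# Simulated stand-in for the slide data: gas use vs. temperature,
# before and after insulation. These are NOT the real measurements.
n     <- 60
temp  <- runif(n, -2, 10)
insul <- rep(c(0, 1), each = n / 2)            # 0 = before, 1 = after
gas   <- 6.8 - 0.4 * temp - 1.6 * insul + rnorm(n, sd = 0.3)
gas[sample(n, 10)] <- NA                       # make 10 values missing

pmm_sketch <- function(y, X, donors = 5) {
  obs <- !is.na(y)
  Xd  <- cbind(1, X)                           # design matrix with intercept
  Xo  <- Xd[obs, , drop = FALSE]
  yo  <- y[obs]

  # 1. Least-squares fit on the observed cases
  XtXi <- solve(crossprod(Xo))
  bhat <- XtXi %*% crossprod(Xo, yo)
  res  <- yo - Xo %*% bhat
  df   <- sum(obs) - ncol(Xd)

  # 2. "Draw a line": sample sigma and beta from their approximate posterior
  sigma_star <- sqrt(sum(res^2) / rchisq(1, df))
  beta_star  <- bhat + sigma_star * t(chol(XtXi)) %*% rnorm(ncol(Xd))

  # 3. Predicted means: observed cases use bhat, missing cases use beta_star
  pred_obs <- drop(Xo %*% bhat)
  pred_mis <- drop(Xd[!obs, , drop = FALSE] %*% beta_star)

  # 4. For each missing case, pick one of the closest observed cases at
  #    random and borrow its observed value
  imputed <- vapply(pred_mis, function(p) {
    closest <- order(abs(pred_obs - p))[seq_len(donors)]
    yo[closest[sample.int(donors, 1)]]
  }, numeric(1))

  y[!obs] <- imputed
  y
}

gas_imp <- pmm_sketch(gas, cbind(temp, insul))
```

Because every imputed value is borrowed from an observed case, the imputations can never fall outside the range of the observed data.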
![Page 61: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/61.jpg)
Imputation of a binary variable
logistic regression
$$\Pr(y_i = 1 \mid X_i, \beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)} \qquad (14)$$
![Page 62: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/62.jpg)
Fit logistic model
[Figure: probability (0 to 1) against the linear predictor (−3 to 3).]
![Page 63: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/63.jpg)
Draw parameter estimate
[Figure: probability (0 to 1) against the linear predictor (−3 to 3).]
![Page 64: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/64.jpg)
Read off the probability
[Figure: probability (0 to 1) against the linear predictor (−3 to 3); plot annotations: 1, 2.]
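The three steps for a binary variable (fit the logistic model, draw a parameter estimate, read off the probability) can be sketched in base R. This is an illustration under assumptions: the data are simulated, `logreg_sketch` is a hypothetical name, and the parameter draw uses a normal approximation around the maximum likelihood estimate.

```r
# Logistic regression imputation for a binary variable (sketch).
set.seed(2017)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))
y[sample(n, 40)] <- NA                         # make 40 values missing

logreg_sketch <- function(y, x) {
  obs <- !is.na(y)

  # 1. Fit the logistic model on the observed cases
  fit <- glm(y ~ x, family = binomial, subset = obs)

  # 2. Draw a parameter vector from a normal approximation to the posterior
  V <- vcov(fit)
  beta_star <- coef(fit) + drop(t(chol(V)) %*% rnorm(length(coef(fit))))

  # 3. Read off the probability Pr(y = 1) for each missing case, eq. (14)
  p <- plogis(beta_star[1] + beta_star[2] * x[!obs])

  # 4. Impute 1 when a uniform draw falls below that probability, else 0
  y[!obs] <- as.integer(runif(sum(!obs)) <= p)
  y
}

y_imp <- logreg_sketch(y, x)
table(observed = !is.na(y), value = y_imp)
```

Drawing the parameters before predicting is what makes the imputations reflect uncertainty about the model, not only the Bernoulli noise.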
![Page 65: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/65.jpg)
Impute ordered categorical variable
K ordered categories k = 1, …, K
ordered logit model, or
proportional odds model
$$\Pr(y_i = k \mid X_i, \beta) = \frac{\exp(\tau_k + X_i\beta)}{\sum_{k=1}^{K} \exp(\tau_k + X_i\beta)} \qquad (15)$$
![Page 66: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/66.jpg)
Fit ordered logit model
[Figure: probability (0 to 1) against the linear predictor (−3 to 3).]
![Page 67: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/67.jpg)
Read off the probability
[Figure: probability (0 to 1) against the linear predictor (−3 to 3); plot annotations: 1, 2, 3.]
![Page 68: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/68.jpg)
Other types of variables
Count data
Semi-continuous data
Censored data
Truncated data
Rounded data
![Page 69: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/69.jpg)
Univariate imputation in mice

| Method | Description | Scale type |
|---|---|---|
| pmm | Predictive mean matching | numeric* |
| norm | Bayesian linear regression | numeric |
| norm.nob | Linear regression, non-Bayesian | numeric |
| norm.boot | Linear regression with bootstrap | numeric |
| mean | Unconditional mean imputation | numeric |
| 2L.norm | Two-level linear model | numeric |
| logreg | Logistic regression | factor, 2 levels* |
| logreg.boot | Logistic regression with bootstrap | factor, 2 levels |
| polyreg | Multinomial logit model | factor, > 2 levels* |
| polr | Ordered logit model | ordered, > 2 levels* |
| lda | Linear discriminant analysis | factor |
| sample | Simple random sample | any |

\* = default method for that scale type
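As a hedged illustration of how the methods in the table are selected per column, the sketch below uses the `nhanes2` data that ships with mice. A dry run with `maxit = 0` returns the default method vector, which can then be edited before the real run. The seed and the choice to switch `bmi` to `norm` are arbitrary assumptions.

```r
library(mice)

# nhanes2: age (factor, complete), bmi (numeric), hyp (2-level factor), chl (numeric)
ini  <- mice(nhanes2, maxit = 0)      # dry run: set up defaults, do not iterate
meth <- ini$method
print(meth)                           # defaults chosen from each column's scale type

# Override one entry: Bayesian linear regression instead of pmm for bmi
meth["bmi"] <- "norm"

imp <- mice(nhanes2, method = meth, m = 5, seed = 123, printFlag = FALSE)
head(complete(imp, 1))                # first completed data set
```

Columns without missing values get an empty method string and are left untouched.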
![Page 70: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/70.jpg)
Handling Missing Data in R with MICE > III > Creating imputations, multivariate
Problems in multivariate imputation
Predictors themselves can be incomplete
Mixed measurement levels
Order of imputation can be meaningful
Too many predictor variables
Relations could be nonlinear
Higher order interactions
Impossible combinations
![Page 71: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/71.jpg)
Three general strategies
Monotone data imputation
Joint modeling
Fully conditional specification (FCS)
![Page 72: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/72.jpg)
Imputation of monotone pattern
[Figure: monotone missing data pattern over the columns X1, Y1, Y2, Y3, Y4.]
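A monotone pattern can be imputed in a single pass, one column at a time, each from the columns to its left. The sketch below is an assumption-laden illustration: the data are simulated with a drop-out pattern, and the predictor matrix is made lower triangular by hand so that each variable is predicted only from more complete ones.

```r
library(mice)

set.seed(1)
n  <- 100
x1 <- rnorm(n)
y1 <- x1 + rnorm(n)
y2 <- y1 + rnorm(n)
y3 <- y2 + rnorm(n)
dat <- data.frame(x1, y1, y2, y3)

# Monotone drop-out: once a variable is missing, all later ones are missing too
dat$y1[81:100] <- NA
dat$y2[61:100] <- NA
dat$y3[41:100] <- NA

md.pattern(dat)                       # display the missing data pattern

# Use only the columns to the left of each target as predictors
ini  <- mice(dat, maxit = 0)
pred <- ini$predictorMatrix
pred[upper.tri(pred)] <- 0

# Visit columns in order of increasing missingness; one iteration suffices
imp <- mice(dat, predictorMatrix = pred, visitSequence = "monotone",
            maxit = 1, m = 5, seed = 1, printFlag = FALSE)
```

With a truly monotone pattern there is nothing to iterate over, which is why `maxit = 1` is used here.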
![Page 75: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/75.jpg)
Joint Modeling (JM)
1 Specify joint model P(Y, X, R)
2 Derive P(Ymis | Yobs, X, R)
3 Use MCMC techniques to draw imputations Ymis
![Page 76: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/76.jpg)
Joint modeling: Software
R/S Plus: norm, cat, mix, pan, Amelia
SAS: proc MI, proc MIANALYZE
STATA: MI command
Stand-alone: Amelia, solas, norm, pan
![Page 77: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/77.jpg)
Joint Modeling: Pros
Yields correct statistical inference under the assumed JM
Efficient parametrization (if the model fits)
Known theoretical properties
Works very well for parameters close to the center
Many applications
![Page 78: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/78.jpg)
Joint Modeling: Cons
Lack of flexibility
May lead to large models
Can assume more than the complete data problem
Can impute impossible data
![Page 79: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/79.jpg)
Fully Conditional Specification (FCS)
1 Specify P(Ymis | Yobs, X, R)
2 Use MCMC techniques to draw imputations Ymis
![Page 80: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/80.jpg)
Multivariate Imputation by Chained Equations (MICE)
MICE algorithm
Specify imputation model for each incomplete column
Fill in starting imputations
And iterate
Model: Fully Conditional Specification (FCS)
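The three steps of the MICE algorithm (one model per incomplete column, starting imputations, iterate) can be sketched as a loop in base R. This is a deliberately simplified illustration, not the mice implementation: the data are simulated, the names `impute_one` and `fcs_sketch` are hypothetical, and each univariate step is plain stochastic regression without a draw of the regression parameters.

```r
set.seed(42)
n  <- 200
x  <- rnorm(n)
y1 <- 1 + 0.8 * x + rnorm(n)
y2 <- -1 + 0.5 * x + 0.7 * y1 + rnorm(n)
dat <- data.frame(x, y1, y2)
dat$y1[sample(n, 50)] <- NA
dat$y2[sample(n, 50)] <- NA

# Univariate step: regress the target on all other columns using the rows
# where the target was observed, then predict and add residual noise.
impute_one <- function(data, target, miss) {
  fit <- lm(reformulate(setdiff(names(data), target), target),
            data = data[!miss, ])
  mu  <- predict(fit, newdata = data[miss, ])
  data[miss, target] <- mu + rnorm(sum(miss), sd = summary(fit)$sigma)
  data
}

fcs_sketch <- function(data, maxit = 10) {
  miss <- is.na(data)                 # fixed record of which cells were missing

  # Starting imputations: random draws from the observed values
  for (v in names(data)) {
    m <- miss[, v]
    if (any(m)) data[m, v] <- sample(data[!m, v], sum(m), replace = TRUE)
  }

  # Iterate: re-impute each incomplete column given the current others
  for (it in seq_len(maxit)) {
    for (v in names(data)) {
      if (any(miss[, v])) data <- impute_one(data, v, miss[, v])
    }
  }
  data
}

completed <- fcs_sketch(dat)          # one chain gives one completed data set
```

Running the function m times with different seeds would give m completed data sets, which is the role of the `m` argument in `mice()`.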
![Page 81: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/81.jpg)
Fully Conditional Specification: Cons
Theoretical properties only known in special cases
Cannot use computational shortcuts, like sweep-operator
Joint distribution may not exist (incompatibility)
![Page 82: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/82.jpg)
Fully Conditional Specification: Pros
Easy and flexible
Imputes close to the data, prevents impossible data
Subset selection of predictors
Modular, can preserve valuable work
Works well, both in simulations and practice
![Page 83: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/83.jpg)
Fully Conditional Specification (FCS): Software
R: mice, transcan, mi, VIM, baboon
SPSS: V17 procedure multiple imputation
SAS: IVEware, SAS 9.3
STATA: ice command, multiple imputation command
Stand-alone: Solas, Mplus
![Page 84: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/84.jpg)
How many iterations?
Quick convergence
5–10 iterations are adequate for most problems
More iterations if λ is high
Inspect the generated imputations
Monitor convergence to detect anomalies
![Page 85: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/85.jpg)
Non-convergence
[Figure: trace plots against iteration (up to 20) in six panels: mean hgt, sd hgt, mean wgt, sd wgt, mean bmi, sd bmi.]
![Page 86: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/86.jpg)
Convergence
[Figure: trace plots against iteration (up to 20) in six panels: mean hgt, sd hgt, mean wgt, sd wgt, mean bmi, sd bmi.]
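Trace plots like the ones on the two preceding slides can be produced from any `mids` object. The sketch below uses the small `nhanes` data from mice as a stand-in (the slides use height, weight and bmi from a larger data set); the seed and iteration counts are arbitrary assumptions.

```r
library(mice)

# Run 20 iterations so that the chains can be inspected
imp <- mice(nhanes, m = 5, maxit = 20, seed = 123, printFlag = FALSE)

# Trace lines of the mean and sd of the imputations, one line per chain
plot(imp)

# The plotted numbers are stored in the object:
# second dimension = iterations, third dimension = imputations
dim(imp$chainMean)

# If the lines still show a trend, continue the same chains for 10 more iterations
imp2 <- mice.mids(imp, maxit = 10, printFlag = FALSE)
```

Healthy chains intermingle freely and show no trend; lines that drift together in one direction suggest the kind of problem shown on the non-convergence slide.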
![Page 87: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/87.jpg)
SESSION IV
![Page 88: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/88.jpg)
Handling Missing Data in R with MICE > IV > Modeling choices
Imputation model choices
1 MAR or MNAR
2 Form of the imputation model
3 Which predictors
4 Derived variables
5 What is m?
6 Order of imputation
7 Diagnostics, convergence
![Page 89: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/89.jpg)
Handling Missing Data in R with MICE > IV > Which predictors
Which predictors?
1 Include all variables that appear in the complete-data model
2 In addition, include the variables that are related to the nonresponse
3 In addition, include variables that explain a considerable amount of variance
4 Remove from the variables selected in steps 2 and 3 those variables that have too many missing values within the subgroup of incomplete cases.
Functions quickpred() and flux()
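A short, hedged example of the two helper functions named above, using the `nhanes` data from mice. The `mincor = 0.3` threshold and the seed are arbitrary choices for illustration.

```r
library(mice)

# quickpred(): build a predictor matrix, keeping predictors whose correlation
# with the target (or with its missingness indicator) is at least mincor
pred <- quickpred(nhanes, mincor = 0.3)
pred                                   # rows = targets, columns = predictors

# flux(): influx and outflux per variable, to judge how useful each variable
# is as a predictor given where its missing values are
flux(nhanes)

# Use the selected predictors in the imputation run
imp <- mice(nhanes, predictorMatrix = pred, seed = 123, printFlag = FALSE)
```

Variables can also be forced in or out with the `include` and `exclude` arguments of `quickpred()`, for example to guarantee that all variables in the complete-data model are present.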
![Page 90: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/90.jpg)
Handling Missing Data in R with MICE > IV > Derived variables
Derived variables
ratio of two variables
sum score
index variable
quadratic relations
interaction term
conditional imputation
compositions
![Page 91: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/91.jpg)
How to impute a ratio?
weight/height ratio: whr = wgt/hgt, in kg/m.
Easy if only one of wgt, hgt or whr is missing
Methods
POST: Impute wgt and hgt, and calculate whr after imputation
JAV: Impute whr as ‘just another variable’
PASSIVE1: Impute wgt and hgt, and calculate whr during imputation
PASSIVE2: As PASSIVE1 with adapted predictor matrix
![Page 92: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/92.jpg)
Method POST
> library(mice)
> imp1 <- mice(boys)
> long <- complete(imp1, "long", include = TRUE)
> # hgt is in cm, so hgt/100 is in metres and whr is in kg/m
> long$whr <- with(long, wgt/(hgt/100))
> imp2 <- as.mids(long)  # as.mids() replaces long2mids(), deprecated in newer mice
![Page 93: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/93.jpg)
Method JAV: Just another variable
> boys$whr <- boys$wgt/(boys$hgt/100)
> imp.jav <- mice(boys, m = 1, seed = 32093, maxit = 10)
![Page 94: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/94.jpg)
Method JAV
[Figure: weight/height ratio (kg/m) against height (cm) in three panels: JAV, passive, passive 2.]
![Page 95: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/95.jpg)
Method PASSIVE
> meth <- mice(boys, maxit = 0)$method  # default methods; boys must already contain whr
> meth["whr"] <- "~I(wgt/(hgt/100))"
![Page 96: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/96.jpg)
Method PASSIVE, predictor matrix
age hgt wgt bmi hc gen phb tv reg whr
age 0 0 0 0 0 0 0 0 0 0
hgt 1 0 1 0 1 1 1 1 1 0
wgt 1 1 0 0 1 1 1 1 1 0
bmi 1 1 1 0 1 1 1 1 1 0
hc 1 1 1 1 0 1 1 1 1 1
gen 1 1 1 1 1 0 1 1 1 1
phb 1 1 1 1 1 1 0 1 1 1
tv 1 1 1 1 1 1 1 0 1 1
reg 1 1 1 1 1 1 1 1 0 1
whr 1 1 1 0 1 1 1 1 1 0
![Page 97: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/97.jpg)
Method PASSIVE
[Figure: weight/height ratio (kg/m) against height (cm) in three panels: JAV, passive, passive 2.]
![Page 98: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/98.jpg)
Method PASSIVE2
> pred <- mice(boys, maxit = 0)$predictorMatrix  # default matrix; boys must already contain whr
> pred[c("wgt", "hgt", "hc", "reg"), "bmi"] <- 0
> pred[c("gen", "phb", "tv"), c("hgt", "wgt", "hc")] <- 0
> pred[, "whr"] <- 0
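The passive imputation fragments on these slides can be put together into one runnable sketch. Assumptions: a five-column subset of `boys` is used to keep the run short (the slides use the full data, including `bmi`), `maxit = 5` is an arbitrary choice, and only the direct feedback from `whr` to `wgt` and `hgt` is removed from the predictor matrix.

```r
library(mice)

# Smaller subset of boys (an assumption of this sketch, not the slide setup)
dat <- boys[, c("age", "hgt", "wgt", "hc", "reg")]
dat$whr <- dat$wgt / (dat$hgt / 100)   # kg/m; NA when wgt or hgt is missing

ini  <- mice(dat, maxit = 0)           # dry run to obtain the defaults
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: whr is recomputed from wgt and hgt, not modelled
meth["whr"] <- "~I(wgt/(hgt/100))"

# Break the feedback loop: whr must not predict the variables it is made of
pred[c("wgt", "hgt"), "whr"] <- 0

imp <- mice(dat, method = meth, predictorMatrix = pred,
            m = 1, maxit = 5, seed = 32093, printFlag = FALSE)

# Check that the ratio holds in the completed data
com <- complete(imp, 1)
all.equal(com$whr, com$wgt / (com$hgt / 100))
```

The final check is the point of the passive method: the derived variable stays consistent with its components in every completed data set, which the JAV approach does not guarantee.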
![Page 99: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/99.jpg)
Method PASSIVE2, predictor matrix
age hgt wgt bmi hc gen phb tv reg whr
age 0 0 0 0 0 0 0 0 0 0
hgt 1 0 1 0 1 1 1 1 1 0
wgt 1 1 0 0 1 1 1 1 1 0
bmi 1 1 1 0 1 1 1 1 1 0
hc 1 1 1 0 0 1 1 1 1 0
gen 1 0 0 1 0 0 1 1 1 0
phb 1 0 0 1 0 1 0 1 1 0
tv 1 0 0 1 0 1 1 0 1 0
reg 1 1 1 0 1 1 1 1 0 0
whr 1 1 1 1 1 1 1 1 1 0
![Page 100: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/100.jpg)
Method PASSIVE2
[Figure: weight/height ratio (kg/m) against height (cm) in three panels: JAV, passive, passive 2.]
![Page 101: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/101.jpg)
Derived variables: summary
Derived variables pose special challenges
Plausible values respect data dependencies
If you can, create derived variables after imputation
If you cannot, use passive imputation
Break up direct feedback loops using the predictor matrix
![Page 102: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/102.jpg)
Handling Missing Data in R with MICE > IV > Diagnostics
Standard diagnostic plots in mice
Since mice 2.5, plots for imputed data:
one-dimensional scatter: stripplot
box-and-whisker plot: bwplot
densities: densityplot
scattergram: xyplot
![Page 103: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/103.jpg)
Stripplot
> library(mice)
> imp <- mice(nhanes, seed = 29981)
> stripplot(imp, pch = c(1, 19))
![Page 104: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/104.jpg)
stripplot(imp, pch=c(1,19))
[Figure: strip plots of age, bmi, hyp and chl by imputation number (0 to 5).]
![Page 105: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/105.jpg)
A larger data set
> imp <- mice(boys, seed = 24331, maxit = 1)
> bwplot(imp)
![Page 106: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/106.jpg)
bwplot(imp)
[Figure: box-and-whisker plots of age, hgt, wgt, bmi, hc and tv by imputation number (0 to 5).]
![Page 107: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/107.jpg)
densityplot(imp)
[Figure: density plots in five panels: hgt, wgt, bmi, hc, tv.]
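The diagnostic plots also accept a formula, which helps to focus on one variable or on a relation between two. A hedged sketch on the `nhanes` data, with variable choices made only for illustration:

```r
library(mice)

imp <- mice(nhanes, seed = 29981, printFlag = FALSE)

# Density of bmi, one panel per imputation (observed vs. imputed values)
p1 <- densityplot(imp, ~ bmi | .imp)

# Scattergram of bmi against chl, one panel per imputation;
# the two pch values distinguish observed from imputed points
p2 <- xyplot(imp, bmi ~ chl | .imp, pch = c(1, 19))

print(p1)
print(p2)
```

The functions return lattice objects, so they can be stored, printed later, or adjusted with the usual lattice arguments.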
![Page 108: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/108.jpg)
SESSION V
![Page 109: Handling Missing Data in R with MICE - stefvanbuuren.name · Handling Missing Data in R with MICE Handling Missing Data in R with MICE Stef van Buuren1,2 1Methodology and Statistics,](https://reader035.fdocuments.in/reader035/viewer/2022062414/5c7fa7ea09d3f25c328bb36d/html5/thumbnails/109.jpg)
Handling Missing Data in R with MICE > V > Reporting guidelines
Reporting guidelines
1 Amount of missing data
2 Reasons for missingness
3 Differences between complete and incomplete data
4 Method used to account for missing data
5 Software
6 Number of imputed datasets
7 Imputation model
8 Derived variables
9 Diagnostics
10 Pooling
11 Listwise deletion
12 Sensitivity analysis
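Several of the items in this list can be read directly from the data and from the imputation object. The sketch below is one possible way to collect them, using the `nhanes` data and an arbitrary analysis model `chl ~ age + bmi` as assumptions; the item numbers in the comments refer to the list above.

```r
library(mice)

# 1. Amount of missing data: proportion missing per variable
colMeans(is.na(nhanes))

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# 5. Software version
packageVersion("mice")

# 6. Number of imputed data sets, and 7. imputation method per variable
imp$m
imp$method

# 10. Pooling: fit the analysis model on each completed data set,
#     then combine the estimates
fit    <- with(imp, lm(chl ~ age + bmi))
pooled <- pool(fit)
summary(pooled)

# 11. Listwise deletion, for comparison with the pooled result
cc <- lm(chl ~ age + bmi, data = nhanes)
summary(cc)$coefficients
```

The remaining items (reasons for missingness, derived variables, sensitivity analysis) depend on the study and need to be described in words.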