BioSHaRE: Analysis of mixed effects models using federated data analysis approach - Edwin van den...
INDIVIDUAL PARTICIPANT DATA ANALYSIS: A FEDERATED APPROACH
EDWIN VAN DEN HEUVEL, SACHA LA BASTIDE – VAN GEMERT
CONTENT
Introduction
Federated Data Analysis
  Linear Regression
  Mixed models
EM-Algorithm
Validation results
  Test data
  BioSHaRE data
Concluding remarks
INTRODUCTION: Meta-Analysis
Combining data from different sources most likely started with Carl Friedrich Gauss (1777-1855).
He used data from astronomers to calculate planetary orbits.
He developed least squares and the classical reliability theory: the true parameter is observed with noise.
INTRODUCTION: Meta-Analysis
Combining data was a real problem at the beginning of the 20th century:
Potency estimates from bioassays showed tremendous heterogeneity
Least squares was unsatisfactory
A landmark paper by Cochran in 1954 discussed various weighted means
This field implicitly used random effects models
[Figure: Response versus Concentration for the Reference (R) and the Unknown (U)]
INTRODUCTION: Meta-Analysis
Gene Glass introduced the term (aggregate data) meta-analysis in 1976 as “the analysis of analyses”.
This paper did not refer to the bioassay field at all.
Aggregate data meta-analysis pools estimates from published papers.
A meta-analysis assumes the existence of:
the estimate $b_i$ of the association in study $i$
a standard error $s_i$ of the estimate $b_i$
the number of degrees of freedom $d_i$ for the standard error $s_i$
Different statistical approaches are available to pool the estimates.
INTRODUCTION: Meta-Analysis
Fixed effects meta-analysis model: $b_i = \beta + e_i$, $e_i \sim N(0, \sigma_i^2)$
the standard error $s_i$ is an estimate of $\sigma_i$
Random effects meta-analysis model: $b_i = \beta + U_i + e_i$, $e_i \sim N(0, \sigma_i^2)$, $U_i \sim N(0, \tau^2)$
the standard error $s_i$ is an estimate of $\sigma_i$
$\tau^2$ represents heterogeneity in the estimates
In case heterogeneity is present ($\tau \neq 0$), the fixed effects analysis underestimates the standard error of the pooled estimate.
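To make the two pooling models concrete, here is a minimal sketch (assuming NumPy) that computes the inverse-variance fixed effects estimate of $\beta$ and a random effects estimate. It treats $s_i$ as the known $\sigma_i$ (so $d_i$ is ignored) and uses the DerSimonian-Laird moment estimator for $\tau^2$. That estimator is one common choice, not necessarily the one intended here. The function names and the study numbers are made up for illustration.

```python
import numpy as np

def fixed_effects(b, s):
    """Inverse-variance weighted (fixed effects) pooled estimate and its SE."""
    w = 1.0 / s**2
    return np.sum(w * b) / np.sum(w), np.sqrt(1.0 / np.sum(w))

def random_effects_dl(b, s):
    """Random effects pooling with the DerSimonian-Laird moment estimate of tau^2."""
    w = 1.0 / s**2
    beta_fe, _ = fixed_effects(b, s)
    q = np.sum(w * (b - beta_fe) ** 2)          # Cochran's Q statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(b) - 1)) / c)     # moment estimate, truncated at zero
    w_re = 1.0 / (s**2 + tau2)                  # weights now include between-study variance
    return np.sum(w_re * b) / np.sum(w_re), np.sqrt(1.0 / np.sum(w_re)), tau2

# Hypothetical study estimates b_i and standard errors s_i.
b = np.array([0.10, 0.35, 0.22, 0.60])
s = np.array([0.05, 0.06, 0.08, 0.07])
beta_fe, se_fe = fixed_effects(b, s)
beta_re, se_re, tau2 = random_effects_dl(b, s)
print(beta_fe, se_fe, beta_re, se_re, tau2)
```

In these made-up numbers the spread of the $b_i$ is large relative to the $s_i$, so $\tau^2 > 0$ and the random effects standard error is larger than the fixed effects one. That gap is the underestimation described above.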
INTRODUCTION: Meta-Analysis
Not all researchers are in favor of meta-analysis. Van Houwelingen (1997) wrote:
“…popular practice of analysing summary measures from selected publications is a poor man’s solution.” … “I hope that we will have full multi-center multi-study databases that can be analysed by appropriate random effects models considering both random variation within and between studies and/or centres.”
Thus, there is a strong need for individual participant data (IPD) analysis.
INTRODUCTION: Meta-Analysis
IPD meta-analysis can be performed in two ways:
One-stage analysis: all individual data are analyzed simultaneously (possibly with sophisticated statistical models)
Two-stage or coordinated meta-analysis: each study is analyzed separately and the model parameters are pooled with the original meta-analysis tools
Two-stage analysis seems easier to implement, since it does not require the data to be pooled at one location.
One-stage IPD meta-analysis that does not pool the data at one location is called federated data analysis.
FEDERATED DATA ANALYSIS: Linear Regression
Consider the following setting:
$Y_{ij}$ is the response of subject $j$ in study $i$
$X_{ij}$ is the exposure of subject $j$ in study $i$
$Z_{ij}$ is a confounder of subject $j$ in study $i$
The simplest linear regression model is
M1: $Y_{ij} = \beta_0 + \beta_Z Z_{ij} + \beta_X X_{ij} + e_{ij}$
Model M1 assumes that:
the populations are homogeneous (common intercept)
the associations are homogeneous
the residual variances are homogeneous
the ratio of sample size to population size is homogeneous
FEDERATED DATA ANALYSIS: Linear Regression
Federated estimation of $\beta_0$, $\beta_Z$, $\beta_X$, and $\sigma^2$ requires the following summary statistics from each study:
number of observations
sum of the confounders
sum of the exposures
sum of the squared confounders
sum of the squared exposures
sum of the confounder-exposure products
sum of the responses
sum of the response-confounder products
sum of the response-exposure products
sum of the squared responses (for the SE)
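A minimal sketch of this idea, assuming NumPy. Each study returns $D_i^\top D_i$, $D_i^\top Y_i$, $Y_i^\top Y_i$ and $n_i$, where $D_i$ has columns $(1, Z_{ij}, X_{ij})$. These contain exactly the sums listed above. A coordinating site adds them up and solves the normal equations, which reproduces the least squares fit of model M1 on the pooled data without moving any individual record. The function names and the simulated studies are hypothetical. The residual variance uses the unbiased divisor $n - 3$, whereas maximum likelihood would divide by $n$.

```python
import numpy as np

def local_summaries(Z, X, Y):
    """Computed inside study i; only these aggregates are shared."""
    D = np.column_stack([np.ones_like(Z), Z, X])  # columns: 1, Z, X
    # D.T @ D holds n, sum Z, sum X, sum Z^2, sum X^2, sum ZX;
    # D.T @ Y holds sum Y, sum YZ, sum YX; Y @ Y is the sum of squared responses.
    return D.T @ D, D.T @ Y, float(Y @ Y), len(Y)

def fit_m1(summaries):
    """Pooled least squares for model M1 from the study summaries alone."""
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    yty = sum(s[2] for s in summaries)
    n = sum(s[3] for s in summaries)
    beta = np.linalg.solve(XtX, Xty)        # (b0, bZ, bX)
    rss = yty - beta @ Xty                  # residual sum of squares at the LS solution
    sigma2 = rss / (n - len(beta))          # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(XtX)))
    return beta, sigma2, se

# Usage with simulated (hypothetical) studies.
rng = np.random.default_rng(0)
studies = []
for n in (120, 35, 60):
    Z = rng.normal(0.0, 1.0, n)
    X = rng.integers(0, 2, n).astype(float)
    Y = 1.0 + 0.5 * Z - 2.0 * X + rng.normal(0.0, 1.0, n)
    studies.append((Z, X, Y))
beta, sigma2, se = fit_m1([local_summaries(*s) for s in studies])
print(beta, sigma2, se)
```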
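FEDERATED DATA ANALYSIS: Linear Regression
Heterogeneous populations
M2: $Y_{ij} = \beta_{0,i} + \beta_Z Z_{ij} + \beta_X X_{ij} + e_{ij}$
Heterogeneous associations
M3a: $Y_{ij} = \beta_{0,i} + \beta_{Z,i} Z_{ij} + \beta_X X_{ij} + e_{ij}$
M3b: $Y_{ij} = \beta_{0,i} + \beta_Z Z_{ij} + \beta_{X,i} X_{ij} + e_{ij}$
M3c: $Y_{ij} = \beta_{0,i} + \beta_{Z,i} Z_{ij} + \beta_{X,i} X_{ij} + e_{ij}$
Heterogeneous residual variances
M4: $Y_{ij} = \beta_{0,i} + \beta_{Z,i} Z_{ij} + \beta_{X,i} X_{ij} + e_{ij,i}$
The standard deviation $\sigma_i$ of $e_{ij,i}$ depends on study $i$.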
FEDERATED DATA ANALYSIS: Linear Regression
Models M2 and M3a:
have a homogeneous association for the exposure
require a federated data analysis
involve the same summary statistics as the federated data analysis of model M1
Models M3b, M3c, and M4:
can be estimated with the same summary statistics used in the federated data analysis
require an aggregate data meta-analysis to pool the estimates $b_{X,i}$ from the different studies
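The claim that the same summaries suffice can be illustrated for M2 with fixed study-specific intercepts, and for a separate fit per study as in M3c. Because each $\beta_{0,i}$ only shifts study $i$, the common slopes follow from within-study centred sums of squares and cross-products. Those centred sums can be built from $n_i$ and the raw sums. The sketch below assumes NumPy and ordinary least squares. The function names and the simulated data are hypothetical. The per-study estimates $b_{X,i}$ from `fit_per_study` would then go into an aggregate data meta-analysis.

```python
import numpy as np

def local_summaries(Z, X, Y):
    """Computed inside study i; only these aggregates are shared."""
    D = np.column_stack([np.ones_like(Z), Z, X])  # columns: 1, Z, X
    return D.T @ D, D.T @ Y, float(Y @ Y), len(Y)

def fit_m2(summaries):
    """Common slopes (bZ, bX) with fixed study-specific intercepts b_{0,i}."""
    Sww, Swy = np.zeros((2, 2)), np.zeros(2)
    for XtX, Xty, _, n in summaries:
        sw, sy = XtX[0, 1:], Xty[0]                  # (sum Z, sum X) and sum Y
        Sww += XtX[1:, 1:] - np.outer(sw, sw) / n    # within-study centred cross-products
        Swy += Xty[1:] - sw * sy / n
    slopes = np.linalg.solve(Sww, Swy)
    intercepts = np.array([(Xty[0] - XtX[0, 1:] @ slopes) / n
                           for XtX, Xty, _, n in summaries])
    return slopes, intercepts

def fit_per_study(summaries):
    """Separate least squares fit per study (M3c); column 2 holds b_{X,i}."""
    return np.array([np.linalg.solve(XtX, Xty) for XtX, Xty, _, _ in summaries])

# Usage with simulated (hypothetical) studies that differ in their intercepts.
rng = np.random.default_rng(1)
studies = []
for n, b0 in ((40, 1.0), (25, 3.0), (60, -2.0)):
    Z = rng.normal(0.0, 1.0, n)
    X = rng.integers(0, 2, n).astype(float)
    Y = b0 + 0.5 * Z - 2.0 * X + rng.normal(0.0, 1.0, n)
    studies.append((Z, X, Y))
summ = [local_summaries(*s) for s in studies]
slopes, intercepts = fit_m2(summ)
per_study = fit_per_study(summ)
print(slopes, intercepts, per_study[:, 2])
```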
FEDERATED DATA ANALYSIS: Linear Regression
Simulation studies show that an aggregate data meta-analysis for model M3a produces strong heterogeneity in the estimates $b_{X,i}$, even though the association is homogeneous.
Treating the regression parameters in models M2, M3a, M3b, M3c, and M4 as fixed effects will underestimate the pooled association $\beta_X$.
Thus, models M2, M3a, M3b, M3c, and M4 need to assume that the study-specific parameters are random, which gives mixed effects models analogous to the random effects meta-analysis.
FEDERATED DATA ANALYSIS: Mixed Effects
Model M2 becomes
M2: $Y_{ij} = \beta_0 + \beta_Z Z_{ij} + \beta_X X_{ij} + U_i + e_{ij}$
The associations are still assumed homogeneous
The residual variance is homogeneous
The intercept is heterogeneous, $U_i \sim N(0, \tau^2)$: a random intercept model
Federated data analysis for mixed models is less straightforward, because the random term complicates the method.
The Expectation-Maximization (EM) algorithm can be used to estimate the model parameters in a federated approach.
EM-ALGORITHM: Mixed Effects Models
Step 0: Choose starting values $\beta_0^{(0)}$, $\beta_Z^{(0)}$, $\beta_X^{(0)}$, $\tau^{(0)}$, and $\sigma^{(0)}$ for $\beta_0$, $\beta_Z$, $\beta_X$, $\tau$, and $\sigma$.
Step 1:
E-step: using the estimates from the previous step, estimate $U_i$.
M-step: using the result of the E-step, determine $\beta_0^{(1)}$, $\beta_Z^{(1)}$, $\beta_X^{(1)}$, $\tau^{(1)}$, and $\sigma^{(1)}$.
Evaluate how much the estimates have changed:
if the changes are small enough → convergence
if the changes are still too large → repeat step 1 using the last available estimates
EM uses the same summary statistics.
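The loop above can be sketched for the random intercept model M2. This is a minimal sketch, assuming NumPy and the textbook EM updates for a linear random intercept model under maximum likelihood. The E-step computes the posterior mean $m_i$ and variance $v_i$ of $U_i$. The M-step regresses $Y_{ij} - m_i$ on $(1, Z_{ij}, X_{ij})$, sets $\tau^2 = \frac{1}{k}\sum_i (m_i^2 + v_i)$ over the $k$ studies, and updates $\sigma^2$. This is not the authors' R program. The function names, the simulated data and the stopping rule (largest absolute change below $10^{-8}$) are assumptions. Only the per-study summaries enter the computation.

```python
import numpy as np

def local_summaries(Z, X, Y):
    """Run inside study i; only these aggregates leave the study."""
    D = np.column_stack([np.ones_like(Z), Z, X])  # columns: 1, Z, X
    return D.T @ D, D.T @ Y, float(Y @ Y), len(Y)

def em_random_intercept(summaries, beta, tau2, sigma2, tol=1e-8, max_iter=100_000):
    """EM (ML) for Y_ij = b0 + bZ*Z_ij + bX*X_ij + U_i + e_ij."""
    XtX = sum(s[0] for s in summaries)
    Xty = sum(s[1] for s in summaries)
    N = sum(s[3] for s in summaries)
    beta = np.asarray(beta, dtype=float)
    for it in range(1, max_iter + 1):
        # E-step: posterior mean m_i and variance v_i of U_i under current estimates.
        # XtX_i[0] = (n_i, sum Z, sum X) and Xty_i[0] = sum Y.
        shrink = np.array([tau2 / (sigma2 + n_i * tau2) for _, _, _, n_i in summaries])
        resid_sums = np.array([Xty_i[0] - XtX_i[0] @ beta for XtX_i, Xty_i, _, _ in summaries])
        m = shrink * resid_sums
        v = shrink * sigma2
        # M-step: regress Y - m_i on (1, Z, X), then update tau2 and sigma2.
        rhs = Xty - sum(m_i * s[0][0] for m_i, s in zip(m, summaries))
        beta_new = np.linalg.solve(XtX, rhs)
        tau2_new = float(np.mean(m**2 + v))
        sse = 0.0
        for m_i, v_i, (XtX_i, Xty_i, yty_i, n_i) in zip(m, v, summaries):
            rss = yty_i - 2 * beta_new @ Xty_i + beta_new @ XtX_i @ beta_new
            rsum = Xty_i[0] - XtX_i[0] @ beta_new
            sse += rss - 2 * m_i * rsum + n_i * (m_i**2 + v_i)
        sigma2_new = sse / N
        # Evaluate: stop when no parameter changes by more than tol.
        change = max(np.max(np.abs(beta_new - beta)),
                     abs(tau2_new - tau2), abs(sigma2_new - sigma2))
        beta, tau2, sigma2 = beta_new, tau2_new, sigma2_new
        if change < tol:
            break
    return beta, tau2, sigma2, it

# Usage with simulated (hypothetical) data from 10 studies: true tau = 1, sigma = 2.
rng = np.random.default_rng(2024)
summaries = []
for i in range(10):
    n = 80
    Z = rng.normal(0.0, 1.0, n)
    X = rng.integers(0, 2, n).astype(float)
    U = rng.normal(0.0, 1.0)
    Y = 5.0 + 0.5 * Z - 2.0 * X + U + rng.normal(0.0, 2.0, n)
    summaries.append(local_summaries(Z, X, Y))

fit_tau0 = em_random_intercept(summaries, [1, 1, 1], tau2=0.0, sigma2=1.0)
fit_tau1 = em_random_intercept(summaries, [1, 1, 1], tau2=1.0, sigma2=1.0)
print(fit_tau0, fit_tau1)
```

Whenever $\tau^2 = 0$, the sketch gives $m_i = v_i = 0$. A run started at $\tau = 0$ therefore never leaves $\tau^2 = 0$ and simply reproduces the pooled fit of model M1. A run started at $\tau = 1$ is free to move.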
VALIDATION RESULTS: TEST DATA, MULTICENTER TRIAL
Data from a multicenter trial were used:
Two responses: hemoglobin in blood (g/dl) and blood loss during surgery (mL)
Exposure: treatment (control; new)
Covariate: age (years)
Three centers (1, 2, 3); different centers were selected for the two responses
The EM algorithm was applied to the summary statistics needed to estimate M1.
For comparison, a random intercept model was fitted by maximum likelihood to the full data set.
VALIDATION RESULTS: TEST DATA, MULTICENTER TRIAL
Description of the validation data

Hemoglobin
                    Center 1 (n=200)   Center 2 (n=20)   Center 3 (n=30)   P-value
Hb, mean (SD)       6.50 (0.890)       6.81 (1.065)      6.80 (0.859)      0.179
Age, mean (SD)      66.4 (9.78)        67.3 (9.97)       66.1 (8.21)       0.864
Treatment, n (%)    100 (50)           8 (40)            15 (50)           0.692

Blood loss
                    Center 1 (n=200)   Center 2 (n=48)   Center 3 (n=39)   P-value
BL, mean (SD)       641 (701)          763 (527)         748 (428)         <0.001
Age, mean (SD)      66.4 (9.78)        64.6 (9.64)       61.7 (9.73)       0.007
Treatment, n (%)    100 (50)           21 (44)           21 (54)           0.622
VALIDATION RESULTS: TEST DATA, MULTICENTER TRIAL
Hemoglobin
EM-nr indicates the number of iterations used by EM (EM-0 gives the starting values). The convergence criterion for all parameters was set at $10^{-8}$.
A starting value of 0 for $\tau$ leads to incorrect results.
For positive starting values, convergence is relatively fast and the estimates are close to the truth.

Run          $\beta_0$   $\beta_Z$   $\beta_X$   $\tau^2$    $\sigma^2$
EM-0         1           1           1           1           1
EM-190206    7.6086      -0.01548    0.04021     0.006659    0.7887
EM-0         1           1           1           0           1
EM-3         7.5683      -0.01556    0.03838     0           0.7933
SAS          7.6086      -0.01548    0.04021     0.006657    0.7887
VALIDATION RESULTS: TEST DATA, MULTICENTER TRIAL
Blood loss
EM-nr indicates the number of iterations used by EM (EM-0 gives the starting values). The convergence criterion for all parameters was set at $10^{-8}$.
A starting value of 1 for $\tau$ leads to incorrect results.
With $\tau = 0$ as starting value, convergence is very fast and the estimates are close to the truth.

Run          $\beta_0$   $\beta_Z$   $\beta_X$   $\tau^2$    $\sigma^2$
EM-0         1           1           1           1           1
EM-1139723   608.23      1.7755      -97.9651    0.01140     411547
EM-0         1           1           1           0           1
EM-3         608.23      1.7755      -97.9651    0           411547
SAS          608.23      1.7755      -97.9651    0           411547
VALIDATION RESULTS: TEST DATA, MULTICENTER TRIAL
Both sets of starting values are needed to make appropriate inferences:
When the two sets provide identical estimates of the fixed parameters, the set with $\tau = 0$ provides the answer.
When the two sets provide different estimates of the fixed parameters, the set with $\tau \neq 0$ provides the answer.
The standard errors can also be determined. This has not yet been incorporated in the R program.
Manual calculations demonstrate that the results coincide with the SAS output when the appropriate estimates are used.
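One way to obtain these standard errors from the same summaries uses the usual model-based covariance of the fixed effects, $(\sum_i X_i^\top V_i^{-1} X_i)^{-1}$ with $V_i = \sigma^2 I + \tau^2 \mathbf{1}\mathbf{1}^\top$, evaluated at the estimates. Because $V_i^{-1} = (I - c_i \mathbf{1}\mathbf{1}^\top)/\sigma^2$ with $c_i = \tau^2/(\sigma^2 + n_i\tau^2)$, each term needs only the study's cross-product matrix and column sums. This is a sketch under that assumption, using NumPy. It is not the authors' manual calculation, it ignores the uncertainty in $\tau^2$ and $\sigma^2$, and the function names are hypothetical.

```python
import numpy as np

def local_summaries(Z, X, Y):
    """Run inside study i; only these aggregates leave the study."""
    D = np.column_stack([np.ones_like(Z), Z, X])  # columns: 1, Z, X
    return D.T @ D, D.T @ Y, float(Y @ Y), len(Y)

def random_intercept_se(summaries, tau2, sigma2):
    """Model-based SEs of (b0, bZ, bX), treating tau2 and sigma2 as known."""
    info = np.zeros((3, 3))
    for XtX_i, _, _, n_i in summaries:
        sx = XtX_i[0]                        # (n_i, sum Z, sum X)
        c = tau2 / (sigma2 + n_i * tau2)     # V_i^{-1} = (I - c * 1 1') / sigma2
        info += (XtX_i - c * np.outer(sx, sx)) / sigma2
    return np.sqrt(np.diag(np.linalg.inv(info)))
```

Plugging in the EM estimates of $\tau^2$ and $\sigma^2$ gives the standard errors. With `tau2 = 0` the result reduces to the ordinary least squares standard errors with residual variance $\sigma^2$.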
VALIDATION RESULTS: BIOSHARE DATA
Response: systolic blood pressure
Exposure: noise
Confounders: age, sex, PM10
Two cohorts: HUNT and LifeLines
EM algorithm applied to fit a random intercept model
Comparison with model M1:
using the standard algorithm
using DataSHIELD glm
VALIDATION RESULTS: BIOSHARE DATA
Systolic blood pressure
The EM algorithm seems to demonstrate heterogeneity in the intercepts.
The analyses of model M1 and EM are identical when the starting value $\tau = 0$ is used, because both rely on the same summaries.
DataSHIELD glm seems to deviate somewhat, although this did not happen on the test data.

Run          $\beta_0$   AGE      SEX       PM10       NOISE      $\tau^2$   $\sigma^2$
EM-0         1           1        1         1          1          1          1
EM-Final     111.59      0.4141   -7.255    0.04627    -0.01351   1.8992     217.43
EM-0         1           1        1         1          1          0          1
EM-Final     114.68      0.4143   -7.2473   -0.16617   -0.00300   0          217.45
Model M1     114.68      0.4143   -7.2473   -0.16617   -0.00300   NA         217.45
DataSHIELD   114.95      0.4149   -7.2449   -0.18129   -0.00230   NA         217.41
CONCLUDING REMARKS
Follow-up steps for BioSHaRE (in August):
1. Complete the existing algorithm for linear random intercept models, including standard errors
2. Implement this algorithm in DataSHIELD
3. Finalize the statistics paper on algorithms for federated data analysis of mixed models
Extensions for DataSHIELD after BioSHaRE:
1. Handle missing data as well
2. Extend to linear random coefficient models
3. Extend to generalized random coefficient models
Acknowledgement
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 261433 (Biobank Standardisation and Harmonisation for Research Excellence in the European Union - BioSHaRE-EU)