Neuroscience on a Population Scale: Design, Measurement ...
Steven G. Heeringa
Senior Research Scientist and Associate Director
Survey Research Center, Institute for Social Research
University of Michigan
Annual Workshop
Michigan Program in Survey Methods
University of Michigan
October 25, 2019
Neuroscience on a Population Scale: Design, Measurement, Data Integration and Analysis.
Outline and focus of today’s talk
• Unique challenges in population neuroscience studies
  ◦ Design
  ◦ Measurement
  ◦ Data types and integration
  ◦ Statistical analysis and interpretation
• Adolescent Brain Cognitive Development (ABCD) Study
Design Challenges: Subject recruitment, sampling plans for population neuroscience research
Both internal and external validity are needed
Study participation is demanding, and study consent and response are major challenges.
Portability/geographic reach of the measurement technology
Full, conventional probability sample designs like those employed in epidemiological studies such as the National Health and Nutrition Examination Survey (NHANES) may not be feasible.
Internal Validity, External Validity and “Population Representativeness” in Epidemiological Studies.
Keiding and Louis (2016), Perils and potentials of self-selected entry to epidemiological studies and surveys. J. R. Statist. Soc. A 179, Part 2, pp. 319–376
However, self-selection dramatically departs from the traditional, so-called gold standard approaches of targeted enrolment to scientific studies and sampling frame-based surveys. Traditionalists argue that we must adhere to the values of planned accrual and follow-up for all studies and identification of a sampling frame for surveys and possibly also for epidemiological and other such studies. Others propose that we should stop worrying about it and open up accrual, using modern approaches (covariate adjustments, find instrumental variables, ‘big data’) to make the necessary adjustments.
Elliott, M.R. and Richard L. Valliant. 2017 “Inference for nonprobability samples.” Statistical Science, 32(2):249-264.
Additional design/measurement challenges in population neuroscience studies.
The Brain R2 Problem: Multi-level (brain, individual, social networks, community, environment, culture).
Sample size and power to detect small effects and interactions.
• Important outcomes are likely a function of many small effects.
• Analogy to trends in analysis of genetic/epigenetic contributions to health and other outcomes.
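As a rough illustration of why small effects demand large samples, the sketch below approximates the sample size needed to detect a correlation of a given size with a two-sided test, using the Fisher z approximation. It assumes simple random sampling with no clustering or weighting, so it understates what a clustered, weighted design would require; the function name and the example effect sizes are illustrative only.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r (two-sided, Fisher z).

    Assumes independent observations (simple random sampling).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Halving the effect size roughly quadruples the required sample size.
for r in (0.20, 0.10, 0.05):
    print(f"r = {r:.2f}: n ~ {n_for_correlation(r)}")
```

Under these assumptions a correlation of 0.05 already calls for a sample in the low thousands, before any allowance for design effects.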
Types of Measurement in Population Neuroscience Studies
DNA
Epigenetics
Biomarkers
Child Assessments
Parent Interviews
Family History
Environment
School and Admin Records
MRI
DTI
fMRI
Epidemiological, Statistical and Computational Challenges in Population Neuroscience
• Descriptive/Analytical?, “left brain”/“right brain”
• Informative sampling, subject recruitment design
• Selection mechanisms (e.g. bias) in recruitment
• Lack of independence, correlated observations
  ◦ Geographic clustering of subjects
  ◦ Clustering in measurement (centers, interviewers, instruments)
• Latent constructs
• Missing data
• Longitudinal analysis of growth, change
• Analysis of brain image data
  ◦ Spatial correlation (3D array of voxels)
  ◦ Temporal correlation (time series, fMRI)
  ◦ Image correction, registration/standardization, smoothing
  ◦ Dimension reduction, creating summary measures
Analytic Aims for Population Neuroscience
Brain as Brain: Anatomy, networks
Brain as Outcome: e.g. Does regular marijuana use (screen time) during adolescent development affect brain development (morphology) or function?
Brain as Predictor: e.g. Educational attainment.
Computational and Statistical Tools
• Whole Brain Computations and Graphical Display
  ◦ e.g. AFNI
• Cortical Surface Analysis
  ◦ e.g. SUMA, FreeSurfer, SurfStat
• GLMs, GAMMs
  ◦ R, SAS, Stata, etc.
• Machine learning (?)
• Pace of development and enhancement of methodology and software is rapid!
ADOLESCENT BRAIN COGNITIVE DEVELOPMENT STUDY
https://abcdstudy.org/index.html
Overview of ABCD
• ABCD is the largest long-term study of brain development in the United States.
• Baseline cohort of n=11,875 children age 9-10 recruited 09/2016-10/2018.
• Recruitment and data collection in 21 sites nationally
• Coordinating site: University of California, San Diego
• Special twin sample recruitment (n=1727) in 4 sites
• Longitudinal design to follow baseline cohort until age 20.
• “Open science model” for hypothesis generation and data sharing.
• NIH and CDC Partners. NIH/NIDA Funded.
Distribution of ABCD Study Sites, https://abcdstudy.org/
ABCD Design Strategies to Maximize External Validity and Statistical Efficiency for Population-based Analysis
21 Sites and Catchment Areas
• Nationally distributed
• Demographically and socio-economically diverse
• May be treated as a “pseudo” sample of primary stage units
Probability sample of schools and students within individual catchment areas
• Introduces randomization and “representativeness” to the recruitment
• Still highly vulnerable to selection bias due to noncooperation by schools and parents within schools
Demographic controls (targets) for site specific baseline samples and the national aggregate.
• Achieve minimum sample sizes and covariate balance with respect to the U.S. population of eligible children.
• Minimize weighting inefficiencies in descriptive analysis
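One common way to quantify a "weighting inefficiency" is Kish's approximate design effect due to unequal weights, deff_w = n Σw² / (Σw)², with effective sample size n_eff = (Σw)² / Σw². The slides do not say which measure was used for ABCD, so the sketch below is only an illustration of that standard approximation, with made-up weights.

```python
def kish_weighting_effect(weights):
    """Return (deff_w, n_eff) under Kish's approximation for unequal weights."""
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    n_eff = sum_w ** 2 / sum_w2   # effective sample size
    return n / n_eff, n_eff       # design effect due to weighting, n_eff


# Hypothetical weights: equal weights lose nothing, variable weights do.
equal = [690.0, 690.0, 690.0, 690.0]
unequal = [300.0, 300.0, 300.0, 900.0]
print(kish_weighting_effect(equal))
print(kish_weighting_effect(unequal))
```

Recruiting toward demographic targets keeps the final weights closer to equal, which is what holds this design effect near 1.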
Informative Features of the ABCD Sample Design for Population-based Estimation and Inference
• Clustering of observations on ABCD children
  ◦ Sites, schools, families, interviewers, imaging equipment
  ◦ Observations not independent: intra-class correlations. Approaches:
    (1) Model the clustering as random effects (MLM).
    (2) Use distribution-free robust methods that account for clustering in variance estimation.
• Selectivity (random and nonrandom) of the sample selection/recruitment
  ◦ Site selection, sample stratification, school consent, parental consent
  ◦ Covariate adjustment in analysis models
  ◦ Calibration weighting to established population controls
Estimation of ABCD Population Descriptive Statistics
• Examples of descriptive statistics
  ◦ Means, quantiles of continuous variables: BMI, NIH Tool Box test scores, polygenic scores, hippocampus volume, measures of neuron activity.
  ◦ Categorical proportions for binary, multinomial and count variables.
• Employ software specifically developed for design-based estimation from clustered sample data, e.g. R Survey library.
• Employ propensity-based population weight factor in estimation of the descriptive statistics.
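To make the idea of design-based estimation concrete, here is a minimal pure-Python sketch of a weighted mean with a cluster-robust (Taylor linearization) standard error. It assumes a single stratum and treats clusters as sampled with replacement. It is not the implementation used by the R survey library or by ABCD, and the toy data and cluster labels are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_cluster_se(y, w, cluster):
    """Weighted mean and linearization SE that respects clustering."""
    sum_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    # Linearized score for each observation, totalled within its cluster.
    totals = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        totals[ci] += wi * (yi - mean) / sum_w

    # Between-cluster variability of the totals drives the variance estimate.
    c = len(totals)
    zbar = sum(totals.values()) / c
    var = c / (c - 1) * sum((z - zbar) ** 2 for z in totals.values())
    return mean, sqrt(var)


# Toy data: three clusters (e.g. sites) of two children each, equal weights.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = [1.0] * 6
site = ["a", "a", "b", "b", "c", "c"]
mean, se = weighted_mean_cluster_se(y, w, site)
print(mean, se)
```

Because the children within each toy cluster resemble each other, the cluster-robust standard error is larger than the one a naive formula assuming six independent observations would report.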
Propensity-based Population Weight for ABCD Participants
• The 2011-2015 American Community Survey (ACS) serves as a benchmark for characterizing U.S. children age 9 and 10.
  ◦ n=376,370 observations of 9- and 10-year-olds and their families
  ◦ Key demographic and SES variables with consistent measurement in ACS and ABCD baseline are identified: age, sex, race/ethnicity, family income, family type, household size (hhsize), Census region, parent employment
• Use logistic regression (with weight) to model the logit of the probability that Y=1, that the case belongs to ABCD vs. ACS
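$$\text{(i)}\quad \operatorname{logit}\left[\Pr(Y=1 \mid X)\right] = \beta_0 + \beta_1 X_1 + \dots + \beta_P X_P = X\beta$$

$$\text{(ii)}\quad \hat{p}(Y=1 \mid X) = \frac{e^{X\hat{\beta}}}{1 + e^{X\hat{\beta}}}$$

$$\text{(iii)}\quad \text{Weight} = 1 / \hat{p}(Y=1 \mid X)$$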
Fitted ABCD Propensity Model (Logistic)

Predictor           Category            Estimate (β̂)   exp(β̂)   LCL     UCL
Intercept                               -5.11           -         -       -
Age                 9                    0.112          1.12      0.95    1.32
Sex                 Male                 0.037          1.04      0.88    1.22
Race/Ethnicity      White               -0.787          0.46      0.34    0.61
                    Black               -0.293          0.75      0.52    1.07
                    Hispanic            -0.849          0.43      0.31    0.60
                    Asian               -1.570          0.21      0.12    0.36
Family Income       <$25K               -0.711          0.49      0.34    0.71
                    $25K-$49K           -0.830          0.44      0.31    0.62
                    $50K-$74K           -0.705          0.49      0.35    0.69
                    $75K-$99K           -0.366          0.69      0.50    0.97
                    $100K-$199K         -0.149          0.86      0.64    1.15
Family Type         Married              1.136          *         *       *
Parent Employment   Married, 2 in LF    -0.846          *         *       *
                    Married, 1 in LF    -1.037          *         *       *
                    Married, 0 in LF    -1.281          *         *       *
                    Single, in LF        0.027          *         *       *
Region              Northeast           -0.424          0.65      0.51    0.84
                    Midwest             -0.489          0.61      0.48    0.78
                    South               -0.712          0.49      0.39    0.61
Household size      2-3                  0.008          1.01      0.72    1.41
                    4                   -0.115          0.89      0.66    1.20
                    5                   -0.105          0.90      0.66    1.23
                    6                    0.070          1.07      0.76    1.51
Individual Weight Examples
• Mean weight = $(11{,}874/8{,}211{,}605)^{-1} \approx (0.00145)^{-1} \approx 690$

• Example 1: 9-year-old African-American girl from New England who lives in a family of 4 with two working parents and a family income of $100K-$199K per year:

$$W_i = \left[\frac{\exp(-5.11 + 0.112 - 0.293 - 0.149 + 1.136 - 0.846 - 0.424 - 0.115)}{1 + \exp(-5.11 + 0.112 - 0.293 - 0.149 + 1.136 - 0.846 - 0.424 - 0.115)}\right]^{-1} = (0.003372)^{-1} = 296.60$$

• Example 2: 10-year-old girl of Asian ancestry residing in the South in a 4-person family with two parents who are not working and $25K-$49K total annual income:

$$W_i = \left[\frac{\exp(-5.11 - 1.570 - 0.830 + 1.136 - 1.281 - 0.115 - 0.712)}{1 + \exp(-5.11 - 1.570 - 0.830 + 1.136 - 1.281 - 0.115 - 0.712)}\right]^{-1} = (0.000207)^{-1} = 4828.09$$
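The two worked examples can be checked with a few lines of Python. The sketch below copies the relevant coefficients from the fitted propensity model table, treats reference categories as zero, and applies the inverse-propensity formula. The dictionary keys and the function name are hypothetical labels introduced for illustration; they are not ABCD variable names.

```python
from math import exp

# Coefficients copied from the fitted propensity model table.
# Reference categories (e.g. age 10, female, West region) contribute 0.
INTERCEPT = -5.11
COEF = {
    "age_9": 0.112,
    "black": -0.293,
    "asian": -1.570,
    "income_25_49k": -0.830,
    "income_100_199k": -0.149,
    "married": 1.136,
    "married_2_in_lf": -0.846,
    "married_0_in_lf": -1.281,
    "northeast": -0.424,
    "south": -0.712,
    "hhsize_4": -0.115,
}


def propensity_weight(active_terms):
    """Return (p_hat, weight) for a child with the given non-reference terms."""
    eta = INTERCEPT + sum(COEF[t] for t in active_terms)  # linear predictor
    p_hat = exp(eta) / (1 + exp(eta))                     # inverse logit
    return p_hat, 1 / p_hat


example_1 = ["age_9", "black", "income_100_199k", "married",
             "married_2_in_lf", "northeast", "hhsize_4"]
example_2 = ["asian", "income_25_49k", "married",
             "married_0_in_lf", "hhsize_4", "south"]

p1, w1 = propensity_weight(example_1)
p2, w2 = propensity_weight(example_2)
mean_weight = 8_211_605 / 11_874  # inverse of the overall sampling fraction

print(f"Example 1: p = {p1:.6f}, weight = {w1:.2f}")
print(f"Example 2: p = {p2:.6f}, weight = {w2:.2f}")
print(f"Mean weight: {mean_weight:.1f}")
```

The second child belongs to a group that is under-represented in the ABCD sample relative to the ACS benchmark, so her weight is many times the mean weight.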
Distribution of ABCD Baseline Analysis Weights
ABCD October 25, 2018 Data Set.
[Figure: density plot of the analysis weight variable rpwgtmeth1; x-axis from 0 to 2000, y-axis labelled Density.]
Distribution of ABCD Analysis Weights by Sex of Child
ABCD Final Baseline Data Set. N=11,873.
Distributions of ABCD Analysis Weights by Family Income Category
ABCD Final Baseline Data Set. N=11,873.
ABCD Demographic Distributions.* Data through 10/25/2018.

Demographic/SES                  ABCD Sample                                ACS
Characteristic    Category       n        Unweighted %    Weighted** %     (Weighted) %
Sex               Male           6064     52.3%           51.2%            51.2%
                  Female         5530     47.7%           48.8%            48.8%
Age               9              6036     52.1%           49.6%            49.6%
                  10             5558     47.9%           50.4%            50.4%
Race/Ethnicity    Hispanic       2379     20.5%           24.0%            24.0%
                  NH White       6104     52.6%           52.4%            52.4%
                  NH Black       1683     14.5%           13.4%            13.4%
                  Asian          253      2.2%            3.6%             4.7%
                  All Other      1175     10.1%           6.4%             5.5%
Total                            11594    100.0%          100.0%           100.0%

*Percentages may not add to 100% due to rounding. Item missing data singly imputed using SAS Proc MI, FCS method.
**Inverse propensity weighting to joint distributions from 2011-2015 ACS.
ABCD SES Distributions.* Data through 10/25/2018.

Demographic/SES                        ABCD Sample                                ACS
Characteristic    Category             n        Unweighted %    Weighted** %     (Weighted) %
Family Income     <$25,000             1782     15.4%           20.2%            21.5%
                  $25,000-$49,999      1735     15.0%           20.7%            21.7%
                  $50,000-$74,999      1628     14.0%           17.5%            17.0%
                  $75,000-$99,999      1646     14.2%           13.1%            12.5%
                  $100,000-$199,999    3500     30.2%           21.6%            20.5%
                  >=$200,000           1303     11.2%           7.0%             6.8%
Total                                  11594    100.0%          100%             100.0%

*Percentages may not add to 100% due to rounding. Item missing data singly imputed using SAS Proc MI, FCS method.
**Inverse propensity weighting to joint distributions from 2011-2015 ACS.
ABCD: Population Estimates by SES
                                            ABCD (Unweighted)    ABCD (Weighted, Design Corrected)
Characteristic       Category               % (se)               Pooled % (se)    Not-Pooled % (se)
Family Income        <$25K                  16.2 (0.34)          20.0 (2.4)       20.0 (2.3)
                     $25K-$49K              14.9 (0.33)          20.5 (1.6)       20.1 (1.6)
                     $50K-$74K              13.8 (0.3)           17.5 (0.9)       17.0 (0.9)
                     $75K-$99K              14.3 (0.3)           13.2 (0.8)       13.2 (0.8)
                     $100K-$199K            29.6 (0.4)           21.7 (2.2)       22.4 (2.2)
                     $200K +                11.2 (0.3)           7.1 (1.0)        7.4 (1.1)
Parent Employment    Married, 2 in LF       50.3 (0.5)           41.9 (1.9)       42.2 (1.9)
                     Married, 1 in LF       21.9 (0.4)           23.1 (1.7)       23.2 (1.8)
                     Married, 0 in LF       1.3 (0.1)            2.0 (0.3)        1.9 (0.2)
                     Single, M, in LF       1.6 (0.1)            1.9 (0.2)        2.0 (0.2)
                     Single, M, Not in LF   0.4 (0.1)            0.4 (0.1)        0.4 (0.1)
                     Single, F, in LF       19.5 (0.4)           24.0 (1.6)       22.4 (2.2)
                     Single, F, Not in LF   5.1 (0.2)            6.7 (0.6)        7.4 (1.1)
ABCD Final Baseline Data Set. N=11,873.
ABCD: Estimates of the Population Distribution of NIH Tool Box Flanker and Reading Test Scores (uncorrected)
ABCD Final Baseline Data Set. n=11,873
                                         ABCD              ABCD (Weighted, Design Corrected)
Variable              Statistic          (Unweighted)      Pooled            Not-Pooled
NIH Toolbox           n                  11,712            11,712            9,999
Flanker Test          Mean               94.0 (0.08)       93.83 (0.30)      93.83 (0.27)
(Uncorrected)         Q5                 75.98 (0.34)      75.52 (0.96)      75.62 (0.99)
                      Q25                88.95 (0.14)      88.70 (0.50)      88.72 (0.48)
                      Q50 (Median)       94.94 (0.10)      94.79 (0.26)      94.76 (0.22)
                      Q75                99.79 (0.09)      99.65 (0.19)      99.64 (0.16)
                      Q95                105.77 (0.11)     105.74 (0.18)     105.77 (0.16)
NIH Toolbox           n                  11,704            11,704            9,991
Reading Test          Mean               90.86 (0.06)      90.60 (0.07)      90.74 (0.25)
(Uncorrected)         Q5                 79.22 (0.18)      78.77 (0.49)      78.71 (0.48)
                      Q25                88.69 (0.09)      86.40 (0.28)      86.46 (0.29)
                      Q50 (Median)       90.25 (0.04)      90.05 (0.10)      90.20 (0.11)
                      Q75                94.26 (0.06)      94.00 (0.31)      94.25 (0.16)
                      Q95                101.37 (0.12)     101.25 (0.25)     101.47 (0.25)
ABCD: Population estimates by Number of Lifetime ER Visits
ABCD Early Release Data Set. Recruitment to 09/2017.
                                       ABCD (Unweighted)    ABCD (Weighted, Design Corrected)
Variable              Count of Visits  % (se)               Pooled % (se)    Not-Pooled % (se)
Lifetime ER Visits    0                45.2 (0.5)           43.9 (1.4)       43.9 (1.5)
                      1                25.5 (0.4)           25.1 (0.6)       25.1 (0.6)
                      2                15.9 (0.3)           16.1 (0.4)       16.2 (0.7)
                      3                10.8 (0.3)           11.8 (0.6)       11.6 (0.7)
                      4                2.0 (0.1)            2.4 (0.3)        2.5 (0.3)
                      5                0.5 (0.1)            0.7 (0.1)        0.7 (0.1)
Poisson Regression of Lifetime (LT) ER Visits.

                                                Model Coefficient          Relative Risk Ratio
Regression                                      Parameter    Standard      Relative
Parameter         Regression Method             Estimate     Error         Risk        LCI     UCI
Sex: Female       MLE                           -0.148       0.019         0.86        0.83    0.90
                  Design: Pooled                -0.135       0.020         0.87        0.84    0.91
                  Design: Not Pooled            -0.130       0.020         0.88        0.84    0.91
                  Model: 2 Level, All sites     -0.149       0.016         0.86        0.83    0.89
                  Model: 2 Level, No twin       -0.151       0.016         0.86        0.83    0.89
                  Model: 3 Level (DEAP)         -0.145       0.021         0.87        0.83    0.90
FamInc: 25-49k    OLS                            0.131       0.033         1.14        1.07    1.22
                  Design: Pooled                 0.113       0.041         1.12        1.03    1.21
                  Design: Not Pooled             0.120       0.045         1.13        1.03    1.23
                  Model: 2 Level, All sites      0.144       0.039         1.15        1.07    1.25
                  Model: 2 Level, No twin        0.152       0.039         1.16        1.08    1.26
                  Model: 3 Level (DEAP)          0.141       0.040         1.15        1.06    1.25
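The relative risk columns in the table above follow from exponentiating the Poisson coefficients. The sketch below assumes a Wald-type 95% interval (estimate ± 1.96 standard errors on the log scale). The slides do not state how the limits were computed, but this assumption reproduces the tabled values for the rows checked here.

```python
from math import exp


def relative_risk_ci(coef, se, z=1.96):
    """Relative risk and Wald confidence limits from a log-scale coefficient."""
    return exp(coef), exp(coef - z * se), exp(coef + z * se)


# First row of each block in the table: (label, estimate, standard error).
rows = [
    ("Sex: Female", -0.148, 0.019),
    ("FamInc: 25-49k", 0.131, 0.033),
]
for label, coef, se in rows:
    rr, lci, uci = relative_risk_ci(coef, se)
    print(f"{label}: RR = {rr:.2f} ({lci:.2f}, {uci:.2f})")
```

Comparing rows within a block shows the point of the slide: the point estimates barely move across methods, while the standard errors, and hence the interval widths, depend on how the design is handled.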
Multivariate Modeling of ABCD Cross-sectional Relationships and Longitudinal Outcomes
• Multi-level modeling (ABCD DEAP)
Three levels with abcd_site and family defining the random effects at Level 3 and Level 2 (DEAP method)
Include key demographic and SES measures as Level 1 fixed effects/covariates
Explore scientifically-justified first level interactions between key demographic and SES covariates
No current evidence to support recommendation on use of weights in multilevel analysis
Full model-based approach to analyzing ABCD data on developmental outcomes (DEAP GAMM4 model)
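$$
\begin{aligned}
Y_{ijk} &= \beta_{0jk} + \beta_{1jk}\,x_{ijk} + R_{ijk} && \text{Level 1 Model} \\
\beta_{0jk} &= \delta_{00k} + U_{0jk} && \text{Level 2 Model Intercept} \\
\beta_{1jk} &= \delta_{10k} + U_{1jk} && \text{Level 2 Model Slope} \\
\delta_{00k} &= \gamma_{000} + V_{00k} && \text{Level 3 Model Intercept} \\
\delta_{10k} &= \gamma_{100} + V_{10k} && \text{Level 3 Model Slope}
\end{aligned}
$$

where:
i = 1, ... indexes the individual cohort member;
j = 1, ..., N indexes the cohort member's family;
k = 1, ..., 21 indexes the ABCD site/imaging center.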
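One way to read the three-level structure on this slide (children within families within sites) is as a data-generating process. The sketch below simulates from a random-intercept, random-slope model of that form. The symbol names follow conventional multilevel notation and are assumptions, and every numeric value (means, standard deviations, family counts, the covariate range) is hypothetical and chosen only for illustration. This is not the DEAP GAMM4 implementation.

```python
import random


def simulate_three_level(n_sites=21, families_per_site=5, seed=0,
                         gamma_000=90.0, gamma_100=1.0,
                         sd_site=1.0, sd_site_slope=0.1,
                         sd_family=2.0, sd_family_slope=0.1,
                         sd_resid=5.0):
    """Simulate (site, family, child, x, y) rows from a three-level model."""
    rng = random.Random(seed)
    rows = []
    for k in range(n_sites):
        # Level 3: site-specific intercept and slope.
        delta_00k = gamma_000 + rng.gauss(0, sd_site)
        delta_10k = gamma_100 + rng.gauss(0, sd_site_slope)
        for j in range(families_per_site):
            # Level 2: family-specific intercept and slope within the site.
            beta_0jk = delta_00k + rng.gauss(0, sd_family)
            beta_1jk = delta_10k + rng.gauss(0, sd_family_slope)
            # Level 1: one or two children per family (e.g. siblings or twins).
            for i in range(rng.choice([1, 2])):
                x = rng.uniform(9.0, 11.0)  # hypothetical covariate, e.g. age
                y = beta_0jk + beta_1jk * x + rng.gauss(0, sd_resid)
                rows.append((k, j, i, x, y))
    return rows


rows = simulate_three_level(seed=1)
print(len(rows), "simulated children across", len({r[0] for r in rows}), "sites")
```

Children who share a family or a site share random effects, which is the source of the intra-class correlation that the multilevel analysis is meant to absorb.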
ABCD Data Exploration and Analysis Portal
DEAP: “Explore” Interface
DEAP: Multi-level Analysis Interface
ABCD Data Access
NIMH Data Archive (NDA)
https://ndar.nih.gov/study.html?id=576
Thank you! Questions?
Supplemental slides
Standardized Measures: NIH Tool Box
http://www.healthmeasures.net/explore-measurement-systems/nih-toolbox
MRI Technology
How is this implemented?
◦ Big magnet
◦ Focuses on magnetic properties of water
◦ Cells have lots of water
◦ Blood can be more or less magnetic depending on how much oxygen is in it
◦ Background:
◦ MRI machines vary in:
  ◦ Imaging strength – 3T is the research standard
  ◦ Bore size – how big the hole is
  ◦ If optimized for brain imaging
  ◦ Head coil
◦ Many now “research dedicated”
◦ Musts:
  ◦ Researchers – have biophysics support/expertise
  ◦ Participants: Lie still, tolerate the noise
Source: Luke Hyde, University of Michigan
MRI: Types of data collected
Structural/Anatomical MRI
◦ Amount of grey matter
◦ Brain region size, shape
◦ Amount of cortical folding
◦ Gives you
  ◦ Development of the brain over time – our understanding of adolescents and normal/abnormal development
  ◦ Individual differences in size/shape/density/function of brain areas across individuals.
  ◦ e.g., SES effects on neural structure, amygdala structure for children with and without autism
  ◦ For better or worse, this is compelling to the public/policy makers
  ◦ Predictor of a health or behavioral outcome
Source: Luke Hyde, University of Michigan
DTI: Types of data collected
Structural: White Matter
◦ Diffusion Tensor Imaging (DTI)
◦ White matter consists of axons carrying information from one area to another, often in bundles
◦ Maps the “highways” of information
◦ Uses the direction water is flowing
◦ Gives you:
  ◦ Individual level tracts across whole brain
  ◦ Look at individual differences in the development of these tracts
  ◦ How they correlate with predictors or outcomes
Source: Luke Hyde, University of Michigan
fMRI: Types of data collected
Functional MRI
◦ Uses BOLD (Blood Oxygen Level Dependent) signal
  ◦ Blood changes in magnetization when oxygenated
◦ See which brain areas are activated as participants do a task – anything that involves thinking!
◦ Not clear if input or output blood flow
◦ Relatively “slow” – only every 2 seconds
◦ Relatively “large” – 1 voxel = 2 x 2 x 2 mm
  ◦ Millions of neurons
◦ Indirect measure of brain activity
◦ Can access most of the brain!
◦ Can be:
  ◦ Task-based
  ◦ “resting”
◦ Analyzed as:
  ◦ Specific brain areas (brain mapping)
  ◦ Networks and how they cohere
Source: Luke Hyde, University of Michigan
Understanding complex pathways
[Figure: pathway diagram; the only recoverable annotations are the significance markers **** and ns.]
Source: Luke Hyde, University of Michigan
Brain as predictor: So why use it?
Can be a better predictor than self-report
◦ Berkman & Falk (2013). Beyond brain mapping: Using neural measures to predict real-world outcomes. Current Directions in Psychological Science.
◦ Predicting treatment success
◦ Neurofeedback
Can tell us more about the underlying thought process that can’t be reflected on.
◦ E.g., emotion versus cognitive areas
◦ Adolescents
Source: Luke Hyde, University of Michigan
Whole Brain and Cortical Surface Analysis
AFNI: https://afni.nimh.nih.gov/
FreeSurfer: https://surfer.nmr.mgh.harvard.edu/