Neuroscience on a Population Scale: Design, Measurement ...

Steven G. Heeringa

Senior Research Scientist and Associate Director

Survey Research Center, Institute for Social Research

University of Michigan

Annual Workshop

Michigan Program in Survey Methods

University of Michigan

October 25, 2019

Neuroscience on a Population Scale: Design, Measurement, Data Integration and Analysis.

Outline and focus of today’s talk

• Unique challenges in population neuroscience studies
◦ Design
◦ Measurement
◦ Data types and integration
◦ Statistical analysis and interpretation

• Adolescent Brain Cognitive Development (ABCD) Study

Design Challenges: Subject recruitment, sampling plans for population neuroscience research

Both internal and external validity are needed

Study participation is demanding, so obtaining consent and achieving adequate response is a major challenge.

Portability/geographic reach of the measurement technology

Full, conventional probability sample designs like those employed in epidemiological studies such as the National Health and Nutrition Examination Survey (NHANES) may not be feasible.

Internal Validity, External Validity and “Population Representativeness” in Epidemiological Studies.

Keiding, N. and Louis, T.A. 2016. “Perils and potentials of self-selected entry to epidemiological studies and surveys.” J. R. Statist. Soc. A, 179, Part 2, pp. 319–376.

However, self-selection dramatically departs from the traditional, so-called gold standard approaches of targeted enrolment to scientific studies and sampling frame-based surveys. Traditionalists argue that we must adhere to the values of planned accrual and follow-up for all studies and identification of a sampling frame for surveys and possibly also for epidemiological and other such studies. Others propose that we should stop worrying about it and open up accrual, using modern approaches (covariate adjustments, find instrumental variables, ‘big data’) to make the necessary adjustments.

Elliott, M.R. and Valliant, R.L. 2017. “Inference for nonprobability samples.” Statistical Science, 32(2):249–264.

Additional design/measurement challenges in population neuroscience studies.

The Brain R² Problem: Multi-level (brain, individual, social networks, community, environment, culture).

Sample size and power to detect small effects and interactions
• Important outcomes are likely a function of many small effects.
• Analogy to trends in analysis of genetic/epigenetic contributions to health and other outcomes.
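To make the small-effects point concrete, the sketch below computes the approximate sample size needed to detect a single correlation r with a two-sided test, using the standard Fisher z approximation. It assumes independent observations (clustering would raise n further), alpha = 0.05 and 80% power; `n_for_correlation` is a hypothetical helper, not part of any ABCD tooling.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size to detect a true correlation r with a two-sided
    test, via the Fisher z approximation:
        n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))**2 + 3
    Assumes independent observations (no design effect)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    for r in (0.10, 0.05, 0.02):
        print(f"r = {r:.2f}: n = {n_for_correlation(r)}")
```

Because n grows roughly as 1/r², halving the effect size roughly quadruples the required sample (about 783 for r = 0.10 versus about 3,138 for r = 0.05 under these assumptions).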

Types of Measurement in Population Neuroscience Studies

DNA

Epigenetics

Biomarkers

Child Assessments

Parent Interviews

Family History

Environment

School and Admin Records

MRI

DTI

fMRI

Epidemiological, Statistical and Computational Challenges in Population Neuroscience

• Descriptive/analytical? “Left brain”/“right brain”
• Informative sampling, subject recruitment design
• Selection mechanisms (and resulting bias) in recruitment
• Lack of independence, correlated observations
◦ Geographic clustering of subjects
◦ Clustering in measurement (centers, interviewers, instruments)
• Latent constructs
• Missing data
• Longitudinal analysis of growth, change
• Analysis of brain image data
◦ Spatial correlation (3D array of voxels)
◦ Temporal correlation (time series, fMRI)
◦ Image correction, registration/standardization, smoothing
◦ Dimension reduction, creating summary measures
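The smoothing step is often done with a Gaussian kernel specified by its full width at half maximum (FWHM), where FWHM = 2√(2 ln 2)·σ ≈ 2.355σ. The sketch below converts FWHM to σ in voxel units and smooths one line of voxels; because a 3D Gaussian is separable, applying the same 1D pass along each axis gives the 3D result (up to edge handling). The 6 mm kernel is a hypothetical choice, the 2 mm voxel size echoes the fMRI supplemental slide, and the function names are illustrative, not those of AFNI or FreeSurfer.

```python
import math
from typing import List, Optional, Sequence

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.3548


def fwhm_to_sigma_vox(fwhm_mm: float, voxel_mm: float) -> float:
    """Convert a Gaussian kernel's FWHM (mm) to its standard deviation in voxels."""
    return fwhm_mm / FWHM_PER_SIGMA / voxel_mm


def gaussian_kernel(sigma_vox: float, radius: Optional[int] = None) -> List[float]:
    """Normalized 1D Gaussian weights on integer voxel offsets -radius..radius."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma_vox))  # truncate at about 3 sigma
    weights = [math.exp(-0.5 * (x / sigma_vox) ** 2) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve one line of voxels with the kernel, renormalizing at the edges."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = wsum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - r
            if 0 <= j < len(signal):
                acc += w * signal[j]
                wsum += w
        out.append(acc / wsum)
    return out


if __name__ == "__main__":
    sigma = fwhm_to_sigma_vox(6.0, 2.0)  # hypothetical 6 mm kernel on 2 mm voxels
    kernel = gaussian_kernel(sigma)
    spike = [0.0] * 10 + [1.0] + [0.0] * 10  # a single "active" voxel
    smoothed = smooth_1d(spike, kernel)
    print(f"sigma = {sigma:.3f} voxels, taps = {len(kernel)}, peak = {max(smoothed):.3f}")
```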

Analytic Aims for Population Neuroscience

Brain as Brain: Anatomy, networks

Brain as Outcome: e.g., does regular marijuana use (or screen time) during adolescent development affect brain development (morphology) or function?

Brain as Predictor: e.g., of educational attainment.

Computational and Statistical Tools
• Whole Brain Computations and Graphical Display
◦ e.g., AFNI
• Cortical Surface Analysis
◦ e.g., SUMA, FreeSurfer, SurfStat
• GLMs, GAMMs
◦ R, SAS, Stata, etc.

• Machine learning (?)

• Pace of development and enhancement of methodology and software is rapid!

ADOLESCENT BRAIN COGNITIVE DEVELOPMENT STUDY

https://abcdstudy.org/index.html

Overview of ABCD
• ABCD is the largest long-term study of brain development in the United States.
• Baseline cohort of n=11,875 children aged 9-10 recruited 09/2016-10/2018.
• Recruitment and data collection at 21 sites nationally.
• Coordinating site: University of California, San Diego.
• Special twin sample recruitment (n=1,727) at 4 sites.
• Longitudinal design to follow the baseline cohort until age 20.
• “Open science model” for hypothesis generation and data sharing.
• NIH and CDC partners. Funded by NIH/NIDA.

Distribution of ABCD Study Sites, https://abcdstudy.org/

ABCD Design Strategies to Maximize External Validity and Statistical Efficiency for Population-based Analysis

21 Sites and Catchment Areas
• Nationally distributed
• Demographically and socio-economically diverse
• May be treated as a “pseudo” sample of primary stage units

Probability sample of schools and students within individual catchment areas
• Introduces randomization and “representativeness” to the recruitment
• Still highly vulnerable to selection bias due to noncooperation by schools and parents within schools

Demographic controls (targets) for site-specific baseline samples and the national aggregate
• Achieve minimum sample sizes and covariate balance with respect to the U.S. population of eligible children
• Minimize weighting inefficiencies in descriptive analysis

Informative Features of the ABCD Sample Design for Population-based Estimation and Inference

• Clustering of observations on ABCD children

◦ Sites, schools, families, interviewers, imaging equipment
◦ Observations are not independent: intra-class correlations
◦ Approaches:
(1) Model the clustering as random effects (multilevel models, MLM).
(2) Use distribution-free robust methods that account for clustering in variance estimation.

• Selectivity (random and nonrandom) of the sample selection/recruitment

◦ Site selection, sample stratification, school consent, parental consent
◦ Covariate adjustment in analysis models
◦ Calibration weighting to established population controls
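The intra-class correlations listed above translate into a loss of effective sample size. For equal-sized clusters of m observations with intra-class correlation ρ, Kish's design effect for a mean is deff = 1 + (m - 1)ρ, and the effective sample size is n/deff. A minimal sketch, assuming equal cluster sizes; the cluster sizes and ICC values in the example are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Kish design effect for a mean when observations come in equal-sized
    clusters with intra-class correlation `icc`: deff = 1 + (m - 1) * icc."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_n(n: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    return n / design_effect(cluster_size, icc)


if __name__ == "__main__":
    # Hypothetical: children in sibling pairs (m = 2) with a family-level ICC of 0.5
    print(design_effect(2, 0.5))  # 1.5
    print(effective_n(1000, 2, 0.5))
    # Hypothetical: 500 children per site with a small site-level ICC of 0.01
    print(design_effect(500, 0.01))
```

The second case shows why site clustering matters even when the site-level ICC is small: with hundreds of children per site, a modest ρ can inflate variances several-fold.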

Estimation of ABCD Population Descriptive Statistics

• Examples of descriptive statistics
◦ Means, quantiles of continuous variables: BMI, NIH Toolbox test scores, polygenic scores, hippocampus volume, measures of neural activity.
◦ Categorical proportions for binary, multinomial and count variables.

• Employ software specifically developed for design-based estimation from clustered sample data, e.g., the R survey package.

• Employ a propensity-based population weight factor in estimation of the descriptive statistics.
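As an illustration of design-based estimation, the sketch below computes a weighted mean and a Taylor-linearized standard error that treats clusters (e.g., sites) as primary sampling units drawn with replacement from a single stratum. This is broadly the approximation design-based software such as R's survey package uses when a design is declared with cluster identifiers and weights and no finite population correction, but `weighted_mean_cluster_se` is a hypothetical standalone helper, and the toy data and weights are made up.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def weighted_mean_cluster_se(y: Sequence[float], w: Sequence[float],
                             cluster: Sequence[Hashable]) -> Tuple[float, float]:
    """Weighted mean and its Taylor-linearized standard error, treating each
    cluster as a primary sampling unit drawn with replacement (one stratum)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized score of each case, summed within its cluster
    totals: Dict[Hashable, float] = defaultdict(float)
    for yi, wi, ci in zip(y, w, cluster):
        totals[ci] += wi * (yi - mean) / total_w
    t = list(totals.values())
    n_c = len(t)
    if n_c < 2:
        raise ValueError("need at least two clusters to estimate a variance")
    t_bar = sum(t) / n_c
    var = n_c / (n_c - 1) * sum((tc - t_bar) ** 2 for tc in t)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical toy data: 6 children in 3 sites with made-up weights
    y = [94.0, 90.5, 88.0, 97.5, 92.0, 85.0]
    w = [300.0, 700.0, 450.0, 450.0, 1200.0, 900.0]
    site = ["A", "A", "B", "B", "C", "C"]
    est, se = weighted_mean_cluster_se(y, w, site)
    print(f"weighted mean = {est:.2f}, cluster-robust SE = {se:.2f}")
```

With equal weights and every child treated as its own cluster, the standard error reduces to the familiar s/√n; grouping children into clusters lets between-cluster variation drive the standard error instead.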

Propensity-based Population Weight for ABCD Participants

• The 2011-2015 American Community Survey (ACS) serves as a benchmark for characterizing U.S. children aged 9 and 10.
◦ n=376,370 observations of 9- and 10-year-olds and their families
◦ Key demographic and SES variables with consistent measurement in ACS and ABCD baseline are identified: age, sex, race/ethnicity, family income, family type, household size, Census region, parent employment

• Use weighted logistic regression to model the logit of the probability that Y=1, i.e., that the case belongs to ABCD rather than ACS.

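(i) $\operatorname{logit}[\Pr(Y=1 \mid X)] = \beta_0 + \beta_1 X_1 + \cdots + \beta_P X_P$

(ii) $\hat{p}_i(Y=1 \mid X) = \dfrac{\exp(X_i\hat{\beta})}{1 + \exp(X_i\hat{\beta})}$

(iii) $\text{Weight}_i = 1/\hat{p}_i(Y=1 \mid X)$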

Fitted ABCD Propensity Model (Logistic)

Predictor | Category | b̂ | exp(b̂) | LCL exp(b̂) | UCL exp(b̂)
Intercept | | -5.11 | - | - | -
Age | 9 | 0.112 | 1.12 | 0.95 | 1.32
Sex | Male | 0.037 | 1.04 | 0.88 | 1.22
Race/Ethnicity | White | -0.787 | 0.46 | 0.34 | 0.61
  | Black | -0.293 | 0.75 | 0.52 | 1.07
  | Hispanic | -0.849 | 0.43 | 0.31 | 0.60
  | Asian | -1.570 | 0.21 | 0.12 | 0.36
Family Income | <$25K | -0.711 | 0.49 | 0.34 | 0.71
  | $25K-$49K | -0.830 | 0.44 | 0.31 | 0.62
  | $50K-$74K | -0.705 | 0.49 | 0.35 | 0.69
  | $75K-$99K | -0.366 | 0.69 | 0.50 | 0.97
  | $100K-$199K | -0.149 | 0.86 | 0.64 | 1.15
Family Type | Married | 1.136 | * | * | *
Parent Employment | Married, 2 in LF | -0.846 | * | * | *
  | Married, 1 in LF | -1.037 | * | * | *
  | Married, 0 in LF | -1.281 | * | * | *
  | Single, in LF | 0.027 | * | * | *
Region | Northeast | -0.424 | 0.65 | 0.51 | 0.84
  | Midwest | -0.489 | 0.61 | 0.48 | 0.78
  | South | -0.712 | 0.49 | 0.39 | 0.61
Household size | 2-3 | 0.008 | 1.01 | 0.72 | 1.41
  | 4 | -0.115 | 0.89 | 0.66 | 1.20
  | 5 | -0.105 | 0.90 | 0.66 | 1.23
  | 6 | 0.070 | 1.07 | 0.76 | 1.51

Individual Weight Examples
• Mean weight = $(11{,}874/8{,}211{,}605)^{-1} \approx (0.00145)^{-1} \approx 690$

• Example 1: 9-year-old African-American girl from New England who lives in a family of 4 with two working parents and a family income of $100K-$199K per year:

$\hat{p}_i = \dfrac{\exp(-5.11 + 0.112 - 0.293 - 0.149 + 1.136 - 0.846 - 0.424 - 0.115)}{1 + \exp(-5.11 + 0.112 - 0.293 - 0.149 + 1.136 - 0.846 - 0.424 - 0.115)} \approx 0.003372, \quad W_i = 1/\hat{p}_i \approx 296.60$

• Example 2: 10-year-old girl of Asian ancestry residing in the South in a 4-person family with two parents who are not working and $25K-$49K total annual income:

$\hat{p}_i = \dfrac{\exp(-5.11 - 1.570 - 0.830 + 1.136 - 1.281 - 0.115 - 0.712)}{1 + \exp(-5.11 - 1.570 - 0.830 + 1.136 - 1.281 - 0.115 - 0.712)} \approx 0.000207, \quad W_i = 1/\hat{p}_i \approx 4828.09$
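The two worked examples can be reproduced directly from the fitted coefficients: add the intercept and the coefficients of the case's active categories, apply the inverse logit, and invert the probability. The sketch assumes, as the worked examples do, that female, age 10 and the omitted level of each other predictor are reference categories contributing zero; the dictionary keys are hypothetical labels, and only the coefficients needed for the two examples are included.

```python
import math
from typing import Iterable, Tuple

# Coefficients copied from the fitted ABCD propensity model table.
# Reference categories contribute 0; key names are hypothetical labels.
COEF = {
    "intercept": -5.11,
    "age_9": 0.112,
    "race_black": -0.293,
    "race_asian": -1.570,
    "income_25_49k": -0.830,
    "income_100_199k": -0.149,
    "married": 1.136,
    "married_2_in_lf": -0.846,
    "married_0_in_lf": -1.281,
    "hhsize_4": -0.115,
    "region_northeast": -0.424,
    "region_south": -0.712,
}


def propensity_weight(active: Iterable[str]) -> Tuple[float, float]:
    """Return (p_hat, weight) for a case whose dummy indicators in `active` equal 1."""
    eta = COEF["intercept"] + sum(COEF[name] for name in active)
    p_hat = math.exp(eta) / (1.0 + math.exp(eta))  # inverse logit
    return p_hat, 1.0 / p_hat                       # inverse-propensity weight


EXAMPLE_1 = ["age_9", "race_black", "income_100_199k", "married",
             "married_2_in_lf", "region_northeast", "hhsize_4"]
EXAMPLE_2 = ["race_asian", "income_25_49k", "married", "married_0_in_lf",
             "hhsize_4", "region_south"]

if __name__ == "__main__":
    for label, case in (("Example 1", EXAMPLE_1), ("Example 2", EXAMPLE_2)):
        p, w = propensity_weight(case)
        print(f"{label}: p_hat = {p:.6f}, weight = {w:.2f}")
```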

Distribution of ABCD Baseline Analysis Weights

ABCD October 25, 2018 Data Set.

[Figure: density plot of the analysis weight (rpwgtmeth1); x-axis 0 to 2000, y-axis density]

Distribution of ABCD Analysis Weights by Sex of Child

ABCD Final Baseline Data Set. N=11,873.

Distributions of ABCD Analysis Weights by Family Income Category


ABCD Demographic Distributions* (data through 10/25/2018)

Demographic/SES Characteristic | Category | ABCD Sample n | ABCD Unweighted % | ABCD Weighted** % | ACS (Weighted) %
Sex | Male | 6,064 | 52.3% | 51.2% | 51.2%
  | Female | 5,530 | 47.7% | 48.8% | 48.8%
Age | 9 | 6,036 | 52.1% | 49.6% | 49.6%
  | 10 | 5,558 | 47.9% | 50.4% | 50.4%
Race/Ethnicity | Hispanic | 2,379 | 20.5% | 24.0% | 24.0%
  | NH White | 6,104 | 52.6% | 52.4% | 52.4%
  | NH Black | 1,683 | 14.5% | 13.4% | 13.4%
  | Asian | 253 | 2.2% | 3.6% | 4.7%
  | All Other | 1,175 | 10.1% | 6.4% | 5.5%
Total | | 11,594 | 100.0% | 100.0% | 100.0%

*Percentages may not add to 100% due to rounding. Item missing data singly imputed using SAS PROC MI, FCS method.
**Inverse propensity weighting to joint distributions from 2011-2015 ACS.

ABCD SES Distributions* (data through 10/25/2018)

Demographic/SES Characteristic | Category | ABCD Sample n | ABCD Unweighted % | ABCD Weighted** % | ACS (Weighted) %
Family Income | <$25,000 | 1,782 | 15.4% | 20.2% | 21.5%
  | $25,000-$49,999 | 1,735 | 15.0% | 20.7% | 21.7%
  | $50,000-$74,999 | 1,628 | 14.0% | 17.5% | 17.0%
  | $75,000-$99,999 | 1,646 | 14.2% | 13.1% | 12.5%
  | $100,000-$199,999 | 3,500 | 30.2% | 21.6% | 20.5%
  | >=$200,000 | 1,303 | 11.2% | 7.0% | 6.8%
Total | | 11,594 | 100.0% | 100.0% | 100.0%

*Percentages may not add to 100% due to rounding. Item missing data singly imputed using SAS PROC MI, FCS method.
**Inverse propensity weighting to joint distributions from 2011-2015 ACS.

ABCD: Population Estimates by SES

Characteristic | Category | ABCD (Unweighted) % (se) | ABCD (Weighted, Design Corrected) Pooled % (se) | ABCD (Weighted, Design Corrected) Not-Pooled % (se)
Family Income | <$25K | 16.2 (0.34) | 20.0 (2.4) | 20.0 (2.3)
  | $25K-$49K | 14.9 (0.33) | 20.5 (1.6) | 20.1 (1.6)
  | $50K-$74K | 13.8 (0.3) | 17.5 (0.9) | 17.0 (0.9)
  | $75K-$99K | 14.3 (0.3) | 13.2 (0.8) | 13.2 (0.8)
  | $100K-$199K | 29.6 (0.4) | 21.7 (2.2) | 22.4 (2.2)
  | $200K+ | 11.2 (0.3) | 7.1 (1.0) | 7.4 (1.1)
Parent Employment | Married, 2 in LF | 50.3 (0.5) | 41.9 (1.9) | 42.2 (1.9)
  | Married, 1 in LF | 21.9 (0.4) | 23.1 (1.7) | 23.2 (1.8)
  | Married, 0 in LF | 1.3 (0.1) | 2.0 (0.3) | 1.9 (0.2)
  | Single, M, in LF | 1.6 (0.1) | 1.9 (0.2) | 2.0 (0.2)
  | Single, M, Not in LF | 0.4 (0.1) | 0.4 (0.1) | 0.4 (0.1)
  | Single, F, in LF | 19.5 (0.4) | 24.0 (1.6) | 22.4 (2.2)
  | Single, F, Not in LF | 5.1 (0.2) | 6.7 (0.6) | 7.4 (1.1)


ABCD: Estimates of the Population Distribution of NIH Toolbox Flanker and Reading Test Scores (uncorrected)

ABCD Final Baseline Data Set. n=11,873

Variable | Distribution Statistic | ABCD (Unweighted) | Pooled | Not-Pooled
NIH Toolbox Flanker Test (Uncorrected) | n | 11,712 | 11,712 | 9,999
  | Mean | 94.0 (0.08) | 93.83 (0.30) | 93.83 (0.27)
  | Q5 | 75.98 (0.34) | 75.52 (0.96) | 75.62 (0.99)
  | Q25 | 88.95 (0.14) | 88.70 (0.50) | 88.72 (0.48)
  | Q50 (Median) | 94.94 (0.10) | 94.79 (0.26) | 94.76 (0.22)
  | Q75 | 99.79 (0.09) | 99.65 (0.19) | 99.64 (0.16)
  | Q95 | 105.77 (0.11) | 105.74 (0.18) | 105.77 (0.16)
NIH Toolbox Reading Test (Uncorrected) | n | 11,704 | 11,704 | 9,991
  | Mean | 90.86 (0.06) | 90.60 (0.07) | 90.74 (0.25)
  | Q5 | 79.22 (0.18) | 78.77 (0.49) | 78.71 (0.48)
  | Q25 | 88.69 (0.09) | 86.40 (0.28) | 86.46 (0.29)
  | Q50 (Median) | 90.25 (0.04) | 90.05 (0.10) | 90.20 (0.11)
  | Q75 | 94.26 (0.06) | 94.00 (0.31) | 94.25 (0.16)
  | Q95 | 101.37 (0.12) | 101.25 (0.25) | 101.47 (0.25)
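The weighted quantiles (Q5 to Q95) in this table can be estimated by inverting the weighted empirical distribution function. The sketch below uses one common definition (the smallest value whose cumulative weight share reaches p); other definitions interpolate between adjacent values, and the slides do not say which one was used. `weighted_quantile` is a hypothetical helper and the scores and weights in the example are made up.

```python
from typing import Sequence


def weighted_quantile(y: Sequence[float], w: Sequence[float], p: float) -> float:
    """Smallest y whose cumulative weight share reaches p
    (inverse of the weighted empirical CDF, no interpolation)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    pairs = sorted(zip(y, w))  # order cases by value, carrying their weights
    total = sum(w)
    cum = 0.0
    for yi, wi in pairs:
        cum += wi
        if cum / total >= p:
            return yi
    return pairs[-1][0]  # guard against floating-point shortfall at p = 1


if __name__ == "__main__":
    scores = [88.0, 94.0, 99.5, 76.0, 105.0]      # hypothetical test scores
    weights = [650.0, 300.0, 1200.0, 410.0, 700.0]  # hypothetical analysis weights
    for p in (0.05, 0.25, 0.50, 0.75, 0.95):
        print(f"Q{int(p * 100)} = {weighted_quantile(scores, weights, p)}")
```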

ABCD: Population Estimates by Number of Lifetime ER Visits

ABCD Early Release Data Set. Recruitment to 09/2017.

Variable | Count of Visits | ABCD (Unweighted) % (se) | Pooled % (se) | Not-Pooled % (se)
Lifetime ER Visits | 0 | 45.2 (0.5) | 43.9 (1.4) | 43.9 (1.5)
  | 1 | 25.5 (0.4) | 25.1 (0.6) | 25.1 (0.6)
  | 2 | 15.9 (0.3) | 16.1 (0.4) | 16.2 (0.7)
  | 3 | 10.8 (0.3) | 11.8 (0.6) | 11.6 (0.7)
  | 4 | 2.0 (0.1) | 2.4 (0.3) | 2.5 (0.3)
  | 5 | 0.5 (0.1) | 0.7 (0.1) | 0.7 (0.1)

Poisson Regression of Lifetime ER Visits

Regression Parameter | Regression Method | Parameter Estimate | Standard Error | Relative Risk | LCI | UCI
Sex: Female | MLE | -0.148 | 0.019 | 0.86 | 0.83 | 0.90
  | Design: Pooled | -0.135 | 0.020 | 0.87 | 0.84 | 0.91
  | Design: Not Pooled | -0.130 | 0.020 | 0.88 | 0.84 | 0.91
  | Model: 2 Level, All sites | -0.149 | 0.016 | 0.86 | 0.83 | 0.89
  | Model: 2 Level, No twin | -0.151 | 0.016 | 0.86 | 0.83 | 0.89
  | Model: 3 Level (DEAP) | -0.145 | 0.021 | 0.87 | 0.83 | 0.90
FamInc: $25K-$49K | OLS | 0.131 | 0.033 | 1.14 | 1.07 | 1.22
  | Design: Pooled | 0.113 | 0.041 | 1.12 | 1.03 | 1.21
  | Design: Not Pooled | 0.120 | 0.045 | 1.13 | 1.03 | 1.23
  | Model: 2 Level, All sites | 0.144 | 0.039 | 1.15 | 1.07 | 1.25
  | Model: 2 Level, No twin | 0.152 | 0.039 | 1.16 | 1.08 | 1.26
  | Model: 3 Level (DEAP) | 0.141 | 0.040 | 1.15 | 1.06 | 1.25
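Poisson regression coefficients are on the log scale, so the relative risk is exp(b) and 95% Wald limits are exp(b ± 1.96·SE). Recomputing the sex and family-income rows this way reproduces the relative risks and limits shown in the table to two decimals. A minimal sketch; `rate_ratio_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist
from typing import Tuple


def rate_ratio_ci(coef: float, se: float, level: float = 0.95) -> Tuple[float, float, float]:
    """Exponentiate a log-link (Poisson) coefficient and its Wald interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


if __name__ == "__main__":
    # Estimates and standard errors taken from the first row of each block in the table
    rows = {"Sex: Female": (-0.148, 0.019), "FamInc: $25K-$49K": (0.131, 0.033)}
    for name, (b, se) in rows.items():
        rr, lo, hi = rate_ratio_ci(b, se)
        print(f"{name}: RR = {rr:.2f} ({lo:.2f}, {hi:.2f})")
```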

Multivariate Modeling of ABCD Cross-sectional Relationships and Longitudinal Outcomes

• Multi-level modeling (ABCD DEAP)

Three levels with abcd_site and family defining the random effects at Level 3 and Level 2 (DEAP method)

Include key demographic and SES measures as Level 1 fixed effects/covariates

Explore scientifically justified Level 1 interactions between key demographic and SES covariates

No current evidence to support a recommendation on the use of weights in multilevel analysis

Full model-based approach to analyzing ABCD data on developmental outcomes (DEAP GAMM4 model)

Level 1 model: $Y_{ijk} = \pi_{0jk} + \pi_{1jk} x_i + R_{ijk}$

where: i = 1, ... indexes the individual cohort member; j = 1, ..., N indexes the cohort member's family; k = 1, ..., 21 indexes the ABCD site/imaging center.

Level 2 model, intercept: $\pi_{0jk} = \beta_{00k} + U_{0jk}$
Level 2 model, slope: $\pi_{1jk} = \beta_{10k} + U_{1jk}$

Level 3 model, intercept: $\beta_{00k} = \gamma_{000} + V_{00k}$
Level 3 model, slope: $\beta_{10k} = \gamma_{100} + V_{10k}$

ABCD Data Exploration and Analysis Portal

DEAP: “Explore” Interface

DEAP: Multi-level Analysis Interface

ABCD Data Access

NIMH Data Archive (NDA): https://ndar.nih.gov/study.html?id=576

Thank you! Questions?


Supplemental slides

Standardized Measures: NIH Toolbox

http://www.healthmeasures.net/explore-measurement-systems/nih-toolbox

MRI Technology

How is this implemented?
◦ Big magnet
◦ Focuses on magnetic properties of water
  ◦ Cells have lots of water
  ◦ Blood can be more or less magnetic depending on how much oxygen is in it
◦ Background:
  ◦ MRI machines vary in:
    ◦ Imaging strength: 3T is the research standard
    ◦ Bore size: how big the hole is
    ◦ Whether optimized for brain imaging
    ◦ Head coil
  ◦ Many are now “research dedicated”
◦ Musts:
  ◦ Researchers: have biophysics support/expertise
  ◦ Participants: lie still, tolerate the noise

Source: Luke Hyde, University of Michigan

MRI: Types of data collected

Structural/Anatomical MRI
◦ Amount of grey matter
◦ Brain region size, shape
◦ Amount of cortical folding
◦ Gives you:
  ◦ Development of the brain over time: our understanding of adolescents and normal/abnormal development
  ◦ Individual differences in size/shape/density/function of brain areas across individuals
    ◦ e.g., SES effects on neural structure, amygdala structure for children with and without autism
    ◦ For better or worse, this is compelling to the public and policy makers
  ◦ Predictor of a health or behavioral outcome


DTI: Types of data collected

Structural: White Matter
◦ Diffusion Tensor Imaging (DTI)
◦ White matter consists of axons carrying information from one area to another, often in bundles
◦ Maps the “highways” of information
◦ Uses the direction in which water diffuses
◦ Gives you:
  ◦ Individual-level tracts across the whole brain
  ◦ Individual differences in the development of these tracts
  ◦ How they correlate with predictors or outcomes


fMRI: Types of data collected
Functional MRI
◦ Uses the BOLD (Blood Oxygen Level Dependent) signal
◦ Blood's magnetic properties change with its oxygenation
◦ See which brain areas are activated as participants do a task (anything that involves thinking!)
  ◦ Not clear if input or output blood flow
  ◦ Relatively “slow”: only one measurement every 2 seconds
  ◦ Relatively “large”: 1 voxel = 2 x 2 x 2 mm, i.e., millions of neurons
  ◦ Indirect measure of brain activity
◦ Can access most of the brain!
◦ Can be:
  ◦ Task-based
  ◦ “Resting”
◦ Analyzed as:
  ◦ Specific brain areas (brain mapping)
  ◦ Networks and how they cohere


Understanding complex pathways



Brain as predictor: So why use it?

Can be a better predictor than self-report
◦ Berkman & Falk (2013). Beyond brain mapping: Using neural measures to predict real-world outcomes. Current Directions in Psychological Science.
◦ Predicting treatment success
◦ Neurofeedback

Can tell us more about underlying thought processes that people cannot report through reflection.

◦ E.g., emotion versus cognitive areas

◦ Adolescents


Whole Brain and Cortical Surface Analysis
AFNI: https://afni.nimh.nih.gov/

FreeSurfer: https://surfer.nmr.mgh.harvard.edu/
