19/01/2018 Bertrand Thirion – Statistics/Learning at Paris-Saclay 1
Toward a rigorous statistical framework for brain mapping
Bertrand Thirion, [email protected]
Cognitive neuroscience
How are cognitive activities affected or controlled by neural circuits in the brain?
The brain, the mind and the scanner
[Diagram linking cognitive theories, brain, scanner, fMRI data, and the experimental paradigm (stimuli); arrows labelled “encoding”, “decoding”, and “brain mapping”]
Mapping cognitive functions to brain activity
● right hand – left hand
● listen – read
● button press – reading
● computation – instructions
● expression – intention
● guess the gender
● face trustworthiness
● false belief – mechanistic (auditory)
● false belief – mechanistic (visual)
● grasping – orientation judgement
● hand – side judgement
● intention – random
● sentence – checkerboards
● saccades – fixation
● speech – non-speech
Resolution increases
● 2007: 3 mm (p ≈ 50,000)
● 2014: 1.5 mm (p ≈ 400,000)
● 2020: 0.5 mm? (p ≈ 10⁷)
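These voxel counts follow from a simple volume calculation; a quick sanity check, assuming a whole-brain volume of roughly 1.2×10⁶ mm³ (this volume is an assumption, not a figure from the talk):

```python
# Rough voxel counts for whole-brain coverage at each resolution.
# Assumed brain volume: ~1.2e6 mm^3 (about 1.2 L).
brain_mm3 = 1.2e6
counts = {res: brain_mm3 / res**3 for res in (3.0, 1.5, 0.5)}
for res, p in counts.items():
    print(f"{res} mm isotropic -> p ~ {p:,.0f} voxels")
```

At 3 mm this gives ~44,000 voxels, consistent with the p ≈ 50,000 on the slide.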
Better estimators for large-scale brain imaging
● Massive online dictionary learning
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
fMRI datasets are feature-rich
Discovering structure in fMRI
Can be captured by dictionary learning / sparse coding [Olshausen, Nature 1996]
→ Use of sparse PCA
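As an illustration, a sparse-PCA decomposition can be run with scikit-learn; the data, the estimator choice, and the penalty below are illustrative stand-ins, not the talk's actual pipeline:

```python
import numpy as np
from sklearn.decomposition import MiniBatchSparsePCA

rng = np.random.RandomState(0)
# Toy stand-in for fMRI data: 100 time points x 300 voxels
X = rng.randn(100, 300)

# Each component is a sparse spatial map; alpha controls sparsity
model = MiniBatchSparsePCA(n_components=10, alpha=1.0, random_state=0)
codes = model.fit_transform(X)   # (100, 10) temporal loadings
maps = model.components_         # (10, 300) sparse spatial maps
print(maps.shape, (maps == 0).mean())
```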
High-dimensional fMRI
● n = number of samples: 10² to 10⁶
● p = number of voxels: 10⁵ to 10⁶
“Has anyone on the ML run group-wise analysis on the HCP resting state data, and if so what tools did you use?
I am having memory issues when running more than 10 subjects and I was wondering if anyone has a way of getting around the large memory requirements when concatenating in time.”
Huge?
● Human Connectome Project: n = 2×10⁶, p = 2×10⁵, 2 TB of data
● Online dictionary learning [Mairal et al., ICML 2009]
● Constrained rather than penalized formulation
● How to go faster?
  – Work on batches of images and voxels
  – Online method in both the sample and feature dimensions
[Mensch et al. ICML 2016, IEEE TSP 2018]
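The online algorithm of Mairal et al. is available as scikit-learn's MiniBatchDictionaryLearning; a sketch of the streaming pattern that keeps only one batch in memory (batch sizes and the penalty are illustrative):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.RandomState(0)
dico = MiniBatchDictionaryLearning(n_components=8, alpha=0.5, random_state=0)

# Stream batches of samples; only one batch is ever held in memory
for _ in range(20):
    batch = rng.randn(50, 100)   # 50 samples x 100 features
    dico.partial_fit(batch)

codes = dico.transform(rng.randn(5, 100))   # sparse codes for new data
print(dico.components_.shape, codes.shape)
```

Mensch et al. additionally subsample the feature dimension at each step, which this sketch does not do.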
Stochastic gradient approaches
http://amensch.fr/research/2016/06/10/modl.html
Stochastic gradient approaches
10-fold gain in CPU time without loss in accuracy
Can be used for recommender systems
http://amensch.fr/research/2016/06/10/modl.html
[Mensch et al., ICML 2016; IEEE TSP 2018]
Brain atlases
[Mensch et al. ICML 2016 IEEE TSP 2018]
Leveraging rest data for brain decoding
Different datasets share common patterns and representations [Bzdok et al., PLoS Comp Biol 2016; Mensch et al., NIPS 2017]
[Diagram: joint training across task-fMRI studies (HCP, Archi, CamCan; approximate sizes on the slide: ~200,000, ~336, ~50). Z-score images pass through a cognitive-aware projection built on sparse decompositions of resting-state data, yielding shared spatial and cognitive representations (example conditions: FACES, VIDEO ONLY, HORIZONTAL CHECKERBOARD) and per-dataset classification.]
Advantage of large-scale analysis
Information transferred from large datasets (HCP) to smaller ones increases classification accuracy [Mensch et al., NIPS 2017]
Outline
● Massive online dictionary learning
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
Compression in the image domain
● Reduce the complexity of learning algorithms: p → k ≪ p
● Random projections: a fast, generic solution, but
  – sub-optimal for structured signals
  – not invertible when p and k are large
● Local redundancy → feature-grouping strategies / clustering: “super-pixels”
  – fast clustering procedures needed (large-k regime)
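Feature grouping with spatial connectivity can be sketched with scikit-learn's FeatureAgglomeration; the 2-D grid below is a toy stand-in for a brain volume, and the cluster count is arbitrary:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph

rng = np.random.RandomState(0)
# 200 toy "images" on a 20x20 grid, flattened to p = 400 features
X = rng.randn(200, 400)

# Connectivity restricts merges to neighboring pixels -> "super-pixels"
connectivity = grid_to_graph(20, 20)
agglo = FeatureAgglomeration(n_clusters=50, connectivity=connectivity)
X_red = agglo.fit_transform(X)           # (200, 50) compressed features
X_back = agglo.inverse_transform(X_red)  # (200, 400) approximate inverse
print(X_red.shape, X_back.shape)
```

Unlike a random projection, the grouping has a cheap approximate inverse: each cluster's value is broadcast back to its member features.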
Compression by feature grouping
Crafting good image compression
● Key assumption: the signal of interest is L-Lipschitz
● Feature-grouping matrix; approximation and worst-case error bounds [equations shown on slide]
→ Need a fast method to learn balanced clusters
Denoising properties
● Noisy signal model; denoising by within-cluster averaging; equal-size clusters [equations shown on slide]
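The denoising effect of within-cluster averaging is easy to verify numerically; a toy sketch where the equal-size clusters are known in advance (in practice they must be learned):

```python
import numpy as np

rng = np.random.RandomState(0)
p, q = 1000, 10                          # p features in clusters of size q
beta = np.repeat(rng.randn(p // q), q)   # piecewise-constant true signal
x = beta + rng.randn(p)                  # noisy observation, sigma = 1

# Average within each (known) cluster, then broadcast back
x_hat = np.repeat(x.reshape(-1, q).mean(axis=1), q)

mse_raw = np.mean((x - beta) ** 2)       # ~ sigma^2 = 1
mse_avg = np.mean((x_hat - beta) ** 2)   # ~ sigma^2 / q = 0.1
print(mse_raw, mse_avg)
```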
Recursive nearest neighbor
Based on local decisions → fast (linear time), while avoiding percolation
[Thirion et al. Stamlins 2015]
Effect on data analysis tasks
Impressive speed-up and increased accuracy with respect to the non-compressed representation
– clustering has a denoising effect
[Hoyos-Idrobo et al., IEEE PAMI, under review]
More results
[Hoyos-Idrobo et al., IEEE PAMI, under review]
Outline
● Massive online dictionary learning
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
Brain activity decoding
[Diagram: voxel signals X₁, X₂, …, X_p are combined through a weight map w to predict behavior y]
● behavior = f(brain activity)
Penalized linear regression
> convex optimization
> set hyperparameters by cross-validation
● Ridge (shrinkage)
● Lasso (very sparse)
● Elastic net (sparsity + grouping)
● Smooth lasso (sparsity + smoothness)
● Total variation (piecewise sparsity)
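The first three penalties, with cross-validated hyperparameters, are directly available in scikit-learn; smooth lasso and total variation need dedicated spatial solvers and are omitted from this sketch:

```python
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV

rng = np.random.RandomState(0)
X = rng.randn(100, 200)        # n < p: ill-posed without a prior
w_true = np.zeros(200)
w_true[:5] = 2.0               # sparse ground truth
y = X @ w_true + rng.randn(100)

# Each *CV estimator sets its own hyperparameters by cross-validation
ridge = RidgeCV().fit(X, y)
lasso = LassoCV(cv=5).fit(X, y)
enet = ElasticNetCV(cv=5).fit(X, y)
for name, est in [("ridge", ridge), ("lasso", lasso), ("enet", enet)]:
    print(name, np.count_nonzero(est.coef_))
```

Ridge keeps all 200 coefficients (shrunk, not zeroed), while the Lasso selects a small support: the "shrinkage" versus "very sparse" contrast on the slide.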
Structure-inducing priors
● Large p → redundancy, latent structure
● Brain imaging: spatial regularity → small total variation
HCP dataset: “shape” versus “face” contrast across subjects
[Michel et al., TMI 2011; Gramfort et al., 2013; Eickenberg et al., MICCAI 2015; Dohmatob et al., PRNI 2014, 2015]
Optimizing TV takes time
[Varoquaux et al. GRETSI 2015]
Bagging of sparse clustered models
[Diagram: data (X, y) → clustering to create contiguous regions → solve the Lasso on the cluster-based representation → repeat over clusterings → average the models]
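A minimal sketch of this bagging scheme; the cluster count, the Lasso penalty, and randomizing the clustering by subsampling the training data are illustrative choices rather than the exact FReM settings:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_extraction.image import grid_to_graph
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(150, 400)          # 150 samples on a 20x20 grid
w_true = np.zeros(400)
w_true[100:110] = 1.0            # a contiguous active region
y = X @ w_true + rng.randn(150)

connectivity = grid_to_graph(20, 20)
coefs = []
for _ in range(10):
    idx = rng.choice(150, 120, replace=False)   # subsample -> new clustering
    agglo = FeatureAgglomeration(n_clusters=50, connectivity=connectivity)
    Xr = agglo.fit_transform(X[idx])
    lasso = Lasso(alpha=0.1).fit(Xr, y[idx])
    # Map cluster weights back to voxel space before averaging
    coefs.append(agglo.inverse_transform(lasso.coef_.reshape(1, -1)).ravel())

w_hat = np.mean(coefs, axis=0)   # the bagged voxel-level model
print(w_hat.shape)
```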
Computationally efficient structure
State-of-the-art solution: not very stable, but cheap
→ “fast regularized ensembles of models” (FReM)
[Hoyos-Idrobo et al., PRNI 2015; NeuroImage 2017; PAMI under review]
Benchmark
HCP dataset: “shape” versus “face” contrast across subjects
Outline
● Massive online dictionary learning
● Dimension reduction for images
● Fast regularized ensembles of models
● Statistical inference for high-dimensional models
Statistical inference on w
● Standard solutions for high-dimensional linear models (p > n):
  – corrected ridge
  – desparsified Lasso
● Adaptation to brain imaging (p ≫ n)
● Inference: find {j : w_j ≠ 0} with some statistical guarantees
Desparsified Lasso
[Zhang & Zhang 2014 Series B Stat Meth]
Desparsified Lasso: which λ?
[Zhang & Zhang, 2014, Series B Stat Meth]
● For each j:
  – η_j should be as small as possible → keep λ small
  – τ_j should be as large as possible → λ not too small
● Evaluating η and τ for many λ’s is expensive
→ We choose λ = 0.03 λ_max
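A compact sketch of the desparsified-Lasso estimator itself; the regularization levels and the residual-based noise estimate are simplistic illustrative choices (data assumed centered, so intercepts are dropped):

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Lasso

def desparsified_lasso(X, y, alpha=0.1, alpha_node=0.1):
    """De-biased Lasso coefficients and two-sided p-values (sketch)."""
    n, p = X.shape
    w = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    resid = y - X @ w
    sigma = np.std(resid)                  # crude noise estimate
    b, se = np.empty(p), np.empty(p)
    for j in range(p):
        x_j, X_minus = X[:, j], np.delete(X, j, axis=1)
        # Nodewise Lasso: z_j is nearly orthogonal to the other columns
        g = Lasso(alpha=alpha_node, fit_intercept=False).fit(X_minus, x_j)
        z = x_j - X_minus @ g.coef_
        denom = z @ x_j
        b[j] = w[j] + z @ resid / denom    # remove the shrinkage bias
        se[j] = sigma * np.linalg.norm(z) / abs(denom)
    pvals = 2 * stats.norm.sf(np.abs(b) / se)
    return b, pvals

rng = np.random.RandomState(0)
X = rng.randn(100, 30)
w_true = np.zeros(30)
w_true[:3] = 1.5
y = X @ w_true + rng.randn(100)
b, pvals = desparsified_lasso(X, y)
print(pvals[:5])
```

On this toy problem the three active coefficients get very small p-values while the null ones stay roughly uniform.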
Preliminary assessment
Adaptation to brain imaging
Step 1: clustering
Step 2: inference on compressed representations
Step 3: repeat on different parcellations and aggregate the p-values (FReM-like approach)
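The aggregation in step 3 can follow, for example, the quantile rule of Meinshausen, Meier & Bühlmann (2009); a sketch in which the choice γ = 0.5 (the median) is illustrative:

```python
import numpy as np

def aggregate_pvalues(pvals, gamma=0.5):
    """Quantile aggregation: the gamma-quantile of the per-clustering
    p-values, inflated by 1/gamma and capped at 1."""
    pvals = np.asarray(pvals)   # (n_clusterings, n_features)
    return np.minimum(1.0, np.quantile(pvals, gamma, axis=0) / gamma)

# Three clusterings, two features: one stable discovery, one null
p = np.array([[0.01, 0.8],
              [0.03, 0.9],
              [0.02, 0.7]])
print(aggregate_pvalues(p))
```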
Some initial results
Desparsified-Lasso p-values from different clusterings
→ aggregation
Conclusion
● Large-p data bring challenges:
  – computation cost
  – overfitting
  – difficulty of statistical inference
● Solutions: online learning, subsampling, compression
● Ensembling improves estimators
● Open frontiers: statistical inference
From good ideas to good practices: software
● Machine learning in Python
● Machine learning for neuroimaging http://nilearn.github.io
● BSD, Python, OSS
– Classification of (neuroimaging) data
– Network analysis
Acknowledgements
Parietal: G. Varoquaux, A. Gramfort, P. Ciuciu, D. Wassermann, D. Engemann, A. Manoel, D. Chyzhyk, A.L. Grilo Pinho, E. Dohmatob, A. Mensch, J.A. Chevalier, A. Hoyos-Idrobo, D. Bzdok, J. Dockès, P. Cerda, C. Lazarus, D. La Rocca, G. Lemaitre, L. El Gueddari, O. Grisel, M. Massias, P. Ablin, H. Janati, J. Massich, K. Dadi, C. Petitot
Other collaborators (thanks for the data): S. Dehaene, R. Poldrack, J. Haxby, C. F. Gorgolewski, J. Salmon