Three testing perspective on connectome data - Start-up ...

32
Three testing perspective on connectome data Start-up research - Follow up A. Cabassi , A.Casa , M.Fontana , M. Russo & A. Farcomeni University of Cambridge Università degli Studi di Padova Politecnico di Milano Sapienza Università di Roma Palermo, th June

Transcript of Three testing perspective on connectome data - Start-up ...

Page 1: Three testing perspective on connectome data - Start-up ...

Three testing perspective on connectome dataStart-up research - Follow up

y A. Cabassi1, A.Casa2, M.Fontana3, M. Russo2 & A. Farcomeni4

University of Cambridge1

Università degli Studi di Padova2

Politecnico di Milano3

Sapienza Università di Roma4

Palermo, 19th June 2018

Page 2: Three testing perspective on connectome data - Start-up ...

Data and Motivation 1/26

DataandMotivation

Page 3: Three testing perspective on connectome data - Start-up ...

Data

Data and Motivation 2/26

Multimodal and mixed-domain data:Structural networks: anatomical interconnections among brainregions;Dynamic functional activity: dynamical activity of each brainregion during fMRI;Functional networks: synchronization in brain activity for eachpair of brain regions

V = 70 brain regions with corresponding location and lobeinformation;

n = 24 subjects with k = 2 scans each and additional informationon age, handedness and psychological traits.

Page 4: Three testing perspective on connectome data - Start-up ...

Aim and motivation

Data and Motivation 3/26

Provide some insights about some of the statistical issues arisingwhen dealing with analysis of MRI scans;

Different perspectives and goals:1. Test correspondence among anatomical and functionalconnectivity;

2. Check quality of available data estimating the effectivenumber of white matter fibers connecting brain regions;

3. Define a metric for functional networks in order to test fordifferences in functional connectivity of different groups ofpeople.

Page 5: Three testing perspective on connectome data - Start-up ...

Functional correlations in connectomic maps 4/26

Functional correlationsin connectomicmaps

Page 6: Three testing perspective on connectome data - Start-up ...

Background

Functional correlations in connectomic maps 5/26

Neurological hypothesis: functional connectome is strongly relatedto the underlying structural networks;

Nature of this relation is not completely clear yet:Is it possible to infer anatomical connections from functionalones?How does the relation vary with time?

Aim: Is the absence of white matter fiber connecting brain regionsreflected in their functional correlation?

Page 7: Three testing perspective on connectome data - Start-up ...

Literature review

Functional correlations in connectomic maps 6/26

Graphical models: probabilistic models where a graph is used toexpress the conditional dependence between sets of randomvariables;

Let X be an n × p matrix with Xi1, . . . , Xip ∼ N(0, Σ) and denotethe precision matrix as Θ = Σ−1;

Associating a node to each variable, the absence of an edgeconnecting nodes i and j indicates conditional independenceamong Xi and Xj;

Maximizing Lp(Θ) = log`Θ` − tr(SΘ), where S = XTX/n, we getΘ = S−1→ what if p > n?

Page 8: Three testing perspective on connectome data - Start-up ...

Literature review

Functional correlations in connectomic maps 7/26

Friedman et al. (2008) proposed the graphical lasso;

Idea: minimize the penalized profile likelihood

Lpen,p = log`Θ` − tr(SΘ) − λ``Θ``1

where ``Θ``1 =∑i,jΘij;

It provides Θ even when S is singular and induces a sparserepresentation of the dependence among observed variables.

Several inferential tools proposed to test if conditionaldependences are statistically significant.

Page 9: Three testing perspective on connectome data - Start-up ...

Proposed methodology

Functional correlations in connectomic maps 8/26

Proposal: parametric bootstrap based test to check if absence ofwhite matter fiber between regions is reflected in absence of afunctional correlation among them;

More sintetically:

H0 : Ω = Ω0 vs H1 : Ω , Ω0

where Ω and Ω0 are correlations matrices with the second oneconstrained by external information.

Page 10: Three testing perspective on connectome data - Start-up ...

Proposed methodology

Functional correlations in connectomic maps 9/26

X, n × p matrix of functional activities of p brain regions measuredon n subjects, while D is a p × p structural network matrix;

Estimate, via glasso, C∗ s.t. (C∗)−1ij = 0 iff Dij = 0 for all n subjectsand obtain C∗1 , . . . , C

∗B matrices sampling from Wishart distribution

with scale matrix C∗;

Let S(c) be the sum of squared correlations among unconnectedregions→ compare it with the bootstrap distribution of S(c∗i ) with i, . . . , B;

Compute bootstrap p-value.

Page 11: Three testing perspective on connectome data - Start-up ...

Results

Functional correlations in connectomic maps 10/26

0

250

500

750

1000

1250

75 80 85

S(C*)

coun

t

1st scan

0

400

800

1200

85 90 95 100 105 110

S(C*)

coun

t

2nd scan

Temporal dimension is stacked allowing to consider Wishartdistribution as the sampling one;

Results are consisten with usual assumption in neuroscientificcommunity;

Similar results have been obtained considering LRT, even if testshave different rationale.

Page 12: Three testing perspective on connectome data - Start-up ...

Conclusions

Functional correlations in connectomic maps 11/26

We propose a simple and fast test to study the relation betweenfunctional and anatomical connectivity among brain regions;

New directions

Study in greater details the properties of the test (e.g. the power)and compare with other solutions;

Handle carefully time information;

Incorporate spatial information (distances between regions) andcharacteristics of the specific subjects.

Page 13: Three testing perspective on connectome data - Start-up ...

Bayesian method for fiber count validation 12/26

Bayesianmethod forfiber countvalidation

Page 14: Three testing perspective on connectome data - Start-up ...

DTI white matter fiber count validation

Bayesian method for fiber count validation 13/26

DTI scan is a ratherapproximate techniques.

It includes multiple source ofvariability⇒ Scanner, lab,pre-processing & Individual.

This uncertainty might lead tomisleading results.

To achieve more robust results we aim to estimate the unknownnumber of white matter fiber for each pair of brain region.

We propose a hierarchical Bayesian model.

Page 15: Three testing perspective on connectome data - Start-up ...

Our proposed approach

Bayesian method for fiber count validation 14/26

(nkij : k = 1, . . . , K) ∼ Bin(Mij, πj),

logit(πj) = αj + αMatchHemispherej,

Mij ∼ Pois(λij),

log(λij) = βi + βj + βagei.

Page 16: Three testing perspective on connectome data - Start-up ...

Application to DTI data I

Bayesian method for fiber count validation 15/26

Model based average of the weighted structural network

3.5 6.5 10.0 13.5

Sample average of the weighted structural network

0.0 2.5 5.0 7.5 10.0

Identified active areas agree with observed ones.

As expected we find an higher number of white matter fibers.

Page 17: Three testing perspective on connectome data - Start-up ...

Application to DTI data II

Bayesian method for fiber count validation 16/26

the distribution of πjs givesinfo on regions in which iseasier to observeconnections.

Connection with highprobabilities to be observedshare the same righthemisphere.

Page 18: Three testing perspective on connectome data - Start-up ...

Conclusion

Bayesian method for fiber count validation 17/26

DTI are still a valid source of information even if they should beused with care.

Pre-processing and external source of information should bealways be included in the model.

In our opinion our proposed approach can mitigate undercounteffect and be integrated in more refined analysis

Page 19: Three testing perspective on connectome data - Start-up ...

Object-Oriented analysis of network data 18/26

Object-Orientedanalysisofnetworkdata

Page 20: Three testing perspective on connectome data - Start-up ...

Goal

Object-Oriented analysis of network data 19/26

Object Oriented Data Analysis: statistics for complex objectsE.g. Directed acyclic graphs, tensors, shapes, images, networks.Main idea:

Consider complex objects as the statistical units of our analysis;

Analyse the data in the mathematical space in which they live.

Our goal: To define an object-oriented framework for structural andfunctional networks. In particular, we wish to define:

A distance between networks;

A test for the equality of the average networks of multiple groups.

Page 21: Three testing perspective on connectome data - Start-up ...

Literature review

Object-Oriented analysis of network data 20/26

Reducing each observed network to a vector of summary statistics.

Univariate testing approaches considering each edge separatelyadjusted to control FDR or FWER taking into account the networkstructure.

Use of auxiliary data (e.g. spatial proximity) to inform the posteriorprobability that some pairs of nodes interact differently.

Durante and Dunson (2017) develop a Bayesian procedure forinference and testing of group differences in the network structure.

Ginestet et al. (2017) test the equality of two groups of networksusing the concept of Fréchet mean of networks and deriving a CLTfor sequences of network averages, using the Euclidean distance.

Page 22: Three testing perspective on connectome data - Start-up ...

Literature review

Object-Oriented analysis of network data 20/26

Reducing each observed network to a vector of summary statistics.

Univariate testing approaches considering each edge separatelyadjusted to control FDR or FWER taking into account the networkstructure.

Use of auxiliary data (e.g. spatial proximity) to inform the posteriorprobability that some pairs of nodes interact differently.

Durante and Dunson (2017) develop a Bayesian procedure forinference and testing of group differences in the network structure.

Ginestet et al. (2017) test the equality of two groups of networksusing the concept of Fréchet mean of networks and deriving a CLTfor sequences of network averages, using the Euclidean distance.

Page 23: Three testing perspective on connectome data - Start-up ...

Literature review

Object-Oriented analysis of network data 20/26

Reducing each observed network to a vector of summary statistics.

Univariate testing approaches considering each edge separatelyadjusted to control FDR or FWER taking into account the networkstructure.

Use of auxiliary data (e.g. spatial proximity) to inform the posteriorprobability that some pairs of nodes interact differently.

Durante and Dunson (2017) develop a Bayesian procedure forinference and testing of group differences in the network structure.

Ginestet et al. (2017) test the equality of two groups of networksusing the concept of Fréchet mean of networks and deriving a CLTfor sequences of network averages, using the Euclidean distance.

Page 24: Three testing perspective on connectome data - Start-up ...

Distances

Object-Oriented analysis of network data 21/26

Procrustes size-and-shape distance

dP(G1, G2) = infR∈O(D)

‖L1 − L2R‖ (1)

whereLi decomposition of Gi s.t. Gi = LiL′i , i = 1, 2;‖·‖ Frobenius norm;D set of unitary operators.

Gromov-Wasserstein distance

dGH =12infR‖dX(x, x′) − dY(y, y′)‖LpR×R (2)

whereR ∈ R(X; Y), set of all correspondences between X and Y;(X, dX) and (Y, dY) compact metric spaces.

Page 25: Three testing perspective on connectome data - Start-up ...

Exploratory analysis

Object-Oriented analysis of network data 22/26

Heatmap of the Procrustesdistances.

Hierarchical clustering.

Page 26: Three testing perspective on connectome data - Start-up ...

Test for the equality

Object-Oriented analysis of network data 23/26

G11, . . . , GN11 and G12, . . . , GN22 two groups of adjacency matrices,iid samples from 2 random processes with mean Γ1 and Γ2.

H0 : Γ1 = Γ2 against H1 : Γ1 , Γ2. (3)

Similar strategy to the one used by Pigoli et al. (2014) for testing theequality of covariance operators of functional data, i.e. reformulate testas

H0 : d(Γ1, Γ2) = 0 against H1 : d(Γ1, Γ2) > 0 (4)

Also possible to extend to the case of multiple groups (Cabassi et al.2017).

Page 27: Three testing perspective on connectome data - Start-up ...

Test for the equality

Object-Oriented analysis of network data 23/26

G11, . . . , GN11 and G12, . . . , GN22 two groups of adjacency matrices,iid samples from 2 random processes with mean Γ1 and Γ2.

H0 : Γ1 = Γ2 against H1 : Γ1 , Γ2. (3)

Similar strategy to the one used by Pigoli et al. (2014) for testing theequality of covariance operators of functional data, i.e. reformulate testas

H0 : d(Γ1, Γ2) = 0 against H1 : d(Γ1, Γ2) > 0 (4)

Also possible to extend to the case of multiple groups (Cabassi et al.2017).

Page 28: Three testing perspective on connectome data - Start-up ...

Two-sample permutation test

Object-Oriented analysis of network data 24/26

Given a sample G1, . . . , GN of independent and identically distributedobservations, the sample Fréchet mean is

Γ = arg infΓ

N∑n=1

d(Gn, Γ)2. (5)

Algorithm

1. Compute d(Γ1, Γ2), with Γi sample Fréchet mean of group i;2. Apply B random permutations to the labels of the sample graphs;3. For each of them compute d(Γ∗1 , Γ

∗2 );

4. The p-value of the test is

λ =

∑1d(Γ∗1 , Γ

∗2 ) ≥ d(Γ1, Γ2)

B.

Page 29: Three testing perspective on connectome data - Start-up ...

Two-sample permutation test

Object-Oriented analysis of network data 24/26

Given a sample G1, . . . , GN of independent and identically distributedobservations, the sample Fréchet mean is

Γ = arg infΓ

N∑n=1

d(Gn, Γ)2. (5)

Algorithm

1. Compute d(Γ1, Γ2), with Γi sample Fréchet mean of group i;2. Apply B random permutations to the labels of the sample graphs;3. For each of them compute d(Γ∗1 , Γ

∗2 );

4. The p-value of the test is

λ =

∑1d(Γ∗1 , Γ

∗2 ) ≥ d(Γ1, Γ2)

B.

Page 30: Three testing perspective on connectome data - Start-up ...

Tests for the real data

Object-Oriented analysis of network data 25/26

Test p-value Adjusted p-value1. Mental disorder diagnosis 0.914 12. Under vs. over 30 0.634 13. Under vs. over 50 0.091 0.273

Tablep-values of the tests.

Issues

Probably need more observations;

Not clear how to choose threshold for the age limit.

Page 31: Three testing perspective on connectome data - Start-up ...

Tests for the real data

Object-Oriented analysis of network data 25/26

Test p-value Adjusted p-value1. Mental disorder diagnosis 0.914 12. Under vs. over 30 0.634 13. Under vs. over 50 0.091 0.273

Tablep-values of the tests.

Issues

Probably need more observations;

Not clear how to choose threshold for the age limit.

Page 32: Three testing perspective on connectome data - Start-up ...

Concluding remarks

Object-Oriented analysis of network data 26/26

Summary

No assumptions on the data generating process;

Computationally intensive.

Future work

Implement test using Gromov-Wasserstein distance;

Compare to state-of-the-art methods where possible.