On Heritability of Structural Connectomes
by
Jaewon Chung
A thesis submitted to The Johns Hopkins University
in conformity with the requirements for the degree of
Master of Science in Engineering
Baltimore, Maryland
May, 2019
© 2019 by Jaewon Chung
All rights reserved
Abstract
Recent advancements in technology have allowed the collection of enormous
amounts of neuroimaging data to study the functionality of human brains.
To study the human brain, connectomes, or brain graphs, can be derived
from neuroimaging data such as diffusion (dMRI) and functional magnetic
resonance imaging (fMRI). GraSPy, an open-source Python package, was
developed to leverage recent advances in statistics and random graph theory
to study populations of connectomes. These algorithms were then applied to
show that structural connectomes are heritable in humans.
Primary Reader: Joshua T. Vogelstein
Table of Contents

List of Tables
List of Figures
1 GraSPy: Graph Statistics in Python
  1.1 Introduction
  1.2 Library Overview
    1.2.1 Simulations (Figure 1.1a)
    1.2.2 Preprocessing (Figure 1.1b)
    1.2.3 Embedding (Figure 1.1c)
    1.2.4 Hypothesis Testing (Figure 1.1d)
    1.2.5 Clustering (Figure 1.1e)
    1.2.6 Plotting (Figure 1.1f)
  1.3 Conclusion
2 Structural Connectomes are Heritable
  2.1 Introduction
  2.2 Results
  2.3 Data Acquisition
    2.3.1 Participants
    2.3.2 Diffusion MRI Acquisition & Processing
  2.4 Preliminaries
    2.4.1 Graph
    2.4.2 Random Dot Product Graphs
    2.4.3 Adjacency Spectral Embedding
    2.4.4 Choosing the Embedding Dimension
    2.4.5 Pass-To-Ranks
  2.5 Estimation of Heritability
    2.5.1 Preprocessing Data
    2.5.2 Models of Heritability and Distance Measures
    2.5.3 Kolmogorov-Smirnov Two-Sample Test
List of Tables
2.1 P-values given by the Kolmogorov-Smirnov test on the distributions
of distances shown in Figure 2.1. The null hypothesis is that the two
distributions of test statistics are the same, and the alternative hypothesis
(H1) is that one distribution is stochastically larger than the other. The
null hypothesis is rejected at significance level α = 0.05 for all alternative
hypotheses under all three models of heritability.
2.2 Participant demographics of the HCP1200 dataset.
List of Figures
1.1 Illustration of submodules and the procedure for statistical inference
on populations of graphs. A detailed description of each submodule is
given in Section 1.2.
1.2 Connectome model fitting and complexity. Larval Drosophila left
mushroom body adjacency matrix (unweighted, directed), followed by
random samples from four different statistical models of connectomes fit
using GraSPy: random dot product graph (RDPG), degree-corrected
stochastic block model (DCSBM), stochastic block model (SBM), and
Erdős-Rényi (ER). The bottom left shows the number of parameters for
each, as compared to the 40,000+ parameters (possible edges) of the
inhomogeneous Erdős-Rényi (IER) model, in which all potential edges
are specified. Blocks are sorted by size (number of member vertices) and
nodes are sorted by degree within each block. The block labels
correspond to K) Kenyon cells, P) projection neurons, O) mushroom
body output neurons, I) mushroom body input neurons (Eichler et al.,
2017).
2.1 Kernel density estimates (KDE) of pairwise distances of connectomes
from three different measures: exact, global scale, and vertex-wise scale,
which represent different models of heritability. Each column
corresponds to a distance measure, and each row corresponds to a
familial relationship. The white vertical line within each KDE represents
the mean. Under all three models, the distances for monozygotic twins
were stochastically smaller than those of dizygotic twins, siblings, and
unrelated pairs, while distances for unrelated pairs were stochastically
larger than all others. Distances for dizygotic twins were stochastically
smaller than those of siblings and unrelated pairs, but stochastically
larger than those of monozygotic twins.
Chapter 1
GraSPy: Graph Statistics in Python
1.1 Introduction
Graphs, or networks, are a mathematical representation of data that consists
of discrete objects (nodes or vertices) and relationships between these objects
(edges). For example, if the regions of a human brain are taken as vertices, the
edges can represent how strongly each pair of regions is connected to each
other. Since graphs necessarily deal with relationships between nodes, many
of the classical statistical assumptions about independence are violated. Thus,
specific statistical methodology is required for performing robust statistical in-
ference on graphs and populations of graphs (Athreya et al., 2018). GraSPy fills
this gap by providing implementations of algorithms with strong statistical
guarantees, such as graph and multi-graph embedding methods, two-graph
hypothesis testing, and clustering of vertices of graphs. Many of the algo-
rithms implemented in GraSPy are flexible and can operate on graphs that are
weighted or unweighted, as well as directed or undirected. All subsequent
analysis in this thesis was performed using GraSPy.
Figure 1.1: Illustration of submodules and the procedure for statistical inference on populations of graphs: a) Simulations, b) Preprocessing, c) Embedding, d) Hypothesis Testing, e) Clustering, f) Visualization. A detailed description of each submodule is given in Section 1.2.
1.2 Library Overview
The submodules available in GraSPy are summarized in Figure 1.1. The
library contains functionality for fitting and sampling from random graph
models, performing dimensionality reduction on graphs or populations of
graphs (embedding), testing hypotheses on graphs, and plotting graphs
and embeddings.
The following sections provide a brief overview of the different submodules
of GraSPy; a more detailed overview with code usage can be found in the
tutorial section of the GraSPy documentation at https://graspy.neurodata.io/tutorial.
1.2.1 Simulations (Figure 1.1a)
Three classes of random graph models are implemented in GraSPy: 1) Erdos-
Rényi (ER) model, 2) stochastic block model (SBM), and 3) random dot product
graph (RDPG) model. The ER model is the simplest: it is parameterized by
the number of vertices, n, and either p, which specifies the probability of an
edge existing between a pair of vertices, or m, which specifies
the exact number of edges. All nodes have the same probability of connection
to each other under the ER model. Unlike ER models, the SBM produces
graphs containing communities, where vertices in each community share
common probabilities of connection to every other community. The SBM is
parameterized by the number of communities, K, a vector of probabilities of a
node belonging to each community, τ, and a probability matrix, B ∈ [0, 1]K×K,
that specifies the probability of edges within and between communities. An
extension of the SBM, the Degree-corrected SBM (DCSBM) has an added
parameter associated with each node that denotes its promiscuity in the graph,
which is its relative degree among the other nodes in its community. Nodes
still share the same relative probabilities of connection to each community,
but the nodes within a community may have heterogeneous expected degrees.
Finally, the RDPG model assumes that each vertex in the graph is associated
with a latent vector in Rd. The probability of an edge existing between pairs
of vertices is determined by the dot product of the associated latent position
vectors (Young and Scheinerman, 2007). The RDPG is parameterized by an
n by d matrix of these latent positions. GraSPy provides implementations
for sampling from each of these graph models given these parameters, as
well as estimating the parameters of a model from a given graph. GraSPy also
allows for weighting functions and directed graphs when sampling from these
models.
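To make these parameterizations concrete, the three models can be sampled in a few lines of numpy. This is an illustrative sketch, not GraSPy's own implementation (the library's simulations submodule provides its samplers, plus weighting functions and directed variants); it assumes unweighted, undirected graphs with no self-loops.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_er(n, p):
    """ER: every possible edge appears independently with probability p."""
    A = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return A + A.T  # symmetrize; diagonal stays zero (no self-loops)

def sample_sbm(block_sizes, B):
    """SBM: edge probability depends only on the communities of the endpoints."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[labels][:, labels]  # n x n matrix of edge probabilities
    A = np.triu((rng.random(P.shape) < P).astype(float), 1)
    return A + A.T

def sample_rdpg(X):
    """RDPG: edge probability is the dot product of the endpoints' latent positions."""
    P = X @ X.T
    A = np.triu((rng.random(P.shape) < P).astype(float), 1)
    return A + A.T

A_er = sample_er(100, 0.3)
A_sbm = sample_sbm([50, 50], np.array([[0.5, 0.1], [0.1, 0.5]]))
A_rdpg = sample_rdpg(np.full((100, 2), 0.5))  # constant latent positions reduce to ER(0.5)
```

Note that the RDPG subsumes the other two models: an ER graph is an RDPG whose vertices share a single latent position, and an SBM is an RDPG with one latent position per community.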
1.2.2 Preprocessing (Figure 1.1b)
Various utility functions help the user input real data into GraSPy or check
simple attributes about a graph. Some examples include finding the largest
connected component of a graph, finding the intersection or union of con-
nected components across multiple graphs, transforming the weights of a
graph, or checking whether a graph is directed. These functions speed up the
user's workflow when working with real data that may be messy or noisy
and require preprocessing.
1.2.3 Embedding (Figure 1.1c)
Inference on random graphs depends on low-dimensional Euclidean
representations of the vertices of graphs, known as latent positions, typically given by
spectral decompositions of adjacency or Laplacian matrices (Levin et al., 2017).
Adjacency spectral embedding (ASE) and Laplacian spectral embedding (LSE)
are methods for embedding a single graph, and omnibus embedding allows
for embedding multiple graphs into the same dimensions such that the em-
beddings can be meaningfully compared. In addition, GraSPy allows for the
number of embedding dimensions to be automatically chosen by the algorithm
of Zhu and Ghodsi, 2006.
1.2.4 Hypothesis Testing (Figure 1.1d)
Given two graphs, a natural question to ask is whether these graphs are both
random samples from the same generative distribution. GraSPy provides two
types of test for this null hypothesis: semiparametric and nonparametric. Both
Figure 1.2: Connectome model fitting and complexity. Larval Drosophila left mushroom body adjacency matrix (unweighted, directed), followed by random samples from four different statistical models of connectomes fit using GraSPy: random dot product graph (RDPG), degree-corrected stochastic block model (DCSBM), stochastic block model (SBM), and Erdős-Rényi (ER). The bottom left shows the number of parameters for each, as compared to the 40,000+ parameters (possible edges) of the inhomogeneous Erdős-Rényi (IER) model, in which all potential edges are specified. Blocks are sorted by size (number of member vertices) and nodes are sorted by degree within each block. The block labels correspond to K) Kenyon cells, P) projection neurons, O) mushroom body output neurons, I) mushroom body input neurons (Eichler et al., 2017).
tests are framed under the RDPG model, where the generative distribution
can be modeled as a set of latent positions. The semiparametric test can only
be performed on two graphs of the same size and with known correspondence
between the vertices of the two graphs (Tang et al., 2017). Nonparametric
testing can be performed on graphs without vertex alignment, or even with
different numbers of vertices (Tang et al., 2014). Both tests provide a
statistically principled way of assessing whether two observed graphs are
drawn from the same distribution; for example, one can test whether the
brain connectivity graphs of siblings or twins came from the same generative
distribution (Chung et al., in preparation).
1.2.5 Clustering (Figure 1.1e)
GraSPy uses Gaussian mixture models (GMM) and k-means to compute the
grouping structure of vertices after embedding. The number of clusters to
fit for GMM is chosen by Bayesian information criterion (BIC), which is a
penalized likelihood function to evaluate the quality of estimators. Similarly,
the silhouette score is used to choose the number of clusters for k-means. Both
functions sweep over a range of parameters and use the above metrics to
choose clustering parameters in an unsupervised manner.
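As an illustration of the BIC sweep described above, the snippet below fits Gaussian mixtures with an increasing number of components to a hypothetical two-cluster embedding and keeps the model with the lowest BIC; the synthetic data are a stand-in for ASE output, and this sketch uses scikit-learn directly rather than GraSPy's own clustering API.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Hypothetical stand-in for an ASE output: two well-separated clusters in R^2.
X = np.vstack([rng.normal(0.0, 0.2, (100, 2)),
               rng.normal(3.0, 0.2, (100, 2))])

# Fit GMMs for a range of component counts; lower BIC is better.
models = [GaussianMixture(n_components=k, random_state=0).fit(X)
          for k in range(1, 6)]
best = min(models, key=lambda m: m.bic(X))
print(best.n_components)  # 2 for this well-separated toy embedding
```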
1.2.6 Plotting (Figure 1.1f)
GraSPy extends seaborn to visualize graphs as adjacency matrices and embed-
ded graphs as paired scatter plots (Waskom et al., 2018). Individual graphs
can be visualized using the heatmap function, and multiple graphs can be
overlaid on top of each other using the gridplot function. Both adjacency
matrix visualizations can be sorted by various node metadata. The pairplot function can visualize high
dimensional data, such as graphs in the embedded space, as a pairwise scatter
plot.
1.3 Conclusion
GraSPy is the first open-source Python package to perform robust statisti-
cal analysis on graphs and graph populations. Its compliance with the
scikit-learn API makes it an easy-to-use tool for anyone familiar with
machine learning in Python. In addition, GraSPy is implemented with an
extensible class structure, making it easy to modify and add new algorithms
to the package. As GraSPy continues to grow and add functionality, we believe
it will accelerate statistically-valid discovery in any field of study concerned
with populations of graphs.
References
Athreya, Avanti, Donniell E. Fishkind, Minh Tang, Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, Keith Levin, Vince Lyzinski, Yichen Qin, and Daniel L Sussman (2018). “Statistical Inference on Random Dot Product Graphs: a Survey”. In: Journal of Machine Learning Research 18.226, pp. 1–92. URL: http://jmlr.org/papers/v18/17-448.html.
Young, Stephen J and Edward R Scheinerman (2007). “Random dot product graph models for social networks”. In: International Workshop on Algorithms and Models for the Web-Graph. Springer, pp. 138–149.
Levin, Keith, Avanti Athreya, Minh Tang, Vince Lyzinski, and Carey E Priebe (2017). “A central limit theorem for an omnibus embedding of multiple random dot product graphs”. In: pp. 964–967.
Zhu, Mu and Ali Ghodsi (2006). “Automatic dimensionality selection from the scree plot via the use of profile likelihood”. In: Computational Statistics & Data Analysis 51.2, pp. 918–930.
Eichler, Katharina, Feng Li, Ashok Litwin-Kumar, Youngser Park, Ingrid Andrade, Casey M Schneider-Mizell, Timo Saumweber, Annina Huser, Claire Eschbach, Bertram Gerber, et al. (2017). “The complete connectome of a learning and memory centre in an insect brain”. In: Nature 548.7666, p. 175.
Tang, Minh, Avanti Athreya, Daniel L Sussman, Vince Lyzinski, Youngser Park, and Carey E Priebe (2017). “A semiparametric two-sample hypothesis testing problem for random graphs”. In: Journal of Computational and Graphical Statistics 26.2, pp. 344–354.
Tang, Minh, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, and Carey E. Priebe (2014). “A nonparametric two-sample hypothesis testing problem for random dot product graphs”. In: Journal of Computational and Graphical Statistics, arXiv:1409.2344.
Waskom, Michael, Olga Botvinnik, Drew O'Kane, Paul Hobson, Joel Ostblom, Saulius Lukauskas, David C Gemperline, Tom Augspurger, Yaroslav Halchenko, John B. Cole, Jordi Warmenhoven, Julian de Ruiter, Cameron Pye, Stephan Hoyer, Jake Vanderplas, Santi Villalba, Gero Kunter, Eric Quintero, Pete Bachant, Marcel Martin, Kyle Meyer, Alistair Miles, Yoav Ram, Thomas Brunner, Tal Yarkoni, Mike Lee Williams, Constantine Evans, Clark Fitzgerald, Brian, and Adel Qalieh (2018). mwaskom/seaborn: v0.9.0 (July 2018). DOI: 10.5281/zenodo.1313201. URL: https://doi.org/10.5281/zenodo.1313201.
Chapter 2
Structural Connectomes areHeritable
2.1 Introduction
Understanding the extent to which genes and environment determine human
brain connectivity and structure, or heritability, is of great interest for
improving our understanding of brain function and diseases. To study such
properties, brains are often modelled as connectomes, or brain graphs, by
defining regions of the brain as nodes and the strength of connections between
regions as edges (“Connectal Coding: Discovering the Structures Linking
Cognitive Phenotypes to Individual Histories”). Numerous pipelines have
been developed and applied to diffusion magnetic resonance imaging (dMRI)
to reconstruct white matter fiber-tract trajectories in vivo, which are then used
to derive connectomes (Kiar et al., 2018; Maier-Hein et al., 2017). Graph theo-
retic methods have been applied to such connectomes to examine anatomical
connectivity in healthy subjects, schizophrenia patients, and identical twins
(Bullmore and Sporns, 2009; Bohlken et al., 2014; Heuvel et al., 2010; Micheloy-
annis, 2012; Bassett and Bullmore, 2006). This study extends current literature
by applying recent advances in statistics on random graph models to study
the heritability of human structural connectomes (Athreya et al., 2018; Tang
et al., 2017).
Prior work on structural connectomes utilizes graph features such as clustering
coefficients, small-worldness, average path length, betweenness centrality,
modularity, and motifs (Bullmore and Sporns, 2009; Bohlken et al., 2014;
Heuvel et al., 2010; Micheloyannis, 2012; Bassett and Bullmore, 2006). Some of
these features have simple intuitions, such as small-world topology, which
describes the tendency of brain networks to form local clusters with dense connections
within a cluster but sparse connections between clusters (Bassett and Bullmore,
2006; Bullmore and Sporns, 2009). However, differences in such features are
difficult to interpret, since many distinct connectomes can generate the same
feature value (“Connectal Coding: Discovering the Structures Linking
Cognitive Phenotypes to Individual Histories”). This lack
of interpretability of the results also makes translating the results into practice
and new research directions difficult.
In this work, we present a statistically principled procedure for studying
a population of connectomes and examine the heritability of structural
connectomes. Under a random graph model called the random dot product
graph (RDPG), each region of interest (ROI) is represented as a node with an
associated set of latent variables that can capture the genetic and environmental
influences on its connectivity to other ROIs (Athreya et al., 2018). The latent
variables for connectomes are estimated via spectral decomposition, and
Euclidean distances between latent variables are computed for all monozygotic,
dizygotic, sibling, and unrelated pairs to obtain distributions of distances
(Tang et al., 2017). The differences in distributions are then validated via
two-sample Kolmogorov-Smirnov tests. We show that human structural
connectomes are highly heritable and that differences in brain structure are
associated with differences in the genome.
2.2 Results
Figure 2.1 presents the kernel density estimates (KDEs) from the three distance
measures for all pairs of monozygotic twins, dizygotic twins, siblings and
unrelated individuals. The three distance measures, denoted exact, global
scale, and vertex-wise scale, represent three different models of heritability.
Under all three models of heritability, there is a stochastic ordering of the
distances for monozygotic, dizygotic, sibling, and unrelated pairs, from
smallest to largest. This ordering suggests that the similarity between two
structural connectomes is highly related to the genetic similarity between the
individuals. However, the ordering of dizygotic twins before siblings suggests
additional environmental factors or age effects in structural connectivity patterns.
To formally validate the ordering, the two-sample Kolmogorov-Smirnov
(KS) test was employed. Alternative hypotheses were formed for all six
possible pairs of distributions, as explained in Section 2.5.2. Table 2.1 presents
the results of the KS tests. Significance levels are marked with * (p < .05), ** (p <
.01), and *** (p < .001).
Figure 2.1: Kernel density estimates (KDE) of pairwise distances of connectomes from three different measures: exact, global scale, and vertex-wise scale, which represent different models of heritability. Each column corresponds to a distance measure, and each row corresponds to a familial relationship. The white vertical line within each KDE represents the mean. Under all three models, the distances for monozygotic twins were stochastically smaller than those of dizygotic twins, siblings, and unrelated pairs, while distances for unrelated pairs were stochastically larger than all others. Distances for dizygotic twins were stochastically smaller than those of siblings and unrelated pairs, but stochastically larger than those of monozygotic twins.
The null hypothesis is rejected for all six alternative hypotheses under the
three models of heritability. Thus, the KS tests under the different alternative
hypotheses validate the stochastic ordering of familial relationships suggested
in Figure 2.1. In summary, higher genetic similarity leads to more similar
connectomes, and the differences between groups are statistically significant.
2.3 Data Acquisition
2.3.1 Participants
We used publicly available diffusion MRI (dMRI) and structural MRI (sMRI)
data from the S1200 (2017) release of the Human Connectome Project (HCP)
Table 2.1: P-values given by the Kolmogorov-Smirnov test on the distributions of distances shown in Figure 2.1. The null hypothesis is that the two distributions of test statistics are the same, and the alternative hypothesis (H1) is that one distribution is stochastically larger than the other. The null hypothesis is rejected at significance level α = 0.05 for all alternative hypotheses under all three models of heritability.
Alternative Hypothesis        Exact            Global Scale     Vertex-wise Scale
Monozygotic < Dizygotic       1.78E-08 ***     1.78E-08 ***     2.23E-04 ***
Monozygotic < Sibling         1.46E-16 ***     6.26E-17 ***     2.33E-12 ***
Monozygotic < Unrelated       9.28E-34 ***     1.33E-32 ***     8.39E-20 ***
Dizygotic < Sibling           1.23E-02 *       1.29E-02 *       8.23E-03 **
Dizygotic < Unrelated         2.14E-10 ***     3.84E-10 ***     1.42E-08 ***
Sibling < Unrelated           9.77E-19 ***     5.65E-19 ***     4.16E-09 ***
Young Adult study, acquired by the Washington University in St. Louis
(WUSTL) and the University of Minnesota (Minn) (Van Essen et al., 2013; Van
Essen et al., 2012). Out of the 1206 participants released, 985 had viable dMRI
for processing. Demographics of the 985 participants are described in Table
2.2. All data collection procedures were approved by the institutional review
boards at WUSTL and Minn.
Table 2.2: Participant demographics of the HCP1200 dataset.

Zygosity      Monozygotic    Dizygotic      Non-twin siblings
N             250            259            476
Sex           167 F, 83 M    140 F, 119 M   223 F, 248 M
Age (mean)    29.6 (3.3)     28.9 (3.4)     28.3 (3.9)
Age (range)   22-36          22-36          22-37
2.3.2 Diffusion MRI Acquisition & Processing
Using dMRI and sMRI, graphs, or connectomes, were estimated using the
ndmg (Kiar et al., 2018) pipeline. The dMRI scans were pre-processed for
eddy currents using FSL’s eddy-correct (Smith, 2004). FSL’s "standard" linear
registration pipeline was used to register the sMRI and dMRI images to
the MNI152 atlas (Smith, 2004; Woolrich, 2009; Jenkinson, 2012; Mazziotta,
2001). A tensor model was fit using DiPy (Garyfallidis et al., 2014) to obtain
an estimated tensor at each voxel. A deterministic tractography algorithm was
applied using DiPy's EuDX (Garyfallidis et al., 2014; Garyfallidis et al., 2012)
to obtain streamlines, which indicate the voxels connected by an axonal fiber
tract. Graphs were formed by contracting voxels into graph vertices depending
on spatial similarity (Mhembere et al., 2013). In this study, a modified version
of Desikan-Killiany-Tourville (DKT) parcellation (Klein and Tourville, 2012)
was used to define the ROIs. Given a parcellation with vertices V and a
corresponding mapping P(vi) indicating the voxels within region i, we
contract our fiber streamlines as follows:

    w(vi, vj) = ∑u∈P(vi) ∑w∈P(vj) I{Fu,w}

where Fu,w is true if a fiber tract exists between voxels u and w, and false
otherwise.
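The contraction rule can be checked on a toy example; the parcellation and fiber set below are hypothetical stand-ins for the DKT ROIs and the EuDX streamlines.

```python
# Hypothetical parcellation: ROI -> set of member voxels.
parcellation = {0: {10, 11}, 1: {20, 21, 22}}
# Voxel pairs joined by a streamline, i.e. the pairs for which F_{u,w} is true.
fibers = {(10, 20), (11, 20), (11, 22)}

def edge_weight(roi_i, roi_j, parcellation, fibers):
    """w(vi, vj): count voxel pairs (u, w), u in P(vi), w in P(vj), joined by a tract."""
    return sum(1
               for u in parcellation[roi_i]
               for w in parcellation[roi_j]
               if (u, w) in fibers or (w, u) in fibers)

print(edge_weight(0, 1, parcellation, fibers))  # 3
```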
2.4 Preliminaries
2.4.1 Graph
A graph, or network, G, is defined as an ordered set of vertices and edges
(V, E) where V is the vertex set, and E, the set of edges, is a subset of the
Cartesian product of V × V. A vertex set is represented as V = {1, 2, . . . , n}
where |V| = n, and an edge exists between i and j if (i, j) ∈ E. A network can
also be represented by its adjacency matrix A ∈ Rn×n where A(i,j) represents
the value of the edge between i and j. Connectomes estimated from dMRIs are
graphs in which nodes are regions of interest (ROI) and edges are the number
of fiber tracts between a pair of ROIs.
2.4.2 Random Dot Product Graphs
Motivated by recent statistical results with strong theoretical guarantees from
(Athreya et al., 2018; Sussman et al., 2012), we choose the random dot product
graph (RDPG) as our model for connectomes (Young and Scheinerman, 2007).
In this model, each vertex is a region of the brain, and connections between
a pair of regions of the brain are dictated by unobserved latent positions, which
serve as our estimate of effects of genome and environment on connectomes.
Formally, a random dot product graph is defined as follows:
Definition 1 (Random dot product graph (RDPG)). Let F be a distribution on Rd
satisfying the inner product condition: for any two elements u, v in the support of F,
uTv ∈ [0, 1]. Let X1, X2, . . . , Xn ∼ F i.i.d., and let X = [X1, X2, . . . , Xn]T ∈ Rn×d
be the matrix whose rows are these latent positions. Suppose A is a random
adjacency matrix given by

    P[A | X] = ∏i<j (XiT Xj)^Aij (1 − XiT Xj)^(1−Aij)    (2.1)

We then write (A, X) ∼ RDPG(X), and say that A is the adjacency matrix of a
random dot product graph with latent position matrix X of rank at most d.
We further define the matrix P = (pij) of edge probabilities by P = XXT.
We also write A ∼ Bernoulli(P) to denote that the existence of an edge
between any two vertices i and j, where i > j, is a Bernoulli random variable
with probability pij, with edges independent of one another. We emphasize
that we only consider undirected graphs with no self-loops and non-negative
edge weights.
We note that this model has an inherent non-identifiability: given a latent
position matrix X ∈ Rn×d and an orthogonal matrix W ∈ Rd×d, X and XW
give rise to the same edge probabilities, since P = XXT = (XW)(XW)T.
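This non-identifiability is easy to verify numerically; the latent positions below are hypothetical, drawn so that every dot product lands in [0, 1].

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0.2, 0.6, (5, 2))   # dot products lie in [0.08, 0.72], inside [0, 1]

# Any orthogonal W (here a planar rotation) leaves P = X X^T unchanged.
theta = 0.7
W = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(X @ X.T, (X @ W) @ (X @ W).T)
```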
2.4.3 Adjacency Spectral Embedding
Given an adjacency matrix A ∼ RDPG(X), a natural task is to recover the
latent positions X that gave rise to A. Adjacency spectral embedding provides
consistent estimates of the latent positions (Sussman et al., 2012), and is
defined as follows.

Definition 2. Let A = U S UT be the singular value decomposition (SVD) of A. The
adjacency spectral embedding (ASE) of A into Rd is X̂ = Ud Sd^(1/2), where Ud and
Sd retain the top d singular values and their associated singular vectors.
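A minimal numpy sketch of this definition; as a sanity check, it is applied to a rank-one probability matrix P rather than a noisy sampled A, so the latent positions are recovered exactly, up to sign.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding: Xhat = U_d S_d^{1/2} from the top-d SVD of A."""
    U, S, _ = np.linalg.svd(A)
    return U[:, :d] * np.sqrt(S[:d])

# Rank-one check: P = x x^T with x = (0.5, ..., 0.5).
x = np.full(10, 0.5)
P = np.outer(x, x)
Xhat = ase(P, 1)
assert np.allclose(np.abs(Xhat.ravel()), 0.5)  # recovered up to sign
```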
2.4.4 Choosing the Embedding Dimension
We emphasize that the true dimensionality of latent positions is unknown
and we must choose the embedding dimension. A common methodology for
choosing the number of embedding dimensions in SVD is to visually examine
the scree plot and choose an elbow that separates the top signal dimensions
and noise dimensions. In this work, we consider the method of Zhu and
Ghodsi, 2006. Given the SVD A = U S UT, the singular values S are used to
choose the embedding dimension d via

    d̂ = argmax_d ProfileLikelihoodS(d)    (2.2)

where ProfileLikelihoodS(d) quantifies the magnitude of the gap after the first
d singular values.
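A sketch of the criterion under one common reading of Zhu and Ghodsi, 2006: for each candidate d, the sorted singular values are split into two groups modeled as Gaussians with a pooled variance, and the split maximizing the likelihood is chosen. GraSPy's automatic selection follows the same paper; the implementation below is an independent illustration.

```python
import numpy as np

def profile_likelihood_elbow(s):
    """Pick the split d of the sorted singular values maximizing a two-group
    Gaussian likelihood with a shared (pooled) variance."""
    s = np.sort(s)[::-1]
    n = len(s)
    lik = []
    for d in range(1, n):
        mu1, mu2 = s[:d].mean(), s[d:].mean()
        ss = np.sum((s[:d] - mu1) ** 2) + np.sum((s[d:] - mu2) ** 2)
        var = max(ss / n, 1e-12)  # guard against a degenerate zero-variance split
        lik.append(-0.5 * n * np.log(2 * np.pi * var) - ss / (2 * var))
    return int(np.argmax(lik)) + 1

# Three large singular values followed by small noise ones -> elbow at d = 3.
s = np.array([10.0, 9.5, 9.0, 0.6, 0.5, 0.5, 0.4, 0.4])
print(profile_likelihood_elbow(s))  # 3
```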
2.4.5 Pass-To-Ranks
Since the connectomes are weighted, we describe the pass-to-ranks method
for undirected graphs, which normalizes the edge weights such that Aij ∈
[0, 1] ∀ i, j ∈ {1, 2, . . . , n}.
Definition 3. Given A ∈ Rn×n, let R(Aij) be the “rank” of Aij, that is, R(Aij) = k
if Aij is the kth smallest nonzero entry above the main diagonal of A, with ties
broken by averaging the ranks. The pass-to-ranks (PTR) matrix is defined, for all
i < j, as

    Aij = R(Aij)/e   if Aij > 0,   and   0 otherwise,    (2.3)

where e = |E| is the number of edges.
Since we only consider undirected graphs, Aij = Aji ∀ i, j ∈ {1, 2, . . . , n}.
We apply pass-to-ranks to all graphs prior to ASE because this representation
of connectomes has been shown to be more reliable than raw edge weights
(Kiar et al., 2018).
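Definition 3 translates directly into code; the sketch below uses scipy's rankdata for tie-averaged ranks and assumes a non-negative, undirected weighted adjacency matrix.

```python
import numpy as np
from scipy.stats import rankdata

def pass_to_ranks(A):
    """Replace each nonzero edge weight above the diagonal with rank/|E|, then symmetrize."""
    iu = np.triu_indices_from(A, k=1)
    w = A[iu]
    nz = w > 0
    e = nz.sum()                      # e = |E|, the number of edges
    vals = np.zeros_like(w)
    vals[nz] = rankdata(w[nz]) / e    # tie-averaged ranks scaled into (0, 1]
    out = np.zeros_like(A)
    out[iu] = vals
    return out + out.T

A = np.array([[0., 5., 2.],
              [5., 0., 9.],
              [2., 9., 0.]])
P = pass_to_ranks(A)   # upper-triangle weights 5, 2, 9 become 2/3, 1/3, 1
```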
2.5 Estimation of Heritability
We compared pairs of individuals from the same family, defined as having a
shared mother, father, or both. Within a family, a pair of individuals can be
monozygotic twins, dizygotic twins, or non-twin siblings. Furthermore, we
examined random pairs of unrelated individuals, defined as sharing neither a
mother nor a father, to serve as a control group. To control for the potential
effect of age differences when comparing non-twin siblings, we sampled
unrelated pairs such that the distribution of age differences is the same for
both non-twin siblings and twins.
2.5.1 Preprocessing Data
Connectomes derived from dMRI can vary in their number of edges due to
false positive edges. To mitigate the potential effects of differing numbers of
edges when estimating heritability, the smallest-valued edges were removed
from each graph so that all graphs have the same number of edges, namely
the minimum number of edges across all available graphs. Given m graphs
{A(i)}, i = 1, . . . , m, let ei be the number of edges of graph A(i). The minimum
number of edges across all graphs is emin = min{e1, e2, . . . , em}. For each
graph A(i), the smallest-valued edges were thresholded so that each graph
has emin edges. After thresholding, each graph was passed-to-ranks prior
to ASE.
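The thresholding step can be sketched as follows; the helper name and toy matrix are hypothetical, and undirected weighted graphs are assumed.

```python
import numpy as np

def threshold_to_e(A, e_target):
    """Zero out the smallest-valued edges so that exactly e_target edges remain."""
    iu = np.triu_indices_from(A, k=1)
    w = A[iu].copy()
    nz = np.flatnonzero(w)
    drop = nz[np.argsort(w[nz])][: len(nz) - e_target]   # smallest edges first
    w[drop] = 0.0
    out = np.zeros_like(A)
    out[iu] = w
    return out + out.T

A = np.array([[0., 1., 3., 0.],
              [1., 0., 2., 5.],
              [3., 2., 0., 4.],
              [0., 5., 4., 0.]])
B = threshold_to_e(A, 3)   # keeps the three largest edges: 5, 4, and 3
```

In the thesis pipeline, e_target would be emin, the minimum edge count across all graphs.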
2.5.2 Models of Heritability and Distance Measures
Given a pair of graphs (A1 ∼ RDPG(X), A2 ∼ RDPG(Y)) on the same vertex
set with known correspondence (that is, there is a bijective map ϕ giving a
one-to-one correspondence of vertices between the two graphs), Tang et al.,
2017 provide three different measures of equality, or distance, between X and
Y. We consider the following three cases:

    Exact:              H0 : X = Y W     vs   H1 : X ≠ Y W       (2.4)
    Global Scale:       H0 : X = c Y W   vs   H1 : X ≠ c Y W     (2.5)
    Vertex-wise Scale:  H0 : X = D Y W   vs   H1 : X ≠ D Y W     (2.6)

where W ∈ Rd×d is an orthogonal matrix, c ∈ R is a scalar, and D ∈ Rn×n is a
diagonal matrix. Let X̂, Ŷ ∈ Rn×d be the estimates of latent positions obtained
from the ASE of A1 and A2, respectively. The distances are given by:

    T_Exact(X̂, Ŷ)             = min over W of    ∥X̂ − Ŷ W∥F       (2.7)
    T_Global Scale(X̂, Ŷ)      = min over c, W of ∥X̂ − c Ŷ W∥F     (2.8)
    T_Vertex-wise Scale(X̂, Ŷ) = min over D, W of ∥X̂ − D Ŷ W∥F     (2.9)
These three distance measures correspond to the three models of heritability.
Intuitively, each distance tells us how close two graphs are to each other up
to a given transformation. We obtain a distribution of distances for
monozygotic twins, dizygotic twins, non-twin siblings, and unrelated pairs
under each heritability model.
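The exact-model statistic is an orthogonal Procrustes problem with a closed-form solution: if Ŷ⊤X̂ = U S V⊤ is an SVD, the minimizing W is U V⊤. A numpy sketch with hypothetical embeddings:

```python
import numpy as np

def procrustes_distance(X, Y):
    """T_exact: min over orthogonal W of ||X - Y W||_F, solved via SVD of Y^T X."""
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return np.linalg.norm(X - Y @ (U @ Vt))

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))          # hypothetical latent position estimates
theta = 1.0
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# A rotated copy is distance ~0 under the exact model; an unrelated matrix is not.
assert procrustes_distance(X, X @ R) < 1e-8
assert procrustes_distance(X, rng.normal(size=(20, 2))) > 0.1
```

The global-scale and vertex-wise variants additionally minimize over the scalar c or the diagonal matrix D.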
2.5.3 Kolmogorov-Smirnov Two-Sample Test
Once the distance distributions are obtained for monozygotic, dizygotic,
sibling, and unrelated pairs, we employ the Kolmogorov-Smirnov (KS)
two-sample test to examine whether one distribution is statistically different
from another. The null hypothesis states that the two samples are drawn
from the same underlying distribution. The test statistic, D, is given by

    D = sup_x |F1(x) − F2(x)|    (2.10)

where F1 and F2 are the two empirical distribution functions. We formulate
the following six alternative hypotheses and provide the intuition for each
choice of alternative.
1. Monozygotic < Dizygotic - Since monozygotic twins have identical
genetics, their brain structures should be more similar than those of
dizygotic twins.
2. Monozygotic < Sibling - Similarly, monozygotic twins should have
more similar brain structures than those of non-twin siblings.
3. Monozygotic < Unrelated - Since unrelated pairs should have no genetic
similarity, the brain structures of unrelated pairs should be more dissimilar
than those of monozygotic twins.
4. Dizygotic < Sibling - Dizygotic twins and siblings should have similar
genetic variability within a family. However, there may be additional
effects from differences in age or environment that account for larger
differences between siblings.
5. Dizygotic < Unrelated - Same as 4.
6. Sibling < Unrelated - Same as 4.
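The KS statistic can be computed directly from the two empirical distribution functions, since the supremum is attained at a sample point. A minimal numpy sketch (in practice, `scipy.stats.ks_2samp` also provides p-values and supports the one-sided alternatives listed above via its `alternative` argument):

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample KS statistic D = sup_x |F1(x) - F2(x)|."""
    x, y = np.sort(x), np.sort(y)
    # The sup over all real x is attained at one of the sample points.
    grid = np.concatenate([x, y])
    F1 = np.searchsorted(x, grid, side="right") / len(x)
    F2 = np.searchsorted(y, grid, side="right") / len(y)
    return np.max(np.abs(F1 - F2))
```

Applied to our setting, x and y would be the distance samples for two pairing types, e.g. monozygotic versus dizygotic pairs.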
References
Vogelstein, Joshua, Eric Bridgeford, Benjamin Pedigo, Jaewon Chung, Keith Levin, Brett Mensh, and Carey Priebe. “Connectal Coding: Discovering the Structures Linking Cognitive Phenotypes to Individual Histories”. In: Current Opinion in Neurobiology.
Kiar, Gregory, Eric Bridgeford, Will Gray Roncal, Vikram Chandrashekhar, Disa Mhembere, Sephira Ryman, Xi-Nian Zuo, Daniel S Marguiles, R Cameron Craddock, Carey E Priebe, Rex Jung, Vince Calhoun, Brian Caffo, Randal Burns, Michael P Milham, and Joshua Vogelstein (2018). “A High-Throughput Pipeline Identifies Robust Connectomes But Troublesome Variability”. In: bioRxiv. DOI: 10.1101/188706. eprint: https://www.biorxiv.org/content/early/2018/04/24/188706.full.pdf. URL: https://www.biorxiv.org/content/early/2018/04/24/188706.
Maier-Hein, Klaus H, Peter F Neher, Jean-Christophe Houde, Marc-Alexandre Côté, Eleftherios Garyfallidis, Jidan Zhong, Maxime Chamberland, Fang-Cheng Yeh, Ying-Chia Lin, Qing Ji, et al. (2017). “The challenge of mapping the human connectome based on diffusion tractography”. In: Nature Communications 8.1, p. 1349.
Bullmore, Ed and Olaf Sporns (2009). “Complex brain networks: graph theoretical analysis of structural and functional systems”. In: Nature Reviews Neuroscience 10.3, pp. 186–198. ISSN: 1471-0048. DOI: 10.1038/nrn2575.
Bohlken, Marc M., René C. W. Mandl, Rachel M. Brouwer, Martijn P. van den Heuvel, Anna M. Hedman, René S. Kahn, and Hilleke E. Hulshoff Pol (2014). “Heritability of structural brain network topology: A DTI study of 156 twins”. In: Human Brain Mapping 35.10, pp. 5295–5305. ISSN: 1097-0193. DOI: 10.1002/hbm.22550.
Heuvel, Martijn P. van den, René C. W. Mandl, Cornelis J. Stam, René S. Kahn, and Hilleke E. Hulshoff Pol (2010). “Aberrant Frontal and Temporal Complex Network Structure in Schizophrenia: A Graph Theoretical Analysis”. In: Journal of Neuroscience 30.47, pp. 15915–15926. ISSN: 0270-6474, 1529-2401. DOI: 10.1523/JNEUROSCI.2874-10.2010.
Micheloyannis, Sifis (2012). “Graph-based network analysis in schizophrenia”. In: World Journal of Psychiatry 2.1, pp. 1–12. ISSN: 2220-3206. DOI: 10.5498/wjp.v2.i1.1.
Bassett, Danielle Smith and Ed Bullmore (2006). “Small-World Brain Networks”. In: The Neuroscientist 12.6, pp. 512–523. ISSN: 1073-8584. DOI: 10.1177/1073858406293182.
Athreya, Avanti, Donniell E. Fishkind, Minh Tang, Carey E. Priebe, Youngser Park, Joshua T. Vogelstein, Keith Levin, Vince Lyzinski, Yichen Qin, and Daniel L Sussman (2018). “Statistical Inference on Random Dot Product Graphs: a Survey”. In: Journal of Machine Learning Research 18.226, pp. 1–92. URL: http://jmlr.org/papers/v18/17-448.html.
Tang, Minh, Avanti Athreya, Daniel L. Sussman, Vince Lyzinski, Youngser Park, and Carey E. Priebe (2017). “A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs”. In: Journal of Computational and Graphical Statistics 26.2, pp. 344–354. DOI: 10.1080/10618600.2016.1193505. URL: https://doi.org/10.1080/10618600.2016.1193505.
Van Essen, David C, Stephen M Smith, Deanna M Barch, Timothy EJ Behrens, Essa Yacoub, Kamil Ugurbil, Wu-Minn HCP Consortium, et al. (2013). “The WU-Minn human connectome project: an overview”. In: Neuroimage 80, pp. 62–79.
Van Essen, David C, Kamil Ugurbil, E Auerbach, D Barch, TEJ Behrens, R Bucholz, Acer Chang, Liyong Chen, Maurizio Corbetta, Sandra W Curtiss, et al. (2012). “The Human Connectome Project: a data acquisition perspective”. In: Neuroimage 62.4, pp. 2222–2231.
Smith, Stephen M et al. (2004). “Advances in functional and structural MR image analysis and implementation as FSL.” In: NeuroImage 23 Suppl 1, S208–19. ISSN: 1053-8119. URL: http://www.ncbi.nlm.nih.gov/pubmed/15501092.
Woolrich, Mark W et al. (2009). “Bayesian analysis of neuroimaging data in FSL.” In: NeuroImage 45.1 Suppl, S173–86. ISSN: 1095-9572. URL: http://www.sciencedirect.com/science/article/pii/S1053811908012044.
Jenkinson, Mark et al. (2012). “FSL.” In: NeuroImage 62.2, pp. 782–90. ISSN: 1095-9572. URL: http://www.ncbi.nlm.nih.gov/pubmed/21979382.
Mazziotta, John et al. (2001). “A four-dimensional probabilistic atlas of the human brain”. In: Journal of the American Medical Informatics Association 8.5, pp. 401–430.
Garyfallidis, Eleftherios, Matthew Brett, Bagrat Amirbekian, Ariel Rokem, Stefan Van Der Walt, Maxime Descoteaux, and Ian Nimmo-Smith (2014). “Dipy, a library for the analysis of diffusion MRI data”. In: Frontiers in Neuroinformatics 8, p. 8.
Garyfallidis, Eleftherios, Matthew Brett, Marta Correia, Guy Williams, and Ian Nimmo-Smith (2012). “QuickBundles, a Method for Tractography Simplification”. In: Frontiers in Neuroscience 6, p. 175.
Mhembere, Disa, William Gray Roncal, Daniel Sussman, Carey E Priebe, Rex Jung, Sephira Ryman, R Jacob Vogelstein, Joshua T Vogelstein, and Randal Burns (2013). “Computing scalable multivariate glocal invariants of large (brain-) graphs”. In: Global Conference on Signal and Information Processing (GlobalSIP), 2013 IEEE. IEEE, pp. 297–300.
Klein, Arno and Jason Tourville (2012). “101 Labeled Brain Images and a Consistent Human Cortical Labeling Protocol”. In: Frontiers in Neuroscience 6, p. 171. ISSN: 1662-453X. DOI: 10.3389/fnins.2012.00171. URL: https://www.frontiersin.org/article/10.3389/fnins.2012.00171.
Sussman, Daniel L, Minh Tang, Donniell E Fishkind, and Carey E Priebe (2012). “A consistent adjacency spectral embedding for stochastic blockmodel graphs”. In: Journal of the American Statistical Association 107.499, pp. 1119–1128.
Young, Stephen J. and Edward R. Scheinerman (2007). “Random Dot Product Graph Models for Social Networks”. In: Lecture Notes in Computer Science. Ed. by Anthony Bonato and Fan R. K. Chung, pp. 138–149.
Zhu, Mu and Ali Ghodsi (2006). “Automatic dimensionality selection from the scree plot via the use of profile likelihood”. In: Computational Statistics & Data Analysis 51.2, pp. 918–930.
Curriculum Vitae
Jaewon Chung was born on July 8th in Seoul, South Korea. He graduated from
Wesleyan University with a degree in neuroscience & behavior and economics.
He then worked at various hospitals in New York, first as a clinical researcher
and later as a data analyst at NYU Langone Hospital. In 2017, Jaewon began
his Master's degree in biomedical engineering at Johns Hopkins University,
where he studied topics such as fluorescence microscopy image segmentation
and the statistical analysis of populations of human brains. In 2019, Jaewon
will continue his studies at Johns Hopkins University as a PhD student in
biomedical engineering.