Supplemental Material

http://www.brain-map.org

A Big Thanks

Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University

The Process

Construction and representation of the Anatomic Gene Expression Atlas (AGEA).

Allen Reference Atlas

• 3D Nissl volume comes from rigid reconstruction

• Each section reoriented to match adjacent images as closely as possible

• A 1.5T low resolution 3D average MRI volume used to ensure reconstruction is realistic

• Reoriented Nissl section down-sampled, converted to grayscale

• Isotropic 25μm grayscale volume.

Anatomy

• 208 large structures and structural groupings extracted

• Projected & smoothed onto 3D atlas volume for structural annotation

• Additional decomposition of cortex into an intersection of 202 regions and areas

In Situ Hybridization (ISH)

Each gene ISH series is reconstructed from serial sections (200 μm spacing)

Figure panels: Coronal section, Sagittal section

Why ISH?

• Phenotypic properties of cells result from a unique combination of expressed gene products

• Gene expression profiles => define cell types.

6 genes on 1 brain

Each gene on 56 sections

2 sections are for Nissl

8 genes on 1 brain

Each gene on 20 Sections.

ISH – Tissue Preparation & Imaging Process

• Sectioning

• Staining (non-isotopic digoxigenin (DIG))

• Washing

• Imaging

ISH – Probe Preparation

Traditional Approach vs. ISH

• Histology

• One gene at a time

• For 20,000 genes, this needs 20,000 x (5 or 14) slides, about 1 year

• DNA microarrays & SAGE: applied to large brain regions

• Cannot differentiate neuronal subtypes

Kamme, F. et al., J. Neurosci. (2003); Sugino, K. et al., Nature Neurosci. (2006)

• In situ hybridization measures expression & preserves spatial information for a single gene

• Finer resolution: cellular but not single cell

• Data can be used to analyze

• Gene expression

• Gene regulation

• CNS function (spatial)

• Cellular phenotype (spatial)

Reproducibility

For multiple genes, an inbred mouse strain is used

Although different mice are used for different genes, expression under the same environmental conditions is reproducible.

Is ISH Reproducible?

Primary sources of variation:

• Riboprobes

• Day-to-day variability

• Biological variability in brains

• Even with inbred mice, variation between brains is significant.

Processing

Expression Statistics

Reconstruction – 3D

Data accessed by standard coordinate system with (200 μm)^3 voxels

Ontology of Allen Reference Atlas used to label individual voxels

Grid Based

Nearest Plane

Registration - Key

• Volumes iteratively registered to AB atlas using affine and locally nonlinear warping

• Registration good to ~200 microns

Local deformation field example

3D Annotation

Lower dimensional data volumes

• Analyze binned expression volumes at (200 µm)^3 resolution

• ~31,000 image series (mostly single hemisphere, sagittal series)

• 4,104 unique genes available from coronally sectioned brains

• Each volume is 67 x 41 x 58 voxels (about 50k brain voxels)

• Comparable to fMRI resolution

Data normalization

• Background correction & registration

• Intensity normalization: correct background from negative control

• Registration: map the image to the reference atlas

• Smoothed expression energy: sum of intensities of expressing cells / # of cells in the voxel (an average over many cells of diverse types)

ISH Signal

(c) Coronal plane in situ hybridization (ISH) image of gene tachykinin 2 (Tac2) from the Allen Brain Atlas showing enriched expression in the bed nucleus of the stria terminalis (BST). The box represents a 1 mm² square.

(d) Enlarged expression mask view of boxed area in c depicting gene expression levels color coded by ISH signal intensity (red, higher expression level; green/blue, lower expression level).

Measurements

p is an image pixel in voxel C

|C| is the total number of pixels in C

M(p) is the expression segmentation mask: 1 (“expressing” pixel) or 0 (“non-expressing” pixel)

I(p) is the grayscale value of ISH image intensity, where Gray = 0.3*Red + 0.59*Green + 0.11*Blue.

Per Gene Signature: Prox1

Figure panels: Coronal section, Sagittal section

Prox1 volume maximum intensity projections

Figure panels: Raw ISH, Expression Energy

Expression measures

• expression density = sum of expressing pixels / sum of all pixels in division

• expression intensity = sum of expressing pixel intensity / sum of expressing pixels

• expression energy = sum of expressing pixel intensity / sum of all pixels in division = density x intensity
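The three measures can be sketched in a few lines of Python. This is a minimal illustration, assuming the mask M(p) and grayscale intensity I(p) for one voxel or division are available as NumPy arrays; the function names `to_gray` and `expression_measures` and the toy pixel values are made up for this example and are not part of the Allen pipeline.

```python
import numpy as np

def to_gray(rgb):
    """Gray = 0.3*Red + 0.59*Green + 0.11*Blue, applied per pixel."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]

def expression_measures(mask, intensity):
    """Return (density, intensity, energy) for the pixels of one division C.

    mask      -- M(p): 1 for expressing pixels, 0 otherwise
    intensity -- I(p): grayscale value of each pixel
    """
    mask = np.asarray(mask, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_pixels = mask.size                   # |C|
    n_expressing = mask.sum()              # sum of M(p)
    expressing_sum = (mask * intensity).sum()  # sum of M(p) * I(p)
    density = n_expressing / n_pixels
    mean_intensity = expressing_sum / n_expressing if n_expressing > 0 else 0.0
    energy = expressing_sum / n_pixels
    return density, mean_intensity, energy

# Toy division with four pixels, two of them expressing (made-up values).
toy_mask = [1, 1, 0, 0]
toy_intensity = [10.0, 30.0, 5.0, 5.0]
density, mean_intensity, energy = expression_measures(toy_mask, toy_intensity)
print(density, mean_intensity, energy)
```

In the toy case the energy equals density times intensity, which is the identity stated on the slide.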

Recap - Measurements

MetaData

• Each voxel can be connected to a node in a hierarchical brain atlas / ontology, and also to Waxholm space

• Raw Nissl sections from the same brain (with 200 μm spacing) can also be obtained

• Each gene has the specific probe sequence used and various identifiers to link to gene information (we’ve used Entrez ID)

Deriving Insights

Large-scale data analysis

How much structure is present across space and across genes?

How would the brain segment on the basis of gene expression patterns (as opposed to Nissl, etc.)?

Is there structure in the patterns of expression of highly localized genes?

What can we learn from the expression patterns of genes implicated in disorders?

see Bohland et al. (2009) Methods; Ng et al. (2009) Nature Neuroscience.

Genome-wide Analysis of Expression

70.5% of genes are expressed in fewer than 20% of cells

Notes Well-established genes for different cells identified

For 12 major brain regions, 100 top genes.

Cell-Specific Genes

Gene Ontology enrichment analysis is useful

Oligodendrocyte-enriched genes => myelin production.

Heterogeneity

Functional Compartments

Genes with regional expression provide substrates for functional differences

Tools from AGEA

Correlation mode – View and navigate 3-D spatial relationship maps

Clusters mode – Explore transcriptome based spatial organization

Gene Finder mode - Search for genes with local regionality

Expression energy is available for each gene (M=4,376) and for each voxel (N=51,533)

For each seed voxel, find Pearson’s correlation coefficient between the seed voxel and every other voxel, using expression vectors of length M

Compute 51,533 three-dimensional correlation maps

Web viewer for easy navigation between maps and within each 3-D map

Correlation values as 24-bit false color using a blue-to-red (“jet”) color scale
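A single correlation map can be sketched as follows. This is a minimal illustration, assuming the expression energies are stored as an N x M array (voxels by genes); the function name `correlation_map` and the random toy data are assumptions for this example, not the AGEA implementation.

```python
import numpy as np

def correlation_map(energy, seed):
    """Pearson correlation between the seed voxel and every voxel.

    energy -- array of shape (N voxels, M genes) of expression energies
    seed   -- row index of the seed voxel
    """
    x = np.asarray(energy, dtype=float)
    z = x - x.mean(axis=1, keepdims=True)   # center each voxel's gene vector
    norms = np.linalg.norm(z, axis=1)
    norms[norms == 0] = np.nan               # constant vectors: correlation undefined
    z = z / norms[:, None]                   # unit-length centered vectors
    return z @ z[seed]                       # dot products are Pearson r values

# Toy data: 50 voxels, 200 genes (random values, for illustration only).
rng = np.random.default_rng(0)
toy_energy = rng.random((50, 200))
r = correlation_map(toy_energy, seed=7)
print(r.shape, r[7])
```

Computing one seed at a time keeps memory small. Holding all 51,533 x 51,533 correlations at once would take roughly 10 GB in single precision, so a per-seed loop or a precomputed on-disk store is the more practical layout.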

Spatial Transcriptome

Clusters of Correlated Gene Expression

Classical definition of brain regions

• Overall morphology

• Cellular cytoarchitecture

• Ontological development

• Functional connectivity

Hierarchical clustering

• Voxels are spatially organized as a binary tree

• Each node is a collection of voxels and has 0 or 2 branches

• Initially, all 51,533 voxels are assigned to the root node of the tree.

• The final tree has 103,065 nodes with a maximum depth of 53 levels and 51,533 leaf nodes (one for each voxel in the brain).

• At each bifurcation, an ordering is assigned to each child to enable the definition of a global “depth first” ordering for all leaf nodes.
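The tree structure described here can be sketched with a small top-down bisection. The slides do not state how each node is split, so the rule below (a median split along the direction of largest variance) is only a stand-in assumption, and all function names are hypothetical. What the sketch does show is the bookkeeping: every internal node has exactly two children, so N leaves give 2N - 1 nodes (2 x 51,533 - 1 = 103,065), and visiting children in their assigned order yields a depth-first ordering of the leaves.

```python
import numpy as np

def bisect(points, idx):
    """Split the voxel indices in idx into two non-empty halves.

    Assumed rule (not from the slides): median split along the direction
    of largest variance of the expression vectors.
    """
    centered = points[idx] - points[idx].mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    order = np.argsort(centered @ vt[0], kind="stable")
    half = len(idx) // 2
    return idx[order[:half]], idx[order[half:]]

def build_tree(points, idx=None, depth=0):
    """Recursively bisect until every leaf holds a single voxel."""
    if idx is None:
        idx = np.arange(len(points))     # root node holds every voxel
    node = {"voxels": idx, "depth": depth, "children": []}
    if len(idx) > 1:
        left, right = bisect(points, idx)
        node["children"] = [build_tree(points, left, depth + 1),
                            build_tree(points, right, depth + 1)]
    return node

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node["children"])

def leaf_order(node):
    """Depth-first ordering of leaf voxels (first child before second)."""
    if not node["children"]:
        return [int(node["voxels"][0])]
    return [v for child in node["children"] for v in leaf_order(child)]

# Toy data: 16 voxels with 8 genes each (random values, illustration only).
rng = np.random.default_rng(0)
toy = rng.random((16, 8))
tree = build_tree(toy)
print(count_nodes(tree))   # 2 * 16 - 1 nodes
print(leaf_order(tree))
```

A median split produces a balanced tree, which is why the toy depth is small. The depth of 53 reported for the real tree indicates that its splits are uneven.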

Clustering Analysis

Hierarchical Clustering

Microarray Data Analysis

Unsupervised Analysis – clustering

Supervised Analysis

Visualization & Decomposition

Pattern Analysis

Statistical Analysis

K-means

Hierarchical Clustering

Biclustering

CLICK

Self-Organizing Maps

DBSCAN

OPTICS

DENCLUE

…

Up regulated genes

Down regulated genes

Differentially Regulated Genes

Clusters ?

Clustering Analysis

Group genes that show a similar temporal expression pattern.

Group samples/genes that show a similar expression pattern.

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

Inter-cluster distances are maximized

Intra-cluster distances are minimized

Clustering Analysis

Clusters ?

How many clusters?

Figure panels: Two Clusters, Four Clusters, Six Clusters

Clustering Algorithms

• K-means and its variants

• Hierarchical clustering

K-means Clustering

• Partitional clustering approach

• Each cluster is associated with a centroid (center point)

• Each point is assigned to the cluster with the closest centroid

• Number of clusters, K, must be specified

• The basic algorithm is very simple
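The basic algorithm can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and initial centroids drawn at random from the data points; the slides do not specify either choice, and the toy points are made up.

```python
import numpy as np

def assign(points, centroids):
    """Label each point with the index of its closest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means: alternate assignment and centroid update."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: K distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(points, centroids), centroids

# Two well-separated toy groups (made-up coordinates).
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(toy_points, k=2)
print(labels)
```

Because the result depends on the initial centroids, it is common to run the algorithm several times with different seeds and keep the solution with the smallest within-cluster distances.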

Choosing Initial Centroids

Limitations - Differing Sizes

Original Points K-means (3 Clusters)

Limitations: Differing Density


Limitations: Non-globular Shapes


Hierarchical Clustering

• Produces a set of nested clusters organized as a hierarchical tree

• Can be visualized as a dendrogram

– A tree-like diagram that records the sequences of merges or splits

Agglomerative Clustering

• More popular hierarchical clustering technique

• Basic algorithm is straightforward

– Compute the proximity matrix

– Let each data point be a cluster

– Repeat

– Merge the two closest clusters

– Update the proximity matrix

– Until only a single cluster remains

• Key operation is the computation of the proximity of two clusters

• Different approaches to defining the distance between clusters distinguish the different algorithms
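The loop above can be sketched directly. This is a deliberately naive illustration, assuming Euclidean distance between points; instead of updating the proximity matrix in place, it re-derives each cluster-to-cluster proximity from the point-level matrix with a pluggable `linkage` function (`np.min` here, which corresponds to the MIN definition). The function name `agglomerate` and the toy points are made up for the example.

```python
import numpy as np

def agglomerate(points, linkage=np.min):
    """Merge the two closest clusters until one remains.

    Returns the list of merges as (cluster_a, cluster_b, distance).
    """
    points = np.asarray(points, dtype=float)
    # Point-level proximity matrix (Euclidean distances).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = [[i] for i in range(len(points))]   # each point is a cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Search all cluster pairs for the smallest proximity.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage(d[np.ix_(clusters[a], clusters[b])])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], float(dist)))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges

# Five toy points on a line (made-up coordinates).
toy_line = [[0.0], [1.0], [5.0], [6.5], [20.0]]
for left, right, dist in agglomerate(toy_line):
    print(left, right, dist)
```

Passing `np.max` or `np.mean` as `linkage` gives the MAX and group-average variants without changing the loop.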

In The Beginning ...

Start with clusters of individual points and a proximity matrix

Figure: points p1 to p5 and their proximity matrix

Intermediate Step

After some merging steps, we have some clusters

Figure: clusters C1 to C5 and their proximity matrix

Intermediate Step

We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.

Figure: clusters C1 to C5 and their proximity matrix

After Merging

The question is “How do we update the proximity matrix?”

Figure: clusters C1, C3, C4 and C2 U C5, with the proximity matrix entries for C2 U C5 marked “?”

Inter-Cluster Similarity

Similarity?

• MIN

• MAX

• Group Average

• Distance Between Centroids

Figure: points p1 to p5 and their proximity matrix
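The four definitions can be written down side by side. This is a minimal sketch, assuming proximity is measured as Euclidean distance between points (the slides speak of similarity and do not fix a metric); the function names and the two toy clusters are made up.

```python
import numpy as np

def pairwise(a, b):
    """All point-to-point distances between clusters a and b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

def d_min(a, b):
    """MIN: distance between the closest pair of points."""
    return float(pairwise(a, b).min())

def d_max(a, b):
    """MAX: distance between the farthest pair of points."""
    return float(pairwise(a, b).max())

def d_group_average(a, b):
    """Group average: mean over all cross-cluster pairs."""
    return float(pairwise(a, b).mean())

def d_centroid(a, b):
    """Distance between the two cluster centroids."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))

# Two toy clusters on a line (made-up coordinates).
c1 = np.array([[0.0, 0.0], [1.0, 0.0]])
c2 = np.array([[3.0, 0.0], [5.0, 0.0]])
print(d_min(c1, c2), d_max(c1, c2), d_group_average(c1, c2), d_centroid(c1, c2))
```

For these collinear toy clusters the group average and the centroid distance coincide; in general they differ.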



Hierarchical Clustering: Group Average

Figure panels: Nested Clusters, Dendrogram

Complexity: Time & Space

• O(N^2) space, since it uses the proximity matrix (N is the number of points).

• O(N^3) time in many cases

– There are N steps, and at each step the proximity matrix, of size N^2, must be updated and searched

– Complexity can be reduced to O(N^2 log(N)) time for some approaches

Microarray Data Analysis

Unsupervised Analysis – clustering

Supervised Analysis

Visualization & Decomposition

Pattern Analysis

Statistical Analysis

KNN

Decision tree

Neural nets

SVM

LDA

Naïve Bayes

…

Finding enriched genes

Seeding with known structure-specific genes.

Oligodendrocyte (Mbp, Mobp, Cnp1)

Choroid plexus (Col8a2, Lbp, Msx1)

Find the genes with similar expression patterns.
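One way to sketch this seeding step is to average the seed genes' spatial profiles into a template and rank all other genes by Pearson correlation with it. The slides do not say how similarity was scored, so the template-and-correlation approach is an assumption, and the function name `rank_by_seed`, the genes `geneA` and `geneB`, and every number in the toy matrix are invented for illustration.

```python
import numpy as np

def rank_by_seed(expr, gene_names, seed_names, top=5):
    """Rank non-seed genes by correlation with the mean seed profile.

    expr       -- array of shape (genes, voxels) of expression energies
    gene_names -- list of gene names, one per row of expr
    seed_names -- genes known to be structure specific
    """
    expr = np.asarray(expr, dtype=float)
    index = {g: i for i, g in enumerate(gene_names)}
    seed_rows = [index[g] for g in seed_names]
    template = expr[seed_rows].mean(axis=0)          # mean seed profile
    z = expr - expr.mean(axis=1, keepdims=True)
    t = template - template.mean()
    denom = np.linalg.norm(z, axis=1) * np.linalg.norm(t)
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (z @ t) / denom                          # Pearson r per gene
    ranked = [(gene_names[i], float(r[i]))
              for i in np.argsort(-r)
              if i not in seed_rows and not np.isnan(r[i])]
    return ranked[:top]

# Synthetic toy matrix: 5 genes x 6 voxels (values invented for the example).
names = ["Mbp", "Mobp", "Cnp1", "geneA", "geneB"]
toy_expr = [
    [5, 4, 0, 1, 5, 0],   # Mbp   (seed)
    [4, 5, 1, 0, 5, 0],   # Mobp  (seed)
    [5, 5, 0, 0, 4, 1],   # Cnp1  (seed)
    [9, 8, 1, 0, 9, 1],   # geneA: hypothetical gene following the seed pattern
    [0, 1, 8, 9, 0, 9],   # geneB: hypothetical gene with the opposite pattern
]
result = rank_by_seed(toy_expr, names, ["Mbp", "Mobp", "Cnp1"])
print(result)
```

The top of the ranked list contains candidate genes whose spatial pattern follows the seeds; a correlation threshold or a fixed list length would then decide which candidates to keep.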
