Supplemental Material


Transcript of Supplemental Material

Page 1: Supplemental Material
Page 2: Supplemental Material

Supplemental Material

Page 3: Supplemental Material

http://www.brain-map.org

Page 4: Supplemental Material
Page 5: Supplemental Material

A Big Thanks

Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University

Page 6: Supplemental Material

The Process

Construction and representation of the Anatomic Gene Expression Atlas (AGEA).

Page 7: Supplemental Material

Allen Reference Atlas

Page 8: Supplemental Material

Allen Reference Atlas

• 3D Nissl volume comes from rigid reconstruction

• Each section reoriented to match adjacent images as closely as possible

• A 1.5T low resolution 3D average MRI volume used to ensure reconstruction is realistic

• Reoriented Nissl section down-sampled, converted to grayscale

• Isotropic 25μm grayscale volume.

Page 9: Supplemental Material

Anatomy

• 208 large structures and structural groupings extracted

• Projected & smoothed onto 3D atlas volume to form the structural annotation

• Additional decomposition of cortex into an intersection of 202 regions and areas

Page 10: Supplemental Material

The Process

Construction and representation of the Anatomic Gene Expression Atlas (AGEA).

Page 11: Supplemental Material

In Situ Hybridization (ISH)

Each gene ISH series is reconstructed from serial sections (200 μm spacing)

Coronal section / Sagittal section

Page 12: Supplemental Material

Why ISH?

• Phenotypic properties of cells result from a unique combination of expressed gene products

• Gene expression profiles define cell types.

Page 13: Supplemental Material

6 genes on 1 brain

Each gene on 56 sections

2 sections are for Nissl

Page 14: Supplemental Material

8 genes on 1 brain

Each gene on 20 Sections.

Page 15: Supplemental Material

ISH – Tissue Preparation & Imaging Process

• Sectioning

• Staining (non-isotopic, digoxigenin (DIG))

• Washing

• Imaging

Page 16: Supplemental Material

ISH – Probe Preparation

Page 17: Supplemental Material

Traditional Approach vs. ISH

• Histology

  • One gene at a time

  • For 20,000 genes, need 20,000 x (5 or 14) slides, ~1 year

• DNA microarrays & SAGE: applied to large brain regions

  • Cannot differentiate neuronal subtypes

Kamme, F. et al. J. Neurosci (2003); Sugino, K. et al. Nature Neurosci (2006)

• In situ hybridization measures expression & preserves spatial information for a single gene

  • Finer resolution: cellular but not single cell

• Data can be used to analyze

  • Gene expression

  • Gene regulation

  • CNS function (spatial)

  • Cellular phenotype (spatial)

Page 18: Supplemental Material

Reproducibility

For multiple genes, an inbred mouse strain is used

Although different mice are used for different genes, expression under the same environmental conditions is reproducible.

Page 19: Supplemental Material

Is ISH Reproducible?

Primary sources of variation:

• Riboprobes

• Day-to-day variability

• Biological variability in brains

• Even with inbred mice, variation between brains is significant.

Page 20: Supplemental Material

Processing

Expression Statistics

Reconstruction – 3D

Data accessed by standard coordinate system – (200 μm)³ voxels

Ontology of Allen Reference Atlas used to label individual voxels

Page 21: Supplemental Material

Grid Based

Nearest Plane

Page 22: Supplemental Material

Registration - Key

• Volumes iteratively registered to AB atlas using affine and locally nonlinear warping

• Registration good to ~200 microns

Local deformation field example

Page 23: Supplemental Material
Page 24: Supplemental Material

3D Annotation

Page 25: Supplemental Material

Lower dimensional data volumes

• Analyze binned expression volumes at (200 µm)³ resolution

• ~31,000 image series (mostly single hemisphere, sagittal series)

• 4,104 unique genes available from coronally sectioned brains

• Each volume is 67 x 41 x 58 voxels (about 50k brain voxels)

• Comparable to fMRI resolution

Page 26: Supplemental Material

Data normalization

• Background correction & Registration

  • Intensity normalization – correct background from negative control

  • Registration – map the image to the reference atlas

• Smoothed Expression Energy

  • Sum of intensities of expressing cells / # of cells in the voxel

  • An average over many cells of diverse types

Page 27: Supplemental Material

ISH Signal

(c) Coronal plane in situ hybridization (ISH) image of gene tachykinin 2 (Tac2) from the Allen Brain Atlas showing enriched expression in the bed nucleus of the stria terminalis (BST). The box represents a 1-mm2 square.

(d) Enlarged expression mask view of boxed area in c depicting gene expression levels color coded by ISH signal intensity (red, higher expression level; green/blue, lower expression level).

Page 28: Supplemental Material

Measurements

p is an image pixel in voxel C

|C| is the total number of pixels in C

M(p) is the expression segmentation mask: 1 (“expressing” pixel) or 0 (“non-expressing” pixel)

I(p) is the grayscale value of ISH image intensity: Gray = 0.3*Red + 0.59*Green + 0.11*Blue.
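Written with these symbols, the three per-voxel expression measures used in this deck can be stated compactly. The names D, Ī and E are labels chosen here for illustration; they are not taken from the slides.

```latex
\[
D(C) = \frac{\sum_{p \in C} M(p)}{|C|}, \qquad
\bar{I}(C) = \frac{\sum_{p \in C} M(p)\, I(p)}{\sum_{p \in C} M(p)}, \qquad
E(C) = \frac{\sum_{p \in C} M(p)\, I(p)}{|C|} = D(C)\, \bar{I}(C)
\]
```

Here D is expression density, Ī is expression intensity (defined only when at least one pixel in C is expressing), and E is expression energy.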

Page 29: Supplemental Material

Per Gene Signature: Prox1

Coronal section / Sagittal section

Prox1 volume maximum intensity projections

Raw ISH / Expression Energy

Page 30: Supplemental Material

Recap - Measurements

Expression measures

• expression density = sum of expressing pixels / sum of all pixels in division

• expression intensity = sum of expressing pixel intensity / sum of expressing pixels

• expression energy = sum of expressing pixel intensity / sum of all pixels in division = density x intensity
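The three measures can be computed directly from a grayscale image and a binary mask. The following is a minimal sketch, assuming NumPy arrays for one division; the function names `to_gray` and `expression_measures` are illustrative and do not come from the Allen Brain Atlas pipeline.

```python
import numpy as np

def to_gray(rgb):
    """Convert an (..., 3) RGB array to grayscale with the weights quoted on the slide."""
    rgb = np.asarray(rgb, dtype=float)
    return 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]

def expression_measures(intensity, mask):
    """Return (density, intensity, energy) for the pixels of one division."""
    intensity = np.asarray(intensity, dtype=float).ravel()
    mask = np.asarray(mask, dtype=bool).ravel()
    n_all = mask.size                       # all pixels in the division
    n_expr = int(mask.sum())                # expressing pixels
    total = float(intensity[mask].sum())    # summed intensity of expressing pixels
    density = n_expr / n_all
    mean_intensity = total / n_expr if n_expr else 0.0
    energy = total / n_all
    return density, mean_intensity, energy

# Toy division of four pixels, two of them expressing.
I = np.array([10.0, 20.0, 30.0, 40.0])
M = np.array([1, 0, 1, 0])
density, mean_intensity, energy = expression_measures(I, M)
```

For this toy input the energy equals density times intensity, as the recap states.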

Page 31: Supplemental Material

MetaData

Each voxel can be connected to a node in a hierarchical brain atlas / ontology, and also to Waxholm space

Raw Nissl sections from the same brain (with 200 μm spacing) can also be obtained

Each gene has a specific probe sequence and various identifiers to link to gene information (we’ve used Entrez ID)

Page 32: Supplemental Material

Deriving Insights

Page 33: Supplemental Material

Large-scale data analysis

How much structure is present across space and across genes?

How would the brain segment on the basis of gene expression patterns (as opposed to Nissl, etc.)?

Is there structure in the patterns of expression of highly localized genes?

What can we learn from the expression patterns of genes implicated in disorders?

see Bohland et al. (2009) Methods; Ng et al. (2009) Nature Neuroscience.

Page 34: Supplemental Material

Genome-wide Analysis of Expression

70.5% of genes are expressed in fewer than 20% of cells

Page 35: Supplemental Material

Notes

Well-established genes for different cells identified

For 12 major brain regions, 100 top genes.

Page 36: Supplemental Material

Cell-Specific Genes

Gene Ontology enrichment analysis is useful

Oligodendrocyte-enriched genes => myelin production.

Page 37: Supplemental Material

Heterogeneity

Page 38: Supplemental Material

Functional Compartments

Genes with regional expression provide substrates for functional differences

Page 39: Supplemental Material

Tools from AGEA

Correlation mode – View and navigate 3-D spatial relationship maps

Clusters mode – Explore transcriptome based spatial organization

Gene Finder mode - Search for genes with local regionality

Page 40: Supplemental Material
Page 41: Supplemental Material

Spatial Transcriptome

Expression energy for each gene (M=4,376) and for each voxel (N=51,533)

For each seed voxel, find Pearson’s correlation coefficient between the seed voxel and every other voxel using expression vectors of length M

Compute 51,533 three-dimensional correlation maps

Web viewer for easy navigation between maps and within each 3-D map

Correlation values shown as 24-bit false color using a blue-to-red (“jet”) color scale
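A single correlation map can be sketched as follows. This assumes the expression energies are held in a NumPy array with one row per voxel and one column per gene; the function name `correlation_map` and the toy data are illustrative, not the AGEA implementation.

```python
import numpy as np

def correlation_map(E, seed):
    """Pearson correlation between the seed voxel and every voxel.

    E has shape (n_voxels, n_genes); each row is one voxel's expression vector.
    """
    E = np.asarray(E, dtype=float)
    Z = E - E.mean(axis=1, keepdims=True)   # center each voxel's vector
    norms = np.linalg.norm(Z, axis=1)
    # Assumes no voxel has constant expression across genes (norm > 0).
    Z = Z / norms[:, None]
    return Z @ Z[seed]                      # one correlation value per voxel

rng = np.random.default_rng(0)
E = rng.random((6, 50))    # toy data: 6 voxels, 50 genes
E[1] = 2.0 * E[0] + 1.0    # voxel 1 is a positive linear function of voxel 0
E[2] = -E[0]               # voxel 2 is perfectly anti-correlated with voxel 0
r = correlation_map(E, seed=0)
```

Repeating this for every seed voxel gives one map per voxel, which is where the count of 51,533 maps comes from.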

Page 42: Supplemental Material
Page 43: Supplemental Material

Clusters of Correlated Gene Expression

Classical definition of brain regions

• Overall Morphology

• Cellular Cytoarchitecture

• Ontological Development

• Functional Connectivity

Page 44: Supplemental Material
Page 45: Supplemental Material

Clusters of Correlated Gene Expression

Hierarchical clustering – Voxels are spatially organized as a binary tree

Each node is a collection of voxels and has 0 or 2 branches

Initially 51,533 voxels assigned to root node of the tree.

Final tree has 103,065 nodes with a maximum depth of 53 levels and 51,533 leaf nodes (one for each voxel in the brain).

At each bifurcation an ordering is assigned to each child to enable the definition of a global “depth first” ordering for all leaf nodes.
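The node count follows from the tree shape: a binary tree in which every node has 0 or 2 branches and N leaves has 2N - 1 nodes, so 51,533 leaves give 103,065 nodes. The sketch below illustrates only this bookkeeping and the depth-first leaf ordering. The midpoint split in `build_tree` is a placeholder assumption; the slides do not describe the actual splitting criterion.

```python
def build_tree(leaves):
    """Recursively split a list of leaves into a binary tree.

    The midpoint split is a placeholder, not the AGEA splitting criterion.
    """
    if len(leaves) == 1:
        return {"leaves": leaves, "children": []}
    mid = len(leaves) // 2
    return {"leaves": leaves,
            "children": [build_tree(leaves[:mid]), build_tree(leaves[mid:])]}

def count_nodes(node):
    """Count this node plus all of its descendants."""
    return 1 + sum(count_nodes(c) for c in node["children"])

def depth_first_leaves(node):
    """List leaves by visiting the first child before the second at each bifurcation."""
    if not node["children"]:
        return list(node["leaves"])
    out = []
    for child in node["children"]:
        out.extend(depth_first_leaves(child))
    return out

tree = build_tree(list(range(7)))   # toy tree with 7 leaves
n_nodes = count_nodes(tree)         # 2 * 7 - 1 nodes
order = depth_first_leaves(tree)
```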

Page 46: Supplemental Material

Clustering Analysis

Page 47: Supplemental Material

Hierarchical Clustering

Page 48: Supplemental Material

Notes

Page 49: Supplemental Material

Microarray Data Analysis

Unsupervised Analysis – clustering

Supervised Analysis

Visualization & Decomposition

Pattern Analysis

Statistical Analysis

K-means

Hierarchical Clustering

Biclustering

CLICK

Self-Organizing Maps

DBSCAN

OPTICS

DENCLUE

Page 50: Supplemental Material

Up regulated genes

Down regulated genes

Differentially Regulated Genes

Page 51: Supplemental Material

Clusters ?

Page 52: Supplemental Material

Clustering Analysis

Group genes that show a similar temporal expression pattern.

Group samples/genes that show a similar expression pattern.

Page 53: Supplemental Material

Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups

Inter-cluster distances are maximized

Intra-cluster distances are minimized

Clustering Analysis

Page 54: Supplemental Material

Clusters ?

How many clusters?

Four Clusters Two Clusters

Six Clusters

Page 55: Supplemental Material

Clustering Algorithms

• K-means and its variants

• Hierarchical clustering

Page 56: Supplemental Material

K-means Clustering

• Partitional clustering approach

• Each cluster is associated with a centroid (center point)

• Each point is assigned to the cluster with the closest centroid

• Number of clusters, K, must be specified

• The basic algorithm is very simple
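The basic algorithm can be sketched in a few lines. This is a minimal version, assuming Euclidean distance and initial centroids drawn at random from the data points; the function name `kmeans` and its parameters are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means: alternate point assignment and centroid update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to the cluster with the closest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two well-separated toy groups of three points each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(X, k=2)
```

The result depends on the initial centroids in general, which is why the choice of initial centroids gets its own slide.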

Page 57: Supplemental Material

Choosing Initial Centroids

Page 58: Supplemental Material

Limitations - Differing Sizes

Original Points K-means (3 Clusters)

Page 59: Supplemental Material

Limitations: Differing Density

Original Points K-means (3 Clusters)

Page 60: Supplemental Material

Limitations: Non-globular Shapes

Original Points K-means (2 Clusters)

Page 61: Supplemental Material

Hierarchical Clustering

• Produces a set of nested clusters organized as a hierarchical tree

• Can be visualized as a dendrogram

– A tree-like diagram that records the sequences of merges or splits

Page 62: Supplemental Material

Agglomerative Clustering

• More popular hierarchical clustering technique

• Basic algorithm is straightforward

  • Compute the proximity matrix

  • Let each data point be a cluster

  • Repeat

    • Merge the two closest clusters

    • Update the proximity matrix

  • Until only a single cluster remains

• Key operation is the computation of the proximity of two clusters

• Different approaches to defining the distance between clusters distinguish the different algorithms
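The loop above can be sketched as follows. This is a deliberately naive version: instead of updating the proximity matrix after each merge, it recomputes cluster-to-cluster distances from the point-level matrix, which gives the same merges for MIN and MAX linkage but is slower. The function name `agglomerative` and the `linkage` parameter are illustrative.

```python
import numpy as np

def agglomerative(X, linkage=min):
    """Naive agglomerative clustering; returns the list of merges in order."""
    X = np.asarray(X, dtype=float)
    # Proximity matrix of pairwise Euclidean distances between points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = [[i] for i in range(len(X))]   # each point starts as a cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two closest clusters under the chosen linkage.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

X = np.array([[0.0], [1.0], [5.0], [6.5]])
merges = agglomerative(X)                        # MIN (single link)
merges_max = agglomerative(X, linkage=max)       # MAX (complete link)
```

Passing `min` or `max` as the linkage switches between the MIN and MAX definitions of cluster proximity.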

Page 63: Supplemental Material

In The Beginning ...

Start with clusters of individual points and a proximity matrix

[Figure: points p1 to p5 and their proximity matrix]

Page 64: Supplemental Material

Intermediate Step

After some merging steps, we have some clusters

[Figure: clusters C1 to C5 and their proximity matrix]

Page 65: Supplemental Material

Intermediate Step

We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.

[Figure: clusters C1 to C5 and their proximity matrix]

Page 66: Supplemental Material

After Merging

The question is “How do we update the proximity matrix?”

[Figure: clusters C1, C3, C4 and C2 U C5; the proximity matrix entries for C2 U C5 are marked “?”]

Page 67: Supplemental Material

Inter-Cluster Similarity

Similarity?

• MIN

• MAX

• Group Average

• Distance Between Centroids

[Figure: points p1 to p5 and their proximity matrix]
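The four options differ only in how point-to-point proximities are combined into one number per pair of clusters. The sketch below expresses them as distances (smaller means more similar), assuming Euclidean distance; the function name `cluster_distances` and the dictionary keys are illustrative.

```python
import numpy as np

def cluster_distances(A, B):
    """Four ways to measure the distance between clusters A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # All pairwise distances between a point in A and a point in B.
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return {
        "min": D.min(),             # MIN: closest pair (single link)
        "max": D.max(),             # MAX: farthest pair (complete link)
        "group_average": D.mean(),  # mean over all cross-cluster pairs
        "centroid": np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)),
    }

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])
d = cluster_distances(A, B)
```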

Page 68: Supplemental Material

Inter-Cluster Similarity

• MIN

• MAX

• Group Average

• Distance Between Centroids

[Figure: points p1 to p5 and their proximity matrix]

Page 69: Supplemental Material

Inter-Cluster Similarity

• MIN

• MAX

• Group Average

• Distance Between Centroids

[Figure: points p1 to p5 and their proximity matrix]

Page 70: Supplemental Material

Inter-Cluster Similarity

• MIN

• MAX

• Group Average

• Distance Between Centroids

[Figure: points p1 to p5 and their proximity matrix]

Page 71: Supplemental Material

Inter-Cluster Similarity

• MIN

• MAX

• Group Average

• Distance Between Centroids

[Figure: points p1 to p5 and their proximity matrix]

Page 72: Supplemental Material

Hierarchical Clustering: Group Average

[Figure: nested clusters and dendrogram]

Page 73: Supplemental Material

Complexity: Time & Space

• O(N²) space since it uses the proximity matrix.

– N is the number of points.

• O(N³) time in many cases

– There are N steps and at each step the proximity matrix, of size N², must be updated and searched

– Complexity can be reduced to O(N² log(N)) time for some approaches

Page 74: Supplemental Material

Microarray Data Analysis

Unsupervised Analysis – clustering

Supervised Analysis

Visualization & Decomposition

Pattern Analysis

Statistical Analysis

KNN

Decision tree

Neural nets

SVM

LDA

Naïve Bayes

Page 75: Supplemental Material

Next

Page 76: Supplemental Material
Page 77: Supplemental Material

Finding enriched genes

Seeding with known structure-specific genes.

Oligodendrocyte (Mbp, Mobp, Cnp1)

Choroid-plexus (Col8a2, Lbp, Msx1)

Find the genes with similar expression patterns.
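One way to sketch the search is to rank all genes by how well their spatial expression pattern correlates with that of a seed gene. The slide does not name the similarity measure, so Pearson correlation is an assumption here, as are the function name `rank_genes_by_similarity` and the toy data.

```python
import numpy as np

def rank_genes_by_similarity(E, seed_gene):
    """Rank genes by Pearson correlation with the seed gene's spatial pattern.

    E has shape (n_genes, n_voxels); each row is one gene's expression
    across voxels. Pearson correlation is an assumed similarity measure.
    """
    R = np.corrcoef(E)[seed_gene]   # correlation of the seed gene with every gene
    order = np.argsort(-R)          # most similar first
    return [int(g) for g in order if g != seed_gene]

# Toy data: 4 genes measured in 5 voxels; gene 0 is the seed.
E = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [2.0, 4.0, 6.0, 8.0, 11.0],
              [5.0, 4.0, 3.0, 2.0, 1.0],
              [1.0, 3.0, 2.0, 5.0, 4.0]])
ranked = rank_genes_by_similarity(E, seed_gene=0)
```

In practice the seed row would hold the expression energies of a known marker such as Mbp, and the top of the ranking would be the candidate enriched genes.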