Supplemental Material Supplemental Experimental Procedures ...
http://www.brain-map.org
A Big Thanks
Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University
The Process
Construction and representation of the Anatomic Gene Expression Atlas (AGEA).
Allen Reference Atlas
• 3D Nissl volume comes from rigid reconstruction
• Each section reoriented to match adjacent images as closely as possible
• A 1.5T low resolution 3D average MRI volume used to ensure reconstruction is realistic
• Reoriented Nissl section down-sampled, converted to grayscale
• Isotropic 25μm grayscale volume.
Anatomy
• 208 large structures and structural groupings extracted
• Projected & smoothed onto 3D atlas volume to form the structural annotation
• Additional decomposition of cortex into an intersection of 202 regions and areas
In Situ Hybridization (ISH)
Each gene ISH series is reconstructed from serial sections (200 μm spacing)
Coronal section / Sagittal section
Why ISH?
• Phenotypic properties of cells result from a unique combination of expressed gene products
• Gene expression profiles => define cell types.
6 genes on 1 brain
Each gene on 56 sections
2 sections are for Nissl
8 genes on 1 brain
Each gene on 20 Sections.
ISH – Tissue Preparation & Imaging Process
• Sectioning
• Staining (non-isotopic digoxigenin (DIG))
• Washing
• Imaging
ISH – Probe Preparation
Traditional Approach vs. ISH
• Histology
• One gene at a time
• For 20,000 genes, need 20,000 x (5 or 14) slides, ~1 year
• DNA microarrays & SAGE: applied to large brain regions
• Cannot differentiate neuronal subtypes
Kamme, F. et al., J. Neurosci. (2003); Sugino, K. et al., Nature Neurosci. (2006)
• In situ hybridization measures expression & preserves spatial information for a single gene
• Finer resolution: cellular, but not single cell
• Data can be used to analyze
  – Gene expression
  – Gene regulation
  – CNS function (spatial)
  – Cellular phenotype (spatial)
Reproducibility
For multiple genes, an inbred mouse strain is used
Although different mice are used for different genes, expression under the same environmental conditions is reproducible.
Is ISH Reproducible?
Primary sources of variation:
• Riboprobes
• Day-to-day variability
• Biological variability in brains
• Even with inbred mice, variation between brains is significant.
Processing
Expression Statistics
Reconstruction – 3D
Data accessed by standard coordinate system – (200 μm)^3 voxels
Ontology of Allen Reference Atlas used to label individual voxels
Grid Based
Nearest Plane
Registration - Key
• Volumes iteratively registered to AB atlas using affine and locally nonlinear warping
• Registration good to ~200 microns
Local deformation field example
3D Annotation
Lower dimensional data volumes
• Analyze binned expression volumes at 200 µm (isotropic) resolution
  – ~31,000 image series (mostly single hemisphere, sagittal series)
  – 4,104 unique genes available from coronally sectioned brains
• Each volume is 67 x 41 x 58 voxels (about 50k brain voxels)
  – Comparable to fMRI resolution
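The grid figures above can be checked with a few lines of arithmetic. This is only a sketch: the brain voxel count of 51,533 is the value quoted on the AGEA correlation slide later in this deck, and the variable names are made up for illustration.

```python
# Grid dimensions and voxel size as quoted on the slide.
VOXEL_SIZE_UM = 200
GRID_SHAPE = (67, 41, 58)

# Assumption: the "about 50k brain voxels" figure corresponds to the
# N = 51,533 voxels quoted for the AGEA correlation maps.
BRAIN_VOXELS = 51_533

# Total voxels in the bounding grid, including voxels outside the brain.
grid_voxels = GRID_SHAPE[0] * GRID_SHAPE[1] * GRID_SHAPE[2]

# Physical extent of the grid along each axis, in millimetres.
extent_mm = tuple(n * VOXEL_SIZE_UM / 1000 for n in GRID_SHAPE)

# Fraction of the bounding grid that is actually brain.
brain_fraction = BRAIN_VOXELS / grid_voxels

print(grid_voxels, extent_mm, round(brain_fraction, 3))
```

So roughly a third of the bounding grid is brain tissue; the rest is empty space around the brain.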
Data normalization
• Background correction & registration
  – Intensity normalization: correct background from negative control
  – Registration: map the image to the reference atlas
• Smoothed expression energy
  – Sum of intensities of expressing cells / # of cells in the voxel
  – An average over many cells of diverse types
ISH Signal
(c) Coronal plane in situ hybridization (ISH) image of gene tachykinin 2 (Tac2) from the Allen Brain Atlas showing enriched expression in the bed nucleus of the stria terminalis (BST). The box represents a 1 mm² square.
(d) Enlarged expression mask view of boxed area in c depicting gene expression levels color coded by ISH signal intensity (red, higher expression level; green/blue, lower expression level).
Measurements
p is an image pixel in voxel C
|C| is the total number of pixels in C
M(p) is the expression segmentation mask: 1 (“expressing” pixel) or 0 (“non-expressing” pixel)
I(p) is the grayscale value of ISH image intensity: Gray = 0.3*Red + 0.59*Green + 0.11*Blue.
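With these symbols, the expression measures that this deck gives in words (density, intensity, energy) can be written out as follows. This is a reconstruction from those verbal definitions, not a copy of a formula on the slide; the function names density, intensity and energy are labels chosen here.

```latex
\begin{aligned}
\mathrm{density}(C)   &= \frac{1}{|C|} \sum_{p \in C} M(p) \\
\mathrm{intensity}(C) &= \frac{\sum_{p \in C} M(p)\, I(p)}{\sum_{p \in C} M(p)} \\
\mathrm{energy}(C)    &= \frac{1}{|C|} \sum_{p \in C} M(p)\, I(p)
                       = \mathrm{density}(C) \times \mathrm{intensity}(C)
\end{aligned}
```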
Per Gene Signature: Prox1
Coronal section / Sagittal section
Prox1 volume maximum intensity projections
[Figure panels: Raw ISH; Expression Energy]
Expression measures
• expression density = sum of expressing pixels / sum of all pixels in division
• expression intensity = sum of expressing pixel intensity / sum of expressing pixels
• expression energy = sum of expressing pixel intensity / sum of all pixels in division = density x intensity
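The three measures can be sketched in a few lines of Python. The function names, the list-based pixel representation and the toy values are assumptions for illustration, not the Allen Institute pipeline; only the grayscale weights and the three ratios come from the slides.

```python
def to_gray(r, g, b):
    """Grayscale value using the weights quoted on the Measurements slide."""
    return 0.3 * r + 0.59 * g + 0.11 * b


def expression_measures(mask, intensity):
    """Return (density, mean_intensity, energy) for one voxel or division.

    mask      -- list of 0/1 flags, 1 for an "expressing" pixel
    intensity -- list of grayscale values, same length as mask
    """
    if len(mask) != len(intensity):
        raise ValueError("mask and intensity must have the same length")
    n_pixels = len(mask)
    n_expressing = sum(mask)
    # Sum of intensities over expressing pixels only.
    expressing_sum = sum(m * i for m, i in zip(mask, intensity))

    density = n_expressing / n_pixels if n_pixels else 0.0
    mean_intensity = expressing_sum / n_expressing if n_expressing else 0.0
    energy = expressing_sum / n_pixels if n_pixels else 0.0
    return density, mean_intensity, energy


# Toy example: four pixels, two of them expressing.
mask = [1, 0, 1, 0]
intensity = [100.0, 20.0, 200.0, 30.0]
density, mean_intensity, energy = expression_measures(mask, intensity)
print(density, mean_intensity, energy)
```

In the toy example half the pixels express, their mean intensity is 150, and the energy is the product of the two.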
Recap - Measurements
MetaData
• Each voxel can be connected to a node in a hierarchical brain atlas / ontology, and also to Waxholm space
• Raw Nissl sections from the same brain (with 200 μm spacing) can also be obtained
• Each gene has a specific probe sequence and various identifiers that link to gene information (we’ve used Entrez ID)
Deriving Insights
Large-scale data analysis
How much structure is present across space and across genes?
How would the brain segment on the basis of gene expression patterns (as opposed to Nissl, etc.)?
Is there structure in the patterns of expression of highly localized genes?
What can we learn from the expression patterns of genes implicated in disorders?
see Bohland et al. (2009) Methods; Ng et al. (2009) Nature Neuroscience.
Genome-wide Analysis of Expression
70.5% of genes are expressed in fewer than 20% of cells
Notes
• Well-established genes for different cell types identified
• For 12 major brain regions, 100 top genes.
Cell-Specific Genes
• Gene Ontology enrichment analysis is useful
• Oligodendrocyte-enriched genes => myelin production.
Heterogeneity
Functional Compartments
Genes with regional expression provide substrates for functional differences
Tools from AGEA
Correlation mode – View and navigate 3-D spatial relationship maps
Clusters mode – Explore transcriptome based spatial organization
Gene Finder mode - Search for genes with local regionality
Expression energy for each gene (M = 4,376) and for each voxel (N = 51,533)
For each seed voxel, find Pearson’s correlation coefficient between the seed voxel and every other voxel, using expression vectors of length M
Compute 51,533 three-dimensional correlation maps
Web viewer for easy navigation between maps and within each 3-D map
Correlation values shown as 24-bit false color using a blue-to-red (“jet”) color scale
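The correlation-map step can be sketched as follows. This is a minimal illustration on a made-up 3-voxel, 4-gene matrix, not the AGEA implementation; the function names and the handling of constant vectors are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Assumption: a constant expression vector is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlation_map(expression, seed):
    """Correlate the seed voxel's expression vector with every voxel's.

    expression[v] is the length-M expression-energy vector of voxel v.
    """
    return [pearson(expression[seed], vec) for vec in expression]


# Toy data: 3 voxels x 4 genes (the real matrix is 51,533 x 4,376).
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # same profile as voxel 0, scaled
    [4.0, 3.0, 2.0, 1.0],  # reversed profile
]
corr = correlation_map(expression, seed=0)
print(corr)
```

Running this once per seed voxel yields one map per voxel, which is how 51,533 maps arise from 51,533 voxels.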
Spatial Transcriptome
Clusters of Correlated Gene Expression
Classical definition of brain regions
• Overall Morphology
• Cellular Cytoarchitecture
• Ontological Development
• Functional Connectivity
Hierarchical clustering
• Voxels are spatially organized as a binary tree
• Each node is a collection of voxels and has 0 or 2 branches
• Initially all 51,533 voxels are assigned to the root node of the tree.
• Final tree has 103,065 nodes with a maximum depth of 53 levels and 51,533 leaf nodes (one for each voxel in the brain).
• At each bifurcation an ordering is assigned to each child to enable the definition of a global “depth first” ordering for all leaf nodes.
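The node count follows from the tree shape: a binary tree in which every node has 0 or 2 children and which has N leaves has 2N − 1 nodes, and 2 × 51,533 − 1 = 103,065. The sketch below builds a toy tree and reads the leaves in depth-first order. Splitting each node in half is a placeholder assumption; the actual AGEA split criterion is not described on the slide.

```python
def build_tree(items):
    """Recursively split a list until every leaf holds one item.

    Assumption: nodes are split at the midpoint purely for illustration.
    """
    node = {"items": items, "children": []}
    if len(items) > 1:
        mid = len(items) // 2
        # The order of the two children defines the leaf ordering below.
        node["children"] = [build_tree(items[:mid]), build_tree(items[mid:])]
    return node


def count_nodes(node):
    """Total number of nodes (internal nodes plus leaves)."""
    return 1 + sum(count_nodes(child) for child in node["children"])


def leaves_depth_first(node):
    """Leaf items in depth-first order, first child before second child."""
    if not node["children"]:
        return list(node["items"])
    ordered = []
    for child in node["children"]:
        ordered.extend(leaves_depth_first(child))
    return ordered


tree = build_tree(list(range(7)))
n_nodes = count_nodes(tree)
leaf_order = leaves_depth_first(tree)
print(n_nodes, leaf_order)
```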
Clustering Analysis
Hierarchical Clustering
Notes
Microarray Data Analysis
Unsupervised Analysis – clustering
Supervised Analysis
Visualization & Decomposition
Pattern Analysis
Statistical Analysis
K-means
Hierarchical Clustering
Biclustering
CLICK
Self-Organizing Maps
DBSCAN
OPTICS
DENCLUE
…
Up regulated genes
Down regulated genes
Differentially Regulated Genes
Clusters ?
Clustering Analysis
Group genes that show a similar temporal expression pattern.
Group samples/genes that show a similar expression pattern.
Finding groups of objects such that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups
Inter-cluster distances are maximized
Intra-cluster distances are minimized
Clustering Analysis
How many clusters?
[Figure: the same set of points grouped into two, four, or six clusters]
Clustering Algorithms
• K-means and its variants
• Hierarchical clustering
K-means Clustering
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• Number of clusters, K, must be specified
• The basic algorithm is very simple
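The basic algorithm alternates between assigning points to the nearest centroid and recomputing the centroids. The sketch below assumes squared Euclidean distance and takes the initial centroids as an argument; the function names and toy points are illustrative only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, centroids, max_iter=100):
    """Basic K-means; K is the number of initial centroids supplied."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[k])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels, centroids)
```

The result depends on the starting centroids, which is why choosing them is treated as its own topic.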
Choosing Initial Centroids
Limitations: Differing Sizes
[Figure: original points vs. K-means (3 clusters)]
Limitations: Differing Density
[Figure: original points vs. K-means (3 clusters)]
Limitations: Non-globular Shapes
[Figure: original points vs. K-means (2 clusters)]
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
  – A tree-like diagram that records the sequences of merges or splits
Agglomerative Clustering
• More popular hierarchical clustering technique
• Basic algorithm is straightforward
  – Compute the proximity matrix
  – Let each data point be a cluster
  – Repeat
    – Merge the two closest clusters
    – Update the proximity matrix
  – Until only a single cluster remains
• Key operation is the computation of the proximity of two clusters
  – Different approaches to defining the distance between clusters distinguish the different algorithms
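The merge loop can be sketched as below. For brevity this version recomputes cluster distances from the points at every step instead of maintaining and updating a proximity matrix, so it illustrates the order of merges rather than an efficient implementation. The function names, the linkage keywords and the toy data are assumptions.

```python
def euclidean(a, b):
    """Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_distance(c1, c2, points, linkage):
    """Distance between two clusters given as lists of point indices."""
    dists = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if linkage == "min":      # single link
        return min(dists)
    if linkage == "max":      # complete link
        return max(dists)
    if linkage == "average":  # group average
        return sum(dists) / len(dists)
    raise ValueError("unknown linkage: " + linkage)


def agglomerative(points, linkage="average"):
    """Return the list of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]  # each point is a cluster
    merges = []
    while len(clusters) > 1:
        # Find the two closest clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points, linkage)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        # Replace the two clusters with their union.
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
merges = agglomerative(points, linkage="min")
for step in merges:
    print(step)
```

N points always produce N − 1 merges; the recorded merge distances are what a dendrogram plots on its height axis.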
In The Beginning ...
Start with clusters of individual points and a proximity matrix
[Figure: points p1–p5 and their proximity matrix]
Intermediate Step
After some merging steps, we have some clusters
[Figure: clusters C1–C5 and their proximity matrix]
Intermediate Step
We want to merge the two closest clusters (C2 and C5) and update the proximity matrix.
[Figure: clusters C1–C5 and their proximity matrix]
After Merging
The question is “How do we update the proximity matrix?”
[Figure: clusters C1, C2 U C5, C3, C4; the proximity-matrix row and column for C2 U C5 are marked “?”]
Inter-Cluster Similarity
Similarity?
• MIN
• MAX
• Group Average
• Distance Between Centroids
[Figure: points p1–p5 and their proximity matrix]
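Written as distances between two clusters A and B, the four options take the following standard forms. The notation is chosen here and does not come from the slides: d(a, b) is the distance between two points and μ_A is the centroid of cluster A.

```latex
\begin{aligned}
d_{\min}(A, B)  &= \min_{a \in A,\; b \in B} d(a, b) \\
d_{\max}(A, B)  &= \max_{a \in A,\; b \in B} d(a, b) \\
d_{\mathrm{avg}}(A, B)  &= \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a, b) \\
d_{\mathrm{cent}}(A, B) &= d(\mu_A, \mu_B), \qquad \mu_A = \frac{1}{|A|} \sum_{a \in A} a
\end{aligned}
```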
Hierarchical Clustering: Group Average
[Figure: nested clusters and the corresponding dendrogram for six points]
Complexity: Time & Space
• O(N^2) space since it uses the proximity matrix.
  – N is the number of points.
• O(N^3) time in many cases
  – There are N steps, and at each step the proximity matrix, of size N^2, must be updated and searched
  – Complexity can be reduced to O(N^2 log(N)) time for some approaches
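To give a feel for the O(N^2) space cost, the arithmetic below sizes a proximity matrix for N = 51,533, the voxel count quoted for AGEA earlier in this deck. Storing 8 bytes per entry is an assumption, and nothing here implies that AGEA actually stores such a matrix.

```python
N = 51_533            # number of points (the AGEA voxel count from the deck)
BYTES_PER_ENTRY = 8   # assumption: one 64-bit float per matrix entry

# Full N x N proximity matrix.
full_entries = N * N

# A symmetric matrix only needs the entries above the diagonal.
upper_triangle_entries = N * (N - 1) // 2

full_gigabytes = full_entries * BYTES_PER_ENTRY / 1e9
triangle_gigabytes = upper_triangle_entries * BYTES_PER_ENTRY / 1e9

print(full_entries, round(full_gigabytes, 1), round(triangle_gigabytes, 1))
```

Under these assumptions the full matrix needs roughly 21 GB, and about half that if only the upper triangle is kept.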
Microarray Data Analysis
Unsupervised Analysis – clustering
Supervised Analysis
Visualization & Decomposition
Pattern Analysis
Statistical Analysis
KNN
Decision tree
Neuro nets
SVM
LDA
Naïve Bayes
…
Next
Finding enriched genes
Seeding with known structure-specific genes.
• Oligodendrocyte (Mbp, Mobp, Cnp1)
• Choroid plexus (Col8a2, Lbp, Msx1)
Find the genes with similar expression patterns.
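One way to sketch the seeding idea is to rank candidate genes by how well their spatial expression pattern correlates with a seed gene's pattern. Pearson correlation as the similarity measure is an assumption, since the slide does not name one. The gene labels and numbers below are placeholders, not real expression data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: constant patterns count as uncorrelated
    return cov / math.sqrt(vx * vy)


def rank_by_similarity(seed_pattern, candidates):
    """Sort candidate genes by correlation with the seed, best first."""
    scored = [(name, pearson(seed_pattern, pattern))
              for name, pattern in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder data: expression energy of each gene in four voxels.
seed_pattern = [0.1, 0.9, 0.8, 0.1]
candidates = {
    "gene_a": [0.2, 1.8, 1.6, 0.2],  # same spatial pattern, scaled
    "gene_b": [0.9, 0.1, 0.2, 0.9],  # opposite pattern
    "gene_c": [0.5, 0.5, 0.6, 0.5],  # nearly flat
}
ranking = rank_by_similarity(seed_pattern, candidates)
for name, score in ranking:
    print(name, round(score, 3))
```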