Returning Back …

Returning Back

A Big Thanks Again

Prof. Jason Bohland, Quantitative Neuroscience Laboratory, Boston University

Prof. Matt Hibbs, Jackson Labs

Large-scale Correlation

Quality control set of 3041 genes

Combine gene volumes into a large matrix

Decompose the voxel x gene matrix using singular value decomposition (SVD)

[Diagram: M (voxels x genes) ≈ spatial patterns (voxels x modes) x singular values ("weights") x gene patterns (modes x genes)]

The SVD writes the matrix as a product of three matrices: two with orthonormal columns and, between them, a diagonal matrix of singular values. If you think of each voxel as a vector in 3041-dimensional space, we're looking for directions in that vector space that account for the most variability.

Each mode consists of three items: 1) a spatial pattern: voxels that are in some way correlated; 2) a gene pattern: the weighting of how much each gene expresses this mode; and 3) a weight: how much of the overall variability in the data this mode accounts for.

A nice property is that the SVD modes are ordered so that each successive mode accounts for as much of the variability remaining in the data as possible.
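The decomposition described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the matrix `M` is random stand-in data with made-up dimensions, the data are not mean-centered (the slides do not say whether centering was applied), and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the voxel x gene expression matrix
# (the real matrix would have one column per gene, 3041 in total).
n_voxels, n_genes = 500, 40
M = rng.random((n_voxels, n_genes))

# Thin SVD: M = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

spatial_patterns = U   # column k: spatial pattern of mode k (one value per voxel)
gene_patterns = Vt     # row k: gene pattern of mode k (one value per gene)
weights = s            # singular values, sorted from largest to smallest

# Fraction of the total (uncentered) variability captured by each mode
explained = s**2 / np.sum(s**2)

# Coordinates of every voxel on the first three modes, e.g. for a 3D scatter plot
coords_3d = U[:, :3] * s[:3]
```

Because the singular values come back sorted, `explained` is non-increasing, which is the ordering property the notes refer to.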

N = 271 modes before we get to 90% of the variance

N = 67 modes before we get to 80% of the variance
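Counts like these follow from the cumulative sum of squared singular values. A minimal sketch, assuming variance is measured by squared singular values; the function name `modes_needed` and the toy spectrum are illustrative and not taken from the slides.

```python
import numpy as np

def modes_needed(singular_values, fraction):
    """Smallest number of leading modes whose squared singular values
    sum to at least `fraction` of the total."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    # the small tolerance guards against floating-point round-off at exact ties
    return int(np.searchsorted(cumulative, fraction - 1e-12) + 1)

# Toy spectrum: squared values 16, 4, 4, 1 (total 25)
s = np.array([4.0, 2.0, 2.0, 1.0])
n80 = modes_needed(s, 0.80)   # 2 modes reach 20/25 = 80%
n90 = modes_needed(s, 0.90)   # 3 modes reach 24/25 = 96%
```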

Principal modes (SVD)

[Figure legend: Cerebral cortex, Olfactory areas, Hippocampus, Retrohippocampal, Striatum, Pallidum, Thalamus, Hypothalamus, Midbrain, Pons, Medulla, Cerebellum]

All LH brain voxels plotted as projections on first 3 modes

Because the modes are ordered by the variability they explain, we can project the data down to a low number of dimensions and be sure that this is as good a characterization of the relationships between genes and voxels as we can possibly achieve in a least-squares sense. With 3D we get a nice visualization, and we can see that clustering emerges. But this accounts for only a small fraction of the total variability, so we keep many more dimensions in further analysis.

Interpreting gene modes

Spatial modes are easily visualized. Attempt to annotate eigenmodes using Gene Ontology (GO) annotations:

Each GO term partitions the gene list into two subsets:

IN genes: genes annotated by that GO term

OUT genes: genes not annotated by that GO term

Each singular vector associates each subset above with a set of amplitudes

Compare these amplitudes, asking whether IN genes have larger magnitudes than OUT genes; use the Kolmogorov-Smirnov (K-S) test to check whether the amplitude distributions are different

In this low dimensional space

Cerebellum and striatum separated - GABAergic interneurons and glutamatergic projection neurons in adult mouse forebrain

Other regions are clustered in greatly reduced space, but with considerable overlap

Anatomical regions do not in general correspond directly to individual SVD modes

Clustering of gene expression profiles in a very low dimensional subspace groups voxels drawn from the same brain regions

Component Annotations

Distinctly high amplitude in the dentate gyrus of the hippocampus.

Enhanced specificity for the cerebellum.

Particularly prominent in the cerebellum and the striatum.

Decomposition extracts correlated structure in expression profiles that corresponds to anatomical subdivision

Absolute value maximum intensity projection images

Once again

Gene clustering?

Genes are somewhat less separable - and less categorical

Build gene-gene similarity graph, partition it, and color code each point

K-Means Segmentation

What does gene expression tell us about regional brain organization?

Use simple cluster analysis.

K-means clustering:

Dimensionality reduced (to 271) by truncating SVD

Assign one of K labels to each voxel

All voxels assigned the same label have more similar expression profiles than voxels with different labels

Similarity defined by Euclidean distance

Data-driven parcellation of mouse brain anatomy (level of granularity determined by K)
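The K-means steps listed above can be sketched as follows. This is a plain Lloyd's-algorithm illustration on synthetic data, not the presenters' implementation: the farthest-point initialization, the function names, and the toy matrix are all assumptions, and only 5 modes are kept here where the slides keep 271.

```python
import numpy as np

def init_centers(X, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with Euclidean distance."""
    centers = init_centers(X, k)
    for _ in range(n_iter):
        # squared Euclidean distance from every voxel to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

rng = np.random.default_rng(1)
# Hypothetical voxel x gene matrix: two groups of voxels with different profiles
M = np.vstack([rng.normal(0.0, 0.1, (60, 30)),
               rng.normal(1.0, 0.1, (40, 30))])

# Reduce dimensionality by truncating the SVD
U, s, Vt = np.linalg.svd(M, full_matrices=False)
n_modes = 5
reduced = U[:, :n_modes] * s[:n_modes]

# One label per voxel; K sets the granularity of the parcellation
labels, centers = kmeans(reduced, k=2)
```

Mapping `labels` back onto voxel coordinates would give the data-driven parcellation; increasing `k` splits it into finer regions.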

What does gene expression tell us about the regional organization of the brain? Use simple cluster analysis.

K-means clustering results

Spatially Contiguous Clusters

K = 2: clusters separate cerebral cortex and hippocampus (gray) from other areas (white)

K = 8: cerebellum/striatum clearly segmented; cortex is subdivided into distinct layers

K = 16: thalamus has its own cluster; cortical layers further differentiated; midbrain separated from hindbrain

Large K: more anatomical details observed; separation of caudoputamen from the nucleus accumbens; laminar and areal patterns displayed in cortex

Clustering in Cerebral Cortex

[Figure panels: K = 40 (masked); ARA area masks]

Divides auditory/visual areas from somatosensory areas

Laminar clusters broken into distinct groups along the anterior-posterior direction (bottom) at the border between auditory and somatosensory areas

Validation

Posterior: auditory and visual areas. Anterior: somatosensory areas.

Relevant Questions

Determine, for a given structure, at what value of K it emerges as its own cluster

Relative prioritization of anatomical areas based on expression pattern similarity

Dominant clustering of gene expression along cortical layers consistent with those of Ng et al.

Compare with Reference Atlas

Reference atlases here are flat parcellations with 12 or 94 regions

Similarity index (S) ranges from 0 to 1

Overlap saturating at K > 30

Clusters for large K are subdivisions of those for low K

Compare with Reference Atlas

K = 12

Clusters 1, 2, 3, and 4 together correspond to the cerebral cortex

Cluster 11 largely corresponds to the thalamus

Cluster 9 is wholly contained in the cerebellum

Cluster 10 is wholly contained in the striatum

Classification of Region Membership

Supervised learning using a linear discriminant (25% test set, 10-fold cross-validation)

94.5% correct overall

~80% for 93 regions

Rather than taking the unsupervised clustering approach, we can ask: given the expression vector of a voxel taken arbitrarily from some location in the brain, could we predict which classical neuroanatomical region it came from?

What Next?

Size of voxels is large relative to individual cell bodies

Voxels will contain a mixture of several cell types.

Unique expression signature for discrete brain locations with different combinations of cell types.

Spatial co-expression is an indicator of functionally related or interacting genes

Localization of expression

[Figure: normalized expression energy plotted across voxels for two genes, one with a non-localized expression pattern and one with a well-localized expression pattern]

Kullback-Leibler (KL) divergence from (spatial) uniformity

A particularly interesting set of genes may be those that are highly localized to specific structures (intuitively, these may have more focal impact)
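As a rough illustration of the localization measure, the sketch below computes the KL divergence between each gene's normalized expression and a uniform density over voxels. It assumes non-negative expression energies and natural logarithms; the slides do not state the log base, and the function name and toy data are hypothetical.

```python
import numpy as np

def kl_from_uniform(energy, axis=-1):
    """KL divergence D(p || u) of normalized expression p from a uniform density u."""
    energy = np.asarray(energy, dtype=float)
    p = energy / energy.sum(axis=axis, keepdims=True)
    n = energy.shape[axis]
    # D(p || u) = sum_i p_i * log(p_i * n), with the convention 0 * log(0) = 0
    safe_p = np.where(p > 0, p, 1.0)
    terms = np.where(p > 0, p * np.log(safe_p * n), 0.0)
    return terms.sum(axis=axis)

# Toy genes x voxels matrix
expression = np.array([
    [1.0, 1.0, 1.0, 1.0],   # non-localized gene: same energy everywhere
    [0.0, 0.0, 4.0, 0.0],   # well-localized gene: all energy in one voxel
])
kl = kl_from_uniform(expression, axis=1)

# Keep genes above a chosen cutoff (the value here is arbitrary)
localized = np.flatnonzero(kl > 1.0)
```

A perfectly uniform gene scores 0, and a gene confined to one of n voxels scores log(n), the maximum. Calling the same function with `axis=0` would measure, for each voxel, how far its expression is from uniform across genes.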

Gene Localization

[Figure panels: summed; thresholded]

Select most localized genes (KL > ~1.56) to further analyze

Threshold voxels based on intensity histogram of summed expressions

Remaining LH mask (6102 voxels) essentially excludes cerebral cortex

Voxel Uniformity in Gene Space

Measure KL divergence from uniform density across gene space at each voxel

Brighter color indicates lower KL divergence (more uniform expression across genes)

Note cortex is generally more uniform than subcortical areas

And middle cortical layers are notably more uniform than superficial and deepest layers

Expression diversity

[Figure panels: expression diversity across gross structures; expression diversity across cortical layers and areas]

Average KL divergence across all voxels in a particular anatomical region

Biclustering Genes & Voxels

Can we group genes that are each highly localized to common brain regions (sets of voxels)?

Construct a bipartite graph with N (200) genes in vertex set V1 and M (~6000) mask voxels in V2

Edges are weighted by the expression level of each gene at each voxel

Apply graph partitioning methods to cut the graph into connected components

Components contain both voxels and genes

Here we used the isoperimetric algorithm (Grady and Schwartz, 2006).

[Diagram: bipartite graph with GENES (V1) on one side and VOXELS (V2) on the other]

What is Biclustering?

Finding submatrices in an n x m matrix that follow a desired pattern

Row/column order need not be consistent between different biclusters.

Bicluster properties

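For any submatrix $C_{IJ}$, where $I$ and $J$ are subsets of genes and conditions, the mean squared residue (MSR) score is

$$H(I, J) = \frac{1}{|I|\,|J|} \sum_{i \in I,\, j \in J} \left( c_{ij} - c_{iJ} - c_{Ij} + c_{IJ} \right)^2$$

where $c_{iJ}$, $c_{Ij}$, and $c_{IJ}$ are the row mean, column mean, and overall mean of the submatrix.

A bicluster is a submatrix $C_{IJ}$ that has a low mean squared residue score.

Biclustering of Expression Data: Cheng and Church, RECOMB 2001

Cheng and Church greedy approach: finds a submatrix that minimizes MSR

Biclusters (a) and (b) fit the definition of MSR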
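The mean squared residue score of Cheng and Church is easy to compute for a candidate submatrix. The sketch below evaluates the score only; it does not implement their greedy search, and the function name and toy matrices are illustrative.

```python
import numpy as np

def mean_squared_residue(C, rows, cols):
    """Mean squared residue of the submatrix of C given by `rows` and `cols`."""
    sub = np.asarray(C, dtype=float)[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    # residue: entry minus row mean minus column mean plus overall mean
    residue = sub - row_mean - col_mean + sub.mean()
    return float(np.mean(residue ** 2))

# A perfectly additive pattern (row effect + column effect) has zero residue
additive = np.add.outer([0.0, 1.0, 5.0], [2.0, 3.0, 7.0, 8.0])
msr_additive = mean_squared_residue(additive, [0, 1, 2], [0, 1, 2, 3])

# A checkerboard cannot be explained by row and column effects
checker = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
msr_checker = mean_squared_residue(checker, [0, 1], [0, 1])   # 0.25
```

A low score means the rows of the submatrix rise and fall together across its columns, which is the "desired pattern" in this definition.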

Biclustering Localized Genes

[Figure panels: 40 genes; 29 genes]

Resulting voxel clusters correspond well to individual anatomical regions, with functionally relevant gene lists

97% of energy in the cerebellum

Highly localized to ventricle system

Biclustering Localized Genes

[Figure panels: 30 genes; 11 genes]

Results shown are for 13 biclusters

69% of energy in dentate gyrus, 20% in Ammon's horn

99% of energy in thalamus

Cell-type expression model

Hypothesis: do genes emerging from these biclusters represent preferential markers of cell types localized to the corresponding regions?

Cell-type specific microarray data are available (Okaty et al., 2009; 2011) to help answer this question

Compare microarray profiles of these cell types with voxel-based transcriptomic data from ABA

2131 overlapping genes (with high quality ABA data)

Sacha Nelson's group at Brandeis

Cell-type based expression

Spatial patterns reflect organization within brain regions

(A) Granule cells (B) Purkinje cells (C) Stellate cells (D) Mature oligodendrocytes

Biclusters Cell Types

Highly localized genes emerging from bi-clusters (usually) show selective expression in local cell types

[Figure panels: CP bi-cluster; Cb bi-cluster]

Heritable Disease Networks

Online Mendelian Inheritance in Man (OMIM)

Contains records of genetic basis for ~4000 disorders

Manually curated 94 unique entities that are of neurological / neuropsychiatric interest and intersect our gene set

For each disorder, calculate the mean expression pattern across orthologs of implicated genes (MGI orthology)

Calculate a distance matrix between disorders by computing the pairwise cosine distance between expression profiles

Cluster disorders using hierarchical cluster analysis
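The three steps above can be sketched with NumPy alone. Everything concrete here is made up for illustration: the expression matrix, the disorder names, and the gene-to-disorder mapping are hypothetical, the ortholog lookup is skipped, and the naive complete-linkage loop stands in for whatever hierarchical clustering routine was actually used.

```python
import numpy as np

def cosine_distance_matrix(profiles):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def complete_linkage(D, n_clusters):
    """Naive agglomerative clustering; cluster distance = largest pairwise distance."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(D), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical expression of 5 genes (rows) over 6 voxels (columns)
expression = np.array([
    [5.0, 4.0, 3.0, 0.0, 0.0, 0.0],
    [4.0, 5.0, 3.0, 0.0, 0.0, 1.0],
    [3.0, 4.0, 5.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 5.0, 4.0, 3.0],
    [0.0, 1.0, 0.0, 4.0, 5.0, 3.0],
])

# Hypothetical mapping from disorder to rows of implicated genes
disorder_genes = {
    "disorder_A": [0, 1],
    "disorder_B": [1, 2],
    "disorder_C": [3, 4],
    "disorder_D": [4],
}

# Step 1: mean expression pattern per disorder
profiles = np.array([expression[idx].mean(axis=0) for idx in disorder_genes.values()])
# Step 2: pairwise cosine distance between disorders
D = cosine_distance_matrix(profiles)
# Step 3: hierarchical clustering, cut at two clusters
labels = complete_linkage(D, n_clusters=2)
```

Stopping at a different `n_clusters` corresponds to cutting the dendrogram at a different level.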

OMIM Disease Clusters

Complete linkage clustering

On the x-axis are diseases; color codes the cluster to which each was assigned at a certain stopping point

Borders of images indicate the average expression profile tied to diseases in the cluster

[Figure label: Lhx1]

Autism Candidate

For a given gene list, embed expression similarity in 2D space

Ex: ASD candidate genes from Wigler lab (CSHL) (16 genes in high quality coronal data set)

Calculate cosine distance matrix, and apply metric MDS

Provide sub-groupings based on expression locus
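One way to carry out the embedding step is classical (Torgerson) MDS, which is a form of metric MDS. The slides do not say which variant was used, so treat this as an assumption. The gene profiles below are random placeholders, and negative eigenvalues (possible because cosine distance is not Euclidean) are clipped to zero.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS from a symmetric distance matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

rng = np.random.default_rng(0)
profiles = rng.random((16, 50))                # hypothetical: 16 genes x 50 voxels

# Cosine distance matrix between gene expression profiles
unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
D = 1.0 - unit @ unit.T

# One 2D point per gene; nearby points have similar expression patterns
embedding = classical_mds(D, n_components=2)
```

Plotting `embedding` and labeling each point with its gene name gives a 2D map in which sub-groupings by expression locus can be looked for.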

[Figure: 2D embedding of the candidate genes; labels include Fgd3, MapT, Doc2a, Ptpdc1, with groupings marked Cb and Ctx]

Next?

[Figure: fMRI data over time (TR) decomposed into spatial components]