Visualizing RNA Expression Data John Quackenbush VIZBI 16 March 2011.
Visualizing RNA Expression Data
John Quackenbush
VIZBI
16 March 2011
Northern Blots: Before the dawn of Time
Northern Blots
Quantitative RT-PCR: The Pre-Modern Era
Quantitative PCR
Quantitative PCR and other Methods
Large-scale Quantitative RT-PCR: The Dawn of the Modern Age
An Aside: The Birth of Clustering
Our World Today: A Microarray Overview
History is written by the victors (or those who produce software):
The Birth of Clustering
This was also the start of tormenting the red-green color-blind.
Truth is determined by the person giving the talk:
MeV is the best clustering tool ever!
http://www.tm4.org
Public Microarray Data
ArrayExpress: 20,423 Experiments (572,682 hybs/arrays)
GEO: 21,320 Experiments (529,108 arrays)
CIBEX: 148 Experiments (2,711 arrays)
SMD: 21,521 Expts (80,319 incl private data)
>1,000,000 arrays x $500 = $500,000,000
Cancer Studies account for >14% of all studies in databases…
EBI’s Expression Atlas Rocks!
Disease Progression and Personalized Care
[Diagram: timeline from Birth through Treatment to Death, spanning Natural History of Disease and Clinical Care. Labels: Genetic Risk, Environment + Lifestyle, Early Detection, Biomarkers, Patient Stratification, Disease Staging, Treatment Options, Outcomes, Quality Of Life]
Welcome to the post-Modern World: Next-Gen Technologies have Dramatically Expanded our Genomic Universe
Browser-mania rules!
RNA-Seq data of 7 FFPE blocks
Back to Excel, Man’s Best Friend
And more websites are integrating data
Cells Converge to Attractive States
Stuart Kauffman presented the idea of a gene expression landscape with attractors
• ~250 stable cell types each represent attractors
• Cells can be "pushed" or induced to converge to an attractor.
• Once in the attractor, a cell is robust to small perturbations.
Jess Mar
Differentiation of Promyelocytes into Neutrophil-Like Cells
Promyelocytes (HL-60 Cell Line)
Neutrophil-like Cells
Dimethyl Sulfoxide (DMSO)
All-Trans Retinoic Acid (ATRA)
~6 days
Affymetrix GeneChip
Time 0
Day 7
Collins et al. PNAS 1978
RA used in differentiation therapy for acute promyelocytic leukemia.
Combined with chemotherapy, complete remission rates as high as 90-95% can be achieved.
Huang et al. PRL 2005
Jess Mar
GEDI: Cells Display Divergent Trajectories That Eventually Converge as they Differentiate
Huang et al. PRL 2005
Graphical representation of the results from a Self-Organizing Map clustering.
Expression data from a single sample (time point) clustered according to a grid.
DMSO, ATRA
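A rough sketch of the idea behind a per-sample Self-Organizing Map picture may help here. It is not the GEDI software itself: it assumes gene profiles are plain lists of expression values (one value per time point), trains a small SOM on them, and then reads off one grid of values per time point. The function names, the toy data, and the training schedule are all illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, profile):
    """Return the grid node whose weight vector is closest to the profile."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], profile)),
    )


def train_som(profiles, rows, cols, epochs=50, learning_rate=0.5, seed=0):
    """Train a rows x cols SOM on gene profiles (lists of floats)."""
    rng = random.Random(seed)
    # One weight vector per grid node, initialised from randomly chosen profiles.
    weights = {
        (r, c): list(rng.choice(profiles)) for r in range(rows) for c in range(cols)
    }
    start_radius = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs           # shrinks linearly towards 0
        rate = learning_rate * decay
        radius = max(start_radius * decay, 0.5)
        shuffled = list(profiles)
        rng.shuffle(shuffled)
        for profile in shuffled:
            bmu = best_matching_unit(weights, profile)
            for node, vector in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2.0 * radius ** 2))
                # Pull each node towards the profile, weighted by grid proximity.
                for i, x in enumerate(profile):
                    vector[i] += rate * influence * (x - vector[i])
    return weights


def sample_mosaic(weights, rows, cols, sample_index):
    """Grid of node values for one sample (time point): one tile per node."""
    return [[weights[(r, c)][sample_index] for c in range(cols)] for r in range(rows)]


# Hypothetical toy data: five genes measured at three time points.
toy_profiles = [
    [0.0, 0.1, 0.2],
    [0.1, 0.0, 0.3],
    [5.0, 5.2, 4.9],
    [4.8, 5.1, 5.0],
    [2.0, 2.5, 3.0],
]
toy_weights = train_som(toy_profiles, rows=2, cols=2)
mosaic_t0 = sample_mosaic(toy_weights, 2, 2, sample_index=0)
```

Because the grid layout is fixed once the map is trained, the mosaics for different time points or treatments can be compared tile by tile, which is what makes diverging and converging trajectories visible.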
What factors drive this divergent-then-convergent behavior?
Our Hypothesis
[Diagram: State A to State B along a Core Differentiation Pathway; Transient Pathway (Perturbation 1) and Transient Pathway (Perturbation 2) give rise to Observed Trajectory (Perturbation 1) and Observed Trajectory (Perturbation 2)]
Jess Mar
Observed Trajectory
[Figure: ATRA and DMSO panels at 2 hrs, 4 hrs, 8 hrs, 12 hrs, 18 hrs, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days]
Jess Mar
Transient Trajectory
[Figure: ATRA and DMSO panels at 2 hrs, 4 hrs, 8 hrs, 12 hrs, 18 hrs, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days]
Jess Mar
Core Trajectory
[Figure: ATRA and DMSO panels at 2 hrs, 4 hrs, 8 hrs, 12 hrs, 18 hrs, 1 day, 2 days, 3 days, 4 days, 5 days, 6 days, 7 days]
Jess Mar
Ultimately, we’d like to get to pathways: Functional Roles Are Associated with Constraint
High-variance genes tend to function as cell surface receptors.
Low-variance genes function as kinases and transferases.
[Figure: cell compartments Extracellular, Membrane, Cytoplasm, Nuclear; legend: high variance, low variance]
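The slide does not say how genes were split into high- and low-variance groups. One simple possibility, shown purely as an assumption, is to rank genes by their sample variance across samples and take the top and bottom fraction. The gene names, values, and the 25% cutoff below are hypothetical.

```python
import statistics


def rank_genes_by_variance(expression):
    """Return gene names ordered from highest to lowest sample variance.

    expression: dict mapping gene name -> list of expression values across samples.
    """
    variances = {gene: statistics.variance(values) for gene, values in expression.items()}
    return sorted(variances, key=variances.get, reverse=True)


def split_high_low(expression, fraction=0.25):
    """Return (high-variance genes, low-variance genes) using a top/bottom fraction."""
    ranked = rank_genes_by_variance(expression)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Hypothetical toy data: four genes measured in four samples.
toy_expression = {
    "geneA": [1.0, 9.0, 2.0, 8.0],   # varies a lot
    "geneB": [5.0, 5.1, 4.9, 5.0],   # nearly constant
    "geneC": [3.0, 4.0, 3.5, 4.5],
    "geneD": [2.0, 6.0, 3.0, 5.0],
}
high_variance, low_variance = split_high_low(toy_expression)
```

The two resulting gene lists could then be tested for enrichment in functional categories or cellular compartments with whatever annotation tool is at hand.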
But the tools are very primitive
Variance Constraints Alter Network Topology
Degree distributions for the MAPK module are significantly different (Kolmogorov-Smirnov test).
[Legend: high variance, low variance]
Degree of statistical significance is altered by disease status.
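For readers unfamiliar with the test, a minimal sketch of a two-sample Kolmogorov-Smirnov comparison of node-degree lists follows. It computes the largest gap between the two empirical cumulative distributions and an approximate p-value from the standard asymptotic series. The degree lists are made-up numbers, not the MAPK data, and the asymptotic p-value is only a rough guide for small or heavily tied samples such as integer degrees.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    gap = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_p_value(statistic, n_a, n_b, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = statistic * math.sqrt(n_a * n_b / (n_a + n_b))
    if lam < 0.2:
        # The survival function is numerically 1 here and the series converges slowly.
        return 1.0
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical node degrees for two versions of a network module.
degrees_high_variance = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8]
degrees_low_variance = [2, 3, 3, 4, 4, 5, 6, 6, 7, 9]
d = ks_statistic(degrees_high_variance, degrees_low_variance)
p = ks_p_value(d, len(degrees_high_variance), len(degrees_low_variance))
```

In practice a library routine such as SciPy's `ks_2samp` would normally be used instead of a hand-rolled version.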
So we’re back to Heat Maps
The transcriptional profiles of ONS XS cells from SZ patients more closely resemble those of healthy fibroblasts than any other stem cell signature.
And of course, we’ve left out the interesting stuff, like where genes are expressed.
LGRC Research Portal
PAGE DETAILS
Search:
- Facets
- Search within results
- Keyword prompts
- Search history
Table:
- Paged results
- Sortable columns
Actions:
- Go to Gene detail page
- Add genes to ‘gene set’
Gene Expression Summary
RNASeq
PAGE DETAILS
Annotation summary & summary view for each assay/data type:
Accordion style sections
- GEXP – expression profile across major Dx categories
- RNASeq – Exon structure of the gene
- SNPs – Table of SNPs in region of gene, highlighting association with major Dx group
- Methylation – Methylation profile in region around gene
- Genomic alterations – table of CNVs & alterations observed w/ freq in region around gene
Actions:
- Click through to assay detail page
- Add gene to set
Annotation Summary
LGRC Research Portal
PAGE DETAILS
- View aggregate statistics
- View cohort details
- Build cohort sets
- Build composite phenotypes
Actions:
- Go to data download for selected cohort
- Go to assay detail for selected cohort
- Go to cohort manager
LGRC Research Portal
Analysis Tools
PAGE DETAILS
- Very minimal parameters and options…here just 2 cohorts of interest, maybe p-value cutoff
Generates comprehensive report
Edit in place results – Don’t set parameters, edit the results
Analysis goes into queue, email notification when finished
Cohort 1:
Cohort 2:
Set 1
Set 2
Start Analysis
View analysis parameters
Job Status: Running
Job name: My job 1
Analysis of Differential Expression: My Job 1
Supervised Analysis
Meta analysis
Unsupervised analysis
PAGE DETAILS
-Very minimal parameters and options.
Generates comprehensive report
Edit in place results – Don’t set parameters, edit the results
Accordion style result sections
Generate PDF report of analysis
Analysis goes into queue, email notification when finished
Before I came here I was confused about this subject.
After listening to your lecture, I am still confused but at a higher level.
- Enrico Fermi (1901-1954)
Genomics is here to stay
Acknowledgments
<[email protected]>
http://compbio.dfci.harvard.edu
The Gene Index Team: Corina Antonescu, Valentin Antonescu, Fenglong Liu, Geo Pertea, Razvan Sultana, John Quackenbush
Microarray Expression Team: Stefan Bentink, Thomas Chittenden, Aedin Culhane, Kristina Holton, Jane Pak, Renee Rubio
Eskitis Institute: Christine Wells, Alan Mackay-Sim
(Former) Stellar Students: Martin Aryee, Kaveh Maghsoudi, Jess Mar
Systems Support: Stas Alekseev, Sys Admin
Array Software Hit Team: Katie Franklin, Eleanor Howe, Sarita Nair, Jerry Papenhausen, John Quackenbush, Dan Schlauch, Raktim Sinha, Joseph White
Assistant: Joan Coraccio, Juliana Coraccio
Center for Cancer Computational Biology: Mick Correll, Howie Goodell, Kristina Holton, Jerry Papenhausen, Patricia Papastamos, John Quackenbush
http://cccb.dfci.harvard.edu
Shameless self-promotion