CBIO243: Principles of Cancer Systems Biology
Sylvia Plevritis, PhD, Course Director
Melissa Ko, Teaching Assistant
Fuad Nijim, CCSB Program Manager
March 31, 2014
Goals of CBIO243
• Introduce major principles of cancer systems biology that integrate experimental and computational biology.
• Gain familiarity with methods to analyze high-dimensional and highly-multiplexed data in order to synthesize biologically and clinically relevant insights and generate hypotheses for functional testing.
Biological Sciences:
• Cancer Biology
• Hematology
• Immunology
• Genetics
• etc.
Computational Sciences:
• Bioinformatics
• Engineering
• Computer Science
• Physics
• Statistics
• etc.
CSB
Approach: Integrative Analysis
Cancer Research Goal:
• Drug Targets
• Drug Resistance
• Combination Therapies
• Tumor Evolution
• Cancer Drivers
• Metastasis
• Tumor Heterogeneity
• Cancer Stem Cells
• EMT
• Personalized Medicine
• Biomarkers
• Other ______
Experimental Sciences:
• Sequencing
• Methylation
• Gene Expression
• CNV
• TMA
• Proteomics
• Single Cell Analysis
• LCM, Sorted Cells
• Drug Screening
• Other ______
Computational Sciences:
• Statistical Regression
• Machine Learning
• Bayesian Analysis
• Boolean Analysis
• ODE/PDE
• Network Reconstruction
• Pathway Analysis
• Other ______
Functional Validation
Components of Cancer Systems Biology
Topics Covered
• Basic principles of molecular biology of cancer
• Experimental high-throughput technologies
• Design of perturbation studies, including drug screening
• Overview of publicly available datasets, including GEO, TCGA, CCLE, and ENCODE
• Online biocomputational tools, including selected accessible tools from the NCI Center for Bioinformatics
• Network reconstruction from genomic data
• Application of systems biology to identifying drug targets
• Application of systems biology to personalized medicine
Grading
• Weekly paper review/class participation (30%)
• Project Presentations (20%)
• Final Project Report (50%): 6-7 page written report and oral presentation demonstrating the understanding of key concepts in cancer systems biology research.
Weekly Reading Review
• Summarize objective/hypothesis, the data, the controls, results and the published interpretations.
• Discuss whether the authors' conclusions were justified, and suggest improved analyses and/or future research.
• Describe relevance to cancer systems biology, and any gaps in training to fully understand paper.
First Reading Assignment
• Chuang, H.-Y., Lee, E., Liu, Y.-T., Lee, D., & Ideker, T. (2007). Network-based classification of breast cancer metastasis. Molecular Systems Biology.
• Akavia, U. D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H. C., Pochanard, P., et al. (2010). An Integrated Approach to Uncover Drivers of Cancer. Cell, 143(6), 1005–1017.
Background Material
• Overview of Cancer
– Hanahan D, Weinberg RA. Hallmarks of Cancer: The Next Generation, Cell 144(5), 2011.
• Overview of Molecular Biology
– Kimball’s Biology Pages
– http://home.comcast.net/~john.kimball1/BiologyPages
Background Material
• Visualization of Genomic Data
– Schroeder MP, et al, Visualizing multidimensional cancer genomics data, Genome Medicine, 5:9, 2013
• Overview of Programming
– R/Bioconductor
  • http://www.r-project.org/
  • www.cyclismo.org/tutorial/R/
– Python
  • http://www.python.org/
  • https://developers.google.com/edu/python/
Center for Cancer Systems Biology(ccsb.stanford.edu)
• Monthly Seminar Series
– GENOMIC BIOMARKERS OF CANCER PREVENTION AND TREATMENT
– Friday April 11th at 11 am (Alway Building, Room M114)
– Andrea Bild, Department of Pharmacology and Toxicology, University of Utah
• Annual Symposium (Friday October 17, 2014)
• R25T Training Grant
– Two-year postdoctoral training fellowship
Cancer as a Complex System
Pienta et al, Ecological Therapy for Cancer: Defining Tumors Using an Ecosystem Paradigm Suggests New Opportunities for Novel Cancer Treatments, Translational Oncology, 2008, 1(4):158-164.
Multiscale View of Cancer
• Genes and proteins
• Complex signaling and regulatory networks
• Multiple cellular processes
• Micro-environment
• Host systems
• Environmental factors
• Population dynamics
Initiation Progression Metastasis Recurrence
Time - Progression
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of Cancer: The Next Generation. Cell, 144(5), 646–674.
Hallmarks of Cancer
Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of Cancer: The Next Generation. Cell, 144(5), 646–674.
http://www.cell.com/image/S0092-8674(11)00127-9?imageId=gr2&imageType=hiRes
Network types
• Protein-protein
• Protein-DNA
• miRNA-RNA
• Transcriptional (expression) networks
• Signaling networks
Sachs et al. http://www.sciencemag.org/content/308/5721/523.full
The Multiscale Challenge
• Many components and interactions of the “cancer system” are known
• How global dynamics and phenotypic properties emerge from local interactions is not well understood
http://circ.ahajournals.org/content/123/18/1996/F5.expansion.html
Goals of Cancer Systems Biology Research
• To derive a comprehensive understanding of cancer’s complexity by integrating diverse information to:
– Identify cellular networks and cell-cell interactions that drive cancer initiation and progression
– Identify potential therapeutic targets and mechanisms of action
Principles in Cancer Systems Biology Research
• Cancer networks are dynamic and respond to genetic variants, epigenetics and the microenvironment
• Tumors may not be a random collection of malignant cells but cells that may be related through processes of developmental biology
Cancer Systems Biology The Past
Experimentation Computation
Cancer Systems Biology The Present
Experimentation Computation
Cancer Systems Biology The Future
Experimentation Computation
Objective: Identify genes and networks differentially expressed in lymphoma transformation
• Glas et al. “Gene expression profiling in follicular lymphoma to assess clinical aggressiveness and to guide the choice of treatment.” Blood 2005
– 24 paired samples (12 FL/12 DLBCL)
– 88 FL/DLBCL arrays
» 30 DLBCL
» 40 FL-transforming (FL_t)
» 18 FL-non-transforming (FL_nt)
FL DLBCL
Identify differentially expressed genes
• Average Fold Change (AFC)
– Pro: Easy
– Con: Does not account for variance
• p-value, based on t-test statistic
– Pro: Easy, accounts for variance
– Con: Does not account for the problem of multiple hypothesis testing
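The contrast between the two criteria above can be sketched in a few lines. This is a minimal illustration, not the analysis used on the slides: the expression values and the gene names `gene_tight` and `gene_noisy` are made up, and Welch's t statistic is one assumed choice of t-test. Both genes have the same fold change, but only one has a large t statistic, because the t statistic scales the difference by the variance.

```python
import math
import statistics

def log2_fold_change(group_a, group_b):
    """Log2 of the ratio of mean expression (group_b over group_a)."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))

def welch_t(group_a, group_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se

# Hypothetical expression values (linear scale), four samples per group.
fl = {"gene_tight": [10.0, 11.0, 9.0, 10.0], "gene_noisy": [2.0, 30.0, 5.0, 3.0]}
dlbcl = {"gene_tight": [20.0, 21.0, 19.0, 20.0], "gene_noisy": [70.0, 4.0, 3.0, 3.0]}

for gene in fl:
    lfc = log2_fold_change(fl[gene], dlbcl[gene])
    t = welch_t(fl[gene], dlbcl[gene])
    print(f"{gene}: log2 fold change = {lfc:.2f}, t = {t:.2f}")
```

Turning the t statistic into a p-value needs the t distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind(a, b, equal_var=False)` is one existing option.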
[Figure: volcano plot; x-axis: Log2(Average Fold Change), y-axis: -Log10(p-value)]
Significance Analysis of Microarrays (SAM)
http://www-stat.stanford.edu/~tibs/
[Figure: SAM plot; axes: expected vs. observed]
Address the problem of Multiple Hypothesis Testing:
Suppose we measure 10,000 genes and nothing changes.
At the 1% significance level, about 100 genes would be expected to be selected as differentially expressed, and all of them would be false positives.
SAM addresses this by estimating the False Discovery Rate, based on permutation testing.
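The arithmetic above, and the idea of estimating a false discovery rate by permuting sample labels, can be sketched as follows. This is a simplified illustration under stated assumptions, not the SAM procedure itself: it uses a plain Welch t statistic, a fixed threshold of 2.5, simulated Gaussian data, and function names (`t_stat`, `count_calls`, `permutation_fdr`) invented for this example.

```python
import math
import random

def t_stat(values, labels):
    """Welch t statistic comparing samples labelled 1 against samples labelled 0."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_b - mean_a) / math.sqrt(var_a / len(a) + var_b / len(b))

def count_calls(data, labels, threshold):
    """Number of genes whose |t| reaches the threshold for a given labelling."""
    return sum(1 for row in data if abs(t_stat(row, labels)) >= threshold)

def permutation_fdr(data, labels, threshold, n_perm=50, seed=0):
    """Estimate FDR as (mean calls under shuffled labels) / (calls with true labels)."""
    rng = random.Random(seed)
    observed = count_calls(data, labels, threshold)
    if observed == 0:
        return 0, 0.0
    shuffled = list(labels)
    null_total = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real association with the phenotype
        null_total += count_calls(data, shuffled, threshold)
    expected_false = null_total / n_perm
    return observed, min(1.0, expected_false / observed)

# Naive expectation from the text: 10,000 null genes tested at the 1% level.
n_genes, alpha = 10_000, 0.01
expected_false_positives = n_genes * alpha
print("expected false positives:", expected_false_positives)

# Simulated data (assumption): 190 null genes plus 10 genes with a large shift.
rng = random.Random(1)
labels = [0] * 5 + [1] * 5
null_genes = [[rng.gauss(0, 1) for _ in labels] for _ in range(190)]
spiked_genes = [[rng.gauss(10 * lab, 1) for lab in labels] for _ in range(10)]
observed, fdr = permutation_fdr(null_genes + spiked_genes, labels, threshold=2.5)
print("genes called:", observed, "estimated FDR:", round(fdr, 3))
```

Shuffling the labels keeps each gene's values intact while destroying the group structure, so the calls made on shuffled data approximate how many genes pass the threshold by chance alone.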
GOminer
• Identify enrichment in Gene Ontology (GO) terms, based on a hierarchy describing biological process, cellular component, and molecular function
Genes significantly differentially expressed in compact vs. non-compact tumors are related to cell death, cell-to-cell signaling and interaction, cellular assembly and organization, DNA replication, and cellular movement
http://discover.nci.nih.gov/gominer/
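Enrichment of a GO term in a gene list is commonly scored with a one-sided hypergeometric (Fisher's exact) test. The sketch below shows that calculation under assumed, made-up counts; the function name `hypergeom_enrichment_p` is hypothetical and this is not a call into GOminer.

```python
from math import comb

def hypergeom_enrichment_p(n_population, n_annotated, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_population genes, of which n_annotated carry the GO term."""
    total = comb(n_population, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_population - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Made-up counts: 10,000 genes measured, 100 annotated to the term,
# 200 called differentially expressed, 10 of those carry the annotation.
# By chance alone we would expect 200 * 100 / 10,000 = 2 annotated genes.
p = hypergeom_enrichment_p(10_000, 100, 200, 10)
print(f"enrichment p-value: {p:.2e}")
```

Because many GO terms are tested at once, these p-values are subject to the same multiple-testing problem as the gene-level tests.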
Gene set enrichment analysis (GSEA)
• Evaluate enrichment of curated gene sets, such as
– Pathways
– Genes that share a motif
– Genes at a similar chromosomal location
– Computationally predicted gene sets
– Your own favorite list of genes
• Evaluating related genes together adds statistical power
• http://broad.mit.edu/gsea
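The core of a GSEA-style score is a running sum over a ranked gene list. The sketch below is an unweighted, Kolmogorov-Smirnov-like simplification: the published method weights each step by the gene's correlation with the phenotype and assesses significance by permutation, both of which are omitted here. The gene names and the function name `enrichment_score` are invented for illustration.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list (assumed to contain unique gene names); step up
    at each gene in the set and down at each gene outside it. The score is the
    running-sum value with the largest absolute deviation from zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, most up-regulated first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))  # set concentrated at the bottom
print(enrichment_score(ranked, {"g1", "g6"}))  # set split between both ends
```

A set whose members cluster at one end of the ranking produces a large positive or negative score even when no single member would pass a gene-level threshold, which is the sense in which evaluating related genes together adds power.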
GSEA on Lymphoma Data
• Myc targets up-regulated, in agreement with Myc up-regulation found by SAM
• GSEA detects ~200 sets of differentially expressed genes at low FDR
– Many metabolic pathways up-regulated in DLBCL
– Myc target genes significant
• In general, GSEA produces many “generic” gene sets
– many metabolic
– many a consequence of aggressive phenotype
– no graphical view of pathways
[Figure: FL vs. DLBCL; legend: UP, DOWN]
Overlap expression levels on canonical pathways
IPA, Ingenuity Pathway Analysis (www.ingenuity.com)
Cellular assembly & organization network
• Expand network using interactions from the literature
• Visualization using cellular localization
IPA links to literature
Protein-protein Interaction Networks
Protein-protein interaction networks http://string-db.org
String-db.org example
• DNA repair genes
BARD1 FANCL POLD3 TOPBP1
BLM FEN1 POLE TREX1
BRCA1 GMNN POLE2 UNG
BRIP1 ING2 PRIM2A USP1
DCLRE1A MLH3 RAD51A
DCLRE1B MSH2 RAD54B
DDX11 MSH5 RECQL4
DNA2L MSH6 RFC3
EXO1 PARP2 RFC4
FANCG PCNA RPA2
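A protein-protein interaction network is usually handled as an undirected graph. The sketch below builds an adjacency map and ranks proteins by degree. The edge list is a small hand-entered set of well-documented interactions among genes from the list above; it was not retrieved from STRING and is far from complete.

```python
from collections import defaultdict

# Hand-entered illustrative edges (assumption: not a STRING query result).
edges = [
    ("BRCA1", "BARD1"),
    ("BRCA1", "BRIP1"),
    ("MSH2", "MSH6"),
    ("PCNA", "FEN1"),
    ("PCNA", "POLD3"),
    ("RFC3", "RFC4"),
]

def build_adjacency(edge_list):
    """Undirected adjacency map: protein -> set of interaction partners."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

adjacency = build_adjacency(edges)

# Rank proteins by number of partners (degree), breaking ties by name.
by_degree = sorted(adjacency, key=lambda p: (-len(adjacency[p]), p))
for protein in by_degree:
    print(protein, len(adjacency[protein]), sorted(adjacency[protein]))
```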
Inferring Gene Regulatory Networks
Useful non-technical review: “Computational methods for discovering gene networks from expression data”, Lee & Tzou
Single gene focus is limiting
[Figure: expression of gene A (induced/repressed) across individuals, FL vs. DLBCL]
Gene interaction is more powerful
[Figure: expression of gene A and gene B (induced/repressed) across individuals, FL vs. DLBCL; A UP, B DOWN]
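Interaction of gene clusters
[Figure: expression of Module X and Module Y (induced/repressed) across individuals, FL vs. DLBCL; X UP, Y DOWN]
[Figure: expression matrix of genes (gene1, gene2, ..., geneN) by samples, grouped into Module1, Module2, Module3]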
Inferring Gene Regulation
[Figure: modules (Mod1, Mod8, Mod3, Mod6) by samples]
Average expression of each module
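Summarizing each module by its average expression per sample can be sketched as below. The expression values, gene names, and module assignments are made up; how genes are grouped into modules in the first place (for example by clustering) is outside this sketch.

```python
def module_means(expression, modules):
    """Average expression of each module in each sample.

    expression: gene name -> list of expression values, one per sample
    modules:    module name -> list of gene names in that module
    """
    n_samples = len(next(iter(expression.values())))
    result = {}
    for name, genes in modules.items():
        result[name] = [
            sum(expression[g][i] for g in genes) / len(genes)
            for i in range(n_samples)
        ]
    return result

# Hypothetical data: four genes measured in three samples.
expression = {
    "gene1": [1.0, 2.0, 3.0],
    "gene2": [3.0, 2.0, 1.0],
    "gene3": [0.0, 0.0, 6.0],
    "gene4": [2.0, 4.0, 0.0],
}
modules = {"Module1": ["gene1", "gene2"], "Module2": ["gene3", "gene4"]}

means = module_means(expression, modules)
for name, values in means.items():
    print(name, values)
```

Working with module averages reduces thousands of gene-level profiles to a handful of module-level profiles, which is what the downstream regulatory analysis operates on.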
Key Idea of Regulatory Module Networks
• Look for a set of regulatory factors that, in combination, predict a gene’s expression level
• Regulatory factors can include:
– mRNA level of regulatory proteins
– Genotypic factors (SNPs, CNVs)
– Epigenetic factors (methylation status)
– TF binding (measured by ChIP-seq)
– …
• Factors that robustly predict a target’s expression across different experiments are inferred to be its regulators
Segal et al., Nature Genetics 2003
Transcription factors, signal transduction proteins, mRNA binding proteins, chromatin modification factors, …
Computationally Derived Regulatory Module
Groups of co-expressed genes are driven by a computationally derived transcriptional regulatory program, derived from a candidate list of regulators.
[Figure: regulatory program splitting on Gene A (On/Off) and Gene B (On/Off), shown above the module genes]
Segal E et al, Nature Genetics 2003.
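The On/Off regulatory program can be thought of as a small decision tree over regulator expression. The sketch below is a one-level simplification: for each candidate regulator it tries every threshold and keeps the split that best separates the module's expression into two homogeneous groups. It is not the learning procedure of Segal et al.; the regulator names `TF_A` and `TF_B`, the data, and the sum-of-squares criterion are all assumptions made for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_regulator_split(module_expr, regulators):
    """Return (score, regulator, threshold) for the best single On/Off split.

    module_expr: module's average expression, one value per sample
    regulators:  regulator name -> expression level per sample
    Samples with level <= threshold are 'Off', the rest are 'On'.
    """
    best = None
    for name, levels in regulators.items():
        for threshold in sorted(set(levels))[:-1]:  # keep both sides non-empty
            off = [y for x, y in zip(levels, module_expr) if x <= threshold]
            on = [y for x, y in zip(levels, module_expr) if x > threshold]
            score = sse(off) + sse(on)
            if best is None or score < best[0]:
                best = (score, name, threshold)
    return best

# Hypothetical data: six samples; the module is high when TF_A is high.
module_expr = [1.0, 1.2, 0.9, 5.0, 5.1, 4.9]
regulators = {
    "TF_A": [0.1, 0.2, 0.1, 2.0, 2.1, 1.9],
    "TF_B": [1.0, 2.0, 1.5, 1.2, 1.8, 1.1],
}

score, regulator, threshold = best_regulator_split(module_expr, regulators)
print(f"best regulator: {regulator}, Off if level <= {threshold}, score = {score:.3f}")
```

Applying the same search again within the On and Off groups would grow a deeper tree, giving combinations of regulators like the Gene A / Gene B program in the figure.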
Core module network of FL transformation
Gentles A et al, Blood 2009
Integration with survival data
• Module A is the single most predictive of survival by Cox regression (bad prognosis in FL)
• Define a linear predictor of survival:
– $\mathrm{LPS} = 1.14 \cdot \mathrm{ModuleA} + 0.72 \cdot \mathrm{GFL3027} - 1.35 \cdot \mathrm{GFL2738}$
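Applying the linear predictor is a weighted sum of the three covariates named on the slide. The sketch below uses the coefficients from the slide; the patient values are made up, and splitting patients at the median score is an assumption made for illustration rather than the cutoff used in the study.

```python
from statistics import median

# Coefficients taken from the slide.
COEFFS = {"ModuleA": 1.14, "GFL3027": 0.72, "GFL2738": -1.35}

def lps(patient):
    """Linear predictor of survival for one patient (dict of covariate values)."""
    return sum(COEFFS[name] * patient[name] for name in COEFFS)

def risk_groups(patients):
    """Label each patient 'high' or 'low' risk by a median split on LPS (assumed cutoff)."""
    scores = {pid: lps(values) for pid, values in patients.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical covariate values for three patients.
patients = {
    "p1": {"ModuleA": 2.0, "GFL3027": 1.0, "GFL2738": 0.0},
    "p2": {"ModuleA": 0.0, "GFL3027": 0.0, "GFL2738": 1.0},
    "p3": {"ModuleA": 0.5, "GFL3027": 0.5, "GFL2738": 0.5},
}

for pid, values in patients.items():
    print(pid, round(lps(values), 3))
groups = risk_groups(patients)
print(groups)
```

Positive coefficients raise the score and negative coefficients lower it, so a higher LPS corresponds to a worse predicted outcome under the usual Cox sign convention.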
Bad Part: ESC like expression
Good Part: TGFB signaling
Gentles A et al, Blood 2009
Survival based on LPS
Gentles A et al, Blood 2009
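Survival in groups defined by a score such as LPS is typically compared with Kaplan-Meier curves. The sketch below is a minimal Kaplan-Meier estimator on made-up follow-up data; whether the original figure was produced this way is an assumption, and no data from the study are used.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients both leave the risk set
    return curve

# Hypothetical follow-up (years) for a high-LPS and a low-LPS group.
high_risk = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 1])
low_risk = kaplan_meier([4, 6, 7, 8, 9], [0, 1, 0, 0, 1])
print("high LPS:", high_risk)
print("low LPS:", low_risk)
```

Censored patients contribute to the number at risk until they drop out, which is why the estimate uses the risk set at each event time rather than the starting cohort size.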
DATABASES
• TCGA
• CCLE
• ENCODE
The Cancer Genome Atlas (TCGA)
• Phase I: Initiated in 2005 by the National Cancer Institute and National Human Genome Research Institute to catalog genetic mutations causing cancer, using genome sequencing; focused on GBM, lung and ovarian cancer
• Phase II: Expanded to 20-25 different cancer types, complementing genome sequencing with genomic characterization, including gene expression profiling, copy number variation, DNA methylation, and miRNA
TCGA: Cancer measured at multiple scales
– mRNA & miRNA expression
– Copy number
– DNA Methylation
– Mutation (NGS)
– Pathology images
– Medical Images
– Treatment
– Survival Outcome
TCGA Cancer Types
[Figure: bar chart of the number of patients with samples (y-axis, 0 to 1000) for each TCGA cancer type (x-axis)]
• Acute Myeloid Leukemia [LAML]
• Bladder Urothelial Carcinoma [BLCA]
• Brain Lower Grade Glioma [LGG]
• Breast invasive carcinoma [BRCA]
• Cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC]
• Colon adenocarcinoma [COAD]
• Esophageal carcinoma [ESCA]
• Glioblastoma multiforme [GBM]
• Head and Neck squamous cell carcinoma [HNSC]
• Kidney Chromophobe [KICH]
• Kidney renal clear cell carcinoma [KIRC]
• Kidney renal papillary cell carcinoma [KIRP]
• Liver hepatocellular carcinoma [LIHC]
• Lung adenocarcinoma [LUAD]
• Lung squamous cell carcinoma [LUSC]
• Lymphoid Neoplasm Diffuse Large B-cell Lymphoma [DLBC]
• Ovarian serous cystadenocarcinoma [OV]
• Pancreatic adenocarcinoma [PAAD]
• Prostate adenocarcinoma [PRAD]
• Rectum adenocarcinoma [READ]
• Sarcoma [SARC]
• Skin Cutaneous Melanoma [SKCM]
• Stomach adenocarcinoma [STAD]
• Thyroid carcinoma [THCA]
• Uterine Corpus Endometrioid Carcinoma [UCEC]
TCGA Organization
TSS: Tissue Source Sites
BCR: Biospecimen Core Resources
DCC: Data Coordinating Center
GCC: Genome Characterization Centers
GSC: Genome Sequencing Centers
CGHub: Cancer Genomics Hub
GDACs: Genome Data Analysis Centers
Major TCGA Publications
• Comprehensive molecular characterization of human colon and rectal cancer. Nature. 487 (7407):330-337, 2012.
– Mutations in ARID1A, SOX9, FAM123B/WTX, IGF2; mutations in WNT pathway
• Comprehensive genomic characterization of squamous cell lung cancers. Nature. 489 (7417):519-525, 2012.
• Comprehensive molecular portraits of human breast tumors. Nature. 490 (7418):61-70, 2012.
– Mutations in ESR1, GATA3, FOXA1, XBP1, and cMYB
• Integrated genomic analyses of ovarian carcinoma. Nature. 474 (7353):609-615, 2011.
– Mutations in TP53 occurred in 96% of the cases studied; mutations in BRCA1 and BRCA2 occurred in 21% of the cases
• An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR and NF1. Cancer Cell. 17 (1):98-110, 2010.
• Identification of a CpG Island Methylator Phenotype that Defines a Distinct Subgroup of Glioma. Cancer Cell. 17 (5):510-522, 2010.
• Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 455 (7216):1061-1068, 2008.
– Mutations in NF1, ERBB2, TP53, PIK3R1
UCSC Cancer Browser – Chromosome View
https://genome-cancer.ucsc.edu
UCSC Cancer Browser Gene View
Cancer Browser – Survival Analysis
Cancer Cell Line Encyclopedia (CCLE)
• The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute and Novartis to conduct a genetic and pharmacologic characterization of a large panel of human cancer cell lines
• Aims to link distinct drug responses to genomic patterns and to translate cell line integrative genomics into cancer patient stratification
• Provides public access to analysis and visualization of DNA copy number, mRNA expression and mutation data for about 1000 cell lines
http://www.broadinstitute.org/ccle/home
Cellular Information Processing
ENCODE
http://genome.ucsc.edu/ENCODE/index.html
Summary
• Basic principles of molecular biology of cancer
• Experimental high-throughput technologies
• Design of perturbation studies, including drug screening
• Overview of publicly available datasets, including GEO, TCGA, CCLE, and ENCODE
• Online biocomputational tools, including selected accessible tools from the NCI Center for Bioinformatics
• Network reconstruction from genomic data
• Application of systems biology to identifying drug targets
• Application of systems biology to personalized medicine