Post on 11-Feb-2016
Emergent Biology Through Integration and Mining of Microarray Datasets
Lance D. Miller
GIS Microarray & Expression Genomics
Mining of expression data to understand the molecular composition of human
cancers and to define components of the tumor molecular profile
with mechanistic and clinical importance.
FOCUS:
2001, PNAS
Molecular classes are predictive of outcome
overall survival: relapse-free survival:
70-gene prognosis classifier for predicting risk of distant metastasis within 5 years
van’t Veer et al.
Sotiriou et al.
Though each tumor is molecularly unique, there exist common transcriptional cassettes that underlie biological and clinical properties of tumors and that may be of diagnostic, prognostic and therapeutic significance.
GOAL:
Mining of expression data to understand the molecular composition of human
cancers and to define components of the tumor molecular profile
with mechanistic and clinical importance.
The GIS Perpetual Array Platform
Integration of Independent Datasets
Perou et al., 1999; Sorlie et al., 2001; West et al., 2001
Meta-Analysis of Breast Cancer Datasets:
   dataset                  source            sample size                  array format
1. Miller-Liu               unpublished       61 tumors: 39 ER+, 22 ER-    19K spotted oligo
2. Sotiriou-Liu             submitted: PNAS   99 tumors: 34 ER+, 65 ER-    7.6K spotted cDNA
3. Gruvberger-Meltzer       Cancer Research   47 tumors: 23 ER+, 24 ER-    6.7K spotted cDNA
4. Sorlie-Borrensen-Dale    PNAS              74 tumors: 56 ER+, 18 ER-    8.1K spotted cDNA
5. van’t Veer-Friend        Nature            98 tumors: 59 ER+, 39 ER-    25K spotted oligo
6. West-Nevins              PNAS              49 tumors: 25 ER+, 24 ER-    7.1K Affymetrix
total: 428 tumors, ~73,500 probes
(Adaikalavan Ramasamy et al.)
META MADB: The Construct
1. Extract and Format the Data
2. Link sample/probe info via unique keys
3. Log Transform and Normalize
4. Filter Genes and Arrays
5. Apply Statistical Tests
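Steps 3 to 5 of the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say which normalization, filter, or test was used, so per-array median centering, a minimum-presence filter, and a Welch t statistic are stand-ins, and the function names and the `min_present` parameter are hypothetical.

```python
import math
from statistics import mean, median, variance


def log2_median_center(intensities):
    """Step 3 (assumed method): log2-transform one array and center it on
    its own median. Missing or non-positive spots are kept as None."""
    logged = [math.log2(v) if v is not None and v > 0 else None
              for v in intensities]
    center = median(x for x in logged if x is not None)
    return [x - center if x is not None else None for x in logged]


def passes_filter(values, min_present=0.8):
    """Step 4 (assumed rule): keep a gene only if enough arrays measured it."""
    present = sum(v is not None for v in values)
    return present / len(values) >= min_present


def welch_t(group_a, group_b):
    """Step 5 (assumed test): Welch t statistic comparing two sample groups."""
    a = [v for v in group_a if v is not None]
    b = [v for v in group_b if v is not None]
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical usage: one array of raw intensities, then one gene's
# normalized values split into two groups (for example ER+ and ER-).
normalized_array = log2_median_center([100.0, 200.0, None, 400.0])
gene_ok = passes_filter([0.4, None, 1.1, 0.9, 1.3])
t_stat = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Keeping missing spots as `None` rather than dropping them preserves the position of each probe, which matters once samples and probes are linked by key as in step 2.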
Building the Matrix
Creating a Universe
1. Apply UniGene ID as Unifying Key
2. Remove Gene Redundancy
3. Extract p values, d values, z-scores
4. Set p value threshold
5. Merge Datasets
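The "universe" steps above amount to a keyed merge, and a minimal sketch is given here. The row layout, the function names, and the UniGene IDs in the usage lines are hypothetical, and the redundancy rule (keep the probe with the smallest p value for each UniGene ID) is an assumption; the slides do not state how redundant probes were collapsed.

```python
def collapse_to_unigene(probe_rows):
    """Steps 1-2: key each probe by UniGene ID and keep one row per gene.
    Assumed rule: the probe with the smallest p value wins."""
    best = {}
    for row in probe_rows:
        unigene_id = row["unigene"]
        if unigene_id is None:  # probe with no UniGene mapping
            continue
        if unigene_id not in best or row["p"] < best[unigene_id]["p"]:
            best[unigene_id] = row
    return best


def build_universe(datasets, p_threshold=0.001):
    """Steps 3-5: extract p, d and z per gene, apply the p value threshold,
    and merge all datasets into one table keyed by UniGene ID."""
    universe = {}
    for dataset_name, probe_rows in datasets.items():
        for unigene_id, row in collapse_to_unigene(probe_rows).items():
            if row["p"] <= p_threshold:
                stats = {key: row[key] for key in ("p", "d", "z")}
                universe.setdefault(unigene_id, {})[dataset_name] = stats
    return universe


# Hypothetical usage with made-up IDs and statistics.
example = {
    "dataset_1": [
        {"unigene": "Hs.1", "p": 0.01, "d": 0.5, "z": 2.0},
        {"unigene": "Hs.1", "p": 0.0001, "d": 1.2, "z": 4.0},
    ],
    "dataset_2": [
        {"unigene": "Hs.1", "p": 0.0005, "d": 0.9, "z": 3.5},
        {"unigene": "Hs.2", "p": 0.2, "d": 0.1, "z": 0.3},
    ],
}
merged = build_universe(example)
```

Each entry of `merged` maps a UniGene ID to the datasets in which that gene passed the threshold, so genes supported by several independent datasets are easy to pick out.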
META MADB
d values (difference of average expression)
              ER+                       ER-
         T1 T2 T3 T4 T5 … Tn    T1 T2 T3 T4 T5 … Tn
gene1 :  e1 e2 e3 e4 e5 … en    e1 e2 e3 e4 e5 … en
$d = \overline{e}_{\mathrm{ER+}} - \overline{e}_{\mathrm{ER-}}$ (average e over the ER+ tumors minus average e over the ER- tumors)
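As a small worked example, the d value for one gene can be computed as follows, reading d as the difference of average expression between the ER+ and ER- tumors, as the slide title states. The function name, tumor labels, and expression values are hypothetical, and the values are assumed to be log-transformed already.

```python
from statistics import mean


def d_value(expression_by_tumor, er_status):
    """Difference of average expression for one gene:
    mean over ER+ tumors minus mean over ER- tumors."""
    er_positive = [e for tumor, e in expression_by_tumor.items()
                   if er_status[tumor] == "ER+"]
    er_negative = [e for tumor, e in expression_by_tumor.items()
                   if er_status[tumor] == "ER-"]
    return mean(er_positive) - mean(er_negative)


# Hypothetical log-scale expression values for gene1 across four tumors.
gene1 = {"T1": 2.0, "T2": 4.0, "T3": 1.0, "T4": 1.0}
status = {"T1": "ER+", "T2": "ER+", "T3": "ER-", "T4": "ER-"}
d = d_value(gene1, status)  # (2.0 + 4.0) / 2 - (1.0 + 1.0) / 2 = 2.0
```

A positive d means the gene is, on average, more highly expressed in ER+ tumors; a negative d means the reverse.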
Identifying Grade-Specific Genes in Hepatocellular Carcinoma
• Sample: 10 cases of each class
• Sample collection: HBV(+)
• Array: Human 19K Oligonucleotide array
• Analysis: 50 arrays
HCC Progression
Pre-neoplastic lesions: adenomatous hyperplasia, ordinary (OAH) and atypical (AAH)
HCC Grade 1, 2, 3 (G1, G2, G3)
Breast Cancer Grade-Associated Genes as Predictors of HCC Grade?
HCC
BC
Gene       Phase   Function
ORC6L              DNA replication
TROAP      M/G1    cell adhesion
BUB1       G2/M    mitotic spindle checkpoint; oncogenesis
CKS2       G2/M    cytokinesis
MELK       G2      tyr/ser/thr kinase activity
CDC20      G2/M    regulation of cell cycle
HN1        G2/M    Unknown
MCM6       G1/S    DNA replication initiation
CDC2       G2      mitotic initiation
UBE2C      G2      cyclin catabolism
TOP2A      G2      DNA metabolism
CDKN3      M/G1    regulation of CDK activity
PTTG1      M/G1    mitotic regulation; oncogenesis
E2-EPF     M/G1    ubiquitin cycle
FLJ23462           electron transport
GATA3              embryogenesis
UG Description                                                       Fold Change   T47D / MCF-7 / ZR75-1 / SAGE / ERE (-2)
Interleukin 6 signal transducer (gp130, oncostatin M receptor)        2.5          ++ +
Insulin-like growth factor binding protein 4                          2.1          + + + +
Seven in absentia homolog 2 (Drosophila)                              1.7          + +
Matrix metalloproteinase 7 (matrilysin, uterine)                     -1.7          ++ +
Stanniocalcin 2                                                       5.0          ++ + + ++
Nuclear receptor interacting protein 1/RIP140                         1.6          + + +
GREB1 protein                                                         3.1          +
Serum-inducible kinase                                               -2.0          + +
Amphiregulin                                                          3.9          ++ +
CD7 antigen (p41)                                                    -2.5          + +
Duodenal cytochrome                                                  -2.1          + +
Thrombospondin 1                                                      2.4          + +
Putative transmembrane protein                                       -3.8          + + +++
Stromal cell-derived factor 1                                         3.8          ++ ++
Retinoblastoma binding protein 8                                      2.2          ++ + + ++
Janus kinase 1 (a protein tyrosine kinase)                            4.9          ++ ++
protein kinase H11                                                    1.5
Olfactomedin 1                                                        3.0          ++
DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 10 (RNA helicase)        2.3          + +
Hypothetical protein similar to mouse Dnajl1                          2.5          + +++
Putative protein kinase                                               1.7
(description missing)                                                 2.5          +
UDP-Gal:betaGlcNAc beta 1,4-galactosyltransferase, polypeptide 1      3.7          + + ++
Hypothetical protein FLJ14299/Similar to nocA zinc-finger protein     2.5          ++
Immunoglobulin superfamily, member 4                                  2.2          + ++
Cyclin G2                                                            -2.6          ++ +
Sialyltransferase 1 beta-galactoside alpha-2,6-sialyltransferase     -2.0          +
Chitobiase, di-N-acetyl-                                             -1.9          ++
Arachidonate 12-lipoxygenase, 12R type                               -4.0          ++ +
Purinergic receptor (family A group 5)                               -2.3          +
G protein-coupled receptor kinase 7/Binds Erbeta                     -1.8          + +
Estrogen Responsive Genes in vitro (Chin-Yo Lin)
Estrogen-Responsive in vitro and ER Status-Associated in vivo
[Figure labels: columns 1 2 3 4 5 6 (p<0.001); treatments E2, E2 + ICI, E2 + CHX]
Identifying Cancer-Linked Genes in Epithelial Adenocarcinomas
Datasets: 3 gastric, 3 prostate, 2 liver, 1 lung
selection at p<0.001
242 Genes that Distinguish Tumor from Normal at p<0.001 in at least 3 of the 4 Tumor Types
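The selection rule stated above (p<0.001 in at least 3 of the 4 tumor types) can be sketched as a simple count. The sketch assumes one tumor-versus-normal p value per gene per tumor type; how the several datasets of one type were combined into that value is not stated in the slides. The function name, gene names, and p values are hypothetical.

```python
def cancer_linked_genes(p_by_type, p_threshold=0.001, min_types=3):
    """Return genes whose tumor-vs-normal p value is below the threshold
    in at least `min_types` tumor types.

    `p_by_type` maps gene -> {tumor_type: p value} (assumed layout)."""
    selected = []
    for gene, p_values in p_by_type.items():
        significant_types = sum(1 for p in p_values.values()
                                if p < p_threshold)
        if significant_types >= min_types:
            selected.append(gene)
    return sorted(selected)


# Hypothetical usage with made-up genes and p values.
example = {
    "geneA": {"gastric": 1e-4, "prostate": 5e-4, "liver": 2e-4, "lung": 0.3},
    "geneB": {"gastric": 1e-4, "prostate": 0.02, "liver": 0.5, "lung": 1e-5},
}
linked = cancer_linked_genes(example)
```

Requiring agreement across tumor types, rather than pooling all samples, favors genes that behave consistently across tissues of origin over genes driven by a single large dataset.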
database components: internal and external datasets derived from:
- tumor studies (clinical samples)
- in vitro, pathway studies (e.g., timecourse)
- SAGE data
- mouse studies (in vitro/in vivo)
An Integrated Database for Pan-Cancer Meta-Analysis of Gene Expression Data
Summary
Derive expression signatures for all major factors known or suspected to have prognostic value
Determine the reliability of expression signatures in outcome prediction
Expand integrated database for pan-cancer meta-analysis
Integrate expression profiling into clinical decision making
Future Directions
Acknowledgements
Catholic University of Korea
Suk-Woo Nam
Jung Yong Lee
GIS
Adai Ramasamy
Liza Vergara
Phil Long
Chin-Yo Lin
Benjamin Mow