Transcriptional Diagnosis by Bayesian Network
![Page 1: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/1.jpg)
Harvard Medical School
Hsun-Hsien Chang and Marco F. Ramoni
Children’s Hospital Informatics Program
Harvard-MIT Division of Health Sciences and Technology
Harvard Medical School
March 17, 2009
![Page 2: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/2.jpg)
Background
• Microarray technology enables profiling the expression of thousands of genes in parallel on a single chip.
• Comparative analysis of gene expression across tissue states identifies signature genes for disease diagnosis.
• Challenge:
– The number of variables (i.e., genes) is much greater than the number of observations (i.e., biological samples), which makes models prone to overfitting.
• Existing methods:
– Gene selection: compute statistics (e.g., t-statistics, SNR, PCA) of individual genes and select the top-ranked genes.
– Classification model: build a classification function of the selected genes.
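The conventional two-step pipeline described above can be sketched in Python. This is a minimal illustration, not code from the slides: it ranks genes by the absolute signal-to-noise ratio (SNR, (μ_a − μ_b)/(σ_a + σ_b), the score used by Golub et al., 1999) or by Welch's t-statistic, then keeps a fixed number of top genes. The names `rank_genes` and `top_k` are hypothetical; the fixed `top_k` plays the role of the cut-off threshold that the next slide criticizes.

```python
from statistics import mean, stdev

def snr(class_a, class_b):
    """Signal-to-noise ratio (mu_a - mu_b) / (sd_a + sd_b) for one gene."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))

def welch_t(class_a, class_b):
    """Welch's two-sample t-statistic for one gene."""
    va, vb = stdev(class_a) ** 2, stdev(class_b) ** 2
    return (mean(class_a) - mean(class_b)) / (va / len(class_a) + vb / len(class_b)) ** 0.5

def rank_genes(expression, labels, score=snr, top_k=2):
    """Rank genes by |score| between two classes and keep the top_k (a hard cut-off).

    expression: dict mapping gene name -> list of expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(score(a, b))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

Each gene is scored in isolation, which is exactly the independence assumption questioned on the next slide: two perfectly correlated genes receive the same score and can both enter the signature.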
![Page 3: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/3.jpg)
Proposed Approach
• Issues:
– The assumption that genes are independent is inadequate.
– Other genes may be collinearly expressed with the signature genes.
– Selection and classification are two non-integrated steps, and a cut-off threshold is needed to select the top-ranked genes.
• Proposed strategies:
– Adopt a systems biology approach to infer the functional dependencies among genes.
– Use the dependency network for tissue discrimination.
– Integrate gene selection and the classification model in a Bayesian network framework.
![Page 4: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/4.jpg)
Data Representation by Bayesian Network
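[Figure: expression data for Gene 1, Gene 2, …, Gene N across Case 1, Case 2, …, Case M, grouped into Tissue state 1 and Tissue state 2, represented as a Bayesian network over the phenotype node Pheno and the gene nodes G1, G2, …, GN.]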
• Bayesian networks are directed acyclic graphs in which:
– Nodes correspond to random variables.
– Directed arcs encode the conditional probabilities of the target nodes given their source (parent) nodes.
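Because arcs encode conditional probabilities, a Bayesian network factorizes the joint distribution into each node's probability given its parents: P(X1, …, Xn) = ∏ P(Xi | parents(Xi)). The Python sketch below illustrates this on a hypothetical four-node network (Pheno → G1, Pheno → G2, G2 → G3) with expression discretized to "low"/"high". The structure and all probabilities are made up for illustration and are not the network learned in this study.

```python
from itertools import product

# Hypothetical DAG: each node lists its parents (Pheno -> G1, Pheno -> G2, G2 -> G3).
PARENTS = {"Pheno": [], "G1": ["Pheno"], "G2": ["Pheno"], "G3": ["G2"]}
STATES = {"Pheno": ["AC", "SCC"], "G1": ["low", "high"], "G2": ["low", "high"], "G3": ["low", "high"]}

# Illustrative CPTs: CPT[node][parent_values][value] = P(node = value | parents = parent_values).
CPT = {
    "Pheno": {(): {"AC": 0.5, "SCC": 0.5}},
    "G1": {("AC",): {"low": 0.8, "high": 0.2}, ("SCC",): {"low": 0.3, "high": 0.7}},
    "G2": {("AC",): {"low": 0.6, "high": 0.4}, ("SCC",): {"low": 0.1, "high": 0.9}},
    "G3": {("low",): {"low": 0.9, "high": 0.1}, ("high",): {"low": 0.2, "high": 0.8}},
}

def joint_probability(assignment):
    """Product over nodes of P(node | its parents) for a full assignment."""
    p = 1.0
    for node, parents in PARENTS.items():
        key = tuple(assignment[q] for q in parents)
        p *= CPT[node][key][assignment[node]]
    return p

def total_probability():
    """Sum of the joint over all assignments; equals 1 for a valid network."""
    nodes = list(STATES)
    return sum(joint_probability(dict(zip(nodes, values)))
               for values in product(*(STATES[n] for n in nodes)))
```

For example, `joint_probability({"Pheno": "SCC", "G1": "high", "G2": "high", "G3": "high"})` multiplies 0.5 × 0.7 × 0.9 × 0.8 = 0.252. Only local tables are stored, which is what keeps the model tractable when genes far outnumber samples.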
![Page 5: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/5.jpg)
Gene Selection by Bayes Factor
[Figure: gene selection by Bayes factor, shown on networks linking the phenotype node Pheno to the gene nodes G1, G2, …, GN, Gp, Gq.]
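One common way to compute a Bayes factor for a single gene, assumed here rather than taken from the slides, compares the marginal likelihood of a model in which the gene depends on the phenotype with one in which it does not, using Dirichlet-multinomial marginal likelihoods. The sketch assumes expression has already been discretized into `n_levels` categories (0, 1, 2, …) and uses a symmetric Dirichlet prior with hyperparameter `alpha`; the function names and the log-Bayes-factor threshold are hypothetical.

```python
from collections import Counter
from math import lgamma

def log_dirichlet_multinomial(counts, alpha):
    """Log marginal likelihood of category counts under a symmetric Dirichlet(alpha) prior."""
    a = alpha * len(counts)
    return (lgamma(a) - lgamma(a + sum(counts))
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

def log_bayes_factor(gene_levels, pheno, n_levels=3, alpha=1.0):
    """log BF of 'gene depends on Pheno' versus 'gene independent of Pheno'."""
    def counts(values):
        c = Counter(values)
        return [c.get(level, 0) for level in range(n_levels)]

    # Independent model: one distribution for the gene over all samples.
    independent = log_dirichlet_multinomial(counts(gene_levels), alpha)
    # Dependent model: a separate distribution for each phenotype class.
    dependent = 0.0
    for cls in set(pheno):
        subset = [g for g, p in zip(gene_levels, pheno) if p == cls]
        dependent += log_dirichlet_multinomial(counts(subset), alpha)
    return dependent - independent

def select_genes(genes, pheno, log_bf_threshold=0.0, **kwargs):
    """Keep genes whose log Bayes factor exceeds the threshold, best first."""
    scored = {name: log_bayes_factor(values, pheno, **kwargs) for name, values in genes.items()}
    kept = [name for name, s in scored.items() if s > log_bf_threshold]
    return sorted(kept, key=scored.get, reverse=True)
```

Unlike a t-statistic, the marginal likelihood automatically penalizes the extra parameters of the dependent model, so a gene with no class difference gets a negative log Bayes factor rather than a small positive score.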
![Page 6: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/6.jpg)
Collinearity Elimination via Network Learning
[Figure: collinearity elimination, shown on networks over the phenotype node Pheno and the gene nodes G1, G2, …, GN, Gp, Gq.]
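Collinearity elimination via network learning can be approximated, for illustration only, by a greedy single-parent search: genes are visited in rank order (for example, by Bayes factor), and each gene either keeps Pheno as its parent or is reattached to an already-kept gene that explains it better. A reattached gene is no longer a direct child of Pheno and drops out of the signature. This is a simplified stand-in, not the authors' structure-learning procedure; the Dirichlet scoring, the rank-order constraint (which keeps the graph acyclic), and all names are assumptions.

```python
from collections import Counter
from math import lgamma

def log_ml(child, parent, n_levels=3, alpha=1.0):
    """Log marginal likelihood of a discrete child given one discrete parent."""
    total = 0.0
    a = alpha * n_levels
    for pv in set(parent):
        c = Counter(ch for ch, p in zip(child, parent) if p == pv)
        counts = [c.get(level, 0) for level in range(n_levels)]
        total += lgamma(a) - lgamma(a + sum(counts))
        total += sum(lgamma(alpha + k) - lgamma(alpha) for k in counts)
    return total

def eliminate_collinear(ranked_genes, pheno, n_levels=3, alpha=1.0):
    """Greedy parent choice per gene; genes whose best parent is another gene are dropped.

    ranked_genes: list of (name, discretized values), best-ranked first.
    Returns (kept gene names, chosen parent for every gene).
    """
    values_by_name = dict(ranked_genes)
    kept, chosen_parent = [], {}
    for name, values in ranked_genes:
        best_parent, best_score = "Pheno", log_ml(values, pheno, n_levels, alpha)
        # Only already-kept (higher-ranked) genes may act as parents, so no cycles form.
        for other in kept:
            score = log_ml(values, values_by_name[other], n_levels, alpha)
            if score > best_score:
                best_parent, best_score = other, score
        chosen_parent[name] = best_parent
        if best_parent == "Pheno":
            kept.append(name)
    return kept, chosen_parent
```

For example, a gene that is an exact copy of a higher-ranked signature gene is explained better by that gene than by the phenotype, so it is reattached and removed, while a gene carrying independent phenotype information stays attached to Pheno.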
![Page 7: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/7.jpg)
Sample Classification
• The phenotype variable is independent of the blue genes, given the green genes.
• Technically, the green genes form the Markov blanket of the phenotype variable, and they are the signature genes used to determine the phenotype.
• Tissue classification:
[Figure: network over the phenotype node Pheno and the gene nodes G1, G2, Gp, Gq, GN.]
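Classification then amounts to computing the posterior of Pheno given the signature genes. The sketch below assumes the simplest case, in which every signature gene is a child of Pheno with no other parents, so that P(Pheno | genes) ∝ P(Pheno) ∏ P(gene_i | Pheno) (a naive Bayes structure). The networks learned in this work can be richer, so treat this as an illustration; discretized inputs, Laplace smoothing via `alpha`, and the function names are assumptions.

```python
from collections import Counter
from math import exp, log

def fit_discrete_nb(samples, labels, n_levels=3, alpha=1.0):
    """Estimate P(Pheno) and P(gene_i | Pheno) with Laplace smoothing."""
    classes = sorted(set(labels))
    prior = {c: (labels.count(c) + alpha) / (len(labels) + alpha * len(classes)) for c in classes}
    cpt = {}
    for c in classes:
        rows = [s for s, y in zip(samples, labels) if y == c]
        for g in range(len(samples[0])):
            counts = Counter(r[g] for r in rows)
            cpt[(g, c)] = [(counts.get(v, 0) + alpha) / (len(rows) + alpha * n_levels)
                           for v in range(n_levels)]
    return classes, prior, cpt

def phenotype_posterior(sample, model):
    """P(Pheno = c | signature genes), computed in log space for stability."""
    classes, prior, cpt = model
    log_scores = {c: log(prior[c]) + sum(log(cpt[(g, c)][v]) for g, v in enumerate(sample))
                  for c in classes}
    m = max(log_scores.values())
    unnormalized = {c: exp(s - m) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: u / z for c, u in unnormalized.items()}
```

Only genes in the Markov blanket enter the product; by the independence statement above, adding the blue genes would not change the posterior in the true network.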
![Page 8: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/8.jpg)
Algorithm Summary
[Flowchart: Gene Selection by Bayes Factor → Collinearity Elimination → Sample Classification; Optimize Performance; Optimize Hyperparameters (sensitivity analysis).]
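The hyperparameter optimization step can be sketched as a grid search in which each setting is scored and the best one is kept. The hyperparameter names (`log_bf_threshold`, `alpha`) and the toy scoring function are hypothetical; in practice `evaluate` would rebuild the network with those settings and return a validation metric such as cross-validated AUROC.

```python
from itertools import product

def sensitivity_analysis(evaluate, grid):
    """Score every hyperparameter combination and return the best plus all results.

    evaluate: callable taking hyperparameters as keyword arguments, returning a score
              (higher is better)
    grid: dict mapping hyperparameter name -> list of candidate values
    """
    names = list(grid)
    results = []
    for combo in product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        results.append((evaluate(**params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results
```

Keeping the full `results` list, rather than only the winner, is what makes this a sensitivity analysis: it shows how strongly performance depends on each hyperparameter across the grid.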
![Page 9: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/9.jpg)
Discriminate Lung Carcinoma Subtypes
• Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are major subtypes of lung cancer:
– AC and SCC differ in survival, likelihood of metastasis, and response to chemotherapy and targeted therapy.
– Physicians lack confidence in recognizing the subtype correctly when there are multiple primary carcinomas.
• Training:
– 58 ACs and 53 SCCs.
– 77 genes selected in the network.
– 25 signature genes.
![Page 10: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/10.jpg)
Bayesian Network for Lung Carcinoma
![Page 11: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/11.jpg)
Large-Scale Testing on Independent Samples
• 422 samples (232 ACs and 190 SCCs) aggregated from 7 cohorts (including Caucasians, African-Americans, Chinese).
• Accuracy: AUROC = 95.2%.
[Figure: ROC curves, sensitivity vs. 1-specificity; Proposed Bayes Net (AUROC 95.2%).]
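The AUROC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The sketch below computes it that way and also builds the ROC points (1-specificity, sensitivity) so the trapezoidal area under the curve can be checked against it. The scores and function names are illustrative, and which subtype is treated as "positive" is an assumption.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative pairs."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def roc_points(pos_scores, neg_scores):
    """(1-specificity, sensitivity) at every distinct threshold, highest first."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        sensitivity = sum(p >= t for p in pos_scores) / len(pos_scores)
        false_positive_rate = sum(n >= t for n in neg_scores) / len(neg_scores)
        points.append((false_positive_rate, sensitivity))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with positive scores `[0.9, 0.8, 0.7]` and negative scores `[0.1, 0.2, 0.75]`, eight of the nine pairs are ordered correctly, so both methods give an AUROC of 8/9.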
![Page 12: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/12.jpg)
Comparisons with Other Popular Methods
• Highest testing AUROC among the compared methods.
• Small signature size, which helps avoid overfitting.

| Method | Testing AUROC | p-value | # signature genes |
|---|---|---|---|
| Bayesian Network | 95.2% | n/a | 25 |
| PCA/LDA | 91.2% | 0.0047 | 13 |
| PAM (Tibshirani et al., PNAS 2002) | 91.0% | 0.0014 | 77 |
| Weighted Voting (Golub et al., Science 1999) | 93.4% | 0.6240 | 800 |
![Page 13: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/13.jpg)
KRT6 Family Characterizes the Lung Carcinoma Discrimination
![Page 14: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/14.jpg)
KRT6 Family Characterizes the Lung Carcinoma Discrimination
• Keratin-6 family genes (KRT6A, KRT6B, KRT6C) are important for distinguishing lung cancer subtypes.
– Account for 95% of the accuracy of the full 25-gene signature.
– Located on chromosome 12q12-q13.
– A nonlinear, concave discriminative surface.
![Page 15: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/15.jpg)
Verification by Chr12q12-q13 Aberrations
• Investigate DNA copy number changes with array comparative genomic hybridization (CGH):
– 12 ACs and 13 SCCs from VU University Medical Center (Vrije Universiteit), the Netherlands.
– Treat the average CGH values of the genes located in q12, q13, and q12-q13, respectively, as three features to construct a Naïve Bayes classifier.
– A dumbbell-shaped discriminative surface achieves 80% classification accuracy.
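A three-feature Naïve Bayes classifier of this kind can be sketched as follows. The slides do not state the class-conditional distributions, so a Gaussian density per feature and class is assumed here; each feature vector holds the averaged CGH values for q12, q13, and q12-q13. The toy values, the variance floor, and the function names are hypothetical.

```python
from math import log, pi
from statistics import mean, pvariance

def fit_gaussian_nb(features, labels, var_floor=1e-6):
    """Per class: log prior, per-feature means, per-feature variances."""
    model = {}
    for c in sorted(set(labels)):
        rows = [x for x, y in zip(features, labels) if y == c]
        columns = list(zip(*rows))
        model[c] = (
            log(len(rows) / len(labels)),
            [mean(col) for col in columns],
            # The floor keeps a constant feature from producing a zero variance.
            [max(pvariance(col), var_floor) for col in columns],
        )
    return model

def predict(x, model):
    """Class with the largest log prior plus summed Gaussian log densities."""
    def log_joint(c):
        log_prior, means, variances = model[c]
        return log_prior + sum(
            -0.5 * log(2 * pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_joint)
```

Because each class has its own variances, the decision boundary is quadratic rather than linear, which is one way a curved discriminative surface can arise from only three features.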
![Page 16: Transcriptional Diagnosis by Bayesian Network](https://reader035.fdocuments.in/reader035/viewer/2022062801/568143a3550346895db025b9/html5/thumbnails/16.jpg)
Conclusion
• Reverse-engineer regulatory network information for tissue classification.
• Adopt a systems biology approach to infer a gene dependency network:
– Select genes by Bayes factor.
– Eliminate collinearity via network learning.
– Integrate gene selection and the classification model in a single Bayesian network framework.
• Demonstrate the promising translational value of the systems biology approach in clinical studies.