Statistical Inference Using Graphs for Protein Complex Identification
Denise Scholtens
Robert Gentleman
Marc Vidal
Workshop on Statistical Inference, Computing, and Visualization for Graphs
Stanford University
August 1-2, 2003
Graphic from: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis
High-throughput Protein Complex Identification
• Gavin, et al. (Nature, 2002)
– TAP: Tandem Affinity Purification
• Ho, et al. (Nature, 2002)
– HMS-PCI: High-throughput Mass Spectrometric Protein Complex Identification
Protein Complex Identification Using TAP Data
Spoke Model
Matrix Model
Bader, et al. (Nature Biotechnology, 2002)
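The two models named on this slide differ in which protein pairs they treat as interacting. As a minimal sketch, assuming the usual reading (spoke: the bait is linked to each hit; matrix: every pair of proteins in the purification is linked), the difference can be illustrated in Python. The function names and the example purification are hypothetical and are not taken from the talk.

```python
from itertools import combinations

def spoke_edges(bait, hits):
    """Spoke model: link the bait to each hit; hits are not linked to each other."""
    return {frozenset((bait, h)) for h in hits if h != bait}

def matrix_edges(bait, hits):
    """Matrix model: link every pair of proteins found in the purification."""
    members = sorted({bait, *hits})
    return {frozenset(pair) for pair in combinations(members, 2)}

# Hypothetical purification: one bait pulls down three hits.
bait, hits = "Bait", ["Hit1", "Hit2", "Hit3"]
spoke = spoke_edges(bait, hits)
matrix = matrix_edges(bait, hits)

# The spoke edges are always a subset of the matrix edges.
print(len(spoke), len(matrix), spoke <= matrix)
```

With one bait and k hits, the spoke model gives k edges and the matrix model gives k(k+1)/2, so the two readings of the same purification diverge quickly as k grows.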
Protein-Complex Affiliation Network Incidence Matrix
A =
      C1 C2 C3 C4 C5 … Cm
P1    1110000…0
P2    0000111…0
P3    1001000…0
P4    1101000…0
P5    1111000…0
P6    …
P7    …
…
Pn    …
Cohesive vs. Dynamic Protein Complexes
Cohesive Complex: a complex of invariable composition whose proteins are associated only with that complex and its particular function
Cohesive Complex Affiliation Network Incidence Matrix
A =
       C1
Bait   1
Hit 1  1
Hit 2  1
Hit 3  1
Hit 4  1
Hit 5  1
Cohesive vs. Dynamic Protein Complexes
Dynamic Complex: complex composed of proteins that may also be involved in other complexes
Dynamic Complex Affiliation Network Incidence Matrices
A =
       C1 C2 C3 C4 C5
Bait   1  1  1  1  1
Hit 1  1  0  0  0  0
Hit 2  0  1  0  0  0
Hit 3  0  0  1  0  0
Hit 4  0  0  0  1  0
Hit 5  0  0  0  0  1
A =
       C1 C2
Bait   1  1
Hit 1  1  0
Hit 2  0  1
Hit 3  1  0
Hit 4  0  1
Hit 5  1  0
A =
       C1 C2
Bait   1  1
Hit 1  1  1
Hit 2  1  1
Hit 3  0  1
Hit 4  0  1
Hit 5  0  1
All 5 “complexes” above would yield the same TAP Data:
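One way to check this claim is to compute, for each incidence matrix shown above, which hits share at least one complex with the bait. The Python sketch below assumes that a TAP experiment records only whether the bait pulled down a hit, not how many complexes the two share. The helper name `bait_hits` and the variable names are illustrative.

```python
def bait_hits(A, labels):
    """Return the labels of proteins that share at least one complex with the bait (row 0)."""
    bait_row = A[0]
    found = []
    for label, row in zip(labels[1:], A[1:]):
        # This sum is the (bait, hit) off-diagonal entry of X = AA'.
        shared = sum(b * r for b, r in zip(bait_row, row))
        if shared > 0:
            found.append(label)
    return found

labels = ["Bait", "Hit 1", "Hit 2", "Hit 3", "Hit 4", "Hit 5"]

# Incidence matrices transcribed from the slides (rows: proteins, columns: complexes).
cohesive = [[1], [1], [1], [1], [1], [1]]
dynamic_a = [[1, 1, 1, 1, 1], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0],
             [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
dynamic_b = [[1, 1], [1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]
dynamic_c = [[1, 1], [1, 1], [1, 1], [0, 1], [0, 1], [0, 1]]

results = [bait_hits(A, labels) for A in (cohesive, dynamic_a, dynamic_b, dynamic_c)]
print(all(r == labels[1:] for r in results))
```

Under that assumption every matrix produces the same observation: the bait finds Hit 1 through Hit 5.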
Statistical Inference Problem
• What is A?
• A captures the cohesive/dynamic distinction.
• At best, we observe all but the main diagonal of X=AA’.
• Current analyses focus on X, not on A.
Protein Complex Data as a Directed Graph
Cohesive Complex described in Gavin, et al.
Dynamic Complex described in Gavin, et al.
Largest Connected Component in Gavin, et al. using Bait Proteins Only, Colored by Outdegree
[Figure panels: Gavin Data; Ho Data]
SubGraph of Bait Proteins from Previous Graphs with Outdegree 7
[Figure panels: Gavin Data; Ho Data]
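The bait-only graphs and the outdegree summaries on these slides follow from treating each purification as directed edges from the bait to its hits. The Python sketch below uses a small made-up set of purifications (not the Gavin or Ho data) to show how the directed graph, its bait-only subgraph, and the outdegrees might be computed. All protein names here are hypothetical.

```python
# Hypothetical TAP results: each bait maps to the hits it pulled down.
purifications = {
    "B1": ["B2", "H1", "H2"],
    "B2": ["B1", "B3"],
    "B3": ["H3"],
}

# Directed edges point from bait to hit.
edges = {(bait, hit) for bait, hits in purifications.items() for hit in hits}

# Bait-only subgraph: keep edges whose target was also used as a bait.
baits = set(purifications)
bait_edges = {(u, v) for u, v in edges if v in baits}

def outdegree(edge_set, nodes):
    """Count outgoing edges for each node in `nodes`."""
    counts = {n: 0 for n in nodes}
    for source, _ in edge_set:
        counts[source] += 1
    return counts

full_out = outdegree(edges, baits)
bait_out = outdegree(bait_edges, baits)
print(sorted(full_out.items()), sorted(bait_out.items()))
```

Keeping the direction matters because an edge from B1 to B2 and an edge from B2 to B1 are separate observations. An undirected graph would merge them and hide whether a bait-hit relationship was confirmed in both directions.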
Examples of Distinct Complexes Identified by Gavin, et al.
Back to Affiliation Networks
A =
     C1
B1   1
B2   1
B3   1
X=AA’ =
     B1 B2 B3
B1   1  1  1
B2   1  1  1
B3   1  1  1
One Three-Way Conversation
Affiliation Networks
A =
     C1 C2 C3
B1   1  1  0
B2   1  0  1
B3   0  1  1
X=AA’ =
     B1 B2 B3
B1   2  1  1
B2   1  2  1
B3   1  1  2
Three Two-Way Conversations
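The two affiliation networks above can be compared directly. The Python sketch below computes X = AA' for each and then masks the main diagonal, which the earlier slide says is the part that is not observed. The helper names `gram` and `off_diagonal` are illustrative.

```python
def gram(A):
    """Compute X = AA' for an incidence matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]

def off_diagonal(X):
    """Return a copy of X with the main diagonal replaced by None (unobserved)."""
    return [[None if i == j else x for j, x in enumerate(row)]
            for i, row in enumerate(X)]

A_one = [[1], [1], [1]]                      # one three-way conversation
A_three = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]  # three two-way conversations

X_one = gram(A_one)
X_three = gram(A_three)

# The full matrices differ on the diagonal, but the observable parts agree.
print(X_one == X_three, off_diagonal(X_one) == off_diagonal(X_three))
```

The diagonal of X counts how many complexes each protein belongs to (1 versus 2 here). Once it is hidden, the two networks cannot be told apart from the pairwise counts alone.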
Statistical Inference Problem
• Which A is correct?
– A uniquely defines X, but the observable part of X (everything off the main diagonal) does not uniquely define A.
• Extra information and directed graph model for the TAP data
– Cellular Component Data
– Gene Expression Data
– Hit Data
Possible Use of Hit Data to Help Estimate A
Conclusions
• In the protein complex setting, directed graphs are useful for EDA, as well as framing the correct questions for statistical inference.
• The statistical inference problem for cohesive and dynamic protein complex identification should focus on A, not X.
• The digraph model of the TAP data better reflects what we actually observe, and is informative for estimating A.