Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Comparison of Networks Across Species
CS374 Presentation, October 26, 2006
Chuan Sheng Foo
In the beginning there was DNA…
Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides, NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334
…then came protein interactions
[Figures: Arabidopsis PPI network, E. coli PPI network, yeast PPI network]
Comparative Genomics to Comparative Interactomics
Evolutionary conservation implies functional relevance
Sequence conservation implies functional conservation
Network conservation implies functional conservation too!
What new insights might we gain from network comparisons? (Why should we care?)
Network comparisons allow us to:
Identify conserved functional modules
Query for a module, à la BLAST
Predict functions of a module
Predict protein functions
Validate protein interactions
Predict protein interactions
Only possible with network comparisons
Possible with existing techniques, but improved with network comparisons
What is a Protein Interaction Network?
Proteins are nodes
Interactions are edges
Edges may have weights
Yeast PPI network
H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
The Network Alignment Problem
Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks
Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)
Example Network Alignment
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
General Framework For Network Alignment Algorithms
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
Network construction (covered in lecture on network integration)
Scoring function
Alignment algorithm
Two Algorithms Discussed Today
NetworkBLAST: Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.
Græmlin: Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006.
Overview of NetworkBLAST
Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.
Estimation of Interaction Probabilities
In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features:
1. Number of times an interaction was observed
2. Pearson correlation coefficient between expression profiles
3. Proteins’ small world clustering coefficient
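As a minimal sketch of how such a reliability score could be computed, the function below applies a logistic model to the three features. The coefficient values and the names `WEIGHTS` and `edge_reliability` are hypothetical and chosen for illustration only; in practice the coefficients would be fit on training interactions.

```python
import math

# Hypothetical coefficients for illustration only; real values would be
# learned from positive and negative training interactions.
WEIGHTS = {
    "bias": -2.0,
    "observations": 0.8,      # number of times the interaction was observed
    "expression_corr": 1.5,   # Pearson correlation of expression profiles
    "clustering": 1.0,        # small world clustering coefficient
}


def edge_reliability(observations, expression_corr, clustering, w=WEIGHTS):
    """Return a probability-like reliability score in (0, 1) for one edge."""
    z = (
        w["bias"]
        + w["observations"] * observations
        + w["expression_corr"] * expression_corr
        + w["clustering"] * clustering
    )
    # Logistic (sigmoid) link maps the linear score to a probability.
    return 1.0 / (1.0 + math.exp(-z))
```

With positive coefficients, an interaction seen more often, with better-correlated expression profiles, receives a higher score.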
Network Alignment Graphs
Construct a Network Alignment Graph to represent the alignment
Nodes contain groups of sequence similar proteins from the k organisms
Edges represent conserved interactions. An edge between two nodes is present if any of the following holds:
1. One pair of proteins directly interacts, and the rest are at distance at most 2
2. All protein pairs are at distance exactly 2
3. At least max(2, k - 1) protein pairs directly interact
Tries to account for interaction deletions
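The edge rule can be sketched as a small predicate. This is an illustrative reading of the three conditions, not code from the paper: the function name `conserved_edge` is hypothetical, and the input is assumed to be the shortest-path distance between the two proteins in each of the k species (1 means a direct interaction).

```python
def conserved_edge(distances):
    """Decide whether two alignment-graph nodes are joined by an edge.

    distances[i] is the shortest-path distance between the two proteins
    in species i; 1 means they interact directly. Use float("inf") for
    proteins that are not connected at all.
    """
    k = len(distances)
    direct = sum(1 for d in distances if d == 1)

    # 1. One pair interacts directly, the rest are at distance at most 2.
    rule1 = direct >= 1 and all(d <= 2 for d in distances)
    # 2. All pairs are at distance exactly 2.
    rule2 = all(d == 2 for d in distances)
    # 3. At least max(2, k - 1) pairs interact directly.
    rule3 = direct >= max(2, k - 1)

    return rule1 or rule2 or rule3
```

Allowing distance 2 is what lets the graph tolerate an interaction that is missing in one species.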
Example Network Alignment Graph
[Figure: individual species’ PPI networks (Species X, Y, Z; nodes a, b, c, a’, b’, c’, a’’, b’’, c’’) and the corresponding network alignment graph]
Scoring Function
Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure
High scoring subgraphs correspond to structured sub-networks (cliques or pathways)
Only network topology is scored, node similarity is not
Log Likelihood Ratio Model
Compares the likelihood that a subgraph occurs if it is a conserved network with its likelihood if it were a randomly constructed network
The randomly constructed network preserves the degree distribution of the nodes
$$\log \frac{\Pr(\text{Subgraph occurs} \mid \text{Conserved Network})}{\Pr(\text{Subgraph occurs} \mid \text{Random Network})}$$
Likelihood Ratio Scoring of a Protein Complex in a Single Species
U: a subset of vertices (proteins) in the PPI graph
O_U: collection of all observations on vertex pairs in U
O_uv: interaction between proteins u, v observed
M_s: conserved network model
M_n: random network (null) model
T_uv: proteins u, v interact
F_uv: proteins u, v do not interact
β: probability that proteins u, v interact in conserved model
p_uv: probability that edge u, v exists in a random model
Probability of complex being observed in a conserved network model
Probability of subgraph being observed in a random network model
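The two probabilities can be written out from the symbols defined on this slide. The form below is a reconstruction, assuming each vertex pair in U contributes an independent factor and that an observation is explained either by a true interaction (T_uv) or by its absence (F_uv):

$$\Pr(O_U \mid M_s) = \prod_{(u,v) \in U \times U} \left[ \beta \Pr(O_{uv} \mid T_{uv}) + (1 - \beta) \Pr(O_{uv} \mid F_{uv}) \right]$$

$$\Pr(O_U \mid M_n) = \prod_{(u,v) \in U \times U} \left[ p_{uv} \Pr(O_{uv} \mid T_{uv}) + (1 - p_{uv}) \Pr(O_{uv} \mid F_{uv}) \right]$$

The only difference between the two models is the prior probability of an interaction: a constant β under the conserved model and the degree-dependent p_uv under the random model.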
Likelihood Ratio Scoring of a Protein Complex in a Single Species
Hence, log likelihood for a complex occurring in a single species is given by
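Assuming each vertex pair contributes an independent factor under both models, the log likelihood ratio for a vertex set U would take the following form. This is a reconstruction from the symbols defined for this scoring scheme, not a verbatim copy of the slide:

$$L(U) = \log \frac{\Pr(O_U \mid M_s)}{\Pr(O_U \mid M_n)} = \sum_{(u,v) \in U \times U} \log \frac{\beta \Pr(O_{uv} \mid T_{uv}) + (1 - \beta) \Pr(O_{uv} \mid F_{uv})}{p_{uv} \Pr(O_{uv} \mid T_{uv}) + (1 - p_{uv}) \Pr(O_{uv} \mid F_{uv})}$$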
For multiple complexes across different species, it is the sum of the log likelihoods
L(A, B, C) = L(A) + L(B) + L(C)
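A small sketch of this scoring, under the assumption that each vertex pair contributes an independent term of the form log[(β·Pr(O|T) + (1−β)·Pr(O|F)) / (p_uv·Pr(O|T) + (1−p_uv)·Pr(O|F))]. The function names, the default β, and the input layout are hypothetical.

```python
import math


def complex_log_likelihood(pairs, beta):
    """Log likelihood ratio of one complex in a single species.

    pairs is an iterable of (pr_obs_given_true, pr_obs_given_false, p_uv)
    tuples, one per vertex pair in the complex (assumed layout).
    """
    total = 0.0
    for pr_true, pr_false, p_uv in pairs:
        conserved = beta * pr_true + (1.0 - beta) * pr_false
        random_model = p_uv * pr_true + (1.0 - p_uv) * pr_false
        total += math.log(conserved / random_model)
    return total


def alignment_score(complexes, beta=0.9):
    """Sum the per-species scores, as in L(A, B, C) = L(A) + L(B) + L(C)."""
    return sum(complex_log_likelihood(pairs, beta) for pairs in complexes)
```

When β equals p_uv for every pair, the two models coincide and the score is zero; the score grows as well-supported interactions become more surprising under the random model.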
Example of Complex Scoring
[Figure: conserved complex A in the network alignment graph, with the corresponding complexes X1 in Species X, Y1 in Species Y, and Z1 in Species Z in the individual species’ PPI networks]
L(A) = L(X1) + L(Y1) + L(Z1)
Alignment algorithm
Problem of identifying conserved sub-networks reduces to finding high scoring subgraphs
NP-complete problem
Heuristic solution: greedy extension of high scoring seeds (Does this sound familiar? BLAST?)
Common to both papers discussed
Alignment algorithm
1. Find seeds for each node v in the alignment graph
a. Find high scoring paths of 4 nodes by exhaustive search
b. Greedily add 3 other nodes one by one, that maximally increase the score of the seed
Alignment algorithm
2. Iteratively add or remove nodes to increase the overall score of the subgraph
Original seeds are preserved
Limit size of discovered subgraphs to 15 nodes
Record up to 4 highest scoring subgraphs discovered around each node
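The refinement step can be sketched as a greedy local search. This is a simplified illustration, not the published implementation: `refine`, the adjacency-dict graph format, and the toy scoring function are all assumptions, and only one refined subgraph is returned rather than the top 4.

```python
def refine(seed, graph, score, max_size=15):
    """Greedily add or remove single nodes while the score improves.

    seed nodes are never removed; the subgraph never exceeds max_size.
    graph maps each node to the set of its neighbors.
    """
    seed = set(seed)
    current = set(seed)
    best = score(current)
    improved = True
    while improved:
        improved = False
        candidates = []
        if len(current) < max_size:
            frontier = {n for v in current for n in graph[v]} - current
            candidates += [current | {n} for n in frontier]
        candidates += [current - {n} for n in current - seed]
        for cand in candidates:
            s = score(cand)
            if s > best:  # keep the single best move of this round
                best, best_set, improved = s, cand, True
        if improved:
            current = best_set
    return current, best


# Toy example: a triangle a-b-c with a pendant node d attached to c.
toy_graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}


def toy_score(nodes):
    """Illustrative score: internal edges minus a per-node penalty."""
    edges = sum(1 for u in nodes for v in toy_graph[u] if v in nodes) // 2
    return edges - 1.5 * len(nodes)


refined_nodes, refined_score = refine({"a", "b"}, toy_graph, toy_score)
```

In the toy run, node c is added because it closes the triangle, while the pendant node d is left out because it adds less than it costs.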
Alignment algorithm
3. Filter subgraphs with a high degree of overlap
Iteratively find high scoring subgraph and remove all highly overlapping ones remaining
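The filtering step can be sketched as follows. The overlap measure (shared nodes divided by the size of the smaller subgraph) and the 0.8 cutoff are assumptions made for this example, as is the function name `filter_overlapping`.

```python
def filter_overlapping(subgraphs, max_overlap=0.8):
    """Keep high scoring subgraphs, dropping ones that overlap a kept one.

    subgraphs is a list of (score, set_of_nodes) with non-empty node sets.
    Visiting them in descending score order is equivalent to repeatedly
    taking the best remaining subgraph and discarding its near-duplicates.
    """
    kept = []
    for score, nodes in sorted(subgraphs, key=lambda item: item[0], reverse=True):
        overlaps = (
            len(nodes & kept_nodes) / min(len(nodes), len(kept_nodes))
            for _, kept_nodes in kept
        )
        if all(o <= max_overlap for o in overlaps):
            kept.append((score, nodes))
    return kept
```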
Results
Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.
Results
Prediction of protein function
• ‘Guilt by association’
• If a conserved cluster or path is significantly enriched in a functional annotation, the remaining proteins in it are predicted to share that annotation
Prediction of protein interactions
Predictions based on 2 strategies:
• Evidence that proteins with similar sequences interact
• Co-occurrence of proteins in the same conserved cluster or path
• Experimental verification of Yeast interactions using Y2H yielded 40-62% success rate
Overview of Græmlin
Fast, scalable network alignment
Scales linearly in the number of networks compared (NetworkBLAST scales exponentially)
Supports efficient querying of modules
Speed-sensitivity control via a user-defined parameter (not supported in NetworkBLAST)
Input to the Algorithm
Weighted protein interaction graphs: weights represent the probability that proteins interact; constructed via a network integration algorithm covered in a later lecture
A phylogenetic tree relating the species in the desired alignment, used for progressive alignment
Definition of an alignment
A set of subgraphs chosen from the interaction networks of different species, together with a mapping between aligned proteins
Aligned proteins form equivalence classes
Each class is derived from a common ancestral protein
A class can contain multiple proteins from the same species
[Figure: equivalence class (a, a’, a’’, b’’) showing paralogs]
Scoring Function
Log likelihood ratio model based on:
Alignment model M: modules are subject to evolutionary constraint
Random model R: modules are not subject to any constraints
Scores equivalence classes and alignment edges separately
Log Likelihood Ratio Model (Recap)
Compares the likelihood that a module occurs if it is subject to evolutionary constraint with its likelihood if it were a randomly constructed network
The randomly constructed network preserves the degree distribution of the nodes
$$\log \frac{\Pr(\text{Module occurs} \mid \text{Alignment Model } M)}{\Pr(\text{Module occurs} \mid \text{Random Model } R)}$$
Scoring Equivalence Classes
Reconstruct most parsimonious ancestral history of an equivalence class using Dynamic Programming based on five types of evolutionary events
Alignment model and random model give probabilities for each of these events, combined to give a log likelihood score
Scoring Alignment Edges
Alignment scores should reflect both network conservation and high connectivity – difficult to strike a balance
Introduction of a novel scoring approach: the Edge Scoring Matrix (ESM)
Indexed by labels
Algorithm assigns a label to each equivalence class, then scores according to the distribution function in the cells referenced by the labels
Scoring: ESM
Alignment Algorithm: d-Clusters for Seed Generation
A d-cluster consists of d proteins close together in a network
“Close” means edge weights are high, so interaction is highly likely
Intuition is that high scoring alignments will have high scoring d-clusters
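One way to make "close" concrete is to treat −log(weight) as an edge length, so that high-probability interactions are short, and to collect the d nearest nodes with Dijkstra's algorithm. This is a sketch under that assumption; the function name `d_cluster` and the nested-dict graph format are hypothetical.

```python
import heapq
import math


def d_cluster(graph, start, d):
    """Return up to d nodes closest to start, including start itself.

    graph maps node -> {neighbor: weight}, with weights in (0, 1]
    interpreted as interaction probabilities. Edge length is -log(weight).
    """
    dist = {start: 0.0}
    heap = [(0.0, start)]
    cluster = []
    while heap and len(cluster) < d:
        cost, node = heapq.heappop(heap)
        # Skip nodes already taken and stale heap entries.
        if node in cluster or cost > dist.get(node, math.inf):
            continue
        cluster.append(node)
        for neighbor, weight in graph[node].items():
            new_cost = cost - math.log(weight)
            if new_cost < dist.get(neighbor, math.inf):
                dist[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return cluster
```

For example, in a graph where a-b has weight 0.9, b-d has weight 0.8, and a-c has weight 0.1, the 3-cluster around a picks b and d over the weakly connected neighbor c.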
Alignment Algorithm: d-Clusters for Seed Generation
Identify pairs of d-clusters that score higher than a threshold T
Score is defined by greedily matching nodes from each d-cluster to obtain a high score
Uses these pairs as seeds
Allows for speed-sensitivity tradeoff
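The seed-selection step can be sketched as below. The node-pair scoring function is left as a parameter because its definition is not given here; `d_cluster_pair_score`, `find_seeds`, and the one-to-one matching are assumptions made for the example.

```python
def d_cluster_pair_score(cluster_a, cluster_b, node_score):
    """Greedily match nodes across two d-clusters and sum the pair scores."""
    pairs = sorted(
        ((node_score(a, b), a, b) for a in cluster_a for b in cluster_b),
        key=lambda item: item[0],
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for s, a, b in pairs:
        if s <= 0:
            break  # remaining pairs cannot raise the score
        if a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += s
    return total


def find_seeds(clusters_a, clusters_b, node_score, threshold):
    """Return all d-cluster pairs whose greedy score exceeds threshold T."""
    return [
        (ca, cb)
        for ca in clusters_a
        for cb in clusters_b
        if d_cluster_pair_score(ca, cb, node_score) > threshold
    ]
```

Raising the threshold yields fewer seeds and a faster search; lowering it yields more seeds and higher sensitivity, which is the tradeoff the user-defined parameter controls.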
Alignment Algorithm: Generating An Initial Alignment From The Seed
Determine highest scoring pair of nodes (one from each d-cluster) when aligned
Align these nodes and place them, as well as their neighbors, into a frontier
[Figure: example with scores 3.0, 1.5, and 5.0]
Alignment Algorithm: Greedy Seed Extension Phase
Examine all pairs of nodes in the frontier for the pair that maximally increases the score when added to the alignment
Stop when no pair can further increase the score
Remove equivalence classes if doing so further increases the score
[Figure: frontier and current alignment]
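A simplified sketch of the extension loop for two networks follows. It assumes one-to-one node pairs and omits the removal step and paralogs; `extend_alignment`, the adjacency-dict inputs, and the `gain` callback are hypothetical names for this illustration.

```python
def extend_alignment(seed_pair, net_a, net_b, gain):
    """Greedily grow a pairwise alignment from one aligned node pair.

    net_a and net_b map each node to an iterable of its neighbors.
    gain(alignment, pair) returns the score change from adding pair.
    """
    alignment = [seed_pair]
    # The frontier holds the neighbors of everything aligned so far.
    frontier_a = set(net_a[seed_pair[0]])
    frontier_b = set(net_b[seed_pair[1]])
    while True:
        used_a = {a for a, _ in alignment}
        used_b = {b for _, b in alignment}
        candidates = [
            (gain(alignment, (a, b)), a, b)
            for a in frontier_a - used_a
            for b in frontier_b - used_b
        ]
        if not candidates:
            break
        best_gain, a, b = max(candidates, key=lambda c: c[0])
        if best_gain <= 0:
            break  # no pair can further increase the score
        alignment.append((a, b))
        frontier_a |= set(net_a[a])
        frontier_b |= set(net_b[b])
    return alignment
```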
Alignment Algorithm: Multiple Alignment
Progressive alignment technique using the phylogenetic tree
Successively aligns the closest pair of networks
Places each aligned network at the parent node of the two aligned species
Linear scaling in number of species
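The progressive strategy can be sketched as a post-order walk over the phylogenetic tree. The nested-tuple tree encoding and the names `progressive_align` and `align_pair` are assumptions for illustration; `align_pair` stands in for whatever pairwise aligner is used.

```python
def progressive_align(tree, networks, align_pair):
    """Align networks bottom-up along a binary phylogenetic tree.

    tree is either a species name (leaf) or a (left, right) tuple.
    networks maps species names to their networks.
    align_pair(x, y) returns the aligned network placed at the parent.
    """
    if isinstance(tree, str):
        return networks[tree]
    left, right = tree
    left_result = progressive_align(left, networks, align_pair)
    right_result = progressive_align(right, networks, align_pair)
    return align_pair(left_result, right_result)
```

A binary tree over k species has k − 1 internal nodes, so the walk makes k − 1 pairwise alignments, which is where the linear scaling in the number of species comes from.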
Performance Comparison: Speed-sensitivity / Linear Scaling
Performance Comparison: Multiple Alignment
Performance Comparison: Module Querying
Results
Functional module identification using network alignment
Functional module for transformation?
Results
Functional annotation using network alignment
Pairwise alignment
Multiple alignment of 9 networks
Conserved DNA replication module
Results
Multiple alignment of 10 networks showing possible cell division module
Functional annotation using network alignment
The Future of Network Comparison
Græmlin
Græmlin?
Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006
That’s all folks!
Thank you!
Questions?
Performance Comparison: Sensitivity
Scoring Sequence Mutations
Weighted sum of pairs scoring