Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.

In the beginning there was DNA…

Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Research 34, D332-D334, 2006

…then came protein interactions

Figure: Arabidopsis, E. coli and yeast PPI networks

Comparative Genomics to Comparative Interactomics

• Evolutionary conservation implies functional relevance

• Sequence conservation implies functional conservation

• Network conservation implies functional conservation too!

What new insights might we gain from network comparisons? (Why should we care?)

Network comparisons allow us to:

• Identify conserved functional modules

• Query for a module, à la BLAST

• Predict functions of a module

• Predict protein functions

• Validate protein interactions

• Predict protein interactions

The first three are only possible with network comparisons; the last three are possible with existing techniques, but improved with network comparisons

What is a Protein Interaction Network?

• Proteins are nodes

• Interactions are edges

• Edges may have weights

Figure: Yeast PPI network

H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)

The Network Alignment Problem

Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks

Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)

Example Network Alignment

Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

General Framework For Network Alignment Algorithms

Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

• Network construction (covered in lecture on network integration)

• Scoring function

• Alignment algorithm

Two Algorithms Discussed Today

NetworkBLAST: Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.

Græmlin: Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006.

Overview of NetworkBLAST

Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.

Estimation of Interaction Probabilities

In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features:

1. Number of times an interaction was observed

2. Pearson correlation coefficient between expression profiles

3. Proteins’ small world clustering coefficient
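The slide does not give the fitted model. As a minimal sketch, assuming a plain logistic model over the three features with made-up coefficients (the `COEFFS` values and the `interaction_reliability` name are hypothetical, not taken from Sharan et al.), the reliability score could be computed as:

```python
import math

# Hypothetical coefficients for illustration only; a real model is fitted
# to interactions labeled as true or false.
COEFFS = {
    "intercept": -2.0,
    "times_observed": 1.0,
    "expression_corr": 1.5,
    "clustering_coeff": 2.0,
}


def interaction_reliability(times_observed, expression_corr, clustering_coeff, coeffs=COEFFS):
    """Logistic regression: map the three edge features to a probability in (0, 1)."""
    z = (
        coeffs["intercept"]
        + coeffs["times_observed"] * times_observed
        + coeffs["expression_corr"] * expression_corr
        + coeffs["clustering_coeff"] * clustering_coeff
    )
    return 1.0 / (1.0 + math.exp(-z))


# An interaction observed twice with no expression or clustering support gives z = 0.
print(interaction_reliability(2, 0.0, 0.0))  # 0.5
```

The sigmoid keeps every edge weight strictly between 0 and 1, so it can later be read as an interaction probability.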

Network Alignment Graphs

Construct a Network Alignment Graph to represent the alignment

Nodes contain groups of sequence-similar proteins from the k organisms

Edges represent conserved interactions. An edge between two nodes is present if any of the following holds:

1. One pair of proteins directly interacts, and the remaining pairs are at distance at most 2

2. All protein pairs are at distance exactly 2

3. At least max(2, k - 1) protein pairs directly interact

These relaxed conditions try to account for interaction deletions
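The three edge rules can be sketched as a check on shortest-path distances between corresponding proteins. This sketch assumes one protein per species in each alignment-graph node, unweighted species networks, and reads rule 1 as "at least one" direct interaction; the helper names (`build_network`, `distance`, `has_alignment_edge`) are illustrative, not NetworkBLAST's code.

```python
from collections import deque


def build_network(edges):
    """Undirected adjacency sets from a list of (protein, protein) interactions."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distance(adj, src, dst, cap=3):
    """BFS distance from src to dst; returns cap if unreachable or cap or more steps apart."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if d + 1 >= cap:
            continue  # anything further away is reported as cap
        for nbr in adj.get(node, ()):
            if nbr == dst:
                return d + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, d + 1))
    return cap


def has_alignment_edge(networks, node_a, node_b):
    """node_a, node_b: tuples with one protein per species, in the order of networks."""
    k = len(networks)
    dists = [distance(net, a, b) for net, a, b in zip(networks, node_a, node_b)]
    direct = sum(1 for d in dists if d == 1)
    rule1 = direct >= 1 and all(d <= 2 for d in dists)  # direct interaction, rest within 2
    rule2 = all(d == 2 for d in dists)                   # every pair exactly 2 apart
    rule3 = direct >= max(2, k - 1)                      # enough direct interactions
    return rule1 or rule2 or rule3


X = build_network([("a", "b"), ("b", "c")])
Y = build_network([("a'", "c'"), ("c'", "b'")])
Z = build_network([("a''", "b''")])
edge_ab = has_alignment_edge([X, Y, Z], ("a", "a'", "a''"), ("b", "b'", "b''"))  # distances 1, 2, 1
edge_ac = has_alignment_edge([X, Y, Z], ("a", "a'", "a''"), ("c", "c'", "c''"))  # distances 2, 1, unreachable
```

Capping the BFS at distance 3 is enough, because no rule distinguishes pairs that are three or more steps apart.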

Example Network Alignment Graph

Figure: Individual species' PPI networks for species X (proteins a, b, c), Y (a’, b’, c’) and Z (a’’, b’’, c’’), and the resulting network alignment graph

Scoring Function

Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure

High-scoring subgraphs correspond to structured sub-networks (cliques or pathways)

Only network topology is scored; node (sequence) similarity is not

Log Likelihood Ratio Model

Compares the probability that a subgraph occurs if it is a conserved network with the probability that it occurs if it were a randomly constructed network

Randomly constructed network preserves degree distribution for nodes

$$\log \frac{\Pr(\text{Subgraph occurs} \mid \text{Conserved Network})}{\Pr(\text{Subgraph occurs} \mid \text{Random Network})}$$

Likelihood Ratio Scoring of a Protein Complex in a Single Species

• U: a subset of vertices (proteins) in the PPI graph
• O_U: collection of all observations on vertex pairs in U
• O_uv: interaction between proteins u, v observed
• M_s: conserved network model
• M_n: random network (null) model
• T_uv: proteins u, v interact
• F_uv: proteins u, v do not interact
• β: probability that proteins u, v interact in the conserved model
• p_uv: probability that edge (u, v) exists in the random model

Probability of complex being observed in a conserved network model

Probability of subgraph being observed in a random network model
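The slide's two formulas did not survive extraction. Assuming observations on different vertex pairs are independent, the two probabilities can be written with the symbols defined above as follows (a reconstruction under that assumption, not a verbatim copy of the slide):

```latex
\Pr(O_U \mid M_s) = \prod_{\{u,v\} \subseteq U} \Big[ \beta \Pr(O_{uv} \mid T_{uv}) + (1-\beta) \Pr(O_{uv} \mid F_{uv}) \Big]

\Pr(O_U \mid M_n) = \prod_{\{u,v\} \subseteq U} \Big[ p_{uv} \Pr(O_{uv} \mid T_{uv}) + (1-p_{uv}) \Pr(O_{uv} \mid F_{uv}) \Big]
```

Each factor mixes "the pair truly interacts" and "the pair does not interact", weighted by β in the conserved model and by the degree-preserving p_uv in the null model.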

Likelihood Ratio Scoring of a Protein Complex in a Single Species

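Hence, the log likelihood ratio for a complex U occurring in a single species is

$$L(U) = \log \frac{\Pr(O_U \mid M_s)}{\Pr(O_U \mid M_n)}$$

For complexes A, B, C in different species that form one conserved complex, the score is the sum of the individual log likelihoods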

L(A, B, C) = L(A) + L(B) + L(C)
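If observations on different protein pairs are treated as independent, L(U) becomes a sum of per-pair log ratios. A minimal Python sketch, assuming Pr(O_uv | T_uv), Pr(O_uv | F_uv) and p_uv have already been estimated (the dictionaries, the value of β and the function names are hypothetical):

```python
import math
from itertools import combinations


def pair_log_ratio(pr_obs_if_true, pr_obs_if_false, beta, p_uv):
    """log of [beta*Pr(O|T) + (1-beta)*Pr(O|F)] / [p_uv*Pr(O|T) + (1-p_uv)*Pr(O|F)]."""
    conserved = beta * pr_obs_if_true + (1 - beta) * pr_obs_if_false
    null = p_uv * pr_obs_if_true + (1 - p_uv) * pr_obs_if_false
    return math.log(conserved / null)


def complex_score(proteins, observations, null_probs, beta):
    """L(U): sum of per-pair log ratios over all unordered protein pairs in U.

    observations[(u, v)] = (Pr(O_uv | T_uv), Pr(O_uv | F_uv)), keyed with u < v.
    null_probs[(u, v)] = p_uv from the degree-preserving random model.
    """
    return sum(
        pair_log_ratio(*observations[(u, v)], beta, null_probs[(u, v)])
        for u, v in combinations(sorted(proteins), 2)
    )


def conserved_complex_score(per_species_scores):
    """L(A) = L(X1) + L(Y1) + L(Z1): add the single-species scores."""
    return sum(per_species_scores)


obs = {("a", "b"): (0.8, 0.1)}
null = {("a", "b"): 0.1}
score_x1 = complex_score({"a", "b"}, obs, null, beta=0.9)  # beta = 0.9 is illustrative
total = conserved_complex_score([score_x1, 0.4, 0.6])
```

A strongly supported pair (0.8 versus 0.1) that is unlikely under the null model (p_uv = 0.1) contributes a positive term, pushing the complex toward "conserved".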

Example of Complex Scoring

Figure: Conserved complex A in the network alignment graph, formed by complex X1 in species X, complex Y1 in species Y and complex Z1 in species Z (individual species' PPI networks with proteins a, b, c; a’, b’, c’; a’’, b’’, c’’)

L(A) = L(X1) + L(Y1) + L(Z1)

Alignment algorithm

The problem of identifying conserved sub-networks reduces to finding high-scoring subgraphs

• NP-complete problem

• Heuristic solution: greedy extension of high-scoring seeds (Does this sound familiar? BLAST?)

• Common to both papers discussed

Alignment algorithm

1. Find seeds for each node v in the alignment graph

a. Find high-scoring paths of 4 nodes by exhaustive search

b. Greedily add 3 more nodes, one at a time, each chosen to maximally increase the score of the seed
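Step 1 can be sketched as follows, assuming paths start at v, candidate nodes must neighbor the current seed, and a simple additive score (sum of alignment-graph edge weights inside the node set) stands in for the likelihood score. All names and the scoring function are hypothetical.

```python
from itertools import combinations


def subgraph_score(nodes, weights):
    """Hypothetical additive score: total weight of alignment-graph edges inside nodes."""
    return sum(weights.get(frozenset(pair), 0.0) for pair in combinations(nodes, 2))


def paths_of_four(adj, v):
    """Exhaustively enumerate simple 4-node paths that start at v."""
    paths = [[v]]
    for _ in range(3):
        paths = [p + [n] for p in paths for n in adj.get(p[-1], ()) if n not in p]
    return paths


def find_seed(adj, weights, v, extra=3):
    """Best 4-node path from v, then greedily add `extra` neighboring nodes."""
    paths = paths_of_four(adj, v)
    if not paths:
        return None
    seed = set(max(paths, key=lambda p: subgraph_score(p, weights)))
    for _ in range(extra):
        candidates = {n for u in seed for n in adj.get(u, ())} - seed
        if not candidates:
            break
        # Add the single node that gives the largest seed score.
        best = max(candidates, key=lambda n: subgraph_score(seed | {n}, weights))
        seed.add(best)
    return seed


# A path graph 1-2-...-8 with unit edge weights.
path_adj = {i: {j for j in (i - 1, i + 1) if 1 <= j <= 8} for i in range(1, 9)}
path_weights = {frozenset((i, i + 1)): 1.0 for i in range(1, 8)}
seed = find_seed(path_adj, path_weights, 1)
```

Exhaustive search is affordable only because the paths are short; the greedy additions keep seed construction cheap.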

Alignment algorithm

2. Iteratively add or remove nodes to increase the overall score of the subgraph

• Original seeds are preserved

• Discovered subgraphs are limited to 15 nodes

• Up to 4 highest-scoring subgraphs discovered around each node are recorded
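Step 2 is a local search. A minimal sketch, assuming a best-improvement strategy over single add/remove moves and an arbitrary caller-supplied `score` function; `refine` is a hypothetical name, and recording the top 4 subgraphs per node is omitted for brevity.

```python
def refine(seed, adj, score, max_size=15):
    """Apply the best single add/remove move while it raises the score.

    Nodes of the original seed are never removed, and the subgraph never
    grows beyond max_size nodes.
    """
    seed = frozenset(seed)
    current = seed
    best = score(current)
    while True:
        moves = [current - {n} for n in current - seed]  # removals (seed kept)
        if len(current) < max_size:
            neighbors = {n for u in current for n in adj.get(u, ())} - current
            moves += [current | {n} for n in neighbors]  # additions
        if not moves:
            return set(current), best
        candidate = max(moves, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best:
            return set(current), best  # local optimum reached
        current, best = candidate, candidate_score


node_weight = {1: 1.0, 2: 1.0, 3: 2.0, 4: -1.0}
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
result = refine({1}, chain, lambda s: sum(node_weight[n] for n in s))
```

Because every accepted move strictly raises the score and the number of subgraphs is finite, the loop always terminates.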

Alignment algorithm

3. Filter subgraphs with a high degree of overlap

Iteratively take the highest-scoring remaining subgraph and remove all remaining subgraphs that overlap it heavily
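The filtering step can be sketched as below. The overlap measure (shared nodes as a fraction of the smaller subgraph) and the 0.8 cutoff are assumptions for illustration; the slide does not define "high degree of overlap".

```python
def filter_overlaps(subgraphs, score, max_overlap=0.8):
    """Keep the best subgraph, drop those overlapping it heavily, and repeat."""
    remaining = sorted(subgraphs, key=score, reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Overlap = shared nodes / size of the smaller subgraph (assumed measure).
        remaining = [
            s for s in remaining
            if len(s & best) / min(len(s), len(best)) <= max_overlap
        ]
    return kept


kept = filter_overlaps([{1, 2, 3, 4, 5}, {1, 2, 3, 4}, {6, 7}], score=len)
```

Here {1, 2, 3, 4} is discarded because it lies entirely inside the higher-scoring {1, 2, 3, 4, 5}.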

Results

Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.

Results

Prediction of protein function

• ‘Guilt by association’

• If a conserved cluster or path is significantly enriched in a functional annotation, that annotation is predicted for its other proteins

Prediction of protein interactions

Predictions based on 2 strategies:

• Evidence that proteins with similar sequences interact

• Co-occurrence of proteins in the same conserved cluster or path

• Experimental verification of yeast interactions using yeast two-hybrid (Y2H) assays yielded a 40-62% success rate
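"Significantly enriched" needs a statistical test. A common choice is a hypergeometric tail probability; the sketch below assumes that test, a single annotation term, a hypothetical cutoff `alpha` and no multiple-testing correction, so it illustrates guilt by association rather than reproducing the paper's exact procedure.

```python
from math import comb


def enrichment_pvalue(total, annotated, cluster_size, hits):
    """P(X >= hits) for X ~ Hypergeometric(total, annotated, cluster_size)."""
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, i) * comb(total - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(total, cluster_size)


def guilt_by_association(cluster, annotated, total, alpha=0.01):
    """If the cluster is enriched for the term, predict it for unannotated members."""
    hits = len(cluster & annotated)
    p = enrichment_pvalue(total, len(annotated), len(cluster), hits)
    return (cluster - annotated) if p < alpha else set()


# 100 proteins, 5 carry the annotation; 3 of the 4 cluster members are annotated.
predicted = guilt_by_association({1, 2, 3, 99}, {1, 2, 3, 4, 5}, total=100)
```

With 3 of 4 cluster members annotated against a 5% background rate, the tail probability is about 2.4e-4, so protein 99 inherits the annotation.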

Overview of Græmlin

• Fast, scalable network alignment: scales linearly in the number of networks compared (NetworkBLAST scales exponentially)

• Supports efficient querying of modules

• Speed-sensitivity control via a user-defined parameter (not supported in NetworkBLAST)

Input to the Algorithm

• Weighted protein interaction graphs: weights represent the probability that proteins interact; constructed via a network integration algorithm covered in a later lecture

• A phylogenetic tree relating the species in the desired alignment, used for progressive alignment

Definition of an alignment

A set of subgraphs chosen from the interaction networks of different species, together with a mapping between aligned proteins

Aligned proteins form equivalence classes

• Each class is derived from a common ancestral protein

• A class can contain multiple proteins from the same species

Figure: Equivalence class containing a, a’, a’’ and b’’, showing paralogs

Scoring Function

Log likelihood ratio model based on:

• Alignment model M: modules are subject to evolutionary constraint

• Random model R: modules are not subject to any constraints

Scores equivalence classes and alignment edges separately

Log Likelihood Ratio Model (Recap)

Compares the probability that a module occurs if it is subject to evolutionary constraint with the probability that it occurs if it were a randomly constructed network

Randomly constructed network preserves degree distribution for nodes

$$\log \frac{\Pr(\text{Module occurs} \mid \text{Alignment Model } M)}{\Pr(\text{Module occurs} \mid \text{Random Model } R)}$$

Scoring Equivalence Classes

Reconstruct the most parsimonious ancestral history of an equivalence class using dynamic programming, based on five types of evolutionary events

The alignment model and the random model each give a probability for every one of these events; these are combined into a log likelihood score
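The slide does not list the five event types, so the sketch below only illustrates the dynamic-programming idea with the classic Sankoff small-parsimony algorithm on a binary tree, tracking just two events (protein insertion and deletion) with hypothetical unit costs. It is a simplified stand-in, not Græmlin's actual recurrence.

```python
import math

STATES = (0, 1)  # 0 = protein absent, 1 = protein present


def sankoff(tree, cost):
    """Small-parsimony DP on a binary tree.

    tree is a leaf state (0 or 1) or a (left, right) tuple of subtrees.
    cost[s][t] is the cost of a parent in state s having a child in state t.
    Returns the minimum cost of the subtree for each possible root state.
    """
    if not isinstance(tree, tuple):
        return [0.0 if s == tree else math.inf for s in STATES]
    children = [sankoff(child, cost) for child in tree]
    return [
        sum(min(child[t] + cost[s][t] for t in STATES) for child in children)
        for s in STATES
    ]


INSERT, DELETE = 1.0, 1.0  # hypothetical event costs
COST = [[0.0, INSERT], [DELETE, 0.0]]

# Four species; the protein is missing only in the last one.
root_costs = sankoff(((1, 1), (1, 0)), COST)
```

The cheapest history places the protein in the ancestor and needs a single deletion, so min(root_costs) is 1.0; backtracking through the same tables recovers the ancestral states.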

Scoring Alignment Edges

Alignment scores should reflect both network conservation and high connectivity, and it is difficult to strike a balance between the two

Introduction of a novel scoring approach, the Edge Scoring Matrix (ESM):

• Indexed by labels

• The algorithm assigns a label to each equivalence class, then scores each pair of classes according to the distribution function in the ESM cell referenced by their two labels
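A minimal sketch of ESM lookup. It assumes made-up labels ("core", "periphery") and replaces each cell's distribution function with a simple presence/absence probability under each model; Græmlin's real cells score weighted edges, so treat `ESM`, `edge_score` and `alignment_edge_score` as hypothetical.

```python
import math
from itertools import combinations

# Hypothetical ESM: each cell, indexed by an unordered pair of class labels,
# holds (Pr(edge present | alignment model M), Pr(edge present | random model R)).
ESM = {
    frozenset({"core"}): (0.9, 0.1),
    frozenset({"core", "periphery"}): (0.5, 0.1),
    frozenset({"periphery"}): (0.1, 0.1),
}


def edge_score(label_a, label_b, edge_present):
    """Log likelihood ratio contributed by one pair of equivalence classes."""
    p_m, p_r = ESM[frozenset({label_a, label_b})]
    if edge_present:
        return math.log(p_m / p_r)
    return math.log((1 - p_m) / (1 - p_r))


def alignment_edge_score(labels, edges):
    """labels: class -> label; edges: set of frozenset({class1, class2}) that interact."""
    return sum(
        edge_score(labels[c1], labels[c2], frozenset({c1, c2}) in edges)
        for c1, c2 in combinations(sorted(labels), 2)
    )


labels = {"A": "core", "B": "core", "C": "periphery"}
edges = {frozenset({"A", "B"})}
total = alignment_edge_score(labels, edges)
```

Labels let one matrix reward dense connections inside a module ("core" with "core") while staying neutral about edges among peripheral classes, which is one way to balance conservation against connectivity.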

Scoring: ESM

Alignment Algorithm: d-Clusters for Seed Generation

A d-cluster consists of d proteins close together in a network

“Close” means edge weights are high, so interaction is highly likely

The intuition is that high-scoring alignments will contain high-scoring d-clusters
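One way to make "close" concrete is to turn each interaction probability p into an edge length of -log p, so a path's length is minus the log of the product of its edge probabilities. The sketch below assumes that distance and builds a d-cluster as a protein plus its d - 1 nearest proteins via Dijkstra's algorithm; `d_cluster` is a hypothetical name.

```python
import heapq
import math


def d_cluster(adj, v, d):
    """adj[u] = {w: interaction probability}. Returns v plus its d - 1 closest proteins."""
    dist = {v: 0.0}
    heap = [(0.0, v)]
    done = set()
    cluster = []
    while heap and len(cluster) < d:
        du, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        cluster.append(u)
        for w, p in adj.get(u, {}).items():
            nd = du - math.log(p)  # high probability -> short edge
            if nd < dist.get(w, math.inf):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return cluster


net = {
    "a": {"b": 0.9, "c": 0.5},
    "b": {"a": 0.9, "d": 0.9},
    "c": {"a": 0.5},
    "d": {"b": 0.9},
}
cluster3 = d_cluster(net, "a", 3)
```

Here d joins the cluster ahead of c: although d is two hops away, 0.9 × 0.9 = 0.81 exceeds the direct 0.5 edge to c.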

Alignment Algorithm: d-Clusters for Seed Generation

Identify pairs of d-clusters that score higher than a threshold T

• The score of a pair is obtained by greedily matching nodes across the two d-clusters

Use these pairs as seeds

• The threshold T allows for a speed-sensitivity tradeoff
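A sketch of the greedy matching score and threshold filter, assuming a caller-supplied node similarity `sim` and counting only positive similarities; the function names and example similarities are hypothetical.

```python
def greedy_match_score(cluster_a, cluster_b, sim):
    """Greedily pair nodes across two d-clusters, best similarity first."""
    candidates = sorted(
        ((sim(a, b), a, b) for a in cluster_a for b in cluster_b), reverse=True
    )
    used_a, used_b, total = set(), set(), 0.0
    for s, a, b in candidates:
        if s > 0 and a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += s
    return total


def seed_pairs(clusters_x, clusters_y, sim, threshold):
    """All pairs of d-clusters (one from each network) scoring above threshold T."""
    return [
        (cx, cy)
        for cx in clusters_x
        for cy in clusters_y
        if greedy_match_score(cx, cy, sim) > threshold
    ]


sims = {("a", "x"): 3.0, ("a", "y"): 2.5, ("b", "y"): 2.0}
sim = lambda p, q: sims.get((p, q), 0.0)
pair_score = greedy_match_score(["a", "b"], ["x", "y"], sim)  # a-x (3.0) + b-y (2.0)
```

Raising T leaves fewer seed pairs to extend, which speeds up alignment at the cost of possibly missing weaker conserved modules.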

Alignment Algorithm: Generating an Initial Alignment from the Seed

Determine the highest-scoring pair of nodes (one from each d-cluster) when aligned

Align these nodes and place them, as well as their neighbors, into a frontier
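A minimal sketch of seeding the alignment, assuming one-to-one node pairs, a caller-supplied similarity `sim`, and one frontier set per network; `initial_alignment` and the example data are hypothetical.

```python
def initial_alignment(cluster_a, cluster_b, sim, adj_a, adj_b):
    """Align the best-scoring node pair and seed each frontier with it and its neighbors."""
    a, b = max(
        ((u, v) for u in cluster_a for v in cluster_b), key=lambda pair: sim(*pair)
    )
    frontier_a = {a} | set(adj_a.get(a, ()))
    frontier_b = {b} | set(adj_b.get(b, ()))
    return [(a, b)], frontier_a, frontier_b


sims = {("a1", "b1"): 5.0, ("a2", "b2"): 3.0}
sim = lambda p, q: sims.get((p, q), -1.0)
adj_a = {"a1": {"a2"}, "a2": {"a1"}}
adj_b = {"b1": {"b2"}, "b2": {"b1"}}
alignment, frontier_a, frontier_b = initial_alignment(["a1", "a2"], ["b1", "b2"], sim, adj_a, adj_b)
```

The frontier limits the later greedy search to nodes adjacent to what is already aligned.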

Alignment Algorithm: Greedy Seed Extension Phase

Examine all pairs of nodes in the frontier for the pair that maximally increases the score when added to the alignment

Stop when no pair can further increase the score

Remove equivalence classes if doing so further increases the score

Figure: Frontier and current alignment
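A sketch of the extension loop under simplifying assumptions: each equivalence class is a single (protein, protein) pair, `score` is any caller-supplied function of the aligned pairs, adding a pair pulls its neighbors into the frontier, and adds and removals compete as one "best move". The names and example data are hypothetical.

```python
def greedy_extend(alignment, frontier_a, frontier_b, adj_a, adj_b, score):
    """Apply the best add-a-pair or remove-a-pair move until none raises the score."""
    alignment = frozenset(alignment)
    frontier_a, frontier_b = set(frontier_a), set(frontier_b)
    current = score(alignment)
    while True:
        used_a = {a for a, _ in alignment}
        used_b = {b for _, b in alignment}
        moves = [
            alignment | {(a, b)}
            for a in frontier_a - used_a
            for b in frontier_b - used_b
        ]
        moves += [alignment - {pair} for pair in alignment]  # drop a class
        best = max(moves, key=score, default=None)
        if best is None or score(best) <= current:
            return set(alignment), current
        for a, b in best - alignment:  # a newly aligned pair grows the frontier
            frontier_a |= adj_a.get(a, set())
            frontier_b |= adj_b.get(b, set())
        alignment, current = best, score(best)


sims = {("a1", "b1"): 5.0, ("a2", "b2"): 3.0, ("a3", "b3"): -1.0}
sim = lambda p, q: sims.get((p, q), -2.0)
score = lambda pairs: sum(sim(a, b) for a, b in pairs)
adj_a = {"a1": {"a2"}, "a2": {"a1", "a3"}, "a3": {"a2"}}
adj_b = {"b1": {"b2"}, "b2": {"b1", "b3"}, "b3": {"b2"}}
final, final_score = greedy_extend([("a1", "b1")], {"a1", "a2"}, {"b1", "b2"}, adj_a, adj_b, score)
```

The pair (a3, b3) enters the frontier but is never aligned, because adding it would lower the score from 8.0 to 7.0.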

Alignment Algorithm: Multiple Alignment

Progressive alignment technique using the phylogenetic tree

• Successively aligns the closest pair of networks

• Places each aligned network at the parent node of the two aligned species

Linear scaling in the number of species
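The progressive scheme can be sketched as a post-order walk over a rooted binary tree, where each internal node receives the alignment of its two children. `align_pair` is a hypothetical stand-in for a pairwise network aligner; here it just builds a label so the order of merges is visible.

```python
def progressive_align(tree, align_pair):
    """Align networks bottom-up along a rooted binary phylogenetic tree.

    tree is either a leaf network or a (left, right) tuple of subtrees; each
    internal node receives the alignment of its two children.
    """
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    return align_pair(
        progressive_align(left, align_pair), progressive_align(right, align_pair)
    )


calls = []


def align_pair(x, y):
    """Hypothetical pairwise aligner: record the call and return a merged label."""
    calls.append((x, y))
    return f"({x}+{y})"


merged = progressive_align((("yeast", "fly"), "worm"), align_pair)
```

A tree with k leaves has exactly k - 1 internal nodes, so only k - 1 pairwise alignments are needed, which is where the linear scaling in the number of species comes from.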

Performance Comparison: Speed-Sensitivity / Linear Scaling

Performance Comparison: Multiple Alignment

Performance Comparison: Module Querying

Results

Functional module identification using network alignment

Functional module for transformation?

Results

Functional annotation using network alignment

Pairwise alignment

Multiple alignment of 9 networks

Conserved DNA replication module

Results

Multiple alignment of 10 networks showing possible cell division module

Functional annotation using network alignment

The Future of Network Comparison

Græmlin

Græmlin?

Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

That’s all folks!

Thank you!

Questions?

Performance Comparison: Sensitivity

Scoring Sequence Mutations

Weighted sum of pairs scoring
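Weighted sum-of-pairs scoring adds up a weighted similarity for every pair of members in an equivalence class. A minimal sketch, assuming a hypothetical per-protein similarity `sim` and per-species-pair weights `weight`; the slide does not say how Græmlin chooses these weights.

```python
from itertools import combinations


def weighted_sum_of_pairs(members, sim, weight):
    """members: list of (species, protein); score = sum over pairs of weight * similarity."""
    return sum(
        weight(s, t) * sim(p, q)
        for (s, p), (t, q) in combinations(members, 2)
    )


members = [("X", "a"), ("Y", "a'"), ("Z", "a''")]
sim = lambda p, q: 2.0  # hypothetical: all members equally similar
weight = lambda s, t: 0.5 if {s, t} == {"X", "Z"} else 1.0  # hypothetical species weights
class_score = weighted_sum_of_pairs(members, sim, weight)
```

With these values the X-Y and Y-Z pairs contribute 2.0 each and the down-weighted X-Z pair contributes 1.0.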