Pizza club - March 2017 - Gaia

22 March 2017

Background & Aim

• More and more (genome-wide) data is available, but it is still not optimally used
• Genome-wide networks are too big and complex to be interpreted in a meaningful way
• Knowledge-based networks are generally non-specific: e.g. canonical pathways, PPI networks…

Develop a flexible method to identify context-specific subnetworks

Approach
• Model the flow of information using chains of interactions
• Chains = simple paths: sequences of interactions (e.g. protein modifications) that connect one start point and one end point
• Multiple chains can exist between a given pair of start and end proteins: what is the most meaningful subnetwork?
• Prioritization of the chains based on many possible scores: gene expression, functional module identification, …
• Here they present a general tool for combining multiple types of biological information as chain scores: ChainRank

Methods
1. Search for all chains between user-defined start and end nodes in the network
2. Annotate the nodes with scores in order to calculate each chain's score and p-value
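Step 1 amounts to enumerating simple paths. A minimal sketch in Python, assuming the network is a directed adjacency dict, that a hypothetical `max_len` parameter caps the number of interactions per chain, and that a chain stops at the first end node it reaches (the slides do not say whether chains may pass through an end node). `find_chains` is an illustrative name, not ChainRank's API.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]  # directed adjacency: node -> set of successors


def find_chains(graph: Graph, starts: Iterable[str], ends: Iterable[str],
                max_len: int) -> List[List[str]]:
    """Enumerate all simple paths (chains) from any start node to any end node,
    using at most max_len interactions (edges) per chain."""
    end_set = set(ends)
    chains: List[List[str]] = []

    def dfs(path: List[str], visited: Set[str]) -> None:
        node = path[-1]
        if node in end_set and len(path) > 1:
            chains.append(list(path))
            return  # assumption: a chain ends at the first end node reached
        if len(path) - 1 >= max_len:
            return  # one more edge would exceed max_len
        for nxt in sorted(graph.get(node, ())):  # sorted for reproducible output
            if nxt not in visited:  # simple path: no node is repeated
                visited.add(nxt)
                path.append(nxt)
                dfs(path, visited)
                path.pop()
                visited.remove(nxt)

    for s in starts:
        dfs([s], {s})
    return chains
```

For example, with `graph = {"A": {"B", "C"}, "B": {"D"}, "C": {"D"}, "D": set()}`, `find_chains(graph, ["A"], ["D"], max_len=3)` returns `[['A', 'B', 'D'], ['A', 'C', 'D']]`. For a single start and end node, NetworkX's `all_simple_paths(G, source, target, cutoff=...)` does the same enumeration.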

Subnetwork

Restrict the network by a heuristic breadth-first search from the fixed initial proteins to the final one, using 2 criteria:
1. Maximal length allowed = length of the shortest path between the initial and final node
2. Prefer the inclusion of highly connected proteins (canonical signaling interactors)
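Criterion 1 can be sketched with two breadth-first searches: one forward from the start proteins and one backward from the end protein(s). A node can lie on a start-to-end walk of at most L interactions only if d_start(v) + d_end(v) ≤ L, and L defaults to the shortest start-to-end distance as stated above. The function names and the optional `max_len` override are hypothetical; criterion 2 (preferring hubs) is not modelled because the slides do not say how it is weighted.

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set

Graph = Dict[str, Set[str]]  # directed adjacency: node -> set of successors


def bfs_distances(graph: Graph, sources: Iterable[str]) -> Dict[str, int]:
    """Multi-source BFS: edges from the nearest source to every reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def reverse_graph(graph: Graph) -> Graph:
    """Flip every edge so that BFS measures distance *to* a node."""
    rev: Graph = {}
    for u, succs in graph.items():
        for v in succs:
            rev.setdefault(v, set()).add(u)
    return rev


def restrict_subnetwork(graph: Graph, starts: Iterable[str], ends: Iterable[str],
                        max_len: Optional[int] = None) -> Set[str]:
    """Keep nodes lying on some start->end walk of at most max_len edges
    (default: the shortest start->end distance)."""
    d_start = bfs_distances(graph, starts)
    d_end = bfs_distances(reverse_graph(graph), ends)
    common = set(d_start) & set(d_end)
    if not common:
        return set()  # end is unreachable from the starts
    shortest = min(d_start[v] + d_end[v] for v in common)
    limit = shortest if max_len is None else max_len
    return {v for v in common if d_start[v] + d_end[v] <= limit}
```

With the default limit only nodes on shortest paths survive; passing a larger `max_len` admits longer detours, which is where a hub preference (criterion 2) would have to decide what to keep.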

Scoring scheme
• Chain score =
• Node scores used:

1. Localisation: mean expression variability across the studied tissue vs. mean expression variability across all other tissues -> gene expression

2. Relevance: occurrence of each protein among the significant ones across studies -> gene expression, protein modifications, metabolism…

3. Connectivity: degree centrality -> topology

• Combination of scores:
1. Weighted product of normalized scores
2. Filtering: pre-filter chains by score S1 and rank them by score S2
3. Intersection: keep only chains that pass the filter on all scores
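The node scores and the three combination strategies above can be sketched as follows. Several details are assumptions of this sketch, not taken from the slides: localisation as a ratio of population standard deviations, relevance as a fraction of studies, degree centrality on an undirected graph, min-max normalisation before the weighted product, and "higher is better" when ranking. Because the chain-score formula itself is not preserved here, the combination functions take precomputed chain-level scores as input; all names are hypothetical.

```python
from math import prod
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence, Set

Scores = Dict[str, Dict[str, float]]  # chain id -> {score name -> chain-level score}


# --- Node scores ---
def localisation(expr_by_tissue: Dict[str, Sequence[float]], tissue: str) -> float:
    """Variability (population std) in the studied tissue / mean variability elsewhere."""
    here = pstdev(expr_by_tissue[tissue])
    others = mean(pstdev(v) for t, v in expr_by_tissue.items() if t != tissue)
    return here / others if others > 0 else float("inf")


def relevance(protein: str, significant_per_study: List[Set[str]]) -> float:
    """Fraction of studies in which the protein is among the significant hits."""
    return sum(protein in hits for hits in significant_per_study) / len(significant_per_study)


def degree_centrality(adjacency: Dict[str, Set[str]]) -> Dict[str, float]:
    """Degree / (n - 1) on an undirected adjacency-set graph."""
    n = len(adjacency)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in adjacency.items()}


# --- Combination of chain scores ---
def _min_max(values: Dict[str, float]) -> Dict[str, float]:
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in values.items()}


def weighted_product(scores: Scores, weights: Dict[str, float]) -> List[str]:
    """1. Rank chains by prod_i norm(S_i) ** w_i."""
    norm = {name: _min_max({c: s[name] for c, s in scores.items()}) for name in weights}
    combined = {c: prod(norm[name][c] ** w for name, w in weights.items()) for c in scores}
    return sorted(combined, key=combined.get, reverse=True)


def filter_then_rank(scores: Scores, s1: str, keep: Callable[[float], bool],
                     s2: str) -> List[str]:
    """2. Pre-filter chains on score s1, then rank the survivors by score s2."""
    kept = [c for c, s in scores.items() if keep(s[s1])]
    return sorted(kept, key=lambda c: scores[c][s2], reverse=True)


def intersection(scores: Scores, filters: Dict[str, Callable[[float], bool]]) -> List[str]:
    """3. Keep only chains that pass the filter on every score."""
    return [c for c, s in scores.items() if all(ok(s[name]) for name, ok in filters.items())]
```

The filter predicates are passed in as functions so that a threshold such as "connectivity < 0.05" or "top-quartile localisation" can be expressed without hard-coding a direction.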

Results
• Application to chronic obstructive pulmonary disease (COPD)
• Network used: experimental interactions from different public databases + COPD knowledge base (10k nodes, 62k interactions)
• Significance: comparison to chains in random networks
• Evaluation: enrichment of the top-ranked chains in gold-standard pathway proteins
• Improvement metric:
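The significance test listed above (comparison to chains in random networks) could be sketched as below. Degree-preserving double-edge swaps are one common way to generate random networks; whether ChainRank randomises this way is an assumption, and both helper names are hypothetical. The p-value uses the usual +1 correction so it is never exactly zero.

```python
import random
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def degree_preserving_shuffle(edges: Sequence[Edge], n_swaps: int,
                              rng: random.Random) -> List[Edge]:
    """Randomise an undirected graph by double-edge swaps (a-b, c-d -> a-d, c-b).
    Every node keeps its degree; swaps creating self-loops or duplicates are rejected."""
    edges = list(edges)
    present: Set[frozenset] = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:  # bounded number of tries
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint would create a self-loop or duplicate
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # edge already exists
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """One-sided empirical p-value: (1 + #null scores >= observed) / (1 + N)."""
    exceed = sum(s >= observed for s in null_scores)
    return (1 + exceed) / (1 + len(null_scores))
```

In use, the chain search and scoring would be rerun on each shuffled network to collect `null_scores` for a chain of interest; for example `empirical_p_value(5.0, [1.0, 2.0, 3.0, 6.0])` gives 0.4.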

• Localisation: expression variability across the studied tissue vs. across all others
• Relevance: occurrence of each protein among the significant ones across COPD-related studies
• Connectivity: degree centrality
• Combination by weighted product: no improvement
• Filtering: connectivity < 0.05, ranked by localisation
• Intersection: connectivity and localisation
• Filtering: top-quartile localisation, ranked by relevance
• Intersection: localisation and relevance

Figures: IGF-Akt proximity subnetwork; MAPK proximity subnetwork

Results for the best 50 chains

Other methods: recall 50-85%, precision 18-42%

Here (max): recall 67%, precision 30%
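The recall and precision figures above compare the proteins in the top-ranked chains against a gold standard. A minimal sketch, assuming precision = fraction of predicted proteins that are in the gold standard, recall = fraction of gold-standard proteins recovered, and that every protein on the top k chains (start and end proteins included) counts as predicted; how the authors treated start and end nodes is not stated, and `precision_recall_top_k` is a hypothetical name.

```python
from typing import List, Set, Tuple


def precision_recall_top_k(ranked_chains: List[List[str]], gold: Set[str],
                           k: int = 50) -> Tuple[float, float]:
    """Precision and recall of the proteins covered by the top-k chains."""
    predicted: Set[str] = set()
    for chain in ranked_chains[:k]:
        predicted.update(chain)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall
```

Because proteins shared between chains are counted once, adding more chains can raise recall while lowering precision, which is the trade-off behind comparing methods at a fixed cut-off such as the best 50 chains.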

Conclusions and claims
• 50% improvement in finding gold-standard proteins (compared to random); combining scores does even better (x2.5)
• 11% improvement in AUC (compared to random)

• Generic tool applicable to different network types (gene regulatory networks (GRN), metabolic networks)
• Scores should be selected based on the scientific question
• Applications:
  • Causal, mechanistic connections?
  • Common mechanisms driving different diseases
  • Reduction of computational models
  • Synthetic lethality
