Omic Data Integration Strategies

Approaches for Integration of multiple ‘Omic’ Data Dmitry Grapov, PhD

description

Discussion of 'Omic' (e.g. genomic, transcriptomic, proteomic and metabolomic) data integration approaches, including:

• Gene ontology (GO) enrichment

• Gene + metabolite functional enrichment

• Gene + protein + metabolite network mapping

Transcript of Omic Data Integration Strategies

Page 1: Omic Data Integration Strategies

Approaches for Integration of multiple ‘Omic’ Data

Dmitry Grapov, PhD

Page 2: Omic Data Integration Strategies

Examples

Nature Reviews Genetics 15, 107–120 (2014) doi:10.1038/nrg3643

FBA = flux-balance analysis

• Topological enrichment can give a broad overview of impacted genes, proteins and metabolites

• Changes in biochemical domains corroborated by multi-Omic data sets can be used to identify robust candidates responsible for phenotypic variation between comparisons

• Gene-gene, protein-protein or gene-protein interaction networks can be used to deconvolute ambiguous metabolic pathways

Page 3: Omic Data Integration Strategies

Common Approaches

Nature Reviews Genetics 15, 107–120 (2014) doi:10.1038/nrg3643

Page 4: Omic Data Integration Strategies

Biochemical Domain Enrichment Analysis

• Genes/Proteins: DAVID, AmiGO, etc. (domains: GO terms)

• Genes/Proteins + Metabolites: IMPaLA, Integrated Molecular Pathway Level Analysis (http://impala.molgen.mpg.de/) (domains: pathways)

1. Classify all species into domains (e.g. biological process, pathway, etc.)

2. Calculate the probability of observing the changes in species within each domain by chance

Page 5: Omic Data Integration Strategies

IMPaLA: Gene + Metabolite pathway enrichment

Challenges:

• Removal of redundant information

• Preference of specific vs. generic pathways

• Visualization of gene + metabolite + pathway relationships

Page 6: Omic Data Integration Strategies

Determining significance of the enrichment: Hypergeometric Test

How to calculate statistics to determine enrichment?

hit.num = 51     # number of significantly changed metabolites in the pathway
set.num = 1455   # number of metabolites in the pathway
full = 3358      # all possible metabolites in the organism
q.size = 72      # number of significantly changed metabolites

# P(X >= hit.num): subtract 1 because lower.tail=FALSE returns P(X > q)
phyper(hit.num-1, set.num, full-set.num, q.size, lower.tail=FALSE)
# = 1.717553e-06
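In practice the same test is usually run over many pathways at once and the p-values are then adjusted for multiple testing. The sketch below is one way this could look in base R. The helper name `enrich.p`, the pathway table and the choice of Benjamini-Hochberg adjustment are assumptions for illustration and are not taken from the slides. Only the first row reuses the numbers above.

```r
# Hypothetical helper: one-sided hypergeometric enrichment p-value, P(X >= hit.num)
enrich.p <- function(hit.num, set.num, full, q.size) {
  phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)
}

# Made-up pathway table for illustration (first row: values from the slide)
pathways <- data.frame(
  name    = c("pathway A", "pathway B", "pathway C"),
  hit.num = c(51, 5, 2),        # changed metabolites found in each pathway
  set.num = c(1455, 120, 300),  # total metabolites in each pathway
  stringsAsFactors = FALSE
)

# phyper is vectorized, so all pathways are tested in one call
pathways$p <- enrich.p(pathways$hit.num, pathways$set.num, full = 3358, q.size = 72)

# Adjust for multiple testing (Benjamini-Hochberg is an assumed choice)
pathways$p.adj <- p.adjust(pathways$p, method = "BH")

# Most enriched pathways first
print(pathways[order(pathways$p), ])
```

The upper-tail probability is equivalent to summing `dhyper` over all hit counts from `hit.num` up to `q.size`, which can serve as a sanity check on the `hit.num - 1` offset.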

Page 7: Omic Data Integration Strategies

GO Enrichment analysis: Hierarchy of Redundancy (parents)

• GO is an ontology wherein enrichment is often shared by children and parents.

• Difficult to co-visualize term hierarchy and gene to term mapping

Page 8: Omic Data Integration Strategies

Enrichment networks: Removing the Hierarchy of Redundancy

Workflow:

1. If two nodes share all genes, drop the least enriched (highest p-value)

2. Filter terms based on enrichment

3. Display term to gene/protein relationships as edges in a network

4. Map direction of change in genes/proteins to network node attributes
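The workflow above can be sketched in base R as follows. All term names, gene names, p-values, fold changes and the 0.05 cutoff are invented for illustration. "Share all genes" is read here as "have identical gene sets", which is an assumption about the intended rule.

```r
# Hypothetical enrichment results: term -> genes, term p-values, gene fold changes
term.genes <- list(
  "GO:parent" = c("g1", "g2", "g3"),
  "GO:child"  = c("g1", "g2", "g3"),
  "GO:other"  = c("g3", "g4"),
  "GO:weak"   = c("g5")
)
term.p  <- c("GO:parent" = 0.01, "GO:child" = 0.001, "GO:other" = 0.02, "GO:weak" = 0.4)
gene.fc <- c(g1 = 1.8, g2 = -0.7, g3 = 0.4, g4 = -1.2, g5 = 0.1)  # log2 fold changes

# 1. Among terms with identical gene sets, keep only the lowest p-value
key  <- sapply(term.genes, function(g) paste(sort(g), collapse = ","))
ord  <- order(term.p[names(term.genes)])
keep <- names(term.genes)[ord][!duplicated(key[ord])]

# 2. Filter terms on enrichment (cutoff is an assumed value)
keep <- keep[term.p[keep] < 0.05]

# 3. Term-to-gene relationships as a network edge list
edges <- do.call(rbind, lapply(keep, function(term) {
  data.frame(term = term, gene = term.genes[[term]], stringsAsFactors = FALSE)
}))

# 4. Direction of change as a node attribute
nodes <- data.frame(gene = unique(edges$gene), stringsAsFactors = FALSE)
nodes$direction <- ifelse(unname(gene.fc[nodes$gene]) > 0, "up", "down")

print(edges)
print(nodes)
```

The `edges` and `nodes` tables are in the plain edge-list and node-attribute form that most network tools can import.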

Page 9: Omic Data Integration Strategies

Enrichment Network: Mapping of parents through children

GO enrichment network displays:

• Gene names associated with each overrepresented term

• Fold change in protein expression between two groups (can be extended to k > 2 groups)

• Can display enrichment p-value for each term

• Can incorporate metabolites as children of genes

Page 10: Omic Data Integration Strategies

Empirical Networks

• Correlation-based networks (CN): simple, but tend to produce hairballs

• Gaussian graphical model (GGM) or partial-correlation-based networks: more advanced, favor direct over indirect relationships

• *Robustness increases with sample size

10.1007/978-1-4614-1689-0_17
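The difference between the two network types can be illustrated with simulated data. The minimal sketch below builds a chain x → y → z, so x and z are related only indirectly through y. The data, the sample size and the 0.3 edge cutoff are assumptions chosen for illustration. Partial correlations are derived from the inverse covariance matrix, which is one simple way to fit a GGM when there are many more samples than variables.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- x + rnorm(n, sd = 0.5)   # y depends directly on x
z <- y + rnorm(n, sd = 0.5)   # z depends directly on y, only indirectly on x
dat <- cbind(x = x, y = y, z = z)

# Correlation network: pairwise Pearson correlations
cn <- cor(dat)

# Partial correlations from the precision (inverse covariance) matrix
prec <- solve(cov(dat))
pc <- -cov2cor(prec)
diag(pc) <- 1

# Keep edges whose absolute value exceeds an assumed cutoff
cutoff <- 0.3
cn.edges <- abs(cn) > cutoff & upper.tri(cn)
pc.edges <- abs(pc) > cutoff & upper.tri(pc)

print(round(cn, 2))
print(round(pc, 2))
```

In this setup the correlation network connects all three variables, including the indirect x-z pair, while the partial correlation between x and z is close to zero once y is accounted for. With few samples relative to variables the covariance matrix becomes difficult or impossible to invert, which is one reason robustness grows with sample size.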

Page 11: Omic Data Integration Strategies

Topological Enrichment Networks

http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi

http://www.genome.jp/dbget-bin/www_bget?rn:R00975

Page 12: Omic Data Integration Strategies

Topological Enrichment Networks: genes + proteins + metabolites

Page 13: Omic Data Integration Strategies

MetaMapR: Biological network generator

https://github.com/dgrapov/MetaMapR

Page 14: Omic Data Integration Strategies

metabolomics.ucdavis.edu

This research was supported in part by NIH 1 U24 DK097154