Data integration across omics landscapes


Bing Zhang, Ph.D.

Department of Biomedical Informatics

Vanderbilt University School of Medicine

[email protected]

Omics data integration

CNCP2012

DNA

mRNA

Protein

Elephant

Informatics approaches to integrate genomic and proteomic data


Genomic data + Proteomic data -> Novel biological insights

Genomic data -> Improved proteomic data analysis

Data Type: Technology

Genome and transcriptome (TCGA, The Cancer Genome Atlas)
CNV: arrayCGH, SNP Array
LOH: SNP Array
DNA Methylation: Methylation Array
Exon expression: Array, RNA-Seq
Junction expression: RNA-Seq
Gene expression: Array, RNA-Seq
Mutations: Exome Sequencing, RNA-Seq
Sequence variants: Exome Sequencing, RNA-Seq

Proteome (CPTAC, Clinical Proteomic Tumor Analysis Consortium)
Protein expression: MS/MS
Protein PTM: MS/MS, protein arrays

Informatics approaches to integrate genomic and proteomic data

Using genomic data to improve proteomic data analysis
  Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics
  Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis

Integrating genomic and proteomic data to gain novel biological insights
  Project 3. miRNA-mediated regulation: understanding post-transcriptional mechanisms regulating human gene expression
  Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context

customProDB: motivation


Database search

Commonly used database

Expressed proteins

Unexpressed proteins

Proteins with sequence variation

Increased sensitivity

Reduced ambiguity

Variant peptides

Customized protein database from RNA-Seq data


Wang et al., J Proteome Res, 2012
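The idea behind a customized database can be sketched in a few lines of Python. This is a minimal illustration, not the customProDB implementation (which is an R package): it assumes a reference proteome held in a dictionary, per-protein expression estimates derived from RNA-Seq, and a list of single amino-acid substitutions. The names `Variant`, `build_custom_db`, `to_fasta`, and the `min_expr` cutoff are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A single amino-acid substitution; position is 1-based (hypothetical input format)."""
    protein_id: str
    position: int
    ref: str
    alt: str


def build_custom_db(reference, expression, variants, min_expr=1.0):
    """Return {header: sequence} for expressed proteins plus their variant forms.

    reference  : dict protein_id -> amino-acid sequence
    expression : dict protein_id -> expression estimate from RNA-Seq (assumed input)
    variants   : iterable of Variant
    min_expr   : illustrative cutoff, not a recommended value
    """
    # Drop proteins with no evidence of expression (reduces ambiguity in the search space).
    expressed = {
        pid: seq
        for pid, seq in reference.items()
        if expression.get(pid, 0.0) >= min_expr
    }
    custom = dict(expressed)
    # Add one extra entry per substitution so variant peptides become identifiable.
    for v in variants:
        seq = expressed.get(v.protein_id)
        if seq is None or not (1 <= v.position <= len(seq)):
            continue
        if seq[v.position - 1] != v.ref:
            continue  # variant call disagrees with the reference residue; skip it
        mutated = seq[: v.position - 1] + v.alt + seq[v.position:]
        custom[f"{v.protein_id}_{v.ref}{v.position}{v.alt}"] = mutated
    return custom


def to_fasta(db):
    """Serialize the database as FASTA text for a search engine."""
    return "".join(f">{header}\n{seq}\n" for header, seq in db.items())


# Toy data, invented for illustration only.
reference = {"P1": "MKTAYIAK", "P2": "MSSHEGGK", "P3": "MADEEKLP"}
expression = {"P1": 12.5, "P2": 0.0, "P3": 3.2}
variants = [Variant("P1", 3, "T", "A"), Variant("P2", 2, "S", "L")]
custom_db = build_custom_db(reference, expression, variants)
```

In this toy case the unexpressed protein `P2` is dropped together with its variant, while `P1` contributes both its reference sequence and one variant entry.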

CustomProDB: moving forward

R package
Compatible with both DNA and RNA sequencing data
Sample specific database and consensus database
Application to the CPTAC project
Spectral library

Wang et al., manuscript in preparation

miRNA regulation: motivation

Inverse correlation between miRNA expression and:
mRNA expression: mRNA decay
Protein/mRNA ratio: translational repression
Protein expression: combined effect
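The three inverse correlations can be computed for a single miRNA-target pair as in the sketch below. It assumes log2-scale abundances measured across the same cell lines and uses Spearman rank correlation; the choice of Spearman, the function names, and the input layout are assumptions made for this example rather than the method used in the study.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mirna_target_correlations(mirna, mrna, protein):
    """Correlate miRNA level with mRNA, protein/mRNA ratio, and protein level.

    All inputs are log2-scale values in the same cell-line order (assumed).
    """
    ratio = [p - m for p, m in zip(protein, mrna)]  # log2(protein / mRNA)
    return {
        "mRNA decay": spearman(mirna, mrna),
        "translational repression": spearman(mirna, ratio),
        "combined effect": spearman(mirna, protein),
    }
```

A strongly negative value for "mRNA decay" with a near-zero value for "translational repression" would point to regulation mainly at the transcript level, and the reverse pattern to regulation mainly at translation.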


miRNA regulation: data preparation

9 colorectal cancer cell lines
Protein expression data: current study
mRNA expression data: GSE10843
miRNA expression data: GSE10833

miRNA regulation: data analysis workflow

Liu et al., manuscript in preparation

miRNA regulation: mRNA decay or translational repression?

Early studies suggest a major role of translational repression (Olsen et al., Dev Biol, 1999; Zeng et al., Molecular Cell, 2001)
Recent large-scale studies suggest a predominant role of mRNA decay (Baek et al., Nature, 2008; Selbach et al., Nature, 2008; Guo et al., Nature, 2010)
Our study suggests equally important roles of mRNA decay and translational repression
  Translational repression was involved in 58% and played a major role in 30% of all predicted miRNA-target interactions
  Most miRNAs exert their effect through both mRNA decay and translational repression
  Sequence features known to drive site efficacy in mRNA decay were generally not applicable to translational repression

miR-138 prefers translational repression


NetGestalt: motivation


DNA: mutation, methylation

mRNA: expression, splicing

Protein: expression, modification

Phenotype

Network

NetGestalt: scalable network representation


Total number of modules (size >30): 92
Functional homogeneity: 63 (69%)
Spatial homogeneity: 55 (60%)
Dynamic homogeneity: 69 (75%)
Homogeneity of any type: 82 (89%)

[Figure: axis labels 3, 2, 1, 0 and Proteins]

NetGestalt: viewing and cross-correlating data

Viewing data as tracks
  Heat map (e.g. gene expression data)
  Bar chart (e.g. fold changes, p values)
  Binary track (e.g. significant genes, GO)
Comparing binary tracks
  Clickable Venn diagram
Enrichment analysis
  Network modules
  GO terms
  Pathways
Navigating at different scales
  Zoom
  Pan
  2D graph visualization

Shi et al., manuscript under revision
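Comparing two binary tracks and testing a module for enrichment both reduce to set operations. The sketch below assumes each binary track is a set of gene identifiers and uses a one-sided hypergeometric test for enrichment; the test choice and the function names are assumptions for this example and are not taken from NetGestalt itself.

```python
from math import comb


def overlap_summary(track_a, track_b):
    """Counts behind a two-set Venn diagram of binary tracks."""
    a, b = set(track_a), set(track_b)
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}


def hypergeom_enrichment_p(population, module, hits):
    """One-sided p-value: probability of at least the observed overlap by chance.

    population : all genes placed in the network
    module     : genes in one network module (or GO term, or pathway)
    hits       : genes flagged by a binary track (e.g. significant genes)
    """
    population = set(population)
    module = set(module) & population
    hits = set(hits) & population
    n_pop, n_mod, n_hit = len(population), len(module), len(hits)
    observed = len(module & hits)
    total = comb(n_pop, n_mod)
    # Sum the hypergeometric tail from the observed overlap upward.
    tail = sum(
        comb(n_hit, i) * comb(n_pop - n_hit, n_mod - i)
        for i in range(observed, min(n_hit, n_mod) + 1)
    )
    return tail / total
```

When many modules are tested at once, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.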


Browsing data sources

Viewing data as tracks

Comparing tracks

Identifying modules

Annotating modules

Moving across scales

[Figure: NetGestalt screenshot. Tracks: Ruler; Network modules; Proteomics (PNNL, Vandy) and TCGA Microarray, each comparing Luminal B and Basal as signed -log(p), with Diff proteins and Diff genes tracks]
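A "-log(p) signed" track encodes both the significance and the direction of a difference in one number. The sketch below shows one common way to build such a value; the base-10 logarithm, the convention that a positive sign means higher in the first group, and the function name are assumptions for this example.

```python
import math


def signed_neg_log10_p(p_value, log2_fold_change):
    """Return -log10(p) carrying the sign of the fold change.

    Assumed convention: positive means higher in the first group
    (e.g. Luminal B), negative means higher in the second (e.g. Basal).
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    magnitude = -math.log10(p_value)
    if magnitude == 0:
        return 0.0  # p = 1 carries no evidence in either direction
    return math.copysign(magnitude, log2_fold_change)


# Toy values, invented for illustration only: gene -> (p-value, log2 fold change).
toy_results = {"GENE_A": (0.001, 1.8), "GENE_B": (0.04, -0.9), "GENE_C": (1.0, 0.1)}
track = {gene: signed_neg_log10_p(p, fc) for gene, (p, fc) in toy_results.items()}
```

A bar-chart track built from these values points up for genes higher in the first group and down for genes higher in the second, with bar length reflecting significance.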

[Figure: NetGestalt screenshot. Tracks: Ruler; Network modules; Proteomics (PNNL, Vandy) and TCGA Microarray, each comparing Luminal B and Basal as signed -log(p), with Diff proteins and Diff genes tracks. Values shown for the track comparison: 45%, 51%, 4%, 0%]

[Figure: NetGestalt screenshot. Tracks: Ruler; Network modules; Microarray (Luminal B, Basal; signed -log(p)); Vandy and PNNL (signed -log(p)); Enriched Modules]

[Figure: NetGestalt screenshot. Tracks: Ruler; Network modules; Microarray (Luminal B, Basal; signed -log(p)); Vandy and PNNL (signed -log(p) for each); Enriched Modules; MRM targets; DNA damage response; Gene symbol]

[Figure: NetGestalt screenshot at a different scale. Tracks: Ruler; Network modules; Microarray (Luminal B, Basal; signed -log(p)); Vandy and PNNL (signed -log(p) for each); Enriched Modules; MRM targets; DNA damage response; Gene symbol]

[Figure: NetGestalt screenshot. Tracks: Ruler; Network modules; Proteomics and Microarray (Luminal B, Basal; signed -log(p)); Enriched Modules; T cell activation]

Informatics approaches to integrate genomic and proteomic data

Using genomic data to improve proteomic data analysis
  Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics
  Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis

Integrating genomic and proteomic data to gain novel biological insights
  Project 3. miRNA-mediated regulation: understanding post-transcriptional mechanisms regulating human gene expression
  Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context

Acknowledgement

Qi Liu, Jing Wang, Xiaojing Wang, Jing Zhu

Dan Liebler, Rob Slebos, Dave Tabb

Zhiao Shi

Funding: NIGMS R01GM088822; NCI U24CA159988; NCI P50CA095103