Identifying Differentially Regulated Genes
Nirmalya Bandyopadhyay, Manas Somaiya, Sanjay Ranka, and Tamer Kahveci
Bioinformatics Lab., CISE Department, University of Florida
Gene interaction through regulatory networks
• Gene networks: The genes are nodes and the interactions are directed edges.
• Neighbors: incoming neighbors and outgoing neighbors.
• A gene can change the state of other genes:
– Activation
– Inhibition
[Figure: example gene network with nodes K-Ras, Raf, MEK, ERK, JNK, RalGDS, Ral, RalBP1, PLD1, Cdc42, and Rac.]
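The network definition above can be sketched in a few lines of Python. This is only an illustration: the gene names `g1` to `g4`, the edge list, and the helper `build_neighbors` are hypothetical and are not taken from the figure.

```python
from collections import defaultdict

# Hypothetical signed, directed edges: (source gene, target gene, effect).
EDGES = [
    ("g1", "g2", "activation"),
    ("g1", "g3", "inhibition"),
    ("g2", "g4", "activation"),
    ("g3", "g4", "activation"),
]


def build_neighbors(edges):
    """Return (incoming, outgoing) neighbor maps of a directed gene network."""
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for source, target, _effect in edges:
        outgoing[source].add(target)   # target is an outgoing neighbor of source
        incoming[target].add(source)   # source is an incoming neighbor of target
    return incoming, outgoing


incoming, outgoing = build_neighbors(EDGES)
print(sorted(outgoing["g1"]))  # ['g2', 'g3']
print(sorted(incoming["g4"]))  # ['g2', 'g3']
```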
Perturbation experiments
[Figure: the same gene network with a perturbation applied.]
• In a perturbation experiment, a stimulant (radiation, a toxic element, or medication), also known as a perturbation, is applied to tissues.
• Gene expression is measured before and after the perturbation.
• A gene can change its expression as a result of the perturbation:
– Differentially expressed (DE) gene: its expression changes.
– Equally expressed (EE) gene: its expression stays the same.
Differentially expressed genes
Perturbation experiment : single dataset
• Primarily affected genes: directly affected by the perturbation.
• Secondarily affected genes: affected indirectly, through the primarily affected genes.
[Figure: the gene network under perturbation. Legend: primarily affected genes, secondarily affected genes.]
Differentially and Equally regulated
• Some datasets inherently have two groups, e.g. fasting vs. non-fasting, or Caucasian American vs. African American.
• For these datasets, a gene is:
– Differentially regulated (DR): DE in one group and EE in the other.
– Equally regulated: DE in both groups or EE in both groups.
• Here, gene g1 is DE in data group DA and EE in DB. Hence, it is DR.
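As a small illustration of this definition, the sketch below labels a gene from its states in the two groups. The function name `regulation_class` and the shorthand `"ER"` for equally regulated are introduced here for the example only.

```python
def regulation_class(state_a, state_b):
    """Label a gene from its state ('DE' or 'EE') in data groups A and B.

    Returns 'DR' (differentially regulated) when the two states differ and
    'ER' (shorthand used here for equally regulated) when they match.
    """
    for state in (state_a, state_b):
        if state not in ("DE", "EE"):
            raise ValueError(f"unknown state: {state!r}")
    return "DR" if state_a != state_b else "ER"


# g1 from the slide: DE in DA and EE in DB.
print(regulation_class("DE", "EE"))  # DR
print(regulation_class("EE", "EE"))  # ER
```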
[Figure: gene network over g1 to g5, shown for data groups DA and DB. Legend: differentially expressed, equally expressed.]
Two datasets: Primary and secondary effects
• Primarily differentially regulated genes (PDR): Directly affected by perturbation.
• Secondarily differentially regulated genes (SDR): Affected indirectly, through the primarily affected genes.
[Figure: gene network over g1 to g5 with metagene g0, shown for data groups DA and DB. Legend: primarily differentially expressed, secondarily differentially expressed, equally expressed.]
Problem & method
• Input: Gene expression (control and non-control) of two data groups DA and DB.
• Problem: Analyze the primary and secondary effects of the perturbation.
– Estimate the probability that a gene is differentially regulated because of the perturbation or because of other genes (its incoming neighbors).
– Identify the primarily differentially regulated genes.
• Method
– A probabilistic Bayesian method that employs a Markov random field to leverage domain knowledge.
Notation
• Observed variables
– Microarray datasets:
  • Two data groups: DA, DB
  • A single gene $g_i$ in group $C$ ($C \in \{A, B\}$): $Y_{Ci} = \{y_{Ci}, y'_{Ci}\}$
  • All genes in group $C$: $Y_C = \bigcup_{i=1}^{M} Y_{Ci}$
– Neighborhood variables: $W_{ij} = 1$ if there is an edge from $g_i$ to $g_j$, and $W_{ij} = 0$ otherwise
• Hidden variables
– State variables: $S_{Ci} = \mathrm{DE}$ if $g_i$ is DE in group $C$, and $S_{Ci} = \mathrm{EE}$ if $g_i$ is EE in group $C$
– Regulation variables: $Z_i$
– Interaction variables: $X_{ij}$
SAi SBi SAj SBj Zi Zj Xij
DE DE DE DE 1 1 1
DE DE DE EE 1 2 2
DE DE EE DE 1 3 3
DE DE EE EE 1 4 4
DE EE DE DE 2 1 5
DE EE DE EE 2 2 6
DE EE EE DE 2 3 7
DE EE EE EE 2 4 8
EE DE DE DE 3 1 9
EE DE DE EE 3 2 10
EE DE EE DE 3 3 11
EE DE EE EE 3 4 12
EE EE DE DE 4 1 13
EE EE DE EE 4 2 14
EE EE EE DE 4 3 15
EE EE EE EE 4 4 16
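The table follows a regular pattern: Zi numbers the four (SAi, SBi) combinations from 1 to 4, and every row satisfies Xij = 4(Zi − 1) + Zj. The sketch below reproduces that encoding and its inverse; the function names are illustrative and do not come from the slides.

```python
# Order of the (state in DA, state in DB) pairs, matching Z = 1, 2, 3, 4 in the table.
STATE_PAIRS = [("DE", "DE"), ("DE", "EE"), ("EE", "DE"), ("EE", "EE")]


def regulation_variable(state_a, state_b):
    """Z for one gene, from its states in data groups A and B."""
    return STATE_PAIRS.index((state_a, state_b)) + 1


def interaction_variable(z_i, z_j):
    """X_ij for the edge g_i -> g_j, following the pattern of the table rows."""
    return 4 * (z_i - 1) + z_j


def decode_interaction(x_ij):
    """Recover (Z_i, Z_j) from X_ij."""
    return (x_ij - 1) // 4 + 1, (x_ij - 1) % 4 + 1


# Table row "DE EE EE DE 2 3 7".
z_i = regulation_variable("DE", "EE")
z_j = regulation_variable("EE", "DE")
print(z_i, z_j, interaction_variable(z_i, z_j))  # 2 3 7
print(decode_interaction(7))                     # (2, 3)
```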
Problem formulation
• Input to the problem:
– Microarray expression: Y
– Gene network V = {G, W}
• G = {g0, g1, g2, …, gM}, where g0 is the metagene.
• Goal:
– Estimate the density p(Xij | X − Xij, Y, V, Wij = 1) for all Wij. This density gives the probability that a gene is DR due to the perturbation or due to an incoming neighbor gene.
– Note: A higher value of p(Xij ∈ {2, 3} | X − Xij, Y, V, Wij = 1) indicates a higher chance that gj is affected by gi.
Bayesian distribution
• We propose a Bayesian model, as it allows us to incorporate our beliefs into the model.
– The joint probability distribution over X:

$$\underbrace{p(X \mid Y, V, \theta_X, \theta_Y)}_{\text{Posterior density}} = \frac{\overbrace{p(Y \mid X, V, \theta_Y)}^{\text{Likelihood density}} \; \overbrace{p(X \mid V, \theta_X)}^{\text{Prior density}}}{\sum_{X} p(Y \mid X, V, \theta_Y)\, p(X \mid V, \theta_X)}$$

– We can derive the density of Xij, p(Xij | X − Xij, Y, V, Wij = 1), from the joint density function.
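To make the posterior formula concrete, here is a toy calculation over three candidate configurations of X. The configuration names and all probabilities are invented for illustration; the real model sums over the joint assignments of all Xij, which is far larger.

```python
def posterior(likelihood, prior):
    """Posterior over a finite set of configurations: likelihood * prior, normalized."""
    unnormalized = {x: likelihood[x] * prior[x] for x in prior}
    evidence = sum(unnormalized.values())  # the denominator of Bayes' rule
    return {x: value / evidence for x, value in unnormalized.items()}


# Made-up numbers, not taken from the slides.
prior = {"x1": 0.7, "x2": 0.2, "x3": 0.1}       # plays the role of p(X | V, theta_X)
likelihood = {"x1": 0.1, "x2": 0.5, "x3": 0.4}  # plays the role of p(Y | X, V, theta_Y)

post = posterior(likelihood, prior)
for name, value in post.items():
    print(name, round(value, 3))
```

In this toy case the data favor x2 even though the prior favors x1, which is the kind of trade-off the prior and likelihood densities express.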
Prior density function : Markov random field
• The MRF is an undirected graph Ψ = (X, E).
– X = {Xij}: each node Xij represents an edge in the gene network.
– E = {(Xij, Xpj) | Wpi = Wij = 1} ∪ {(Xij, Xik) | Wjk = Wij = 1}
• An edge in the MRF corresponds to two edges in the gene network.
– (X23, X25) corresponds to (g2, g3) and (g3, g5).
[Figure: (a) Gene network over g0 to g5, shown for data groups DA and DB. (b) Markov random field with nodes X01 (2), X02 (1), X03 (1), X05 (3), X04 (4), X12 (5), X23 (1), X35 (3), X14 (8), X13 (5), X25 (7).]
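The edge set E can be built mechanically from W. The sketch below follows the two set definitions as written on the slide, with one added assumption: an MRF edge is created only when both of its endpoints are themselves edges of the gene network (so that Xpj and Xik exist as MRF nodes). The edge list is read off the MRF node labels in the figure; `build_mrf` is a hypothetical helper name.

```python
def build_mrf(genes, edges):
    """Build MRF nodes and undirected edges from a gene network.

    genes: iterable of gene indices.
    edges: set of (i, j) pairs with W_ij = 1.
    Returns (nodes, mrf_edges); each MRF edge is a frozenset of two nodes.
    """
    nodes = set(edges)  # one MRF node X_ij per gene-network edge
    mrf_edges = set()
    for (i, j) in edges:
        for p in genes:
            # First set: (X_ij, X_pj) with W_pi = W_ij = 1.
            if p != i and (p, i) in edges and (p, j) in nodes:
                mrf_edges.add(frozenset({(i, j), (p, j)}))
        for k in genes:
            # Second set: (X_ij, X_ik) with W_jk = W_ij = 1.
            if k != j and (j, k) in edges and (i, k) in nodes:
                mrf_edges.add(frozenset({(i, j), (i, k)}))
    return nodes, mrf_edges


# Gene-network edges implied by the MRF node labels X01, X02, ..., X25.
GENES = range(6)
EDGES = {(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
         (1, 2), (1, 3), (1, 4), (2, 3), (2, 5), (3, 5)}

nodes, mrf_edges = build_mrf(GENES, EDGES)
# The slide's example: (X23, X25) arises from (g2, g3) and (g3, g5).
print(frozenset({(2, 3), (2, 5)}) in mrf_edges)  # True
```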
Prior density function: Feature functions
• Three beliefs relevant to our model:
– In a data group, the metagene g0 can affect the states of all other genes (modeled by adding directed edges from g0 to all other genes).
– In a data group, a gene can affect the state of its outgoing neighbors.
– A gene has a high probability of being equally regulated.
• We incorporate these beliefs into the MRF graph using seven feature functions.
• Feature function: a unary or binary function over the nodes of the MRF. A feature function allows us to introduce our beliefs into the graph.
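Feature Functions
• Unary: Capture the frequency of Xij.

$$F_1(X_{ij}) = \begin{cases} 1, & \text{if } X_{ij} = 2 \\ 0, & \text{otherwise} \end{cases} \qquad F_2(X_{ij}) = \begin{cases} 1, & \text{if } X_{ij} = 3 \\ 0, & \text{otherwise} \end{cases}$$

– $F_3(X_{ij})$ is defined from $F_1(X_{ij})$ and $F_2(X_{ij})$.
• Binary: Encapsulate the second belief, that in a data group a gene can affect the state of its outgoing neighbors.
– Left external equality: $F_4(X_{ij}) = \sum_{p:\; W_{pi} = 1,\; W_{ij} = 1} f_4(X_{ij}, X_{pj})$
– Right external equality: $F_5(X_{ij}) = \sum_{k:\; W_{jk} = 1,\; W_{ij} = 1} f_5(X_{ij}, X_{ik})$
• Unary: Capture the third belief, that a gene has a high probability of being equally regulated.
– Left internal equality: $F_6(X_{ij}) = \sum_{t \in \{1,\dots,4,13,\dots,16\},\; W_{ij} = 1} f_6(X_{ij}, t)$
– Right internal equality: $F_7(X_{ij}) = \sum_{t \in \{1,4,5,8,9,12,13,16\},\; W_{ij} = 1} f_7(X_{ij}, t)$
• Prior density function, with feature functions $F_k$:

$$p(X \mid \theta_X) = \frac{1}{\Delta} \exp\Big( \sum_{i,j:\; W_{ij} = 1} \; \sum_{k \in \{1,2,\dots,7\}} \gamma_k F_k(X_{ij}) \Big)$$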
Binary: External feature functions
• The external feature functions encapsulate the belief that in a data group, a gene can affect the state of its outgoing neighbors.
• Left equality: Xij = Xpj ⇒ Zi = Zp
• Right equality: Xij = Xik ⇒ Zj = Zk
[Figure: (a) Gene network over g1 to g4. (b) MRF network with nodes X12, X13, X23, X24, and X34, marking the left equality and the right equality for X23.]
Unary: Internal feature functions
• The internal feature function represents the belief that a gene has high probability of being equally regulated.
• gi is equally regulated:
– Xij ∈ {1, 2, 3, 4} ⇒ Zi = 1 (DE in both groups)
– Xij ∈ {13, 14, 15, 16} ⇒ Zi = 4 (EE in both groups)
• gj is equally regulated:
– Xij ∈ {1, 5, 9, 13} ⇒ Zj = 1 (DE in both groups)
– Xij ∈ {4, 8, 12, 16} ⇒ Zj = 4 (EE in both groups)
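These value sets follow directly from the encoding in the notation table, where Xij = 4(Zi − 1) + Zj and Z = 1 or Z = 4 means the gene has the same state in both groups. A quick check, using an illustrative helper name:

```python
def interaction_variable(z_i, z_j):
    """X_ij from (Z_i, Z_j), following the pattern of the notation table."""
    return 4 * (z_i - 1) + z_j


ALL_Z = range(1, 5)
EQUAL_Z = (1, 4)  # DE in both groups, or EE in both groups

# X_ij values for which g_i (left gene) is equally regulated.
left_internal = sorted(interaction_variable(zi, zj) for zi in EQUAL_Z for zj in ALL_Z)
# X_ij values for which g_j (right gene) is equally regulated.
right_internal = sorted(interaction_variable(zi, zj) for zi in ALL_Z for zj in EQUAL_Z)

print(left_internal)   # [1, 2, 3, 4, 13, 14, 15, 16]
print(right_internal)  # [1, 4, 5, 8, 9, 12, 13, 16]
```

The two lists match the index sets used for the left and right internal feature functions.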
Objective function optimization
• Obtain an initial estimate of the state variables.
• Estimate parameters for the likelihood density.
• Estimate parameters that maximize the prior density.
• Estimate parameters that maximize the pseudo-likelihood density.
• Rank the DE genes based on the likelihood w.r.t. the metagene.
Techniques used in these steps: Student's t-test, differential evolution, ICM (iterated conditional modes).
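The slide names ICM without giving details, so the following is only a generic sketch of iterated conditional modes on a toy three-node graph, not the authors' objective function. The graph, the evidence values, the weight 0.8, and `local_score` are all assumptions made for the example.

```python
def icm(labels, neighbors, local_score, init, max_sweeps=50):
    """Iterated conditional modes: greedy, node-by-node maximization.

    labels: allowed labels for every node.
    neighbors: dict mapping each node to its neighboring nodes.
    local_score(node, label, assignment): score of giving `label` to `node`
        while every other node keeps its value in `assignment`.
    init: dict with the initial label of every node.
    """
    labels = list(labels)
    assignment = dict(init)
    for _ in range(max_sweeps):
        changed = False
        for node in neighbors:
            current = local_score(node, assignment[node], assignment)
            best = max(labels, key=lambda lab: local_score(node, lab, assignment))
            # Update only on strict improvement, so the loop terminates.
            if local_score(node, best, assignment) > current:
                assignment[node] = best
                changed = True
        if not changed:
            break
    return assignment


# Toy chain a - b - c with binary labels.
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
evidence = {"a": 1, "b": 0, "c": 1}  # hypothetical initial estimates


def local_score(node, label, assignment):
    unary = 1.0 if label == evidence[node] else 0.0  # agree with own evidence
    pairwise = sum(0.8 for n in neighbors[node] if assignment[n] == label)
    return unary + pairwise


result = icm([0, 1], neighbors, local_score, init=evidence)
print(result)  # {'a': 1, 'b': 1, 'c': 1}
```

Node b flips to 1 because agreeing with both neighbors (score 1.6) outweighs agreeing with its own evidence (score 1.0). ICM converges to a local optimum that depends on the initial estimate.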
Dataset and experimental setup
• Dataset
– Real: Adapted from Smirnov et al.; generated by applying 10 Gy ionizing radiation to immortalized B cells obtained from 155 donors.
– Real/Synthetic: We created synthetic data to simulate the perturbation experiment based on the real dataset. The simulation model is taken from “Modeling of Multiple Valued Gene Regulatory Networks” by Garg et al.
– Gene regulatory network: 24,663 genetic interactions over 2,335 genes, collected from the KEGG database.
• Experimental setup
– Implemented our method in MATLAB and Java.
– Ran our code on a quad-core AMD Opteron 2 GHz workstation with 32 GB memory.
Comparison with other methods
• We compared our method with three other methods:
– SMRF: Our earlier method, developed to analyze the effect of an external perturbation on a single data group.
– SSEM: A method to differentiate between the primary and secondary effects of a perturbation on a gene expression dataset.
– Two-sample t-test (Student’s t-test).
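For reference, the two-sample Student's t statistic with pooled variance can be computed as below. The expression values are made up, and the sketch stops at the statistic: turning it into a DE/EE call needs a p-value from the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    return (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


# Hypothetical expression of one gene before and after the perturbation.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]

t_stat = student_t(control, treated)
print(round(t_stat, 4))  # -3.6742
```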
Conclusions
• Our method could find primarily affected genes with high accuracy.
• It achieved significantly better accuracy than SMRF, SSEM, and Student’s t-test.
• Our method produces a probability distribution rather than a fixed binary decision.
Acknowledgement
This work was supported partially by NSF under grants CCF-0829867 and IIS-0845439.