Simulation and Application on learning gene causal relationships


Page 1: Simulation and Application on learning gene causal relationships

Simulation and Application on learning gene causal relationships

Xin Zhang

Page 2: Simulation and Application on learning gene causal relationships

Introduction

• High-throughput genetic technologies make it possible to study how genes interact with each other;

• Simulation is used to evaluate how well the IC algorithm learns gene causal relationships;

• We present an algorithm (the mIC algorithm) for learning causal relationships with knowledge of topological ordering information;

• We apply the mIC algorithm to a melanoma dataset;

Page 3: Simulation and Application on learning gene causal relationships

Steps for Simulation Study

• Construct a causal network N;

• Generate datasets based on the causal network;

• Learn from the simulated data using causal algorithms (e.g. the IC algorithm) to obtain a network N´;

• Compare the original network N with the obtained network N´ with respect to precision and recall;

Page 4: Simulation and Application on learning gene causal relationships

Modeling and simulation of a causal Boolean network (BN)

• Boolean network: C = f(A, B)

[Figure: nodes A and B feed a Boolean function f that determines node C]

• Construct a causal structure;

• Assign parameters (proper functions) to each node with causal parents;

• Assign a probability distribution;

Page 5: Simulation and Application on learning gene causal relationships

Constructing Boolean Network

1. Generate M BNs with up to 3 causal parents for each node;

2. For each BN, generate a random proper function for each node;

3. Assign random probabilities for the root gene(s);

4. Given one configuration, get the probability distribution;

5. Collect 200 data points for each network;

6. Repeat steps 3-5 above for all M networks.
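The construction steps on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the network size of 5 genes, and the fixed seed are hypothetical, a "proper" function is assumed to be one whose output depends on every parent, and acyclicity is obtained by drawing each gene's parents only from lower-numbered genes.

```python
import random
from itertools import product


def depends_on_all_inputs(table, n_inputs):
    """Assumed meaning of 'proper': flipping any single input changes the output somewhere."""
    for i in range(n_inputs):
        if all(table[bits] == table[bits[:i] + (1 - bits[i],) + bits[i + 1:]]
               for bits in table):
            return False  # input i never matters, so the function is not proper
    return True


def random_proper_function(n_inputs, rng):
    """Rejection-sample a random truth table until it depends on all inputs."""
    while True:
        table = {bits: rng.randint(0, 1) for bits in product((0, 1), repeat=n_inputs)}
        if depends_on_all_inputs(table, n_inputs):
            return table


def random_boolean_network(n_genes, rng, max_parents=3):
    """Steps 1-3: random acyclic structure, proper functions, root probabilities."""
    parents, functions, root_prob = {}, {}, {}
    for g in range(n_genes):
        k = rng.randint(0, min(max_parents, g))        # up to 3 causal parents
        parents[g] = tuple(sorted(rng.sample(range(g), k)))
        if k == 0:
            root_prob[g] = rng.random()                # P(root gene = 1)
        else:
            functions[g] = random_proper_function(k, rng)
    return parents, functions, root_prob


def sample_dataset(network, rng, n_points=200):
    """Steps 4-5: sample root genes, then evaluate the other genes in order."""
    parents, functions, root_prob = network
    data = []
    for _ in range(n_points):
        state = {}
        for g in sorted(parents):                      # parents always come first
            if g in root_prob:
                state[g] = int(rng.random() < root_prob[g])
            else:
                state[g] = functions[g][tuple(state[p] for p in parents[g])]
        data.append([state[g] for g in sorted(parents)])
    return data


rng = random.Random(0)                                 # hypothetical seed
network = random_boolean_network(5, rng)               # hypothetical size
dataset = sample_dataset(network, rng)                 # 200 rows of 0/1 values
```

Repeating the last three lines M times with fresh networks would correspond to step 6.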

Page 6: Simulation and Application on learning gene causal relationships

Constructing Causal Structure

[Figure: example causal structure over nodes A, B, C, D and E]

Page 7: Simulation and Application on learning gene causal relationships

Steps for constructing causal structure

Page 8: Simulation and Application on learning gene causal relationships

Proper function (1)

Proper function: The function that reflects the influence of the operators.

Example:

By simplifying f, c is a function of a alone, with c = a; b is a pseudo predictor of c and has no effect on c.

f is not a proper function.

Page 9: Simulation and Application on learning gene causal relationships

Proper function (2)

• Definition:

With n predictors, the number of proper functions is given by:
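The formula itself did not survive in this transcript. As a hedged sketch, assuming "proper" means that the output depends on every one of the n predictors, the count can be obtained by brute force and checked against the standard inclusion-exclusion expression sum over k of (-1)^(n-k) * C(n, k) * 2^(2^k). The function names below are hypothetical, and the example f is an illustrative stand-in for the slide's lost example (it simplifies to c = a, so b is a pseudo predictor).

```python
from itertools import product
from math import comb


def is_proper(f, n):
    """True if each of the n inputs changes f's output for at least one assignment."""
    for i in range(n):
        if all(f(*bits) == f(*bits[:i], 1 - bits[i], *bits[i + 1:])
               for bits in product((0, 1), repeat=n)):
            return False  # input i is a pseudo predictor
    return True


def count_proper_brute_force(n):
    """Enumerate all 2^(2^n) truth tables and count the proper ones."""
    rows = list(product((0, 1), repeat=n))
    count = 0
    for outputs in product((0, 1), repeat=len(rows)):
        table = dict(zip(rows, outputs))
        if is_proper(lambda *bits: table[bits], n):
            count += 1
    return count


def count_proper_formula(n):
    """Inclusion-exclusion over the subsets of inputs the function may depend on."""
    return sum((-1) ** (n - k) * comb(n, k) * 2 ** (2 ** k) for k in range(n + 1))


def f(a, b):
    # Illustrative non-proper function: (a AND b) OR (a AND NOT b) simplifies to a.
    return (a & b) | (a & (1 - b))


print(is_proper(f, 2))                                   # b has no effect on f
print([count_proper_brute_force(n) for n in (1, 2, 3)])
print([count_proper_formula(n) for n in (1, 2, 3)])
```

Under this assumption the two counts agree for n = 1, 2, 3, which covers the "up to 3 causal parents" case used in the simulation.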

Page 10: Simulation and Application on learning gene causal relationships

Probability Distribution

Page 11: Simulation and Application on learning gene causal relationships

Generating dataset

Page 12: Simulation and Application on learning gene causal relationships

Steps of learning gene causal relationships

• Step 1: obtain the probability distribution and sample data;

• Step 2: apply algorithms to find causal relations;

• Step 3: compare the original and obtained networks based on the two notions of precision and recall;

• Step 4: repeat steps 1-3 for every random network;

Page 13: Simulation and Application on learning gene causal relationships

Comparing two networks

[Figure: the Original Network and the Obtained Network, each drawn over nodes A, B, C and D]

Page 14: Simulation and Application on learning gene causal relationships

Precision and Recall

• Original graph is a DAG, while obtained graph has both directed and undirected edges;

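[Table: edge-by-edge comparison of the original graph (Orig Graph) with the obtained graph (Obt. Graph); outcome labels: FN, TP, TN, FP, PFN, PTP, PTN, PFP]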

Recall = ATP/(AFN+ATP), Precision = ATP/(ATP + AFP)
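As a simplified sketch of the comparison, the following Python function computes precision and recall over directed edges only. It is an assumption-laden illustration: the surviving text does not define how the aggregated counts ATP, AFP and AFN combine the partial categories (PTP, PFP, PFN) that arise from undirected edges, so those are not modeled here, and the function name and example edges are hypothetical.

```python
def precision_recall(original_edges, obtained_edges):
    """Compare two sets of directed edges (parent, child).

    Simplification: only exact directed matches count as true positives;
    the slide's partial categories for undirected edges are not modeled.
    """
    original = set(original_edges)
    obtained = set(obtained_edges)
    tp = len(original & obtained)   # edges present in both graphs
    fp = len(obtained - original)   # edges learned but not in the original
    fn = len(original - obtained)   # original edges that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example networks over nodes A, B, C, D.
original = {("A", "C"), ("B", "C"), ("C", "D")}
obtained = {("A", "C"), ("C", "D"), ("A", "D")}
p, r = precision_recall(original, obtained)
print(p, r)
```

Here two of the three learned edges are correct and two of the three original edges are recovered, so both measures come out at two thirds.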

Page 15: Simulation and Application on learning gene causal relationships

Observational equivalence and Transitive Closure

• Two DAGs are said to be observationally equivalent (OE) if they have the same skeleton and the same set of v-structures;

[Figure: two observationally equivalent (OE) DAGs over nodes A, B, C and D]

Transitive closure (TC): A ->B -> C with A -> C

cc(x,y): is true if there is a directed or an undirected edge from x to y;

pcc(x,y): is true if there is a path from x to y consisting of properly directed and undirected edges

pcc(x,y) := cc(x,y) | (pcc(x,z) & pcc(z,y)) for some node z
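The recursive definition of pcc amounts to a transitive closure over cc. A minimal sketch in Python, assuming an undirected edge can be traversed in either direction and using a Warshall-style closure (the function name and the example graph are hypothetical):

```python
def pcc_closure(nodes, directed, undirected):
    """Return all pairs (x, y) for which pcc(x, y) holds.

    cc(x, y) holds for a directed edge x -> y, or for an undirected edge
    x - y (assumed usable in both directions).
    """
    cc = set(directed)
    for x, y in undirected:
        cc.add((x, y))
        cc.add((y, x))

    pcc = set(cc)
    # Warshall-style closure: allow paths through each intermediate node z in turn.
    for z in nodes:
        for x in nodes:
            if (x, z) in pcc:
                for y in nodes:
                    if (z, y) in pcc:
                        pcc.add((x, y))
    return pcc


# Hypothetical partially directed graph: A -> B, B - C, C -> D.
nodes = ["A", "B", "C", "D"]
closure = pcc_closure(nodes, directed={("A", "B"), ("C", "D")}, undirected={("B", "C")})
print(("A", "D") in closure)   # path A -> B - C -> D exists
print(("D", "A") in closure)   # no path against the arrow directions
```

Note that with undirected edges this closure also contains pairs such as (B, B), because B - C - B is a valid path under the assumption above.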

Page 16: Simulation and Application on learning gene causal relationships

Result for IC algorithm

Page 17: Simulation and Application on learning gene causal relationships

How to improve IC algorithm

• The original IC algorithm did not perform well at learning gene causal relationships;

• A possible way to improve the performance is to incorporate extra information;

• If we know the topological ordering of the regulatory network, it would be helpful to improve the learning result;

Page 18: Simulation and Application on learning gene causal relationships

Gene topological ordering

• If a specific gene is the causal parent of another gene;

• In a pathway, if one gene appears before another gene;

• If one gene is at the beginning or at the end of the pathway;

IC algorithm + topological ordering information

Page 19: Simulation and Application on learning gene causal relationships

mIC algorithm

• The mIC algorithm is based on IC, but combines topological ordering information with steady-state data to infer causality;

• 3 steps of the mIC algorithm:

– Find conditional independence: For each pair of genes gi and gj in a dataset, test pairwise independence. If they are dependent, search for a set Sij = {gk | gi and gj are independent given gk, with i<k<j or j<k<i}. Construct an undirected graph G such that gi and gj are connected with an edge if and only if they are pairwise dependent and no Sij can be found;

– Find v-structures: For each pair of nonadjacent genes gi and gj with a common neighbor gk, if gk ∉ Sij, and k>i, k>j, add arrowheads pointing at gk, such that gi -> gk <- gj;

– Orient more edges according to rules: Orient the undirected edges without creating new cycles or v-structures;
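The first two steps can be sketched as follows. This is a hedged illustration rather than the author's implementation: genes are assumed to be numbered 0..n-1 in topological order, the independence test is passed in as a hypothetical callable `indep(i, j, cond)`, the garbled condition "gk Sij" is read as "gk is not in Sij", and the third step (orienting the remaining undirected edges) is omitted.

```python
from itertools import combinations


def mic_skeleton_and_vstructures(n_genes, indep):
    """Steps 1-2 of the mIC procedure as described on the slide.

    indep(i, j, cond) is an assumed oracle: True if gene i and gene j are
    independent given the genes in the tuple cond (empty tuple = marginal).
    """
    adjacent = {g: set() for g in range(n_genes)}
    sepset = {}                                     # nonadjacent pair -> Sij

    # Step 1: build the undirected graph G.
    for i, j in combinations(range(n_genes), 2):    # always i < j
        if indep(i, j, ()):
            sepset[(i, j)] = set()                  # pairwise independent: no edge
            continue
        s_ij = {k for k in range(i + 1, j) if indep(i, j, (k,))}  # only i < k < j
        if s_ij:
            sepset[(i, j)] = s_ij                   # separated: no edge
        else:
            adjacent[i].add(j)
            adjacent[j].add(i)

    # Step 2: orient v-structures gi -> gk <- gj.
    directed = set()
    for (i, j), s_ij in sepset.items():
        for k in adjacent[i] & adjacent[j]:
            if k not in s_ij and k > i and k > j:
                directed.add((i, k))
                directed.add((j, k))

    undirected = {(i, j) for i in adjacent for j in adjacent[i]
                  if i < j and (i, j) not in directed and (j, i) not in directed}
    return undirected, directed


def collider_oracle(i, j, cond):
    # Hypothetical ground truth 0 -> 2 <- 1: genes 0 and 1 are marginally independent.
    return {i, j} == {0, 1} and cond == ()


undirected, directed = mic_skeleton_and_vstructures(3, collider_oracle)
print(undirected, directed)
```

In practice the oracle would be replaced by a statistical test on the steady-state data; the choice of test is not specified in the surviving slide text.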

Page 20: Simulation and Application on learning gene causal relationships

Results from mIC algorithm

Page 21: Simulation and Application on learning gene causal relationships

Melanoma dataset

• The 10 genes in this study were chosen from the 587 genes in the melanoma data;

• Previous studies have identified WNT5A as a gene of interest in melanoma;

• Controlling the influence of WNT5A in the regulation can reduce the chance of melanoma metastasizing;

Page 22: Simulation and Application on learning gene causal relationships

Applying mIC algorithm on Melanoma Dataset

WNT5A

Partial biological prior knowledge: MMP3 is expected to be at the end of the pathway

Pirin causatively influences WNT5A: to maintain the level of WNT5A, we need to control WNT5A directly or through pirin.

WNT5A directly causes MART-1

Page 23: Simulation and Application on learning gene causal relationships

Conclusion

• We evaluated the IC algorithm using simulated data;

• We presented the mIC algorithm, which can infer gene causal relationships from steady-state data with gene topological ordering information;

• We performed simulation based on Boolean networks to evaluate the performance of the causal algorithms;

• We applied the mIC algorithm to real biological microarray data (the melanoma dataset);

• The results showed that some of the important causal relationships associated with the WNT5A gene were identified using the mIC algorithm.