Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.

Inferring Causal Graphs

Computing 882

Simon Fraser University

Spring 2002

Applications of Bayes Nets (I)

• Windows Office “Paper Clip.”

• Bill Gates: “The competitive advantage of Microsoft lies in our expertise in Bayes Nets.”

• UBC Intelligent Tutoring System (ASI X-change).

Applications of Bayes Nets (II)

• University Drop-outs: Search program Tetrad II says that higher SAT score would lead to lower drop-out rate. Carnegie Mellon uses this to reduce its drop-out rates.

• Tetrad II recalibrates Mass Spectrometer on earth satellite.

• Tetrad II predicts relation between corn exports and exchange rates.

Bayes Nets: Basic Definitions

• Defn: A and B are independent iff P(A and B) = P(A) x P(B).

• Exercise: Prove that A and B are independent iff P(A|B) = P(A).

• Thus independence implies irrelevance.

Independence Among Variables

• Let X,Y,Z be random variables.• X is independent of Y iff

P(X=x| Y=y) = P(X=x) for all x,y s.t. P(Y=y) > 0.

• X is independent of Y given Z iff P(X=x|Y=y,Z=z) = P(Z=z) for all y,z s.t. P(Y=y and Z=z) >0.

• Notation: (X Y|Z).• Intuitively: given information Z,

Y is irrelevant to X.

Axioms for Informational Relevance

• Pearl (2000), p.11. It’s possible to read the symbol as “irrelevant”. Then we can consider a number of axioms for as axiomatizations of relevance, for example:

• Symmetry: if (X Y|Z) then (Y X|Z).

• Decomposition: if (X YW|Z) then (X Y|Z).

Markovian Parents

• In constructing a Bayes net, we look for “direct causes” – variables that “immediately determine” the value of another value. Such direct causes “screen off” other variables.

• Formally: Let an ordering of variables X1, …, Xn be given. Consider Xj. Let PA be any subset of X1,…,Xj-1. Suppose that P(Xj|PA) = P(Xj|X1,..,Xj) and that no subset of PA has this property. Then PA forms the Markovian parents of Xj.

Markovian Parents and Bayes Nets

• Given an ordering of variables, we can construct a causal graphs by drawing arrows between Markovian parents and children. Note that graphs are suitable for drawing the distinction between “direct” and “intermediate” causes.

• Exercise: For the variables in figure 1.2, construct a Bayes net in the given ordering.

• Exercise: Construct a Bayes net along the ordering (X5, X1, X3, X2, X4).

Independence in Bayes Nets

• Note how useful irrelevance information is – think of a Prolog-style logical database.

• A typical problem: Given some information Z, and a query about X, is Y relevant to X?

• For Bayes nets, the d-separation criterion is a powerful answer.

d-separation

• In principle, information can flow along any path between two variables X and Y.

• Provisos: A path is blocked by any collider.

• Conditioning on a node reverses its status.

– Conditioning on non-collider makes it block.

– Conditioning on collider or its descendant makes it unblocked.

d-separation characterizes independence

• If X,Y d-separated by Z in a DAG G, then (X Y|Z) in all probability distributions compatible with G.

• If X,Y not d-separated by Z in a DAG G, then not [(X Y|Z) in all probability distributions compatible with G.].

Observational Equivalence

• Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer etc.).

• How does prob constrain graph?• Two causal graphs G1,G2 are

compatible with the same probs iff.– G1 has the same adjacencies as

G2 and the same v-structures (basically, colliders).

Observational Equivalence: Examples

(I)• In sprinkler network, cannot tell

whether X1 -> X2 or vice versa. But can tell that X2 -> X4 and X4 -> X5.

• General note: You cannot always tell in machine learning what the correct hypothesis is even if you have all possible data -> need more assumptions or other kinds of data.

Observational Equivalence: Examples

(II)• Vancouver sun, March 29, 2002.

“Adolescents …. Are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the “chicken and egg” question: Does watching television cause aggression or do people prone to aggression watch more television?”

[Science, Dr. Johnson, Columbia U.]

Two Models of Aggressive behaviour

Disposition to aggression

Violent behavourTV watching

Disposition to aggression

Violent behavourTV watching

Are these two graphs observationally distinguishable?

Minimal Graphs

• A graph G is minimal for a probability distribution P iff– G is compatible with P, and– no subgraph of G is compatible

with P.

• Example: not minimal if A {B,C,D}

A

C D

B

Note on minimality

• Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B.

• In statistical tests, dependence is observable but independence is not.

• So minimality amounts to “assume independence until dependence is observed”.

• That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).

Stable Distributions

• A distribution P is stable iff there is a graph G such that (X Y |Z) in P iff X and Y are d-separated given Z in G.

• Intuitively, stability rules out “exact counterbalance”: two forces both having a causal effect but cancelling out each other exactly in every circumstance.

Inferring Causal Structure: The IC

Algorithm• Assume a stable probability

distribution P.

• Find a minimal graph for P with as many edges directed as possible.

• General idea: First find variables that are “directly causally related”. Connect those. Add arrows as far as possible.

Inferring Causal Structure: The IC

Algorithm

1. For each pair of variables X and Y, look for a “screen off” set S(X,Y) s.t. X Y| S(X,Y) holds. If there is no such set, add an undirected edge between X and Y.

2. For each pair X,Y with a common neighbour Z, check if Z is part of a “screening off” set S(X,Y). If not, make Z a common consequence of X,Y.

3. Orient edges without creating cycles or v-structures.

Rules for Orientation

• Given a b, b – c add b c if a,c are not linked (no new collider).

• Given a c b, a – b add a b (no cycle).

• Given a – c d and c d b and a – b add a b if c,d are not linked (no cycle + no new collider).

• Given a – c b and a – d b and a – b add a b if c,d are not linked (no cycle + no new collider).

Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.

Documents

Transcript of Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.