Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.
-
date post
21-Dec-2015 -
Category
Documents
-
view
214 -
download
0
Transcript of Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.
Applications of Bayes Nets (I)
• Windows Office “Paper Clip.”
• Bill Gates: “The competitive advantage of Microsoft lies in our expertise in Bayes Nets.”
• UBC Intelligent Tutoring System (ASI X-change).
Applications of Bayes Nets (II)
• University Drop-outs: Search program Tetrad II says that higher SAT score would lead to lower drop-out rate. Carnegie Mellon uses this to reduce its drop-out rates.
• Tetrad II recalibrates Mass Spectrometer on earth satellite.
• Tetrad II predicts relation between corn exports and exchange rates.
Bayes Nets: Basic Definitions
• Defn: A and B are independent iff P(A and B) = P(A) x P(B).
• Exercise: Prove that A and B are independent iff P(A|B) = P(A).
• Thus independence implies irrelevance.
Independence Among Variables
• Let X,Y,Z be random variables.• X is independent of Y iff
P(X=x| Y=y) = P(X=x) for all x,y s.t. P(Y=y) > 0.
• X is independent of Y given Z iff P(X=x|Y=y,Z=z) = P(Z=z) for all y,z s.t. P(Y=y and Z=z) >0.
• Notation: (X Y|Z).• Intuitively: given information Z,
Y is irrelevant to X.
Axioms for Informational Relevance
• Pearl (2000), p.11. It’s possible to read the symbol as “irrelevant”. Then we can consider a number of axioms for as axiomatizations of relevance, for example:
• Symmetry: if (X Y|Z) then (Y X|Z).
• Decomposition: if (X YW|Z) then (X Y|Z).
Markovian Parents
• In constructing a Bayes net, we look for “direct causes” – variables that “immediately determine” the value of another value. Such direct causes “screen off” other variables.
• Formally: Let an ordering of variables X1, …, Xn be given. Consider Xj. Let PA be any subset of X1,…,Xj-1. Suppose that P(Xj|PA) = P(Xj|X1,..,Xj) and that no subset of PA has this property. Then PA forms the Markovian parents of Xj.
Markovian Parents and Bayes Nets
• Given an ordering of variables, we can construct a causal graphs by drawing arrows between Markovian parents and children. Note that graphs are suitable for drawing the distinction between “direct” and “intermediate” causes.
• Exercise: For the variables in figure 1.2, construct a Bayes net in the given ordering.
• Exercise: Construct a Bayes net along the ordering (X5, X1, X3, X2, X4).
Independence in Bayes Nets
• Note how useful irrelevance information is – think of a Prolog-style logical database.
• A typical problem: Given some information Z, and a query about X, is Y relevant to X?
• For Bayes nets, the d-separation criterion is a powerful answer.
d-separation
• In principle, information can flow along any path between two variables X and Y.
• Provisos: A path is blocked by any collider.
• Conditioning on a node reverses its status.
– Conditioning on non-collider makes it block.
– Conditioning on collider or its descendant makes it unblocked.
d-separation characterizes independence
• If X,Y d-separated by Z in a DAG G, then (X Y|Z) in all probability distributions compatible with G.
• If X,Y not d-separated by Z in a DAG G, then not [(X Y|Z) in all probability distributions compatible with G.].
Observational Equivalence
• Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer etc.).
• How does prob constrain graph?• Two causal graphs G1,G2 are
compatible with the same probs iff.– G1 has the same adjacencies as
G2 and the same v-structures (basically, colliders).
Observational Equivalence: Examples
(I)• In sprinkler network, cannot tell
whether X1 -> X2 or vice versa. But can tell that X2 -> X4 and X4 -> X5.
• General note: You cannot always tell in machine learning what the correct hypothesis is even if you have all possible data -> need more assumptions or other kinds of data.
Observational Equivalence: Examples
(II)• Vancouver sun, March 29, 2002.
“Adolescents …. Are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the “chicken and egg” question: Does watching television cause aggression or do people prone to aggression watch more television?”
[Science, Dr. Johnson, Columbia U.]
Two Models of Aggressive behaviour
Disposition to aggression
Violent behavourTV watching
Disposition to aggression
Violent behavourTV watching
Are these two graphs observationally distinguishable?
Minimal Graphs
• A graph G is minimal for a probability distribution P iff– G is compatible with P, and– no subgraph of G is compatible
with P.
• Example: not minimal if A {B,C,D}
A
C D
B
Note on minimality
• Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B.
• In statistical tests, dependence is observable but independence is not.
• So minimality amounts to “assume independence until dependence is observed”.
• That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).
Stable Distributions
• A distribution P is stable iff there is a graph G such that (X Y |Z) in P iff X and Y are d-separated given Z in G.
• Intuitively, stability rules out “exact counterbalance”: two forces both having a causal effect but cancelling out each other exactly in every circumstance.
Inferring Causal Structure: The IC
Algorithm• Assume a stable probability
distribution P.
• Find a minimal graph for P with as many edges directed as possible.
• General idea: First find variables that are “directly causally related”. Connect those. Add arrows as far as possible.
Inferring Causal Structure: The IC
Algorithm
1. For each pair of variables X and Y, look for a “screen off” set S(X,Y) s.t. X Y| S(X,Y) holds. If there is no such set, add an undirected edge between X and Y.
2. For each pair X,Y with a common neighbour Z, check if Z is part of a “screening off” set S(X,Y). If not, make Z a common consequence of X,Y.
3. Orient edges without creating cycles or v-structures.
Rules for Orientation
• Given a b, b – c add b c if a,c are not linked (no new collider).
• Given a c b, a – b add a b (no cycle).
• Given a – c d and c d b and a – b add a b if c,d are not linked (no cycle + no new collider).
• Given a – c b and a – d b and a – b add a b if c,d are not linked (no cycle + no new collider).