Weakening the Causal Faithfulness Assumption

Weakening the Causal Faithfulness Assumption Jiji Zhang Lingnan University Based on joint work with Peter Spirtes


Page 1

Weakening the Causal Faithfulness Assumption

Jiji Zhang, Lingnan University

Based on joint work with Peter Spirtes

Page 2

Markov and Faithfulness Assumptions

Suppose the set of observed variables V is causally sufficient and its causal structure can be properly represented by a DAG over V.

A statement of conditional independence is said to be entailed by a DAG if it is entailed by the Markov property of the DAG.

Causal Markov Assumption: Every conditional independence statement entailed by the causal DAG over V is satisfied by the joint distribution over V.

Causal Faithfulness Assumption: Every conditional independence statement satisfied by the joint distribution over V is entailed by the causal DAG over V.
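For a DAG, the conditional independence statements entailed by the Markov property are exactly its d-separation statements. The following is a minimal Python sketch (not from the slides) that lists them by brute force. It assumes a DAG is encoded as a dict mapping each variable to its set of parents, and uses the standard moralization criterion: X and Y are d-separated by S iff S separates them in the moral graph of the ancestors of {X, Y} ∪ S.

```python
from itertools import combinations


def ancestors(dag, nodes):
    """Return `nodes` together with all their ancestors in `dag` ({node: set of parents})."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(dag, x, y, s):
    """True iff x and y are d-separated by the set s (moralization criterion)."""
    s = set(s)
    keep = ancestors(dag, {x, y} | s)
    # Moralize the ancestral subgraph: link each node to its parents and marry co-parents.
    nbrs = {v: set() for v in keep}
    for v in keep:
        parents = dag[v]  # parents of an ancestral node are themselves in `keep`
        for p in parents:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in combinations(parents, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # Look for an undirected path from x to y that avoids s.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in nbrs[v] - s:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return True


def entailed_independencies(dag):
    """All (x, y, frozenset(s)) such that the DAG entails x ⊥ y | s."""
    nodes = sorted(dag)
    out = []
    for x, y in combinations(nodes, 2):
        rest = [v for v in nodes if v not in (x, y)]
        for k in range(len(rest) + 1):
            for s in combinations(rest, k):
                if d_separated(dag, x, y, s):
                    out.append((x, y, frozenset(s)))
    return out
```

For the chain X → Y → Z, `entailed_independencies({"X": set(), "Y": {"X"}, "Z": {"Y"}})` returns `[("X", "Z", frozenset({"Y"}))]`, i.e. only X ⊥ Z | Y; any statement beyond these that holds in the distribution is a violation of Faithfulness.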

Page 3

Simple Examples of Unfaithfulness

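(a) A complete DAG over X, Y and Z whose edges are marked +, + (the path through Y) and − (the direct edge between X and Z).

Entailed: none; Extra: X ⊥ Z.

(b) Y is a non-collider between X and Z, which are non-adjacent; X ∈ {0, 1}, Y ∈ {0, 1, 2}, Z ∈ {0, 1}.

Entailed: X ⊥ Z | Y; Extra: X ⊥ Z.

(c) X → Z ← Y.

Entailed: X ⊥ Y; Extra: X ⊥ Z; Y ⊥ Z.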
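The extra independencies in these three examples can be reproduced with a few lines of exact arithmetic. All parameters below are hypothetical, chosen only to produce the cancellations (the slides give none). The sketch also assumes orientations the transcript does not preserve: X → Y → Z plus X → Z for the signed triangle, and X → Y → Z for the example with a ternary Y. The collider uses Z = X XOR Y with fair coins.

```python
from fractions import Fraction
from itertools import product


def independent(joint, i, j):
    """Exact check that coordinates i and j of a joint pmf {outcome tuple: prob} are independent."""
    pij, pi, pj = {}, {}, {}
    for outcome, p in joint.items():
        a, b = outcome[i], outcome[j]
        pij[a, b] = pij.get((a, b), 0) + p
        pi[a] = pi.get(a, 0) + p
        pj[b] = pj.get(b, 0) + p
    return all(pij.get((a, b), 0) == pi[a] * pj[b] for a in pi for b in pj)


half = Fraction(1, 2)

# Signed triangle as a hypothetical linear SEM with Var(X) = 1 and independent errors:
#   Y = b*X + e1,  Z = a*Y + c*X + e2,  hence Cov(X, Z) = a*b + c.
a, b, c = 1, 1, -1          # signs +, + on the path through Y and - on the direct edge
cov_xz = a * b + c          # 0, so with Gaussian errors X and Z are independent

# Chain X -> Y -> Z with a ternary Y (hypothetical conditional probability tables).
p_y_given_x = {0: [half, 0, half],   # X = 0 makes Y = 0 or Y = 2
               1: [0, 1, 0]}         # X = 1 makes Y = 1
p_z1_given_y = [0, half, 1]          # P(Z = 1 | Y = y)
chain = {}
for x, y, z in product((0, 1), (0, 1, 2), (0, 1)):
    pz = p_z1_given_y[y] if z == 1 else 1 - p_z1_given_y[y]
    chain[x, y, z] = half * p_y_given_x[x][y] * pz

# Collider X -> Z <- Y with Z = X XOR Y and independent fair coins X, Y.
xor = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}
```

Here `independent(chain, 0, 2)` is True even though X depends on Y and Y depends on Z in the chain, and in the XOR collider each cause is independent of Z although both are adjacent to it.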

Page 4

Testing Faithfulness?

• Without knowing the true causal DAG, the Faithfulness assumption is not fully testable.

• But given the Markov assumption, the Faithfulness assumption has a testable consequence: the distribution of V is (Markov and) faithful to some DAG.

• Unfaithfulness is in principle detectable if the distribution is not faithful to any DAG.

It is undetectable if the distribution is faithful to some (false) DAG.

Page 5

SGS Algorithm

S1. Form the complete undirected graph H over V.

S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found.

S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent),

(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.

(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).

S4. More orientation rules …
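Steps S1 and S2 can be sketched in Python as below. The sketch assumes a hypothetical independence oracle `indep(x, y, s)` (in practice a statistical test) that returns True when x and y are independent conditional on the frozenset s. Note that S2 may try every subset of V\{X, Y} for every pair, so the number of tests grows exponentially with |V|.

```python
from itertools import combinations


def sgs_adjacencies(variables, indep):
    """Steps S1-S2 of SGS, returning {variable: set of adjacent variables}.

    `indep(x, y, s)` is a hypothetical oracle: True iff x and y are
    independent conditional on the frozenset s.
    """
    # S1: start from the complete undirected graph H over V.
    adj = {v: set(variables) - {v} for v in variables}
    # S2: remove x - y iff some S ⊆ V\{x, y} makes x and y independent.
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        if any(indep(x, y, frozenset(s))
               for k in range(len(rest) + 1)
               for s in combinations(rest, k)):
            adj[x].discard(y)
            adj[y].discard(x)
    return adj
```

With an oracle that reports only X ⊥ Z | Y, `sgs_adjacencies(["X", "Y", "Z"], indep)` returns `{"X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}`.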

Page 6

Justification of S2

S2. For each pair of variables X and Y, search for S ⊆ V\{X, Y} such that X and Y are independent conditional on S. Remove the edge between X and Y in H iff such a set is found.

• Inference of adjacencies is justified by the Markov assumption.

• Inference of non-adjacencies is justified by a consequence of the Faithfulness assumption.

Adjacency-Faithfulness: For every X, Y ∈ V, if X and Y are adjacent in the true causal DAG, then they are not independent conditional on any subset of V\{X, Y}.

Page 7

Justification of S3

S3. For each unshielded triple <X, Y, Z> (i.e., X and Y are adjacent, Y and Z are adjacent, but X and Z are not adjacent),

(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.

(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).

• (1) and (2) are both justified by the Markov assumption.

• What about the Faithfulness assumption?

Page 8

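Justification of S3 (cont’d)

• The antecedents of clauses (1) and (2) do not exhaust the logical possibilities: X and Z may be independent conditional on some subset of V\{X, Z} that contains Y and also on some subset that does not contain Y.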

• The remaining logical possibility is ruled out by the following consequence of the Faithfulness assumption:

Orientation-Faithfulness: For every unshielded triple <X, Y, Z> in the true causal DAG,

– If X → Y ← Z, then X and Z are not independent conditional on any subset of V\{X, Z} that contains Y.

– Otherwise, X and Z are not independent conditional on any subset of V\{X,Z} that does not contain Y.

[Figure: an unshielded triple X, Y, Z with Y a non-collider.]

Entailed: X ⊥ Z | Y; Extra: X ⊥ Z.

Page 9

First Weakening of Faithfulness

• It follows that, given the Markov and Adjacency-Faithfulness assumptions, violations of Orientation-Faithfulness are detectable, and there is a straightforward test:

S3*. For each unshielded triple <X, Y, Z>,

(1) If X and Z are not independent conditional on any subset of V\{X, Z} that contains Y, then mark the triple as a collider: X → Y ← Z.

(2) If X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y, then mark the triple as a non-collider (i.e., not X → Y ← Z).

(3) Otherwise, mark the triple as ambiguous or unfaithful.
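Step S3* for a single unshielded triple can be sketched as follows, again assuming a hypothetical oracle `indep(a, b, s)`. It records whether some separating set for X and Z contains Y and whether some separating set omits Y, then applies clauses (1) to (3) in order. Because X and Z were judged non-adjacent in S2, at least one kind of separating set exists.

```python
from itertools import combinations


def classify_triple(x, y, z, variables, indep):
    """Step S3* for an unshielded triple <x, y, z>, using a hypothetical oracle indep(a, b, s)."""
    rest = [v for v in variables if v not in (x, z)]
    # Does some S ⊆ V\{x, z} that contains / omits y make x and z independent?
    sep_with_y = sep_without_y = False
    for k in range(len(rest) + 1):
        for s in combinations(rest, k):
            if indep(x, z, frozenset(s)):
                if y in s:
                    sep_with_y = True
                else:
                    sep_without_y = True
    if not sep_with_y:
        return "collider"        # (1): no set containing y separates x and z
    if not sep_without_y:
        return "non-collider"    # (2): no set omitting y separates x and z
    return "ambiguous"           # (3): both kinds of separating set exist
```

If the oracle reports both X ⊥ Z and X ⊥ Z | Y (as in the unfaithful non-collider example earlier), the function returns `"ambiguous"` rather than committing to an orientation.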

Page 10

Conservative SGS

• Replace S3 with S3*, and we get what we call the Conservative SGS (CSGS) algorithm.

• The CSGS algorithm is correct under the causal Markov and Adjacency-Faithfulness assumptions.

• When Orientation-Faithfulness happens to hold, the output of CSGS is the same as that of SGS.

Page 11

E-pattern

• We call the (supposed) output of CSGS an extended pattern (e-pattern), which represents a set of patterns (each of which represents a Markov equivalence class of DAGs).

[Figure: an e-pattern over X, Y, Z, U and W, together with the patterns it represents.]
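A hedged sketch of how one e-pattern stands for several patterns: a pattern is fixed by its adjacencies and its unshielded colliders, so each way of resolving the ambiguous triples (as collider or non-collider) yields at most one pattern, i.e. at most 2^k patterns for k ambiguous triples. The triples used in the example call are hypothetical, and the check that a resolution is actually realized by some DAG is omitted.

```python
from itertools import product


def candidate_collider_sets(colliders, ambiguous):
    """Yield one set of collider triples per way of resolving the ambiguous triples.

    Together with the e-pattern's adjacencies, each yielded set describes at most
    one pattern; whether it is realized by some DAG is not checked here.
    """
    ambiguous = list(ambiguous)
    for choice in product((True, False), repeat=len(ambiguous)):
        chosen = {t for t, is_collider in zip(ambiguous, choice) if is_collider}
        yield set(colliders) | chosen
```

For example, one marked collider and two ambiguous triples give four candidate collider sets.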

Page 12

Violations of Adjacency-Faithfulness

• Some violations of Adjacency-Faithfulness are also detectable.

• Compare to an undetectable violation:

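[Figure: three causal graphs illustrating violations of Adjacency-Faithfulness.]

Graph over X, Y, Z. Extra: X ⊥ Z.

Graph over X, Y, Z. Extra: X ⊥ Z; Y ⊥ Z.

Graph over X, Y, Z, W. Extra: X ⊥ Z.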

Page 13

Triangle-Faithfulness

Triangle-Faithfulness: For every triangle <X, Y, Z> (i.e., X, Y and Z are pairwise adjacent) in the true causal DAG,

(1) If Y is a non-collider on the path <X, Y, Z>, then X and Z are not independent conditional on any subset of V\{X, Z} that does not contain Y.

(2) If Y is a collider on the path <X, Y, Z>, then X and Z are not independent conditional on any subset of V\{X, Z} that contains Y.

• Triangle-Faithfulness is weaker than Adjacency-Faithfulness.

[Figure: three graphs over X, Y and Z.]

Page 14

Further Weakening of Faithfulness

• Another weak condition entailed by the Adjacency-Faithfulness assumption is known as the causal Minimality condition: no proper subgraph of the true causal DAG satisfies the Markov condition with the joint distribution.

• Theorem: Given the causal Markov, Minimality and Triangle-Faithfulness assumptions, any violation of the Faithfulness assumption is detectable.

• What if we only make the Markov, Minimality and Triangle-Faithfulness assumptions?

Page 15

CSGS under the Weaker Assumptions

• Given the Markov assumption, in the adjacency step S2, the inferred adjacencies are still correct.

• The inferred non-adjacencies, however, are not necessarily correct, since Adjacency-Faithfulness is not assumed. (We mark the non-adjacencies as ‘apparent’.)

• Given the Markov and Triangle-Faithfulness assumptions, the orientation step S3* is still correct!

(For an ‘apparently’ unshielded triple <X, Y, Z>, either it is really unshielded or it is a triangle. In the former case, S3* is correct by the Markov assumption; in the latter case, S3* is correct by the Triangle-Faithfulness assumption.)

Page 16

Testing Adjacency-Faithfulness?

• Therefore, given only the Markov and Triangle-Faithfulness assumptions, CSGS is still correct, provided that we take the non-adjacencies in the output as uninformative.

• Can we somehow test Adjacency-Faithfulness, and confirm the non-adjacencies if the test comes out positive?

• What we have for now: take the output of CSGS and check the Markov condition for each pattern represented by the output. If every pattern satisfies the Markov condition, then the non-adjacencies are correct (assuming Minimality in addition to Markov and Triangle-Faithfulness).
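The final Markov check can be sketched as follows. Markov-equivalent DAGs entail the same independencies, so testing one DAG per pattern suffices. The sketch assumes a DAG encoded as a dict from each variable to its set of parents and a hypothetical pairwise oracle `indep(x, y, s)`. It checks the local Markov property (each variable independent of its non-descendants given its parents), which for DAGs is equivalent to the global d-separation property, and splits each set-valued statement into a chain of single-variable statements, which is equivalent under the semi-graphoid axioms that probabilistic independence satisfies.

```python
def descendants(dag, v):
    """Proper descendants of v in a DAG encoded as {node: set of parents}."""
    children = {u: {w for w in dag if u in dag[w]} for u in dag}
    found, stack = set(), [v]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def satisfies_markov(dag, indep):
    """Local Markov check: each v independent of its non-descendants given Pa(v).

    v ⊥ {w1, ..., wm} | Pa(v) is tested as the chain of pairwise statements
    v ⊥ w_i | Pa(v) ∪ {w1, ..., w_(i-1)}, equivalent under the semi-graphoid axioms.
    `indep` is a hypothetical pairwise independence oracle.
    """
    for v in dag:
        cond = set(dag[v])
        non_desc = sorted(set(dag) - cond - descendants(dag, v) - {v})
        for w in non_desc:
            if not indep(v, w, frozenset(cond)):
                return False
            cond.add(w)
    return True
```

With an oracle reporting only X ⊥ Z | Y, the chain X → Y → Z passes the check while the collider X → Z ← Y fails it (it would require X ⊥ Y).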

Page 17

Conjecture

• The condition above should be improvable: it is sufficient but not necessary for Adjacency-Faithfulness.

• A necessary condition for Adjacency-Faithfulness is: some pattern represented by the CSGS output satisfies the Markov condition.

• Conjecture: The necessary condition is also sufficient.

That is, assuming Markov, Minimality, and Triangle-Faithfulness, Adjacency-Faithfulness holds iff some pattern represented by the CSGS output satisfies the Markov condition.

Page 18

Still Further Weakening

• Let G and H be DAGs over V. H is an I-structure of G if every conditional independence entailed by G is also entailed by H. H is a proper I-structure of G if H is an I-structure of G but G is not an I-structure of H.

P-minimality assumption: No proper I-structure of the true causal DAG satisfies the Markov condition with the joint distribution.

• The causal Faithfulness assumption is equivalent to a conjunction of (1) the P-minimality assumption and (2) that the joint distribution is faithful to some DAG.

Page 19

Still Further Weakening (cont’d)

• The causal Faithfulness assumption is often regarded as a methodological assumption of simplicity, but that reading captures only part of its content, namely the P-minimality assumption.

• Violations of the P-minimality assumption are not detectable; given the P-minimality assumption, violations of (the rest of) the Faithfulness assumption are detectable.

• The causal (SGS-)minimality assumption plus the Triangle-Faithfulness assumption entail the P-minimality assumption.

• Conversely, the P-minimality assumption entails the causal (SGS-)minimality assumption, but does not entail Triangle-Faithfulness.

Page 20

Example

• Triangle-Faithfulness is violated, but P-minimality is not.

• Assuming Markov and P-minimality, the violation of Triangle-Faithfulness is detectable.

[Figure: a DAG over X, Y, Z and W, followed by six further graphs over the same variables.]

Entailed: Y ⊥ W | {X, Z}; Extra: X ⊥ Z | {Y, W}.

Page 21

Example (cont’d)

• I suspect that VCSGS (i.e., CSGS in which non-adjacencies are regarded as ambiguous unless a final check of the Markov condition confirms them) is also correct under the causal Markov and P-minimality assumptions.

[Figure: the DAG over X, Y, Z and W from the previous slide.]

Entailed: Y ⊥ W | {X, Z}; Extra: X ⊥ Z | {Y, W}.

[Figure: the output of CSGS over X, Y, Z and W.]

Page 22

Further Questions

• Are there feasible versions (or approximations) of these procedures?

• How about causal inference without causal sufficiency?

Page 23

PC and CPC

• The PC algorithm is a much more efficient version of SGS.

• The key efficiency-improving ideas are also applicable to CSGS (when Adjacency-Faithfulness is assumed to hold). The resulting algorithm was called Conservative PC (CPC).

• Joe Ramsey ran simulations and found that even when the Faithfulness assumption is true, CPC (1) makes significantly fewer errors than PC at moderate sample sizes; (2) outputs about as much correct information as PC does; and (3) runs almost as fast.

Page 24

Almost Unfaithfulness

• The reason, we think, is that CPC guards not only against strict failures of Orientation-Faithfulness but also against almost-violations.

• Intuitively, CPC suspends judgments when it detects “almost unfaithfulness” at a given sample size, just as it suspends judgments when it detects unfaithfulness in the large sample limit.
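A rough numerical picture of almost unfaithfulness, assuming the Fisher z test for a zero (partial) correlation as the independence test (a common choice for linear Gaussian data; the slides do not name a test). The correlation 0.01 and the sample sizes are hypothetical: a small but nonzero dependence is accepted as an independence at moderate n and only detected at much larger n.

```python
import math


def fisher_z_rejects(r, n, cond_size=0, z_crit=1.96):
    """Fisher z test of zero (partial) correlation r from n samples.

    Returns True when independence is rejected at roughly the 5% level.
    `cond_size` is the size of the conditioning set.
    """
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - cond_size - 3)
    return abs(z) > z_crit
```

With r = 0.01, `fisher_z_rejects(0.01, 100)` is False (z ≈ 0.1) while `fisher_z_rejects(0.01, 1_000_000)` is True (z ≈ 10), so at n = 100 this dependence behaves like an unfaithful independence.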

Page 25

Uniform Consistency

• A negative result due to Robins et al. (2003) is that, under the Causal Markov and Faithfulness assumptions, causal inference can be pointwise consistent but not uniformly consistent.

• The basis of their proof is related to almost unfaithfulness.

Page 26

Uniform Consistency of Inferring Causal Direction

• Suppose that we have the right adjacencies, and use procedures like PC to infer causal directions.

• Robins et al.’s results do not apply here.

• But we can still show that the PC procedure is not uniformly consistent in the inference of causal direction given the right adjacencies.

Page 27

Uniform Consistency of Inferring Causal Direction (cont’d)

• Our argument is based on a theorem that no procedure can be uniformly consistent in, for example, deciding between an unshielded collider (X → Y ← Z) and an unshielded non-collider without sometimes suspending judgments.

• This argument does not apply to CPC, and we can show that CPC can be made uniformly consistent in its inference of causal directions (given the right adjacencies).

Page 28

References

P. Spirtes and J. Zhang (forthcoming) “A uniformly consistent estimator of causal effects under the k-triangle-faithfulness assumption”, Statistical Science.

J. Zhang (2013) “A comparison of three Occam’s razors for Markovian causal models”, British Journal for the Philosophy of Science, 64(2): 423-448.

J. Zhang (2008) “Error probabilities for the inference of causal direction”, Synthese 163: 409-418.

J. Zhang and P. Spirtes (2008) “Detection of unfaithfulness and robust causal inference”, Minds and Machines 18(2): 239-271.

J. Ramsey, P. Spirtes, and J. Zhang (2006) “Adjacency-faithfulness and conservative causal inference”, UAI proceedings: 401-408.