From Association Analysis to Causal Discovery Prof Jiuyong Li University of South Australia.



Association analysis

• Diapers -> Beer
• Bread & Butter -> Milk

Positive correlation between birth rate and stork population

• Would increasing the stork population increase the birth rate?

Further evidence that causality ≠ association: Simpson's paradox

All         Recovered  Not recovered  Sum  Recovery rate
Drug            20          20         40       50%
No Drug         16          24         40       40%
Total           36          44         80

Female      Recovered  Not recovered  Sum  Recovery rate
Drug             2           8         10       20%
No Drug          9          21         30       30%
Total           11          29         40

Male        Recovered  Not recovered  Sum  Recovery rate
Drug            18          12         30       60%
No Drug          7           3         10       70%
Total           25          15         40

Association and Causal Relationship

• Two variables X and Y.
• If Prob(Y | X) ≠ Prob(Y), X is associated with Y (association rules).
• In general, Prob(Y | do X) ≠ Prob(Y | X).
• Prob(Y | do X) asks: how does Y vary when X is changed?
• The key question: how do we estimate Prob(Y | do X)?
• In association analysis, the relationship between X and Y is analysed in isolation.
• However, the relationship between X and Y is affected by other variables.
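The gap between Prob(Y | X) and Prob(Y | do X) can be made concrete with the Simpson paradox counts above. The sketch below assumes, for illustration only, that gender is the only other variable affecting both treatment and recovery, so the standard back-door adjustment Prob(Y | do X) = Σ_g Prob(Y | X, g) Prob(g) applies. The slides do not state this causal model, and the function names are hypothetical.

```python
# (recovered, not recovered) counts per gender and treatment, from the Simpson tables.
COUNTS = {
    ("female", "drug"): (2, 8),
    ("female", "no drug"): (9, 21),
    ("male", "drug"): (18, 12),
    ("male", "no drug"): (7, 3),
}
GENDERS = ("female", "male")
TREATMENTS = ("drug", "no drug")


def p_recover_given(treatment: str) -> float:
    """Observational Prob(recovered | treatment), pooling both genders."""
    recovered = sum(COUNTS[(g, treatment)][0] for g in GENDERS)
    total = sum(sum(COUNTS[(g, treatment)]) for g in GENDERS)
    return recovered / total


def p_recover_do(treatment: str) -> float:
    """Prob(recovered | do(treatment)) by back-door adjustment for gender (assumed confounder)."""
    n = sum(sum(c) for c in COUNTS.values())
    result = 0.0
    for g in GENDERS:
        n_g = sum(sum(COUNTS[(g, t)]) for t in TREATMENTS)  # patients of gender g
        rec, not_rec = COUNTS[(g, treatment)]
        result += rec / (rec + not_rec) * n_g / n  # Prob(Y | X, g) * Prob(g)
    return result


for t in TREATMENTS:
    print(t, round(p_recover_given(t), 2), round(p_recover_do(t), 2))
# drug 0.5 0.4
# no drug 0.4 0.5
```

Pooled, the drug looks better (0.5 against 0.4); after adjusting for gender the ordering reverses (0.4 against 0.5), in line with the within-gender tables.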


Causal discovery 2

• Bayesian network based causal inference
  – Do-calculus (Pearl 2000)
  – IDA (Maathuis et al. 2009)
  – Infer causal effects in a Bayesian network.
  – However:
    – Learning a Bayesian network from data is NP-hard.
    – Low scalability to a large number of variables.

Learning causal structures

• PC algorithm (Spirtes, Glymour and Scheines)
  – If A and B are dependent given every conditioning set Z, i.e. there is no Z with A ⫫ B | Z, there is an edge between A and B.
  – The search space increases exponentially with the number of variables.
• Constraint based search
  – CCC (G. F. Cooper, 1997)
  – CCU (C. Silverstein et al. 2000)
  – Efficiently removes non-causal relationships.

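[Figure: the CCU rule. A and B are dependent, B and C are dependent, and A and C are independent, giving A → B ← C.]

[Figure: the CCC rule. A, B and C are pairwise dependent and A ⫫ C | B, giving three candidate structures: A → B → C, A ← B ← C and A ← B → C.]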
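The two constraint rules can be written as a small decision function. This is a minimal sketch, assuming the dependence and conditional-independence decisions have already been made by some statistical test (for example a chi-square test on binary data); the function name `ccc_ccu` and its arguments are hypothetical.

```python
from typing import List


def ccc_ccu(dep_ab: bool, dep_bc: bool, dep_ac: bool, indep_ac_given_b: bool) -> List[str]:
    """Candidate causal structures for the triple (A, B, C).

    Each flag is the (assumed) outcome of an independence test:
    dep_xy = True means X and Y were found dependent.
    """
    # CCU: A-B and B-C dependent, A-C independent -> B is a common effect.
    if dep_ab and dep_bc and not dep_ac:
        return ["A -> B <- C"]
    # CCC: all pairs dependent, but A and C independent given B ->
    # B lies between A and C; the direction cannot be resolved.
    if dep_ab and dep_bc and dep_ac and indep_ac_given_b:
        return ["A -> B -> C", "A <- B <- C", "A <- B -> C"]
    # Neither rule applies: no causal claim is made about the triple.
    return []


print(ccc_ccu(True, True, False, False))  # ['A -> B <- C']
```

Because each rule only inspects triples, the search stays cheap, which is the efficiency argument made for CCC and CCU on the slide.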

Association rules

• Many efficient algorithms.
• Hundreds of thousands to millions of rules.
  – Many are spurious.
• Interpretability
  – Association rules do not indicate causal effects.

Causal rules

• Discover causal relationships using partial association and simulated cohort study.
• Do not rely on Bayesian network structure learning. The discovery of causal rules also has strong theoretical support.
• Discover both single causes and combined causes.
• Can be discovered efficiently.

• Z. Jin, J. Li, L. Liu, T. D. Le, B. Sun, and R. Wang. Discovery of causal rules using partial association. ICDM, 2012.

• J. Li, T. D. Le, L. Liu, J. Liu, Z. Jin, and B. Sun. Mining causal association rules. In Proceedings of ICDM Workshop on Causal Discovery (CD), 2013.

Problem

A B C D E F Y #repeats

1 1 1 1 1 1 1 14

1 0 1 1 1 1 1 8

1 1 0 1 0 1 1 15

0 1 1 1 1 1 1 8

0 1 0 0 0 0 0 5

0 0 0 0 1 0 1 6

1 0 0 0 0 1 0 4

1 0 1 1 1 0 0 3

0 1 0 1 1 0 0 3

0 1 0 0 1 0 0 5

Discover causal rules from large databases of binary variables

A → Y, C → Y, BF → Y, DE → Y

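Partial association test

[Figure: variables I and J with a third variable set K; the association between I and J is examined within each stratum of K.]

M. W. Birch, 1964.

Nonzero partial association: the association between I and J persists after controlling for K.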

Partial association test – an example

4. Partial association test.

A B C D E F Y G #repeats

1 1 1 1 1 1 1 0 14

1 0 1 1 1 1 1 0 8

1 1 0 1 0 1 1 0 15

0 1 1 1 1 1 1 0 8

0 1 0 0 0 0 0 0 5

0 0 0 0 1 0 1 0 6

1 0 0 0 0 1 0 0 4

1 0 1 1 1 0 0 0 3

1 1 1 1 0 1 1 1 3

0 1 0 0 1 0 0 0 5

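$$PA(X,Y,K)=\frac{\left(\left|\sum_{k}\frac{n_{11k}\,n_{22k}-n_{21k}\,n_{12k}}{n_{..k}}\right|-\frac{1}{2}\right)^{2}}{\sum_{k}\frac{n_{1.k}\,n_{2.k}\,n_{.1k}\,n_{.2k}}{n_{..k}^{2}\,(n_{..k}-1)}}$$

where, within stratum $k$ (one value combination of the variables in $K$), $n_{ijk}$ is the count in row $i$ (X = 1, X = 0) and column $j$ (Y = 1, Y = 0) of the 2 × 2 table of X against Y, and a dot denotes the total over that index.

Example: only the stratum A = C = D = E = 1 contains records with both BF = 1 and BF = 0, so it is the only stratum that contributes:

$$PA(BF,Y,ACDE)=\frac{\left(\left|\frac{14\cdot 3-8\cdot 0}{25}\right|-\frac{1}{2}\right)^{2}}{\frac{14\cdot 11\cdot 22\cdot 3}{25^{2}\,(25-1)}}=\frac{(1.68-0.5)^{2}}{0.6776}\approx 2.05$$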
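The statistic can be computed directly from the weighted example table. The sketch below is our illustration, not the authors' implementation; it assumes a combined variable such as BF equals 1 only when all of its component variables are 1, and the helper names `values` and `partial_association` are hypothetical. It reproduces the worked example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

COLUMNS = "ABCDEFYG"

# Example table: (values of A, B, C, D, E, F, Y, G as a 0/1 string, #repeats).
DATA: List[Tuple[str, int]] = [
    ("11111110", 14), ("10111110", 8), ("11010110", 15), ("01111110", 8),
    ("01000000", 5), ("00001010", 6), ("10000100", 4), ("10111000", 3),
    ("11110111", 3), ("01001000", 5),
]


def values(row: str, cols: str) -> Tuple[int, ...]:
    """Values of the named columns in one record."""
    return tuple(int(row[COLUMNS.index(c)]) for c in cols)


def partial_association(data: List[Tuple[str, int]], x: str, y: str, k: str) -> float:
    """Continuity-corrected PA(X, Y, K), summing over the strata of K."""
    # One 2x2 table [n11, n12, n21, n22] per value combination of K;
    # only combinations that occur in the data are ever created.
    tables: Dict[Tuple[int, ...], List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for row, count in data:
        x_val = int(all(values(row, x)))  # assumed: combined X = 1 iff all items are 1
        y_val = values(row, y)[0]
        cell = 2 * (1 - x_val) + (1 - y_val)  # (X,Y)=(1,1)->n11, (1,0)->n12, (0,1)->n21, (0,0)->n22
        tables[values(row, k)][cell] += count

    numerator = 0.0
    denominator = 0.0
    for n11, n12, n21, n22 in tables.values():
        n = n11 + n12 + n21 + n22
        if n < 2:  # a single record adds nothing and would divide by zero
            continue
        numerator += (n11 * n22 - n21 * n12) / n
        denominator += (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22) / (n * n * (n - 1))
    if denominator == 0:
        return 0.0  # every stratum lacks variation in X or in Y: no evidence
    return (abs(numerator) - 0.5) ** 2 / denominator


print(round(partial_association(DATA, "BF", "Y", "ACDE"), 2))  # 2.05
```

The result is then compared with a chi-square critical value with one degree of freedom; the slides do not state which significance level is used.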

Fast partial association test

• K ranges over all possible value combinations of the control variables; their number is very large (2^|K| for |K| binary variables).
• Counting the frequencies of the combinations is also time consuming.
• Our solution:
  – Sort the data and count the frequencies of the equivalence classes.
  – Only use the combinations that exist in the data set.
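Sorting and counting equivalence classes can be sketched with the standard library, assuming records are tuples of 0/1 values. This is an illustration of the idea, not the authors' code; the function name is hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def equivalence_classes(records: List[Tuple[int, ...]], key_idx: Sequence[int]) -> Dict[Tuple[int, ...], int]:
    """Count records per value combination of the variables at positions key_idx."""
    def key(r: Tuple[int, ...]) -> Tuple[int, ...]:
        return tuple(r[i] for i in key_idx)

    counts: Dict[Tuple[int, ...], int] = {}
    # After sorting, identical combinations are adjacent, so one scan counts
    # every class; combinations absent from the data never appear.
    for combo, group in groupby(sorted(records, key=key), key=key):
        counts[combo] = sum(1 for _ in group)
    return counts


records = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (1, 0, 1)]
print(equivalence_classes(records, [0, 1]))  # {(0, 1): 1, (1, 0): 3}
```

Only two of the four possible combinations of the first two variables appear, so only two strata need a contingency table.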

Pruning strategies

Definition (Redundant causal rules): Assume that X ⊂ W. If X → Y is a causal rule, the rule W → Y is redundant, as it does not provide new information.

Definition (Condition for testing causal rules): We only test a combined causal rule XV → Y if X and Y have a zero association and V and Y have a zero association (i.e. they cannot pass the chi-square test in step 3).
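Both pruning definitions can be applied when generating candidate combined rules. The sketch below simplifies the testing condition by requiring every component variable to have zero association with Y; the function name and inputs are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def combined_candidates(zero_assoc: Set[str], causes: Iterable[FrozenSet[str]], size: int) -> List[FrozenSet[str]]:
    """Candidate left-hand sides of combined rules of the given size.

    zero_assoc: variables with zero association with Y (failed the chi-square test).
    causes: left-hand sides of causal rules found so far.
    """
    found = list(causes)
    candidates = []
    for combo in combinations(sorted(zero_assoc), size):
        lhs = frozenset(combo)
        # Redundancy: if X -> Y is causal and X is a subset of W, W -> Y is redundant.
        if any(cause <= lhs for cause in found):
            continue
        candidates.append(lhs)
    return candidates


causes = [frozenset({"B", "F"})]
print(sorted("".join(sorted(c)) for c in combined_candidates({"B", "D", "E", "F"}, causes, 3)))
# ['BDE', 'DEF']
```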

Algorithm

A B C D E F G Y #repeats

1 1 1 1 1 1 0 1 14

1 0 1 1 1 1 0 1 8

1 1 0 1 0 1 0 1 15

0 1 1 1 1 1 0 1 8

0 1 0 0 0 0 0 0 5

0 0 0 0 1 0 0 1 6

1 0 0 0 0 1 0 0 4

1 0 1 1 1 0 0 0 3

1 1 1 1 1 1 1 0 3

0 1 0 0 1 0 0 0 5

1. Prune the variable set (support).

2. Create the contingency table for each variable X:

         Y=1   Y=0   Total
X=1      n11   n12   n1.
X=0      n21   n22   n2.
Total    n.1   n.2   n

3. Calculate the chi-square statistic

$$\chi^{2}_{X,Y}=\sum_{i=1}^{2}\sum_{j=1}^{2}\frac{\left(n_{ij}-E(n_{ij})\right)^{2}}{E(n_{ij})},\qquad E(n_{ij})=\frac{n_{i.}\,n_{.j}}{n}$$

• If $\chi^{2}_{X,Y} \ge \chi^{2}_{\alpha}$ (positive association), go to the next step.
• If X and Y have zero association, move X to a set N.

4. Partial association test.
• If PA(X, Y, K) is nonzero ($PA(X,Y,K) \ge \chi^{2}_{\alpha}$), then X → Y is a causal rule.

5. Repeat steps 1 to 4 for each variable that is a combination of variables in set N.
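Step 3 for a single variable can be sketched with the Pearson formula above. The function name is hypothetical; the counts for X = A are read off the algorithm table (A = 1: 37 records with Y = 1 and 10 with Y = 0; A = 0: 14 with Y = 1 and 10 with Y = 0).

```python
from typing import List


def chi_square_2x2(n11: int, n12: int, n21: int, n22: int) -> float:
    """Pearson chi-square statistic for the 2x2 table of X (rows) against Y (columns).

    All four margins must be positive, otherwise an expected count is zero.
    """
    table: List[List[int]] = [[n11, n12], [n21, n22]]
    n = n11 + n12 + n21 + n22
    row_totals = [n11 + n12, n21 + n22]  # n1., n2.
    col_totals = [n11 + n21, n12 + n22]  # n.1, n.2
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # E(n_ij) = n_i. * n_.j / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


print(round(chi_square_2x2(37, 10, 14, 10), 2))  # 3.26
```

The statistic is compared with the critical value for one degree of freedom (3.84 at α = 0.05); the slides do not state which α the method uses.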

Experimental evaluations

• We use the Arrhythmia data set from the UCI machine learning repository.
  – The task is to classify the presence or absence of cardiac arrhythmia. The data set contains 452 records; each record has 279 attributes and one class attribute.
• Our results are largely consistent with the results of the CCC method.
• Some rules found by CCC are removed by our method because they cannot pass the partial association test.
• Our method can discover combined rules; the CCC and CCU methods are not designed to discover these rules.

Comparison with CCC and CCU

Experimental evaluations

Figure 1: Extraction Time Comparison (20K Records)

Figure 1: Extraction Time Comparison (100K Records)

Summary 1

• Simpson paradox
  – Associations might be inconsistent in subsets.
• Partial association test
  – Tests the persistency of associations in all possible partitions.
  – Statistically sound.
  – Efficient on sparse data.
• What else?

Cohort study 1

[Diagram: a defined population is split into an exposed group and a not-exposed group; each group contains people who have the disease and people who do not.]

• Prospective: follow up.
• Retrospective: look back (historic study).

Cohort study 2

• Cohorts share common characteristics but are either exposed or not exposed.
• Determine how the exposure causes an outcome.
• Measure: odds ratio = (a/b) / (c/d)

              Diseased   Healthy
Exposed          a          b
Not exposed      c          d
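The odds ratio of the table above can be computed in its equivalent cross-product form (a·d)/(b·c), which avoids dividing by zero when c = 0. A minimal sketch with a hypothetical function name; the usage line borrows the all-patients Simpson table, treating the drug as the exposure and recovery as the outcome column.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio (a/b) / (c/d), computed as the equivalent (a*d) / (b*c).

    a: exposed, diseased        b: exposed, healthy
    c: not exposed, diseased    d: not exposed, healthy
    """
    numerator, denominator = a * d, b * c
    if denominator == 0:
        # Unbounded when a*d > 0; undefined when a*d is also 0.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


print(odds_ratio(20, 20, 16, 24))  # 1.5
```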

Limitations of cohort study

• Need a hypothesis beforehand.
• Domain experts determine the control variables.
• Collect data and test the hypothesis.
• Not for data exploration.

• We need:
  – A data set without any hypotheses.
  – An automatic method to find and validate hypotheses.
  – For data exploration.

Control variables

• If we do not control covariates (especially those correlated with the outcome), we cannot determine the true cause.
• Too many control variables result in too few matched cases in the data.
  – How many people share the same race, gender, blood type, hair colour, eye colour, education level, …?
• Irrelevant variables should not be controlled.
  – Eye colour may not be relevant to the study.

[Diagram: Cause, Outcome, and Other factors]

Matches

• Exact matching
  – Exact matches on all covariates. Infeasible.
• Limited exact matching
  – Exact matches on a few key covariates.
• Nearest neighbour matching
  – Find the closest neighbours.
• Propensity score matching
  – Match on the predicted probability of receiving the treatment given the covariates.
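Nearest neighbour matching can be sketched for binary covariates. The distance (Hamming), greedy 1:1 pairing and matching without replacement are our assumptions; the slide names only the technique, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of covariates on which two records differ."""
    return sum(a != b for a, b in zip(u, v))


def nearest_neighbour_match(
    treated: List[Sequence[int]], control: List[Sequence[int]]
) -> List[Tuple[int, Optional[int]]]:
    """Greedy 1:1 nearest neighbour matching without replacement.

    Returns (treated index, control index) pairs; the control index is None
    when no control record is left. Ties go to the earliest control record.
    """
    available = list(range(len(control)))
    pairs: List[Tuple[int, Optional[int]]] = []
    for t_idx, t in enumerate(treated):
        if not available:
            pairs.append((t_idx, None))
            continue
        best = min(available, key=lambda c: hamming(t, control[c]))
        available.remove(best)
        pairs.append((t_idx, best))
    return pairs


treated = [(1, 1, 0), (0, 0, 1)]
control = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]
print(nearest_neighbour_match(treated, control))  # [(0, 1), (1, 2)]
```

Greedy matching depends on the order of the treated records; an optimal assignment would avoid that at a higher cost.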

Method 1

A B C D E F Y

1 1 1 1 1 1 1

1 0 1 1 1 1 1

1 1 0 1 0 1 1

0 1 1 1 1 1 1

0 1 0 0 0 0 0

0 0 0 0 1 0 1

1 0 0 0 0 1 0

1 0 1 1 1 0 0

0 1 0 1 1 0 0

0 1 0 0 1 0 0

Discover causal association rules from large databases of binary variables

A → Y

A B C D E F Y

1 1 1 1 1 1 1

1 0 1 0 1 1 1

1 1 0 1 0 1 0

1 0 1 0 1 0 0

0 1 1 1 1 1 0

0 0 1 0 1 1 0

0 1 0 1 0 1 1

0 0 1 0 1 0 1

Fair dataset

Methods

A B C D E F Y

1 1 1 1 1 1 1

1 0 1 0 1 1 1

1 1 0 1 0 1 0

1 0 1 0 1 0 0

0 1 1 1 1 1 0

0 0 1 0 1 1 0

0 1 0 1 0 1 1

0 0 1 0 1 0 1

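Fair dataset

• A: exposure variable.
• {B, C, D, E, F}: controlled variable set.
• A record with A = 1 and a record with A = 0 that have the same values on the controlled variable set (shown in the same colour on the slide) are called a matched record pair.

Counts of matched record pairs, by the outcome of the A = 1 record (rows) and of the A = 0 record (columns):

              A=0: Y=1   A=0: Y=0
A=1: Y=1         n11        n12
A=1: Y=0         n21        n22

• An association rule A → Y is a causal association rule if $\mathrm{OddsRatio}_{f_D}(A \rightarrow Y) > 1$, where $f_D$ denotes the fair dataset.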
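Building the fair dataset and the matched-pair counts can be sketched as follows. The sketch assumes exact matching on the controlled variables without replacement, and uses the conventional matched-pairs odds ratio n12 / n21 (the ratio of discordant pairs); we assume this is what the odds ratio on the fair dataset denotes, since the slide does not give the formula. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Rows of the example table: values of A, B, C, D, E, F, Y.
ROWS: List[Tuple[int, ...]] = [
    (1, 1, 1, 1, 1, 1, 1), (1, 0, 1, 0, 1, 1, 1),
    (1, 1, 0, 1, 0, 1, 0), (1, 0, 1, 0, 1, 0, 0),
    (0, 1, 1, 1, 1, 1, 0), (0, 0, 1, 0, 1, 1, 0),
    (0, 1, 0, 1, 0, 1, 1), (0, 0, 1, 0, 1, 0, 1),
]


def fair_dataset_counts(
    rows: List[Tuple[int, ...]],
    exposure: int = 0,
    controls: Sequence[int] = (1, 2, 3, 4, 5),
    outcome: int = 6,
) -> Tuple[int, int, int, int]:
    """Exact-match each exposed record to one unused unexposed record with the
    same controlled-variable values; return (n11, n12, n21, n22), indexed by the
    outcome of the exposed record (row) and of its unexposed partner (column)."""
    unexposed: Dict[Tuple[int, ...], List[Tuple[int, ...]]] = defaultdict(list)
    for r in rows:
        if r[exposure] == 0:
            unexposed[tuple(r[i] for i in controls)].append(r)

    n = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for r in rows:
        if r[exposure] != 1:
            continue
        bucket = unexposed[tuple(r[i] for i in controls)]
        if bucket:  # a partner exists; pop it so it is used only once
            partner = bucket.pop()
            n[(r[outcome], partner[outcome])] += 1
    return n[(1, 1)], n[(1, 0)], n[(0, 1)], n[(0, 0)]


n11, n12, n21, n22 = fair_dataset_counts(ROWS)
# Matched-pairs odds ratio from the discordant pairs (assumed estimator; needs n21 > 0).
print((n11, n12, n21, n22), n12 / n21)  # (0, 2, 2, 0) 1.0
```

On these eight rows the two discordant counts are equal, so the sketch gives an odds ratio of 1.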

Algorithm


A B C D E F G Y

1 1 1 1 1 1 0 1

… … …

1 1 0 1 0 1 0 1

1. Remove irrelevant variables (support, local support, association).

2. Find the exclusive variables of the exposure variable (support, association), here G and F. The controlled variable set = {B, C, D, E}.

3. Find the fair dataset: search for all matched record pairs.

4. Calculate the odds ratio to identify whether the tested rule is causal.

5. Repeat steps 2 to 4 for each variable that is a combination of variables. Only consider combinations of non-causal factors.

For each association rule (e.g. A → Y):

A B C D E Y

1 1 1 1 1 1

… … …

0 1 1 1 1 0

… …


Experimental evaluations

Experimental evaluations

Figure 1: Extraction Time Comparison (20K Records); methods compared: CAR, CCC, CCU

Experimental evaluations

Causality – Judea Pearl

Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

X1 X2 … Xn-1 Xn

5.2 7.5 6.5 5.2

5.6 7.2 6.6 5.3

… … … … …

5.4 7.1 7.1 5.7

5.7 6.9 6.9 5.8

[Figure: example causal graph over the variables, with edge weights +1 and +0.8]

Methods

• IDA
  – Maathuis, M. H., Colombo, D., Kalisch, M., and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nature Methods, 7(4), 247–249.

Conclusions

• Association analysis has been widely used in data mining, but associations do not indicate causal relationships.
• Association rule mining can be adapted for causal relationship discovery by combining it with statistical methods:
  – Partial association test
  – Cohort study
• They are efficient alternatives to causal Bayesian network based methods.
• They are capable of finding combined causal factors.

Discussions

• Causality and classification
  – Estimate Prob(Y | do X) instead of Prob(Y | X).
• Feature selection versus controlled variable selection.
• Evaluation of causes.
  – Not classification accuracy.
  – Bayesian networks??

Research Collaborators

• Jixue Liu
• Lin Liu
• Thuc Le
• Jin Zhou
• Bin-yu Sun

Thank you for listening

Questions please ??