
Outline

• Criticism of support/confidence
• Loglinear modeling
• Causal modeling

Background: Interaction analysis

• Association rules (Creighton & Hanash 03): associations rather than interactions; undirected; data must be discretized

• Loglinear modeling (Wu et al. 03): multi-way, non-linear interactions; undirected; data must be discretized

• Graphical Gaussian model (Kishino & Waddell 00): pairwise interactions; undirected; efficient

• Causal network: pairwise interactions; directed; high complexity


Background on Association Rule

• An association rule X ⇒ Y must satisfy a minimum support and a minimum confidence

support, s = P(X ∪ Y): the probability that a transaction contains every item of X ∪ Y

confidence, c = P(Y | X): the conditional probability that a transaction containing X also contains Y

• Efficient algorithms: Apriori (Agrawal & Srikant, VLDB'94), FP-tree (Han, Pei & Yin, SIGMOD 2000), etc.

[Figure: Venn diagram of customers who buy X, customers who buy Y, and customers who buy both]
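Both measures can be computed directly from a list of transactions. A minimal Python sketch, assuming each transaction is a set of item names; `support` and `confidence` are hypothetical helper names, not part of any library or of the cited algorithms (Apriori and FP-tree exist precisely to avoid this kind of full scan for every candidate itemset):

```python
from typing import AbstractSet, Iterable, Sequence


def support(transactions: Sequence[AbstractSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions: Sequence[AbstractSet[str]],
               x: Iterable[str], y: Iterable[str]) -> float:
    """Confidence of X => Y: support(X ∪ Y) / support(X), i.e. P(Y | X)."""
    x, y = set(x), set(y)
    s_x = support(transactions, x)
    if s_x == 0:
        raise ValueError("X never occurs, so P(Y | X) is undefined")
    return support(transactions, x | y) / s_x


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))        # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"milk"}))   # 0.5 / 0.75
```

For these four baskets, the rule bread ⇒ milk has support 0.5 and confidence 2/3.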


Criticism of Support and Confidence

• Example 1 (Aggarwal & Yu, PODS98): among 5000 students,

3000 play basketball, 3750 eat cereal, and 2000 both play basketball and eat cereal

play basketball ⇒ eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%

play basketball ⇒ not eat cereal [20%, 33.3%] is far more accurate, although it has lower support and confidence

              basketball   not basketball   sum(row)
cereal           2000          1750           3750
not cereal       1000           250           1250
sum(col.)        3000          2000           5000


• We need a measure of dependent or correlated events:

$$\mathrm{corr}_{X,Y} = \frac{P(X \cup Y)}{P(X)\,P(Y)} = \frac{P(Y \mid X)}{P(Y)}$$

• P(Y|X)/P(Y) is also called the lift of rule X ⇒ Y
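Plugging the counts of Example 1 into these definitions shows why confidence alone misleads. A small Python sketch; `rule_measures` is a hypothetical helper that takes raw counts (transactions containing X and Y, containing X, containing Y, and the total):

```python
def rule_measures(n_xy: int, n_x: int, n_y: int, n: int) -> tuple:
    """Support, confidence and lift of X => Y from raw counts out of n transactions."""
    support = n_xy / n              # P(X ∪ Y)
    confidence = n_xy / n_x         # P(Y | X)
    lift = confidence / (n_y / n)   # P(Y | X) / P(Y)
    return support, confidence, lift


N, BASKETBALL, CEREAL, BOTH = 5000, 3000, 3750, 2000

# play basketball => eat cereal
print(rule_measures(BOTH, BASKETBALL, CEREAL, N))
# play basketball => not eat cereal
print(rule_measures(BASKETBALL - BOTH, BASKETBALL, N - CEREAL, N))
```

The first rule has lift 8/9 ≈ 0.89 (below 1, so playing basketball and eating cereal are negatively correlated), while the second has lift 4/3 ≈ 1.33, matching the slide's conclusion.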


Criticism of lift

• Suppose a triple ABC is unusually frequent. There are two possible explanations:

Case 1: AB and/or AC and/or BC are unusually frequent

Case 2: there is something special about the triple itself that makes all three items occur together frequently

• Example 2 (DuMouchel & Pregibon, KDD 01): suppose that in a database of patient adverse drug reactions, A and B are two drugs and C is the occurrence of kidney failure

Case 1: A and B may act independently upon the kidney; the many occurrences of ABC arise because A and B are sometimes prescribed together

Case 2: A and B may have no effect on the kidney if taken alone, but when they are taken together a drug interaction occurs that often leads to kidney failure

Case 3: A and B may have a small effect on the kidney if taken alone, but when taken together they have a strong effect


• EXCESS2

$$\mathrm{EXCESS2} = e - e_{\mathrm{All2F}}$$

e: shrinkage estimate of the itemset count (or we can use the raw count)

e_All2F: predicted count of the all-two-factor model, based on the two-way distributions

EXCESS2 is an estimate of the number of transactions containing the itemset over and above those that can be explained by the pairwise associations of the items
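A minimal sketch of the raw-count variant of EXCESS2 for a triple ABC, using a 2 × 2 × 2 table where index 1 means the item is present. It assumes the all-two-factor prediction is obtained by iterative proportional fitting (IPF) of the loglinear model [AB][AC][BC], a standard way to fit that model; DuMouchel & Pregibon's own procedure (including the shrinkage estimate) may differ. `fit_all_two_factor` and `excess2` are hypothetical names.

```python
from itertools import product

CELLS = list(product(range(2), repeat=3))  # (a, b, c) with 1 = item present
PAIRS = [(0, 1), (0, 2), (1, 2)]           # two-way margins AB, AC, BC


def fit_all_two_factor(n, max_iter=100, tol=1e-10):
    """IPF fit of the all-two-factor model [AB][AC][BC] to a 2x2x2 table n[a][b][c]."""
    obs = {c: float(n[c[0]][c[1]][c[2]]) for c in CELLS}
    fit = {c: 1.0 for c in CELLS}  # uniform starting table
    for _ in range(max_iter):
        max_change = 0.0
        for p, q in PAIRS:
            # observed and current fitted two-way margins over axes p and q
            obs_m, fit_m = {}, {}
            for c in CELLS:
                key = (c[p], c[q])
                obs_m[key] = obs_m.get(key, 0.0) + obs[c]
                fit_m[key] = fit_m.get(key, 0.0) + fit[c]
            # rescale every cell so its fitted margin matches the observed one
            for c in CELLS:
                key = (c[p], c[q])
                new = fit[c] * obs_m[key] / fit_m[key] if fit_m[key] > 0 else 0.0
                max_change = max(max_change, abs(new - fit[c]))
                fit[c] = new
        if max_change < tol:
            break
    return fit


def excess2(n):
    """Raw-count EXCESS2: observed ABC count minus the all-two-factor prediction."""
    return n[1][1][1] - fit_all_two_factor(n)[(1, 1, 1)]


# Every pair of items is perfectly balanced, but cells with an odd number of
# present items (including ABC) are over-represented.
parity = [[[20 if (a + b + c) % 2 == 1 else 0 for c in range(2)]
           for b in range(2)] for a in range(2)]
print(excess2(parity))
```

In this table the two-way margins are all equal, so IPF predicts 10 for the ABC cell and EXCESS2 is 20 - 10 = 10: the excess cannot be explained by any pairwise association. For a table generated under independence, EXCESS2 comes out at (numerically) zero.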


Motivation

• EXCESS2

can separate cases 2 and 3 from case 1, but cannot distinguish between case 2 and case 3

requires building many all-two-factor models: for itemset ABCDE, 16 all-two-factor models are needed, one for each itemset of three or more items (ABC, ABD, …, ABCD, …, ABCDE)

cannot fully analyze the interestingness of multi-item associations. E.g., even if we know that EXCESS2 for ABCD is large, is it due to ABC, ABD, or ABCD?

• Fit one optimal loglinear model that describes all possible associations, instead of building many all-two-factor models

The λ-terms precisely describe the interactions of items

By analyzing residuals, we can pick up the multi-item associations that cannot be explained by the associations included in the fitted model
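The number of all-two-factor models EXCESS2 needs grows with the number of sub-itemsets of size three or more. A short Python enumeration for the five-item case; the helper name is hypothetical:

```python
from itertools import combinations


def itemsets_of_size_at_least_3(items):
    """Every sub-itemset with three or more items; each needs its own all-two-factor model."""
    return [c for r in range(3, len(items) + 1) for c in combinations(items, r)]


subsets = itemsets_of_size_at_least_3("ABCDE")
print(len(subsets))  # C(5,3) + C(5,4) + C(5,5) = 10 + 5 + 1 = 16
```

For d items the count is 2^d - 1 - d - C(d, 2), so it grows exponentially, whereas a single fitted loglinear model covers all of these itemsets at once.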


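Saturated log-linear model

$$
\begin{aligned}
\log \hat{y}_{ijkl} = {} & \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} \\
& + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{CD}_{kl} \\
& + \lambda^{ABC}_{ijk} + \lambda^{ABD}_{ijl} + \lambda^{ACD}_{ikl} + \lambda^{BCD}_{jkl} + \lambda^{ABCD}_{ijkl}
\end{aligned}
$$

λ: main effect; λ^A_i, λ^B_j, λ^C_k, λ^D_l: 1-factor effects

λ^AB_ij: 2-factor effect, which shows the dependency within the distributions of A and B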
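To make the meaning of a 2-factor term concrete, consider the smallest case: a saturated model for a 2 × 2 table (factors A and B, levels 1 and 2) under the usual sum-to-zero constraints. Because the saturated model reproduces the observed counts, ŷ_ij = y_ij, and the 2-factor term reduces to a quarter of the log odds ratio:

$$
\begin{aligned}
\lambda^{AB}_{11} &= \log y_{11} - \tfrac{1}{2}\left(\log y_{11} + \log y_{12}\right) - \tfrac{1}{2}\left(\log y_{11} + \log y_{21}\right) + \tfrac{1}{4}\sum_{i,j} \log y_{ij} \\
&= \tfrac{1}{4}\log \frac{y_{11}\, y_{22}}{y_{12}\, y_{21}}
\end{aligned}
$$

So λ^AB_11 > 0 exactly when the odds ratio exceeds 1 (a positive association between A and B), and λ^AB_11 = 0 under independence.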


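Computing λ-terms

• Linear constraints of coefficients: loglinear parameters sum to 0 over all indices

$$
\begin{aligned}
&\textstyle\sum_i \lambda^{A}_{i} = \sum_j \lambda^{B}_{j} = \sum_k \lambda^{C}_{k} = \sum_l \lambda^{D}_{l} = 0 \\
&\textstyle\sum_i \lambda^{AB}_{ij} = \sum_j \lambda^{AB}_{ij} = \sum_i \lambda^{AC}_{ik} = \cdots = \sum_l \lambda^{CD}_{kl} = 0 \\
&\cdots \\
&\textstyle\sum_i \lambda^{ABCD}_{ijkl} = \sum_j \lambda^{ABCD}_{ijkl} = \sum_k \lambda^{ABCD}_{ijkl} = \sum_l \lambda^{ABCD}_{ijkl} = 0
\end{aligned}
$$

• UpDown method (Sarawagi et al., EDBT98), where l_ijkl = log ŷ_ijkl and a dot denotes averaging over that index:

$$
\begin{aligned}
\lambda &= l_{....} \\
\lambda^{A}_{i} &= l_{i...} - \lambda \\
\lambda^{AB}_{ij} &= l_{ij..} - \lambda^{A}_{i} - \lambda^{B}_{j} - \lambda \\
\lambda^{ABC}_{ijk} &= l_{ijk.} - \lambda^{AB}_{ij} - \lambda^{AC}_{ik} - \lambda^{BC}_{jk} - \lambda^{A}_{i} - \lambda^{B}_{j} - \lambda^{C}_{k} - \lambda
\end{aligned}
$$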

KDD’03 Washington, D.C.
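A brute-force Python sketch of this computation for a saturated model on any complete table of positive counts: λ is the average of the log counts, and each higher-order term is the average of the log counts with all other indices averaged out, minus every lower-order term it contains. `lambda_terms` is a hypothetical name, and this is not Sarawagi et al.'s UpDown implementation, which is designed to run efficiently on large data cubes; here every average is recomputed by scanning the table.

```python
import math
from itertools import combinations, product


def lambda_terms(y, shape):
    """Saturated loglinear λ-terms under sum-to-zero constraints.

    y: dict mapping every full index tuple (i, j, k, ...) to a positive count.
    shape: number of levels of each factor.
    Returns {(axes, levels): value}; axes == () is the overall term λ.
    """
    d = len(shape)
    logy = {c: math.log(v) for c, v in y.items()}

    def avg(axes, levels):
        # mean of log y over cells whose coordinates on `axes` equal `levels`
        # (the dotted averages l_...., l_i..., l_ij.., ... of the slide)
        cells = [c for c in logy if all(c[a] == lv for a, lv in zip(axes, levels))]
        return sum(logy[c] for c in cells) / len(cells)

    terms = {}
    for r in range(d + 1):  # lower orders first, so subterms already exist
        for axes in combinations(range(d), r):
            for levels in product(*(range(shape[a]) for a in axes)):
                val = avg(axes, levels)
                # subtract every lower-order term contained in this one
                for s in range(r):
                    for pos in combinations(range(r), s):
                        sub = (tuple(axes[p] for p in pos), tuple(levels[p] for p in pos))
                        val -= terms[sub]
                terms[(axes, levels)] = val
    return terms


counts = {(0, 0): 10, (0, 1): 4, (1, 0): 5, (1, 1): 8}
t = lambda_terms(counts, (2, 2))
print(t[((0, 1), (0, 0))])  # 2-factor term for the first level of A and B
```

For a 2 × 2 table, the printed 2-factor term equals a quarter of the log odds ratio, here log(80 / 20) / 4.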

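Interpreting associations

• Comparison with lift and EXCESS2

Independence model:

$$\log \hat{y}^{\mathrm{lift}} = \lambda + \lambda^{A} + \lambda^{B} + \lambda^{C} + \lambda^{D}$$

Pairwise model:

$$\log \hat{y}^{\mathrm{pairwise}} = \lambda + \lambda^{A} + \lambda^{B} + \lambda^{C} + \lambda^{D} + \lambda^{AB} + \lambda^{AC} + \lambda^{AD} + \lambda^{BC} + \lambda^{BD} + \lambda^{CD}$$

Fitted model:

$$\log \hat{y}^{\mathrm{fitted}} = \lambda + \lambda^{A} + \lambda^{B} + \lambda^{C} + \lambda^{D} + \lambda^{AC} + \lambda^{BC} + \lambda^{BD} + \lambda^{CD} + \lambda^{ABC} + \lambda^{ABD}$$

• Derive association patterns by examining λ-terms. E.g., we can derive a positive interaction between AC, a negative interaction between AC, no significant interaction between BC, and a positive three-factor interaction among ABC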

[Table: estimated λ-terms for AC, BC and ABC]


Decomposition

• Decomposition is necessary because

the contingency table from market basket data is too sparse

the complexity is exponential in the number of dimensions

• Step 1.1: build an independence graph

• Step 1.2: apply graph-theoretical results to decompose the graph into non-decomposable irreducible components


Independence graph

[Figure: example independence graph over variables A to J]

• Every vertex of the graph corresponds to a variable.

• Each edge denotes a dependency between the two variables it links.

• A missing edge represents the conditional independence of the two variables it would link, given all other variables.

• Test conditional independence for every pair of variables, controlling for the other variables (e.g., with the Cochran-Mantel-Haenszel test).
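A minimal Python sketch of the Cochran-Mantel-Haenszel test for one pair of binary variables, with one 2 × 2 table per configuration of the controlling variables. This is the common textbook form without a continuity correction; grouping records by every observed configuration of the other variables is an assumption about how "controlling for the other variables" is carried out, and `cmh_test` and `strata_tables` are hypothetical names.

```python
import math


def cmh_test(tables):
    """Cochran-Mantel-Haenszel test of conditional independence for 2x2xK data.

    tables: list of 2x2 tables [[a, b], [c, d]], one per stratum.
    No continuity correction is applied.
    Returns (chi-square statistic with 1 df, p-value).
    """
    num, var = 0.0, 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # the variance term needs n - 1 > 0
        num += a - (a + b) * (a + c) / n  # observed minus expected count of cell a
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0:
        return 0.0, 1.0
    stat = num * num / var
    # chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def strata_tables(records, x, y, given):
    """One 2x2 table of binary columns x, y per observed configuration of the `given` columns."""
    tables = {}
    for r in records:
        t = tables.setdefault(tuple(r[g] for g in given), [[0, 0], [0, 0]])
        t[1 - r[x]][1 - r[y]] += 1  # index 0 = item present, 1 = absent
    return list(tables.values())


print(cmh_test([[[10, 10], [10, 10]], [[5, 15], [5, 15]]]))  # independent in every stratum
print(cmh_test([[[30, 10], [10, 30]]]))                     # strongly dependent
```

A small p-value means the edge between the two variables is kept in the independence graph; a large one means the edge can be dropped.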


Independence graph decomposition

• Graph-theoretical result:

If the graph of a graphical model for a contingency table can be decomposed into subgraphs by a clique separator, the MLEs for the parameters of the model can easily be derived by combining the estimates of the models on the lower-dimensional tables represented by the simpler subgraphs.

• Divide and conquer

[Figure: the independence graph split by clique separators into smaller subgraphs]
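The smallest instance of this result is the graph A - B - C, which the clique separator {B} splits into the cliques {A, B} and {B, C}. The MLE of the full table is then a closed-form combination of the two lower-dimensional margins, m_ijk = n_ij+ · n_+jk / n_+j+, with no iterative fitting needed. A minimal Python sketch; `chain_mle` is a hypothetical name:

```python
from itertools import product


def chain_mle(n):
    """Closed-form MLE for the decomposable model [AB][BC] on an I x J x K table n[i][j][k].

    The fitted table combines the margins of the two cliques {A,B} and {B,C},
    divided by the margin of the separator {B}:
        m[i][j][k] = n_AB[i][j] * n_BC[j][k] / n_B[j]
    """
    I, J, K = len(n), len(n[0]), len(n[0][0])
    n_ab = [[sum(n[i][j]) for j in range(J)] for i in range(I)]
    n_bc = [[sum(n[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    n_b = [sum(n_ab[i][j] for i in range(I)) for j in range(J)]
    m = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for i, j, k in product(range(I), range(J), range(K)):
        if n_b[j] > 0:  # empty separator slice: leave the fitted cells at 0
            m[i][j][k] = n_ab[i][j] * n_bc[j][k] / n_b[j]
    return m


table = [[[3, 1], [2, 4]], [[5, 2], [1, 6]]]
fitted = chain_mle(table)
print(fitted)
```

The fitted table reproduces the observed AB and BC margins and has an odds ratio of exactly 1 between A and C within each level of B, which is the conditional independence encoded by the missing A-C edge.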


Data generator

• http://www.cs.loyola.edu/~cgiannel/assoc_gen.html

parameter   value      meaning
ntrans      10k-1M     Number of transactions
nitems      50, 100    Number of different items
tlen        10         Average items per transaction
npats       10000      Number of patterns (large itemsets)
patlen      4          Average length of maximal pattern
corr        0.25       Correlation between patterns
conf        0.75       Average confidence in a rule

Other measures

[Table: 2 × 2 contingency table]

[Table: objective measures for A ⇒ B]

Outline

• Criticism of support/confidence
• Loglinear modeling
• Causal modeling

Partial correlation

• Partial correlation: the correlation between two variables after the common effects of a third variable are removed

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}$$
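A direct Python transcription of the formula, with a plain Pearson correlation helper; both function names are hypothetical. It assumes |r_xz| < 1 and |r_yz| < 1; otherwise the denominator is zero.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def partial_corr(x, y, z):
    """First-order partial correlation r_{xy.z}: correlation of x and y with z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


x = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3]
y = [1.0, 0.5, 2.2, 1.1, 3.0, 2.4]
z = [1, 2, 3, 4, 5, 6]
print(pearson(x, y), partial_corr(x, y, z))
```

An equivalent view: r_xy.z equals the ordinary correlation between the residuals of x and of y after each is linearly regressed on z, which is what "removing the common effects" of z means here.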

Causal Interaction Learning

• Bayesian approaches (search and score), Friedman et al. 00

Apply heuristic search methods to construct a model and then evaluate it using a scoring measure (e.g., Bayesian score, entropy, MDL, etc.)

Averaging over the space of structures is computationally intractable, as the number of DAGs is super-exponential in the number of genes

Sensitive to the choice of local model

• Constraint-based conditional independence approaches: PC (Spirtes et al. 93)

Instead of searching the space of models, PC starts from the complete undirected graph, then thins the graph by removing edges with zero-order conditional independence relations, thins it again with first-order conditional independence relations, and so forth

Slow when dealing with a large number of variables
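The super-exponential growth of the structure space can be checked directly. A short Python sketch that counts labeled DAGs on n nodes with Robinson's recurrence, a(n) = Σ_{k=1..n} (-1)^{k+1} C(n, k) 2^{k(n-k)} a(n-k) with a(0) = 1; `count_dags` is a hypothetical name:

```python
from math import comb


def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


print([count_dags(n) for n in range(7)])
```

Already at 6 nodes there are 3,781,503 DAGs, which is why search-and-score methods fall back on heuristic search instead of averaging over all structures.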

PC Algorithm
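A minimal Python sketch of the skeleton phase of a PC-style search, following the description of the constraint-based approach: start from the complete undirected graph and remove edges using conditioning sets of size 0, 1, 2, and so on, drawn from the current neighbours. The conditional independence test is passed in as a function; the oracle used below is hypothetical (in practice a statistical test, such as a CMH or partial correlation test, would play this role). Edge orientation, the second phase of PC, is not shown.

```python
from itertools import combinations


def pc_skeleton(variables, ci_test):
    """Skeleton phase of a PC-style search.

    ci_test(x, y, cond) -> True if x and y are judged independent given the set cond.
    Returns (adjacency sets, separating sets of the removed edges).
    """
    adj = {v: set(variables) - {v} for v in variables}  # complete undirected graph
    sepset = {}
    order = 0  # size of the conditioning set: 0, 1, 2, ...
    # continue while some x still has at least `order` neighbours besides one partner y
    while any(len(adj[x]) - 1 >= order for x in variables):
        for x in variables:
            for y in sorted(adj[x]):
                # condition on subsets of x's other current neighbours
                for cond in combinations(sorted(adj[x] - {y}), order):
                    if ci_test(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        order += 1
    return adj, sepset


# Hypothetical oracle for the chain A -> B -> C: only A and C are independent, given B.
def chain_oracle(x, y, cond):
    return {x, y} == {"A", "C"} and "B" in cond


print(pc_skeleton(["A", "B", "C"], chain_oracle))
```

With this oracle, the zero-order pass removes nothing, and the first-order pass removes A - C with separating set {B}, leaving the skeleton A - B - C.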

D-Separation

• X is d-separated from Y, given Z, if all paths from a node in X to a node in Y are blocked, given Z.

Let p be any path between a vertex in X and a vertex in Y. Z is said to block p if there is a vertex w on p satisfying one of the following:

w has converging arrows along p, and neither w nor any of its descendants is in Z; or

w does not have converging arrows along p, and w is in Z.

• If X and Y are d-separated given Z, then X and Y are independent conditional on Z in every distribution that factorizes according to the DAG.

Path Blockage

A path is active, given evidence Z, if

• whenever the path contains the converging configuration A → B ← C, B or one of its descendants is in Z; and

• no other nodes on the path are in Z.

A path is blocked, given evidence Z, if it is not active.
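Checking every path explicitly is expensive, so the sketch below uses a different but equivalent criterion (Lauritzen's moralization criterion) rather than the path-blocking definition above: X and Y are d-separated by Z exactly when Z separates them in the moral graph of the ancestral set of X ∪ Y ∪ Z. A minimal Python sketch; `d_separated` is a hypothetical name, and the DAG is given as a dict of parent sets in which every node appears as a key.

```python
from collections import deque


def d_separated(parents, xs, ys, zs):
    """Test whether X and Y are d-separated given Z, via the moralized ancestral graph."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    # 1. ancestral set of X ∪ Y ∪ Z (including those nodes themselves)
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # 2. moralize: link each node to its parents and "marry" parents of a common child
    nbrs = {v: set() for v in anc}
    for v in anc:
        ps = sorted(parents[v])  # parents of an ancestor are ancestors too
        for p in ps:
            nbrs[v].add(p)
            nbrs[p].add(v)
        for p, q in zip(ps, ps[1:] + ps[:1]) if len(ps) > 1 else []:
            pass  # placeholder loop removed below
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                nbrs[ps[i]].add(ps[j])
                nbrs[ps[j]].add(ps[i])
    # 3. delete Z and search for any path from X to Y in the undirected graph
    seen = xs - zs
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in nbrs[v] - zs:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True


collider = {"A": set(), "C": set(), "B": {"A", "C"}, "D": {"B"}}  # A -> B <- C, B -> D
print(d_separated(collider, {"A"}, {"C"}, set()))   # path blocked at the collider B
print(d_separated(collider, {"A"}, {"C"}, {"D"}))   # a descendant of B is observed
```

The two calls mirror the path-blockage rule: the path through A → B ← C is blocked when neither B nor any of its descendants is in Z, and becomes active once a descendant such as D is observed.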