Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach
![Page 1: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/1.jpg)
Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach
Yiping Ke, James Cheng, Wilfred Ng
Presented By:
Chibuike Muoh
![Page 2: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/2.jpg)
Presentation Outline:
• Contributions of the paper
• Introduction
• What are QCPs?
– Definitions
– Background: Information Theory (entropy, MI, NMI)
• Mining QCPs
– All-confidence
– Discretization problem (interval combining)
– Attribute-level pruning
– Interval-level pruning
– QCoMine algorithm
![Page 3: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/3.jpg)
Contributions of the paper
• Presents a new algorithm for mining correlated patterns in quantitative databases, built on two concepts borrowed from information theory: entropy and mutual information
• Discretizes attribute domains using supervised interval combining, which preserves the dependency between attributes
![Page 4: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/4.jpg)
Introduction
• Similar in principle to association rule mining, but evaluating association rules can be too expensive on very large databases (VLDBs)
• Trivial result sets: {pregnant} → {edema} and {pregnant, female} → {edema}
• Unproductive rules arise from co-occurrence effects: {pregnant, dataminer} → {edema}
– So occupation and edema are related?
• Unlike association mining, QCP mining considers the dependency among the attribute sets of the database to generate highly correlated patterns
– Similar to generating “maximally informative k-itemsets”, but here we consider dependency in the attribute sets
![Page 5: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/5.jpg)
Introduction…contd.
• The idea behind mining QCPs
– Evaluate the attribute set and look for ‘strong’ dependencies between attributes
– Next, find correlated interval sets in the dependent attributes and generate patterns from them
• Thus, QCPs are not restricted to frequently co-occurring attributes
![Page 6: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/6.jpg)
Definitions: Quantitative Database
• A pattern X is a set of attributes (random variables) {x1, x2, x3, …, xm} whose outcomes can be categorical or quantitative and occur with probabilities p(vx) = {p1, p2, p3, …, pm}
– An attribute xi is categorical if its interval xi[lx, ux] within dom(xi) has lx = ux
– It is quantitative if its interval xi[lx, ux] has lx <= ux
– A pattern X is called a k-pattern if |attr(X)| = k
• Consider a quantitative database, D, as a set of transactions, T. Each transaction in D is a vector of items <v1, v2, v3, …, vm> where vi ∈ dom(xi) for 1 <= i <= m.
![Page 7: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/7.jpg)
Definition…contd.
• We say a transaction T supports a pattern X if, for every attribute in X, the value of that attribute in T lies within the interval that X assigns to it
– The frequency of a pattern X in D, freq(X), is the number of transactions in D that support X
– The support of X, supp(X) = freq(X)/|D|, is the probability that a transaction T in D supports X
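The support definition can be sketched in Python as follows. The toy rows, the attribute names, and the helper names `supports` and `supp` are illustrative assumptions and not data from the paper. A pattern is represented here as a mapping from attribute name to a closed interval.

```python
def supports(transaction, pattern):
    """True if every attribute of the pattern has its value inside the pattern's interval."""
    return all(lo <= transaction[attr] <= hi for attr, (lo, hi) in pattern.items())


def supp(database, pattern):
    """Fraction of transactions in the database that support the pattern."""
    freq = sum(1 for t in database if supports(t, pattern))
    return freq / len(database)


# Hypothetical toy database (NOT the paper's Table 1).
database = [
    {"age": 4, "gender": 1},
    {"age": 5, "gender": 1},
    {"age": 2, "gender": 1},
    {"age": 4, "gender": 2},
]
pattern = {"age": (4, 5), "gender": (1, 1)}
print(supp(database, pattern))  # 0.5
```

A categorical item such as gender[1,1] is simply an interval whose two endpoints coincide, so the same membership test covers both attribute kinds.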
![Page 8: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/8.jpg)
Example
• The database table (Table 1) consists of 6 attributes, of which 3 are quantitative {age, salary, service years} and 2 are categorical {gender, married}
• The last column records the support of each transaction
• E.g. for pattern X = age[4,5]gender[1,1], supp(X) = 0.25 + 0.19 = 0.44
![Page 9: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/9.jpg)
Background: Information Theory
• Mining QCPs makes use of fundamental concepts in information theory
• Entropy: measures the information content/uncertainty of a random variable, x
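As a minimal sketch, entropy can be computed with the standard Shannon formula H(x) = -Σ p(v) log p(v). The base-2 logarithm (bits) and the function name `entropy` are assumptions made for illustration; the distributions below are made up.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# A fair coin carries one bit of uncertainty; four equally likely outcomes carry two.
print(entropy([0.5, 0.5]))                # 1.0
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```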
![Page 10: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/10.jpg)
Background: Information Theory…contd.
• Mutual information (MI): measures the average reduction in uncertainty about a random variable X, given the knowledge of Y (or vice versa)
– MI is a symmetric measure, so the greater the value of I(x; y), the more information x and y tell about each other.
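A small sketch of the standard mutual information formula, I(x; y) = Σ p(vx, vy) log [p(vx, vy) / (p(vx) p(vy))], is given below. The joint distributions are hypothetical, and the base-2 logarithm and the name `mutual_information` are assumptions for illustration.

```python
import math


def mutual_information(joint):
    """I(x; y) in bits from a joint table {(vx, vy): p(vx, vy)}."""
    px, py = {}, {}
    for (vx, vy), p in joint.items():
        px[vx] = px.get(vx, 0.0) + p  # marginal of x
        py[vy] = py.get(vy, 0.0) + p  # marginal of y
    return sum(
        p * math.log2(p / (px[vx] * py[vy]))
        for (vx, vy), p in joint.items()
        if p > 0
    )


# Hypothetical joint distributions.
dependent = {(0, 0): 0.5, (1, 1): 0.5}  # y is fully determined by x
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

print(mutual_information(dependent))    # 1.0
print(mutual_information(independent))  # 0.0
```

Swapping the roles of x and y in the table leaves the result unchanged, which is the symmetry noted above.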
![Page 11: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/11.jpg)
Example
• Consider the pattern X = (age; married) from Table 1. We can compute
$$I(age; married) = \sum_{v_{age} \in \{1,2,3,4,5\}} \; \sum_{v_{married} \in \{1,2\}} p(v_{age}, v_{married}) \log \frac{p(v_{age}, v_{married})}{p(v_{age})\, p(v_{married})} = 0.47$$
The example above shows that knowing age reduces the uncertainty of married by 0.47
Similarly, as an exercise, we can compute I(gender; education) = 0.40
![Page 12: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/12.jpg)
Normalized Mutual Information
• But how much does X actually tell us about Y?
• The entropies of different attributes vary greatly, so MI gives only an absolute value, which is not very helpful in our case
• We can normalize the MI among our set of attributes to get a global relative measure
![Page 13: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/13.jpg)
NMI…contd.
• Normalizing the MI measure among the attribute sets gives the minimum percentage of reduction in the uncertainty of one attribute given the knowledge of another
$$\tilde{I}(x; y) = \frac{I(x; y)}{\mathrm{MAX}\{I(x; x),\ I(y; y)\}}$$
where $I(x; x) = H(x)$
![Page 14: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/14.jpg)
Example 2
• From the previous example we can compute the NMI of (age; married)
• We can also determine the NMI of (gender; education)
Note that although I(age; married) > I(gender; education), its NMI is lower. This can be attributed to the high entropy H(age) = 2.19 > H(education) = 1.34.
This implies that knowing age removes a larger absolute amount of uncertainty but a smaller relative amount.
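The comparison can be checked numerically with a short sketch. It assumes that NMI divides MI by the larger of the two attribute entropies, which matches the "minimum percentage of reduction" reading on the NMI slide. The slides do not report H(married) or H(gender), so 1.0 bit is used as a stand-in upper bound for a two-valued attribute. That value and the function name `normalized_mi` are assumptions.

```python
def normalized_mi(mi, h_x, h_y):
    """NMI under the assumption NMI = I(x; y) / max(H(x), H(y))."""
    return mi / max(h_x, h_y)


# MI and entropy values quoted on the slides; 1.0 is an assumed upper bound
# for the entropy of a two-valued attribute (married, gender).
nmi_age_married = normalized_mi(0.47, h_x=2.19, h_y=1.0)
nmi_gender_education = normalized_mi(0.40, h_x=1.0, h_y=1.34)

print(round(nmi_age_married, 2))       # 0.21
print(round(nmi_gender_education, 2))  # 0.3
```

Under these assumptions the pair with the larger MI ends up with the smaller NMI, because the entropy of age is much higher than that of education.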
![Page 15: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/15.jpg)
Definition: Quantitative Pattern
• A more formal definition of quantitative pattern X follows below:
• Thus, given a minimum information threshold (μ) and a minimum all-confidence threshold (ς), a quantitative correlated pattern has strong dependency between its attributes and a high all-confidence in the dataset
![Page 16: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/16.jpg)
allconf(X)
• All confidence is a correlation measure for determining the minimum confidence of association rules that can be derived from a given pattern.
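• For a quantitative pattern, allconf(X) is defined as:
$$allconf(X) = \frac{\mathrm{supp}(X)}{\mathrm{MAX}\{\mathrm{supp}(x[l_x, u_x]) \mid x[l_x, u_x] \in X\}}$$
• This differs from association rule mining, where conf(X → Y) only indicates an implication from the set on the left to the set on the right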
![Page 17: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/17.jpg)
allconf(X)…contd.
• All-confidence has the downward closure property: if a pattern has all-confidence no less than ς, so do all its sub-patterns
$$\mathrm{supp}(X) \le \mathrm{supp}(x[l_x, u_x]) \quad \forall\, x \in dom(x)$$
![Page 18: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/18.jpg)
Example
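• For X = gender[1,1]education[1,1]:
$$allconf(X) = \frac{\mathrm{supp}(gender[1,1]\,education[1,1])}{\mathrm{MAX}\{\mathrm{supp}(gender[1,1]),\ \mathrm{supp}(education[1,1])\}} = \frac{0.19 + 0.11 + 0.09 + 0.09}{\mathrm{MAX}\{0.25 + 0.19 + 0.11 + 0.09 + 0.09 + 0.09 + 0.08,\ 0.19 + 0.11 + 0.09 + 0.09\}} = 0.53$$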
Similarly allconf(gender[1,1]married[1,1]) = 0.9
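The arithmetic of this example can be reproduced with a short sketch. The transaction supports are the ones read off the worked example; the function name `all_confidence` and the way the supports are grouped into lists are assumptions for illustration.

```python
def all_confidence(supp_pattern, item_supports):
    """allconf(X) = supp(X) divided by the largest support among X's items."""
    return supp_pattern / max(item_supports)


# Supports of the transactions matching each item, as listed in the example.
supp_gender = sum([0.25, 0.19, 0.11, 0.09, 0.09, 0.09, 0.08])  # gender[1,1]
supp_education = sum([0.19, 0.11, 0.09, 0.09])                 # education[1,1]
supp_both = sum([0.19, 0.11, 0.09, 0.09])                      # both items together

result = all_confidence(supp_both, [supp_gender, supp_education])
print(round(result, 2))  # 0.53
```

Because the denominator is the largest item support, the value is the smallest confidence of any rule that can be derived from the pattern.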
![Page 19: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/19.jpg)
allconf(X)
• A caveat about allconf: because it is applied at fine granularity to the intervals of attributes, it cannot be used alone as a measure for correlated patterns
– Quantitative attributes can span huge intervals, creating a co-occurrence problem
• The points above explain the need to first perform pruning at the attribute level
Example
For the employee database in the previous example, we set μ = 0.2 and ς = 0.5. The pattern Y = gender[1,1]married[1,1] is not a QCP because
Ĩ(gender; married) = 0 < μ, although allconf(Y) = 0.9.
This is because gender and married are independent of each other, yet p(gender[1,1]) and p(married[1,1]) are both very high.
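The effect can be reproduced on synthetic data. The joint distribution below is made up (it is not the employee table): two independent attributes whose value 1 each has probability 0.9. The two-attribute check `is_qcp`, the max-entropy normalization, and the base-2 logarithm are assumptions for illustration.

```python
import math

# Synthetic joint distribution of two independent attributes (NOT the paper's table).
joint = {(1, 1): 0.81, (1, 2): 0.09, (2, 1): 0.09, (2, 2): 0.01}


def marginal(joint, axis):
    """Marginal distribution of the attribute at position `axis` of the key."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out


def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def nmi(joint):
    """MI divided by the larger entropy (assumed normalization)."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    mi = sum(p * math.log2(p / (px[a] * py[b])) for (a, b), p in joint.items() if p > 0)
    return mi / max(entropy(px), entropy(py))


def allconf(joint, item):
    px, py = marginal(joint, 0), marginal(joint, 1)
    return joint[item] / max(px[item[0]], py[item[1]])


def is_qcp(joint, item, mu, zeta):
    """Hypothetical two-attribute check: NMI >= mu and all-confidence >= zeta."""
    return nmi(joint) >= mu and allconf(joint, item) >= zeta


print(abs(nmi(joint)) < 1e-9)                   # True
print(round(allconf(joint, (1, 1)), 2))         # 0.9
print(is_qcp(joint, (1, 1), mu=0.2, zeta=0.5))  # False
```

The all-confidence test alone would accept this pattern; only the attribute-level NMI test rejects it.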
![Page 20: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/20.jpg)
QCP Mining
• Problem description:
– Given a quantitative database D, a minimum information threshold μ, and a minimum all-confidence threshold ς, the mining problem is to find all QCPs from D
![Page 21: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/21.jpg)
QCP Mining: Process Outline
QCoMine Algorithm: Quantitative Database → Interval Combining/Discretization → Attribute pruning → Interval pruning
- Attribute pruning finds dependent attribute sets
- Interval pruning generates correlated patterns
![Page 22: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/22.jpg)
Interval Combining
• When dealing with quantitative data (continuous attributes), we need to discretize the intervals of the attribute
• Challenges
– Preventing the intervals from being too trivial
• E.g. age[0,2] vs. age[0,0], age[1,1], age[2,2]
– Considering the dependency of the attributes when combining their intervals
• Example: the pattern (age, gender) can produce different intervals than (age, married)
![Page 23: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/23.jpg)
Interval combining…contd.
• Interval combining for quantitative patterns can be considered an optimization problem for an objective function Φ:
• The goal of this stage is:
– Given two attributes x and y, where x is quantitative and y can be either quantitative or categorical, obtain the optimal combined intervals of x with respect to y
• Since this optimization is performed locally (between pairs of attributes), we use MI instead of NMI
![Page 24: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/24.jpg)
Interval combining: Algorithm.
The idea is to pick, at each step, the maximum Φ[ix[j],ix[j+1]](x,y) among all pairs of consecutive intervals ix[j] and ix[j+1], and to combine the corresponding ix[j] and ix[j+1] into xj’
• Let Φ[ix1,ix2](x,y) denote the value of Φ(x,y) when ix1 and ix2 are combined with respect to y
• At each step, two consecutive intervals, ix1 and ix2, are considered for combination
To prevent the intervals from being too trivial, a termination condition is set as a specified minimum value for the interval
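The greedy merging loop can be sketched as below. Two points are assumptions, because the transcript does not preserve the formula for Φ or the exact termination test: Φ is taken to be the mutual information of x and y after the merge, and the loop stops at a hypothetical `min_intervals` count. The count table and all names are illustrative.

```python
import math


def mutual_information(table):
    """MI in bits of a count table: rows = intervals of x, columns = values of y."""
    total = sum(map(sum, table))
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n:
                mi += n / total * math.log2(n * total / (row_tot[i] * col_tot[j]))
    return mi


def merge(table, j):
    """Return a copy of the table with rows j and j+1 combined."""
    merged = [a + b for a, b in zip(table[j], table[j + 1])]
    return table[:j] + [merged] + table[j + 2:]


def combine_intervals(bounds, table, min_intervals):
    """Greedily merge the adjacent pair whose merge keeps MI highest (assumed Φ)."""
    bounds, table = list(bounds), [list(r) for r in table]
    while len(table) > min_intervals:
        j = max(range(len(table) - 1), key=lambda k: mutual_information(merge(table, k)))
        table = merge(table, j)
        bounds[j:j + 2] = [(bounds[j][0], bounds[j + 1][1])]
    return bounds, table


# Hypothetical base intervals of x and counts against a two-valued y.
base = [(0, 0), (1, 1), (2, 2), (3, 3)]
counts = [[10, 0], [9, 1], [1, 9], [0, 10]]
print(combine_intervals(base, counts, min_intervals=2))
# ([(0, 1), (2, 3)], [[19, 1], [1, 19]])
```

In this toy table the low and high base intervals of x behave alike with respect to y, so they are merged with each other rather than across the middle.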
![Page 25: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/25.jpg)
Attribute level pruning
• At this stage pruning at the attribute level is performed such that the attributes in a pattern have NMI of at least μ
The above definition considers attribute patterns as vertices in a graph, and cliques in the graph represent QCPs
![Page 26: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/26.jpg)
Attribute Level pruning…contd.
• From the previous definition, QCPs are cliques in the NMI-graph having NMI >= μ
– Without pruning at the attribute level, i.e. μ = 0, the search space for cliques in the graph becomes much larger
– And enumerating cliques in a graph can be an exhaustive process
• The authors of the paper introduce a prefix tree structure over correlated attributes, the attribute prefix tree Tattr
• Clique enumeration in the NMI-graph is done using the prefix tree
– The only extra action required when enumerating cliques using the prefix tree is to check whether (u,v) is an edge in G
![Page 27: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/27.jpg)
Prefix tree construction
• To create the prefix tree
1. First, a root node is created at level 0 of Tattr
2. Then, at level 1, we create a node for each attribute as a child of the root
3. For each node u at level k (k >= 1) and for each right sibling v of u, if (u,v) is an edge in G, we create a child node for u with the same attribute label as that of v
4. Repeat step 3 for u’s children at level k+1
Step 3 of the prefix tree construction creates the prefix tree in a depth-first manner
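The construction steps can be sketched as follows. The `Node` class, the function names, and the small graph are hypothetical, and the NMI-graph is assumed to be given as a list of undirected edges. Every root-to-node path in the resulting tree is a clique of the graph.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    attr: Optional[str]
    children: List["Node"] = field(default_factory=list)


def build_prefix_tree(attributes, edges):
    """Build the attribute prefix tree depth-first from an undirected edge list."""
    edge_set = {frozenset(e) for e in edges}
    root = Node(None)                                  # step 1: root at level 0
    root.children = [Node(a) for a in attributes]      # step 2: one node per attribute

    def expand(siblings):
        for i, u in enumerate(siblings):
            for v in siblings[i + 1:]:                 # step 3: right siblings of u
                if frozenset((u.attr, v.attr)) in edge_set:
                    u.children.append(Node(v.attr))
            expand(u.children)                         # step 4: recurse one level down

    expand(root.children)
    return root


def paths(node, prefix=()):
    """Yield the attribute sequence of every root-to-node path."""
    for child in node.children:
        p = prefix + (child.attr,)
        yield p
        yield from paths(child, p)


# Hypothetical NMI-graph: a, b, c form a triangle; d is linked only to c.
tree = build_prefix_tree(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print([p for p in paths(tree) if len(p) >= 2])
# [('a', 'b'), ('a', 'b', 'c'), ('a', 'c'), ('b', 'c'), ('c', 'd')]
```

Checking only the edge between a node and its right sibling is enough here, because siblings already share a parent path whose attributes are adjacent to both of them.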
![Page 28: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/28.jpg)
Interval-level pruning
• Even though the cliques found using the NMI-graph have high NMI, they differ on the intervals of their continuous attributes
– Since intervals are combined in a supervised way, the same attribute may have different sets of combined intervals with respect to different attributes
– Thus, patterns with low all-confidence may still be generated from correlated attributes
• The interval-level pruning process uses all-confidence to ensure that only high-confidence patterns are generated from a pattern X and all its super-patterns
– This follows from its downward closure property
![Page 29: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/29.jpg)
Interval-level pruning…contd.
• An easy way to perform pruning at the interval level for a (k+1)-pattern is to compute the intersection of the prefixing (k-1) intervals of the two k-patterns
– Example: given age[30,40]married[1,1] and age[25,35]salary[2000,3000], intersect the intervals of age to obtain the new pattern age[30,35]married[1,1]salary[2000,3000]
• However, producing a new (k+1)-pattern by intersection violates the downward closure property of all-confidence
– Shrinking the intervals in the (k+1)-pattern may greatly decrease the support of a single item, so the pattern's all-confidence may be higher than that of its component k-patterns
![Page 30: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/30.jpg)
Interval-level pruning…contd.
• We can avoid intersection in interval pruning by enumerating all sub-intervals of the combined interval sets Sx and Sy of the attribute set {x, y} at level 2 of Tattr, and pruning at that level before generating a pattern
• We need to consider all pairs of sub-intervals of x and y, as each pair represents a pattern
– Thus, for each interval pair {i’x, i’y}, where $i'_x \subseteq i_x$, $i'_y \subseteq i_y$, $i_x \in S_x$ and $i_y \in S_y$
– We create a QCP X = x[i’x]y[i’y] if allconf(X) >= ς
• Evaluating all possible sub-interval combinations at the 2-pattern level ensures downward closure for all k-patterns generated from them
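The level-2 enumeration can be sketched as below. It assumes intervals are closed integer ranges, so a sub-interval is any [l, u] inside a combined interval. The toy rows, the attribute names, and the helper names are hypothetical, and no pruning shortcuts from the paper are reproduced.

```python
from itertools import product


def sub_intervals(lo, hi):
    """All closed sub-intervals [l, u] with lo <= l <= u <= hi."""
    return [(l, u) for l in range(lo, hi + 1) for u in range(l, hi + 1)]


def supp(rows, constraints):
    """Fraction of rows whose values fall inside every given interval."""
    hits = sum(all(l <= row[a] <= u for a, (l, u) in constraints.items()) for row in rows)
    return hits / len(rows)


def level2_patterns(rows, x, y, S_x, S_y, min_allconf):
    """Keep every sub-interval pair of (x, y) whose all-confidence reaches the threshold."""
    kept = []
    for ix, iy in product(S_x, S_y):
        for sx, sy in product(sub_intervals(*ix), sub_intervals(*iy)):
            joint = supp(rows, {x: sx, y: sy})
            top = max(supp(rows, {x: sx}), supp(rows, {y: sy}))
            if top > 0 and joint / top >= min_allconf:
                kept.append((sx, sy))
    return kept


# Hypothetical data: four transactions over two quantitative attributes.
rows = [
    {"age": 1, "salary": 1},
    {"age": 1, "salary": 1},
    {"age": 2, "salary": 2},
    {"age": 2, "salary": 1},
]
print(level2_patterns(rows, "age", "salary", [(1, 2)], [(1, 2)], min_allconf=0.6))
# [((1, 1), (1, 1)), ((1, 2), (1, 1)), ((1, 2), (1, 2))]
```

Each kept pair is tested on its own intervals, so no interval is shrunk after the test, which is what the intersection shortcut would do.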
![Page 31: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/31.jpg)
QCoMine Algorithm
First, the base intervals of each quantitative attribute are combined with respect to another attribute
Steps 2-4 construct the NMI graph G and use it to guide the construction of the attribute prefix tree Tattr, which performs attribute pruning
Steps 5-13 construct level 2 of Tattr and also perform interval pruning (steps 10-13), which produces all 2-pattern QCPs
Twinterval is an interval-prefix tree that keeps the interval sets of all patterns generated by a node u in Tattr. It is used for memoization, giving both a speedup and space savings
Steps 14-15 invoke RecurMine on the child nodes of u in G to generate all k-QCPs for k > 2
![Page 32: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/32.jpg)
QCoMine Algorithm…contd.
The steps in the RecurMine algorithm continue to build the prefix tree Tattr for k > 2
Interval pruning is aided by using the interval-prefix tree to speed up joins of two k-patterns.
At step 6 of the algorithm, when two k-patterns are combined, all their prefixing (k-1)-intervals are required to be the same in both patterns, which avoids performing interval combining
![Page 33: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/33.jpg)
Performance of QCoMine
• Performance tests of the QCoMine algorithm were run to assess the efficiency of its three major components
1. Supervised interval combining
2. Attribute-level pruning by NMI
3. Interval-level pruning by all-confidence
• Three variants of the algorithm were created
a. QCoMine, which performs all operations as described originally in the paper
b. QCoMine-0, a control variant of the original algorithm that performs the interval combining process but sets μ = 0
c. QCoMine-1, another control variant that does not perform the interval combining process but uses μ as described originally in the paper
• The tests were performed with the all-confidence threshold ranging from ς = 60% to 100%
![Page 34: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/34.jpg)
Performance of QCoMine…contd.
When interval combining is not applied, results on the dataset can only be obtained when ς = 100%. In all other cases the algorithm runs out of memory.
This is because QCoMine-1 is inefficient: it allows the interval of an item to become too trivial, so patterns easily gain all-confidence > ς simply by co-occurrence.
![Page 35: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/35.jpg)
Performance of QCoMine…contd.
The running time of both QCoMine and QCoMine-0 increases only slightly for smaller ς, because the majority of the time is spent on computing the 2-patterns. No matter the value of ς, we need to test every 2-pattern to determine whether it is a QCP before we can employ the downward closure property of all-confidence to prune.
![Page 36: Mining Quantitative Correlated Patterns Using an Information-Theoretic Approach](https://reader035.fdocuments.in/reader035/viewer/2022070405/56813f59550346895daa2662/html5/thumbnails/36.jpg)
References
1. Mining quantitative correlated patterns using an information-theoretic approach, Y Ke, J Cheng, W Ng - Proceedings of the 12th ACM SIGKDD international conference 2006
2. Discovering significant rules, GI Webb - Proceedings of the 12th ACM SIGKDD international conference 2006
3. Maximally informative k-itemsets and their efficient discovery, AJ Knobbe, EKY Ho - Proceedings of the 12th ACM SIGKDD international conference 2006