Building Global Models from Local Patterns A.J. Knobbe.
Two-phased process
Break discovery up into two phases
Transform a complex problem into a simpler one
Pattern Discovery phase
– frequent patterns, correlated patterns, interesting subgroups, decision boundaries, …
Pattern Combination phase
– redundancy reduction, dependency modeling, global model building, …
Pattern teams, pattern networks, global predictive models, …
Task: Subgroup Discovery
Subgroup Discovery:
Find subgroups that show substantially different distribution of target concept.
top-down search for patterns
inductive constraints (sometimes monotonic)
evaluation measures: novelty, χ², information gain
also known as rule discovery, correlated pattern mining
Novelty
Also known as weighted relative accuracy
Balance between coverage and unexpectedness
nov(S,T) = p(ST) − p(S)p(T)
between −.25 and .25; 0 means uninteresting
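Contingency table (one axis is the subgroup S, the other the target T; the last row and column hold the marginals):

         T      F
T       .42    .13    .55
F       .12    .33
        .54           1.0

nov(S,T) = p(ST) − p(S)p(T) = .42 − (.55)(.54) = .42 − .297 = .123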
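The novelty calculation above can be checked with a few lines of Python. This is a minimal sketch: the function name `novelty` and its parameter names are chosen here for illustration and do not come from the slides.

```python
def novelty(p_st, p_s, p_t):
    """nov(S,T) = p(ST) - p(S) p(T), as defined on the slide."""
    return p_st - p_s * p_t

# Values read from the contingency table above.
nov = novelty(p_st=0.42, p_s=0.55, p_t=0.54)
print(round(nov, 3))  # 0.123

# The extremes are reached when subgroup and target each cover half the data:
# identical sets give +0.25, complementary sets give -0.25.
upper = novelty(p_st=0.5, p_s=0.5, p_t=0.5)
lower = novelty(p_st=0.0, p_s=0.5, p_t=0.5)
```

The two extreme cases illustrate why the measure balances coverage and unexpectedness: a very small or very large subgroup cannot reach a high score even if it is pure.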
Demo Subgroup Discovery
[Figure: demo plot; vertical axis from 0 to 500, horizontal axis from 1 to 4677]
Pattern Combination phase
Feature selection, redundancy reduction
– Pattern Teams
Dependency modeling
– Bayesian networks
– Association rules
Global modeling
– Classifiers, regression models
Pattern Teams
Pattern Discovery typically produces very many patterns with high levels of redundancy
Report a small informative subset with specific properties
Promote dissimilarity of patterns reported
Additional value of individual patterns
Consider extent of patterns
– Treat patterns as binary features/items
Intuitions
No two patterns should cover the same set of examples
No pattern should cover the complement of another pattern
No pattern should cover a logical combination of two or more other patterns
Patterns should be mutually exclusive
The pattern set should lead to the best performing classifier
Patterns should lie on the convex hull in ROC-space
Quality measures for pattern sets
Judge pattern sets on the basis of a quality function
Unsupervised
– Joint Entropy (miki)
– Exclusive Coverage
Supervised
– Wrapper accuracy
– Area Under Curve in ROC-space
– Bayesian Dirichlet equivalent uniform
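As an illustration of the joint entropy measure, the sketch below treats each pattern as a binary feature over the examples and exhaustively picks the size-k set with the highest joint entropy. The toy data, the names `joint_entropy` and `best_team`, and the exhaustive search are assumptions made for this example; the slides do not specify a search procedure.

```python
from collections import Counter
from itertools import combinations
from math import log2

def joint_entropy(columns):
    """Joint entropy in bits of binary patterns (one 0/1 list per pattern)."""
    n = len(columns[0])
    # Each example maps to one truth assignment over the chosen patterns.
    counts = Counter(zip(*columns))
    return -sum(c / n * log2(c / n) for c in counts.values())

def best_team(patterns, k):
    """Exhaustively pick the k patterns with maximal joint entropy."""
    return max(
        combinations(sorted(patterns), k),
        key=lambda names: joint_entropy([patterns[m] for m in names]),
    )

# Hypothetical toy data: four patterns over eight examples.
patterns = {
    "p1": [1, 1, 1, 1, 0, 0, 0, 0],
    "p2": [1, 1, 1, 1, 0, 0, 0, 0],  # covers the same examples as p1
    "p3": [0, 0, 0, 0, 1, 1, 1, 1],  # complement of p1
    "p4": [1, 1, 0, 0, 1, 1, 0, 0],  # splits the examples differently
}
team = best_team(patterns, 2)
print(team, joint_entropy([patterns[m] for m in team]))
```

In this toy data a duplicate or a complement of `p1` adds nothing (the pair carries 1 bit), whereas `p1` with `p4` reaches the 2-bit maximum for two binary features, which matches the intuitions about identical and complementary patterns.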
Pattern Teams
[Figure: two plots on the same axes (vertical −1 to 9, horizontal −4 to 0)]
82 subgroups discovered
4 subgroups in pattern team
Pattern Network
Again, treat patterns as binary features
Bayesian networks
– conditional independence of patterns
Explain relationships between patterns
Explain role of patterns in Pattern Team
Demo Pattern Team & Network
redundancy removed to find truly diverse patterns, in this case by maximizing joint entropy
peak around 89k
peak around 16k
peak around 39k
the pattern team and related patterns can be presented in a Bayesian network
Properties of SD phase in PC
What knowledge about Subgroup Discovery parameters can be exploited in Combination?
Interestingness
– Are interesting subgroups diverse?
– Are interesting subgroups correlated?
Information content
Support of patterns
joint entropy of 2 interesting subgroups
[Figure: plot with vertical axis from 0 to 2.5 and horizontal axis from 0 to 0.25]
subgroups are very novel: 1 bit of information
subgroups are relatively novel: up to 2 bits of information
correlation of interesting subgroups
[Figure: inter novelty (vertical axis, −0.25 to 0.25) against novelty of subgroups (horizontal axis, 0 to 0.25)]
subgroups are novel, but potentially independent
subgroups are very novel, and correlate
Combination strategies
How to interpret a pattern set?
Conjunctive (intersection of patterns)
Disjunctive (union of patterns)
Majority vote (equal weight linear separator)
…
Contingencies/Classifiers
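The first three interpretations can be sketched as functions over one example's pattern memberships (a list of 0/1 values, one per pattern). The function names are illustrative, and treating a tie as a negative vote is an assumption; the slides do not say how ties are resolved.

```python
def conjunctive(row):
    """Example is positive only if every pattern covers it (intersection)."""
    return all(row)

def disjunctive(row):
    """Example is positive if at least one pattern covers it (union)."""
    return any(row)

def majority_vote(row):
    """Equal-weight vote: positive if more than half of the patterns cover it.
    Ties count as negative here, which is an assumption of this sketch."""
    return 2 * sum(row) > len(row)

row = [1, 0, 1]  # hypothetical example covered by the first and third pattern
print(conjunctive(row), disjunctive(row), majority_vote(row))  # False True True
```

Majority vote is a linear separator with the same weight on every pattern, which is why it sits between the two stricter set interpretations.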
Decision Table Majority (DTM)
Treat every truth-assignment as a contingency
Classification based on conditional probability
Use majority class for empty contingencies
Only works with Pattern Team (else overfitting)
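A minimal sketch of a Decision Table Majority classifier along these lines is given below. The names `fit_dtm` and `predict_dtm` and the toy data are hypothetical; the class with the highest conditional probability within a contingency is simply the most frequent class observed there.

```python
from collections import Counter, defaultdict

def fit_dtm(rows, labels):
    """Count class labels per truth assignment (contingency)."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[tuple(row)][label] += 1
    # Overall majority class, used for contingencies never seen in training.
    default = Counter(labels).most_common(1)[0][0]
    return dict(table), default

def predict_dtm(model, row):
    """Predict the most frequent class of the matching contingency."""
    table, default = model
    counts = table.get(tuple(row))
    return counts.most_common(1)[0][0] if counts else default

# Hypothetical training data: two patterns, six examples.
rows = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 0)]
labels = ["+", "+", "-", "-", "-", "-"]
model = fit_dtm(rows, labels)
print(predict_dtm(model, (1, 0)))  # "+": two of three examples in this cell
print(predict_dtm(model, (1, 1)))  # "-": empty cell, falls back to majority
```

With k patterns there are 2^k contingencies, so the table only stays well populated for a small pattern set, which is the reason the slide restricts this classifier to a pattern team.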
Support Vector Machine (SVM)
SVM with linear kernel
Binary data
All dimensions have same scale
Works with large pattern sets
Subgroup discovery has removed XOR-like dependencies
Interesting subgroups correlate
Division of labour between 2 phases
Subgroup Discovery Phase
– Feature selection
– Decision boundary finding/thresholding
– Multivariate dependencies (XOR)
Pattern Combination Phase
– Pattern selection
– Combination (XOR?)
– Class assignment
Combination-aware Subgroup Discovery
Better global model
Superficially uninteresting patterns can be reported
Pruning of search space (new rule-measures)
subgroups are not novel, team is optimal
Combination-aware Subgroup Discovery
Subgroup Discovery ++:
Find a set of subgroups that show substantially different distribution of target concept.
Considerations
– support of pattern
– diversity of pattern
– …