Adaptive XML Tree Mining on Evolving Data Streams
-
Upload
albert-bifet -
Category
Technology
-
view
1.001 -
download
1
description
Transcript of Adaptive XML Tree Mining on Evolving Data Streams
![Page 1: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/1.jpg)
Adaptive XML Tree Mining on Evolving Data Streams
Albert Bifet
Laboratory for Relational Algorithmics, Complexity and Learning LARCADepartament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
Porto, 21 May 2009
![Page 2: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/2.jpg)
Mining Evolving Massive Structured Data
The Disintegration of Persistenceof Memory 1952-54
Salvador Dalí
The basic problemFinding interesting structureon data
Mining massive data
Mining time varying data
Mining on real time
Mining XML data
2 / 30
![Page 3: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/3.jpg)
XML Tree Classification on evolving datastreams
D
D
B
C
A
C
D
B
C
B
D
B
C C
B
D
B
C
A
B
CLASS1 CLASS2 CLASS1 CLASS2
Figure: A dataset example
3 / 30
![Page 4: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/4.jpg)
Tree Pattern Mining
Trees are sanctuaries.Whoever knows how
to listen to them,can learn the truth.
Herman Hesse
Given a dataset of trees, find thecomplete set of frequent subtrees
Frequent Tree Pattern (FT):
Include all the trees whosesupport is no less than min_sup
Closed Frequent Tree Pattern(CT):
Include no tree which has asuper-tree with the samesupport
CT ⊆ FT
4 / 30
![Page 5: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/5.jpg)
Mining Closed Frequent Trees
Our trees are:
Labeled and Unlabeled
Ordered and Unordered
Our subtrees are:
Induced
Top-down
Two different ordered treesbut the same unordered tree
5 / 30
![Page 6: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/6.jpg)
A tale of two trees
Consider D = {A,B}, where
A:
B:
and let min_sup = 2.
Frequent subtreesBA
6 / 30
![Page 7: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/7.jpg)
A tale of two trees
Consider D = {A,B}, where
A:
B:
and let min_sup = 2.
Closed subtreesBA
6 / 30
![Page 8: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/8.jpg)
XML Tree Classification on evolving datastreams
D
D
B
C
A
C
D
B
C
B
D
B
C C
B
D
B
C
A
B
CLASS1 CLASS2 CLASS1 CLASS2
Figure: A dataset example
7 / 30
![Page 9: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/9.jpg)
XML Tree Classification on evolving datastreams
Tree Trans.Closed Freq. not Closed Trees 1 2 3 4
c1
D
B
C C
B
C C 1 0 1 0
c2
D
B
C
A
B
C
A
C
A
A
1 0 0 1
8 / 30
![Page 10: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/10.jpg)
XML Tree Classification on evolving datastreamsFrequent Trees
c1 c2 c3 c4Id c1 f 1
1 c2 f 12 f 2
2 f 32 c3 f 1
3 c4 f 14 f 2
4 f 34 f 4
4 f 54
1 1 1 1 1 1 1 0 0 1 1 1 1 1 12 0 0 0 0 0 0 1 1 1 1 1 1 1 13 1 1 0 0 0 0 1 1 1 1 1 1 1 14 0 0 1 1 1 1 1 1 1 1 1 1 1 1
Closed MaximalTrees Trees
Id Tree c1 c2 c3 c4 c1 c2 c3 Class1 1 1 0 1 1 1 0 CLASS12 0 0 1 1 0 0 1 CLASS23 1 0 1 1 1 0 1 CLASS14 0 1 1 1 0 1 1 CLASS2
9 / 30
![Page 11: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/11.jpg)
XML Tree Framework on evolving datastreams
XML Tree Classification Framework Components
An XML closed frequent tree miner
A Data stream classifier algorithm, which we will feed with tuplesto be classified online.
10 / 30
![Page 12: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/12.jpg)
Mining Evolving Tree Data Streams
ProblemGiven a data stream D of rooted and unordered trees, findfrequent closed trees.
D
We provide three algorithms,of increasing power
Incremental
Sliding Window
Adaptive
11 / 30
![Page 13: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/13.jpg)
Mining Closed Unordered Subtrees
CLOSED_SUBTREES(t ,D ,min_sup,T )
123 for every t ′ that can be extended from t in one step4 do if Support(t ′) ≥min_sup5 then T ← CLOSED_SUBTREES(t ′,D ,min_sup,T )6789
10 return T
12 / 30
![Page 14: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/14.jpg)
Mining Closed Unordered Subtrees
CLOSED_SUBTREES(t ,D ,min_sup,T )
1 if not CANONICAL_REPRESENTATIVE(t)2 then return T3 for every t ′ that can be extended from t in one step4 do if Support(t ′) ≥min_sup5 then T ← CLOSED_SUBTREES(t ′,D ,min_sup,T )6789
10 return T
12 / 30
![Page 15: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/15.jpg)
Mining Closed Unordered Subtrees
CLOSED_SUBTREES(t ,D ,min_sup,T )
1 if not CANONICAL_REPRESENTATIVE(t)2 then return T3 for every t ′ that can be extended from t in one step4 do if Support(t ′) ≥min_sup5 then T ← CLOSED_SUBTREES(t ′,D ,min_sup,T )6 do if Support(t ′) = Support(t)7 then t is not closed8 if t is closed9 then insert t into T
10 return T
12 / 30
![Page 16: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/16.jpg)
ExampleD = {A,B}
min_sup = 2.
〈A〉= (0,1,2,3,2,1) 〈B〉= (0,1,2,3,1,2,2)
(0) (0,1)
(0,1,1)
(0,1,2)
(0,1,2,1)
(0,1,2,2)
(0,1,2,3)
(0,1,2,2,1)
(0,1,2,3,1)
13 / 30
![Page 17: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/17.jpg)
ExampleD = {A,B}
min_sup = 2.
〈A〉= (0,1,2,3,2,1) 〈B〉= (0,1,2,3,1,2,2)
(0) (0,1)
(0,1,1)
(0,1,2)
(0,1,2,1)
(0,1,2,2)
(0,1,2,3)
(0,1,2,2,1)
(0,1,2,3,1)
13 / 30
![Page 18: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/18.jpg)
Experimental results
TreeNat
Unlabeled Trees
Top-Down Subtrees
No Occurrences
CMTreeMiner
Labeled Trees
Induced Subtrees
Occurrences
14 / 30
![Page 19: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/19.jpg)
Closure Operator on Trees
D : the finite input dataset of trees
T : the (infinite) set of all trees
DefinitionWe define the following the Galois connection pair:
For finite A⊆D
σ(A) is the set of subtrees of the A trees in T
σ(A) = {t ∈T∣∣ ∀ t ′ ∈ A(t � t ′)}
For finite B ⊂T
τD (B) is the set of supertrees of the B trees in D
τD (B) = {t ′ ∈D∣∣ ∀ t ∈ B (t � t ′)}
Closure OperatorThe composition ΓD = σ ◦ τD is a closure operator.
15 / 30
![Page 20: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/20.jpg)
Galois Lattice of closed set of trees
1 2 3
12 13 23
123
16 / 30
![Page 21: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/21.jpg)
Galois Lattice of closed set of trees
D
B = { }
1 2 3
12 13 23
12317 / 30
![Page 22: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/22.jpg)
Galois Lattice of closed set of trees
B = { }
τD(B) = { , }
1 2 3
12 13 23
12317 / 30
![Page 23: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/23.jpg)
Galois Lattice of closed set of trees
B = { }
τD(B) = { , }
ΓD(B) = σ ◦τD(B) = { and its subtrees }
1 2 3
12 13 23
12317 / 30
![Page 24: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/24.jpg)
Algorithms
Algorithms
Incremental: INCTREENAT
Sliding Window: WINTREENAT
Adaptive: ADATREENAT Uses ADWIN to monitor change
ADWIN
An adaptive sliding window whose size is recomputed onlineaccording to the rate of change observed.
ADWIN has rigorous guarantees (theorems)
On ratio of false positives and false negatives
On the relation of the size of the current window and changerates
18 / 30
![Page 25: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/25.jpg)
Experimental Validation: TN1
INCTREENAT
CMTreeMiner
Time(sec.)
Size (Milions)2 4 6 8
100
200
300
Figure: Experiments on ordered trees with TN1 dataset
19 / 30
![Page 26: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/26.jpg)
What is MOA?
{M}assive {O}nline {A}nalysis is a framework for online learningfrom data streams.
It is closely related to WEKA
It includes a collection of offline and online as well as tools forevaluation:
boosting and baggingHoeffding Trees
with and without Naïve Bayes classifiers at the leaves.
20 / 30
![Page 27: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/27.jpg)
WEKA: the bird
21 / 30
![Page 28: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/28.jpg)
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
22 / 30
![Page 29: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/29.jpg)
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
22 / 30
![Page 30: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/30.jpg)
MOA: the bird
The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.
22 / 30
![Page 31: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/31.jpg)
Data stream classification cycle
1 Process an example at atime, and inspect it onlyonce (at most)
2 Use a limited amount ofmemory
3 Work in a limited amountof time
4 Be ready to predict at anypoint
23 / 30
![Page 32: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/32.jpg)
Environments and Data Sources
Environments
Sensor Network: 100Kb
Handheld Computer: 32 Mb
Server: 400 Mb
Data Sources
Random Tree Generator
Random RBF Generator
LED Generator
Waveform Generator
Function Generator
24 / 30
![Page 33: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/33.jpg)
Algorithms
Naive Bayes
Decision stumps
Hoeffding Tree
Hoeffding Option Tree
Bagging and Boosting
Prediction strategies
Majority class
Naive Bayes Leaves
Adaptive Hybrid
25 / 30
![Page 34: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/34.jpg)
Hoeffding Option Tree
Hoeffding Option TreesRegular Hoeffding tree containing additional option nodes thatallow several tests to be applied, leading to multiple Hoeffdingtrees as separate paths.
26 / 30
![Page 35: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/35.jpg)
GUIjava -cp .:moa.jar:weka.jar-javaagent:sizeofag.jar moa.gui.TaskLauncher
27 / 30
![Page 36: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/36.jpg)
GUIjava -cp .:moa.jar:weka.jar-javaagent:sizeofag.jar moa.gui.TaskLauncher
27 / 30
![Page 37: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/37.jpg)
Ensemble Methods
http://www.cs.waikato.ac.nz/∼abifet/MOA/
New ensemble methods:
ADWIN bagging: When a change is detected, the worst classifieris removed and a new classifier is added.
Adaptive-Size Hoeffding Tree bagging
28 / 30
![Page 38: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/38.jpg)
XML Tree Framework on evolving datastreams
Maximal Closed
# Trees Att. Acc. Mem. Att. Acc. Mem.
CSLOG12 15483 84 79.64 1.2 228 78.12 2.54CSLOG23 15037 88 79.81 1.21 243 78.77 2.75CSLOG31 15702 86 79.94 1.25 243 77.60 2.73CSLOG123 23111 84 80.02 1.7 228 78.91 4.18
Table: BAGGING on unordered trees.
29 / 30
![Page 39: Adaptive XML Tree Mining on Evolving Data Streams](https://reader033.fdocuments.in/reader033/viewer/2022060107/554c82e5b4c905203f8b4e9a/html5/thumbnails/39.jpg)
Conclusions
XML tree stream classifier system.
Using Galois Latice Theory, we present methods for miningclosed trees
IncrementalSliding WindowAdaptive: using ADWIN to monitor change
We use MOA data stream classifiers.
30 / 30