Index Driven Selective Sampling for CBR
Nirmalie Wiratunga Susan Craw Stewart Massie
The Robert Gordon University, Aberdeen
School of Computing
Overview
Selective sampling
Cluster creation using an index
Cluster and case utility scores
Evaluation
Selective Sampling
[Diagram: interesting cases are selected from the pool of unlabelled cases, labelled, and added to the case-base, which is organised by an index]
• Relevance feedback
• Distance learning
• Patient monitoring
Uncertainty and Representativeness
[Figure: labelled cases (+, -) and unlabelled cases (?)]
Sampling Procedure
L = set of labelled cases
U = set of unlabelled cases
LOOP
    model <= create-domain-model(L)
    clusters <= create-clusters(model, L, U)
    k-clusters <= select-clusters(k, clusters, L, U)
    FOR 1 to Max-Batch-Size
        case <= select-case(k-clusters, L, U)
        L <= L ∪ get-label(case, oracle)
        U <= U \ case
UNTIL stopping-criterion
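The procedure on this slide could be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the names `selective_sampling`, `select_case`, `oracle` and `max_iterations` are hypothetical, a fixed iteration limit stands in for the unspecified stopping criterion, and the model and cluster construction steps are assumed to happen inside the caller-supplied `select_case`.

```python
import random
from typing import Callable, Dict, Hashable, List

Case = Hashable  # hypothetical: any hashable case identifier


def selective_sampling(
    labelled: Dict[Case, str],
    unlabelled: List[Case],
    oracle: Callable[[Case], str],
    select_case: Callable[[Dict[Case, str], List[Case]], Case],
    max_batch_size: int,
    max_iterations: int,
) -> Dict[Case, str]:
    """Grow the labelled set by repeatedly asking the oracle for labels."""
    labelled = dict(labelled)  # copy so the caller's data is left untouched
    pool = list(unlabelled)
    for _ in range(max_iterations):  # assumption: stand-in stopping criterion
        if not pool:
            break
        # The slide rebuilds the domain model and the clusters at this point;
        # this sketch leaves that work to the supplied select_case function.
        for _ in range(min(max_batch_size, len(pool))):
            case = select_case(labelled, pool)
            labelled[case] = oracle(case)  # L <= L ∪ get-label(case, oracle)
            pool.remove(case)              # U <= U \ case
    return labelled


# Usage with a purely random selector and a toy oracle (both hypothetical).
rng = random.Random(0)
example = selective_sampling(
    labelled={1: "X"},
    unlabelled=[2, 3, 4, 5],
    oracle=lambda c: "Y" if c % 2 else "X",
    select_case=lambda L, U: rng.choice(U),
    max_batch_size=2,
    max_iterations=1,
)
```

Passing the selector as a function keeps the loop independent of how clusters and cases are ranked, so a random baseline and an informed strategy can share the same loop.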
Overview
Selective sampling
Cluster creation using an index
Cluster and case utility scores
Evaluation
Forming Clusters
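[Figure: a tree-structured index over features f1, f2 and f3, with branch values a, b, c, d, e and a numeric split (< N, >= N), partitioning the cases into clusters]
Clusters shown:
5 labelled (4X, 1Y), 6 unlabelled
0 labelled, 6 unlabelled
5 labelled (2X, 2Z, 1Y), 0 unlabelled
5 labelled (2X, 2Y, 1Z), 6 unlabelled
5 labelled (4Y, 1Z), 0 unlabelled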
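One way to read this slide is that every case, labelled or not, is routed through the index and the cases that reach the same partition form a cluster. The sketch below illustrates that idea only. The `toy_index` function, its feature names and its threshold are hypothetical and are not the index from the slides.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Features = Dict[str, object]


def form_clusters(
    index_leaf: Callable[[Features], str],
    labelled: List[Tuple[Features, str]],
    unlabelled: List[Features],
) -> Dict[str, Dict[str, list]]:
    """Group labelled and unlabelled cases by the index partition they reach."""
    clusters: Dict[str, Dict[str, list]] = defaultdict(
        lambda: {"labelled": [], "unlabelled": []}
    )
    for features, label in labelled:
        clusters[index_leaf(features)]["labelled"].append((features, label))
    for features in unlabelled:
        clusters[index_leaf(features)]["unlabelled"].append(features)
    return dict(clusters)


def toy_index(features: Features) -> str:
    """Hypothetical two-level index: nominal test on f1, numeric test on f2."""
    if features["f1"] == "a":
        return "f1=a"
    return "f1=b,f2<10" if features["f2"] < 10 else "f1=b,f2>=10"


toy_clusters = form_clusters(
    toy_index,
    labelled=[({"f1": "a", "f2": 3}, "X"), ({"f1": "b", "f2": 12}, "Y")],
    unlabelled=[{"f1": "a", "f2": 7}, {"f1": "b", "f2": 4}],
)
```

A cluster with no labelled cases, such as `f1=b,f2<10` here, corresponds to the "0 labelled, 6 unlabelled" situation on the slide.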
Analysing Clusters
[Figure: class labels (X, Y, Z) of the labelled cases in each cluster]
Overview
Selective sampling
Cluster creation
Cluster and case utility scores
Evaluation
Ranking Clusters - Cluster Utility Score
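The formula on this slide did not survive in the transcript, so the following is not the ClUS definition. The Conclusions slide states only that ClUS captures uncertainty within clusters and uses entropy as a weight. As a hedged illustration of that one ingredient, the sketch below computes the Shannon entropy of the class labels among a cluster's labelled cases; the function name `label_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def label_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the class distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: a cluster without labelled cases scores 0 here
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Class mixes taken from the Forming Clusters slide.
mostly_x = label_entropy(["X"] * 4 + ["Y"])        # 4X, 1Y
three_way = label_entropy(["X", "X", "Z", "Z", "Y"])  # 2X, 2Z, 1Y
```

A cluster whose labelled cases are spread over more classes gets a higher entropy, which matches the intuition that a mixed cluster is more uncertain than a nearly pure one.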
Ranking Cases - Case Utility Score
Overview
Selective sampling
Cluster creation
Cluster and case utility scores
Evaluation
Evaluation
Selection Heuristics
Rnd : randomly select cluster and cases
Rnd-Cluster : random cluster with highest ranked cases
Rnd-Case : highest ranked cluster, random cases
Informed-S : highest ranked cluster and cases
Informed-M : highest ranked clusters and case
UCI ML (6 datasets)
smaller data sets (Zoo, Iris, Lymph, Hep)
medium data sets (house votes, breast cancer)
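Four of the heuristics on this slide differ only in whether the cluster and the case are chosen by rank or at random, which can be sketched as two boolean switches. This is an illustrative reading, not the authors' code: the names `HEURISTICS`, `pick` and `select_case` are hypothetical, and the rankings are assumed to be supplied best-first. Informed-M is left out because the slide does not say how cases are drawn across several clusters.

```python
import random
from typing import Dict, Sequence

# (cluster chosen by rank?, case chosen by rank?) for each heuristic
HEURISTICS = {
    "Rnd": (False, False),
    "Rnd-Cluster": (False, True),
    "Rnd-Case": (True, False),
    "Informed-S": (True, True),
}


def pick(ranked: Sequence[str], informed: bool, rng: random.Random) -> str:
    """Return the top-ranked item if informed, otherwise a random one."""
    return ranked[0] if informed else rng.choice(list(ranked))


def select_case(
    heuristic: str,
    ranked_clusters: Sequence[str],
    ranked_cases: Dict[str, Sequence[str]],
    rng: random.Random,
) -> str:
    """Choose a cluster, then a case inside it, according to the heuristic."""
    informed_cluster, informed_case = HEURISTICS[heuristic]
    cluster = pick(ranked_clusters, informed_cluster, rng)
    return pick(ranked_cases[cluster], informed_case, rng)


# Hypothetical rankings, best first.
demo_clusters = ["c2", "c1"]
demo_cases = {"c1": ["case4", "case9"], "c2": ["case7", "case3", "case5"]}
demo_choice = select_case("Informed-S", demo_clusters, demo_cases, random.Random(1))
```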
Experimental Design
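[Figure: the data is divided into an indexed case-base, a sampling pool and a test set; the sampling pool grows in steps (Inc, 2Inc, 3Inc, 4Inc, 5Inc) and kNN accuracy is measured on the test set]
case base size = L + selected cases
selected cases = sampling iterations * Max-Batch-Size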
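The two size formulas and the kNN accuracy measurement on this slide can be illustrated with a small sketch. The details here are assumptions rather than the authors' setup: cases are numeric feature vectors compared by squared Euclidean distance, `k` defaults to 3, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

LabelledCase = Tuple[Sequence[float], str]


def case_base_size(initial_labelled: int, sampling_iterations: int,
                   max_batch_size: int) -> int:
    """case base size = L + sampling iterations * Max-Batch-Size."""
    return initial_labelled + sampling_iterations * max_batch_size


def knn_predict(case_base: List[LabelledCase], query: Sequence[float],
                k: int = 3) -> str:
    """Majority label among the k nearest cases (squared Euclidean distance)."""
    nearest = sorted(
        case_base,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def knn_accuracy(case_base: List[LabelledCase], test_set: List[LabelledCase],
                 k: int = 3) -> float:
    """Fraction of test cases whose label the case base predicts correctly."""
    correct = sum(knn_predict(case_base, x, k) == y for x, y in test_set)
    return correct / len(test_set)


# Toy data (hypothetical) with two well-separated classes.
toy_case_base = [((0, 0), "X"), ((0, 1), "X"), ((5, 5), "Y"), ((5, 6), "Y")]
toy_test_set = [((0.2, 0.1), "X"), ((5.1, 5.2), "Y")]
toy_accuracy = knn_accuracy(toy_case_base, toy_test_set, k=3)
```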
Results I
[Figure: accuracy on test set against sampling pool size (50 to 150) for Zoo (accuracy axis 70 to 90) and Iris (accuracy axis 80 to 95); series: Rnd, Rnd-cluster, Rnd-case, Informed-M, Informed-S]
Zoo (7C, 18F, A, P9)
Iris (3C, 4F, #+A, P3)
Results II
[Figure: accuracy on test set against sampling pool size (50 to 150) for Lymphography (accuracy axis 65 to 80) and Hepatitis (accuracy axis 80 to 84); series: Rnd, Rnd-cluster, Rnd-case, Informed-M, Informed-S]
Lymphography (4C, 19F, #+A, P9)
Hepatitis (2C, 20F, A+?, P7)
Results III
[Figure: accuracy on test set against sampling pool size (150 to 350) for House Votes (accuracy axis 80 to 92) and Breast Cancer (accuracy axis 62 to 69); series: Rnd, Rnd-cluster, Rnd-case, Informed-M, Informed-S]
House (2C, 16F, A+?, P3)
Breast (2C, 9F, A+?, P7)
Conclusions
Developed a case selection mechanism exploiting case base partitions
Utility Scores to rank clusters and cases
  ClUS captures uncertainty within clusters and uses entropy to further weight this score
  CaUS captures the impact on other cases
Significant improvement with informed selection on 6 data sets
The influence of votes, partitions and entropy needs further investigation