Active Learning based on Bayesian Networks

Luis M. de Campos, Silvia Acid and Moisés Fernández

Index of Contents

1. Introduction: the scenario is a pool-based active learning cycle.

2. Data and evaluation: we participated in five of the six datasets considered. Evaluation uses AUC and ALC.

3. Methods: features, modules implemented, general procedure, how to query labels and a practical example.

4. Results: our best result is a sixth position.

5. Conclusions

6. Acknowledgments

1. Introduction

2. Data and evaluation

The final test phase has 6 datasets. We participated in five of the six: A, C, D, E and F.

These datasets come from different application domains:
Chemoinformatics.
Embryology.
Marketing.
Text ranking.

Evaluation with:
Area under the ROC curve (AUC).
Area under the Learning Curve (ALC).
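As a rough illustration of the first metric, the AUC can be read as the probability that a randomly chosen example of class 1 receives a higher score than a randomly chosen example of class -1, with ties counted as one half. The sketch below computes it that way. The function name `auc` and the toy scores are illustrative assumptions, not the challenge's evaluation code, and the ALC is not sketched here.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    `scores` are classifier scores for class 1; `labels` are -1 or 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): every positive outranks every negative.
perfect = auc([0.9, 0.8, 0.4, 0.35], [1, 1, -1, -1])
```

The pairwise form is quadratic in the number of examples, which is fine for a sketch; a rank-based implementation would be preferable on large pools.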

3. Methods. Features

Hardware used: a laptop running Ubuntu 8.10, with 4 GB of memory and an Intel Core Duo at 2.53 GHz.

We have used three base classifiers from Bayesian networks:
Naive Bayes. It was used in dataset D.
TAN (Tree Augmented Network) with the BDeu score. It was used in dataset F.
CHillClimber. A new classifier that searches a reduced space centered on the class node. It was used with the BDeu score in datasets A, C and E.

Discretization method for numerical variables: Fayyad & Irani MDL for TAN and CHillClimber; none for Naive Bayes.

3. Methods. Features and Modules

Active learning method: uncertainty sampling.

We did not use unlabeled data for training.

Software implemented (several modules):
Matlab: main module. It calls the C++ module.
C++: intermediate module. It calls the Weka-Java module.
Weka-Java: final module. It is implemented in Java on top of Weka, with several modifications.

3. Methods. Procedure

The procedure is as follows:

1. The algorithm trains on all known instances; initially it only has the seed.

2. It selects new examples to query using a particular method (a, b or c), described on the following slide.

3. It adds the newly labeled examples to the known instances.

4. Are all instances known? No: go to 1. Yes: end.
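The four steps above can be sketched as a generic loop. This is a minimal sketch under assumptions: `train`, `select` and `oracle` are hypothetical callbacks standing in for the Bayesian network classifier, the query method (a, b or c) and the label provider, and none of these names come from the authors' Matlab, C++ or Weka code.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def active_learning_loop(
    pool_ids: Sequence[int],
    seed_labels: Dict[int, int],
    oracle: Callable[[int], int],
    train: Callable[[Dict[int, int]], object],
    select: Callable[[object, List[int], int], List[int]],
    batch_sizes: Iterable[int],
) -> Dict[int, int]:
    """Run the train / query / merge cycle until every instance is labeled.

    Assumes the seed ids are part of `pool_ids`.
    """
    known = dict(seed_labels)          # initially only the seed is known
    sizes = iter(batch_sizes)
    while len(known) < len(pool_ids):  # step 4: stop once all labels are known
        model = train(known)           # step 1: train on all known instances
        unknown = [i for i in pool_ids if i not in known]
        # Batch size for this iteration; if the schedule runs out, query
        # everything that is left (an assumption of this sketch).
        x = max(1, min(next(sizes, len(unknown)), len(unknown)))
        queried = [i for i in select(model, unknown, x) if i not in known]
        if not queried:
            raise ValueError("select() returned no new examples")
        for i in queried:              # steps 2 and 3: query, then merge
            known[i] = oracle(i)
    return known


# Toy run (hypothetical data): a dummy model and a selector that simply
# takes the first x unknown examples.
truth = {0: -1, 1: 1, 2: -1, 3: 1, 4: 1, 5: -1, 6: 1, 7: -1}
labels = active_learning_loop(
    pool_ids=list(truth),
    seed_labels={0: truth[0]},
    oracle=truth.__getitem__,
    train=lambda known: None,
    select=lambda model, unknown, x: unknown[:x],
    batch_sizes=[1, 2, 4, 8],
)
```

Passing the classifier and the query rule as callbacks keeps the loop independent of which base classifier and which of the methods a, b or c is plugged in.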

The number of instances to query in each iteration is fixed in advance, in one of three different ways (“n” is the total number of labels in the dataset):

Exponential: 1, 2, 4, 8, 16, 32, 64, …
Equal10-All: (n/2)/10, (n/2)/10, (n/2)/10, …, (n/2)
All-Equal10: (n/2), (n/2)/10, (n/2)/10, (n/2)/10, …
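The three query schedules can be sketched as functions that return the batch size for each iteration. This is a sketch under assumptions: the function names are hypothetical, “Equal10” is read as ten near-equal batches covering half of the labels (inferred from the name and from the (n/2)/10 terms), the last exponential batch is truncated so that the sizes sum to n, and integer rounding is handled in a way the slides do not specify.

```python
from typing import List


def split_equal(total: int, parts: int) -> List[int]:
    """Split `total` into at most `parts` near-equal positive batch sizes."""
    base, extra = divmod(total, parts)
    sizes = [base + 1] * extra + [base] * (parts - extra)
    return [s for s in sizes if s > 0]


def exponential(n: int) -> List[int]:
    """1, 2, 4, 8, ... with the last batch truncated so the sizes sum to n."""
    sizes, size, remaining = [], 1, n
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size *= 2
    return sizes


def equal10_all(n: int) -> List[int]:
    """Ten small batches covering half of the labels, then the rest at once."""
    half = n // 2
    return [s for s in split_equal(half, 10) + [n - half] if s > 0]


def all_equal10(n: int) -> List[int]:
    """Half of the labels at once, then ten small batches for the rest."""
    half = n // 2
    return [s for s in [half] + split_equal(n - half, 10) if s > 0]
```

For n = 100, `exponential` gives 1, 2, 4, 8, 16, 32, 37; `equal10_all` gives ten batches of 5 followed by one of 50; `all_equal10` gives one batch of 50 followed by ten of 5.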

3. Methods. How to query examples (a, b or c)

In each iteration we sort the examples in increasing order of the probability of their most probable class. Then we choose “x” examples with the selected method:

a. We query the “x” examples having the lowest probabilities.

b. We query the “x1” and “x2” examples having the lowest probabilities among those assigned to class -1 and to class 1 respectively, maintaining the proportion of examples of each class known so far (x = x1 + x2).

c. Like method b, but “x1” and “x2” are calculated using the proportion of examples of each class estimated from both the labels returned by the oracle and the values returned by our classifier.
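The three query methods above can be sketched as follows. This is a sketch under assumptions: all function names are hypothetical, each prediction is given as a pair (probability of the most probable class, that class), x1 is obtained by rounding half up (the slides do not state the rounding rule), and method c estimates the class proportions by adding the counts of predicted classes in the pool to the counts of known labels.

```python
from typing import Dict, List, Tuple

Prediction = Tuple[float, int]  # (max class probability, predicted class)


def sort_by_confidence(preds: Dict[int, Prediction]) -> List[int]:
    """Example ids in increasing order of the most probable class's probability."""
    return sorted(preds, key=lambda i: preds[i][0])


def split_budget(x: int, n_neg: int, n_pos: int) -> Tuple[int, int]:
    """Split x into (x1, x2) proportionally to n_neg : n_pos, rounding half up."""
    total = n_neg + n_pos
    x1 = (2 * x * n_neg + total) // (2 * total)  # integer round-half-up
    return x1, x - x1


def query_by_class(preds: Dict[int, Prediction], x1: int, x2: int) -> List[int]:
    """Least confident x1 examples predicted -1, then x2 predicted 1."""
    ranked = sort_by_confidence(preds)
    neg = [i for i in ranked if preds[i][1] == -1][:x1]
    pos = [i for i in ranked if preds[i][1] == 1][:x2]
    return neg + pos


def query_a(preds: Dict[int, Prediction], x: int) -> List[int]:
    """Method a: the x least confident examples, whatever their class."""
    return sort_by_confidence(preds)[:x]


def query_b(preds, x: int, known_neg: int, known_pos: int) -> List[int]:
    """Method b: keep the class proportion of the labels known so far."""
    return query_by_class(preds, *split_budget(x, known_neg, known_pos))


def query_c(preds, x: int, known_neg: int, known_pos: int) -> List[int]:
    """Method c: proportion from known labels plus predicted classes."""
    pred_neg = sum(1 for _, c in preds.values() if c == -1)
    pred_pos = len(preds) - pred_neg
    budget = split_budget(x, known_neg + pred_neg, known_pos + pred_pos)
    return query_by_class(preds, *budget)


# The six predictions shown in the example that follows in the slides,
# treated here as the whole pool (an assumption; the slide elides more rows).
preds = {1: (0.90, 1), 2: (0.80, 1), 3: (0.60, -1),
         4: (0.70, -1), 5: (0.65, -1), 6: (0.75, -1)}
```

With these six predictions, x = 4, and 6 known examples of class -1 and 4 of class 1, the sketch returns 3, 5, 4, 6 for method a, 3, 5, 2, 1 for method b and 3, 5, 4, 2 for method c, which agrees with the example in the slides.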

3. Methods. An example.

Prior knowledge: 6 examples corresponding to class -1 and 4 to class 1.

In addition, our classifier returns the following probabilities:

Example   Class -1   Class 1
1         0.10       0.90
2         0.20       0.80
3         0.60       0.40
4         0.70       0.30
5         0.65       0.35
6         0.75       0.25
…         …          …

Selecting the maximum probability for each example gives:

Example   MaxProb   Class
1         0.90      1
2         0.80      1
3         0.60      -1
4         0.70      -1
5         0.65      -1
6         0.75      -1
…         …         …

Sorting by increasing MaxProb gives:

Example   MaxProb   Class
3         0.60      -1
5         0.65      -1
4         0.70      -1
6         0.75      -1
2         0.80      1
1         0.90      1
…         …         …

Our exponential strategy indicates that we have to choose 4 examples (we are in iteration three):
With method a: we would choose examples 3, 5, 4, 6.
With method b: we would choose examples 3, 5, 2, 1.
With method c: we would choose examples 3, 5, 4, 2.
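The two table transformations in this example (selecting the maximum probability, then sorting) can be sketched as below. The function names are hypothetical, only the six rows shown are used, and the row for example 4 is taken as 0.70 / 0.30 so that its probabilities sum to one.

```python
from typing import Dict, List, Tuple


def select_max_probability(
    probs: Dict[int, Tuple[float, float]]
) -> Dict[int, Tuple[float, int]]:
    """Map example id -> (probability of the most probable class, that class).

    Each input value is the pair (P(class -1), P(class 1)).
    """
    return {
        i: (p_neg, -1) if p_neg >= p_pos else (p_pos, 1)
        for i, (p_neg, p_pos) in probs.items()
    }


def sort_by_max_probability(table: Dict[int, Tuple[float, int]]) -> List[int]:
    """Example ids ordered from least to most confident prediction."""
    return sorted(table, key=lambda i: table[i][0])


probs = {
    1: (0.10, 0.90),
    2: (0.20, 0.80),
    3: (0.60, 0.40),
    4: (0.70, 0.30),
    5: (0.65, 0.35),
    6: (0.75, 0.25),
}
table = select_max_probability(probs)
order = sort_by_max_probability(table)
```

Here `order` is 3, 5, 4, 6, 2, 1, and its first four entries are the examples chosen by method a.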

4. Results

Our results are rather modest: we obtained reasonable performance on only two datasets, C and E.

The plot for dataset E is on the left and the plot for dataset C is on the right.

Dataset   Method                          Ranking
A         CHillClimber, exponential, a)   20/22
C         TAN, equal10-all, c)            6/14
D         NaiveBayes, all-equal10, a)     15/19
E         CHillClimber, exponential, b)   12/20
F         TAN, exponential, b)            13/16

5. Conclusions

We could improve our process by applying clustering as further processing while only a few labeled instances are available.

Advantages: simple; not time-consuming.

Disadvantages: static behavior; lack of knowledge in the early stages of the process.

Acknowledgments

This work has been supported by the Spanish research programme Consolider Ingenio 2010: MIPRCV (CSD2007-00018).
