Learning Bayesian Networks through evolution

Rotem Golan
Department of Computer Science, Ben-Gurion University of the Negev, Israel


Outline
  What is a Bayesian Network?
  Competition overview
  The three dimensions genetic algorithm
  Adding the fourth dimension
  The big picture
  References

Definition
A Bayesian network, belief network, or directed acyclic graphical model is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).
For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

A long story
You have a new burglar alarm installed at home. It is fairly reliable at detecting a burglary, but also responds on occasion to minor earthquakes. You also have two neighbors, John and Mary, who have promised to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls then, too. Mary, on the other hand, likes rather loud music and sometimes misses the alarm altogether. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

A short representation
(Figure: the burglar alarm story drawn as a Bayesian network.)
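The short representation is the story drawn as a network with one conditional probability table per node. A minimal sketch of it follows. The CPT numbers are an assumption: they are the values commonly given for this example in the Russell and Norvig textbook listed in the References, and they are not stated in these slides. The function names are hypothetical.

```python
from itertools import product

# CPT values assumed from the textbook version of the example (not in the slides).
P_B = 0.001                                   # P(Burglary)
P_E = 0.002                                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | Burglary, Earthquake)
P_J = {True: 0.90, False: 0.05}               # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}               # P(MaryCalls | Alarm)


def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Joint probability: the product of one CPT entry per node."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


def posterior_burglary(j, m):
    """P(Burglary=True | JohnCalls=j, MaryCalls=m), summing out Earthquake and Alarm."""
    score = {}
    for b in (True, False):
        score[b] = sum(joint(b, e, a, j, m)
                       for e, a in product([True, False], repeat=2))
    return score[True] / (score[True] + score[False])


print(round(posterior_burglary(True, True), 3))   # prints 0.284
```

With these assumed numbers, a burglary remains fairly unlikely even when both neighbors call, because the prior probability of a burglary is so small.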

Observations
In our algorithm, all the values of the network are known except the genre value, which we would like to estimate.
The variables in our algorithm are continuous rather than Boolean (except the genre variable).
We divide the possible values of each variable into fixed-size intervals.
The number of intervals changes throughout the evolution.
We refer to this process as the discretization of the variable.
We refer to the Conditional Probability Table of each variable (node) as its CPT.

Naive Bayesian Network
(Figure: a Naive Bayesian Network, in which the genre node is the only parent of every other node.)

Bayesian Network construction
Once we have determined the chosen variables (how many and which ones), their fixed discretization, and the structure of the graph, we can easily compute the CPT values for each node in the graph from the training set.
For each vector in the training set, we update all of the network's CPTs by increasing the appropriate entry by one.
After this process, we divide each value by the sum of its row (normalization).

Exact Inference in Bayesian Networks
For each vector in the verification/test set, we compute six probabilities, one for each assumed value of the genre variable (Rock, Pop, Blues, Jazz, Classical and Metal). Each probability is the product of the appropriate entries of all the network's CPTs. We choose the genre with the highest probability as the genre of this vector.
We will discuss the issue of zeroes in the CPTs later on.
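The construction and inference steps can be sketched for the Naive Bayesian case as below. This is an illustration under assumptions, not the original program: the function names, the equal-width `discretize` helper, and the toy data are all hypothetical, and the genre prior is treated as the CPT of the genre node.

```python
def discretize(value, low, high, bins):
    """Map a continuous value to one of `bins` equal-width intervals on [low, high]."""
    if high <= low:
        return 0
    index = int((value - low) / (high - low) * bins)
    return min(max(index, 0), bins - 1)        # clamp values outside [low, high]


def train(vectors, genres, ranges, bins):
    """Count occurrences per (genre, interval), then normalize each CPT row."""
    labels = sorted(set(genres))
    prior = {g: 0.0 for g in labels}
    cpts = [{g: [0.0] * b for g in labels} for b in bins]   # cpts[i][genre] is one row
    for vector, genre in zip(vectors, genres):
        prior[genre] += 1
        for i, value in enumerate(vector):
            low, high = ranges[i]
            cpts[i][genre][discretize(value, low, high, bins[i])] += 1
    total = sum(prior.values())
    for genre in labels:
        for cpt in cpts:
            row_sum = sum(cpt[genre])
            cpt[genre] = [count / row_sum for count in cpt[genre]]
        prior[genre] /= total
    return prior, cpts


def classify(vector, prior, cpts, ranges, bins):
    """Multiply the matching CPT entries for each genre and return the best genre."""
    best_genre, best_score = None, -1.0
    for genre, p in prior.items():
        score = p
        for i, value in enumerate(vector):
            low, high = ranges[i]
            score *= cpts[i][genre][discretize(value, low, high, bins[i])]
        if score > best_score:
            best_genre, best_score = genre, score
    return best_genre


# Toy data: two variables on [0, 1], two intervals each.
vectors = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
genres = ["Rock", "Rock", "Jazz", "Jazz"]
ranges, bins = [(0.0, 1.0), (0.0, 1.0)], [2, 2]
prior, cpts = train(vectors, genres, ranges, bins)
print(classify([0.15, 0.85], prior, cpts, ranges, bins))   # prints Rock
```

With many variables the product of probabilities can underflow in floating point. Summing logarithms is a common alternative, though the slides do not say whether the original program did this.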

Competition overview
A database of 60 music performers has been prepared for the competition. The material is divided into six categories: classical music, jazz, blues, pop, rock and heavy metal.
For each of the performers, 15-20 music pieces have been collected.
All music pieces are partitioned into 20 segments and parameterized.
The feature vector consists of 191 parameters.

Competition overview (Cont.)
Our goal is to estimate the music genre of newly given fragments of music tracks.
Input:
  A training set of 12,495 vectors and their genre
  A test set of 10,269 vectors without their genre
Output:
  10,269 labels (Classical, Jazz, Rock, Blues, Metal or Pop), one for each vector in the test set.
The metric used for evaluating the solutions is standard accuracy, i.e. the ratio of the correctly classified samples to the total number of samples.

Preprocessing
I divided the training set into two sets:
  A training set used for constructing each Bayesian Network in the population.
  A verification set used for computing the fitness of each network in the population.

These sets have the same number of vectors for each category (Rock vectors, Pop vectors, etc.).
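One way to get such a split is sketched below. It assumes that "the same number of vectors for each category" means each genre is divided evenly between the two sets. The slides do not give the split ratio or the procedure, so the function name and the shuffling are hypothetical.

```python
import random
from collections import defaultdict


def split_by_genre(vectors, genres, seed=0):
    """Split labelled vectors so both sets hold the same number of vectors per genre."""
    by_genre = defaultdict(list)
    for vector, genre in zip(vectors, genres):
        by_genre[genre].append(vector)
    rng = random.Random(seed)
    training, verification = [], []
    for genre in sorted(by_genre):
        items = by_genre[genre]
        rng.shuffle(items)
        half = len(items) // 2                 # an odd leftover vector is dropped
        training += [(v, genre) for v in items[:half]]
        verification += [(v, genre) for v in items[half:2 * half]]
    return training, verification
```

Each returned set is a list of (vector, genre) pairs, which is the shape the later sketches assume for the verification set.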

The three dimensions genetic algorithm
The three dimensions are:
  Variables amount.
  Variables choice.
  Fixed discretization of the variables.
Every network in the population is a Naive Bayesian Network, which means that its structure is already determined.

Fitness function
In order to compute the fitness of a network, we estimate the genre of each vector in the verification set and compare it to its known genre.

The metric used for computing the fitness is standard accuracy, i.e. the ratio of the correctly classified vectors to the total number of vectors in the verification set.
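As a small sketch, the fitness is the accuracy of a network's predictions over the verification set. The `classify` argument here is a hypothetical callable standing in for a network's inference step, and the verification set is assumed to be a list of (vector, genre) pairs.

```python
def fitness(classify, verification_set):
    """Fraction of verification vectors whose predicted genre matches the known genre."""
    correct = sum(1 for vector, genre in verification_set if classify(vector) == genre)
    return correct / len(verification_set)


# Toy check with a stand-in classifier that looks only at the first feature.
toy_set = [([0.9], "Rock"), ([0.1], "Jazz"), ([0.8], "Jazz"), ([0.2], "Jazz")]
print(fitness(lambda v: "Rock" if v[0] > 0.5 else "Jazz", toy_set))   # prints 0.75
```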

Selection
In each generation, we choose at most population_size/2 different networks.
We prefer networks that have the highest fitness and are distinct from each other.
After choosing these networks, we use them to build a full-size population by mutating each one of them.
We use bitwise mutation to do so.
Notice that we may use a mutated network to generate a new mutated network.
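The selection step and the mutation refill might look like the sketch below. Several details are assumptions not given in the slides: an individual is a pair (bits, dis) of equal-length tuples, "distinct" is taken to mean not identical, parents for the refill are picked at random from the growing population, and the mutation rate is a free parameter. All names are hypothetical.

```python
import random


def mutate(bits, dis, rate, dis_range, rng):
    """Bitwise mutation: flip each selection bit with probability `rate`."""
    low, high = dis_range
    new_bits = [1 - b if rng.random() < rate else b for b in bits]
    if not any(new_bits):                      # keep at least one variable selected
        new_bits[rng.randrange(len(new_bits))] = 1
    new_dis = []
    for bit, old in zip(new_bits, dis):
        if bit == 0:
            new_dis.append(0)                  # unselected variables carry no intervals
        elif old == 0 or rng.random() < rate:
            new_dis.append(rng.randint(low, high))   # new or mutated interval count
        else:
            new_dis.append(old)
    return tuple(new_bits), tuple(new_dis)


def next_generation(scored, population_size, rate, dis_range, rng):
    """scored: list of (fitness, individual). Keep the distinct best half, refill by mutation."""
    survivors = []
    for _, individual in sorted(scored, key=lambda pair: pair[0], reverse=True):
        if individual not in survivors:
            survivors.append(individual)
        if len(survivors) == population_size // 2:
            break
    population = list(survivors)
    while len(population) < population_size:
        parent = rng.choice(population)        # the parent may itself be a mutant
        population.append(mutate(*parent, rate, dis_range, rng))
    return population
```

Drawing parents from the whole growing population, rather than only from the survivors, is what allows a mutated network to produce a further mutated network.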

Mutation
Bitwise mutation.

            BitSet    Dis
Parent:     110001    4   9   0   0   0   18
Child:      101001    10  0   15  0   0   18

Crossover
Single point crossover.

            BitSet    Dis
Parent 1:   110001    4   9   0   0   0   18
Parent 2:   111101    5   5   10  10  0   15
Child 1:    111101    4   9   10  10  0   15
Child 2:    110001    5   5   0   0   0   18

Results
Model - Naive Bayesian
Population size - 40
Generations - 400
Variables - [1,191]
Discretization - [5,15]
First population score (verification set) - 0.7756
Best score (verification set) - 0.8327
Website's score (test set) - 0.7031
Zeroes = 0

Observation
Notice that as the discretization interval increases, the CPTs of the network get bigger.
The number of vectors in the training set is fixed, so we get more occurrences of the number zero in the CPTs.
These zeroes can harm the computation of the different genre probabilities.
As a solution, for each node, we take the minimum nonzero value of its CPT, divide it by 10, and replace all zeroes in that CPT with the result.

Results (Cont.)
Model - Naive Bayesian
Population size - 40
Generations - 400
Variables - [1,191]
Discretization - [5,15]
First population score - 0.7878
Best score - 0.8415
Website's score - 0.7317
Zeroes = cpt_min/10

Observation
Notice that there is approximately a 10% difference between my score and the website's score.
We will discuss this issue later on.

Adding the fourth dimension
The fourth dimension is the structure of the Bayesian Network.

Now the population includes different Bayesian Networks, that is, networks with different structures, variable choices, variable amounts and discretization arrays.
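Returning briefly to the zero-replacement rule used for the second set of Naive Bayesian results (Zeroes = cpt_min/10), a minimal sketch is shown here. It assumes a CPT stored as a list of rows and takes "minimum value" to mean the smallest nonzero entry, since the smallest entry of a CPT that contains zeroes is zero itself. The function name is hypothetical.

```python
def replace_zeroes(cpt):
    """Return a copy of the CPT with every zero replaced by (smallest nonzero entry) / 10."""
    nonzero = [p for row in cpt for p in row if p > 0]
    if not nonzero:                            # nothing to scale from; leave unchanged
        return [list(row) for row in cpt]
    filler = min(nonzero) / 10
    return [[p if p > 0 else filler for p in row] for row in cpt]


smoothed = replace_zeroes([[0.5, 0.5, 0.0], [0.2, 0.0, 0.8]])
```

In this example the smallest nonzero entry is 0.2, so both zeroes become 0.02. The rows no longer sum exactly to one after the replacement, which does not affect picking the most probable genre in this sketch.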

Initial population
The networks in the initial population are distributed uniformly in the search space.
I've noticed that the algorithm tends to keep networks with a high number of variables.
Therefore, when generating the initial population, I increased the probability of getting networks with a low number of variables.
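The slides do not say how the probability was shifted toward small networks. One hypothetical way, shown only as a sketch, is to draw the number of variables k with probability proportional to 1/k and then choose which variables to include uniformly. The weighting and the function name are assumptions.

```python
import random


def random_individual(n_variables, dis_range, rng):
    """Draw a variable count biased toward small values, then pick the variables uniformly."""
    counts = list(range(1, n_variables + 1))
    weights = [1.0 / k for k in counts]        # assumed bias: P(k) proportional to 1/k
    k = rng.choices(counts, weights=weights)[0]
    chosen = set(rng.sample(range(n_variables), k))
    low, high = dis_range
    bits = tuple(1 if i in chosen else 0 for i in range(n_variables))
    dis = tuple(rng.randint(low, high) if b else 0 for b in bits)
    return bits, dis


rng = random.Random(1)
population = [random_individual(191, (2, 10), rng) for _ in range(10)]
```

Any other decreasing weight function would serve the same purpose; 1/k is just a simple choice for illustration.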

Evolution operations
The selection process is the same as in the previous algorithm. The crossover and mutation are similar.
First, we start as in the previous algorithm (handling the BitSet and the discretization array).
Then, we add all the edges we can from the parent (mutation) or parents (crossover) to the child's graph.
Finally, we make sure that the child's graph is a connected acyclic graph.
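The first two steps can be sketched as follows: a single point crossover over the BitSet and discretization array, then copying over every parent edge that the child can take. The representation (tuples of bits and interval counts, edges as pairs of variable indices), the function names, and the example values are assumptions for illustration. The final step of making the graph connected is left out because the slides do not describe how it was done.

```python
def single_point_crossover(parent1, parent2, point):
    """Swap the tails of two (bits, dis) individuals after position `point`."""
    (bits1, dis1), (bits2, dis2) = parent1, parent2
    child1 = (bits1[:point] + bits2[point:], dis1[:point] + dis2[point:])
    child2 = (bits2[:point] + bits1[point:], dis2[:point] + dis1[point:])
    return child1, child2


def creates_cycle(edges, new_edge):
    """True if adding the directed edge (u, v) closes a cycle, i.e. v already reaches u."""
    u, v = new_edge
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(w for (x, w) in edges if x == node)
    return False


def inherit_edges(child_bits, parent_edges):
    """Keep each parent edge whose endpoints the child selected and that keeps the graph acyclic."""
    kept = {i for i, bit in enumerate(child_bits) if bit}
    edges = []
    for u, v in parent_edges:
        if (u in kept and v in kept and (u, v) not in edges
                and not creates_cycle(edges, (u, v))):
            edges.append((u, v))
    return edges


p1 = ((1, 1, 0, 0, 0, 1), (4, 9, 0, 0, 0, 18))
p2 = ((1, 1, 1, 1, 0, 1), (5, 5, 10, 10, 0, 15))
c1, c2 = single_point_crossover(p1, p2, 2)
print(c1)   # prints ((1, 1, 1, 1, 0, 1), (4, 9, 10, 10, 0, 15))
print(inherit_edges(c1[0], [(0, 1), (1, 2), (2, 0), (3, 4), (0, 5)]))
# prints [(0, 1), (1, 2), (0, 5)]
```

In the example, the edge (2, 0) is dropped because it would close the cycle 0, 1, 2, and the edge (3, 4) is dropped because the child did not select variable 4.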

Results
Model - Bayesian Network
Population size - 20
Generations - crashed on generation 104
Variables - [1,191]
Discretization - [2,6]
First population score - 0.4920
Best score - ~0.8559
Website's score - not available, since the program crashed

Memory problems
The program was executed on amdsrv3, with a 4.5 GB memory limit.
Even though the discretization interval is [2,6], the program crashed due to a Java heap space error.
As a result, I decided to decrease the population size from 20 to 10.

Results (Cont.)
Model - Bayesian Network
Population size - 10
Generations - 800
Variables - [1,191]
Discretization - [2,10]
First population score - 0.5463
Best score - 0.8686
Website's score - 0.7085

Results (Cont.)
Model - Bayesian Network
Population size - 10
Generations - 800
Variables - [1,191]
Discretization - [2,20]
First population score - 0.5978
Best score - 0.8708
Website's score - 0.6972

Overfitting
As we increase the discretization interval, my score increases and the website's score decreases.
One explanation is that increasing the search space may cause the algorithm to find patterns that correlate strongly with the specific input data I received but have no correlation at all with real-life data.
One possible solution is to replace the training set or the verification set while the algorithm is running.
The problem is that we don't have enough input data to do so.

Final competition scores
(Figure: the final competition score table, with my score marked.)

The big picture
In order to really find patterns that describe real-life data, we have to find the best probabilistic model that represents this data.
Choosing the probabilistic model and building it are key factors in achieving such a goal.
The field of Data Mining suggests numerous techniques for building different classifiers, such as association rules, decision trees, frequent sequences, Markov Networks and clustering.

The big picture (Cont.)
The Bayesian network classifier seems like a good tool at first, but it might miss some patterns that are vital for the perfect classifier.
These patterns might be identified using other classifiers.

Ideas for improvement
Increasing the parameters (such as the population size and the discretization interval) may yield better results, but it also makes the program crash.
Therefore, better programming style or maybe parallel computing might help overcome this problem.
Instead of using fixed-size discretization, we might want to use a more complex discretization technique such as clustering.
The idea is that dividing a variable into intervals of different sizes, or even non-contiguous intervals, may let this variable improve the entire probabilistic model.

Ideas for improvement (Cont.)
We might want to use bitwise crossover instead of single point crossover, since the order of the vector's variables is insignificant.
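A sketch of this idea follows, reading "bitwise crossover" as what is usually called uniform crossover: each position is exchanged between the parents independently with probability 0.5, so the result does not depend on how the variables are ordered. The (bits, dis) representation and the function name are assumptions.

```python
import random


def uniform_crossover(parent1, parent2, rng):
    """Exchange each (bit, interval count) position between the parents with probability 0.5."""
    genes1 = list(zip(*parent1))               # one (bit, dis) pair per variable
    genes2 = list(zip(*parent2))
    child1, child2 = [], []
    for g1, g2 in zip(genes1, genes2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1                    # swap this position between the children
        child1.append(g1)
        child2.append(g2)

    def unzip(genes):
        return tuple(b for b, _ in genes), tuple(d for _, d in genes)

    return unzip(child1), unzip(child2)


p1 = ((1, 1, 0, 0, 0, 1), (4, 9, 0, 0, 0, 18))
p2 = ((1, 1, 1, 1, 0, 1), (5, 5, 10, 10, 0, 15))
c1, c2 = uniform_crossover(p1, p2, random.Random(0))
```

Keeping each bit together with its interval count means a child never ends up with a selected variable that has no discretization, or the reverse.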

References
Artificial Intelligence: A Modern Approach, Stuart Russell and Peter Norvig (second edition).
Contest website: http://tunedit.org/challenge/music-retrieval/genres
Battery power example: http://www.bayesia.com