
Linkage Problem, Distribution Estimation, and Bayesian Networks
Evolutionary Computation 8(3)

Martin Pelikan, David E. Goldberg, and Erick Cantu-Paz

Linkage Problem

• The problem of building block disruption
  – Due to crossover
• Solutions
  – Changing the representation of solutions
  – Evolving the recombination operators
  – Extracting information from the entire set of promising solutions in order to generate new solutions

Evolving Representation or Operators

• Change the representation of solutions in the algorithm so that the interacting components of partial solutions are less likely to be broken by recombination
  – Various reordering and mapping operators
  – Too slow, not sufficiently powerful
  – Premature convergence
• Examples
  – Messy Genetic Algorithm
  – Linkage Learning Genetic Algorithm

Probabilistic Modeling

• Estimation of Distribution Algorithms
  – No crossover
  – New solutions are generated using the information extracted from the entire set of promising solutions

• How to extract the information?

No Interaction

• Population Based Incremental Learning (PBIL) (1994)

• Compact Genetic Algorithm (cGA) (1998)
• Univariate Marginal Distribution Algorithm (UMDA) (1997) (see the sketch below)
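All three of these algorithms model each string position independently. A minimal sketch of the UMDA-style loop, assuming binary strings, truncation selection, and an illustrative onemax fitness (population size, selection share, and generation count are arbitrary choices, not values from the paper):

```python
import random

def onemax(x):
    # Illustrative fitness: number of ones in the string.
    return sum(x)

def umda(n=30, pop_size=100, parents=50, generations=50, fitness=onemax):
    """Univariate EDA: learn and sample a product of marginal frequencies."""
    # Start from the uniform distribution over binary strings.
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the most promising strings.
        promising = sorted(pop, key=fitness, reverse=True)[:parents]
        # Univariate model: marginal frequency of a 1 at each position.
        p = [sum(s[i] for s in promising) / parents for i in range(n)]
        # Sample a new population from the product of the marginals.
        pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

print(onemax(umda()))
```

PBIL and the cGA replace the population-wide frequency count with an incremental update of the probability vector, but the underlying independence assumption is the same.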

Pairwise Interaction

• Dependency tree (1997)
• Mutual-Information-Maximizing Input Clustering (MIMIC) (1997)
• Bivariate Marginal Distribution Algorithm (BMDA) (1999)

Multivariate Interactions

• Factorized Distribution Algorithm (FDA) (1998)

• Extended Compact Genetic Algorithm (ECGA) (1999)

• Bayesian Optimization Algorithm (BOA) (1999)

Multivariate Interactions

• Iterated Density Estimation Evolutionary Algorithm (IDEA) (2000)

• Bayesian Network (1999)
• Gaussian Network (1999)

• Bayesian Evolutionary Optimization (Helmholtz Machine) (2000)

• Probabilistic Principal Component Analysis (PPCA) (2001)

Capabilities & Difficulties

• No interactions
  – Efficient on linear problems
  – Difficulties with higher-order BBs
• Pairwise interactions
  – Efficient on problems with BBs of order 2
  – Difficulties with higher-order BBs

Capabilities & Difficulties

• FDA
  – Efficient on decomposable problems
  – Prior information is essential
• ECGA
  – Efficient on separable problems
  – Difficulties with highly overlapping BBs
• BOA
  – General

The Bayesian Optimization Algorithm (BOA)

• BOA uses the same class of distributions as the FDA.
  – Does not require a valid distribution factorization as input
  – Able to learn the distribution on the fly without the use of any problem-specific information
  – Prior information can be incorporated

BOA

1. Set t ← 0. Randomly generate the initial population P(0).
2. Select a set of promising strings S(t) from P(t).
3. Construct the network B using a chosen metric and constraints.
4. Generate a set of new strings O(t) according to the joint distribution encoded by B.
5. Create a new population P(t+1) by replacing some strings from P(t) with O(t). Set t ← t+1.
6. If the termination criteria are not met, go to 2.
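A skeleton of steps 1–6 in code; truncation selection, elitist replacement, and the build_network / sample_network callables are placeholders of my choosing for the model-building and model-sampling steps described on the following slides, not part of the paper's specification:

```python
import random

def boa(fitness, n, build_network, sample_network,
        pop_size=100, n_parents=50, n_offspring=50, generations=50):
    """Skeleton of the BOA loop: select, model, sample, replace."""
    # Step 1: t = 0, random initial population P(0).
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: select promising strings S(t) (truncation selection here).
        pop.sort(key=fitness, reverse=True)
        promising = pop[:n_parents]
        # Step 3: construct the network B from S(t) using a chosen metric.
        model = build_network(promising)
        # Step 4: generate offspring O(t) from the distribution encoded by B.
        offspring = sample_network(model, n_offspring)
        # Step 5: replace the worst strings of P(t) with O(t); t = t + 1.
        pop = pop[:pop_size - n_offspring] + offspring
    # Step 6 (termination) is handled by the fixed generation count above.
    return max(pop, key=fitness)
```

With a univariate stand-in for the two callables this loop degenerates into a UMDA-like algorithm; what makes it BOA is learning a Bayesian network over the promising strings, as outlined on the following slides.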

Bayesian Networks

• The Bayesian Dirichlet metric (BDe)
  – Parametric learning
• Greedy algorithms
  – Structure learning
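The slide does not spell the metric out. With uninformative priors, the Bayesian Dirichlet metric reduces to the K2 metric of Cooper and Herskovits, which scores a network B on the set of selected strings D as

$$p(D \mid B) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$$

where $r_i$ is the number of values of $X_i$, $q_i$ the number of configurations of its parents, $N_{ijk}$ the number of strings in D with $X_i$ in its k-th value and its parents in their j-th configuration, and $N_{ij} = \sum_k N_{ijk}$. Because the score factors over variables, a greedy search only needs to re-evaluate the term of the variable whose parent set changed.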

Greedy algorithm for network construction

1. Initialize the network B.
2. Choose all simple graph operations that can be performed on the network without violating the constraints.
3. Pick the operation that increases the score of the network the most.
4. Perform the operation picked in the previous step.
5. If the network can no longer be improved under the given constraints on its complexity, or a maximal number of iterations has been reached, finish.
6. Go to 2.
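A minimal sketch of this loop, restricted to the single operation of adding an edge, with a caller-supplied per-variable score (e.g. that variable's term of the BD metric above) and a bound on the number of parents; the function and parameter names are illustrative assumptions:

```python
def greedy_network(n, score, max_parents=2, max_iters=1000):
    """Greedy structure search: repeatedly apply the edge addition that
    raises the network score the most, subject to the constraints."""
    parents = {i: set() for i in range(n)}          # 1. initialize B (no edges)

    def would_cycle(src, dst):
        # Adding src -> dst creates a cycle iff dst is already an ancestor of src.
        stack, seen = [src], set()
        while stack:
            v = stack.pop()
            if v == dst:
                return True
            if v not in seen:
                seen.add(v)
                stack.extend(parents[v])
        return False

    for _ in range(max_iters):
        best_gain, best_edge = 0.0, None
        for child in range(n):                      # 2. enumerate legal operations
            if len(parents[child]) >= max_parents:
                continue
            for src in range(n):
                if src == child or src in parents[child] or would_cycle(src, child):
                    continue
                gain = (score(child, parents[child] | {src})
                        - score(child, parents[child]))
                if gain > best_gain:                # 3. pick the best operation
                    best_gain, best_edge = gain, (src, child)
        if best_edge is None:                       # 5. no improvement: finish
            break
        src, child = best_edge
        parents[child].add(src)                     # 4. perform the operation
    return parents
```

The slide's "simple graph operations" would in general also include edge removal and reversal; only edge addition is shown here to keep the sketch short.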

Generation of a new instance

1. Mark all variables as unprocessed.
2. Pick an unprocessed variable Xi whose parents have all been processed already.
3. Set Xi to xi with probability p(Xi = xi | Πi = πi), where πi is the current assignment of the parents Πi of Xi.
4. Mark Xi as already processed.
5. If there are unprocessed variables left, go to 2.
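A minimal sketch of this forward (ancestral) sampling procedure for binary variables; representing the network as a parent list plus a conditional probability table per variable is my own illustrative choice:

```python
import random

def sample_instance(parents, cpt, n):
    """Sample one binary string from a Bayesian network.

    parents[i] is the tuple of parent indices of X_i, and cpt[i] maps a
    tuple of parent values to p(X_i = 1 | that parent configuration)."""
    value = {}                                          # 1. all unprocessed
    while len(value) < n:                               # 5. loop until done
        for i in range(n):
            if i in value or not all(p in value for p in parents[i]):
                continue                                # 2. need processed parents
            ctx = tuple(value[p] for p in parents[i])
            value[i] = 1 if random.random() < cpt[i][ctx] else 0   # 3. sample X_i
            # 4. X_i now has a value, i.e. it is marked as processed.
    return [value[i] for i in range(n)]

# Example network: X0 -> X1, p(X0 = 1) = 0.7, p(X1 = 1 | X0) = 0.9 / 0.2.
parents = {0: (), 1: (0,)}
cpt = {0: {(): 0.7}, 1: {(1,): 0.9, (0,): 0.2}}
print(sample_instance(parents, cpt, 2))
```

In BOA the conditional probabilities are simply the corresponding frequencies observed in the set of selected strings.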

Additively Decomposable Functions

• Additively decomposable functions (ADF)
  – Can be decomposed into smaller subproblems
• Order-k decomposable function
  – There exists a set of l functions $f_i$ over subsets of variables $S_i$, for i = 0, …, l-1, each of size at most k, such that

$$f(X) = \sum_{i=0}^{l-1} f_i(S_i)$$

ADF, the Interactions

• ADFs that can be decomposed using only nonoverlapping sets
  – Subfunctions are independent
• Overlapping sets
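For example, with nonoverlapping sets $f(X) = f_0(X_0, X_1, X_2) + f_1(X_3, X_4, X_5)$ can be optimized one subfunction at a time, whereas with overlapping sets such as $f(X) = f_0(X_0, X_1, X_2) + f_1(X_2, X_3, X_4)$ the two subfunctions interact through the shared variable $X_2$.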

Experiments

• OneMax:

$$f_{\mathrm{onemax}}(X) = \sum_{i=0}^{n-1} X_i$$

• 3-deceptive (u denotes the number of ones in the corresponding 3-bit subset $S^3_i$):

$$f^3_{\mathrm{deceptive}}(u) = \begin{cases} 0.9 & \text{if } u = 0 \\ 0.8 & \text{if } u = 1 \\ 0 & \text{if } u = 2 \\ 1 & \text{otherwise} \end{cases}$$

$$f_{\mathrm{deceptive}}(X) = \sum_{i=0}^{n/3 - 1} f^3_{\mathrm{deceptive}}(S^3_i) \qquad \text{(without overlapping)}$$

$$f_{\mathrm{dec\text{-}overlap}}(X) = \sum_{i=0}^{(n-3)/2} f^3_{\mathrm{deceptive}}(S^3_i) \qquad \text{(with overlapping)}$$
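A small sketch of how these functions can be evaluated on a bit string; the partitioning of the $S^3_i$ into consecutive blocks (non-overlapping, or shifted by two positions so that neighboring blocks share one bit) is an assumption for illustration:

```python
def deceptive3(u):
    # 3-bit deceptive subfunction of the number of ones u.
    return {0: 0.9, 1: 0.8, 2: 0.0}.get(u, 1.0)

def f_deceptive(x):
    # Non-overlapping decomposition: blocks (0,1,2), (3,4,5), ...
    return sum(deceptive3(sum(x[i:i + 3])) for i in range(0, len(x), 3))

def f_dec_overlap(x):
    # Overlapping decomposition: neighboring blocks share one bit,
    # blocks (0,1,2), (2,3,4), ..., (n-3, n-2, n-1).
    return sum(deceptive3(sum(x[i:i + 3])) for i in range(0, len(x) - 2, 2))

x = [0, 0, 0, 1, 1, 1, 0, 1, 0]   # n = 9
print(f_deceptive(x), f_dec_overlap(x))
```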

Experiments

• Trap-5:

$$f^5_{\mathrm{trap}}(u) = \begin{cases} 4 - u & \text{if } u < 5 \\ 5 & \text{otherwise} \end{cases}$$

$$f_{\mathrm{trap}}(X) = \sum_{i=0}^{n/5 - 1} f^5_{\mathrm{trap}}(S^5_i)$$

• 6-bipolar:

$$f^6_{\mathrm{bipolar}}(u) = f^3_{\mathrm{deceptive}}(|3 - u|)$$

$$f_{\mathrm{bipolar}}(X) = \sum_{i=0}^{n/6 - 1} f^6_{\mathrm{bipolar}}(S^6_i)$$
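Analogous sketches for trap-5 and 6-bipolar, again assuming consecutive non-overlapping blocks:

```python
def trap5(u):
    # 5-bit trap: deceptive slope towards u = 0, isolated optimum at u = 5.
    return 5 if u == 5 else 4 - u

def bipolar6(u):
    # 6-bit bipolar: the 3-deceptive values applied to |3 - u|,
    # giving two global optima (u = 0 and u = 6).
    return {0: 0.9, 1: 0.8, 2: 0.0}.get(abs(3 - u), 1.0)

def block_sum(x, sub, k):
    # Sum a k-bit subfunction over consecutive non-overlapping blocks.
    return sum(sub(sum(x[i:i + k])) for i in range(0, len(x), k))

x = [1] * 30
print(block_sum(x, trap5, 5), block_sum(x, bipolar6, 6))
```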

Experiments

• Two further test functions were used: a peak-type function built from two kinds of subfunctions over overlapping subsets, and a function with couplings $J_{ij}$ between each variable and its neighborhood $N_i$. [Their exact definitions are not recoverable from this transcript.]

Results of the Experiments

[Plots of the experimental results were shown on these three slides.]

Future Work

• Bayesian Optimization Algorithm, Population Sizing, and Time to convergence

• Hierarchical Problem Solving by the Bayesian Optimization Algorithm

• Genetic Algorithms, Clustering, and Breaking of Symmetry (PPSN 2000)

• Bayesian Optimization Algorithm, Decision Graphs, and Occam’s Razor