
Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap

    Martin Pelikan, Kumara Sastry, David E. Goldberg, Martin V. Butz, and Mark Hauschild

    MEDAL Report No. 2009002

    January 2009

    Abstract

    This paper presents a class of NK landscapes with nearest-neighbor interactions and tunable overlap. The

    considered class of NK landscapes is solvable in polynomial time using dynamic programming; this allows us to

    generate a large number of random problem instances with known optima. Several variants of standard genetic

    algorithms and estimation of distribution algorithms are then applied to the generated problem instances. The

    results are analyzed and related to scalability theory for selectorecombinative genetic algorithms and estimation

    of distribution algorithms.

    Keywords

    NK fitness landscape, hierarchical BOA, genetic algorithm, univariate marginal distribution algorithm, performance

    analysis, scalability, crossover, hybridization.

    Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)

    Department of Mathematics and Computer Science

University of Missouri-St. Louis

    One University Blvd., St. Louis, MO 63121

    E-mail: [email protected]

    WWW: http://medal.cs.umsl.edu/


the effects of overlap between subproblems on algorithm performance, the number of bits that overlap between consecutive subproblems is controlled by a user-specified parameter. All considered types of NK landscape instances are challenging yet solvable in polynomial time using dynamic programming; this allows us to consider a large number of random instances with known optima in practical time. The results are related to existing scalability theory and interesting directions for future research are outlined. The work presented in this paper combines and extends some

of the facets of two previous studies on similar problems, specifically, the analysis of evolutionary algorithms on random additively decomposable problems (Pelikan, Sastry, Butz, & Goldberg, 2006; Sastry, Pelikan, & Goldberg, 2007) and the analysis of evolutionary algorithms on standard NK landscapes (Pelikan, Sastry, Butz, & Goldberg, 2008).

The paper starts by describing NK landscapes in section 2. Section 3 outlines the dynamic programming algorithm which can find guaranteed global optima of the considered instances in polynomial time. Section 4 outlines the compared algorithms. Section 5 presents experimental results. Section 6 discusses future work. Finally, section 7 summarizes and concludes the paper.

    2 NK Landscapes

This section describes NK landscapes. First, approaches to testing evolutionary algorithms are briefly discussed. The general form of NK landscapes is then described. Next, the classes of NK landscape instances considered in this paper are discussed and the method used to generate random problem instances of these classes is outlined.

    2.1 Testing Evolutionary Algorithms

    There are three basic approaches to testing optimization techniques:

(1) Testing on the boundary of the design envelope using artificial, adversarial test problems. For example, fully deceptive concatenated traps (Ackley, 1987; Deb & Goldberg, 1991) represent a class of artificial test problems that can be used to test whether the optimization algorithm can automatically decompose the problem and exploit the discovered decomposition effectively. Testing on artificial problems on the boundary of the design envelope is also a common practice outside standard optimization; for example, consider testing car safety using car crash tests or testing durability of cell phones using drop tests.

(2) Testing on classes of random problems. For example, to test algorithms for solving maximum satisfiability (MAXSAT) problems, large sets of random formulas in conjunctive normal form can be generated and analyzed (Cheeseman, Kanefsky, & Taylor, 1991). Similar approaches to testing are common outside standard optimization as well; for example, new software products are often tested by groups of beta testers in order to discover all problems in situations that were not expected during the testing on the boundary of the design envelope.

(3) Testing on real-world problems or their approximations. For example, the problem of designing military antennas can be considered for testing (Santarelli, Goldberg, & Yu, 2004). As an example outside optimization, when a new model of an airplane has been designed and thoroughly tested on the boundary of its design envelope, it is ready to take its real-world test: the first actual flight.

In this paper, we focus on the testing on random classes of problems. More specifically, we consider random instances of a restricted class of NK landscapes with nearest-neighbor interactions


and tunable overlap. All considered instances are solvable in low-order polynomial time with a dynamic programming algorithm. This allows us to generate a large number of random problem instances with known optima in practical time. The main focus is on simple hybrids based on standard selectorecombinative genetic algorithms, estimation of distribution algorithms, and deterministic hill climbing based on single-bit flips.

The general form of NK landscapes is described next. Then, the considered class of NK landscapes and the procedure for generating random problem instances are outlined.

    2.2 Problem Definition

An NK fitness landscape (Kauffman, 1989; Kauffman, 1993) is fully defined by the following components:

- The number of bits, n.

- The number of neighbors per bit, k.

- A set of k neighbors Π(X_i) for the i-th bit, X_i, for every i ∈ {0, ..., n-1}.

- A subfunction f_i defining a real value for each combination of values of X_i and Π(X_i) for every i ∈ {0, ..., n-1}. Typically, each subfunction is defined as a lookup table with 2^(k+1) values.

The objective function f_nk to maximize is defined as

f_nk(X_0, X_1, ..., X_{n-1}) = sum_{i=0}^{n-1} f_i(X_i, Π(X_i)).
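The definition above maps directly onto a lookup-table evaluation. The following minimal sketch (the helper name nk_fitness and its argument layout are our own, not from the paper) sums the subfunction values over all bits:

```python
def nk_fitness(x, neighbors, subfunctions):
    """Evaluate f_nk(x) as the sum of lookup-table subfunctions.

    neighbors[i]    -- tuple of neighbor positions Pi(X_i) for bit i
    subfunctions[i] -- dict mapping the joint setting (x_i, *neighbor bits)
                       of subfunction i to its real value
    """
    total = 0.0
    for i in range(len(x)):
        key = (x[i],) + tuple(x[j] for j in neighbors[i])
        total += subfunctions[i][key]
    return total
```

For instance, with n = 2 and k = 1, bit 0 having neighbor 1 and bit 1 having no neighbor left to its right, the fitness of a string is simply f_0(x_0, x_1) + f_1(x_1).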

The difficulty of optimizing NK landscapes depends on all of the four components defining an NK problem instance. One useful approach to analyzing complexity of NK landscapes is to focus on the influence of k on problem complexity. For k = 0, NK landscapes are simple unimodal functions similar to onemax or binint, which can be solved in linear time and should be easy for practically any genetic and evolutionary algorithm. The global optimum of NK landscapes can be obtained in polynomial time (Wright, Thompson, & Zhang, 2000) even for k = 1; on the other hand, for k > 1, the problem of finding the global optimum of unrestricted NK landscapes is NP-complete (Wright et al., 2000). The problem becomes polynomially solvable with dynamic programming even for k > 1 if the neighbors are restricted to only adjacent string positions (Wright et al., 2000) or if the subfunctions are generated according to some distributions (Gao & Culberson, 2002). For unrestricted NK landscapes with k > 1, a polynomial-time approximation algorithm exists with the approximation threshold 1 - 1/2^(k+1) (Wright et al., 2000).

    2.3 NK Instances with Nearest Neighbors and Tunable Overlap

    In this paper we consider NK instances with the following two restrictions:

1. Neighbors of each bit are restricted to the k bits that immediately follow this bit. When there are fewer than k bits left to the right of the considered bit, the neighborhood is restricted to contain all the bits to the right of the considered bit.


2. Some subproblems may be excluded to provide a mechanism for tuning the size of the overlap between consecutive subproblems. Specifically, the fitness is defined as

f_nk(X_0, X_1, ..., X_{n-1}) = sum_{i=0}^{floor((n-1)/step)} f_i(X_{i*step}, Π(X_{i*step})),

where step ∈ {1, 2, ..., k + 1} is a parameter denoting the step with which the basis bits are selected. For standard NK landscapes, step = 1. With larger values of step, the amount of overlap between consecutive subproblems can be reduced. For step = k + 1, the problem becomes separable (the subproblems are fully independent).
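As a concrete illustration of how step controls the subproblem layout under the nearest-neighbor restriction above, the following sketch (the helper name subproblem_positions is our own) lists the position set covered by each subfunction:

```python
def subproblem_positions(n, k, step):
    """Return the list of position tuples covered by each subfunction.

    Subfunction i is anchored at basis bit i*step and spans that bit plus
    its (up to) k right neighbors, truncated at the string boundary.
    """
    subsets = []
    for i in range(0, (n - 1) // step + 1):
        start = i * step
        subsets.append(tuple(range(start, min(start + k + 1, n))))
    return subsets
```

For n = 7, k = 2, step = 2, this yields {0, 1, 2}, {2, 3, 4}, {4, 5, 6}, and {6}, with adjacent sets sharing at most o = k - step + 1 = 1 bit; for step = k + 1 the sets become disjoint.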

The reason for restricting neighborhoods to nearest neighbors was to ensure that the problem instances can be solved in polynomial time even for k > 1 using a simple dynamic programming algorithm. The main motivation for introducing the step parameter was to provide a mechanism for tuning the strength of the overlap between different subproblems. The resulting class of problems is a subset of standard, unrestricted NK landscapes (Kauffman, 1989; Kauffman, 1993). Furthermore, the resulting instances are a superset of the polynomially solvable random additively decomposable problems introduced in Pelikan, Sastry, Butz, and Goldberg (2006).

The subfunctions in the considered class of NK landscapes are encoded as look-up tables; thus, the subfunctions can be defined arbitrarily.

    2.4 Generating Random Instances

The overall number of subfunctions and the set of neighbors for each of these subfunctions are fully specified by parameters n, k, and step. The only components that vary from instance to instance for any valid combination of values of n, k, and step are the subfunctions themselves and the encoding, as described below.

The lookup table for all possible instantiations of bits in each subfunction is generated randomly using the same distribution for each entry in the table. Each of the values is generated using the uniform distribution over the interval [0, 1).

To make the instances more challenging, string positions in each instance are shuffled randomly. This is done by reordering string positions according to a randomly generated permutation using the uniform distribution over all permutations.
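The generation procedure can be sketched as follows (the helper name generate_instance is our own, and Python's standard random generator stands in for whichever source of randomness the authors used):

```python
import random

def generate_instance(n, k, step, seed=None):
    """Generate one random nearest-neighbor NK instance.

    Returns a list of lookup tables (one per subfunction, with uniform [0, 1)
    entries) and a random permutation used to shuffle the string positions.
    """
    rng = random.Random(seed)
    m = (n - 1) // step + 1                        # number of subfunctions
    tables = []
    for i in range(m):
        size = min(i * step + k + 1, n) - i * step  # bits in subfunction i
        tables.append([rng.random() for _ in range(2 ** size)])
    perm = list(range(n))
    rng.shuffle(perm)                               # shuffle string positions
    return tables, perm
```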

The following section describes the dynamic programming approach which can be used to find guaranteed optima for the aforementioned class of NK instances in polynomial time.

3 Dynamic Programming for Nearest-Neighbor NK Landscapes

The dynamic programming algorithm used to solve the described class of NK landscape instances is based on Pelikan et al. (2006). It uses the knowledge of the location of subproblems and the permutation imposed on the string positions, and considers subproblems in order from left to right according to the original permutation of string positions before shuffling. For example, consider the problem with n = 7, k = 2, and step = 2, which contains 4 subproblems defined on the following subsets of positions (according to the original permutation of the string positions): {0, 1, 2} for the subproblem f_0, {2, 3, 4} for f_1, {4, 5, 6} for f_2, and {6} for f_3. The dynamic programming algorithm processes the subproblems in the following order: (f_0, f_1, f_2, f_3). For each subproblem, the optimal fitness contribution of this and the previous subproblems is computed for any combination of bits


that overlap with the next subproblem to the right. The global optimum is then given by the computed fitness contribution of the last subproblem on the right.

Denoting by o = k - step + 1 the maximum number of bits in which the subproblems overlap and by m the overall number of subproblems, the dynamic programming algorithm starts by creating a matrix G = (g_{i,j}) of size m x 2^o. The element g_{i,j} for i ∈ {0, 1, ..., m - 1} and j ∈ {0, 1, ..., 2^o - 1} encodes the maximum fitness contribution of the first (i + 1) subproblems where the o (or fewer) bits that overlap with the next subproblem to the right are equal to j, using integer representation for these o bits. The last few subproblems may overlap with the next subproblem in fewer than o bits; that is why some entries in the matrix G are not going to be used. For example, for the above example problem with n = 7, k = 2, and step = 2, g_{1,0} represents the best fitness contribution of f_0 and f_1 (ignoring f_2 and f_3) under the assumption that the 5th bit is 0; analogously, g_{1,1} represents the best fitness contribution of f_0 and f_1 under the assumption that the 5th bit is 1.

The algorithm starts by considering all 2^(k+1) instances of the k + 1 bits in the first subproblem, and records the best found fitness for each combination of values of the o (or fewer) bits that overlap with the second subproblem; the resulting values are stored in the first row of G (elements g_{0,j}). Then, the algorithm goes through all the remaining subproblems from left to right. For the subproblem f_i, all 2^(k+1) instances of the k + 1 bits in this subproblem are examined; the only exception may be the right-most subproblems, for which the neighborhood may be restricted due to the fixed string length. For each instance, the algorithm first looks at the column j of G that corresponds to the o (or fewer) bits of the subproblem f_i that overlap with the previous subproblem f_{i-1}. The fitness contribution is computed as the sum of g_{i-1,j} and the fitness contribution of the considered instance of f_i. For each possible instantiation of bits that overlap with the next subproblem, the optimum fitness contribution is recorded in matrix G, forming the next row (that is, row i) of the matrix.

After processing all subproblems, the value of the global optimum is equal to the fitness contribution stored in the first element of the last row of G. The values that lead to the optimum fitness can be found by examining all choices made when choosing the best combination of bits in each subproblem.
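A compact version of this dynamic program might look as follows (the helper name solve_nn_nk is our own; it keys the rows of G by the tuple of overlap-bit values rather than by an integer index, which is equivalent):

```python
from itertools import product

def solve_nn_nk(subsets, tables):
    """Exact optimum of a nearest-neighbor NK instance by dynamic programming.

    subsets[i] -- ascending tuple of (unshuffled) positions of subfunction i
    tables[i]  -- dict mapping each bit assignment of subsets[i] to its value
    Runtime is O(m * 2^(k+1)): each subproblem enumerates all settings of its
    own bits and looks up the best contribution for the shared overlap bits.
    """
    m = len(subsets)
    # best[ov] = max contribution of subproblems 0..i-1 over assignments whose
    # bits shared with subproblem i equal the tuple ov (row i-1 of G)
    best = {(): 0.0}
    for i in range(m):
        shared_prev = [p for p in subsets[i] if i > 0 and p in subsets[i - 1]]
        shared_next = [p for p in subsets[i] if i + 1 < m and p in subsets[i + 1]]
        new_best = {}
        for bits in product((0, 1), repeat=len(subsets[i])):
            assign = dict(zip(subsets[i], bits))
            key_prev = tuple(assign[p] for p in shared_prev)
            if key_prev not in best:
                continue
            value = best[key_prev] + tables[i][bits]
            key_next = tuple(assign[p] for p in shared_next)
            if value > new_best.get(key_next, float('-inf')):
                new_best[key_next] = value
        best = new_best
    return best[()]   # last subproblem has no right overlap
```

Because the subproblems are intervals with increasing starting positions, every bit a later subproblem shares with earlier ones is contained in the overlap with the immediately preceding subproblem, so this single-overlap state suffices.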

    4 Compared Algorithms

This section outlines the optimization algorithms discussed in this paper: (1) the genetic algorithm (GA) (Holland, 1975; Goldberg, 1989), (2) the univariate marginal distribution algorithm (UMDA) (Mühlenbein & Paaß, 1996), and (3) the hierarchical Bayesian optimization algorithm (hBOA) (Pelikan & Goldberg, 2001; Pelikan, 2005). Additionally, the section describes the deterministic hill climber (DHC) (Pelikan & Goldberg, 2003), which is incorporated into all compared algorithms to improve their performance. In all compared algorithms, candidate solutions are represented by binary strings of n bits.

The genetic algorithm (GA) (Holland, 1975; Goldberg, 1989) evolves a population of candidate solutions, with the first population generated at random according to the uniform distribution over all binary strings. Each iteration starts by selecting promising solutions from the current population; we use binary tournament selection without replacement. New solutions are created by applying variation operators to the population of selected solutions. Specifically, crossover is used to exchange bits and pieces between pairs of candidate solutions and mutation is used to perturb the resulting solutions. Here we use uniform or two-point crossover, and bit-flip mutation (Goldberg, 1989). To maintain useful diversity in the population, the new candidate solutions are incorporated into the original population using restricted tournament selection (RTS) (Harik, 1995). The run is


terminated when termination criteria are met. In this paper, each run is terminated either when the global optimum has been found or when a maximum number of iterations has been reached.
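The restricted tournament replacement step mentioned above can be sketched as follows (the helper name rts_incorporate is our own; we assume maximization and that the offspring wins ties):

```python
import random

def rts_incorporate(population, fitness, offspring, off_fit, w, rng=random):
    """Insert one offspring via restricted tournament replacement.

    Among w randomly drawn members of the population, the offspring competes
    only with the one closest to it in Hamming distance, and replaces that
    member if the offspring's fitness is at least as good.
    """
    candidates = rng.sample(range(len(population)), w)
    closest = min(candidates,
                  key=lambda idx: sum(a != b
                                      for a, b in zip(population[idx], offspring)))
    if off_fit >= fitness[closest]:
        population[closest] = offspring
        fitness[closest] = off_fit
```

Restricting the replacement tournament to the genotypically closest member is what preserves niches and thus useful diversity.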

The univariate marginal distribution algorithm (UMDA) (Mühlenbein & Paaß, 1996) proceeds similarly to GA. However, instead of using crossover and mutation to create new candidate solutions, UMDA learns a probability vector (Juels, 1998; Baluja, 1994) for the selected solutions and generates new candidate solutions from this probability vector. The probability vector stores the proportion of 1s in each position of the selected population. Each bit of a new candidate solution is set to 1 with the probability equal to the proportion of 1s in this position; otherwise, the bit is set to 0. Consequently, the variation operator of UMDA preserves the proportions of 1s in each position while decorrelating different string positions.

The hierarchical Bayesian optimization algorithm (hBOA) (Pelikan & Goldberg, 2001; Pelikan, 2005) proceeds similarly to UMDA. However, to model promising solutions and generate new candidate solutions, Bayesian networks with local structures (Chickering, Heckerman, & Meek, 1997; Friedman & Goldszmidt, 1999) are used instead of the simple probability vector of UMDA.

The deterministic hill climber (DHC) is incorporated into GA, UMDA and hBOA to improve their performance. DHC takes a candidate solution represented by an n-bit binary string as input. Then, it repeatedly performs the one-bit change that leads to the maximum improvement of solution quality. DHC is terminated when no single-bit flip improves solution quality and the solution is thus locally optimal. Here, DHC is used to improve every solution in the population before the evaluation is performed.

    5 Experiments

This section describes experiments and presents experimental results. First, problem instances and the experimental setup are discussed. Next, the analysis of hBOA, UMDA and several GA variants is presented. Finally, all algorithms are compared and the results of the comparisons are discussed.

    5.1 Problem Instances

The parameters n, k, and step were set as follows: n ∈ {20, 30, 40, 50, 60, 70, 80, 90, 100, 120}, k ∈ {2, 3, 4, 5}, and step ∈ {1, 2, ..., k + 1}. For each combination of n, k, and step, we generated 10,000 random problem instances. Then, we applied GA, UMDA and hBOA to each of these instances and collected empirical results, which were subsequently analyzed. That means that overall 180,000 unique problem instances were generated and all of them were tested with every algorithm included in this study.

For UMDA, the largest instances were infeasible even with extremely large population sizes of more than 10^6; that is why some problem sizes are excluded for this algorithm and the main focus is on GA and hBOA.

    5.2 Compared Algorithms

    The following list summarizes the algorithms included in this study:

    (i) Hierarchical BOA (hBOA).

    (ii) Univariate marginal distribution algorithm (UMDA).

    (iii) Genetic algorithm with uniform crossover and bit-flip mutation.


    (iv) Genetic algorithm with two-point crossover and bit-flip mutation.

    5.3 Experimental Setup

To select promising solutions, binary tournament selection without replacement is used. New solutions (offspring) are incorporated into the old population using RTS with window size w = min{n, N/5} as suggested in Pelikan (2005). In hBOA, Bayesian networks with decision trees (Chickering et al., 1997; Friedman & Goldszmidt, 1999; Pelikan, 2005) are used and the models are evaluated using the Bayesian-Dirichlet metric with likelihood equivalence (Heckerman et al., 1994; Chickering et al., 1997) and a penalty for model complexity (Friedman & Goldszmidt, 1999; Pelikan, 2005). All GA variants use bit-flip mutation with the probability of flipping each bit p_m = 1/n. Two common crossover operators are considered in a GA: two-point and uniform crossover. For both crossover operators, the probability of applying crossover is set to 0.6. A stochastic hill climber with bit-flip mutation was also considered in the initial stage, but its performance was far inferior to that of every other algorithm included in the comparison and most problem instances included in the comparison were intractable with this algorithm; that is why the results for this algorithm are omitted.

For each problem instance and each algorithm, an adequate population size is approximated with the bisection method (Sastry, 2001; Pelikan, 2005); here, the bisection method finds an adequate population size to find the optimum in 10 out of 10 independent runs. Each run is terminated when the global optimum has been found (success) or when the maximum number of generations n is reached before the optimum is reached (failure). The results for each problem instance comprise the following statistics: (1) the population size, (2) the number of iterations (generations), (3) the number of evaluations, and (4) the number of flips of DHC. The most important statistic relating to the overall complexity of each algorithm is the number of flips of DHC, since this statistic combines all important statistics and can be consistently compared regardless of the used algorithm. That is why we focus on presenting the results with respect to the overall number of DHC flips until the optimum has been found.
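The population-sizing loop can be sketched as a doubling phase followed by bisection (the helper name bisection_population_size is our own; the 10-out-of-10 success criterion is passed in as the predicate, and the roughly 5% stopping precision is an assumption, not a value stated in the paper):

```python
def bisection_population_size(succeeds, n_min=16, n_max_cap=2 ** 20):
    """Approximate the minimal population size N for which succeeds(N) holds
    (e.g. 10 successful runs out of 10), by doubling and then bisecting.
    """
    low, high = n_min, n_min
    while not succeeds(high):          # double until a sufficient size is found
        low, high = high, high * 2
        if high > n_max_cap:
            raise RuntimeError("population size cap exceeded")
    while high - low > max(1, high // 20):   # bisect to ~5% precision
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high
```

In practice the predicate is expensive (each probe means 10 full runs), which is why the bisection stops at a coarse precision rather than at an exact minimum.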

For each combination of values of n, k and step, all observed statistics were averaged over the 10,000 random instances. Since 10 successful runs were performed for each instance, for each n, k and step and each algorithm the results are averaged over the 100,000 successful runs. Overall, for all algorithms except for UMDA, the results correspond to 1,800,000 successful runs on a total of 180,000 unique problem instances.

    5.4 Initial Performance Analysis

The number of DHC flips until optimum is shown in figure 1 for hBOA, figure 2 for UMDA, figure 3 for GA with uniform crossover, and figure 4 for GA with two-point crossover. The number of evaluations for k = 2 and k = 5 with step = 1 is shown in figure 5; for brevity, we omit analogous results for other values of k and step.

There are three main observations that can be made from these results. First of all, for hBOA, the number of DHC flips as well as the number of evaluations both appear to be upper bounded by a low-order polynomial for all values of k and step. However, for GA with both crossover operators, for larger values of k the growth of the number of DHC flips and the number of evaluations appears to be worse than polynomial with respect to n and it can be expected that the results will get even worse for k > 5. The worst performance is obtained with UMDA, for which the growth of the number of evaluations and the number of DHC flips appears to be faster than polynomial for all values of k and step.


[Figure 1: Average number of flips for hBOA. Four log-scale panels plot the number of DHC flips against problem size for k = 2, 3, 4, 5, with one curve per value of step.]

To visualize the effects of k on performance of all compared algorithms, figure 6 shows the growth of the number of DHC flips with k for hBOA and GA on problems of size n = 120; the results for UMDA are not included, because UMDA was incapable of solving many instances of this size in practical time. Two cases are considered: (1) step = 1, corresponding to standard NK landscapes and (2) step = k + 1, corresponding to the separable problem with no interactions between the different subproblems. For both cases, the vertical axis is shown in log-scale to support the hypothesis that the time complexity of selectorecombinative genetic algorithms should grow exponentially fast with the order of problem decomposition even when recombination is capable of identifying and processing the subproblems in an adequate problem decomposition. The results confirm this hypothesis: indeed, the number of flips for all algorithms appears to grow at least exponentially fast with k, regardless of the value of the step parameter.

    5.5 Comparison of All Algorithms

How do the different algorithms compare in terms of performance? While it is difficult to compare the exact running times due to the variety of computer hardware used and the accuracy of time measurements, we can easily compare other recorded statistics, such as the number of DHC flips or the number of evaluations until optimum. The main focus is again on the number of DHC flips because for each fitness evaluation at least one flip is typically performed; that is why the number of flips is expected to be greater than or equal to both the number of evaluations and the product of the population size and the number of generations.

One of the most straightforward approaches to quantify relative performance of two algorithms is to compute the ratio of the number of DHC flips (or some other statistic) for each problem instance. The mean and other moments of the empirical distribution of these ratios can then be estimated for different problem sizes and problem types. The results can then be used to better


[Figure 2: Average number of flips for UMDA. Four log-scale panels plot the number of DHC flips against problem size for k = 2, 3, 4, 5, with one curve per value of step; for k = 4 and k = 5, the plotted range ends near n = 60.]

understand how the differences between the compared algorithms change with problem size or other problem-related parameters. This approach has been used for example in Pelikan et al. (2006) and Pelikan et al. (2008).

The mean ratio of the number of DHC flips until optimum for GA with uniform crossover and that for hBOA is shown in figure 7. The ratio encodes the multiplicative factor by which hBOA outperforms GA with uniform crossover. When the ratio is greater than 1, hBOA outperforms GA with uniform crossover by this factor; when the ratio is smaller than 1, GA with uniform crossover outperforms hBOA.

The results show that when measuring performance by the number of DHC flips, for k = 2, hBOA is outperformed by GA with uniform crossover on the entire range of tested instances. However, even for k = 2 the ratio grows with problem size and the situation can thus be expected to change for bigger problems. For larger values of k, hBOA clearly outperforms GA with uniform crossover and for the largest values of n and k, that is, for n = 120 and k = 5, the ratio for the number of flips required is more than 3 regardless of step; that means that for the largest values of n and k, hBOA requires on average less than a third of the flips to solve the same problem. Even more importantly, the ratio appears to grow faster than polynomially, indicating that the differences will become much more substantial as the problem size increases.

The ratio between the number of DHC flips until optimum for GA with two-point crossover and that for hBOA is shown in figure 8. The results show that for two-point crossover, the differences between hBOA and GA are even more substantial than for the uniform crossover. This is supported by the results presented in figure 9, which compares the two crossover operators used in GA; uniform crossover clearly outperforms two-point crossover for all tested problem instances.

In summary, the performance of hBOA is substantially better than that of other compared algorithms, especially for the most difficult problem instances with large neighborhood size. These results are highlighted in tables 1 and 2, which show the average number of DHC flips and evaluations.


[Figure 4: Average number of flips for GA with two-point crossover. Four log-scale panels plot the number of DHC flips against problem size for k = 2, 3, 4, 5, with one curve per value of step.]

subsection. First, the effects of parameters n, k, and step are discussed. Then, the results are related to existing scalability theory. Of course, performance of selectorecombinative genetic algorithms also strongly depends on their ability to preserve and mix important partial solutions or building blocks (Goldberg, 2002; Thierens, 1999) and the importance of effective recombination is also expected to vary from instance to instance.

The results presented thus far in figures 1, 2, 3, 4, and 5 show that the larger the number of bits n, the more difficult the problem becomes, whether we measure algorithm performance by the number of DHC flips or the number of evaluations until optimum. The main reason for this behavior is that as n grows, the signal-to-noise ratio decreases and the complexity of selectorecombinative GAs as well as hBOA is expected to grow with decreasing signal-to-noise ratio (Goldberg & Rudnick, 1991; Thierens & Goldberg, 1994; Harik et al., 1997; Goldberg, 2002; Pelikan et al., 2002); the relationship between instance difficulty and the signal-to-noise ratio will also be discussed later in this section. For hBOA, time complexity expressed in terms of the number of DHC flips or the number of fitness evaluations appears to grow polynomially fast with n; for the remaining algorithms, time complexity appears to grow slightly faster than polynomially fast. These results are not surprising. It was argued elsewhere that if the problem can be decomposed into subproblems of bounded order, then hBOA should be capable of discovering such a decomposition and solving the problem in low-order polynomial time with respect to n (Pelikan, Sastry, & Goldberg, 2002; Pelikan, 2005; Yu, Sastry, Goldberg, & Pelikan, 2007). Furthermore, it is known that fixed crossover operators are often not capable of solving such problems in polynomial time because they often break important partial solutions to the different subproblems or do not juxtapose these partial solutions effectively enough (Thierens & Goldberg, 1993; Thierens, 1995). While it is possible to create adversarial decomposable problems for which some model-building algorithms used in multivariate estimation of distribution algorithms, such as hBOA, fail (Coffin & Smith, 2007), this happens only in very specific cases and is unlikely to be the case with random problem instances

    11

  • 8/14/2019 Performance of Evolutionary Algorithms on NK Landscapes with Nearest Neighbor Interactions and Tunable Overlap

    13/22

    20 40 60 80 10010

    1

    102

    103

    104

    105

    Problem size

    Numberofevaluations

    UMDAGA (uniform)GA (twopoint)hBOA

    (a) k = 2, step = 1

    20 40 60 80 10010

    2

    103

    104

    Problem size

    Numberofevaluations

    UMDAGA (twopoint)GA (uniform)hBOA

    (b) k = 5, step = 1

    Figure 5: Average number of evaluations for k = 2 and k = 5 with step = 1.

    2 3 4 510

    3

    104

    105

    Neighborhood size, k

    Numberofflips

    GA (twopoint)GA (uniform)hBOA

    (a) step = 1

    2 3 4 510

    3

    104

    105

    Neighborhood size, k

    Numberofflips

    GA (twopoint)GA (uniform)hBOA

    (b) step = k + 1

    Figure 6: Growth of the number of DHC flips with k for step = 1 (most overlap) and step = k + 1(no overlap). All results are for n = 120.

    or real-world decomposable problems.The effects of k on performance were visualized in figure 6, which showed the growth of the

    number of DHC flips until optimum with k for n = 120 and step {1, 6}; a similar relationshipcan be observed for the number of evaluations (results omitted). The results show that for allalgorithms included in the comparison, performance grows at least exponentially with the valueof k. Furthermore, the results show that both variants of GA are much more sensitive to thevalue of k than hBOA. This is not a surprising result because hBOA is capable of identifying thesubproblems and recombining solutions to respect the discovered decomposition whereas GA usesa fixed recombination strategy regardless of the problem. The results for other values of n and stepare qualitatively similar and are thus omitted.
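    As a concrete illustration of the problem structure discussed above, the following sketch generates and evaluates one nearest-neighbor NK instance with tunable overlap. This is a hypothetical minimal implementation, not the paper's code: the function names, the uniformly drawn lookup-table values, and the omission of the shuffling step are our assumptions.

```python
import random

def make_nk_instance(n, k, step, seed=0):
    """Generate subproblem start positions and random lookup tables.
    Each subproblem spans k+1 consecutive bits, so step = k + 1 gives
    no overlap and step <= k gives an overlap of k + 1 - step bits."""
    rng = random.Random(seed)
    starts = list(range(0, n - k, step))
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in starts]
    return starts, tables

def evaluate(x, k, starts, tables):
    """Fitness = sum of subfunction values, each read from a lookup
    table indexed by the k+1 bits the subproblem spans (MSB first)."""
    total = 0.0
    for s, table in zip(starts, tables):
        idx = 0
        for bit in x[s:s + k + 1]:
            idx = (idx << 1) | bit
        total += table[idx]
    return total
```

    With step = k + 1 the subproblems are disjoint and the instance is separable; with step = 1 consecutive subproblems share k bits, the strongest overlap considered in the experiments.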

    The effects of step on performance are somewhat more intricate. Intuitively, instances where all subproblems are independent should be easiest, and this is also supported by all experimental results presented thus far. The results also indicate that the effects of overlap vary between the compared algorithms, as is shown in figure 13. More specifically, for both variants of GA, the most difficult percentage of overlap (relative to the size of the subproblems) is about 0.5, whereas for hBOA it is about 0.7.

    As was mentioned earlier, the time complexity of selectorecombinative GAs as well as hBOA is directly related to the signal-to-noise ratio, where the signal is the difference between the fitness contributions of the best and the second-best instances of a subproblem, and the noise models the fitness contributions of the other subproblems (Goldberg & Rudnick, 1991; Goldberg, Deb, & Clark, 1992). The smaller the signal-to-noise ratio, the larger the expected population size as well as the overall complexity of an algorithm. As was discussed above, the signal-to-noise ratio is influenced primarily by the value of n; however, it also depends on the subproblems themselves. The influence of the signal-to-noise ratio on algorithm performance should be strongest for separable problems with uniform scaling, where all subproblems have approximately the same signal; for problems with overlap and nonuniform scaling, other factors contribute to instance difficulty as well. Another important factor influencing the difficulty of decomposable problems is the scaling of the signal coming from the different subproblems (Thierens, Goldberg, & Pereira, 1998). Next we examine the influence of the signal-to-noise ratio and scaling on the performance of the compared algorithms in more detail.

    Figure 7: Ratio of the number of flips for GA with uniform crossover and hBOA (one panel for each k from 2 to 5, with all values of step, against problem size).

    Number of DHC flips until optimum
    n     k   step   hBOA     GA (uniform)   GA (twopoint)
    120   5   1      37,155   141,108        220,318
    120   5   2      40,151   212,635        353,748
    120   5   3      37,480   249,217        443,570
    120   5   4      27,411   195,673        310,894
    120   5   5      15,589   100,378        145,406
    120   5   6       9,607    35,101         47,576

    Table 1: Comparison of the number of DHC flips until optimum for hBOA and GA. For all settings, the superiority of the results obtained by hBOA was verified with a paired t-test at 99% confidence.
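    For a separable instance, the signal and noise just defined can be estimated directly from the subfunction lookup tables, assuming uniformly random bit assignments so that each subfunction's contribution has the variance of its table entries. This is an illustrative sketch of our own; the function name and table representation are assumptions.

```python
import statistics

def signal_to_noise_ratios(tables):
    """For each subproblem: signal = gap between the best and
    second-best entries of its lookup table; noise = std. dev. of the
    combined fitness contributions of all the other subproblems."""
    variances = [statistics.pvariance(t) for t in tables]
    total_var = sum(variances)
    ratios = []
    for i, t in enumerate(tables):
        best, second = sorted(t, reverse=True)[:2]
        noise = (total_var - variances[i]) ** 0.5
        ratios.append((best - second) / noise)
    return ratios
```

    The smallest of these per-subproblem ratios is the natural candidate for ranking instance difficulty, since the hardest-to-decide subproblem dominates the required population size.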

    Figure 8: Ratio of the number of flips for GA with twopoint crossover and hBOA (one panel for each k from 2 to 5, with all values of step, against problem size).

    Figure 14 visualizes the effects of the signal-to-noise ratio on the number of flips until optimum for n = 120, k = 5, and step ∈ {1, 6}; since UMDA was not capable of solving many of these problem instances in practical time, the results for UMDA are not included. The figure shows the average number of DHC flips until optimum for different percentages of instances with the smallest signal-to-noise ratios. To make the visualization more effective, the number of flips is normalized by dividing the values by the mean number of flips over the entire set of instances. The results clearly show that for the separable problems (that is, step = 6), the smaller the signal-to-noise ratio, the greater the number of flips. However, for problem instances with strong overlap (that is, step = 1), problem difficulty does not appear to be directly related to the signal-to-noise ratio, and the primary source of problem difficulty appears to lie elsewhere.

    Figure 15 visualizes the influence of scaling on the number of flips until optimum. The figure shows the average number of flips until optimum for different percentages of instances with the smallest signal variance. The larger the variance of the signal, the less uniformly the signal is distributed between the different subproblems. For the separable problem (that is, step = 6), the more uniformly scaled instances appear to be more difficult for all compared algorithms than the less uniformly scaled ones. For instances with strong overlap (that is, step = 1), the effects of scaling on algorithm performance are negligible; again, the source of problem difficulty appears to lie elsewhere.

    Two observations related to the signal-to-noise ratio and scaling are somewhat surprising: (1) Although the scalability of selectorecombinative GAs gets worse with nonuniform scaling of subproblems, the results indicate that the actual performance is better on more nonuniformly scaled problems. (2) The performance of the compared algorithms on problems with strong overlap does not appear to be directly affected by the signal-to-noise ratio or the signal variance. How could these results be explained?

    We believe that the primary reason why more uniformly scaled problems are more difficult for all tested algorithms is related to the effectiveness of recombination. More specifically, practically any recombination operator becomes more effective when the scaling is highly nonuniform; on the other hand, for uniformly scaled subproblems, fixed recombination operators are often expected to suffer from inefficient juxtaposition and frequent disruption of important partial solutions contained in the optimum, or building blocks (Goldberg, 2002; Thierens, 1999; Harik & Goldberg, 1996; Thierens & Goldberg, 1993).

    Figure 9: Ratio of the number of flips for GA with twopoint and uniform crossover (one panel for each k from 2 to 5, with all values of step, against problem size).

    For problems with strong overlap, the influence of the overlap appears to overshadow both the signal-to-noise ratio and scaling. We believe that a likely reason for this is that the order of interactions that must be covered by the probabilistic model may increase due to the effects of the overlap, which leads to a larger order of important building blocks that must be reproduced and juxtaposed to form the optimum. According to the existing population sizing theory (Goldberg & Rudnick, 1991; Harik et al., 1997; Pelikan et al., 2002), this should lead to an increase in the population size required to solve the problem (Harik et al., 1997). We are currently exploring possible approaches to quantifying the effects of overlap in order to confirm or refute this hypothesis.

    6 Future Work

    Probably the most important topic for future work in this area is to develop tools that could be used to gain a better understanding of the behavior of selectorecombinative genetic algorithms on additively decomposable problems with overlap between subproblems. This is a challenging topic but also a crucial one, because most difficult decomposable problems contain substantial overlap between the different subproblems. A good starting point for this research is the theoretical work on the factorized distribution algorithm (FDA) (Mühlenbein & Mahnig, 1998) and the scalability theory for multivariate estimation of distribution algorithms (Pelikan, Sastry, & Goldberg, 2002; Pelikan, 2005; Yu, Sastry, Goldberg, & Pelikan, 2007; Mühlenbein, 2008).

    It may also be useful to look at other random distributions for generating the subproblems. This avenue of research may split into at least two main directions. On the one hand, one may bias the distribution used to generate subproblems in order to generate instances that are especially hard for the algorithm under consideration, with the goal of addressing the weaknesses of this algorithm. On the other hand, one may want to generate problem instances that resemble important classes of real-world problems.

    Number of evaluations until optimum
    n     k   step   hBOA    GA (uniform)   GA (twopoint)
    120   5   1      7,414   16,519         34,696
    120   5   2      9,011   25,032         56,059
    120   5   3      9,988   30,285         72,359
    120   5   4      8,606   24,016         51,521
    120   5   5      7,307   13,749         26,807
    120   5   6      7,328    6,004         10,949

    Table 2: Comparison of the number of evaluations until optimum for hBOA and GA. For all settings except step = 6, the superiority of the results obtained by hBOA was verified with a paired t-test at 99% confidence.

    Figure 10: Correlation between the number of flips for hBOA and GA for n = 120, k = 5, and step = 1 (correlation coefficients: hBOA vs. GA (uniform) 0.493; hBOA vs. GA (twopoint) 0.330; GA (uniform) vs. GA (twopoint) 0.675).

    It may also be interesting to study instances of restricted and unrestricted NK landscapes from the perspective of the theory of elementary landscapes (Barnes, Dimova, Dokov, & Solomon, 2003). Finally, the instances provided by the methods described in this paper can be used to test other optimization algorithms and hybrids.

    7 Summary and Conclusions

    This paper described a class of nearest-neighbor NK landscapes with tunable strength of overlap between consecutive subproblems. Shuffling was introduced to eliminate tight linkage and make problem instances more challenging for algorithms with fixed variation operators. A dynamic programming approach was described that can be used to solve the described instances to optimality in low-order polynomial time. A large number of random instances of the described class of NK landscapes were generated. Several evolutionary algorithms were then applied to the generated instances; more specifically, the paper considered the genetic algorithm (GA) with two-point and uniform crossover and two estimation of distribution algorithms, namely the hierarchical Bayesian optimization algorithm (hBOA) and the univariate marginal distribution algorithm (UMDA). All algorithms were combined with a simple deterministic local searcher, and niching was used to maintain useful diversity. The results were analyzed and related to existing scalability theory for selectorecombinative genetic algorithms.

    Figure 11: Correlation between the number of flips for hBOA and GA for n = 120, k = 5, and step = 6 (correlation coefficients: hBOA vs. GA (uniform) 0.221; hBOA vs. GA (twopoint) 0.170; GA (uniform) vs. GA (twopoint) 0.723).

    Figure 12: Performance of GA and hBOA as a function of instance difficulty for n = 120 and k = 5 ((a) step = 1; (b) step = 6; average number of flips, divided by the mean, against the percentage of easiest hBOA instances).
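    The dynamic programming idea mentioned above can be sketched for the simplest case: an unshuffled step = 1 chain in which subfunction i depends on bits i through i + k. This is a simplified illustration of our own, not the paper's solver, which handles arbitrary step and shuffled variable positions; once bit i is fixed, only the k most recent bits matter, so the state space stays at 2^k.

```python
def solve_chain_nk(n, k, tables):
    """Maximize sum_i tables[i][x_i..x_{i+k}] over all x in {0,1}^n
    for the step = 1 chain; O(n * 2^(k+1)) table lookups."""
    assert len(tables) == n - k
    # best[state] = best total over subfunctions completed so far, given
    # that the current prefix ends with the k bits encoded in `state`
    best = {s: 0.0 for s in range(2 ** k)}   # any choice of the first k bits
    for table in tables:
        new = {}
        for state, value in best.items():
            for bit in (0, 1):
                window = (state << 1) | bit       # the k+1 bits of this subfn
                nxt = window & ((1 << k) - 1)     # keep the k most recent bits
                cand = value + table[window]
                if cand > new.get(nxt, float("-inf")):
                    new[nxt] = cand
        best = new
    return max(best.values())
```

    For overlapping subproblems with step > 1, the same recurrence applies with states over the k + 1 - step shared bits, which is the source of the low-order polynomial bound quoted above.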

    hBOA was shown to outperform the other algorithms included in the comparison on instances with large neighborhood sizes. The factor by which hBOA outperforms the other algorithms for the largest neighborhoods appears to grow faster than polynomially with problem size, indicating that the differences will become even more substantial for problems with larger neighborhoods and larger problem sizes. This suggests that linkage learning is advantageous when solving the considered class of NK landscapes. The second best performance was achieved by GA with uniform crossover, whereas the worst performance was achieved by UMDA.

    The complexity of all algorithms was shown to grow exponentially fast with the size of the neighborhood. The correlations between the times required to solve the compared problem instances were shown to be strongest for the two variants of GA; however, correlations were also observed between the other compared algorithms.

    For problems with no overlap, the signal-to-noise ratio and the scaling of the signal in the different subproblems were shown to be significant factors affecting problem difficulty. More specifically, the smaller the signal-to-noise ratio and the signal variance, the more difficult the instances become. However, for problems with a substantial amount of overlap between consecutive subproblems, the effects of overlap were shown to overshadow the effects of the signal-to-noise ratio and scaling.


    Figure 13: Influence of overlap for n = 120 (step varies with overlap). Panels: (a) hBOA, (b) GA (uniform), (c) GA (twopoint); number of flips against the percentage of overlap, with one curve for each k from 2 to 5.

    Figure 14: Influence of the signal-to-noise ratio on the number of flips for n = 120 and k = 5. Panels: (a) step = 1, (b) step = 6; average number of flips (divided by the mean) against the signal-to-noise percentile (% smallest), for GA (twopoint), GA (uniform), and hBOA.

    Acknowledgments

    This project was sponsored by the National Science Foundation under CAREER grant ECS-0547013, by the Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grant FA9550-06-1-0096, and by the University of Missouri in St. Louis through the High Performance Computing Collaboratory sponsored by Information Technology Services, and through the Research Award and Research Board programs.

    The U.S. Government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation, the Air Force Office of Scientific Research, or the U.S. Government. Some experiments were done using the hBOA software developed by Martin Pelikan and David E. Goldberg at the University of Illinois at Urbana-Champaign, and most experiments were performed on the Beowulf cluster maintained by ITS at the University of Missouri in St. Louis.


    Figure 15: Influence of signal variance on the number of flips for n = 120 and k = 5. Panels: (a) step = 1, (b) step = 6; average number of flips (divided by the mean) against the signal variance percentile (% smallest), for GA (twopoint), GA (uniform), and hBOA.

    References

    Ackley, D. H. (1987). An empirical study of bit vector function optimization. Genetic Algorithms and Simulated Annealing, 170–204.

    Aguirre, H. E., & Tanaka, K. (2003). Genetic algorithms on NK-landscapes: Effects of selection, drift, mutation, and recombination. In Raidl, G. R., et al. (Eds.), Applications of Evolutionary Computing: EvoWorkshops 2003 (pp. 131–142).

    Altenberg, L. (1997). NK landscapes. In Bäck, T., Fogel, D. B., & Michalewicz, Z. (Eds.), Handbook of Evolutionary Computation (pp. B2.7:5–10). Bristol, New York: Institute of Physics Publishing and Oxford University Press.

    Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning (Tech. Rep. No. CMU-CS-94-163). Pittsburgh, PA: Carnegie Mellon University.

    Barnes, J. W., Dimova, B., Dokov, S. P., & Solomon, A. (2003). The theory of elementary landscapes. Appl. Math. Lett., 16(3), 337–343.

    Cheeseman, P., Kanefsky, B., & Taylor, W. M. (1991). Where the really hard problems are. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-91), 331–337.

    Chickering, D. M., Heckerman, D., & Meek, C. (1997). A Bayesian approach to learning Bayesian networks with local structure (Technical Report MSR-TR-97-07). Redmond, WA: Microsoft Research.

    Choi, S.-S., Jung, K., & Kim, J. H. (2005). Phase transition in a random NK landscape model. pp. 1241–1248.

    Coffin, D. J., & Smith, R. E. (2007). Why is parity hard for estimation of distribution algorithms? Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2007), 624.

    Deb, K., & Goldberg, D. E. (1991). Analyzing deception in trap functions (IlliGAL Report No. 91009). Urbana, IL: University of Illinois at Urbana-Champaign, Illinois Genetic Algorithms Laboratory.

    Friedman, N., & Goldszmidt, M. (1999). Learning Bayesian networks with local structure. In Jordan, M. I. (Ed.), Graphical models (pp. 421–459). Cambridge, MA: MIT Press.

    Gao, Y., & Culberson, J. C. (2002). An analysis of phase transition in NK landscapes. Journal of Artificial Intelligence Research (JAIR), 17, 309–332.

    Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

    Goldberg, D. E. (2002). The design of innovation: Lessons from and for competent genetic algorithms, Volume 7 of Genetic Algorithms and Evolutionary Computation. Kluwer Academic Publishers.

    Goldberg, D. E., Deb, K., & Clark, J. H. (1992). Genetic algorithms, noise, and the sizing of populations. Complex Systems, 6, 333–362.

    Goldberg, D. E., & Rudnick, M. (1991). Genetic algorithms and the variance of fitness. Complex Systems, 5(3), 265–278. Also IlliGAL Report No. 91001.

    Harik, G. R. (1995). Finding multimodal solutions using restricted tournament selection. Proceedings of the International Conference on Genetic Algorithms (ICGA-95), 24–31.

    Harik, G. R., Cantú-Paz, E., Goldberg, D. E., & Miller, B. L. (1997). The gambler's ruin problem, genetic algorithms, and the sizing of populations. Proceedings of the International Conference on Evolutionary Computation (ICEC-97), 7–12. Also IlliGAL Report No. 96004.

    Harik, G. R., & Goldberg, D. E. (1996). Learning linkage. Foundations of Genetic Algorithms, 4, 247–262.

    Heckerman, D., Geiger, D., & Chickering, D. M. (1994). Learning Bayesian networks: The combination of knowledge and statistical data (Technical Report MSR-TR-94-09). Redmond, WA: Microsoft Research.

    Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.

    Juels, A. (1998). The equilibrium genetic algorithm. Submitted for publication.

    Kauffman, S. (1989). Adaptation on rugged fitness landscapes. In Stein, D. L. (Ed.), Lecture Notes in the Sciences of Complexity (pp. 527–618). Addison Wesley.

    Kauffman, S. (1993). The origins of order: Self-organization and selection in evolution. Oxford University Press.

    Mühlenbein, H. (2008). Convergence of estimation of distribution algorithms for finite samples (Technical Report). Sankt Augustin, Germany: Fraunhofer Institute for Autonomous Intelligent Systems.

    Mühlenbein, H., & Mahnig, T. (1998). Convergence theory and applications of the factorized distribution algorithm. Journal of Computing and Information Technology, 7(1), 19–32.

    Mühlenbein, H., & Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. Parallel Problem Solving from Nature, 178–187.

    Pelikan, M. (2005). Hierarchical Bayesian optimization algorithm: Toward a new generation of evolutionary algorithms. Springer.

    Pelikan, M., & Goldberg, D. E. (2001). Escaping hierarchical traps with competent genetic algorithms. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), 511–518. Also IlliGAL Report No. 2000020.

    Pelikan, M., & Goldberg, D. E. (2003). Hierarchical BOA solves Ising spin glasses and MAXSAT. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2003), II, 1275–1286. Also IlliGAL Report No. 2003001.

    Pelikan, M., Sastry, K., Butz, M. V., & Goldberg, D. E. (2006). Performance of evolutionary algorithms on random decomposable problems. Parallel Problem Solving from Nature, 788–797.

    Pelikan, M., Sastry, K., Butz, M. V., & Goldberg, D. E. (2008). Analysis of estimation of distribution algorithms and genetic algorithms on NK landscapes. pp. 1033–1040.

    Pelikan, M., Sastry, K., & Goldberg, D. E. (2002). Scalability of the Bayesian optimization algorithm. International Journal of Approximate Reasoning, 31(3), 221–258. Also IlliGAL Report No. 2001029.

    Santarelli, S., Goldberg, D. E., & Yu, T.-L. (2004). Optimization of a constrained feed network for an antenna array using simple and competent genetic algorithm techniques. Proceedings of the Workshop Military and Security Application of Evolutionary Computation (MSAEC-2004).

    Sastry, K. (2001). Evaluation-relaxation schemes for genetic and evolutionary algorithms. Master's thesis, University of Illinois at Urbana-Champaign, Department of General Engineering, Urbana, IL. Also IlliGAL Report No. 2002004.

    Sastry, K., Pelikan, M., & Goldberg, D. E. (2007). Empirical analysis of ideal recombination on random decomposable problems. pp. 1388–1395.

    Thierens, D. (1995). Analysis and design of genetic algorithms. Doctoral dissertation, Katholieke Universiteit Leuven, Leuven, Belgium.

    Thierens, D. (1999). Scalability problems of simple genetic algorithms. Evolutionary Computation, 7(4), 331–352.

    Thierens, D., & Goldberg, D. (1994). Convergence models of genetic algorithm selection schemes. Parallel Problem Solving from Nature, 116–121.

    Thierens, D., & Goldberg, D. E. (1993). Mixing in genetic algorithms. Proceedings of the International Conference on Genetic Algorithms (ICGA-93), 38–45.

    Thierens, D., Goldberg, D. E., & Pereira, A. G. (1998). Domino convergence, drift, and the temporal-salience structure of problems. Proceedings of the International Conference on Evolutionary Computation (ICEC-98), 535–540.

    Wright, A. H., Thompson, R. K., & Zhang, J. (2000). The computational complexity of N-K fitness functions. IEEE Transactions on Evolutionary Computation, 4(4), 373–379.

    Yu, T.-L., Sastry, K., Goldberg, D. E., & Pelikan, M. (2007). Population sizing for entropy-based model building in estimation of distribution algorithms. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2007), 601–608.