Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions


  • date post

    17-May-2015

description

Epistasis correlation is a measure that estimates the strength of interactions between problem variables. This paper presents an empirical study of epistasis correlation on a large number of random problem instances of NK landscapes with nearest neighbor interactions. The results are analyzed with respect to the performance of hybrid variants of two evolutionary algorithms: (1) the genetic algorithm with uniform crossover and (2) the hierarchical Bayesian optimization algorithm. http://medal.cs.umsl.edu/files/2011002.pdf

Transcript of Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions

  • 1. Analysis of Epistasis Correlation on NK Landscapes with Nearest Neighbor Interactions
    Martin Pelikan
    Missouri Estimation of Distribution Algorithms Laboratory (MEDAL), University of Missouri, St. Louis, MO
    http://medal.cs.umsl.edu/
    pelikan@cs.umsl.edu
    Download MEDAL Report No. 2011002: http://medal.cs.umsl.edu/files/2011002.pdf
    Slide footer: Martin Pelikan, Analysis of Epistasis Correlation on NK Landscapes

  • 2. Motivation
    Problem difficulty measures
      • Important for understanding and estimating problem difficulty.
      • Should be useful in designing, choosing and setting up optimization algorithms.
      • Most past work considers few isolated instances.
    This study
      • Focuses on measures of epistasis (variable interactions).
      • Analyzes epistasis measures on a large number of instances of nearest-neighbor NK landscapes.
      • Compares the measures with actual performance of hybrid GA.
      • Complements last year's GECCO paper on other measures.

  • 3. Outline
      1. Epistasis.
      2. Epistasis variance and epistasis correlation.
      3. NK landscapes.
      4. Experiments.
      5. Conclusions and future work.

  • 4. Epistasis
    Epistasis
      • Epistasis refers to interactions between problem variables.
      • Effects of one variable depend on values of other variable(s).
      • In biology, the phenotype mapping of one gene is affected by another.
    Why should we care?
      • Absence of epistasis indicates a simple, linear problem.
      • Epistasis may make a problem more difficult.

  • 5. Critical View on Epistasis
    Criticism
      • Epistasis is of little use unless we understand its nature.
      • There exist many easy problems with high epistasis.
      • There exist many hard problems with little epistasis.
      • Epistasis is difficult to measure using finite samples.
    Examples
      • Epistasis in a difficult problem: needle in a haystack, deceptive problem.
      • Epistasis in a simple problem: onemax with additional contribution of optimum (simple).

  • 6. Linear Fitness Approximation
    Linear fitness approximation
      • Assume candidate solutions are n-bit binary strings.
      • Assume population $P$ of $N$ solutions.
      • $P_i(v_i)$ denotes solutions with $v_i \in \{0, 1\}$ in position $i$.
      • $N_i(v_i)$ is the number of solutions in $P_i(v_i)$.
      • $f_i(v_i)$ approximates the contribution of $v_i$ to fitness, where $\bar{f}(P)$ is the average fitness of $P$:
        $$f_i(v_i) = \frac{1}{N_i(v_i)} \sum_{x \in P_i(v_i)} f(x) - \bar{f}(P)$$
      • Approximate fitness as follows:
        $$f_{lin}(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} f_i(X_i) + \bar{f}(P)$$

  • 7. Epistasis Variance
    Epistasis variance (Davidor, 1990)
      • In short: sum of square differences between $f$ and $f_{lin}$.
      • Epistasis variance $\xi_P(f)$ is defined as
        $$\xi_P(f) = \frac{1}{N} \sum_{x \in P} \left( f(x) - f_{lin}(x) \right)^2$$

  • 8. Epistasis Correlation
    Epistasis correlation (Rochet et al., 1997)
      • In short: correlation coefficient between $f$ and $f_{lin}$.
      • Sum of square differences between $f$ and its average $\bar{f}(P)$:
        $$s_P(f) = \sum_{x \in P} \left( f(x) - \bar{f}(P) \right)^2$$
      • Sum of square differences between $f_{lin}$ and its average $\bar{f}_{lin}(P)$:
        $$s_P(f_{lin}) = \sum_{x \in P} \left( f_{lin}(x) - \bar{f}_{lin}(P) \right)^2$$
      • Epistasis correlation $epic_P(f)$ is defined as
        $$epic_P(f) = \frac{\sum_{x \in P} \left( f(x) - \bar{f}(P) \right) \left( f_{lin}(x) - \bar{f}_{lin}(P) \right)}{\sqrt{s_P(f) \, s_P(f_{lin})}}$$

  • 9. Evaluating Epistasis Measures
    Epistasis variance
      • Not invariant w.r.t. linear transformations of $f$.
      • Not within a fixed range of values.
      • Smaller epistasis variance indicates weaker epistasis.
    Epistasis correlation
      • Invariant w.r.t. linear transformations of $f$.
      • Value is within range [0, 1].
      • Greater epistasis correlation indicates weaker epistasis.

  • 10. Experiments: Algorithms
    Genetic algorithm (Holland, 1975)
      • Uniform crossover.
      • Bit-flip mutation.
      • Tournament selection.
      • Restricted tournaments for niching.
      • Steepest ascent hill climber for local search.
    Hierarchical BOA (Pelikan et al., 2001)
      • Variation by learning and sampling Bayesian networks with decision trees.
      • Tournament selection.
      • Restricted tournaments for niching.
      • Steepest ascent hill climber for local search.

  • 11. Experiments: Problems
    NK landscapes with nearest neighbors
      • Defined on n-bit binary strings.
      • Fitness is the sum of n subproblems of order k + 1.
      • Subproblem i uses the ith variable and the following k variables.
      • Neighborhoods wrap around (as on a circle).
      • Subproblems are defined as lookup tables generated from [0, 1).
    Example for n = 6 and k = 2
        $$f(X_1, \ldots, X_6) = f_1(X_1, X_2, X_3) + f_2(X_2, X_3, X_4) + f_3(X_3, X_4, X_5) + f_4(X_4, X_5, X_6) + f_5(X_5, X_6, X_1) + f_6(X_6, X_1, X_2)$$

        X1 X2 X3   f1()
        0  0  0    0.51
        0  0  1    0.18
        0  1  0    0.97
        0  1  1    0.68
        1  0  0    0.47
        1  0  1    0.73
        1  1  0    0.06
        1  1  1    0.41
        ...

  • 12. Experiments: Problems
    NK parameters
      • $k \in \{2, 3, 4, 5, 6\}$
      • $n \in \{20, 30, 40, 50, 60, 70, 80, 90, 100\}$
      • For each (n, k), we use 10,000 instances.
    Difficulty of nearest-neighbor NK landscapes
      • Difficulty grows with k.
      • Polynomially solvable using dynamic programming.
      • For larger n and k, hBOA outperforms GA.

  • 13. Results: Scatter Plot for hBOA
      • Epistasis correlation decreases with k (expected).
      • For any k, epistasis correlation does not seem to closely correspond to the actual problem difficulty.

  • 14. Results: Epistasis Correlation vs. n and k for hBOA
    [Figure 3: Epistasis correlation with respect to the number n of bits and the number k of neighbors of nearest-neighbor NK landscapes. Panel (a): epistasis correlation with respect to n (x-axis: number of bits, n; y-axis: epistasis correlation; one curve for each k = 2, ..., 6). Epistasis correlation does not change with n. Panel (b): epistasis correlation with respect to k (x-axis: number of neighbors, k; y-axis: epistasis correlation; curve: average epistasis correlation for n = 100). Epistasis increases with k, so epistasis correlation decreases, as stated on slide 13.]
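The definitions on slides 6 to 8 and the nearest-neighbor NK construction on slide 11 can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study. All function names are hypothetical. The population here is the full enumeration of all 2^n strings for a small n, which is an assumption made to keep the example deterministic. Returning 0 when the denominator vanishes is also an assumption. The lookup-table index treats the first variable of a subproblem as the most significant bit, matching the row order of the example table on slide 11.

```python
import itertools
import math
import random


def make_nk_instance(n, k, rng):
    """One lookup table per subproblem, each with 2**(k+1) values drawn from [0, 1)."""
    return [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]


def nk_fitness(tables, k, x):
    """Sum of n subproblems; subproblem i reads bit i and the following k bits, wrapping around."""
    n = len(x)
    total = 0.0
    for i, table in enumerate(tables):
        index = 0
        for j in range(k + 1):
            index = (index << 1) | x[(i + j) % n]  # first variable is the most significant bit
        total += table[index]
    return total


def linear_approximation(population, fitness):
    """Value of f_lin for every solution in the population (slide 6)."""
    n, size = len(population[0]), len(population)
    mean_f = sum(fitness) / size
    contrib = []
    for i in range(n):
        sums, counts = [0.0, 0.0], [0, 0]
        for x, fx in zip(population, fitness):
            sums[x[i]] += fx
            counts[x[i]] += 1
        # f_i(v) = average fitness of solutions with value v in position i, minus the overall average.
        # A value that never occurs gets 0 (assumption of this sketch; it is never looked up).
        contrib.append([sums[v] / counts[v] - mean_f if counts[v] else 0.0 for v in (0, 1)])
    return [sum(contrib[i][x[i]] for i in range(n)) + mean_f for x in population]


def epistasis_measures(population, fitness):
    """Return (epistasis variance, epistasis correlation) for the given sample (slides 7 and 8)."""
    size = len(population)
    f_lin = linear_approximation(population, fitness)
    mean_f = sum(fitness) / size
    mean_lin = sum(f_lin) / size
    variance = sum((f - fl) ** 2 for f, fl in zip(fitness, f_lin)) / size
    s_f = sum((f - mean_f) ** 2 for f in fitness)
    s_lin = sum((fl - mean_lin) ** 2 for fl in f_lin)
    covariation = sum((f - mean_f) * (fl - mean_lin) for f, fl in zip(fitness, f_lin))
    denominator = math.sqrt(s_f * s_lin)
    # Assumption: report 0 when either sum of squares is zero (constant f or constant f_lin).
    correlation = covariation / denominator if denominator > 0 else 0.0
    return variance, correlation


def enumerate_and_measure(n, k, seed=0):
    """Build one random instance and measure it on all 2**n strings (small n only)."""
    rng = random.Random(seed)
    tables = make_nk_instance(n, k, rng)
    population = [list(bits) for bits in itertools.product((0, 1), repeat=n)]
    fitness = [nk_fitness(tables, k, x) for x in population]
    return epistasis_measures(population, fitness)
```

Calling `enumerate_and_measure(8, 0)` gives a purely additive function, so the correlation should come out at 1 up to rounding. Raising k introduces interactions and should pull the value below 1, in line with the trend reported on slide 13.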
  • 15. Results: Problem Difficulty and Epistasis Correlation
    GA for n = 80 and k = 6:

        desc. of instances   DHC steps until optimum   epistasis correlation
        10% easiest          15208.7 (3718.2)          0.2358 (0.018)
        25% easiest          22427.3 (6968.4)          0.2358 (0.018)
        50% easiest          34855.5 (14722.9)         0.2353 (0.018)
        all instances        117021.4 (204462.0)       0.2344 (0.018)
        50% hardest          199187.4 (264378.2)       0.2335 (0.018)
        25% hardest          310451.2 (338773.5)       0.2330 (0.018)
        10% hardest          519430.9 (461122.7)       0.2324 (0.018)

    hBOA for n = 100 and k = 6:

        desc. of instances   DHC steps until optimum   epistasis correlation
        10% easiest          21364.1 (2929.9)          0.2349 (0.016)
        25% easiest          26787.7 (5261.6)          0.2351 (0.016)
        50% easiest          34276.6 (8833.1)          0.2348 (0.016)
        all instances        60774.8 (42442.8)         0.2344 (0.016)
        50% hardest          87272.9 (46049.2)         0.2339 (0.016)
        25% hardest          114418.9 (52085.3)        0.2340 (0.016)
        10% hardest          154912.8 (62794.1)        0.2341 (0.016)

    Table 1: Epistasis correlation for easy and hard instances for hBOA. The difficulty of instances is measured by the overall number of steps of the local searcher.
    Table 2: Epistasis correlation for easy and hard instances for GA with uniform crossover. The difficulty of instances is measured by the overall number of steps of the local searcher.

      • For fixed n and k, epistasis correlation changes only a little.
      • Epistasis is stronger for more difficult problems, but the differences are nearly negligible.

  • 16. Conclusions and Future Work
    Conclusions
      • For NK landscapes, epistasis correlation is certainly not useless; it provided some input on problem difficulty of NK landscapes.
      • Epistasis correlation succ