Introduction to Genetic Algorithms

Genetic Algorithms: A Concise Introduction
Arber Borici, March 30, 2010


An introduction to Genetic Algorithms with an example application on the NP-Complete Set-Covering Problem.

Transcript of Introduction to Genetic Algorithms

1. Genetic Algorithms: A Concise Introduction
   Arber Borici, March 30, 2010

2. Outline
   - GAs: Powerful Metaheuristics
   - The Canonical Algorithm
   - Hybrid GAs
   - Areas of Application
     - Integer Programming
     - Multi-Objective Optimization
   - An Example: Set Covering Problem
     - Application on binary image compression

3. What is a GA?
   - A GA is an intelligent stochastic search metaheuristic that simulates evolutionary natural selection
     - Intelligent? Stochastic search
   - Developed by J. Holland ('75, Univ. Mich.)
   - Advanced variations: Simple, Steady-State, Hybrid, etc.
   - Genetic Programming, G. P. (Koza '90)

4. GA Procedure
   - Start with a random or predefined population
   - Individuals reproduce, mutate, and die
   - Each individual has a relative fitness
   - Genes from highly-fit individuals survive into the next generation
   - Fitness is the crux of GAs
   - Run the GA for N generations or until some criteria are met
   - In principle, an infinitely long run would eventually reach an optimal result

5. Some Terminology
   - Chromosomes as simple data structures

6. GA Pseudocode
   Generate initial population P
   Evaluate fitness for each individual
   REPEAT
     - Select parents from P
     - Recombine selected parents
     - Mutate offspring (with a small probability)
     - Evaluate offspring fitness
     - Replace low-fit individuals with offspring
   UNTIL some criteria are met

7. Generating the Population
   - Problem-dependent:
     - Merely random individuals
     - Some predefined configuration
     - An admixture of random and predefined individuals
   - Each individual has a uniform structure:
     - Integer-based representation
     - Alleles as groups of bits
     - Individual = chromosome

8. The Fitness Function
   - Suppose you are searching for a solution
   - You start with a random population
   - The fitness function could be the distance between the current solution and the final (target) solution
     - The smaller the distance, the higher the fitness
   - Low-fit individuals will eventually die
   - Highly-fit individuals will reproduce, i.e., transfer genes to offspring
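The GA pseudocode of slide 6 can be sketched as a short, runnable Python loop. This is a minimal illustration under stated assumptions, not the author's implementation: chromosomes are bit lists, the fitness follows the distance idea of slide 8 (length minus Hamming distance to a hypothetical target string), and the choices of tournament selection, one-point crossover, bit-flip mutation, and elitist replacement, as well as all function names and parameter values, are illustrative.

```python
import random


def fitness(individual, target):
    """Fitness = chromosome length minus Hamming distance to the target (higher is better)."""
    distance = sum(a != b for a, b in zip(individual, target))
    return len(target) - distance


def tournament(population, scores, k, rng):
    """Return the fittest of k individuals sampled without replacement."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def one_point_crossover(a, b, rng):
    """Exchange the gene regions after a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(individual, pm, rng):
    """Flip each bit independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in individual]


def run_ga(target, pop_size=60, generations=200, pm=0.01, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Generate initial population P: merely random individuals.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for gen in range(generations):
        # Evaluate fitness for each individual.
        scores = [fitness(ind, target) for ind in population]
        best = max(range(pop_size), key=lambda i: scores[i])
        if scores[best] == n:  # a satisfying solution has been attained
            return population[best], gen
        # Select parents, recombine them, and mutate the offspring.
        offspring = []
        while len(offspring) < pop_size:
            p1 = tournament(population, scores, 3, rng)
            p2 = tournament(population, scores, 3, rng)
            for child in one_point_crossover(p1, p2, rng):
                offspring.append(mutate(child, pm, rng))
        # Replace low-fit individuals: keep the best pop_size of parents + offspring.
        pool = population + offspring
        pool.sort(key=lambda ind: fitness(ind, target), reverse=True)
        population = pool[:pop_size]
    # Stopping criterion: the generation budget is exhausted.
    best = max(population, key=lambda ind: fitness(ind, target))
    return best, generations
```

For example, `best, gen = run_ga([1, 0] * 10)` searches for a 20-bit target. Keeping the best of parents plus offspring is one concrete reading of "replace low-fit individuals with offspring"; other replacement strategies from slide 13 would slot into the same place.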
9. Fitness (contd.)
   - Fitness determines the strength of the GA
   - Relative fitness (fitness normalized over the population) is computationally convenient
   - The Schema Theorem (Holland '75): short, low-order schemata of above-average fitness receive exponentially increasing numbers of trials in successive generations, so highly-fit genes tend to survive and combine into fitter offspring
   - The fittest individual of the last generation is taken as the solution to the search problem
   - Fitness functions vary in complexity and computability

10. Inside the Loop: Selection
   - Selection is done at every generation
   - A stochastic operation
   - Pairs of individuals are chosen repeatedly until a pool of offspring is constructed
   - Selection determines who will live and who will die
     - Offspring are first evaluated (fitness)
     - Highly-fit offspring will replace low-fit individuals

11. Inside the Loop: Selection
   - Several strategies:
     - Ranking
     - Roulette wheel
     - Tournament
     - Stochastic remainder sampling
     - Stochastic uniform sampling
   - Strategies are chosen on empirical and theoretical grounds

12. Inside the Loop: Recombination
   - Parents recombine to produce offspring:
     - Gene regions are exchanged
     - Locations (crossover points) are selected heuristically or at random
   - [Figure: crossover; note the swapped genes]
   - Offspring have a different (ideally better) fitness

13. Inside the Loop: Replacement
   - Several strategies:
     - Replace extrema
     - Stochastic replacement
     - Crowding (replace the most similar)
     - Replace just the parents
   - The choice rests on theoretical and empirical judgment

14. Ending the Loop: Criteria
   - Number of generations
     - "Apocalypse time" at a predefined generation number
   - (Sub-)optimal solution
     - A satisfying solution has been attained
   - Requires careful judgment
   - Problem: premature convergence

15. Premature Convergence
   - Crossover and mutation too effective
   - Loss of gene variation leads to quick convergence
   - The likelihood of finding new individuals (solutions) decreases
   - Two approaches:
     - De Jong's crowding: replace the most similar solutions
     - Goldberg's fitness sharing: decrease the fitness of the most similar individuals

16. Premature Convergence
   - [Figure (source: CodeProject): "Our objective is to attain some maximum value"; "Premature convergence at certain local niches"]
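Two ideas from slides 9 to 15 can be made concrete: roulette-wheel selection driven by relative fitness, and the "decrease the fitness of the most similar individuals" mechanism, usually called fitness sharing. The sketch below assumes non-negative raw fitness values, bit-list chromosomes compared by Hamming distance, and the common triangular sharing function with a hypothetical niche radius `sigma`; it is an illustration, not the author's code.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def relative_fitness(scores):
    """Normalize raw (non-negative) fitness values so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]


def roulette_wheel(population, scores, rng):
    """Fitness-proportionate selection: spin once, pick the slice the pointer lands in."""
    cumulative = list(accumulate(relative_fitness(scores)))
    spin = rng.random()  # uniform in [0, 1)
    idx = bisect_right(cumulative, spin)
    # Guard against floating-point round-off leaving the last bound just below 1.
    return population[min(idx, len(population) - 1)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def shared_fitness(population, scores, sigma):
    """Fitness sharing: divide each score by its niche count.

    Individuals closer than sigma (Hamming distance) share fitness via the
    triangular function sh(d) = 1 - d / sigma; the niche count includes the
    individual itself, so it is always >= 1.
    """
    shared = []
    for a, score in zip(population, scores):
        niche = sum(max(0.0, 1.0 - hamming(a, b) / sigma) for b in population)
        shared.append(score / niche)
    return shared
```

Feeding `shared_fitness(...)` instead of the raw scores into `roulette_wheel` lowers the selection pressure on crowded regions of the search space, which is the point of the approach against premature convergence. In everyday Python, `random.choices(population, weights=scores)` performs the same fitness-proportionate draw.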
17. General Parameters
   - Population size
     - Problem-dependent
     - A small size does not necessarily imply degraded optimality
   - Number of generations
     - What is a good stopping time?
   - Crossover and mutation probabilities
     - Naturally, mutation has a minute rate
     - In simulations, rates are set by empirical judgment

18. Some Properties
   - Implicit parallelism
     - A GA implicitly searches many solutions at the same time (schemata of chromosomes)
     - Binary alphabets offer the largest number of schemata
   - Standalone GAs often provide sub-optimal solutions for very large problem sizes
     - Optimality is expected

19. Hybrid GAs
   - GAs may be combined with local search heuristics
     - Local search operates on the regions (planes) supplied by the GA
     - Once a niche is reached, local search finds its point of highest fitness
     - Much like hill-climbing
   - Great impact on efficiency and speed
   - Employed in hard optimization problems

20. Applications
   - Optimization problems
     - Single- and multi-objective
     - Constrained and unconstrained
     - E.g., NP-hard problems (TSP, SPP, SCP, scheduling, etc.); economics (many optimization problems)
   - Artificial and biological systems
     - Gene profiling
     - Computational creativity

21. Case Study: Set-Covering Problem
   - SCP is a classical CS problem
     - Discrete combinatorial optimization
     - Applicable to many real-world problems
   - SCP is NP-hard
   - SCP is formulated as a typical cost-minimization problem
   - Real-world areas: bus/airline scheduling, resource allocation, nurse scheduling, etc.

22. SCP: Formulation
   - Given a binary X-by-Y matrix M, cover ALL the rows with the smallest number of columns
     - At minimum cost, if each column col has a corresponding cost c_col
   - Mathematically (all rows must be covered):

       \min Z = \sum_{col=1}^{Y} c_{col}\, x_{col}
       \text{s.t. } M x \geq \mathbf{1}, \qquad M \in \{0,1\}^{X \times Y}, \quad x \in \{0,1\}^{Y}

     where x_{col} = 1 if column col is chosen and x_{col} = 0 otherwise

23. SCP: Example
   Which columns cover every row?

          1  2  3  4  5
      1   1  1  0  0  0      1
      2   1  1  0  0  1      1
      3   0  1  1  0  0      1
      4   0  0  0  1  0   ?  1
      5   0  0  0  1  0      1
      6   1  0  0  0  0      1
      7   0  0  1  0  0      1

24. Example

          1  2  3  4  5
      1   1  1  0  0  0      1
      2   1  1  0  0  1      1
      3   0  1  1  0  0      1
      4   0  0  0  1  0      1    Use 1, 3, 4
      5   0  0  0  1  0      1
      6   1  0  0  0  1      1
      7   0  0  1  0  0      1
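To make the formulation of slide 22 concrete, here is a brute-force check on the 7x5 matrix from slide 24, assuming unit costs c_col = 1 (i.e., "the smallest number of columns"). The function names are hypothetical. Exhaustive search visits all 2^Y vectors x, so it only works for tiny matrices, which is exactly why heuristics such as GAs are used for real instances.

```python
from itertools import product

# 7x5 example matrix from slide 24 (rows must be covered, columns are candidates).
M = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]


def covers(M, x):
    """Constraint Mx >= 1: every row has at least one chosen column."""
    return all(sum(m * xc for m, xc in zip(row, x)) >= 1 for row in M)


def cost(c, x):
    """Objective Z = sum over columns of c_col * x_col."""
    return sum(ci * xi for ci, xi in zip(c, x))


def brute_force_scp(M, c):
    """Enumerate all 2^Y binary vectors x and return a cheapest feasible one."""
    best = None
    for x in product((0, 1), repeat=len(c)):
        if covers(M, x) and (best is None or cost(c, x) < cost(c, best)):
            best = x
    return best
```

With unit costs, `brute_force_scp(M, [1] * 5)` returns `(1, 0, 1, 1, 0)`, i.e. columns 1, 3, 4 as on the slide. Here the minimum is unique: rows 4, 5 force column 4, row 7 forces column 3, and only column 1 also covers rows 1, 2, and 6.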
25. Example
   - Same matrix as in slide 24; use columns 1, 3, 4

26. GA Approach
   - The GA's purpose: select those columns whose union (bitwise OR) is the all-ones vector
   - How do we model a chromosome?
   - What parameter values do we choose?
     - Crossover rate = ?
     - Mutation rate = ?
   - Type of GA? Any local search heuristics?
   - Fitness?

27. SCP Chromosome
   - Chromosome = a vector u over the columns
   - u[i] = 1 if and only if the i-th column of the matrix is considered for the overall covering
   - [Figure: the 5-column example matrix with a candidate chromosome u]

28. SCP Chromosome
   - [Figure: a candidate chromosome u, labeled "Not optimal"]

29. SCP Chromosome
   - [Figure: a candidate chromosome u, labeled "Minimal"]

30. SCP: GA Parameters
   - Generate a fixed number of binary vectors u
   - Population size depends on the SCP matrix size
     - Empirically, 60 to 300 individuals
   - Mutation rate should be very small
     - Usually, pm = 0.1% to pm = 1%
     - A higher pm may destroy highly-fit individuals
   - Crossover rate
     - Unless other operators are used, pc = 1 - pm

31. Fitness Function
   - We are trying to minimize the total cost of the selected columns
   - If C(i) > C(j), where i and j are individuals, then F(i) < F(j)
     - The most highly-fit individual has the smallest cost score
     - This rule alone could serve as a weak fitness function
   - In real-world problems (e.g., bus scheduling), more constraints are involved: fares, unions, gas, and so forth

32. SCP: GA Type
   - A canonical GA will converge prematurely
     - Eventually, crossover will cause the vectors u to become indistinguishable
   - Strategy: use a hybrid GA
     - Some established local search heuristic
     - E.g., Nelder-Mead simplex search (Numerical Recipes in C, 1989)
   - Computationally expensive for large matrices!
     - Consider parallel GAs
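The chromosome and fitness ideas of slides 27 to 32 can be sketched on the same 7x5 example matrix (slide 24). Several assumptions are involved: unit column costs; a fitness of 1 / (1 + cost + penalty x uncovered rows), which satisfies "C(i) > C(j) implies F(i) < F(j)" for feasible chromosomes, while the penalty for infeasible chromosomes is an assumption the slides do not specify; and, as the local search step of a hybrid GA, a simple redundant-column elimination swapped in for illustration. It is not the Nelder-Mead search the slides cite. All names are hypothetical.

```python
# Same 7x5 example matrix (slide 24); unit column costs are an assumption.
M = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
COST = [1, 1, 1, 1, 1]


def uncovered_rows(M, u):
    """Number of rows not covered by any column selected in chromosome u."""
    return sum(1 for row in M if not any(m and g for m, g in zip(row, u)))


def total_cost(u, cost):
    return sum(c for c, g in zip(cost, u) if g)


def fitness(u, M, cost, penalty=10):
    """Higher is better: a lower total cost gives a higher fitness.

    The penalty for uncovered rows is an assumption; the slides do not say
    how infeasible chromosomes are handled.
    """
    return 1.0 / (1.0 + total_cost(u, cost) + penalty * uncovered_rows(M, u))


def drop_redundant(u, M, cost):
    """Simple local search: try removing selected columns, most expensive first,
    and keep each removal only if every row stays covered."""
    u = list(u)
    for col in sorted((i for i, g in enumerate(u) if g), key=lambda i: -cost[i]):
        u[col] = 0
        if uncovered_rows(M, u):
            u[col] = 1  # removal broke the cover; undo it
    return u
```

For instance, `drop_redundant([1, 1, 1, 1, 0], M, COST)` turns a non-optimal cover into `[1, 0, 1, 1, 0]`. In general this pass only guarantees an irredundant cover, not a minimum one, which is why it is paired with the GA's global search rather than used alone.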
33. SCP: Binary Image Compression
   - A binary image is a collection of black and white pixels:
     - Represented as a matrix of zeros and ones
     - By convention, zero represents a white pixel, while one represents a black pixel
   - Consider partitioning the image into 8x8 matrices (or blocks)
   - Compress the image by compressing each block

34. Compressing Blocks
   - A block comprises 0s and clusters of 1s
   - How can we efficiently encode those clusters? E.g., as:
     - One line
     - Two points
     - One rectangle

35. Compressing Blocks
   - Problem: group the 1s using a number of geometric shapes, such that the total cost (expressed in bits) is minimal
   - First, convert each block into an SCP matrix whose rows are the block's 1-pixels and whose columns are all possible geometric shapes
   - Then, choose those columns that cover all rows at minimum cost
   - [Figure: example]

36. Conclusions
   - GAs are powerful search metaheuristics
     - Survival-of-the-fittest paradigm
     - Implicit parallelism: Schema Theorem
   - Not random! Stochasticity lies in mutation/crossover
   - GA crux: modeling the fitness function
     - The fitness model determines GA capability
     - Careful empirical/theoretical considerations
   - GAs are computationally expensive
     - Consider distributing GA computations
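The block-to-SCP conversion of slides 33 to 35 can be sketched as follows. Several assumptions are involved: the image is a list of 0/1 rows; edge blocks are zero-padded (white) when the image sides are not multiples of 8; and the candidate shapes are restricted to axis-aligned rectangles made entirely of 1-pixels, with points and lines as degenerate rectangles. The bit cost of each shape is not modeled because the slides do not give the encoding. The resulting matrix could then be handed to any SCP solver, such as a GA. All function names are hypothetical.

```python
def split_into_blocks(image, size=8):
    """Partition a binary image (list of 0/1 rows) into size x size blocks.

    Edge blocks are zero-padded (white), an assumption for images whose
    sides are not multiples of size.
    """
    h, w = len(image), len(image[0])
    blocks = []
    for r0 in range(0, h, size):
        for c0 in range(0, w, size):
            blocks.append([[image[r][c] if r < h and c < w else 0
                            for c in range(c0, c0 + size)]
                           for r in range(r0, r0 + size)])
    return blocks


def candidate_rectangles(block):
    """All axis-aligned rectangles (r1, c1, r2, c2) made only of 1-pixels."""
    n, m = len(block), len(block[0])
    rects = []
    for r1 in range(n):
        for c1 in range(m):
            for r2 in range(r1, n):
                for c2 in range(c1, m):
                    if all(block[r][c] for r in range(r1, r2 + 1)
                           for c in range(c1, c2 + 1)):
                        rects.append((r1, c1, r2, c2))
    return rects


def scp_matrix(block):
    """Rows = black pixels of the block, columns = candidate shapes;
    entry 1 means the shape covers that pixel."""
    pixels = [(r, c) for r, row in enumerate(block) for c, v in enumerate(row) if v]
    rects = candidate_rectangles(block)
    M = [[1 if r1 <= r <= r2 and c1 <= c <= c2 else 0
          for (r1, c1, r2, c2) in rects]
         for (r, c) in pixels]
    return M, pixels, rects
```

For example, `scp_matrix([[1, 1], [0, 1]])` yields three rows (one per black pixel) and five candidate rectangles. Every row has at least one 1, since each black pixel is itself a 1x1 rectangle, so a feasible cover always exists.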