CSM6120 Introduction to Intelligent Systems

Post on 23-Feb-2016

31 views 0 download

description

CSM6120 Introduction to Intelligent Systems. Evolutionary and Genetic Algorithms. Informal biological terminology. Genes Encoding rules that describe how an organism is built up from the tiny building blocks of life Chromosomes Long strings formed by connecting genes together Recombination - PowerPoint PPT Presentation

Transcript of CSM6120 Introduction to Intelligent Systems

rkj@aber.ac.uk

CSM6120Introduction to Intelligent

SystemsEvolutionary and Genetic Algorithms

Informal biological terminology Genes

Encoding rules that describe how an organism is built up from the tiny building blocks of life

Chromosomes Long strings formed by connecting genes together

Recombination Process of two organisms mating, producing

offspring that may end up sharing genes of their parents

Basic ideas of EAs An EA is an iterative procedure which evolves

a population of individuals Each individual is a candidate solution to a given

problem Each individual is evaluated by a fitness function,

which measures the quality of its candidate solution

At each iteration (generation): The best individuals are selected Genetic operators are applied to selected

individuals in order to produce new individuals (offspring)

New individuals are evaluated by fitness function

TaxonomySearch Techniques

Informed Uninformed

BFSDFS

A* Hill Climbing

Simulated Annealing

Evolutionary Algorithms

Genetic Programming

Genetic Algorithms

Swarm Intelligence

Evolutionary Strategies

The Genetic Algorithm Directed search algorithms based on the

mechanics of biological evolution

Developed by John Holland, University of Michigan (1970s) To understand the adaptive processes of natural

systems To design artificial systems software that retains

the robustness of natural systems

Provide efficient, effective techniques for optimization and machine learning applications

Some GA applicationsDomain Application Types

Control gas pipeline, pole balancing, missile evasion, pursuit

Design semiconductor layout, aircraft design, keyboard configuration, communication networks

Scheduling manufacturing, facility scheduling, resource allocation

Robotics trajectory planning

Machine Learning designing neural networks, improving classification algorithms, classifier systems

Signal Processing filter design

Game Playing poker, checkers, prisoner’s dilemma

Combinatorial Optimization

set covering, travelling salesman, routing, bin packing, graph colouring and partitioning

Application: function optimisation (1)

-10 -5 0 5 10

0

0.5

1

1.5

2

2.5

3

3.5

4

-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

0

0.2

0.4

0.6

0.8

1

f(x) = x2 g(x) = sin(x) - 0.1 x + 2

h(x,y) = x.sin(4x) - y.sin(4y+ ) + 1

Application: function optimisation (2) Conventional approaches:

Often requires knowledge of derivatives or other specific mathematical technique

Evolutionary algorithm approach: Requires only a measure of solution quality

(fitness function)

Components of a GAA problem to solve, and ...

Encoding technique (gene, chromosome) Initialization procedure (creation) Evaluation function (environment) Selection of parents (reproduction) Genetic operators (mutation, recombination) Parameter settings (practice and art)

GA terminology Population

The collection of potential solutions (i.e. all the chromosomes)

Parents/Children Both are chromosomes Children are generated from the parent chromosomes

Generations Number of iterations/cycles through the GA process

Simple GAinitialize population;evaluate population;

while TerminationCriteriaNotSatisfied{

select parents for reproduction;perform recombination and mutation;evaluate population;

}

The GA cycle

selection

population evaluation

modification

discard

deleted members

parents

children

modifiedchildren

evaluated children

recombinationchosenparents

PopulationChromosomes could be:

Bit strings (0101 ... 1100)

Real numbers (43.2 -33.1 ... 0.0 89.2)

Permutations of element (E11 E3 E7 ... E1 E15)

Lists of rules (R1 R2 R3 ... R22 R23)

Program elements (genetic programming)

... any data structure ...

Representation of an individual can be using discrete values (binary, integer, or any other system with a discrete set of values)

The following is an example of binary representation: CHROMOSOME

GENE

Example: Discrete representation

0 0 1 11 0 1 0

8 bits Genotype

Phenotype:

• Integer

• Real Number

• Schedule

• ...

• Anything?

Example: Discrete representation

0 0 1 11 0 1 0

Phenotype could be integer numbers

Genotype:

1*27 + 0*26 + 1*25 + 0*24 + 0*23 + 0*22 + 1*21 + 1*20 =128 + 32 + 2 + 1 = 163

= 163Phenotype:

Example: Discrete representation

0 0 1 11 0 1 0

Phenotype could be real numberse.g. a number between 2.5 and 20.5 using 8

binary digits

9609.135.25.202561635.2 x

= 13.9609Genotype: Phenotype:

Example: Discrete representation

0 0 1 11 0 1 0

Phenotype could be a schedulee.g. 8 jobs, 2 time steps

Genotype:

=

12345678

21211122

Job Time Step

Phenotype

Example: Discrete representation

0 0 1 11 0 1 0

A very natural encoding if the solution we are looking for is a list of real-valued numbers, then encode it as a list of real-valued numbers! (i.e., not as a string of 1s and 0s)

Lots of applications, e.g. parameter optimisation

Example: Real-valued representation

Representation Task – how to represent the travelling

salesman problem (TSP)?

Find a tour of a given set of cities so that Each city is visited only once The total distance travelled is minimised

RepresentationOne possibility - an ordered list of city numbers (this is known as an order-based GA)

1) London 3) Dunedin 5) Beijing 7) Tokyo2) Venice 4) Singapore 6) Phoenix 8) Victoria

Chromosome 1 (3 5 7 2 1 6 4 8)Chromosome 2 (2 5 7 6 8 1 3 4)

Selection

selection

population

parents

Selection Need to choose which chromosomes to use

based on their ‘fitness’ Why not choose the best chromosomes?

We want a balance between exploration and exploitation

Roulette wheel selection

Rank-based selection 1st step

Sort (rank) individuals according to fitness Ascending or descending order (minimization or

maximization)

2nd step Select individuals with probability proportional to their

rank only (ignoring the fitness value) The better the rank, the higher the probability of being

selected

It avoids most of the problems associated with roulette-wheel selection, but still requires global sorting of individuals, reducing potential for parallel processing

Tournament selection A number of “tournaments” are run

Several chromosomes chosen at random The chromosome with the highest fitness is

selected each time Larger tournament size means that weak

chromosomes are less likely to be selected

Advantages It is efficient to code It works on parallel architectures 

The GA cycle

selection

population evaluation

modification

discard

deleted members

parents

children

modifiedchildren

evaluated children

recombinationchosenparents

Crossover: recombination

P1 (0 1 1 0 1 0 1 1) (1 1 0 1 1 0 1 1) C1

P2 (1 1 0 1 1 0 0 1) (0 1 1 0 1 0 0 1) C2

Crossover is a critical feature of GAs: It greatly accelerates search early in evolution of a

population It leads to effective combination of sub-solutions on

different chromosomes Several methods for crossover exist…

Crossover How would we implement crossover for TSPs?

Parent 1 (3 5 7 2 1 6 4 8)Parent 2 (2 5 7 6 8 1 3 4)

Crossover

Parent 1 (3 5 7 2 1 6 4 8)Parent 2 (2 5 7 6 8 1 3 4)

Child 1 (3 5 7 6 8 1 3 4)Child 2 (2 5 7 2 1 6 4 8)

Mutation: local modificationBefore: (1 0 1 1 0 1 1 0)

After: (0 1 1 0 0 1 1 0)

Before: (1.38 -69.4 326.44 0.1)

After: (1.38 -67.5 326.44 0.1)

Causes movement in the search space(local or global)

Restores lost information to the population

Mutation Given the representation for TSPs, how could

we achieve mutation?

Mutation involves reordering of the list:

* *Before: (5 8 7 2 1 6 3 4)

After: (5 8 6 2 1 7 3 4)

Mutation

Note Both mutation and crossover are applied

based on user-supplied probabilities

We usually use a fairly high crossover rate and fairly low mutation rate Why do you think this is?

Evaluation of fitness

The evaluator decodes a chromosome and assigns it a fitness measure

The evaluator is the only link between a classical GA and the problem it is solving

evaluation

modifiedchildren

evaluated children

Fitness functions Evaluate the ‘goodness’ of chromosomes

(How well they solve the problem)

Critical to the success of the GA

Often difficult to define well

Must be fairly fast, as each chromosome must be evaluated each generation (iteration)

Fitness functions Fitness function for the TSP?

(3 5 7 2 1 6 4 8)

As we’re minimizing the distance travelled, the fitness is the total distance travelled in the journey defined by the chromosome

Deletion

Generational GA:entire populations replaced with each iteration

Steady-state GA:a few members replaced each generation

population

discard

deleted members

The GA cycle

selection

population evaluation

modification

discard

deleted members

parents

children

modifiedchildren

evaluated children

recombinationchosenparents

Stopping! The GA cycle continues until

The system has ‘converged’; or A specified number of iterations (‘generations’)

has been performed

An abstract example

Distribution of Individuals in Generation 0

Distribution of Individuals in Generation N

Good demo of the GA components http://www.obitko.com/tutorials/genetic-algorithms

/example-function-minimum.php

TSP example: 30 cities

0

20

40

60

80

100

120

0 10 20 30 40 50 60 70 80 90 100

y

x

0

20

40

60

80

100

120

0 10 20 30 40 50 60 70 80 90 100

y

x

TSP30 (Performance = 941)

44626967786462544250404038213567606040425099

0

20

40

60

80

100

120

0 10 20 30 40 50 60 70 80 90 100

y

x

TSP30 (Performance = 800)

0

20

40

60

80

100

120

0 10 20 30 40 50 60 70 80 90 100

y

x

TSP30 (Performance = 652)

423835262135327

3846445860697678716967628494

0

20

40

60

80

100

120

0 10 20 30 40 50 60 70 80 90 100

y

x

TSP30 Solution (Performance = 420)

Overview of performance

0

200

400

600

800

1000

1200

1400

1600

1800

1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31

Dis

tanc

e

Generations (1000)

TSP30 - Overview of Performance

Best

Worst

Average

Example: n-queens Put n queens on an n × n board with no two

queens on the same row, column, or diagonal

Examples Eaters

http://math.hws.edu/xJava/GA/

TSP http://www.heatonresearch.com/articles/65/page1.

html http://www.ads.tuwien.ac.at/raidl/tspga/TSPGA.ht

ml 

Exercise: The Card Problem You have 10 cards numbered from 1 to 10. You have to

choose a way of dividing them into 2 piles, so that the cards in Pile0 *sum* to a number as close as possible to 36, and the remaining cards in Pile1 *multiply* to a number as close as possible to 360

Encoding Each card can be in Pile0 or Pile1, there are 1024 possible

ways of sorting them into 2 piles, and you have to find the best. Think of a sensible way of encoding any possible solution.

Fitness Some of these chromosomes will be closer to the target

than others. Think of a sensible way of evaluating any chromosome and scoring it with a fitness measure.

Issues for GA practitioners Choosing basic implementation issues:

Representation Population size, mutation rate, ... Selection, deletion policies Crossover, mutation operators

Termination criteria Performance, scalability Solution is only as good as the fitness function

(often hardest part)

Your assignment will be to code a GA for a given task! Be aware of the above issues…

Concept is easy to understand

Supports multi-objective optimization

Good for “noisy” environments

Always an answer; answer gets better with time

Inherently parallel; easily distributed

Benefits of GAs