Coevolving Solutions to the Shortest Common Superstring Problem

Assaf Zaritsky & Moshe Sipper, Ben-Gurion University, Israel (www.cs.bgu.ac.il/~assafza)


Page 1: Coevolving Solutions to the Shortest Common  Superstring Problem

1

Coevolving Solutions to the Shortest Common Superstring Problem

Assaf Zaritsky & Moshe Sipper
Ben-Gurion University, Israel

www.cs.bgu.ac.il/~assafza

Page 2: Coevolving Solutions to the Shortest Common  Superstring Problem

2

Outline

• The “Shortest Common Superstring” problem.
• DNA sequencing and the input domain.
• Standard and cooperative coevolutionary genetic algorithm (GA).
• The Puzzle approach.
• Conclusions and future work.
• Messy Puzzle.

Page 3: Coevolving Solutions to the Shortest Common  Superstring Problem

3

The Shortest Common Superstring Problem (SCS)

Let S = {s1,…,sn} be a set of strings (blocks) over some alphabet Σ. A superstring of S is a string x such that each si in S is a substring of x.

Problem: Find shortest (common) superstring.
• NP-Complete.
• MAX-SNP hard.
• Motivation: DNA sequencing, data compression.
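The definition of a superstring can be checked mechanically. The following is a minimal sketch; the function name `is_superstring` is an illustrative choice, and the block set is the one used in the deck's own example.

```python
def is_superstring(x, blocks):
    """Return True if every block occurs as a substring of x."""
    return all(s in x for s in blocks)


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]

# Concatenating all blocks always yields a (usually non-optimal) superstring.
print(is_superstring("".join(blocks), blocks))

# A string that misses at least one block is not a superstring.
print(is_superstring("lethalpha", blocks))
```

Verifying a candidate is cheap; the difficulty of SCS lies entirely in finding the shortest string that passes this check.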

Page 4: Coevolving Solutions to the Shortest Common  Superstring Problem

4

SCS: Example

S = {ate, half, lethal, alpha, alfalfa}

• A trivial superstring is “atehalflethalalphaalfalfa” of length 25 (a simple concatenation of all blocks).
• A shortest common superstring is “lethalphalfalfate” of length 17.
• Note that a “compressed” permutation of the blocks is actually a superstring.
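"Compressing" a permutation can be read as merging consecutive blocks on their longest suffix/prefix overlap. The sketch below assumes that reading; the helper names `overlap` and `compress` are illustrative and do not come from the slides.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(sequence):
    """Merge the blocks left to right, sharing the maximal overlap each time."""
    result = ""
    for block in sequence:
        result += block[overlap(result, block):]
    return result


order = ["lethal", "alpha", "half", "alfalfa", "ate"]
superstring = compress(order)
print(superstring, len(superstring))  # lethalphalfalfate 17
```

With this view, searching for a short superstring becomes a search over orderings of the blocks, which is what makes a sequence-of-blocks genome natural.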

Page 5: Coevolving Solutions to the Shortest Common  Superstring Problem

5

Approximation Algorithms

• Several linear approximations for SCS have been proposed, most of which rely on greedy approaches.
• GREEDY: the most widely used heuristic in DNA sequencing.
• Conjecture [Blum 1994, Sweedyk 1999]: the superstring produced by GREEDY is of length at most two times the optimal.
• We are not aware of any previous evolutionary approach to the SCS problem.
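The slides do not spell out GREEDY. The sketch below follows the textbook formulation (repeatedly merge the two strings with the largest overlap), so treat it as an assumption about what the deck means rather than the authors' exact code. Tie-breaking and the initial removal of contained blocks are also choices made here.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(blocks):
    # Drop duplicates and blocks already contained in another block.
    strings = sorted(s for s in set(blocks)
                     if not any(s != t and s in t for t in blocks))
    while len(strings) > 1:
        best = (-1, 0, 1)  # (overlap, index of left string, index of right string)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (i, j)]
        strings.append(merged)
    return strings[0] if strings else ""


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
result = greedy_superstring(blocks)
print(result, len(result))
```

This quadratic pair search is fine for the 50 to 120 block instances discussed in the deck; production assemblers use suffix structures instead.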

Page 6: Coevolving Solutions to the Shortest Common  Superstring Problem

6

Outline

• The “Shortest Common Superstring” problem.
• DNA sequencing and the input domain. (current section)
• Standard and cooperative coevolutionary genetic algorithm (GA).
• The Puzzle approach.
• Conclusions and future work.
• Messy Puzzle.

Page 7: Coevolving Solutions to the Shortest Common  Superstring Problem

7

DNA Sequencing

The most common usage of the SCS problem.

Page 8: Coevolving Solutions to the Shortest Common  Superstring Problem

8

DNA Sequencing (cont’d)

The problem: “read” a string of DNA. Short DNA strands can be read in the laboratory.

To sequence a long DNA strand (the DNA sequence appears in many copies):

1. Cut the DNA into short fragments using restriction enzymes.
2. Sequence each of the resulting fragments.
3. Order those fragments using an SCS algorithm.

Page 9: Coevolving Solutions to the Shortest Common  Superstring Problem

9

The Input Domain

The input strings used in the experiments were inspired by DNA sequencing:

Page 10: Coevolving Solutions to the Shortest Common  Superstring Problem

10

Input Generation Setup: Parameters

NB: increasing number of blocks results in exponential growth of the problem’s complexity.

Size of random string: 250 bits (~50 blocks) or 400 bits (~80 blocks)
Minimal block size: 20 bits
Maximal block size: 30 bits
Number of duplicates created from a random string: 5
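The parameters above suggest a generator along the following lines. This is a sketch under stated assumptions: the slides give only the sizes and the number of duplicates, so the uniform choice of block length, the handling of the last block in each copy, and the name `generate_blocks` are all illustrative.

```python
import random


def generate_blocks(length=250, copies=5, min_size=20, max_size=30, seed=0):
    """Cut several copies of one random bit string into blocks of 20 to 30 bits."""
    rng = random.Random(seed)
    target = "".join(rng.choice("01") for _ in range(length))
    blocks = []
    for _ in range(copies):
        pos = 0
        while pos < length:
            size = rng.randint(min_size, max_size)
            if pos + size > length:
                # Assumption: the final block is aligned to the end of the
                # string so that it still respects the size limits.
                blocks.append(target[length - size:])
                break
            blocks.append(target[pos:pos + size])
            pos += size
    return target, blocks


target, blocks = generate_blocks()
print(len(target), len(blocks))
```

Because every copy is cut at different points, blocks from different copies overlap, which is what lets an SCS algorithm reassemble the original string.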

Page 11: Coevolving Solutions to the Shortest Common  Superstring Problem

11

Outline

• The “Shortest Common Superstring” problem.
• DNA sequencing and the input domain.
• Standard and cooperative coevolutionary genetic algorithm (GA). (current section)
• The Puzzle approach.
• Conclusions and future work.
• Messy Puzzle.

Page 12: Coevolving Solutions to the Shortest Common  Superstring Problem

12

Simple Genetic Algorithm

produce an initial population of individuals
evaluate fitness of all individuals
while termination condition not met do
    select fitter individuals for reproduction
    recombine individuals
    mutate individuals
    evaluate fitness of modified individuals
    generate a new population
end while

Page 13: Coevolving Solutions to the Shortest Common  Superstring Problem

13

Simple GA for the SCS Problem

• Given a set of strings as input, generate an initial population of random candidate solutions.
• The fitness of each individual depends on its length and accuracy.
• The GA uses selection, recombination, and mutation to create the next generation, each individual of which is then evaluated.
• These steps are repeated a predefined number of times or until the solution is deemed satisfactory.

Page 14: Coevolving Solutions to the Shortest Common  Superstring Problem

14

Simple GA for the SCS Problem (cont’d)

• Blocks of the input set are atomic components.
• Representation: An individual’s genome is represented as a sequence of blocks.
• An individual may have missing blocks or contain duplicate copies of the same block.
• Permutation Representation: Good or Bad?

Page 15: Coevolving Solutions to the Shortest Common  Superstring Problem

15

Simple GA for the SCS Problem (cont’d)

Evaluation: fitness of an individual is the length of its compressed genome + the total length of the blocks that are not covered by the individual.

Genetic operators:
• Fitness-proportionate selection.
• Two-point recombination. Allows growth and reduction in the genome’s length.
• Block-change mutation.
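The evaluation rule can be sketched as follows, assuming that "compressed" means merging consecutive blocks on their maximal overlap and that a block counts as covered when it is a substring of the compressed genome. The four blocks are those of the deck's worked example; the function names are illustrative. Lower values are better, since the fitness is a length.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def compress(genome):
    """Merge the genome's blocks left to right on their maximal overlaps."""
    result = ""
    for block in genome:
        result += block[overlap(result, block):]
    return result


def fitness(genome, blocks):
    """Compressed length plus the total length of the uncovered blocks."""
    merged = compress(genome)
    uncovered = sum(len(b) for b in blocks if b not in merged)
    return len(merged) + uncovered


s1, s2, s3, s4 = "0011", "1100", "1001", "111"
blocks = [s1, s2, s3, s4]
print(fitness([s2, s1], blocks))          # 9  (|110011| + |111|)
print(fitness([s4, s2, s1, s4], blocks))  # 8  (|11100111|)
```

The penalty term keeps incomplete genomes comparable with complete ones, so the GA can start from random sequences that miss blocks.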

Page 16: Coevolving Solutions to the Shortest Common  Superstring Problem

16

Simple GA for the SCS Problem (example)

S = {s1,s2,s3,s4}; s1 = 0011, s2 = 1100, s3 = 1001, s4 = 111.

Fitness(<s2,s1>) = |110011| + |111| = 6 + 3 = 9.
Fitness(<s4,s2,s1,s4>) = |11100111| = 8.

Recombination:
p1 = <s1,||s2,s3||,s4>
p2 = <s4,||s1,s3,s2||>
p3 = recombine1(p1,p2) = <s1,s1,s3,s2,s4>
p4 = recombine2(p1,p2) = <s4,s2,s3>

Mutation:
mutate(<s1,s2,s2>) = <s1,s4,s2>
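The recombination and mutation examples can be reproduced with a short sketch. It assumes that two-point recombination swaps the segments between the cut points of each parent, which is why offspring lengths can differ from the parents', and that block-change mutation replaces one block with another. The helper names, and the uniform choice of cut points in `random_cuts`, are illustrative.

```python
import random


def two_point_recombine(p1, p2, cuts1, cuts2):
    """Swap the middle segments delimited by each parent's two cut points."""
    (a1, b1), (a2, b2) = cuts1, cuts2
    child1 = p1[:a1] + p2[a2:b2] + p1[b1:]
    child2 = p2[:a2] + p1[a1:b1] + p2[b2:]
    return child1, child2


def block_change_mutation(genome, position, new_block):
    """Replace the block at the given position with another block."""
    return genome[:position] + [new_block] + genome[position + 1:]


def random_cuts(genome, rng):
    """Assumption: two distinct cut points drawn uniformly, in sorted order."""
    a, b = sorted(rng.sample(range(len(genome) + 1), 2))
    return a, b


p1 = ["s1", "s2", "s3", "s4"]
p2 = ["s4", "s1", "s3", "s2"]
p3, p4 = two_point_recombine(p1, p2, (1, 3), (1, 4))
print(p3)  # ['s1', 's1', 's3', 's2', 's4']
print(p4)  # ['s4', 's2', 's3']
print(block_change_mutation(["s1", "s2", "s2"], 1, "s4"))  # ['s1', 's4', 's2']
```

In an actual run the cut points and the mutated position would be drawn at random, for example with `random_cuts(p1, random.Random())`.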

Page 17: Coevolving Solutions to the Shortest Common  Superstring Problem

17

Coevolution

Simultaneous evolution of two or more species with coupled fitness.

Coevolving species either compete or cooperate.

Competitive coevolution: Fitness of individual based on direct competition with individuals of other species, which in turn evolve separately in their own populations (“prey-predator”).

Page 18: Coevolving Solutions to the Shortest Common  Superstring Problem

18

Cooperative Coevolution

Page 19: Coevolving Solutions to the Shortest Common  Superstring Problem

19

Cooperative Coevolution (cont’d)

Cooperative Coevolution involves a number of independently evolving species.

Interaction between species occurs via fitness function only.

The fitness of an individual depends on its ability to collaborate with individuals from other species.

Page 20: Coevolving Solutions to the Shortest Common  Superstring Problem

20

Cooperative Coevolution (cont’d)

Source: Potter & DeJong (1997)

Page 21: Coevolving Solutions to the Shortest Common  Superstring Problem

21

Cooperative Coevolutionary Algorithm for the SCS Problem

• Two species evolve simultaneously.
• First species contains prefixes of candidate solutions to the SCS problem at hand.
• Second species contains candidate suffixes.
• Fitness of an individual in each species depends on how well it interacts with representatives from the other species to construct a global solution.
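A minimal sketch of the coupled evaluation follows. It assumes that a prefix and a suffix are joined by concatenating their block sequences, and that the joined genome is scored with the same length-plus-penalty fitness as the simple GA. How a representative is chosen (for instance, the current best individual of the other species) is not stated on the slide and is left to the caller here.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def fitness(genome, blocks):
    """Compressed length plus the total length of the uncovered blocks."""
    merged = ""
    for block in genome:
        merged += block[overlap(merged, block):]
    return len(merged) + sum(len(b) for b in blocks if b not in merged)


def evaluate(individual, representative, blocks, role):
    """Score an individual by merging it with the other species' representative."""
    if role == "prefix":
        genome = individual + representative
    else:  # the individual is a suffix
        genome = representative + individual
    return fitness(genome, blocks)


blocks = ["0011", "1100", "1001", "111"]
prefix = ["111", "1100"]
suffix = ["0011", "111"]
print(evaluate(prefix, suffix, blocks, "prefix"))  # 8
print(evaluate(suffix, prefix, blocks, "suffix"))  # 8
```

Each species only ever sees the other through this score, which matches the statement that interaction happens via the fitness function only.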

Page 22: Coevolving Solutions to the Shortest Common  Superstring Problem

22

Cooperative Coevolutionary Algorithm for the SCS Problem (evaluation process)

[Figure. Labels: Prefixes population; Suffixes population; Individual; Representative; Merge.]

Page 23: Coevolving Solutions to the Shortest Common  Superstring Problem

23

Cooperative Coevolutionary Algorithm for the SCS Problem (evaluation process)

[Figure. Labels: Prefixes population; Suffixes population; Evaluate; Fitness.]

Page 24: Coevolving Solutions to the Shortest Common  Superstring Problem

24

Experiments

Compare: GREEDY, Standard GA, Cooperative Coevolution

Page 25: Coevolving Solutions to the Shortest Common  Superstring Problem

25

Experimental Setup

Each type of GA was executed twice on each problem instance; the better run of the two was used for statistical purposes.

Population size: 500
Number of generations: 5000
Recombination rate: 0.8
Mutation rate: 0.03
Problem instances per experiment: 50

Page 26: Coevolving Solutions to the Shortest Common  Superstring Problem

26

Results: Experiment I (~50 blocks)

Page 27: Coevolving Solutions to the Shortest Common  Superstring Problem

27

Results: Experiment II (~80 blocks)

Page 28: Coevolving Solutions to the Shortest Common  Superstring Problem

28

Results: Summary

Average of the best superstring lengths (distance from optimum in parentheses)

Problem size   GREEDY      Genetic     Cooperative
50 blocks      381 (131)   280 (30)    275 (25)
80 blocks      596 (196)   685 (285)   547 (147)

Page 29: Coevolving Solutions to the Shortest Common  Superstring Problem

29

Conclusion:

The collaboration between the two populations results in a good decomposition of the problem into two smaller sub-problems, each of which is solved using a standard GA.

Page 30: Coevolving Solutions to the Shortest Common  Superstring Problem

30

Outline

• The “Shortest Common Superstring” problem.
• DNA sequencing and the input domain.
• Standard and cooperative coevolutionary genetic algorithm (GA).
• The Puzzle approach. (current section)
• Conclusions and future work.
• Messy Puzzle.

Page 31: Coevolving Solutions to the Shortest Common  Superstring Problem

31

The Puzzle Algorithm

Page 32: Coevolving Solutions to the Shortest Common  Superstring Problem

32

The Schema Theorem

“Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations of a genetic algorithm.”

Holland (1975)

Page 33: Coevolving Solutions to the Shortest Common  Superstring Problem

33

Building Blocks Hypothesis

“A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.”

Page 34: Coevolving Solutions to the Shortest Common  Superstring Problem

34

Our Interpretation

“The success of GAs stems from their ability to combine quality sub-solutions (building blocks) from separate individuals in order to form better global solutions.”

Page 35: Coevolving Solutions to the Shortest Common  Superstring Problem

35

The Main Assumption

Problems in nature have an inherent structural design. Even when the structure is not known explicitly, GAs detect it implicitly and gradually enhance good building blocks.

Page 36: Coevolving Solutions to the Shortest Common  Superstring Problem

36

A Problem

Recombination may destroy quality building blocks found by the GA.

Page 37: Coevolving Solutions to the Shortest Common  Superstring Problem

37

Example

Genome regions: Brain, Appearance

0010101010101010101000011110100010000

Page 38: Coevolving Solutions to the Shortest Common  Superstring Problem

38

Example (cont’d)

Genome regions: Brain, Appearance

0010101010101010101000011110100010000

1. Smart (presumably)
2. Blond

But not very beautiful…

Page 39: Coevolving Solutions to the Shortest Common  Superstring Problem

39

The Preservation of Favoured Building Blocks in the Struggle for Fitness: The Puzzle Algorithm

Page 40: Coevolving Solutions to the Shortest Common  Superstring Problem

40

Puzzle Algorithm: The Idea

• Improve the recombination operator.
• Preserve good building blocks discovered by the GA by selecting recombination loci that do not destroy good building blocks.
• Result: Assembly of good building blocks to construct better solutions (as in a puzzle).

Page 41: Coevolving Solutions to the Shortest Common  Superstring Problem

41

Puzzle Algorithm (cont’d)

Two populations:

1. Candidate solutions: As in simple GA.
2. Building blocks: Each individual is a sequence of blocks contained in at least one candidate solution.

[Figure. Labels: Building blocks population; Candidate solutions population.]

Page 42: Coevolving Solutions to the Shortest Common  Superstring Problem

42

Puzzle Algorithm (cont’d)

• Interaction between candidate solutions and building blocks is through the fitness function.
• Interaction between building blocks and candidate solutions is through constraints on recombination points.

[Figure. Labels: Building blocks population; Candidate solutions population; Fitness evaluation; Crossover location.]

Page 43: Coevolving Solutions to the Shortest Common  Superstring Problem

43

Puzzle Algorithm: Zoom In

[Figure. Labels: Building blocks population; Candidate solutions population; Fitness evaluation; Crossover location.]

Each individual is a sequence of blocks.

Page 44: Coevolving Solutions to the Shortest Common  Superstring Problem

44

Puzzle Algorithm: Zoom In

[Figure. Labels: Building blocks population; Candidate solutions population; Fitness evaluation; Crossover location.]

Each building block is contained in at least one individual in the solutions population.

Overlapping building blocks.

Page 45: Coevolving Solutions to the Shortest Common  Superstring Problem

45

The Candidate Solutions Population

Representation, fitness evaluation, selection, and mutation are identical to the simple GA.

Recombination-aid vector aids in selecting the recombination loci.

Recombination-aid vector is updated by building blocks individuals.


Page 46: Coevolving Solutions to the Shortest Common  Superstring Problem

46

The Building Blocks Population

• An individual is represented as a sequence of blocks, contained in at least one candidate solution.
• Fitness of an individual is the average of the fitness of candidate solutions containing it.
• Fitness-proportionate selection.
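The fitness rule for building blocks can be sketched as below. It assumes that a candidate solution "contains" a building block when the block sequence appears contiguously in the solution's genome; the function names and the sample fitness values are illustrative.

```python
def contains(solution, building_block):
    """True if the building block appears as a contiguous run in the solution."""
    n, m = len(solution), len(building_block)
    return any(solution[i:i + m] == building_block for i in range(n - m + 1))


def building_block_fitness(building_block, scored_solutions):
    """Average fitness of the candidate solutions containing the building block."""
    scores = [f for genome, f in scored_solutions if contains(genome, building_block)]
    # Assumption: a building block found in no solution has no defined fitness.
    return sum(scores) / len(scores) if scores else None


# Hypothetical candidate solutions paired with hypothetical fitness values.
scored_solutions = [
    (["s1", "s2", "s3"], 9),
    (["s4", "s2", "s3", "s1"], 11),
    (["s3", "s2"], 14),
]
print(building_block_fitness(["s2", "s3"], scored_solutions))  # 10.0
```

Because the score is inherited from the solutions, a building block is rated by the company it keeps rather than by any intrinsic measure.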

Page 47: Coevolving Solutions to the Shortest Common  Superstring Problem

47

The Building Blocks Population (cont’d)

• “Unisex” individuals.
• Two modification operators:
  Expansion: Increase its genome by one block. Occurs with high probability.
  Exploration: “Die”, and start over as a new 2-block individual. Occurs with low probability.
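The two operators might be sketched as follows. The slides do not say how the extra block is chosen or where a new 2-block individual comes from, so this sketch makes several assumptions: expansion appends the block that follows an occurrence of the building block in some candidate solution, exploration copies two adjacent blocks from a random candidate solution, and the two operators are mutually exclusive. The rates are the ones listed on the deck's building-blocks setup slide.

```python
import random

EXPANSION_RATE = 0.8    # "Occurs with high probability"
EXPLORATION_RATE = 0.1  # "Occurs with low probability"


def occurrences(solution, bb):
    """Start indices at which bb appears contiguously in the solution."""
    m = len(bb)
    return [i for i in range(len(solution) - m + 1) if solution[i:i + m] == bb]


def new_building_block(solutions, rng):
    """Exploration: restart as two adjacent blocks of a random solution."""
    genome = rng.choice([g for g in solutions if len(g) >= 2])
    i = rng.randrange(len(genome) - 1)
    return genome[i:i + 2]


def expand(bb, solutions, rng):
    """Expansion (assumed rightwards): append a block that follows bb somewhere."""
    options = [g[i + len(bb)] for g in solutions
               for i in occurrences(g, bb) if i + len(bb) < len(g)]
    return bb + [rng.choice(options)] if options else bb


def modify(bb, solutions, rng):
    r = rng.random()
    if r < EXPLORATION_RATE:
        return new_building_block(solutions, rng)
    if r < EXPLORATION_RATE + EXPANSION_RATE:
        return expand(bb, solutions, rng)
    return bb


solutions = [["s1", "s2", "s3", "s4"], ["s4", "s1", "s3", "s2"]]
rng = random.Random(1)
print(expand(["s2", "s3"], solutions, rng))  # ['s2', 's3', 's4']
print(modify(["s2", "s3"], solutions, rng))
```

Both operators draw their material from existing candidate solutions, so every building block stays contained in at least one solution at the moment it is created.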

Page 48: Coevolving Solutions to the Shortest Common  Superstring Problem

48

Building Blocks – Candidate Solutions

[Figure. Labels: Fitness evaluation; Building blocks population; Candidate solutions population; fitness values f1, f2, f3, f4.]

Page 49: Coevolving Solutions to the Shortest Common  Superstring Problem

49

Building Blocks – Candidate Solutions

[Figure. Labels: Fitness evaluation; Building blocks population; Candidate solutions population; fitness values f1, f2, f3, f4; Update “recombination-aid” vector.]

Page 50: Coevolving Solutions to the Shortest Common  Superstring Problem

50

Update Recombination-aid vector

Solution’s genome [figure]

Recombination-aid vector: 0 0 0 0 0 0 0

building block #1 fitness = 0.3
building block #2 fitness = 0.4
building block #3 fitness = 0.6

Page 51: Coevolving Solutions to the Shortest Common  Superstring Problem

51

Update Recombination-aid vector

Solution’s genome [figure]

Recombination-aid vector: 0 0.6 0.4 0 0.3 0.3 0

building block #1 fitness = 0.3
building block #2 fitness = 0.4
building block #3 fitness = 0.6

Page 52: Coevolving Solutions to the Shortest Common  Superstring Problem

52

Update Recombination-aid vector

Solution’s genome [figure]

Recombination-aid vector: 0.6 0.6 0.4 0 0.3 0.3 0.3

building block #1 fitness = 0.3
building block #2 fitness = 0.4
building block #3 fitness = 0.6

Page 53: Coevolving Solutions to the Shortest Common  Superstring Problem

53

Recombination-loci selection

Solution’s genome [figure]

Recombination-aid vector: 0.6 0.6 0.4 0 0.3 0.3 0.3

* Ties are broken arbitrarily
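The figures for these slides did not survive extraction, so the following is only one plausible reading of the update and selection steps. It assumes the vector holds one entry per cut point between adjacent blocks, that each entry records the highest fitness among the building blocks a cut there would split, that higher building-block fitness means more worth protecting, and that the recombination locus is the entry with the lowest value. The genome layout below is hypothetical and was chosen so that the result matches the final vector shown on the slides.

```python
import random


def occurrences(genome, bb):
    """Start indices at which bb appears contiguously in the genome."""
    m = len(bb)
    return [i for i in range(len(genome) - m + 1) if genome[i:i + m] == bb]


def recombination_aid_vector(genome, building_blocks):
    """Entry i scores the cut between genome[i] and genome[i + 1]."""
    vector = [0.0] * (len(genome) - 1)
    for bb, bb_fitness in building_blocks:
        for start in occurrences(genome, bb):
            # Cutting anywhere strictly inside the occurrence would destroy it.
            for cut in range(start, start + len(bb) - 1):
                vector[cut] = max(vector[cut], bb_fitness)
    return vector


def select_locus(vector, rng):
    """Pick a cut point with the lowest value; ties are broken at random."""
    lowest = min(vector)
    return rng.choice([i for i, v in enumerate(vector) if v == lowest])


# Hypothetical 8-block genome and building-block placement.
genome = ["b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"]
building_blocks = [
    (["b5", "b6", "b7", "b8"], 0.3),  # building block #1
    (["b3", "b4"], 0.4),              # building block #2
    (["b1", "b2", "b3"], 0.6),        # building block #3
]
vector = recombination_aid_vector(genome, building_blocks)
print(vector)                                   # [0.6, 0.6, 0.4, 0.0, 0.3, 0.3, 0.3]
print(select_locus(vector, random.Random(0)))   # 3
```

Under this reading, crossover falls between the fourth and fifth blocks, the only cut that leaves all three building blocks intact.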

Page 54: Coevolving Solutions to the Shortest Common  Superstring Problem

54

Experiments

Compare: GREEDY, Standard GA, Puzzle

Page 55: Coevolving Solutions to the Shortest Common  Superstring Problem

55

Building Blocks - Experimental Setup

Population size: 1000
Expansion rate: 0.8
Exploration rate: 0.1

Page 56: Coevolving Solutions to the Shortest Common  Superstring Problem

56

Results: Experiment III (~50 blocks)

[Chart not preserved. Label: Cooperative.]

Page 57: Coevolving Solutions to the Shortest Common  Superstring Problem

57

Results: Experiment IV (~80 blocks)

[Chart not preserved. Label: Cooperative.]

Did we lose to cooperative?

NO!

Page 58: Coevolving Solutions to the Shortest Common  Superstring Problem

58

Results: Summary

Average of the best superstring lengths (distance from optimum in parentheses)

Problem size   GREEDY      Genetic     Puzzle
50 blocks      381 (131)   280 (30)    253 (3)
80 blocks      596 (196)   685 (285)   571 (171)

Page 59: Coevolving Solutions to the Shortest Common  Superstring Problem

59

Relations Between The Algorithms

[Diagram relating GA, Puzzle, Cooperative and Co-Puzzle; the connecting arrows are labelled “puzzle” and “cooperation”.]

Page 60: Coevolving Solutions to the Shortest Common  Superstring Problem

60

The Co-Puzzle Algorithm

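[Diagram. One side: Possible building blocks population and Candidate prefixes population, linked by “Fitness eval” and “Crossover location”. Other side: Possible building blocks population and Candidate suffixes population, linked by “Fitness eval” and “Crossover location”. The two sides are linked by “Fitness evaluation”.]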

Page 61: Coevolving Solutions to the Shortest Common  Superstring Problem

61

Experiments

Compare: GREEDY, Cooperative Coevolution, Co-Puzzle

Page 62: Coevolving Solutions to the Shortest Common  Superstring Problem

62

Results: Experiment V (~80 blocks)

Page 63: Coevolving Solutions to the Shortest Common  Superstring Problem

63

Results: Experiment VI (~50 blocks)

[Chart not preserved. Labels: Puzzle; ????]

Page 64: Coevolving Solutions to the Shortest Common  Superstring Problem

64

Results: Summary

Size of shortest common superstring (distance from optimum in parentheses)

Problem size   GREEDY      Cooperative   Co-puzzle
50 blocks      381 (131)   275 (25)      268 (18)
80 blocks      596 (196)   547 (147)     482 (82)

42% improvement over cooperative

Page 65: Coevolving Solutions to the Shortest Common  Superstring Problem

65

Outline

• The “Shortest Common Superstring” problem.
• DNA sequencing and the input domain.
• Standard and cooperative coevolutionary genetic algorithm (GA).
• The Puzzle approach.
• Conclusions and future work. (current section)
• Messy Puzzle.

Page 66: Coevolving Solutions to the Shortest Common  Superstring Problem

66

Results: Summary

Size of shortest common superstring (distance from optimum in parentheses)

Problem size   GREEDY      Cooperative   Puzzle      Co-puzzle
50 blocks      381 (131)   275 (25)      253 (3)     268 (18)
80 blocks      596 (196)   547 (147)     571 (171)   482 (82)
90 blocks      677 (227)   673 (223)     683 (233)   617 (167)
100 blocks     768 (268)   768 (268)     813 (313)   732 (232)

20 problem instances per experiment

Annotations on the slide: 25% better; 13% better; 83% better; 42% better

Page 67: Coevolving Solutions to the Shortest Common  Superstring Problem

67

Larger Problems - Using More Species

Size of shortest common superstring (distance from optimum in parentheses)

Problem size   GREEDY      Co-puzzle   3-Co-puzzle
110 blocks     836 (286)   867 (317)   ? (?)
120 blocks     906 (306)   992 (392)   906 (306)

Page 68: Coevolving Solutions to the Shortest Common  Superstring Problem

68

Conclusions

Cooperative coevolution might prove deleterious when too many species are used (when close to optimum?).

When a suitable number of species is used, cooperative coevolution improves performance by decomposing the problem into several easier subproblems.

Page 69: Coevolving Solutions to the Shortest Common  Superstring Problem

69

Conclusions (cont’d)

Evolving a population of building blocks to aid in the selection of recombination loci drastically improves the performance of a standard GA.

Cooperation between cooperative coevolution and Puzzle ultimately improves global performance.

Page 70: Coevolving Solutions to the Shortest Common  Superstring Problem

70

Future Work

• Test the (Co-)Puzzle approach on other problem domains.
• A hybrid GA.
• Tackle larger problems.
• Comparison to greedy-stochastically based local-search algorithms.

Page 71: Coevolving Solutions to the Shortest Common  Superstring Problem

71

Outline

• The “Shortest Common Superstring” problem.
• DNA sequencing and the input domain.
• Standard and cooperative coevolutionary genetic algorithm (GA).
• The Puzzle approach.
• Conclusions and future work.
• Messy Puzzle. (current section)

Page 72: Coevolving Solutions to the Shortest Common  Superstring Problem

72

The Messy Puzzle Algorithm

Page 73: Coevolving Solutions to the Shortest Common  Superstring Problem

73

Static Detection of Building Blocks for addressing the Linkage Problem

Hillel Maoz
Ben-Gurion University, Israel

Page 74: Coevolving Solutions to the Shortest Common  Superstring Problem

74

The Linkage Problem

• A binary genome of size n = 14.
• Genes a and b together encode important information.
• Random crossover is applied.

Survival probability = the chance to appear in the offspring
Left genome: 4/15
Right genome: 14/15
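The two survival probabilities can be reproduced with a short calculation. This sketch assumes one-point crossover with the cut drawn uniformly from the n + 1 = 15 positions around and between the 14 genes, and that genes a and b survive together exactly when the cut does not fall between them. The gene positions are hypothetical, since the original figure is not preserved: they are 11 loci apart in the "left" genome and adjacent in the "right" one.

```python
from fractions import Fraction


def survival_probability(pos_a, pos_b, n):
    """Chance that a uniformly random cut does not separate genes a and b."""
    separating_cuts = abs(pos_a - pos_b)   # cuts that fall between the two genes
    return 1 - Fraction(separating_cuts, n + 1)


print(survival_probability(1, 12, 14))  # 4/15
print(survival_probability(6, 7, 14))   # 14/15
```

The further apart two cooperating genes sit in the genome, the more likely crossover is to break them up, which is the linkage problem in a nutshell.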

Page 75: Coevolving Solutions to the Shortest Common  Superstring Problem

75

The Linkage Problem (cont’d)

In many cases it is hard to know the optimal representation

Page 76: Coevolving Solutions to the Shortest Common  Superstring Problem

76

The MaxCut Problem

Input: undirected weighted graph G=(V, E, W).

Output: a partition of V into two disjoint sets (S,V\S).

Goal: maximal sum of edge weights between the sets.

NP-complete.

Page 77: Coevolving Solutions to the Shortest Common  Superstring Problem

77

MaxCut - Example

[Figure: example cuts, with Cut = 34 and Cut = 47.]

Page 78: Coevolving Solutions to the Shortest Common  Superstring Problem

78

Simple GA for MaxCut

Population of candidate solutions
• Give each node a number
• Assign ‘0’ or ‘1’ to indicate which set the node belongs to

Iteration step
• Select any two parents
• Recombine and create an offspring
• Repeat until a new population is generated

Fitness – The weight of the cut
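With one bit per node, the fitness is the total weight of the edges whose endpoints carry different bits. A minimal sketch follows; the edge-list representation and the small graph are illustrative assumptions.

```python
def cut_weight(genome, edges):
    """Sum the weights of edges whose endpoints lie in different sets."""
    return sum(w for u, v, w in edges if genome[u] != genome[v])


# Hypothetical weighted graph on 4 nodes: (node, node, weight).
edges = [(0, 1, 3), (1, 2, 5), (0, 2, 2), (2, 3, 4)]

print(cut_weight([0, 1, 0, 1], edges))  # 12
print(cut_weight([0, 0, 0, 0], edges))  # 0
```

Note that flipping every bit of a genome yields the same cut, so each partition has two equivalent encodings.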

Page 79: Coevolving Solutions to the Shortest Common  Superstring Problem

79

The Representation Problem

“How to define the order of the vertices within the genome?”

Page 80: Coevolving Solutions to the Shortest Common  Superstring Problem

80

Messy Genes

• The main difficulty: identifying the related vertices.
• A messy gene is an ordered pair <allele-locus,allele-value>.
• Possible solution:
  Use some sort of messy genes to detect related genes.
  Use the Puzzle approach to keep them together.

Page 81: Coevolving Solutions to the Shortest Common  Superstring Problem

81

The Messy Puzzle Algorithm

A building block’s genome is represented as a sequence of messy genes

Page 82: Coevolving Solutions to the Shortest Common  Superstring Problem

82

Messy Puzzle Algorithm

• Two-population setup as in the puzzle algorithm.
• Enhanced recombination operator.
• Evolved building blocks structure (similar to puzzle).

<0,0> <2,0> <1,1> <5,0> <6,1>
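A building block made of messy genes can be represented as a list of (locus, value) pairs, which frees it from the order of loci in the genome. The sketch below only illustrates that data structure, using the five genes shown above; how the Messy Puzzle algorithm actually uses building blocks during recombination is not specified here, and the two helper functions are illustrative assumptions.

```python
def matches(genome, building_block):
    """True if the genome already carries every <locus, value> pair."""
    return all(genome[locus] == value for locus, value in building_block)


def apply_building_block(genome, building_block):
    """Return a copy of the genome with the building block's values written in."""
    child = list(genome)
    for locus, value in building_block:
        child[locus] = value
    return child


building_block = [(0, 0), (2, 0), (1, 1), (5, 0), (6, 1)]
genome = [1, 1, 1, 1, 1, 1, 1, 1]   # hypothetical 8-bit MaxCut genome

child = apply_building_block(genome, building_block)
print(child)                           # [0, 1, 0, 1, 1, 0, 1, 1]
print(matches(genome, building_block)) # False
print(matches(child, building_block))  # True
```

Since the pairs name their own loci, related vertices can be kept together regardless of where they sit in the fixed genome ordering.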

Page 83: Coevolving Solutions to the Shortest Common  Superstring Problem

83

Enhanced Recombination

[Figure: building blocks with fitness 0.8, 0.7 and 0.6; two genomes with loci 1 2 3 4 5 6 7 8.]

I) Add the 1st BB - success
II) Add the 2nd BB - failure
III) Add the 3rd BB - success
IV) Simple crossover

Page 84: Coevolving Solutions to the Shortest Common  Superstring Problem

84

Static Detection of Building Blocks

• Building blocks do not truly evolve.
• No Expansion and Exploration operators.
• Building blocks’ fitness is based on a number of generations.
• Purpose: to check and understand the core of the messy puzzle algorithm.

Page 85: Coevolving Solutions to the Shortest Common  Superstring Problem

85

Results

[Chart: “Max Cut Size - Puzzle VS. GA”. x-axis: graph number (1 to 10); y-axis: cut size difference (0.0 to 80.0).]

Graphs:
1   graph_200_0.01_1
2   graph_200_0.05_1
3   graph_200_0.1_1
4   graph_200_0.3_1
5   graph_200_0.5_1
6   graph_300_0.01_1
7   graph_300_0.05_1
8   graph_300_0.1_1
9   graph_300_0.3_1
10  graph_300_0.5_1

• Randomly generated graphs.
• 1000 generations.
• 10 separate experiments per problem instance.

[Chart: “Avg Cut Size - different number of BB (graph_300_0.1_1)”. x-axis: number of BB (10 to 60); y-axis: distance from GA (0 to 50).]

[Chart: “Avg Cut Size - different number of BB (graph_300_0.5_1)”. x-axis: number of BB (10 to 60); y-axis: distance from GA (-20 to 80).]

[Chart: “Max Cut Size - Bi-partite graphs”. x-axis: graph number (1 to 6); y-axis: cut size difference (-200 to 1600). Legend: Distance to optimum; Puzzle addition.]

Page 86: Coevolving Solutions to the Shortest Common  Superstring Problem

86

Conclusions and Future Work

• Do messy work to solve the linkage problem.
• Even a small population of building blocks improves the GA performance.
• Messy puzzle is better when inner structures exist.

• Applying evolution to the building blocks population.
• Comparing to different representation-search techniques.