GH Galapagos

10/23/11 7:50 PM Evolutionary Principles applied to Problem Solving - Grasshopper

http://www.grasshopper3d.com/profiles/blogs/evolutionary-principles

Evolutionary Principles applied to Problem Solving

This blog post is a rough approximation of the lecture I gave at the AAG10 conference in Vienna on September 21st 2010. Naturally it will be quite a different experience as the medium is quite different, but it is my hope that the basic premise of the lecture remains intact. This post deals with Evolutionary Solvers in general, but I use Rhino, Grasshopper and Galapagos to demonstrate the topics.

There is nothing particularly new about Evolutionary Solvers or Genetic Algorithms. The first references to this field of computation stem from the early 60's, when Lawrence J. Fogel published the landmark paper "On the Organization of Intellect", which sparked the first endeavours into evolutionary computing. The early 70's witnessed further forays, with seminal work produced by, among others, Ingo Rechenberg and John Henry Holland. Evolutionary Computation didn't gain popularity beyond the programmer world until Richard Dawkins' book "The Blind Watchmaker" in 1986, which came with a small program that generated a seemingly endless stream of body-plans called "Biomorphs" based on human selection. Since the 80's, the advent of the personal computer has made it possible for individuals without government funding to apply evolutionary principles to personal projects, and these principles have since made it into common parlance.



The term "Evolutionary Computing" may very well be widely known at this point in time, but Evolutionary Solvers are still very much a programmer's tool. 'By programmers for programmers', if you will. The applications out there that apply evolutionary logic are either aimed at solving specific problems, or they are generic libraries that allow other programmers to piggyback along. It is my hope that Galapagos will provide a generic platform for the application of Evolutionary Algorithms to a wide variety of problems by non-programmers.

Pros and Cons

Before we dive into the subject matter too deeply though, I feel it is important to highlight some of the (dis)advantages of this particular type of solver, just so you know what to expect. Since we are not living in the best of all possible worlds, there is often no such thing as the perfect solution. Every approach has drawbacks and limitations. In the case of Evolutionary Algorithms these drawbacks are luckily well known and easily understood, even though they are not trivial. Indeed, they may well be prohibitive for many a particular problem.

Firstly, Evolutionary Algorithms are slow. Dead slow. It is not unheard of for a single process to run for days or even weeks. Especially complicated set-ups that require a long time to solve a single iteration will quickly get out of hand. A light/shadow or acoustic computation, for example, may easily take a minute per iteration. If we assume we'll need at least 50 generations of 50 individuals each (which is almost certainly an underestimate unless the problem has a very obvious solution), we're already looking at 2,500 evaluations, or roughly a two-day runtime.
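The runtime estimate above is just multiplication. Here is a minimal Python sketch of it. The function name is hypothetical, and it assumes every genome is evaluated exactly once, one after another, with no caching and no parallel evaluation:

```python
def estimated_runtime_hours(generations: int, population: int, minutes_per_eval: float) -> float:
    """Wall-clock hours if every genome of every generation is evaluated once, sequentially."""
    evaluations = generations * population  # 50 x 50 = 2,500 evaluations
    return evaluations * minutes_per_eval / 60.0


hours = estimated_runtime_hours(generations=50, population=50, minutes_per_eval=1.0)
print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")  # 41.7 hours, or about 1.7 days
```

That is about 1.7 days of pure computation, which rounds to the two days mentioned above. Doubling either the population or the number of generations doubles it.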



Secondly, Evolutionary Algorithms do not guarantee a solution. Unless a predefined 'good-enough' value is specified, the process will tend to run on indefinitely. It may never reach The Answer, or it may reach it without recognizing it for what it is.

All is not bleak and dismal, however. Evolutionary Algorithms have strong benefits as well, some of them rather unique amongst the plethora of computational methods. They are remarkably flexible, for example, and able to tackle a wide variety of problems. There are classes of problems which are by definition beyond the reach of even the best solver implementation, and other classes that are very difficult to solve, but these are typically rare in the province of the human meso-world. By and large, the problems we encounter on a daily basis fall into the 'evolutionary solvable' category.

Evolutionary Algorithms are also quite forgiving. They will happily chew on problems that have been under- or over-constrained or otherwise poorly formulated. Furthermore, because the run-time process is progressive, intermediate answers can be harvested at practically any time. Unlike many dedicated algorithms, Evolutionary Solvers spew forth a never-ending stream of answers, where newer answers are generally of a higher quality than older ones. So even a prematurely aborted run will yield something which could be called a result. It might not be a very good result, but it will be a result of sorts.
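Both the missing guarantee and the harvestable intermediate answers follow from the shape of the main loop. The Python sketch below is hypothetical and is not Galapagos code. A generator yields the best genome found so far after every generation. It stops early only if a 'good-enough' threshold is supplied, and otherwise runs until a generation cap. The caller can abandon it at any point and still hold a result.

```python
import random
from typing import Callable, Iterator, Optional, Tuple


def evolve(fitness: Callable[[float], float], lo: float, hi: float,
           pop_size: int = 20, max_generations: int = 200,
           good_enough: Optional[float] = None,
           seed: int = 0) -> Iterator[Tuple[int, float, float]]:
    """Yield (generation, best_genome, best_fitness) after every generation."""
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for gen in range(max_generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]  # keep the better half (includes the best so far)
        children = [min(hi, max(lo, rng.choice(survivors) + rng.gauss(0.0, 0.05 * (hi - lo))))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
        best = max(pop, key=fitness)
        yield gen, best, fitness(best)  # an intermediate answer, available right now
        if good_enough is not None and fitness(best) >= good_enough:
            return  # without a threshold the loop has no way of knowing it is done


def peak_at_three(x: float) -> float:
    return -(x - 3.0) ** 2  # hypothetical 1-gene problem, best fitness 0 at x = 3


history = list(evolve(peak_at_three, lo=-10.0, hi=10.0, good_enough=-1e-4))

first_three = []
for item in evolve(peak_at_three, lo=-10.0, hi=10.0):  # a "prematurely aborted" run
    first_three.append(item)
    if len(first_three) == 3:
        break
```

Because the better half always survives, the reported best fitness never goes down from one yield to the next. Whether it ever reaches the threshold is a different matter.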

Finally, Evolutionary Solvers allow, in principle, for a high degree of interaction with the user. This too is a fairly unique feature, especially given the broad range of possible applications. The run-time process is highly transparent and browsable, and there exists a lot of opportunity for a dialogue between algorithm and human. The solver can be coached across barriers with the aid of human intelligence, or it can be goaded into exploring sub-optimal branches and superficial dead-ends.



The Process

In this section I shall briefly outline the process of an Evolutionary Solver run. It is a highly simplified version of the remainder of the blog post, and I'll skip over many interesting and even important details. I'll show the process as a series of image frames, where each frame shows the state of the 'population' at a given moment in time. Before I can start, however, I need to explain what the image below means.

What you see here is the Fitness Landscape of a particular model. The model contains two variables, meaning two values which are allowed to change. In Evolutionary Computing we refer to variables as genes. As we change Gene A, the state of the model changes and it becomes either better or worse (depending on what we're looking for). So as Gene A changes, the fitness of the entire model goes up or down. But for every value of A, we can also vary Gene B, resulting in better or worse combinations of A and B. Every combination of A and B results in a particular fitness, and this fitness is expressed as the height of the Fitness Landscape. It is the job of the solver to find the highest peak in this landscape.
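To make 'fitness as elevation' concrete, here is a hypothetical two-gene model in Python. The three Gaussian bumps are invented for illustration and are not the landscape in the lecture images. Brute-force sampling only works here because there are just two genes on a small grid. A solver cannot rely on that for real models.

```python
import math

# Hypothetical peaks: (centre_a, centre_b, height, width). Invented for illustration.
PEAKS = [(0.2, 0.3, 1.0, 0.08), (0.7, 0.6, 0.8, 0.12), (0.5, 0.9, 0.6, 0.10)]


def fitness(a: float, b: float) -> float:
    """Elevation of the landscape above the genome {A=a, B=b}; higher is fitter."""
    return sum(h * math.exp(-((a - ca) ** 2 + (b - cb) ** 2) / (2 * w ** 2))
               for ca, cb, h, w in PEAKS)


# Exhaustive 101 x 101 sampling: fine for 2 genes, hopeless for 12.
grid = [(i / 100, j / 100) for i in range(101) for j in range(101)]
best_a, best_b = max(grid, key=lambda g: fitness(*g))
print(best_a, best_b, round(fitness(best_a, best_b), 3))  # 0.2 0.3 1.0
```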

Of course a lot of problems are defined by not just two but many genes, in which case we can no longer speak of a 'landscape' in the strict sense. A model with 12 genes would be a 12-dimensional fitness volume deformed in 13 dimensions, instead of a two-dimensional fitness plane deformed in 3 dimensions. As this is impossible to visualize I shall only use one- and two-dimensional models. Note, though, that when we speak of a "landscape", it might mean something far more complex than the above image shows.

As the solver starts, it has no idea about the actual shape of the fitness landscape. Indeed, if we knew the shape we wouldn't need to bother with all this messy evolutionary stuff in the first place. So the initial step of the solver is to populate the landscape (or "model-space") with a random collection of individuals (or "genomes"). A genome is nothing more than a specific value for each and every gene. In the above case, a genome could for example be {A=0.2 B=0.5}. The solver will then evaluate the fitness of each and every one of these random genomes, giving us the following distribution:



Once we know how fit every genome is (i.e., the elevation of the red dots), we can make a hierarchy from fittest to lamest. We are looking for high ground in the landscape, and it is a reasonable assumption that the higher genomes are closer to potential high ground than the low ones. Therefore we can kill off the worst-performing ones and focus on the remainder:



It is not good enough to simply pick the best-performing genome from the initial population and call it quits. Since all the genomes in Generation 0 were picked at random, it is actually quite unlikely that any of them will have hit the jackpot. What we need to do is breed the best-performing genomes in Generation 0 to create Generation 1. When we breed two genomes, their offspring will end up somewhere in the intermediate model-space, thus exploring fresh ground:



We now have a new population, which is no longer completely random and which is already starting to cluster around the three fitness 'peaks'. All we have to do is repeat the above steps (kill off the worst-performing genomes, breed the best-performing genomes) until we have reached the highest peak.
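The steps described so far fit in a short loop: a random Generation 0, evaluation, killing off the worst, and breeding the rest. This is a hypothetical Python sketch, not the Galapagos implementation, and it rests on several assumptions:

- Genomes are (A, B) tuples in an assumed [0, 1] domain.
- The three-peak landscape is invented.
- Breeding is plain interpolation between two surviving parents.
- A small random nudge is added to each child, because interpolation alone can never leave the region spanned by the parents. Mutation is discussed properly further down.

```python
import math
import random


def landscape(a: float, b: float) -> float:
    # Hypothetical landscape with three peaks; the tallest (height 1.0) is at (0.8, 0.2).
    peaks = [(0.8, 0.2, 1.0), (0.2, 0.7, 0.7), (0.5, 0.5, 0.5)]
    return sum(h * math.exp(-((a - x) ** 2 + (b - y) ** 2) / 0.02) for x, y, h in peaks)


def run(generations: int = 40, size: int = 50, seed: int = 3):
    rng = random.Random(seed)
    pop = [(rng.random(), rng.random()) for _ in range(size)]  # Generation 0: pure chance
    start_best = max(landscape(*g) for g in pop)
    for _ in range(generations):
        pop.sort(key=lambda g: landscape(*g), reverse=True)    # fittest to lamest
        parents = pop[: size // 2]                              # kill off the worst half
        children = []
        while len(parents) + len(children) < size:
            mum, dad = rng.sample(parents, 2)
            t = rng.random()                                    # offspring lands between parents
            child = [m + t * (d - m) for m, d in zip(mum, dad)]
            child = [min(1.0, max(0.0, v + rng.gauss(0.0, 0.02))) for v in child]
            children.append(tuple(child))
        pop = parents + children
    best = max(pop, key=lambda g: landscape(*g))
    return start_best, best


start_best, best = run()
print(f"Generation 0 best: {start_best:.3f}, final best: {landscape(*best):.3f} at {best}")
```

Because the better half always survives unchanged, the best fitness can never drop from one generation to the next. Whether the run ends on the tallest peak depends on the random start, just as described above.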



In order to perform this process, an Evolutionary Solver requires five interlocking parts, which I'll discuss in something resembling detail. We could call this the anatomy of the Solver.

1. Fitness Function
2. Selection Mechanism
3. Coupling Algorithm
4. Coalescence Algorithm
5. Mutation Factory

Fitness Functions



In biological evolution, the quality known as "Fitness" is actually something of a stumbling block. Usually it is very difficult to say exactly what it means to be fit. It certainly has little or nothing to do with being the strongest, or the fastest, or the most vicious. The reason there are no flying dogs isn't that evolution hasn't gotten around to making any yet. It is that the dog lifestyle is supremely incompatible with flying, and the sacrifices required to equip a dog with flight would certainly detract more from the overall fitness than flight would add to it. Fitness is the result of a million conflicting forces. Evolutionary Fitness is the ultimate compromise.

A fit individual is on average able to produce more offspring than an unfit one, so we could say that fitness equals the number of genetic children. A better measure yet would be to count the number of grandchildren. A better measure still would be to count the allele frequency, in the gene-pool, of the genes that made up the individual in question. But these are all rather ad-hoc definitions that cannot be measured on the spot.



In Evolutionary Computation, at least, fitness is a very easy concept. Fitness is whatever we want it to be. We are trying to solve a specific problem, and therefore we know what it means to be fit. If, for example, we are seeking to position a shape so that it may be milled with minimum material waste, there is a very strict fitness function that leaves no room for argument.

Let's have a look at the fitness landscape again and imagine it represents a model that seeks to encase an object in a minimum-volume bounding-box. A minimum bounding-box is the smallest orthogonal box that completely contains any given shape. In the image below, the green shape is encased by two bounding boxes. B has a smaller area than A and is therefore fitter.

When we need to mill or 3D-print a shape, it is often a good idea to rotate it until it requires the least amount of material during manufacturing. For a real minimum bounding-box we need at least three rotation axes. Since that would not allow me to display the real fitness landscape, we will restrict ourselves to rotation around the world X and Y axes. So Gene A will represent rotation around the X axis and Gene B will represent rotation around the Y axis.
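The bounding-box fitness for this two-gene version of the problem can be sketched in a few lines of Python. The sketch makes three assumptions:

- The shape is represented by its vertices. A unit cube is used here, purely as a stand-in.
- Gene A rotates about the world X axis first, then Gene B rotates about the world Y axis.
- Fitness is the volume of the world-aligned box, so lower is better.

```python
import math

CUBE = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]  # stand-in shape


def rotate_x(p, degrees):
    a = math.radians(degrees)
    x, y, z = p
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


def rotate_y(p, degrees):
    a = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def bbox_volume(points, gene_a, gene_b):
    """Fitness (lower is better): volume of the world-aligned box around the rotated points."""
    rotated = [rotate_y(rotate_x(p, gene_a), gene_b) for p in points]
    extents = [max(c) - min(c) for c in zip(*rotated)]  # width along X, Y and Z
    return extents[0] * extents[1] * extents[2]


print(round(bbox_volume(CUBE, 0, 0), 6),
      round(bbox_volume(CUBE, 45, 0), 6),
      round(bbox_volume(CUBE, 90, 0), 6))  # 1.0 2.0 1.0
```

Turning the cube a quarter turn gives the same box again, while 45 degrees doubles the volume.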

There is no need to allow for rotation beyond 360 degrees, so both genes have a limited working domain. (In fact, since we are talking about orthogonal boxes, even a 0-90 degree domain would suffice.) Behold rotation around a single axis:



When we pick two rotation angles at random, we end up somewhere on the fitness landscape. Suppose we use the 0-90 degree domain and allow for 4 decimal places in the rotation angles. Each angle can then take 900,000 distinct values, so together they give 900,000 × 900,000 = 810,000,000,000 (or 810 billion) unique rotations. It is therefore exceptionally unlikely that we manage to pick a random rotation that yields the best possible answer. But let's say we don't even manage to get close. Let's say we pick a random genome that is at the bad end of the fitness scale, i.e. at the bottom of the fitness landscape. What can we say about the blood-line of this genome? When we track the descendants of a particular genome there is always a large amount of randomness involved, due to the workings of the Solver, but a strong general tendency can be distinguished. Just as water always flows downhill along the steepest slope, genetic descendants will generally climb uphill along the steepest slope:



Every individual tries to maximize its own fitness, as high fitness is rewarded by the solver. And the steepest uphill climb is the fastest way towards high fitness. So if the black sphere represents the location of the ancestral genome, the orange track represents the pathway of its most successful offspring. We can repeat this exercise for a large number of sample points, which will tell us something about how the Solver and the Fitness Landscape interact:

Since every genome is pulled uphill, every peak in the fitness landscape has a basin of attraction around it. This basin represents all the points in model-space that will converge upon that specific peak. It is important to notice that the area of the basin is in no way representative of the quality of the peak. Indeed, a very poor solution may have a large basin of attraction while a good peak might have a small catchment area. Problems like this are typically very difficult to solve, as the solver tends to get stuck in local optima. But we'll have a look at problematic fitness functions later on.
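Basins of attraction can be measured directly on a one-dimensional landscape. The Python sketch below illustrates the idea under stated assumptions; it is not the Galapagos solver. The landscape is invented: a broad, low peak at 0.3 and a narrow, high peak at 0.8. Each blood-line is replaced by a greedy hill climber that always steps to its higher neighbour, which is the 'steepest uphill' tendency in its simplest form.

```python
import math


def fitness(x: float) -> float:
    # Hypothetical 1-D landscape: broad, low peak at x = 0.3; narrow, high peak at x = 0.8.
    return 0.5 * math.exp(-((x - 0.3) / 0.2) ** 2) + 1.0 * math.exp(-((x - 0.8) / 0.05) ** 2)


def climb(x: float, step: float = 0.001, max_steps: int = 10_000) -> float:
    """Move to the higher neighbour until neither neighbour is higher (a stand-in for a blood-line)."""
    for _ in range(max_steps):
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            break
        x = best
    return x


starts = [i / 100 for i in range(101)]
ends = [climb(s) for s in starts]
low_basin = sum(1 for e in ends if abs(e - 0.3) < 0.01)
high_basin = sum(1 for e in ends if abs(e - 0.8) < 0.01)
print(low_basin, high_basin)
```

Most starting points drain into the broad, low peak even though the narrow peak is twice as high. That is exactly the trap described above.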



First, let's have a closer look at the actual fitness landscape for our minimum bounding-box model. I'm afraid it's not quite as simple as the image we've been using so far. I was actually quite surprised how organic and un-box-like the actual fitness landscape for this problem is. Remember, the x-axis rotation is mapped along the Gene A direction and the y-axis rotation along the Gene B direction. So every point on the AB plane represents a unique rotation composed of two angles. The elevation of this point is a direct mapping of the volume of the bounding-box at those two rotation angles:

The first thing to notice is that the landscape is periodic, i.e., it repeats itself every 90 degrees in both directions. Also, this landscape is in fact inverted, as we're looking for a minimum volume, not a maximum one. Thus the orange peaks actually represent the worst solutions to this problem. Note that there are 16 of these peaks in the entire range and that they are rounded. When we look at the bottom of this fitness landscape, we get a rather different view:



It would appear that the lowest points in this landscape (the minimum bounding-boxes) are both fewer in number and of a different kind. We only get 8 optimal solutions, and they are all very sharp, indicating a somewhat more fragile state.

Still, on the whole we have nothing to complain about. All the solutions are of equal quality and there are no local optima at all. We can generalize this landscape to a 2-dimensional graph:



No matter where you end up as an ancestral genome, your blood-line will always find its way to a minimum bounding box. There's nowhere for it to get 'stuck'. So it's really just a question of who gets there first. If we look at a slightly more complex fitness graph, it becomes apparent that this need not be the case:

This fitness landscape has two kinds of solutions: the high-quality sharp ones near the bottom of the graph and the low-quality flat ones near the top. The basin of attraction is shown for both kinds of solution (yellow for high quality, pink for low quality). About half of the model space is attracted to the low-quality solutions.

An even worse example (flipped upright again this time, so high values indicate good solutions) would be the following fitness landscape:



The basins for these peaks are very small indeed and therefore easy to miss by a random sampling of the landscape. As soon as a lucky genome finds the peak on the left, its offspring will rapidly populate the low peak, causing the rest of the population to go extinct. It is now even less likely that the better peak on the right will be found. The smaller the basins of attraction, the harder it is to solve a problem with an evolutionary algorithm.

Another example of a problem that is cumbersome to solve would be a discontinuous fitness landscape:

Even though there are, strictly speaking, no local optima, there is also no 'improvement' on the plateaus. A genome which finds itself in the middle of one of these horizontal patches doesn't know where to go. If it takes a step to the left, nothing changes. If it takes a step to the right, nothing changes. There's no 'pressure' in this fitness landscape, so all the genomes will wander about aimlessly until one of them has the good fortune to suddenly step onto a higher plateau. At this point it will quickly dominate the gene-pool, and the wandering starts again until the next plateau is accidentally found.
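The missing 'pressure' on a plateau is easy to demonstrate. In this hypothetical Python sketch the landscape is a staircase whose fitness only changes every 0.25 units of the gene. A genome sitting in the middle of a step is nudged a thousand times with small random mutations. None of the nudges changes its fitness, so selection has nothing to act on.

```python
import math
import random


def stepped_fitness(x: float) -> float:
    # Hypothetical discontinuous landscape: flat steps, each 0.25 wide.
    return math.floor(x * 4) / 4


rng = random.Random(7)
x = 0.37  # somewhere in the middle of the [0.25, 0.5) plateau
deltas = [stepped_fitness(x + rng.gauss(0.0, 0.01)) - stepped_fitness(x) for _ in range(1000)]
changed = sum(1 for d in deltas if d != 0)
print(changed)  # 0: no neighbour is better or worse, so there is no uphill direction
```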

Even worse than this, though, is a landscape that has a high degree of noise or chaos. A landscape may be continuous and yet feature so much detail that it becomes impossible to make any intelligible pronouncements regarding the fitness of a local patch:

In a landscape like this, mommy and daddy may both be very similar and both be very fit, but when they mate the offspring might end up in one of the fissures. A landscape like this defies navigation.

Selection Mechanisms

Biological Evolution proceeds by Natural Selection, the ruthless force identified by Darwin as the arbiter of progress. Put simply, Natural Selection affects the direction of the gene-pool over time by regulating who gets to mate. In extreme cases mating is prevented because a specific genome is so unfit that the bearer cannot survive until reproductive age. Another rather extreme case would be sterility. However, there is a myriad of ways in which Natural Selection can make it difficult or impossible for certain individuals to pass on their genetic footprint.

However, Natural Selection isn't the only game in town. For a long time now humans have been using Artificial Selection to breed specific characteristics into a (sub)species. When we try to solve problems using an Evolutionary Solver, we always use some form of artificial selection. There's no such thing as sex or gender in the computer. The process of selection is also much simpler than in nature, as there is basically only one question that needs to be answered: who gets to mate?

Allow me to enumerate the mechanisms for parent selection that are available in Galapagos. This is only a small subset of the selection algorithms that are possible, but they seem to cover the basics rather well.

First off, we have Isotropic Selection, which is the simplest kind of algorithm you can imagine. In fact, it is the absence of a selection algorithm. In Isotropic Selection everyone gets to mate:

No matter where you find yourself on this fitness graph, your chances of ending up in a mating couple are constant. You might think that this is a particularly pointless selection strategy, as it does nothing to further the evolution of the gene-pool. But it is not without precedent in nature. Take for example wind-pollination or coral spawning. If you're a sexually functioning member of such a species, you get to play ball come mating season. Another example would be females in a walrus colony. Every female in a colony gets to breed with the dominant male, no matter how fit or unfit she is. Isotropic Selection is certainly not without function either. For one, it dampens the speed with which a population runs uphill. It therefore acts as a safeguard against a premature colonization of a local (and possibly inferior) optimum.

Another mechanism available in Galapagos is Exclusive Selection, where only the top N% of the population get to mate:



If you're lucky enough to be in the top N%, you'll likely have multiple offspring. A good analogy in nature for Exclusive Selection would be walrus males. There are only a few harems to go around and far too many males to assign them all (a harem of one female, after all, is not really a harem). The flunkies get to sit on the sideline without a single chance to father a walrus baby, doing whatever it is walruses do when they can't get any action.

Another common pattern in nature is Biased Selection, where the chance of mating increases as fitness increases. This is something we typically see with species that form stable couples. Everyone is basically capable of finding a mate, but the really attractive individuals manage to get a lot of hanky-panky on the side, thus increasing their chances of becoming genetic founders for future generations. Biased Selection can be amplified by using power functions, which have the effect of flattening or exaggerating the curve.
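The three parent-selection schemes described above can be sketched as follows. These are hypothetical Python functions written from this post's descriptions, not the Galapagos source:

- Isotropic ignores fitness entirely.
- Exclusive draws only from the top N% (25% is assumed here).
- Biased weights the draw by fitness raised to a power. Fitness is first shifted so that the weakest genome sits at zero.

```python
import random
from typing import List, Sequence, TypeVar

G = TypeVar("G")


def isotropic(population: Sequence[G], fitnesses: Sequence[float], rng: random.Random) -> G:
    """Everyone gets to mate: fitness is ignored entirely."""
    return rng.choice(population)


def exclusive(population: Sequence[G], fitnesses: Sequence[float], rng: random.Random,
              top_fraction: float = 0.25) -> G:
    """Only the top N% get to mate; everyone else sits on the sideline."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    cutoff = max(1, int(len(order) * top_fraction))
    return population[rng.choice(order[:cutoff])]


def biased(population: Sequence[G], fitnesses: Sequence[float], rng: random.Random,
           power: float = 1.0) -> G:
    """Mating chance grows with fitness; power > 1 exaggerates the bias, power < 1 flattens it."""
    lowest = min(fitnesses)
    weights: List[float] = [(f - lowest) ** power + 1e-12 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]


rng = random.Random(0)
population = list(range(100))               # genome i has fitness i, to keep things readable
fitnesses = [float(i) for i in population]
picks = {name: [pick(population, fitnesses, rng) for _ in range(2000)]
         for name, pick in [("isotropic", isotropic), ("exclusive", exclusive), ("biased", biased)]}
for name, chosen in picks.items():
    print(name, sum(chosen) / len(chosen))  # average fitness of the chosen parents
```

The tiny constant added to each weight keeps the weakest genome's chance above zero, and prevents an all-zero weight list when every genome is equally fit.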

Coupling Algorithms

Coupling is the process of finding mates. Once a genome has been elected to mate by the active Selection Algorithm, it has to pick a mate from the population to complete the act. There are of course many ways in which mate selection could occur, but Galapagos at the moment only allows one: selection by genomic distance. In order to explain this in detail, I should first tell you how a Genome Map works.

This is a Genome Map. It displays all the genomes (individuals) in a certain population as dots on a grid. The distance between two genomes on the grid is roughly analogous to the distance between the genomes in gene-space. I say roughly because it is in fact impossible to draw a map with exact distances. A single genome is defined by a number of genes. We assume that all the genomes in a species have the same number of genes (this is not technically a limitation of Evolutionary Algorithms, even though it is currently a limitation of Galapagos). Therefore each genome is a point in N-dimensional gene-space, where N equals the number of genes, and the distance between two genomes is measured in that space. It is not possible to accurately display an N-dimensional point cloud on a 2-dimensional screen, so the Genome Map is only a coarse approximation. It also follows that the axes of this graph have no meaning whatsoever. The only information a Genome Map conveys is which genomes are more or less similar (close together) and which genomes are more or less different (far apart).

Imagine you are an individual that has been selected for mating (yay). The population is well distributed and you are somewhere near the average (I'm sure you are a wildly original and delightful person in real life, but for the time being try to imagine you are in fact sort of average):



That red dot is you. Who looks attractive?

You could of course limit your search for potential partners to your immediate neighbourhood. This means that you mate with individuals who are very much like you, and it means your offspring will also be very much like you.

When this is taken to extremes we call it incestuous mating behaviour, and it can become detrimental pretty quickly. Biological incest has a nasty habit of expressing unhealthy but recessive genes. In the digital world of Evolutionary Solvers, however, the biggest risk of incest is a rapid decline in population diversity. Low diversity decreases the chances of finding alternate solution basins, and so the population risks getting stuck in local optima.

The other extreme is to exclude everyone near you. You'll often hear it said that opposites attract, but that's true only up to a point. Beyond it, the genomes at the other end of the scale become so different as to be incompatible.



This is called zoophilic mating and it can be equally detrimental. This is especially true when a population is not a single group of genomes, but in fact contains multiple sub-species, each of which is climbing its own little fitness peak. You definitely do not want to mate with a member of a different sub-species, as the offspring would likely land somewhere in the middle. And since these two species are climbing different peaks, "in the middle" actually puts you in a fitness valley.

It would seem that the best option is to balance in-breeding and out-breeding, that is, to select individuals that are neither too close nor too far. In Galapagos you can specify an in-breeding factor (between -100% and +100%, total out-breeding vs. total in-breeding respectively) that allows you to guide this relative offset:



Note that mate selection at present completely ignores mate fitness. This is something that needs looking into for future releases, but even without any advanced selection algorithms the solver still works.
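Mate selection by genomic distance can be sketched like this. The Euclidean distance in N-dimensional gene-space follows the Genome Map description above. The way the in-breeding factor is turned into a choice, however, is an assumption of this hypothetical Python sketch:

- Candidates are ranked from nearest to farthest.
- +1 (total in-breeding) takes the nearest and -1 (total out-breeding) takes the farthest.
- Values in between interpolate linearly along that ranking.

Like Galapagos at present, it ignores the mate's fitness.

```python
import math
from typing import Sequence, Tuple

Genome = Tuple[float, ...]


def pick_mate(me: Genome, population: Sequence[Genome], inbreeding: float) -> Genome:
    """inbreeding in [-1, 1]: +1 picks the nearest genome, -1 the farthest (hypothetical mapping)."""
    if not -1.0 <= inbreeding <= 1.0:
        raise ValueError("inbreeding must lie between -1 (-100%) and +1 (+100%)")
    # Rank every other genome by Euclidean distance in N-dimensional gene-space.
    others = sorted((g for g in population if g is not me), key=lambda g: math.dist(me, g))
    index = round((1.0 - inbreeding) / 2.0 * (len(others) - 1))
    return others[index]


population = [(0.0, 0.0), (0.1, 0.0), (0.5, 0.0), (1.0, 0.0)]
me = population[0]
print(pick_mate(me, population, 1.0), pick_mate(me, population, 0.0), pick_mate(me, population, -1.0))
# (0.1, 0.0) (0.5, 0.0) (1.0, 0.0)
```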

Coalescence Algorithms

Once a mate has been selected, offspring need to be generated. On the genetic level this is anything but fun and games. The biological process of gene recombination is horrendously complicated and itself subject to evolution (meiotic drive, for example). The digital variant is much more basic. This is partially because genes in evolutionary algorithms are not very similar to biological genes. Ironically, biological genes are far more digital than programmatic genes. As Mendel discovered in the 1860's, genes are not continuously variable qualities; instead they behave like on-off switches. Genes in evolutionary solvers like Galapagos behave like floating point numbers that can assume all the values between two numerical extremes.

When we mate two genomes, we need to decide what values to assign to the genes of the offspring. Again, Galapagos provides several mechanisms for achieving this.



Imagine we have two genomes of four genes each. There is no gender and there are no sex-based characteristics in the solver, so the combination of M and D is potentially a completely symmetrical process. A mechanism that is somewhat synonymous with biological recombination is Crossover Coalescence.

In Crossover mating, junior inherits a random number of genes from mommy and the remainder from daddy. In this mechanism, gene values are passed on unchanged.
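Here is Crossover Coalescence in its simplest form, as a hypothetical Python sketch. Every gene is copied, unchanged, from one parent or the other by a fair coin toss, so the number of genes taken from each parent is itself random. Many implementations instead cut the genome at one or two crossover points. The post does not say which variant Galapagos uses.

```python
import random
from typing import List, Sequence


def crossover(mum: Sequence[float], dad: Sequence[float], rng: random.Random) -> List[float]:
    """Uniform crossover: each gene is inherited as-is from mum or dad, never averaged."""
    return [m if rng.random() < 0.5 else d for m, d in zip(mum, dad)]


rng = random.Random(1)
mum = [0.1, 0.2, 0.3, 0.4]
dad = [0.9, 0.8, 0.7, 0.6]
junior = crossover(mum, dad, rng)
print(junior)  # every value comes from mum or dad at the same position
```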

Blend Coalescence will compute new values for genes based on both parents, basically averaging the values:

It is also possible to add a blending preference based on relative fitness. If mum is fitter than dad, for example, her gene values will be more prominent in the offspring:
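Here is Blend Coalescence, with and without the fitness preference, as a hypothetical Python sketch. The plain version averages each gene. The weighted version uses each parent's share of the combined fitness as its weight. That weighting is an assumption made here and is not necessarily the formula Galapagos uses. It only makes sense for non-negative fitness values.

```python
from typing import List, Optional, Sequence


def blend(mum: Sequence[float], dad: Sequence[float],
          mum_fitness: Optional[float] = None, dad_fitness: Optional[float] = None) -> List[float]:
    """Average the parents' genes, optionally leaning towards the fitter parent."""
    weight = 0.5  # plain average
    if mum_fitness is not None and dad_fitness is not None:
        if mum_fitness < 0 or dad_fitness < 0:
            raise ValueError("this weighting assumes non-negative fitness values")
        total = mum_fitness + dad_fitness
        if total > 0:
            weight = mum_fitness / total  # mum's share of the combined fitness
    return [weight * m + (1.0 - weight) * d for m, d in zip(mum, dad)]


print(blend([0.0, 10.0], [10.0, 0.0]))        # [5.0, 5.0]
print(blend([0.0, 10.0], [10.0, 0.0], 3, 1))  # [2.5, 7.5]: mum is fitter, so her genes dominate
```

Unlike crossover, blending creates gene values that neither parent had.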



Mutation Factories

All the mechanisms we have discussed so far (Selection, Coupling and Coalescence) are designed to improve the quality of solutions on a generation-by-generation basis. However, all of them have a tendency to reduce the bio-diversity in a population. The only mechanism which can introduce diversity is mutation. Several types of mutation are available in the Galapagos core. At the moment, though, the nature of the implementation in Grasshopper restricts the possible mutations to Point mutations only.

Before we get to mutations though, I'd like to talk briefly about Genome Graphs. A popular way to display multi-dimensional points on a two-dimensional medium is to draw them as a series of lines that connect different values on a set of vertical bars. Each bar represents a single dimension. This way we can quite easily display not just points with any number of dimensions, but even points with different numbers of dimensions in the same graph:



Here, for example, we have a genome consisting of 5 genes. This genome is thus a point in the 5-dimensional space that delineates this particular species. When G0 is drawn at 1/3, its value lies one-third of the way between the minimum and maximum allowed limits. The benefit of this graph is that it becomes quite easy to spot sub-species in a population, as well as lone individuals. When we apply mutations to a genome, we should see a change in the graph, as every unique genome has a unique graph.
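Drawing a Genome Graph only requires normalizing each gene to its allowed domain, so that 0 is the bottom of its bar and 1 the top. Here is a minimal Python sketch, assuming the per-gene (minimum, maximum) limits are known. The genome and limits are invented for illustration.

```python
from typing import List, Sequence, Tuple


def genome_graph(genome: Sequence[float], limits: Sequence[Tuple[float, float]]) -> List[float]:
    """Height of each gene on its vertical bar: 0.0 at the minimum, 1.0 at the maximum."""
    return [(value - lo) / (hi - lo) for value, (lo, hi) in zip(genome, limits)]


heights = genome_graph([5.0, 0.0, 2.5, 90.0, -1.0],
                       [(0.0, 15.0), (0.0, 1.0), (0.0, 10.0), (0.0, 360.0), (-1.0, 1.0)])
print([round(h, 3) for h in heights])  # [0.333, 0.0, 0.25, 0.25, 0.0]
```

The first gene (G0 here) sits one-third of the way up its bar.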

The above modification shows a Point Mutation, where a single gene value is changed. This is currently the only mutation type that is possible in Galapagos. We could also swap two adjacent gene values, in which case we get an Inversion Mutation:

Inversion mutations are only useful when consecutive genes have a very specific relationship. They tend to drastically modify a genome and thus, in most cases, also drastically modify fitness. This is almost always a detrimental operation.
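The two fixed-length mutation types described above can be sketched in Python as follows. This is a hypothetical sketch. The Point Mutation here replaces one gene with a fresh random value inside its domain. Nudging the old value by a small amount would be an equally valid reading of 'a single gene value is changed'. The Inversion Mutation swaps one randomly chosen pair of neighbouring genes.

```python
import random
from typing import List, Sequence, Tuple


def point_mutation(genome: Sequence[float], limits: Sequence[Tuple[float, float]],
                   rng: random.Random) -> List[float]:
    """Change a single, randomly chosen gene to a new value within its allowed limits."""
    child = list(genome)
    i = rng.randrange(len(child))
    lo, hi = limits[i]
    child[i] = rng.uniform(lo, hi)
    return child


def inversion_mutation(genome: Sequence[float], rng: random.Random) -> List[float]:
    """Swap the values of two adjacent genes."""
    child = list(genome)
    i = rng.randrange(len(child) - 1)
    child[i], child[i + 1] = child[i + 1], child[i]
    return child


rng = random.Random(5)
parent = [0.1, 0.2, 0.3, 0.4, 0.5]
limits = [(0.0, 1.0)] * len(parent)
mutant = point_mutation(parent, limits, rng)
swapped = inversion_mutation(parent, rng)
print(mutant)
print(swapped)
```

Neither function changes the number of genes, which is why both work for a species with a fixed genome length.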

Two examples of mutations that cannot be used on a species which requires a fixed number of genes are Addition and Deletion mutations.



Conclusion

Galapagos is still a very young product and hasn't really had time to position itself firmly in any workflow, if indeed it can. It seems to be capable of solving relatively small problems quite quickly, but it certainly needs a lot of work to make it more robust and usable. It is likely that the most effective applications for a solver of this type and capability are small or partial problems. Trying to evolve anything complicated will almost certainly result in frustration.
