An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics...

28
An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1

Transcript of An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics...

Page 1: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

1

An Introduction to Genetic Algorithm (GA)

By:Dola Pathak

For:STAT 992:Computational Statistics

SPRING 2015

Page 2: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

2

Motivating Example

Motivating ExampleThe data were collected at Baystate Medical Center in Spring- field, Massachusetts, in 1986, on 189 births and the response variable was the birth weight of the baby. There were eight explanatory variables/factors which were considered. This data, called the low birth weight data, is in the Appendix I of Hosmer and Lemeshow’s book “Applied Logistic Regression”.

What is the best fitted model? Are all the variables important? How many models are possible with main effects only– 256! Any alternatives?

Page 3: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

3

What is Genetic Algorithm (GA)?

A genetic algorithm is an optimization method based on natural selection.

Page 4: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

4

The concept

• Inspired by Darwin’s Theory of evolution popularly termed as “Survival of the fittest”

• Fittest – ability to adapt to the environment which increases the likelihood of survival

• The fitness is maintained through the process of reproduction, crossover and mutation.

Page 5: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

5

From the statistical point of view

A GA is a stochastic technique with simple operations based on the theory of natural selection. The basic operations are :

1. Selecting population members for the next generation 2. Mating these members via crossover of “chromosomes”In statistical terms chromosomes will be the individual members of the population and the genes of the chromosomes are the variables.

3. Performing mutations on the chromosomes to preserve population diversity so as to avoid convergence to local optima. 4. Finally, determining the fitness of each member in the new generation using an evaluation (fitness) function. This fitness influences the selection process for the next generation.

Page 6: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

6

Development

• Alan Turing -----“learning machine” ----- evolution process

• In 1960s the idea of evolution strategy/programming was developed in Germany and in the USA.

• John Holland and his students in the early 1970s (book: "Adaption in Natural and Artificial Systems"(1975)).

• David Goldberg, who was a graduate student of Holland’s presented an award-winning doctoral thesis on his GA application to gas pipeline optimization.

Page 7: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

7

Why Genetic Algorithm?

• Exhaustive search (where every possible combination is checked)

- accurate result - time consuming- inefficient when the numbers of variables are large - optimizers tend to : - limit the number of variables they use or - limit the number of values these variables can take.

• The genetic algorithm (does not try every possible combination)

- optimal solution by selecting variables intelligently. - far more variables can be utilized - all values of a variable can be checked- simultaneous searching - less likely to become stuck in "local minima" - less time- efficient

Page 8: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

8

Where are Genetic Algorithms applied?

There are at least three situations where genetic algorithms are applied:

1. The objective function is not smooth (i.e., not differentiable).

2. There are multiple local optima( often happens with non linear functions).

3. There are a large number of parameters (the meaning of “large” keeps changing).

Page 9: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

9

Outline of the Basic Genetic Algorithm

[Start] Generate random population of n chromosomes (suitable solutions for the problem)

[Fitness] Evaluate the fitness f(x) of each chromosome x in the population

[New population] Create a new population by repeating the following steps until the new population is complete

[Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)

Page 10: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

10

[Crossover] With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents.

[Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).

[Accepting] Place new offspring in a new population

[Replace] Use new generated population for a further run of algorithm

[Test] If the end condition is satisfied, stop, and return the best solution in current population

[Loop] Go to step 2

Page 11: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

11Scrucca (April, 2013)

Page 12: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

12

Example 1Going back to the example that was presented at the beginning of the presentation, let us see how a first

generation of optimum variable values are calculated using a Genetic Algorithm.

Page 13: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

13

The model

The multiple linear regression equation:=1.0116+0.0033*x+0.0019*y+0.007*z#x= age , 14<x<45#y=weight of the mother in her lmp in pounds, 80<y<250#z= ftv, 0<z<6

Page 14: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

14

STEP1: Generate random population of 4 chromosomes (individuals)> set.seed(7777)> x<-floor(runif(4, 14,45))> y<-floor(runif(4, 80,250))> z<-floor(runif(4, 0,6))> population_chrom<-matrix(c(x,y,z),byrow = TRUE,nrow=3,ncol=4) #population units

PARENT POPULATIONpopulation_chrom [,1] [,2] [,3] [,4][1,] 14 29 33 44[2,] 144 230 229 196[3,] 2 3 5 1

STEP2 Evaluate the fitness f(x) of each chromosome x in the population> fitness<-fit(population_chrom)> fitness [,1] [,2] [,3] [,4][1,] 1.3454 1.5653 1.5906 1.5362

Page 15: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

15

STEP 3: Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)Determining the new ordering> prob<-objective/total #probabilities associated with each chromosome

> set.seed(9001)

> gen_prob<-runif(4, min=0, max=1)

> data.frame(t(prob), gen_prob)

t.prob. gen_prob

1 0.2228406 0.2316537

2 0.2592629 0.9483070

3 0.2634534 0.1912030

4 0.2544431 0.6869134

#generating the new order of the parent population (chromosomes) for the 1st generation> Chrom_reordered

[,1] [,2] [,3] [,4]

[1,] 29 14 33 44

[2,] 230 144 229 196

[3,] 3 2 5 1

Page 16: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

16

STEP 4: CROSSOVER

Let the crossover probability, We generate random number between 0 and 1 and perform crossover between those chromosomes whose randomly generated numbers are less than 0.35

> set.seed(7755)> for(k in 1:4){ R[k]<-runif(1, min=0, max=1) R[k] }> R [,1][1,] 0.5204274[2,] 0.9641516[3,] 0.2732612[4,] 0.2740294

It is seen that chromosome # 3 and #4 will undergo crossover and we determine the position at which they will undergo the crossover by:

> set.seed(6655)> cross_pos<-floor(runif(1, min=1, max=2))

> cross_pos [1] 1

Page 17: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

17

Crossover:

After Crossover:> Chromosome_cross

[,1] [,2] [,3] [,4]

[1,] 29 14 33 44

[2,] 230 144 196 229

[3,] 3 2 1 5

> Chromosome_cross [,1] [,2] [,3] [,4] [1,] 29 14 33 44 [2,] 230 144 196 229 [3,] 3 2 1 5

Page 18: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

18

STEP 5:MutationLet the mutation probability,

Total length of genes= #of chromosomes*genes per chromosome#of genes to be mutated =* total length of the gene =2 (rounded)

Thus of the 24 position of the genes, two positions are randomly selected to be mutated.> set.seed(7761)> #values to be mutuated> floor(runif(2, min=1, max=12))[1] 5 4

The 5th position is in the 2nd population and represents weight of the mother (ranges from 80-250) and the 4th position is also in the 2nd population and it represents the age of the mother (14 to 45 years).Two numbers are generated within the range of values for lwt and age of the mother and these are mutated in the 5th and 4th position of the second chromosome.> #for position 5: lwt> floor(runif(1, min=80, max=250))[1] 235> # for position 4: Age> floor(runif(1, min=14, max=45))[1] 29> Chromosome_cross[2,2]<-235> Chromosome_cross[1,2]<-29

Page 19: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

19

1st Generation:> first_gen<-Chromosome_cross> first_gen

[,1] [,2] [,3] [,4]

[1,] 29 29 33 44

[2,] 230 235 196 229

[3,] 3 2 1 5

Comparing the fitness function:> fitness_new<-fit(first_gen)> data.frame(t(fitness),t(fitness_new)) t.fitness. t.fitness_new.1 1.3454 1.56532 1.5653 1.56783 1.5906 1.49994 1.5362 1.6269

Page 20: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

20

Example 2: Efficiency of GA in searching for the best fitted model?

(Using the package glmulti)The basic model is: age + lwt + race.cat + sm + preterm + ht + ui + ftv.cat

Without interactionCASE 1: Exhaustive Search

Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.3051026 0.1107701 11.782 < 2e-16 ***race.catBlack -0.2117263 0.0659755 -3.209 0.001575 ** race.catOther -0.1524077 0.0510387 -2.986 0.003217 ** preterm1+ -0.0944738 0.0606883 -1.557 0.121287 lwt 0.0018382 0.0007607 2.416 0.016666 * sm -0.1474951 0.0477114 -3.091 0.002307 ** ht -0.2606070 0.0904447 -2.881 0.004438 ** ui -0.2225058 0.0617916 -3.601 0.000409 ***

Page 21: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

21

CASE 2: GA SearchCoefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.3051026 0.1107701 11.782 < 2e-16 ***race.catBlack -0.2117263 0.0659755 -3.209 0.001575 ** race.catOther -0.1524077 0.0510387 -2.986 0.003217 ** preterm1+ -0.0944738 0.0606883 -1.557 0.121287 lwt 0.0018382 0.0007607 2.416 0.016666 * sm -0.1474951 0.0477114 -3.091 0.002307 ** ht -0.2606070 0.0904447 -2.881 0.004438 ** ui -0.2225058 0.0617916 -3.601 0.000409 ***

COMPARISON:

Page 22: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

22

With pairwise interaction:CASE 1:Exhaustive SearchFor the Exhaustive case the iterations went on for a long time and it failed to give the best fitted model. The program ran for approximately 710 minutes before it was terminated.

Page 23: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

23

CASE 2: GA SearchCoefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.243534 0.159220 7.810 5.27e-13 ***race.catBlack -0.155676 0.066214 -2.351 0.01985 * race.catOther -0.102393 0.052312 -1.957 0.05191 . preterm1+ -0.590434 0.299435 -1.972 0.05022 . age 0.019363 0.005991 3.232 0.00147 ** lwt -0.001286 0.001080 -1.191 0.23523 sm 0.276818 0.203220 1.362 0.17492 age:sm -0.017958 0.008412 -2.135 0.03419 * age:ht -0.028554 0.012304 -2.321 0.02147 * lwt:ht 0.002484 0.001827 1.360 0.17567 age:ui -0.010769 0.002614 -4.120 5.86e-05 ***preterm1+:lwt 0.004184 0.002398 1.745 0.08278 . age:ftv.catNone -0.023259 0.006973 -3.336 0.00104 ** age:ftv.catMany -0.022184 0.010763 -2.061 0.04079 * lwt:ftv.catNone 0.003848 0.001244 3.094 0.00231 ** lwt:ftv.catMany 0.003630 0.001970 1.843 0.06705 . ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Page 24: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

Comparison of time taken for the methods to converge:

Conclusion: It is seen that for models with pairwise interactions, GA gets to the best fitted model much faster than the exhaustive search.

Method Exhaustive Search Genetic Algorithm

Time taken to find the best fit model

710.7032 minutes

(failed to give the optimum solution)

0.5055 minutes

(260 generations)

Page 25: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

25

GAs are used for

• Building Double Threshold Auto Regressive Conditional Heteroscedastic (DTARCH) models• Multilayer neural network• Optimal Component Selection for component based systems• Robotics• VLSI layout optimization• Modelling the behavior of economic agents in

macroeconomic models, etc.

Page 26: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

26

Computers and GA:With the advent of different packages and softwares the use of GA has definitely gained popularity.Apart from glmulti, some of the packages which I came across while researching for this presentation were: GA genalg PROC GA PROC CONNECT ( uses Parallel Genetic Algorithm) Genetic Algorithm Toolbox in MATLAB GAlib (available for the UNIX system)

Page 27: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

27

References and further readings:• Denny Hermawanto. Genetic Algorithm for Solving Simple Mathematical Equality

Problem.

• Hillier and Lieberman. Introduction to Operations Research (9th Ed.)• Hosmer, D.W. and Lemeshow, S. Applied Logistic Regression. New York: Wiley • Givens & Hoeting.Computational Statistics (2nd Ed.) • K.F Man, et all. Genetic Algorithms: Concepts and Applications. IEEE

Transactions on Industrial Electronics, Vol.43 No 5, Oct 1996• Luca Scrucca. GA: A Package for Genetic Algorithms in R. Journal of Statistical

Software, Vol 23 , Issue 4, April 2013

• V. Calcagno, et all. glmulti: An R Package for Easy Automated Model Selection with (Generalized) Linear Models. Journal of Statistical Software, Vol 24 Issue 12, May 2010.

• http://www.obitko.com/tutorials/genetic-algorithms/

• http://www.r-bloggers.com/genetic-algorithms-a-simple-r-example/

Page 28: An Introduction to Genetic Algorithm (GA) By: Dola Pathak For: STAT 992:Computational Statistics SPRING 2015 1.

28

Thank you