Faster Evolutionary Multi-Objective Optimization via GALE: the Geometric Active Learner

Joseph Krall

In partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

College of Engineering and Mineral Resources

A Ph.D. Final Defense Presentation for the Lane Department of Computer Science and Electrical Engineering

Special Thanks to the NASA Ames Research Center

April 21, 2014

Estimated Duration: 45 minutes

Last Time (November, 2013)

A Thesis Proposal

- “JMOO: Tools for Faster Multi-Objective Optimization”

Comments from Committee

- Lacking Rigor
- Generalizability of Proposal
- Lacking Details / Misunderstandings
- Some Missing Related Works
- Validity Concerns
- Needed More: Not Substantial Enough

SE or CS?

2/48


This Time (April, 2014: Spring! …Sort of)

Final Dissertation

- “Faster Multi-Objective Optimization via GALE”

Key Changes from Proposal

- Focus on Contributions of GALE
- Focus on Assessing and Validating GALE
- Very rigorous experimental methodology

Addressing Comments from Proposal

- Expansive Related Works
- Formalizing the Field
- MANY more experimental results

3/48


This Thesis

Search & Optimization of Goals

- the art of decision making
- e.g. shortest-time city navigation
- e.g. managing calorie intake for diets

Not always trivial

- Landing an airplane safely
- Maximizing software project profits

MOO = Multi-Objective Optimization

- Draft solutions to a problem (red)
- Find Pareto Frontiers (green)
- Report to a decision maker

[Figure labels: Areas on the Pareto frontier; Rejected Solutions; "Who do I pick???"]

4/48


The Field of MOO

Increasing Interest

- 8000 Papers Since the 1950’s

[Figure labels: General MOO; Software Engineering (SE); Aircraft Studies; Agile Project Studies]

* Data from:

- (MOO) Coello: http://delta.cs.cinvestav.mx/~ccoello/EMOO/EMOObib.html
- (SE) CREST: http://crestweb.cs.ucl.ac.uk/resources/sbse_repository/repository.html

In this thesis: SE and CS

5/48


Main Message

[Sayyad & Ammar 2013] Report:

- NSGA-II and SPEA2 are the most popular search tools today

Popular Search Tools Evaluate Too Much

- O(N^2) internal search: fast if solution evaluation is a cheap operation
- Need to count number of evaluations instead: O(2NG)

This Thesis Proposes GALE: O(2Log2(NG))

- GALE adds data mining to evaluate only the most-informative solutions

GALE: 597s vs. NSGA-II: 14,018s

N = population size, G = number of generations
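The gap between these bounds is easiest to see with numbers. The sketch below is a back-of-the-envelope calculation, not code from the thesis. It assumes each bound is a per-generation cost (2N for NSGA-II/SPEA2, at most 2·log2(N) for GALE) multiplied by G generations, which is one reading of the compact O(2Log2(NG)) notation. The function names and the sample values of N and G are made up for illustration.

```python
import math


def standard_evaluations(n: int, g: int) -> int:
    """Evaluations for a tool that evaluates 2N solutions in each of G generations."""
    return 2 * n * g


def gale_evaluation_bound(n: int, g: int) -> float:
    """Upper bound, assuming at most 2*log2(N) evaluations per generation."""
    return 2 * math.log2(n) * g


if __name__ == "__main__":
    n, g = 100, 20  # hypothetical population size and generation count
    print("standard:", standard_evaluations(n, g))
    print("GALE bound:", round(gale_evaluation_bound(n, g)))
```

With N = 100 and G = 20, the standard count is 4000 evaluations, while the GALE bound under this reading stays below 300.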

6/48


Applications of MOO

Aircraft Studies for Safety Assurance

- Complex Simulations at NASA [8 seconds per run]

Standard MOO Tools

- Many [300] weeks (50400 hrs)

GALE

- Many [300] hours (1.8 wks)

* Image: Asiana Flight Wreckage, Summer 2013

7/48


Assessing GALE

GALE is a Meta-heuristic Search Tool

- Too difficult (maybe impossible) to “prove”
- Can only be assessed by experiment -> Generalizability (External Validity) concerns
- -> A MOO Critique to Improve Validity

Research Questions

- Evaluations
- Runtime
- Solution Quality

4 Experimental Areas:

- #1 Aircraft Safety (CDA)
- #2 Agile Projects (POM3)
- #3 Constrained Lab Problems
- #4 Unconstrained Lab Problems

SE or CS? [slide labels: SE, CS, CS, CS]

8/48


And The Results

GALE shown to be a strong rival to NSGA-II & SPEA2

- Two orders of magnitude fewer evaluations for all models
- Two orders of magnitude faster (seconds) for big models; SPEA2 much slower
- Better Solution Quality: GALE never worse, NSGA-II/SPEA2 never better

9/48


2. Background

In this chapter:

- Formalities
- Definitions
- Related Works

Outline:

1. Introduction
2. Background
3. MOO Critique
4. GALE
5. Models
6. Experiments
7. Validity
8. Conclusion

10 Slides

10/48


Formalities

Mathematical Programming: [Dantzig]

- The aim is to find solutions that optimize objectives
- Transformation functions transform decisions (x) into objectives (y)
- Solutions are infeasible if they do not satisfy constraint functions

[Equation labels: objectives; optimality direction; transformation functions; constraint functions]
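A generic way to write this formulation is given below, as a sketch rather than the exact equation shown on the slide. The symbols f_i (transformation functions), g_k (constraint functions), n (number of decisions), and K (number of constraints) are notation assumed for illustration; o is the number of objectives.

```latex
\begin{aligned}
\text{optimize}\quad   & y = \bigl(f_1(x), f_2(x), \dots, f_o(x)\bigr), \qquad x = (x_1, \dots, x_n) \\
\text{subject to}\quad & g_k(x) \le 0, \qquad k = 1, \dots, K
\end{aligned}
```

Here "optimize" stands for minimizing or maximizing each objective according to its optimality direction, and a solution x that violates any g_k is infeasible.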

11/48


Kinds of Models

Lab Problems

- Schaffer, Viennet, Tanaka, etc.

Real-world Problems

- Simulations
- Too complex for math
- Aircraft Safety
- Software Dev. Profit

[Figure: The Schaffer Model]
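As a concrete example of a lab problem, here is a minimal sketch of the commonly cited single-decision form of the Schaffer problem, with two objectives that are both minimized. The slide shows only a figure, so the exact variant used in the thesis is an assumption.

```python
from typing import Tuple


def schaffer(x: float) -> Tuple[float, float]:
    """Return the two objectives (both minimized) for a single decision x."""
    f1 = x ** 2          # best at x = 0
    f2 = (x - 2.0) ** 2  # best at x = 2
    return f1, f2
```

The two objectives pull in different directions between x = 0 and x = 2, so no single x is best for both. That trade-off is what makes even this tiny model a multi-objective problem.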

12/48


Numerical Optimization

Early methods assumed math models

- A bad assumption for real world practicality

They also assume other aspects:

- Concave vs. Convex
- Differentiability
- Linear vs. Non-linear
- Single vs. Multi-objective
- Objective Functions vs. Simulation

13/48


Simplex Search

Exterior Search [Dantzig]

- For Linear problems ([Nelder & Mead 1965] made a non-linear version)
- Embed a simplex with solutions along the vertices
- Traverse along the nodes
- Good average complexity
- But bad O(N^3) worst case

Nelder, John A.; R. Mead (1965). "A simplex method for function minimization". Computer Journal 7: 308–313.

14/48


Interior Point Methods

Karmarkar’s Algorithm [Karmarkar 1984]

- Good for big data
- Fast convergence
- Polynomial complexity
- 50x faster than Simplex
- Single-Objective Only
- Requires Concavity

Narendra Karmarkar (1984). "A New Polynomial Time Algorithm for Linear Programming", Combinatorica, Vol 4, nr. 4, p. 373–395.

15/48


Heuristic-based Searches

Moving onward from Numerical Methods

- Improve a heuristic, not the actual objectives
- Hill Climbing: accept only improved steps
- Tabu Search: refuse only recently attempted steps
- Simulated Annealing: accept bad steps early on, refuse them later
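The three searches differ mainly in their acceptance rule. The sketch below shows one textbook form of each rule for a minimization problem. The function names, the temperature parameter, and the Metropolis-style probability used for annealing are illustrative assumptions, not details taken from the slides.

```python
import math
import random


def hill_climb_accepts(current: float, candidate: float) -> bool:
    """Hill climbing: accept only steps that improve (lower) the score."""
    return candidate < current


def tabu_accepts(move, recent_moves) -> bool:
    """Tabu search: refuse only moves that were attempted recently."""
    return move not in recent_moves


def annealing_accepts(current: float, candidate: float,
                      temperature: float, rng=random) -> bool:
    """Simulated annealing: always accept improvements; accept a worse step
    with a probability that shrinks as the temperature cools."""
    if candidate < current:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp((current - candidate) / temperature)
```

Early in a run the temperature is high, so worse steps are often accepted. As it approaches zero, the annealing rule behaves like hill climbing.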

16/48


PSO & ACO

Particle Swarm Optimization [Kennedy 1995]

- Modeled on real-life swarms: flocks of birds, etc.
- Swarm towards good solutions
- Track each particle's own best and the pack's best

Ant Colony Optimization [Dorigo 1992]

- Ant colony path searches
- Pheromone density = best path

Kennedy, J.; Eberhart, R. (1995). "Particle Swarm Optimization". Proceedings of IEEE International Conference on Neural Networks IV. pp. 1942–1948.

M. Dorigo, Optimization, Learning and Natural Algorithms, PhD thesis, Politecnico di Milano, Italy, 1992.

17/48


Evolutionary Algorithms

Standard EA (Evolutionary Algorithm):

1) Build initial population
2) Repeat for max_generations:
   a) crossover
   b) mutation
   c) select
3) Return final population

In other words:

- a+b) Build Offspring: Perturb Population
- c) Combine Offspring + Population
- c) Cull the worst solutions to retain Population Size
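The loop above can be sketched in a few lines. This is a minimal single-objective toy, assuming decisions in [0, 1], one-point crossover, and Gaussian mutation; those operator choices and all parameter defaults are illustrative assumptions, not the settings used in the thesis.

```python
import random


def evolve(fitness, n_decisions, pop_size=20, max_generations=50, seed=1):
    """Minimize `fitness` over vectors in [0, 1]^n_decisions with a bare-bones EA."""
    rng = random.Random(seed)
    # 1) Build initial population
    population = [[rng.random() for _ in range(n_decisions)]
                  for _ in range(pop_size)]
    # 2) Repeat for max_generations
    for _ in range(max_generations):
        offspring = []
        for _ in range(pop_size):
            # a) crossover: one-point splice of two distinct parents
            mom, dad = rng.sample(population, 2)
            cut = rng.randrange(n_decisions + 1)
            child = mom[:cut] + dad[cut:]
            # b) mutation: nudge one decision, clamped to [0, 1]
            i = rng.randrange(n_decisions)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))
            offspring.append(child)
        # c) select: combine offspring + population, then cull the worst
        combined = population + offspring
        combined.sort(key=fitness)
        population = combined[:pop_size]
    # 3) Return final population (sorted best-first)
    return population
```

Because parents and offspring compete in the same pool, the best solution found so far is never lost. Multi-objective tools such as NSGA-II and SPEA2 keep this loop and change how the "cull the worst" step ranks solutions.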

* Malin Åberg: http://physiol.gu.se/maberg/images.html


18/48


NSGA-II [Deb 2002]

- Non-dominated Sorting Genetic Algorithm
- Standard select+crossover+mutation
- Sort by ‘bands’, or domination ‘depth’
- Break ties based on density (crowding distance)
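Sorting by bands rests on Pareto domination. The sketch below peels off successive non-dominated bands from a list of objective vectors, assuming every objective is minimized. It is a simple quadratic-per-band illustration of the idea, not the faster bookkeeping described in the NSGA-II paper, and it omits the crowding-distance tie-break.

```python
def dominates(u, v):
    """True if u is no worse than v on every objective and better on at least one."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def bands(points):
    """Split objective vectors into bands: band 0 is non-dominated, band 1 is
    non-dominated once band 0 is removed, and so on."""
    remaining = list(points)
    result = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        result.append(front)
        remaining = [p for p in remaining if p not in front]
    return result
```

For example, `bands([(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)])` puts the first three points in band 0, `(3, 3)` in band 1, and `(5, 5)` in band 2.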

Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II". IEEE Transactions on Evolutionary Computation 6 (2): 182


19/48


SPEA2 [Zitzler2002]

- Strength Pareto Evolutionary Algorithm
- Standard select+crossover+mutation
- Sort by ‘strength’: the count of solutions each candidate dominates
- Truncate crowded solutions via nearest neighbor
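A minimal sketch of the strength idea, assuming minimized objectives. The `raw_fitness` function follows the definition in the SPEA2 paper (the sum of the strengths of a solution's dominators), which goes slightly beyond what the slide states. The density and truncation steps are left out.

```python
def dominates(u, v):
    """True if u is no worse than v on every objective and better on at least one."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def strengths(points):
    """Strength of each point: how many other points it dominates."""
    return [sum(dominates(p, q) for q in points) for p in points]


def raw_fitness(points):
    """Raw fitness of each point: summed strength of everything that dominates it.
    Zero means the point is non-dominated; lower is better."""
    s = strengths(points)
    return [sum(s[j] for j, q in enumerate(points) if dominates(q, p))
            for p in points]
```

For the points `(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)`, the strengths are 1, 2, 1, 1, 0 and the raw fitness values are 0, 0, 0, 2, 5.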

E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the strength pareto evolutionary algorithm for multiobjective optimization. Evolutionary Methods for Design Optimization and Control with Applications to Industrial Problems, 95--100, 2001.


20/48


3. MOO Critique

In this chapter:

- Survey
- Rigor

4 Slides

21/48


Survey of MOO

Experimental Rigor

- Want to maximize validity
- Because there are reasons to doubt GALE
  - Does it still do well with few evaluations?
  - Can it still run fast?

We looked at the literature for advice

- Search query targeted these questions:
  - Statistical Methods? [Demsar2006] recommends KS-Test + Friedman + Nemenyi
  - Population size? 20 ~ 100 is good. Over 200 is a waste.
  - Number of Repeats? [Harman 2012]: 30-50 is common. This Thesis: 20.
- Ended up selecting 21 papers

* J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, Dec. 2006.

* M. Harman et al., Search based software engineering: techniques, taxonomy, tutorial. In Empirical Software Engineering and Verification, Bertrand Meyer and Martin Nordio (Eds.). Springer-Verlag, Berlin, Heidelberg 1-59.

22/48


Principles 1 & 2

1. Use variety of models

- Real World Models: Practicality.
- Standard Models: Reproducibility.
- Constrained and Unconstrained: Generalizability
- Many papers used only lab models

Use models from all quadrants. In this thesis:

- Standard Models: 7 Constrained Lab, 13 Unconstrained Lab
- Real World Models: 1 Privatized (CDA), 1 Public (POM3)

2. How many Repeats

- Pragmatics: Keep repeats low to save on computational cost
- Statistics: Want high repeats for statistical stability
- The middle ground: results did not change for n in 20, 30, 40. So 20 is good.

23/48


Principles 3 & 4

3. Statistical Methods

- Based on Demsar’s Recommendations
- Begin with Kolmogorov-Smirnov (KS-Test) to test normality
  - Data rarely conforms to normality assumptions
- For two-group testing, use Wilcoxon Rank Sum (WRS) Test
- For multi-group testing, use Friedman Test + Nemenyi
- Most papers failed to address number of groups

4. Runtimes

- Report runtimes to aid reproducibility arguments
- Report details of machine
- Half of the papers neglected to report runtimes

24/48


Principles 5-7

5. Number of Evaluations

- Report number of evaluations
- Because they dominate runtime of real-world models
- Half of the papers neglected to report evaluations

6. Parameters

- Define all parameters carefully
- Reproducibility concerns: pop. size, #gens, stopping criteria

7. Discuss Threats to Validity

- Don’t make the reader do all the work
- Rigorous Experimental Methods = Stronger Conclusions
- Almost no one had a threats to validity section in their paper

25/48


4. GALE

In this chapter:

- Spectral Learning
- Active Learning

5 Slides

26/48

Introducing GALE

GALE: Geometric Active Learning (Evolution)

- At most O(2Log2N) evaluations per generation
- Exactly Θ(2N) evaluations for NSGA-II, SPEA2

Main Differences in GALE:

- Cluster solutions
- Evaluate some, not all
- Directed vs. random
- More on these later

[Figure labels: GALE, NSGA-II, SPEA2]

Asymptotic Notation:

- Big-O: upper bound (at most)
- Big-Theta: tight bound (exactly)

27/48


Components to GALE

Three key phrases to talk about

1. Active Learning (evaluate some, not all)

- Minimize cost of evaluation
- Learn more from using less [Settles 2009]

2. Spectral Learning (WHERE) (clustered spectrally)

- Reasoning with eigenvectors via covariance matrix
- “Spectral Clustering”: clustering via eigenvectors
- FastMap approximates the first eigenvector faster than PCA

3. Directed Search (directed mutation)

- Shove solutions along promising directions

28/48


GALE Pseudo-Code

Algorithm shown here and explained over next several slides

- WHERE algorithm
- WHERE uses FastMap
- Directed Mutation

1. Build initial population, P0. Initialize generation: t = 0. Set Life = 3.
2. Repeat until stopping criteria is met (stop if life == 0):
   a. Run WHERE (with pruning) to select Rt = dominant leaves from WHERE.
   b. Perform Directed Mutation on members of Rt.
   c. Copy Rt into Pt+1 and generate new random candidates until new population is full.
   d. Increment generation number t = t + 1.
   e. Collect stats and evaluate stopping criteria. Decrement life if no improvement to any objective.
3. Run WHERE (without pruning) to select Rt = dominant leaves from WHERE.
4. Rt contains approximations to the Pareto frontier.

[Slide labels: Spectral Learning; Active Learning; Directed Search]

29/48


Nyström Method

Spectral clustering is O(n^3) [Kumar12]

- Common method: PCA
- The Nyström Method reduces this to near-linear
- Low-rank approx. of covariance matrix

e.g.: FastMap is a Nyström Algorithm [Platt05]

- 1) Pick an arbitrary point, z.
- 2) Let ‘east’ be the furthest point from z.
- 3) Let ‘west’ be the furthest point from ‘east’.
- 4) Project all points onto the line east-west
- 5) east-west is the first principal component

[Figure labels: east, west, a point x, and distances a, b, c]

Active Learning:

- Only evaluate East & West!
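The five FastMap steps can be sketched as follows. This is a minimal illustration, assuming Euclidean distance in decision space and the cosine-rule projection x = (a² + c² − b²) / (2c). The assignment of a, b, and c to particular distances is my assumption about the figure, and the function names are hypothetical.

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def fastmap_project(points, rng=None):
    """Return (east, west, positions), where positions maps each point to its
    coordinate along the west->east line (0 at west, c at east)."""
    rng = rng or random.Random(1)
    z = rng.choice(points)                                # 1) arbitrary point
    east = max(points, key=lambda p: distance(z, p))      # 2) furthest from z
    west = max(points, key=lambda p: distance(east, p))   # 3) furthest from east
    c = distance(east, west)
    positions = {}
    for p in points:                                      # 4) project onto east-west
        a = distance(p, west)  # assumed meaning of 'a'
        b = distance(p, east)  # assumed meaning of 'b'
        positions[p] = 0.0 if c == 0 else (a * a + c * c - b * b) / (2 * c)
    return east, west, positions
```

Finding the two poles takes only two linear passes over the population, and nothing in this step calls the model. Only east and west need to be evaluated afterwards.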

30/48


The WHERE Tool

WHERE = Spectral Learning in GALE

- Similar to Boley’s PDDP: find first eigenvector and recursively split
- PDDP uses PCA. WHERE uses FastMap.

Steps shown in the figure:

- Initial population
- WHERE clusters initial population = Spectral Learning
- Only evaluate the best clusters (non-dominated clusters) = Active Learning
- Mutate along those clusters = Directed Search
- Refill the Population

At Most 2Log2(NG) Evaluations (N=Population Size. G=Number of Generations)
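The recursive split with pruning can be sketched as below. This is a simplified illustration, not the thesis implementation. It assumes minimized objectives, a split at the median of the FastMap projection, pruning of a half whenever its pole is dominated by the opposite pole, and a stopping size `min_size`. All of those details, and every name in the code, are assumptions made for the example.

```python
import math
import random


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def dominates(u, v):
    """True if u is no worse than v everywhere and better somewhere (minimizing)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))


def where(points, evaluate, min_size, rng, counter):
    """Recursively split `points`; return the leaf clusters that survive pruning.
    counter[0] accumulates the number of calls to `evaluate`."""
    if len(points) <= min_size:
        return [points]
    # FastMap: find two distant poles, then project everything onto their line.
    z = rng.choice(points)
    east = max(points, key=lambda p: dist(z, p))
    west = max(points, key=lambda p: dist(east, p))
    c = dist(east, west)
    if c == 0:  # all points identical: nothing to split
        return [points]

    def project(p):
        a, b = dist(p, west), dist(p, east)
        return (a * a + c * c - b * b) / (2 * c)

    ordered = sorted(points, key=project)
    mid = len(ordered) // 2
    west_half, east_half = ordered[:mid], ordered[mid:]
    # Active learning: only the two poles are evaluated at this level.
    y_west, y_east = evaluate(west), evaluate(east)
    counter[0] += 2
    leaves = []
    if not dominates(y_east, y_west):  # keep west half unless east dominates
        leaves += where(west_half, evaluate, min_size, rng, counter)
    if not dominates(y_west, y_east):  # keep east half unless west dominates
        leaves += where(east_half, evaluate, min_size, rng, counter)
    return leaves
```

When one pole dominates the other at every level, each split costs two evaluations and discards half the points, which is where a bound of the form 2·log2(N) per generation comes from. When neither pole dominates, this sketch keeps both halves and spends more evaluations.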

31/48


5. Models

In this chapter:

- CDA
- POM3
- Lab Models

4 Slides

32/48


CDA Model

Continuous Descent Arrival

- NASA wants to know if CDA is doable
- Standard descents are less efficient than CDA -> more {noise, time, fuel, $$$}
- CDA might unnecessarily strain air traffic control (ATC)

33/48


Building CDA

Lots of work

- 2 months at NASA Ames Research Center
- CDA not pre-assembled

Inspiration from 2013 Asiana Flight Crash

- Pilots had to handle more tasks than normal
- Keeping airspeed nominal was a task they ‘forgot’
- Human Factors: model a pilot's ‘HTM’ = maximum human taskload

Goal of CDA: less forgetting, less time lost to delays and missed tasks

* based on Work Models that Compute by Pritchett, Kim and Feigh, 2011-2013

34/48


POM3

- Model of Agile Software Requirements Engineering

Agile Software Projects

- Programmers rush to complete tasks
- But what tasks get most priority?

Requirements Prioritization Strategies

- Find good schemes that optimize objectives

Repeat 2 < N < 6 times:

1. Collect Tasks
2. Prioritize Tasks
3. Execute Tasks
4. Find New Tasks
5. Adjust Priorities

Objectives to Minimize

- Total Cost
- % Idle Rate of Teams

Objectives to Maximize

- % Completion of Tasks

* POM3 based on POM2 based on POM by Portman, Owens, Menzies (2008, 2009)

35/48


Standard Lab Models

We explore all these:

Unconstrained    Constrained
Fonseca          BNH
Golinski         Constrex
Kursawe          Osyczka2
Poloni           Srinivas
Schaffer         Tanaka
Viennet2-3-4     TwoBarTruss
ZDT1-2-3         Water
ZDT4-6

[Figure: The Constrex Model]

36/48


6. Experiments

In this chapter:

- Results
- Analysis

4 Slides

37/48


Experimental Methods

Research Questions:

- Number of Evaluations
- Runtime
- Quality of Solutions

4 Experiment Areas:

- #1 Aircraft Safety
- #2 Agile Software Development
- #3 Constrained Lab Models
- #4 Unconstrained Lab Models

Measuring quality:

1. Run the Model 500 times
2. Collect an average-case baseline
3. Compute loss(x, baseline) for each solution x
4. The median loss is the “Quality Score”

o = number of objectives

Quality Score:

- > 1.0: Loss in Quality from Baseline
- = 1.0: No Change from Baseline
- < 1.0: Improvement from Baseline

[Zitzler & Kunzli 2004]

38/48


RQ1: Number of Evaluations

Experiment                         GALE         NSGA-II        SPEA2
#1 Aircraft Safety (CDA Model)     50 +++       2800 =         2450 =
#2 Agile Software (POM3 Models)    36-46 +++    3000-3550 =    3050-3300 =
#3 Constrained Lab Models          28-88 +++    1050-3250 =    950-3150 =
#4 Unconstrained Lab Models        26-45 +++    1250-3550 =    1250-3250 =

GALE needed two orders of magnitude fewer evaluations

39/48




RQ2: Runtime

Experiment                         GALE              NSGA-II        SPEA2
#1 Aircraft Safety (CDA Model)     6-20 mins +++     3-5 hrs =      3-5 hrs =
#2 Agile Software (POM3 Models)    1.5-9.5s ++       4.0-108s =     12-109s =
#3 Constrained Lab Models          0.5-1.5s =        0.5-1.0s =     3-30s -
#4 Unconstrained Lab Models        0.5-2.5s =        0.5-1.0s =     3-30s -
#5 16 Modes of the CDA Model       83 hours          6 months       6 months

GALE needed two orders of magnitude less runtime

GALE enabled an even larger study on CDA. NSGA-II and SPEA2 weren’t used in #5, so these values were extrapolated from #1.

40/48




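RQ3: Solution Quality

Experiment                         GALE         NSGA-II     SPEA2
#1 Aircraft Safety (CDA Model)     0-0-2 =      0-0-2 =     0-0-2 =
#2 Agile Software (POM3 Models)    0-0-6 =      0-1-5 =     1-0-5 =
#3 Constrained Lab Models          12-0-2 +     0-6-8 =     0-6-8 =
#4 Unconstrained Lab Models        10-3-13 +    1-5-20 =    2-5-19 =

Displays are in ‘Wins-Losses-Ties’ format. GALE never loses. GALE usually wins.

KS-Test + Friedman + Nemenyi at the 99% Level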

41/48


7. Threats to Validity

In this chapter:

- Validity

1 Slide

42/48


Threats to Validity

- Most threats were already addressed
- Others too trivial for this presentation

43/48


8. Conclusion

In this chapter:

- Summary
- Ending

3 Slides

44/48


Summary

Popular MOO Tools Need O(2NG) Evaluations

- Very slow for large models

GALE: Geometric Active Learning (Evolution)

- Add Data Mining to Search
- Evaluate only most informative Solutions
- At most O(2Log2(NG)) Evaluations (usually less than that)
- Enables large studies with large models
- Finds good solutions for a wide variety of models

N = population size, G = number of generations

Active Learning: only evaluate East & West!

Models from all quadrants:

- Standard Models: Constrained Lab, Unconstrained Lab
- Real World Models: Public, Privatized

45/48


Principles

- Developed principles for rigorous experiments
- Employed those principles for our experiments

46/48


Results of Experiments

GALE a clear winner

[Figure labels: #1 #2 #3]

47/48


The End

48/48
