Parallelization of Stochastic Metaheuristics to Achieve Linear Speed-ups while Maintaining Quality


Page 1: Parallelization of Stochastic Metaheuristics to Achieve Linear Speed-ups while Maintaining Quality

Course Project Presentation: Mustafa Imran Ali, Ali Mustafa Zaidi

Page 2

Outline of Presentation

Brief Introduction
  Motivation for this problem
  Simulated Annealing (SA)
  Simulated Evolution (SimE)
Related Work
  Classification of Parallel Strategies
  Previous efforts
Our Efforts and Results
  Low-Level Parallelization of Both Heuristics
  Domain-Decomposition Parallelizations of SimE
  Multithreaded-Search Implementations of SA and SimE
Conclusions Drawn, Future Work

Page 3

Motivation

Stochastic heuristics are used to solve a wide variety of combinatorial optimization problems: VLSI CAD, operations research, network design, etc.

These heuristics attempt to find near-optimal solutions by performing an 'intelligent search' of the solution space. Each algorithm has a built-in 'intelligence' that allows it to move towards successively better solutions.

However, they usually require large execution times to achieve near-optimal solutions.

Page 4

Motivation

Parallelization of heuristics aims to achieve one of two basic goals:
  Achieve speedup
  Achieve better-quality solutions

Parallelization of heuristics differs from data and functional parallelism: different strategies alter the properties of the basic algorithm in different ways, so the achievable solution quality is sensitive to the parallelization strategy.

The key is to select or develop the parallelization strategy with the maximum positive impact on the heuristic, with respect to the goals we are trying to achieve:
  For better-quality solutions, the strategy should enhance algorithmic "intelligence".
  For better runtimes, the strategy should focus on reducing the work done per processor.

Page 5

Our Objectives

Our goal is to explore, implement, and develop parallelization strategies that allow us to:
  Achieve near-linear speedup (or best effort)
  Without sacrificing quality (a fixed constraint)
  With speedup trends that are as scalable as possible

Page 6

Approaches Taken and Considerations

To consider this problem effectively, we must look at it in terms of several aspects:
  The nature of the heuristics themselves
  The nature of the parallel environment
  The nature of the problem instance and cost functions

All three factors influence both the runtime and the achievable solution quality of any parallelization strategy. For our task, we must optimize all three factors.

Page 7

Introduction to Simulated Annealing

Basic Simulated Annealing Algorithm with Metropolis Loop
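The figure referenced here did not survive the transcript. As a stand-in, a minimal sketch of simulated annealing with its inner Metropolis loop; the parameter values, neighbor move, and toy objective are illustrative assumptions, not taken from the slides:

```python
import math
import random

def simulated_annealing(cost, neighbor, s0, T0=10.0, alpha=0.9,
                        M=50, T_min=1e-3, seed=0):
    """SA skeleton: an outer cooling loop around an inner Metropolis
    loop of M trial moves per temperature."""
    rng = random.Random(seed)
    s, best = s0, s0
    T = T0
    while T > T_min:
        for _ in range(M):            # Metropolis loop
            cand = neighbor(s, rng)
            delta = cost(cand) - cost(s)
            # accept improving moves always, worsening moves with prob e^(-delta/T)
            if delta < 0 or rng.random() < math.exp(-delta / T):
                s = cand
                if cost(s) < cost(best):
                    best = s
        T *= alpha                    # geometric cooling schedule
    return best

# Toy instance: minimize (x - 3)^2 starting from x = 0
result = simulated_annealing(
    cost=lambda x: (x - 3.0) ** 2,
    neighbor=lambda x, rng: x + rng.uniform(-1.0, 1.0),
    s0=0.0,
)
```

The hill-climbing behavior discussed in later slides comes from the `exp(-delta / T)` acceptance test: at high T almost any move is accepted, while low T makes the search greedy.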

Page 8

Introduction to Simulated Evolution

Basic Simulated Evolution Algorithm
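The SimE figure is likewise missing from the transcript. A toy sketch of its Evaluation, Selection, Allocation loop; the goodness measure, bias value, and the trivial "sort a permutation" instance are illustrative assumptions:

```python
import random

def simulated_evolution(initial, goodness, allocate, iters=100, bias=0.1, seed=0):
    """SimE skeleton: each iteration evaluates a goodness in [0, 1] for
    every element, selects badly placed elements, and re-allocates them."""
    rng = random.Random(seed)
    sol = list(initial)
    for _ in range(iters):
        g = [goodness(sol, i) for i in range(len(sol))]          # Evaluation
        selected = [i for i in range(len(sol))                   # Selection
                    if rng.random() < 1.0 - g[i] + bias]
        sol = allocate(sol, selected, rng)                       # Allocation
    return sol

def allocate(sol, selected, rng):
    """Toy allocation: swap each selected value toward its home slot."""
    sol = sol[:]
    for i in selected:
        j = sol[i]
        sol[i], sol[j] = sol[j], sol[i]
    return sol

# Toy instance: "place" the values 0..4 so that value i sits at index i
final = simulated_evolution(
    [3, 1, 4, 0, 2],
    goodness=lambda s, i: 1.0 if s[i] == i else 0.0,
    allocate=allocate,
)
```

Elements with low goodness are selected with high probability, so the search concentrates effort on the badly placed elements; well-placed ones are disturbed only with the small bias probability.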

Page 9

Related Work

Classification of parallelization strategies for metaheuristics [ref]:
  1. Low-Level Parallelization
  2. Domain Decomposition
  3. Parallel or Multithreaded Search

Previous work on parallel SA: extensive, both general and problem-specific. All three approaches have been tried; Type 3 is the most promising and is still an active area of research.

Previous work on parallel SimE: minimal. A Type 2 parallelization strategy was proposed by the designers of SimE.

Page 10

Our Efforts

Starting point for our work this semester:
  Basic version of Type 3 parallel SA
  Basic version of Type 2 parallel SimE

Parallelization of SA:
  Several enhanced versions of Type 3
  Implementation of Type 1
  Type 2 not implemented because…

Parallelization of SimE:
  Several enhancements to basic Type 2
  Implementation of Type 1
  Implementation of Type 3

Page 11

Basic Type 3 Parallel SA

Based on the Asynchronous Multiple Markov-chain scheme developed in [ref], the best Type 3 scheme developed for SA to date. It is primarily intended to improve the solution qualities achievable over the serial version.

Page 12

Basic Type 3 Parallel SA Algorithm

Page 13

Type 3 Parallel SA – Second Version

Strategy 2: speed-up-oriented Type 3 parallel SA.

From the above starting point, we saw that the high-quality characteristics of basic Type 3 may be exploited to produce a speed-up-oriented version:
  Expected to be capable of achieving quality equivalent to the serial version, but not better
  While providing near-linear runtimes

Near-linear runtimes are forced by dividing the workload on each processor by the number of processors: M/p Metropolis iterations instead of M.

Page 14

Speed-up-Oriented Type 3 Parallel SA

Results consistently show a 10% drop in achievable solution quality from the serial version. Runtimes show near-linear speedups, as expected.

Page 15

Lessons Learned from Strategy 2

We reasoned that the 10% quality drop occurs due to the negative impact on the 'intelligence' of the parallel SA. To restore achievable quality, we must counteract this effect.

The intelligence of SA lies in the cooling schedule [ref Dr. Sait's book], and division of the workload directly tampers with the cooling schedule.

Proposed solution: attempt to optimize the cooling schedule to take the parallel environment into account, a "multi-dimensional" cooling schedule. To maintain speedup, M/p remains unchanged, while other parameters are varied across processors.
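One way such a "multi-dimensional" schedule could be realized is to spread the per-processor (T0, alpha) pairs linearly across ranks, following the next slide's rule that low-T/high-alpha processors behave greedily; the ranges and the linear spread below are assumptions:

```python
def per_processor_params(num_procs, T_lo=5.0, T_hi=50.0, a_lo=0.80, a_hi=0.99):
    """One (T0, alpha) pair per processor: rank 0 searches most randomly
    (highest T0, lowest alpha); the last rank is the greediest."""
    params = []
    for rank in range(num_procs):
        f = rank / max(num_procs - 1, 1)    # 0.0 at rank 0 ... 1.0 at last rank
        T0 = T_hi - f * (T_hi - T_lo)       # decreasing initial temperature
        alpha = a_lo + f * (a_hi - a_lo)    # increasing cooling-rate parameter
        params.append((T0, alpha))
    return params

params = per_processor_params(4)
```

Each rank would run its own Metropolis loops with its assigned pair while still exchanging solutions through the Type 3 master.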

Page 16

Third Version of Type 3 Parallel SA

Strategy 3: varying parameter settings across different processors. We expected that this would result in a more effective search of the solution space.

Several sets of parameter settings were tried, primarily by varying α and T across processors:
  Processors with higher T and lower α would perform a more random search
  Processors with lower T and higher α would be more greedy
  Intermittent sharing of information should diversify the search from the same position

Results obtained from these versions: NO IMPROVEMENT OF QUALITY OVER STRATEGY 2 (show results, hadeed_1).

Page 17

Lessons Learned from Strategy 3

Based on the last two attempts, we reasoned that M/p drastically reduces the time spent searching for better solutions at a given temperature: lower temperatures are reached more quickly, which adversely affects the hill-climbing property of the heuristic. Simple division of M by p is inadequate for sustaining quality.

What to do next? Develop techniques that minimize the adverse effect on algorithmic intelligence. Two different tracks are possible:
  1. Type 1 parallel SA
  2. Further study of the runtime vs. quality trends of Type 3 parallel SA

Page 18

What to Do Next?

Type 1 parallel SA
  Pros: leaves algorithmic intelligence intact; solution quality guaranteed
  Cons: high communication frequency adversely affects runtime; environmental factors come into play

Further explore the dynamics of Type 3
  Pros: low communication frequency; suitable for a cluster environment (show chart)
  Cons: uncharted territory, so progress is not assured

Page 19

Type 1 Parallel SA

Low-level parallelization: divide the cost-computation function. Our cost function has three parts: wirelength, power, and delay. The first two are easy to divide.

Division of the delay computation posed a significant challenge: too many replicated computations across processors negated the benefits of division, so it eventually had to be excluded.
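The "easy" split of the wirelength term might look like the following: each processor sums half-perimeter wirelength (HPWL) over a strided share of the nets, and a reduction (e.g. an MPI all-reduce) combines the partial sums. The toy netlist and the strided split are assumptions; the slides give no cost-function code:

```python
def hpwl(net, pos):
    """Half-perimeter wirelength of one net (a list of cell ids)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def partial_wirelength(nets, pos, rank, nprocs):
    """Partial cost on one processor: its strided share of the nets."""
    return sum(hpwl(n, pos) for n in nets[rank::nprocs])

# Toy placement of 3 cells and 3 nets; the reduction is simulated by sum()
pos = {0: (0, 0), 1: (2, 1), 2: (5, 3)}
nets = [[0, 1], [1, 2], [0, 2]]
total = sum(partial_wirelength(nets, pos, r, 2) for r in range(2))
```

Delay is harder precisely because a timing path crosses many nets, so each processor would end up recomputing most of the path, which is the replication problem noted above.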

Page 20

Results of Type 1 Parallel SA

Performance of Type 1 parallel SA: abysmal!

Reasons:
  Two collective communications per iteration
  The amount of work divided is small compared to the communication time
  Communication delay increases with increasing processors (show chart)

Type 1 was found to be completely unsuitable for the MIMD-DM environment and for our problem instance.

[Table: time for Type 1 parallel SA at p = 2 through p = 7 vs. time for serial SA, by circuit name and number of cells; only the values 0.566007, 42.886, 1816.4, and 2216.6 survive in the transcript and cannot be matched to columns]

Page 21

Further Exploring Type 3 Parallel SA

To improve the achievable quality of our Type 3 parallel SA:
  An in-depth study of the impact of the parameter M on achievable solution quality
  All experiments are first attempted on the serial version, then replicated in the parallel version
  Based on what we know of Type 3 parallelization schemes, see how any new lessons can be incorporated

Page 22

Impact of 'M' on Solution Quality (Serial)

[Chart: serial run characteristics, quality vs. time (0–300 s), for division factors 1, 9, 17, 25, and 57]

Page 23

Impact of 'M' on Solution Quality (Serial)

[Chart: serial run characteristics, quality vs. time (0–30 s), for division factors 9, 17, 25, and 57]

Page 24: Parallelization of Stochastic Metaheuristics to Achieve Linear Speed-ups while Maintaining Quality

Parellel 7, vs Div Fact

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

-20 0 20 40 60 80 100 120 140

Time

Qu

alit

y

1

9

17

25

57

Impact of ‘M’ on Solution Quality (Parallel 7)

Page 25

Impact of 'M' on Solution Quality (Parallel, 7 processors)

[Chart: parallel (p = 7) run characteristics, quality vs. time (0–25 s), for division factors 9, 17, 25, and 57]

Page 26

Observations

We see that, initially, the fastest improvement in quality appears with the smallest M. However, quality saturates earlier with smaller M. Thus it might be beneficial to increase M as time progresses. But by how much? And how do we minimize runtime?

Page 27

Learning from Experiments

Another observation helps: until saturation, the rate of improvement is nearly constant (per Metropolis call), and this holds for all runs of a given circuit. Thus the best way to minimize time while sustaining quality improvement is to set the value of M adaptively so that the average rate of improvement remains constant. This is a new enhancement to serial SA.

Since Type 3 parallel SA improves faster than the serial version, some speedup is expected. Experiments are still under way; parameter tuning is being done.
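The adaptive-M rule described above can be sketched as follows: after each Metropolis loop, compare the quality gained per call against a target rate, and lengthen the loop once the improvement per call falls below it. The doubling rule, bound, and numbers are illustrative assumptions:

```python
def adapt_M(M, improvement, target_rate, M_max=10000):
    """Grow the Metropolis-loop length M when the average improvement per
    Metropolis call drops below target_rate (i.e. the search saturates)."""
    if improvement / M < target_rate:
        M = min(M * 2, M_max)
    return M

# Made-up quality gains from four successive Metropolis loops
M, history = 50, []
for gain in [5.0, 2.0, 0.4, 0.1]:
    history.append(M)
    M = adapt_M(M, gain, target_rate=0.01)
```

This keeps M small while improvement is fast (the regime the charts show for small division factors) and spends longer per temperature only once quality begins to saturate.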

Page 28

Preliminary Results

For 7 processors in parallel, adaptive Type 3 scheme:

[Chart: quality vs. time (0–100 s) for the serial and the parallel (p = 7) adaptive runs]

Page 29

Preliminary Results

Results are similar to the original implementation: the enhancement to serial SA mitigates the observed benefits of the parallel version, although further parameter tuning and code refinements may improve the parallel results.

Page 30

Conclusions Drawn from Experiments

Type 3 parallelization of SA is the most robust, with minimum susceptibility to the environment and problem instance. Type 1 fails due to its unsuitability to the environment and to the problem instance.

For parallel SA, a direct trade-off exists between achievable solution quality and speedup: depending on the quality desired, linear speedup is possible, but for the highest quality, speedup diminishes to 1 in most cases. Further experimentation is needed to verify these points.

Page 31

Parallel Simulated Evolution Strategies

Page 32

Base Work for SimE

Type II (domain decomposition) implementation.

Problem: poor runtimes with quality degradation.

Improvement proposed: use random row allocation instead of the fixed row allocation used in previous work.

Results: quality improves over fixed row allocation but still falls short of serial quality.

Page 33

Type II Quality Issues

Observation: parallel quality will always lag behind serial quality.

Reason: division of the solution into domains restricts optimal cell movement (and worsens with more processors/partitions).

Therefore, focus on improving runtime!

Page 34

Type II Runtime Improvements

How? Reduce communication time:
  Reduce the frequency of communications
  Reduce the amount of data communicated
  Overlap communication with computation

Reduce computations:
  Can the workload be divided still better?
  How will workload division affect communication requirements?

Page 35

Type II SimE Algorithm Structure

[Diagram: algorithm structure, showing the communication and cost-computation phases]

Page 36

Type II Communication Optimization

There is no potential for computation/communication overlap: there is an implicit barrier synchronization at the communication points.

Possibility of reducing communication frequency over multiple iterations: perform multiple operations (E, S, A) on the assigned partition before communicating. This impacts solution quality through accumulated errors; the actual impact on solution quality vs. the runtime improvement is not presently quantified (future work).

Page 37

Type II Communication Optimization (2)

Reduce communication frequency by combining the gather and broadcast operations into one MPI call. However, the efficiency of the collective call is not much superior in the MPI implementation used.

Reduce the data communicated per call: this has a significant impact on runtime due to the barrier synchronization. It is essentially compressing the placement data! Trade-off: added computation in the data pre-processing and post-processing steps.
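"Compressing the placement data" could be as simple as packing each cell's (row, column) into a flat 16-bit buffer, so that one MPI send carries tight raw bytes instead of wider per-cell records; the 16-bit width is an assumption about the coordinate ranges:

```python
import struct

def pack_placement(rows, cols):
    """Interleave per-cell (row, col) as little-endian uint16 values;
    the returned bytes are the payload of a single send."""
    flat = [v for rc in zip(rows, cols) for v in rc]
    return struct.pack('<%dH' % len(flat), *flat)

def unpack_placement(data):
    """Post-processing step on the receiving side."""
    flat = struct.unpack('<%dH' % (len(data) // 2), data)
    return list(flat[0::2]), list(flat[1::2])

rows, cols = [0, 3, 7], [12, 5, 9]
data = pack_placement(rows, cols)
```

The trade-off noted on the slide shows up here as the extra pack/unpack work on each side of the communication.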

Page 38

Type II Computation Optimization

Can the workload be divided better? Evaluation, selection, and allocation are already localized (divided). What about cost computation? Goodness evaluation needs the computed costs.

Dependencies exist across partitions:
  Delay computations run over long paths that span partitions
  Wirelength and power computation have potential for cost-computation division

Page 39

Wirelength & Power Cost Division

These costs are more independent of other partitions than the delay computations, so the effect of computation division can be readily evaluated for wirelength and power (within the limited time constraints).

Trade-off: added computation. Partial costs need to be communicated to the master, adding an extra communication phase per iteration! Whether this actually beats non-division: the results will tell.

Page 40

Results & Observations

The effect of communication worsens speedups with increasing processors: communication time overshadows the gains of computation division. Optimizing communication resulted in greater gains than cost-computation division; the dependencies do not allow much gain.

                           Original Type II            Comm.-Optimized     Comm. + Comp.
Circuit  Cells  Quality    P=1      P=3      P=5       P=3      P=5        P=3      P=5
s1196    561    0.644      54 sec   ~44 sec  ~50 sec   ~34 sec  ~44 sec    ~30 sec  ~41 sec
s1238    540    0.680      60 sec   ~45 sec  ~55 sec   ~39 sec  ~46 sec    ~34 sec  ~42 sec

Page 41

Targeting Better Solution Qualities

Type II parallelization has poor-quality issues. How can quality be maintained?

Type I parallelization: the same quality as the serial implementation is guaranteed, and speedups are governed by the gains from dividing the cost computation.

Type III parallelization: can we benefit from parallel cooperating searches?

Page 42

Type I Parallelization

Target as much computation division as possible without dividing the SimE algorithm's core "intelligence".

Computations that don't affect intelligence: cost computations, goodness evaluations.
Computations that affect intelligence: the selection function, the allocation function.

Page 43

Type I Parallelization

Again, wirelength and power are easier to divide than delay (the same reasoning as for Type II). Workload division is achieved by dividing the cells among the PEs: each PE computes costs and goodness for a fraction of the cells. The division of cells is done at the beginning and remains fixed. Slave PEs communicate goodness values to the master PE, which performs selection and allocation.
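The division just described can be sketched as follows, with the MPI gather simulated by a plain loop over ranks; the strided split and the stand-in goodness function are assumptions:

```python
def slave_goodness(cells, goodness_fn, rank, nprocs):
    """Each slave PE evaluates goodness for its fixed share of the cells;
    the split is decided once at startup and never changes."""
    return {c: goodness_fn(c) for c in cells[rank::nprocs]}

cells = list(range(10))
goodness_fn = lambda c: (c % 3) / 2.0    # stand-in for the real goodness measure

# Master side: gather the per-slave results (a real run would use an MPI gather)
gathered = {}
for rank in range(4):
    gathered.update(slave_goodness(cells, goodness_fn, rank, 4))
# The master would now run Selection and Allocation on `gathered`.
```

Because selection and allocation stay on the master, the algorithm's decisions are identical to the serial run, which is why this scheme preserves serial quality.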

Page 44

Type I Parallelization Results

Runtimes are not drastically reduced: allocation takes a significant portion of the overall computation time, and speedups are again limited by communication, not by the data communicated but by the added overheads with more participants.

SimE Type I Parallelization
Circuit  Cells  Quality   P=1       P=3       P=5       P=7
s1196    561    0.762     ~67 sec   ~53 sec   ~44 sec   ~45 sec
s1238    540    0.799     ~72 sec   ~49 sec   ~35 sec   ~40 sec

Page 45

Type III Parallelization

Parallel cooperating searches communicate their best solutions through a master, modeled after parallel SA: a Metropolis loop can be equated with a SimE compound move. The intelligence of SimE can benefit from the best solution produced among all searches, which may lead to more rapid convergence to better-quality solutions.

Page 46

Type III Parallelization Results

The results are quite contrary to expectations: the highest quality is inferior to the serial algorithm, and no speedups are observed. Parallel searches used as-is do not help. A possible explanation: the greedy behavior of the parallel searches is counterproductive to SimE's intelligence (inability to escape local minima).

SimE Type III Parallelization
Circuit  Cells  Quality   P=1       P=3       P=5       P=7
s1196    561    0.694     ~58 sec   ~57 sec   ~60 sec   ~61 sec
s1238    540    0.709     ~65 sec   ~66 sec   ~64 sec   ~67 sec

Page 47

Improving Type III Parallelization

Proposed schemes for improved SimE parallel searches:

1. Vary the frequency of communication with time: exchanges can be frequent initially, with the frequency decreasing to allow more diversification.

2. Intelligently combine the good elements of each solution to get a new starting solution: the best location for each cell (or cluster) can be identified by examining and comparing goodness values among the solutions received from peers, constructing a good solution to improve further.
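Scheme 2 could be sketched as below: for each cell, take the location from whichever peer solution gives that cell the highest goodness. The goodness stand-in is an assumption, and resolving two cells landing on the same slot is deliberately left out of this sketch:

```python
def combine_solutions(solutions, goodness):
    """Build a new starting solution cell by cell from the best peer.
    Overlap resolution (two cells mapped to one slot) is not handled here."""
    combined = []
    for cell in range(len(solutions[0])):
        best = max(solutions, key=lambda s: goodness(s, cell))
        combined.append(best[cell])
    return combined

# Two peer placements of 3 cells; goodness says cell i is happiest at slot i
sols = [[0, 2, 1], [1, 1, 2]]
goodness = lambda s, i: 1.0 if s[i] == i else 0.0
merged = combine_solutions(sols, goodness)
```

The merged solution can then be handed back to every search as the common starting point for the next round.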