Towards Adaptive Caching for Parallel and Distributed Simulation


Page 1: Towards Adaptive Caching for Parallel and Distributed Simulation

Maria Hybinette, UGA 1

Towards Adaptive Caching for

Parallel and Distributed Simulation

Abhishek Chugh & Maria Hybinette

Computer Science Department

The University of Georgia

WSC-2004

Page 2: Towards Adaptive Caching for Parallel and Distributed Simulation

[Example: airspace simulation, Atlanta-Munich]

Simulation Model Assumptions

Collection of Logical Processes (LPs)
Assume LPs do not share state variables
Communicate by exchanging time-stamped messages

[Diagram: LPs exchanging time-stamped messages]

Page 3: Towards Adaptive Caching for Parallel and Distributed Simulation

Problem & Goal

Problem: Inefficiency in PDES: redundant computations

Observation: Computations repeat:
» Long simulation runs
» Cyclic systems
» Communication network simulations

Goal: Increase efficiency by reusing computations

Page 4: Towards Adaptive Caching for Parallel and Distributed Simulation

Approach

Cache computations and re-use them when they repeat, instead of re-computing.

[Diagram: LPs exchanging messages through the cache]

Page 5: Towards Adaptive Caching for Parallel and Distributed Simulation

Approach: Adaptive Caching

Cache computations and re-use them when they repeat, instead of re-computing.

Generic caching mechanism independent of simulation engine and application

Caveat: Several factors impact the effectiveness of caching

» Proposal: an adaptive approach

[Diagram: LPs, messages, and the cache]

Page 6: Towards Adaptive Caching for Parallel and Distributed Simulation

Factors Affecting Caching Effectiveness

Cache size
Cost of looking up and updating the cache
Execution time of the computation
Probability of a hit: the hit rate

Page 7: Towards Adaptive Caching for Parallel and Distributed Simulation

Effective Caching Cost

E(Cost_use_cache) =
    hit_rate * Cost_lookup_hit
  + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)

Page 8: Towards Adaptive Caching for Parallel and Distributed Simulation

Caching is Not Always a Good Idea

E(Cost_use_cache) =
    hit_rate * Cost_lookup_hit
  + (1 - hit_rate) * (Cost_lookup_miss + Cost_computation + Cost_insert)

When the hit rate is low, or the computation is very fast, caching loses.
Caching is worthwhile only when Cost_use_cache < Cost_computation.

Page 9: Towards Adaptive Caching for Parallel and Distributed Simulation

How Much Speedup is Possible?

Neglecting cache warm-up and fixed costs:

Expected speedup = Cost_computation / Cost_use_cache

Upper bound (hit_rate = 1) = Cost_computation / Cost_lookup

In our experiments, Cost_computation / Cost_lookup ≈ 3.5

Page 10: Towards Adaptive Caching for Parallel and Distributed Simulation

Related Work

Function caching: replace application-level function calls with cache queries
» Introduced by Bellman (1957); Michie (1968)
» Incremental computations: Pugh & Teitelbaum (1989); Liu & Teitelbaum (1995)
» Sequential discrete-event simulation: Staged Simulation, Walsh & Sirer (2003): function caching + currying (breaking up computations), re-ordering, and pre-computation

Decision tool techniques for PADS: multiple runs of similar simulations
» Simulation cloning: Hybinette & Fujimoto (1998); Chen, Turner, et al. (2002); Straßburger (2000)
» Updateable simulations: Ferenci et al. (2002)

Related optimization techniques
» Lazy re-evaluation: West (1988)

Page 11: Towards Adaptive Caching for Parallel and Distributed Simulation

Overview of Adaptive Caching

At execution time:

1. Warm-up phase, for each function:
   a) Monitor: hit rate, query time, function run time
   b) Determine the utility of using the cache

2. Main execution phase, for each function:
   a) Use the cache (or not), depending on the results from 1
   b) Randomly sample hit rate, query time, function run time
      » Revise the decision if conditions change

Page 12: Towards Adaptive Caching for Parallel and Distributed Simulation

What’s New

The decision to use the cache is made dynamically
» in response to unpredictable local conditions for each LP at execution time

Relieves the user of having to know whether something is worth caching
» the adaptive method automatically identifies caching opportunities and rejects poor caching choices

Easy-to-use caching API
» independent of application or simulation kernel
» cache middleware

Distributed cache
» each LP maintains its own independent cache

Page 13: Towards Adaptive Caching for Parallel and Distributed Simulation

Pseudo-Code Example

// LP CODE WITH CACHING
LP_init()
{
    cacheInitialize(argc, argv);
}


Page 15: Towards Adaptive Caching for Parallel and Distributed Simulation

Pseudo-Code Example

// LP CODE WITH CACHING
LP_init()
{
    cacheInitialize(argc, argv);
}

Proc(state, msg, MyPE)
{
    retval = cacheCheckStart(currentstate, event);
    if (retval == NULL)
    {
        /* original LP code: compute new state and events to be scheduled */

        /* allow the cache to save the results */
        cacheCheckEnd(newstate, newevents);
    }
    else
    {
        newstate = retval.state;
        newevents = retval.events;
    }
    schedule(newevents);
}


Page 17: Towards Adaptive Caching for Parallel and Distributed Simulation

Implementation

Page 18: Towards Adaptive Caching for Parallel and Distributed Simulation

Caching Middleware

Simulation Application

Cache Middleware

Simulation Kernel

Page 19: Towards Adaptive Caching for Parallel and Distributed Simulation

Caching Middleware (Hit)

Simulation Application

Cache Middleware

Simulation Kernel

Check cache (state/message) -> cache hit

Page 20: Towards Adaptive Caching for Parallel and Distributed Simulation

Caching Middleware (Miss)

Simulation Application

Cache Middleware

Simulation Kernel

Check cache (state/message) -> cache miss

Miss, or cache lookup expensive

On a miss: cache the new state & message

Page 21: Towards Adaptive Caching for Parallel and Distributed Simulation

Cache Implementation

Hash table with separate chaining
Input: current state & message
Output: state and output message(s)
Hash function: djb2 (by Dan Bernstein; used in Perl)

Page 22: Towards Adaptive Caching for Parallel and Distributed Simulation

Memory Management

Distributed cache: one for each LP
Pre-allocate a memory pool for each LP's cache during the initialization phase
The upper limit is parameterized

Page 23: Towards Adaptive Caching for Parallel and Distributed Simulation

Experiments

3 sets of experiments with P-Hold:
» Proof of concept (no adaptive caching): hit rate
» Evaluation of the impact of cache size and simulation running time on speedup (no caching / caching)
» Evaluation of adaptive caching with regard to the cost of event computation

16-processor SGI Origin 2000
» 4 processors used

Time stamps "curried" out

Page 24: Towards Adaptive Caching for Parallel and Distributed Simulation

[Figure: hit rate (%) vs. progress (simulated time), for cache sizes 90 KB (10%), 25000 KB (25%), and 10000 KB (100%)]

Hit Rate versus Progress

As expected, the hit rate increases as the cache size increases
The maximum hit rate is reached with the large cache
The hit rate sets an upper bound on speedup

Page 25: Towards Adaptive Caching for Parallel and Distributed Simulation

Speedup vs Cache Size

[Figure: speedup (no caching / caching) vs. cache size (KB), for 5 msec and 3 msec event computations]

Speedup improves as the size of the cache increases
Beyond a size of 9,000 KB, speedup declines and levels off
Performance is better for simulations whose computations have higher latency

Page 26: Towards Adaptive Caching for Parallel and Distributed Simulation

Speedup vs Cost_computation

Non-adaptive caching suffers a slowdown (speedup of 0.82) for low-latency computations; the speedup improves to 1 as the computational latency approaches 1.5 msec.

[Figure: speedup (caching / no caching) vs. computational latency (msec), non-adaptive]

Page 27: Towards Adaptive Caching for Parallel and Distributed Simulation

Speedup vs Cost_computation

Adaptive caching tracks the cost of consulting the cache compared with the cost of running the actual computation.

Adaptive caching holds the speedup at 1 for small computational latencies (it selects performing the computation instead of consulting the cache).

[Figure: speedup (caching / no caching) vs. computational latency (msec), non-adaptive vs. adaptive]

Page 28: Towards Adaptive Caching for Parallel and Distributed Simulation

Summary & Future Work

Summary:
Middleware implementation that requires no major structural revision of application code
Best-case speedup approaches 3.5; worst-case speedup is 1 (speedup is limited by a hit rate of 70%)
With randomly generated information (such as time stamps), caching may become ineffective unless precautions are taken

Future work:
Function caching instead of LP caching
Look at series of functions to jump forward
Adaptive replacement strategies

Page 29: Towards Adaptive Caching for Parallel and Distributed Simulation

Closing

“A sword wielded poorly will kill its owner”

-- Ancient Proverb

Page 30: Towards Adaptive Caching for Parallel and Distributed Simulation

Pseudo-Code Example

// ORIGINAL LP CODE
LP_init()
{
    //
    //
    //
    //
}

Proc(state, msg, MyPE)
{
    val1 = fancy_function(msg->param1, state->key_part);
    val2 = fancier_function(msg->param3);
    state->key_part = val1 + val2;
}


Page 32: Towards Adaptive Caching for Parallel and Distributed Simulation

Pseudo-Code Example

// ORIGINAL LP CODE
LP_init()
{
    //
    //
    //
    //
}

Proc(state, msg, MyPE)
{
    val1 = fancy_function(msg->param1, state->key_part);
    val2 = fancier_function(msg->param3);
    state->key_part = val1 + val2;
}

// LP CODE WITH CACHING
LP_init()
{
    cache_init(FF1, SIZE1, 2, fancy_function);
    cache_init(FF2, SIZE2, 1, fancier_function);
}

Proc(state, msg, MyPE)
{
    val1 = cache_query(FF1, msg->param1, state->key_part);
    val2 = cache_query(FF2, msg->param3);
    state->key_part = val1 + val2;
}

Page 33: Towards Adaptive Caching for Parallel and Distributed Simulation

Approach

Cache computations and re-use them when they repeat, instead of re-computing.

[Diagram: collection of LPs]