Speculative Parallelism in Cilk++

68
Speculative Parallelism in Cilk++ Ruben Perez & Gregory Malecha MIT May 11, 2010 Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33

Transcript of Speculative Parallelism in Cilk++

Page 1: Speculative Parallelism in Cilk++

Speculative Parallelism in Cilk++

Ruben Perez & Gregory Malecha

MIT

May 11, 2010

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 1 / 33

Page 2: Speculative Parallelism in Cilk++

Parallelizing Embarrassingly Parallel Problems

Lots of search algorithms are embarrassingly parallel.

Search down all the paths of a tree.Search multiple disjoint branches in parallel.

No communcation overhead.Very little memory contention.

Since we often only need a single answer, we can stop once we’vefound any solution.

Cannonical algorithms are in game-search, minimax and α-β pruning.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 2 / 33

Page 3: Speculative Parallelism in Cilk++

2 Player Game Trees

Nodes represent game states.

Edges represent transitionsbetween states.

Leaf states are scored by aheuristic.

Goal is to maximize theheuristic against an adversary.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 3 / 33

Page 4: Speculative Parallelism in Cilk++

Parallelizing MiniMax

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 5: Speculative Parallelism in Cilk++

Parallelizing MiniMax

3

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 6: Speculative Parallelism in Cilk++

Parallelizing MiniMax

3 0

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 7: Speculative Parallelism in Cilk++

Parallelizing MiniMax

3 0

2

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 8: Speculative Parallelism in Cilk++

Parallelizing MiniMax

3 0

2 -1

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 9: Speculative Parallelism in Cilk++

Parallelizing MiniMax

3 0 -1

2 -1

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 10: Speculative Parallelism in Cilk++

Parallelizing MiniMax

3

3 0 -1

2 -1

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 4 / 33

Page 11: Speculative Parallelism in Cilk++

Improving MiniMax

MiniMax exhaustively searches the game tree to a certain depth(iterative deepening).

Not efficient for most games due to large branching factor.

Can’t search deep enough for the computer to be smart.

Intuition Keep track of the range of feasible scores and prunebranches that fall outside the range.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 5 / 33

Page 12: Speculative Parallelism in Cilk++

α-β Pruning

Keep additional information with each game node: α and β.

α is the lower bound on the player’s score.

β is the upper bound on the player’s score.

This pruning allows us to search twice as deep on average.

Let’s see an example...

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 6 / 33

Page 13: Speculative Parallelism in Cilk++

α-β Pruning

Keep additional information with each game node: α and β.

α is the lower bound on the player’s score.

β is the upper bound on the player’s score.

This pruning allows us to search twice as deep on average.

Let’s see an example...

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 6 / 33

Page 14: Speculative Parallelism in Cilk++

Pruning Example

[−∞,∞]

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 7 / 33

Page 15: Speculative Parallelism in Cilk++

Pruning Example

[3,∞]

3

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 7 / 33

Page 16: Speculative Parallelism in Cilk++

Pruning Example

[3,∞]

3 0

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 7 / 33

Page 17: Speculative Parallelism in Cilk++

Pruning Example

[3,∞]

3 0 [3,∞]

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 7 / 33

Page 18: Speculative Parallelism in Cilk++

Pruning Example

[3,∞]

3 0 [3,∞]

2

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 7 / 33

Page 19: Speculative Parallelism in Cilk++

Pruning Example

[3,∞]

3 0 [3, 2]

2

Prune!

max

min

max

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 7 / 33

Page 20: Speculative Parallelism in Cilk++

A Recipe for Speculation

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 8 / 33

Page 21: Speculative Parallelism in Cilk++

A Recipe for Speculation

A Recipe for Speculation

When we speculate, we don’t want to have to commit to somethingfinishing.

Need a way to abort currently running computations that we don’tneed anymore.For consistency, we’d like to be able to abort computations that arespeculating.

We also need a way to combine the results.

Reducers would work for this, but they will need to be able to callabort.Older versions of cilk had another mechanism for this.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 9 / 33

Page 22: Speculative Parallelism in Cilk++

A Recipe for Speculation

Speculation in Cilk5

Speculation is just based on spawn.

Combining results done through inlets.

inlets are functions that merge results together (similar to reducers).inlets can also make the choice to abort computations.

Let’s look at a simple example.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 10 / 33

Page 23: Speculative Parallelism in Cilk++

A Recipe for Speculation

Simple Example: Native abort & Inlets

Imagine you have two computations that should yield the same result,but one could take significantly longer to compute.

Speculatively execute both and abort when the first finishes.

i n t l o ng compu ta t i on 1 ( vo id ∗ a r g s ) ;i n t l o ng compu ta t i on 2 ( vo id ∗ a r g s ) ;

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {i n t x ;inlet void reduce(int r) {

x = r;

abort;}reduce(c i l k spawn l o ng compu ta t i on 1 ( a rg s1 )) ;reduce(c i l k spawn l o ng compu ta t i on 2 ( a rg s2 )) ;c i l k s y n c ;re tu rn x ;

}

No support for abort or inlet in cilk++.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 11 / 33

Page 24: Speculative Parallelism in Cilk++

A Recipe for Speculation

Simple Example: Native abort & Inlets

Imagine you have two computations that should yield the same result,but one could take significantly longer to compute.

Speculatively execute both and abort when the first finishes.

i n t l o ng compu ta t i on 1 ( vo id ∗ a r g s ) ;i n t l o ng compu ta t i on 2 ( vo id ∗ a r g s ) ;

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {i n t x ;inlet void reduce(int r) {

x = r;

abort;}reduce(c i l k spawn l o ng compu ta t i on 1 ( a rg s1 )) ;reduce(c i l k spawn l o ng compu ta t i on 2 ( a rg s2 )) ;c i l k s y n c ;re tu rn x ;

}

No support for abort or inlet in cilk++.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 11 / 33

Page 25: Speculative Parallelism in Cilk++

A Recipe for Speculation

Simple Example: Native abort & Inlets

Imagine you have two computations that should yield the same result,but one could take significantly longer to compute.

Speculatively execute both and abort when the first finishes.

i n t l o ng compu ta t i on 1 ( vo id ∗ a r g s ) ;i n t l o ng compu ta t i on 2 ( vo id ∗ a r g s ) ;

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {i n t x ;inlet void reduce(int r) {

x = r;

abort;}reduce(c i l k spawn l o ng compu ta t i on 1 ( a rg s1 )) ;reduce(c i l k spawn l o ng compu ta t i on 2 ( a rg s2 )) ;c i l k s y n c ;re tu rn x ;

}

No support for abort or inlet in cilk++.Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 11 / 33

Page 26: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 12 / 33

Page 27: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Handling inlets

Inlets are local functions that merge the result of spawnedcomputations.

Serve a similar purpose to reducers, but a little more general.Get access to the parent function’s stack frame.

Semanticsinlets are locally serial, i.e. all of the inlets for a particular stackframe will run serially.The order of executing them is non-deterministic, implementationbased on the amount of time that the computations take.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 13 / 33

Page 28: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Handling inlets

i n t l o ng compu ta t i on 1 ( vo id ∗ a r g s ) ;i n t l o ng compu ta t i on 2 ( vo id ∗ a r g s ) ;

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {i n t x ;i n l e t vo id r educe ( i n t r ) {

x = r ;abort ;

}r educe ( c i l k spawn l o ng compu ta t i on 1 ( a rg s1 ) ) ;r educe ( c i l k spawn l o ng compu ta t i on 2 ( a rg s2 ) ) ;c i l k s y n c ;re tu rn x ;

}

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 14 / 33

Page 29: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Handling inlets – via Translation

Translate inlets into continuations.

Not a particularly painful translation.It can be annoying to work with continuations in C code.Ideally this would be a compilation step.

Mostly mechanical translation preserves semantics

Depending on how they are used, it might be more efficient to usereducers.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 15 / 33

Page 30: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

cilk5 Code

i n t f 1 ( vo id ∗ a r g s ) ;i n t f 2 ( vo id ∗ a r g s ) ;

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {i n t x ;i n l e t vo id r educe ( i n t r ) {

x = r ;abort ;

}r educe ( c i l k spawn f 1 ( a rg s1 ) ) ;r educe ( c i l k spawn f 2 ( a rg s2 ) ) ;c i l k s y n c ;re tu rn x ;

}

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 16 / 33

Page 31: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Translated cilk++ Code

s t r u c t I n l e t E n v { c i l k : : mutex m; i n t x ; } ;

i n t f 1 ( vo id ∗ a r g s, int(*cont)(int, InletEnv*), InletEnv* env ) ;i n t f 2 ( vo id ∗ a r g s, int(*cont)(int, InletEnv*), InletEnv* env ) ;

i n t f i r s t i n l e t ( i n t r e s u l t , I n l e t E n v ∗ env ) {env−>m. l o c k ( ) ; // S e r i a l e x e c u t i o nenv−>x = r e s u l t ;abort ;env−>m. un lock ( ) ; // S e r i a l e x e c u t i o n

}

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {InletEnv env ;c i l k spawn f 1 ( a rg s1, first inlet, env ) ;c i l k spawn f 2 ( a rg s2, first inlet, env ) ;c i l k s y n c ;re tu rn env.x ;

}

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 17 / 33

Page 32: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Translated cilk++ Code

s t r u c t I n l e t E n v { c i l k : : mutex m; i n t x ; } ;

i n t f 1 ( vo id ∗ a r g s, int(*cont)(int, InletEnv*), InletEnv* env ) ;i n t f 2 ( vo id ∗ a r g s, int(*cont)(int, InletEnv*), InletEnv* env ) ;

i n t f i r s t i n l e t ( i n t r e s u l t , I n l e t E n v ∗ env ) {env−>m. l o c k ( ) ; // S e r i a l e x e c u t i o nenv−>x = r e s u l t ;abort ;env−>m. un lock ( ) ; // S e r i a l e x e c u t i o n

}

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {InletEnv env ;c i l k spawn f 1 ( a rg s1, first inlet, env ) ;c i l k spawn f 2 ( a rg s2, first inlet, env ) ;c i l k s y n c ;re tu rn env.x ;

}

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 17 / 33

Page 33: Speculative Parallelism in Cilk++

A Recipe for Speculation Porting the Example

Translated cilk++ Code

s t r u c t I n l e t E n v { c i l k : : mutex m; i n t x ; } ;

i n t f 1 ( vo id ∗ a r g s, int(*cont)(int, InletEnv*), InletEnv* env ) ;i n t f 2 ( vo id ∗ a r g s, int(*cont)(int, InletEnv*), InletEnv* env ) ;

i n t f i r s t i n l e t ( i n t r e s u l t , I n l e t E n v ∗ env ) {env−>m. l o c k ( ) ; // S e r i a l e x e c u t i o nenv−>x = r e s u l t ;abort ; Still need to handle abort

env−>m. un lock ( ) ; // S e r i a l e x e c u t i o n}

i n t f i r s t ( vo id ∗ args1 , vo id ∗ a rg s2 ) {InletEnv env ;c i l k spawn f 1 ( a rg s1, first inlet, env ) ;c i l k spawn f 2 ( a rg s2, first inlet, env ) ;c i l k s y n c ;re tu rn env.x ;

}

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 17 / 33

Page 34: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 18 / 33

Page 35: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

abort Features

We saw a notion of global abort in Lab 5.

When you abort, you abort everything, completely done.

This is not compositional.

Compositionality

Compositionality requires the ability to abort speculating computations.Need an abort hierarchy.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 19 / 33

Page 36: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

Hierarchical abort

Abort any subtree of the computation without affecting the rest ofthe computations.

0

0 0

0 0

0 0

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 20 / 33

Page 37: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

Hierarchical abort

Abort any subtree of the computation without affecting the rest ofthe computations.

0

X 0

0 0

0 0

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 20 / 33

Page 38: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

Hierarchical abort

Abort any subtree of the computation without affecting the rest ofthe computations.

0

X 0

X X

X X

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 20 / 33

Page 39: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 40: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 41: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 42: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 43: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 44: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?Yes

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 45: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 46: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 47: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 48: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

X 0

0 0Should I abort?

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 49: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

X 0

X 0Should I abort?Yes

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 50: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 51: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

0 0

0 0

Abort!

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 52: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

X 0

0 0

Abort!

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 53: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

X 0

X 0

Abort!

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 54: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

X 0

X X

Abort!

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 55: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

User-Implemented abort

Implement using polling...

Two possible implementations:1 Poll up toward the root.

Also, lazily copy values downward.

2 Abort down, push the abort flag down the tree

0

X 0

X X

X X

Abort!

0 0

0 0

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 21 / 33

Page 56: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

High-level Performace

Poll-UpPolling is expensive (linear in depth of node)abort is cheap (constant time, single memory access).

Really simple implementation.

Abort-DownPolling is cheap (constant time, single memory access).abort is expensive (linear in the size of the subtree).

Code is much more complex.Some extra overhead, currently using a locking implementation.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 22 / 33

Page 57: Speculative Parallelism in Cilk++

A Recipe for Speculation Implementing abort

High-level Performace

Poll-UpPolling is expensive (linear in depth of node)abort is cheap (constant time, single memory access).Really simple implementation.

Abort-DownPolling is cheap (constant time, single memory access).abort is expensive (linear in the size of the subtree).Code is much more complex.Some extra overhead, currently using a locking implementation.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 22 / 33

Page 58: Speculative Parallelism in Cilk++

Evaluation

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 23 / 33

Page 59: Speculative Parallelism in Cilk++

Evaluation Performance

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 24 / 33

Page 60: Speculative Parallelism in Cilk++

Evaluation Performance

Different Flavors of Abort

Compare the different abort techniques.Build a tree of height 14 and branching factor of 4.Poll at each leaf 100 times.Abort the root.Poll at each leaf.

Page 1

Poll Up Poll Up (Caching) Poll Down

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

Distribution of Time

Tree Height 14, Branching Factor 4

Aborted Poll(x1)AbortPoll (x100)Build

Abort Method

Pro

cess

or

Tic

ks (

x10

00

00

0)

Note that abort is serial.Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 25 / 33

Page 61: Speculative Parallelism in Cilk++

Evaluation Performance

Different Flavors of Abort

Compare the different abort techniques.Build a tree of height 20 and branching factor of 2.Poll at each leaf 100 times.Abort the root.Poll at each leaf.

Page 1

Poll Up Poll Up (Caching) Poll Down

0

1000

2000

3000

4000

5000

6000

Distribution of Time

Tree Height 20, Branching Factor 2

Aborted Poll(x1)AbortPoll (x100)Build

Abort Method

Pro

cess

or

Clo

ck T

icks

(x1

00

00

00

)

Note that abort is serial.Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 26 / 33

Page 62: Speculative Parallelism in Cilk++

Evaluation Performance

Polling Granularity Effects – Nodes Explored

We’re running our ported pousse code (with some instrumentation).

Granularity N doesn’t poll at the lowest N levels of the tree.

Base only checks abort in the loop that spawns the childcomputations.

Page 1

Granularity 3 Granularity 4 Granularity 5 Base

54000

56000

58000

60000

62000

64000

66000

Total Number of Nodes Explored

Polling Granularity

# o

f No

de

s (x

10

00

)

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 27 / 33

Page 63: Speculative Parallelism in Cilk++

Evaluation Performance

Polling Granularity Effects – # of Polls

We’re running our ported pousse code (with some instrumentation).

Granularity N doesn’t poll at the lowest N levels of the tree.

Base only checks abort in the loop that spawns the childcomputations.

Page 1

Granularity 3 Granularity 4 Granularity 5 Base

232000

234000

236000

238000

240000

242000

244000

246000

248000

Number of Polls

Polling Strategy

# o

f Po

lls (

x10

00

)

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 28 / 33

Page 64: Speculative Parallelism in Cilk++

Evaluation Performance

Polling Granularity Effects – # of Aborts

We’re running our ported pousse code (with some instrumentation).

Granularity N doesn’t poll at the lowest N levels of the tree.

Base only checks abort in the loop that spawns the childcomputations.

Page 1

Granularity 3 Granularity 4 Granularity 5 Base

0

1000

2000

3000

4000

5000

6000

Number of Aborts

Polling Strategy

# o

f Ab

ort

s (x

10

00

)

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 29 / 33

Page 65: Speculative Parallelism in Cilk++

Evaluation Complexity Porting Old Code

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 30 / 33

Page 66: Speculative Parallelism in Cilk++

Evaluation Complexity Porting Old Code

Experience Porting Cilk Code

We ported Cilk Pousse 1 from cilk to cilk++.

LinesCilk Pousse 956Cilk++ Pousse 1011

Increase ≈5.07%

Pretty simple with the inlet translation.Most annoying part is adding the calls to poll.

This can use some tuning.

Code is a little more difficult to read, but not too bad.

The real problem is that this changes the interface.Need to pass around the abort object.If it is used with an inlet, need to add continuation.Whole-code transformation is bad.

1people.csail.mit.edu/pousse/Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 31 / 33

Page 67: Speculative Parallelism in Cilk++

Conclusions

Outline

1 A Recipe for SpeculationPorting the ExampleImplementing abort

2 EvaluationPerformanceComplexity Porting Old Code

3 Conclusions

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 32 / 33

Page 68: Speculative Parallelism in Cilk++

Conclusions

Speculations on Speculative Parallelism

Speculative parallelism is definitely implementable as a library.

Native runtime support might be more efficient/nicer.

Lack of some abstraction features makes using it at the high-levelrequire interface changes.

...but, using it locally at the leaves does not leak into the rest of thecode.

Ruben Perez & Gregory Malecha (MIT) Speculative Parallelism in Cilk++ May 11, 2010 33 / 33