Functional programming and parallelism (conal.net/talks/fp-parallel-2016.pdf)

Functional programming and parallelism Conal Elliott May 2016


Functional programming and parallelism

Conal Elliott

May 2016


What makes a language good for parallelism?

. . .

Conal Elliott Functional programming and parallelism May 2016 2 / 61


What makes a language bad for parallelism?

Sequential bias

Primitive: assignment (state change)

Composition: sequential execution

“Von Neumann” languages (Fortran, C, Java, Python, . . . )

Over-linearizes algorithms.

Hard to isolate accidental sequentiality.


Can we fix sequential languages?

Throw in parallel composition.

Oops:

Nondeterminism

Deadlock

Intractable reasoning


Can we un-break sequential languages?

Perfection is achieved not when there is nothing left to add, but when there is nothing left to take away.

Antoine de Saint-Exupéry


Applications perform zillions of simple computations.

Compute all at once?

Oops — dependencies.

Minimize dependencies!


Dependencies

Three sources:

1 Problem

2 Algorithm

3 Language

Goals: eliminate #3, and reduce #2.


Dependency in sequential languages

Built into sequencing: A; B

Semantics: B begins where A ends.

Why sequence?


Idea: remove all state

And, with it,

mutation (assignment),

sequencing,

statements.

Expression dependencies are specific & explicit.

Remainder can be parallel.

Contrast: “A; B” vs “A + B” vs “(A + B) × C”.


Programming without state

Programming is calculation/math:

Precise & tractable reasoning (algebra),

. . . including optimization/transformation.

No loss of expressiveness!

“Functional programming” (value-oriented)

Like arithmetic on big values


Sequential sum

C:

    int sum(int arr[], int n) {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += arr[i];
        return acc;
    }

Haskell:

    sum = sumAcc 0
      where
        sumAcc acc []       = acc
        sumAcc acc (a : as) = sumAcc (acc + a) as


Refactoring

    sum = foldl (+) 0
      where
        foldl op acc []       = acc
        foldl op acc (a : as) = foldl op (acc `op` a) as


Right alternative

    sum = foldr (+) 0
      where
        foldr op e []       = e
        foldr op e (a : as) = a `op` foldr op e as
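To make the two association orders visible, the folds can be run with a string-building operator in place of +. This is only a small sketch; the names plus, leftShape, and rightShape are illustrative and do not come from the talk.

```haskell
-- Build a textual picture of the expression instead of computing a number.
plus :: String -> String -> String
plus x y = "(" ++ x ++ " + " ++ y ++ ")"

-- Left fold: parentheses nest to the left.
leftShape :: [String] -> String
leftShape = foldl plus "0"

-- Right fold: parentheses nest to the right.
rightShape :: [String] -> String
rightShape = foldr plus "0"
```

For the input ["a", "b", "c"], leftShape gives (((0 + a) + b) + c) and rightShape gives (a + (b + (c + 0))). Either way each addition depends on the previous one, so both are linear dependency chains.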


Sequential sum — left

[Figure: circuit diagram of the left-associated sum, with 32-bit inputs (In), a constant 0, + nodes, and a 32-bit output (Out).]


Sequential sum — right

[Figure: circuit diagram of the right-associated sum, with 32-bit inputs (In), a constant 0, + nodes, and a 32-bit output (Out).]


Parallel sum — how?

Left-associated sum:

sum [a, b, ..., z ] ≡ (...((0 + a) + b)...) + z

How to parallelize?

Divide and conquer?


Balanced data

data Tree a = L a | B (Tree a) (Tree a)

Sequential:

    sum = sumAcc 0
      where
        sumAcc acc (L a)   = acc + a
        sumAcc acc (B s t) = sumAcc (sumAcc acc s) t

Again, sum = foldl (+) 0.

Parallel:

    sum (L a)   = a
    sum (B s t) = sum s + sum t

Equivalent? Why?
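One way to explore the "Equivalent? Why?" question is to generalize both tree sums over the operator and compare them. The sketch below assumes hypothetical names foldSeq, foldPar, and example. The two versions agree for + with 0 because + is associative and 0 is its identity; they disagree for subtraction, which is not associative.

```haskell
data Tree a = L a | B (Tree a) (Tree a)

-- Sequential: thread an accumulator through the leaves, left to right.
foldSeq :: (b -> a -> b) -> b -> Tree a -> b
foldSeq op acc (L a)   = acc `op` a
foldSeq op acc (B s t) = foldSeq op (foldSeq op acc s) t

-- Divide and conquer: the two recursive calls do not depend on each other.
foldPar :: (a -> a -> a) -> Tree a -> a
foldPar _  (L a)   = a
foldPar op (B s t) = foldPar op s `op` foldPar op t

-- A small balanced tree holding 1, 2, 3, 4.
example :: Tree Int
example = B (B (L 1) (L 2)) (B (L 3) (L 4))
```

With addition both folds give 10. With subtraction, foldSeq (-) 0 example computes (((0 - 1) - 2) - 3) - 4 = -10, while foldPar (-) example computes (1 - 2) - (3 - 4) = 0.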

Balanced tree sum — depth 4

[Figure: circuit diagram of the balanced tree sum at depth 4, with 32-bit inputs (In), + nodes, and a 32-bit output (Out).]


Balanced computation

Generalize beyond +, 0.

When valid?


Associative folds

Monoid: type with associative operator & identity.

    fold :: Monoid a ⇒ [a] → a

Not just lists:

    fold :: (Foldable f, Monoid a) ⇒ f a → a

Balanced data structures lead to balanced parallelism.


Two associative folds

    fold :: Monoid a ⇒ [a] → a
    fold []       = ∅
    fold (a : as) = a ⊕ fold as

    fold :: Monoid a ⇒ Tree a → a
    fold (L a)   = a
    fold (B s t) = fold s ⊕ fold t

Derivable automatically from types.
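As one concrete reading of "derivable automatically from types", GHC's DeriveFoldable extension can generate the tree fold. This is an assumption about mechanism, since the talk may have a different generic-programming route in mind, and the names example, total, prod, and leaves are illustrative. Note that GHC still evaluates the derived fold sequentially; the sketch only shows that the fold follows the shape of the type.

```haskell
{-# LANGUAGE DeriveFoldable #-}

import Data.Monoid (Product (..), Sum (..))

-- The Foldable instance (and hence foldMap) is generated from the type.
data Tree a = L a | B (Tree a) (Tree a)
  deriving (Show, Foldable)

example :: Tree Int
example = B (B (L 1) (L 2)) (B (L 3) (L 4))

-- The same traversal, reused with three different monoids.
total :: Int
total = getSum (foldMap Sum example)

prod :: Int
prod = getProduct (foldMap Product example)

leaves :: [Int]
leaves = foldMap (: []) example
```

Here total is 10, prod is 24, and leaves is [1, 2, 3, 4]; only the choice of monoid changes.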


Trickier algorithm: prefix sums

C:

    int prefixSums(int arr[], int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            int next = arr[i];
            arr[i] = sum;
            sum += next;
        }
        return sum;
    }

Haskell:

prefixSums = scanl (+) 0
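To relate the one-line Haskell definition to the C loop: scanl (+) 0 returns every running total including the final one, whereas the C version stores the first n running totals in the array and returns the last separately. A small sketch that splits the list the same way (the name prefixSumsWithTotal is illustrative):

```haskell
-- Exclusive prefix sums paired with the overall total, mirroring the C
-- function, which overwrites the array and returns the total.
prefixSumsWithTotal :: Num a => [a] -> ([a], a)
prefixSumsWithTotal xs = (init sums, last sums)
  where
    -- scanl always yields at least the initial 0, so init/last are safe.
    sums = scanl (+) 0 xs
```

For [1, 2, 3, 4], scanl (+) 0 produces [0, 1, 3, 6, 10], so the result is ([0, 1, 3, 6], 10).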


Sequence prefix sum

[Figure: circuit diagram of the sequential prefix sum, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Sequential prefix sums on trees

prefixSums = scanl (+) 0

    scanl op acc (L a)   = (L acc, acc `op` a)
    scanl op acc (B u v) = (B u' v', vTot)
      where
        (u', uTot) = scanl op acc u
        (v', vTot) = scanl op uTot v
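A self-contained version of this sequential tree scan, runnable as written, might look like the following. The function is renamed scanlT to avoid clashing with the Prelude's list scanl; that name, prefixSumsT, and example are assumptions for illustration.

```haskell
data Tree a = L a | B (Tree a) (Tree a)
  deriving (Eq, Show)

-- Left scan on a tree: returns the tree of prefixes and the final total.
-- The right subtree cannot start until the left subtree's total is known.
scanlT :: (b -> a -> b) -> b -> Tree a -> (Tree b, b)
scanlT op acc (L a)   = (L acc, acc `op` a)
scanlT op acc (B u v) = (B u' v', vTot)
  where
    (u', uTot) = scanlT op acc u
    (v', vTot) = scanlT op uTot v

prefixSumsT :: Num a => Tree a -> (Tree a, a)
prefixSumsT = scanlT (+) 0

example :: Tree Int
example = B (B (L 1) (L 2)) (B (L 3) (L 4))
```

On example, prefixSumsT yields the tree with leaves 0, 1, 3, 6 and the total 10. The accumulator passes through every leaf in order, which is the dependency chain the diagrams show.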


Sequential prefix sums on trees — depth 2

[Figure: circuit diagram of the sequential prefix sum on a tree of depth 2, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Sequential prefix sums on trees — depth 3

[Figure: circuit diagram of the sequential prefix sum on a tree of depth 3, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Still very sequential.

Does associativity help, as it did with fold?


Parallel prefix sums on trees

On trees:

    scan (L a)   = (L ∅, a)
    scan (B u v) = (B u' (fmap adjust v'), adjust vTot)
      where
        (u', uTot) = scan u
        (v', vTot) = scan v
        adjust x   = uTot ⊕ x

If balanced, dependency depth O(log n), work O(n log n).

Can reduce work to O(n); see “Understanding efficient parallel scan”.

Generalizes from trees.

Automatic from type.
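The parallel tree scan above can be written out as a runnable sketch using the standard Monoid class, with mempty for ∅ and <> for ⊕. The names scanT, prefixSumsT, and example are illustrative. The code expresses only the dependency structure; ordinary GHC evaluation of it remains sequential.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Data.Monoid (Sum (..))

data Tree a = L a | B (Tree a) (Tree a)
  deriving (Eq, Show, Functor)

-- Both subtrees are scanned independently; the right-hand results are
-- then shifted by the left subtree's total.
scanT :: Monoid a => Tree a -> (Tree a, a)
scanT (L a)   = (L mempty, a)
scanT (B u v) = (B u' (fmap adjust v'), adjust vTot)
  where
    (u', uTot) = scanT u
    (v', vTot) = scanT v
    adjust x   = uTot <> x

-- Prefix sums via the Sum monoid.
prefixSumsT :: Tree Int -> (Tree Int, Int)
prefixSumsT t = (fmap getSum t', getSum tot)
  where
    (t', tot) = scanT (fmap Sum t)

example :: Tree Int
example = B (B (L 1) (L 2)) (B (L 3) (L 4))
```

On example this gives leaves 0, 1, 3, 6 and total 10, the same answer as the sequential scan. The fmap adjust pass touches every element of the right subtree at each level, which is where the O(n log n) work comes from.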


Balanced parallel prefix sums — depth 2

[Figure: circuit diagram of the balanced parallel prefix sum at depth 2, with 32-bit inputs (In), constant 0s, + nodes, and 32-bit outputs (Out).]


Balanced parallel prefix sums — depth 3

[Figure: circuit diagram of the balanced parallel prefix sum at depth 3, with 32-bit inputs (In), constant 0s, + nodes, and 32-bit outputs (Out).]


Balanced parallel prefix sums — depth 4

[Figure: circuit diagram of the balanced parallel prefix sum at depth 4, with 32-bit inputs (In), constant 0s, + nodes, and 32-bit outputs (Out).]


Balanced parallel prefix sums — depth 2, optimized

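[Figure: optimized circuit diagram of the balanced parallel prefix sum at depth 2, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]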


Balanced parallel prefix sums — depth 3, optimized

[Figure: optimized circuit diagram of the balanced parallel prefix sum at depth 3, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Balanced parallel prefix sums — depth 4, optimized

[Figure: optimized circuit diagram of the balanced parallel prefix sum at depth 4, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Why functional programming?

Parallelism

Correctness

Productivity


R&D agenda: elegant, massively parallel FP

Algorithm design:

Functional & richly typed

Parallel-friendly

Easily composable

Compiling for highly parallel execution:

Convert to algebraic vocabulary (CCC).

Interpret vocabulary as “circuits” (FPGA, silicon, GPU).

Other interpretations.


Composable data structures

Data structure tinker toys:

data Empty a = Empty

data Id a = Id a

data (f + g) a = L (f a) | R (g a)

data (f × g) a = Prod (f a) (g a)

data (g ◦ f ) a = O (g (f a))

Specify algorithm version for each.

Automatic, type-directed composition.
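The building blocks above can be approximated in plain GHC Haskell as follows. The operator names :+:, :*:, and :.: and the constructors InL and InR are ASCII stand-ins chosen for this sketch, not the talk's own code. Folding serves as the "algorithm", with GHC's deriving mechanism supplying a version for each building block so that composite types get theirs automatically.

```haskell
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor  #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Foldable (toList)

data Empty a        = Empty                   deriving (Functor, Foldable)
newtype Id a        = Id a                    deriving (Functor, Foldable)
data (f :+: g) a    = InL (f a) | InR (g a)   deriving (Functor, Foldable)
data (f :*: g) a    = Prod (f a) (g a)        deriving (Functor, Foldable)
newtype (g :.: f) a = O (g (f a))             deriving (Functor, Foldable)

-- Uniform pairs, and a pair of pairs (a depth-2 perfect binary leaf tree).
type Pair = Id :*: Id
type Quad = Pair :.: Pair

quad :: Quad Int
quad = O (Prod (Id (Prod (Id 1) (Id 2))) (Id (Prod (Id 3) (Id 4))))

quadSum :: Int
quadSum = sum quad

quadElems :: [Int]
quadElems = toList quad
```

No Foldable instance is written for Quad; it is assembled from the instances for Id, product, and composition, so quadSum is 10 and quadElems is [1, 2, 3, 4].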


Vectors

\[ \overbrace{\mathit{Id} \times \cdots \times \mathit{Id}}^{n \text{ times}} \]

Right-associated:

    type family RVec n where
      RVec Z     = Empty
      RVec (S n) = Id × RVec n

Left-associated:

    type family LVec n where
      LVec Z     = Empty
      LVec (S n) = LVec n × Id


Perfect binary leaf trees

\[ \overbrace{\mathit{Pair} \circ \cdots \circ \mathit{Pair}}^{n \text{ times}} \]

Right-associated:

    type family RBin n where
      RBin Z     = Id
      RBin (S n) = Pair ◦ RBin n

Left-associated:

    type family LBin n where
      LBin Z     = Id
      LBin (S n) = LBin n ◦ Pair

Uniform pairs:

type Pair = Id × Id


Generalized trees

\[ \overbrace{h \circ \cdots \circ h}^{n \text{ times}} \]

Right-associated:

    type family RPow h n where
      RPow h Z     = Id
      RPow h (S n) = h ◦ RPow h n

Left-associated:

    type family LPow h n where
      LPow h Z     = Id
      LPow h (S n) = LPow h n ◦ h

Binary:

    type RBin n = RPow Pair n
    type LBin n = LPow Pair n


Composing scans

    class LScan f where
      lscan :: Monoid a ⇒ f a → (f × Id) a

    pattern And1 fa a = Prod fa (Id a)

    instance LScan Empty where
      lscan fa = And1 fa ∅

    instance LScan Id where
      lscan (Id a) = And1 (Id ∅) a

    instance (LScan f, LScan g) ⇒ LScan (f × g) where
      lscan (Prod fa ga) = And1 (Prod fa' ga') gx
        where
          And1 fa' fx = lscan fa
          And1 ga' gx = adjust fx (lscan ga)


Composing scans

    instance (LScan g, LScan f, Zip g) ⇒ LScan (g ◦ f) where
      lscan (O gfa) = And1 (O (zipWith adjust tots' gfa')) tot
        where
          (gfa', tots)   = unzipAnd1 (fmap lscan gfa)
          And1 tots' tot = lscan tots

    adjust :: (Monoid a, Functor t) ⇒ a → t a → t a
    adjust a t = fmap (a ⊕) t
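The composed scan can be tried out with a simplified, self-contained variant. This sketch departs from the slides in several labeled ways: lscan returns an ordinary pair (prefixes, total) instead of (f × Id) a, so no pattern synonym or unzipAnd1 is needed; the Zip class here is a minimal hypothetical stand-in with a single method zipWithF; and the operators :*: and :.: are ASCII substitutes for × and ◦.

```haskell
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor  #-}
{-# LANGUAGE TypeOperators  #-}

import Data.Foldable (toList)
import Data.Monoid (Sum (..))

data Empty a        = Empty            deriving (Functor, Foldable)
newtype Id a        = Id a             deriving (Functor, Foldable)
data (f :*: g) a    = Prod (f a) (g a) deriving (Functor, Foldable)
newtype (g :.: f) a = O (g (f a))      deriving (Functor, Foldable)

-- Hypothetical minimal zipping class (stand-in for the slide's Zip).
class Functor f => Zip f where
  zipWithF :: (a -> b -> c) -> f a -> f b -> f c

instance Zip Id where
  zipWithF h (Id a) (Id b) = Id (h a b)

instance (Zip f, Zip g) => Zip (f :*: g) where
  zipWithF h (Prod fa ga) (Prod fb gb) =
    Prod (zipWithF h fa fb) (zipWithF h ga gb)

-- Left scan: exclusive prefixes plus the overall total.
class Functor f => LScan f where
  lscan :: Monoid a => f a -> (f a, a)

instance LScan Empty where
  lscan Empty = (Empty, mempty)

instance LScan Id where
  lscan (Id a) = (Id mempty, a)

-- Product: scan both halves independently, then shift the second half.
instance (LScan f, LScan g) => LScan (f :*: g) where
  lscan (Prod fa ga) = (Prod fa' (adjust fx ga'), fx <> gx)
    where
      (fa', fx) = lscan fa
      (ga', gx) = lscan ga

-- Composition: scan each inner structure, scan the totals, then adjust.
instance (LScan g, LScan f, Zip g) => LScan (g :.: f) where
  lscan (O gfa) = (O (zipWithF adjust tots' gfa'), tot)
    where
      scanned      = fmap lscan gfa
      gfa'         = fmap fst scanned
      tots         = fmap snd scanned
      (tots', tot) = lscan tots

adjust :: (Monoid a, Functor t) => a -> t a -> t a
adjust a = fmap (a <>)

type Pair = Id :*: Id

quad :: a -> a -> a -> a -> (Pair :.: Pair) a
quad a b c d = O (Prod (Id (Prod (Id a) (Id b))) (Id (Prod (Id c) (Id d))))

-- Prefix sums of 1, 2, 3, 4 on a pair of pairs.
demo :: ([Int], Int)
demo = (map getSum (toList prefixes), getSum total)
  where
    (prefixes, total) = lscan (fmap Sum (quad 1 2 3 4))
```

Here demo evaluates to ([0, 1, 3, 6], 10). The scan for Pair :.: Pair is never written by hand; it is assembled from the Id, product, and composition instances.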


Scan — RPow Pair N5

[Figure: circuit diagram of the scan on RPow Pair N5, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Scan — LPow Pair N5

[Figure: circuit diagram of the scan on LPow Pair N5, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Scan — RPow (LVec N3 ) N3

[Figure: circuit diagram of the scan on RPow (LVec N3) N3, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Scan — RPow (LPow Pair N2 ) N3

[Figure: circuit diagram of the scan on RPow (LPow Pair N2) N3, with 32-bit inputs (In), a constant 0, + nodes, and 32-bit outputs (Out).]


Polynomial evaluation

\[ a_0 \cdot x^0 + \cdots + a_n \cdot x^n \]

evalPoly coeffs x = coeffs · powers x

powers = lproducts ◦ pure

lproducts = underF Product lscan
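On plain lists the same idea can be sketched with the Prelude's scanl in place of the generic lscan. This is a list-based simplification, not the talk's functor-generic code: powers is an exclusive left scan of products over copies of x, and evalPoly is a dot product with the coefficients.

```haskell
-- powers x = [1, x, x^2, ...]: running products over repeated copies of x.
-- The list is infinite; laziness means only the needed prefix is computed.
powers :: Num a => a -> [a]
powers x = scanl (*) 1 (repeat x)

-- Dot product of the coefficients (lowest degree first) with the powers.
evalPoly :: Num a => [a] -> a -> a
evalPoly coeffs x = sum (zipWith (*) coeffs (powers x))
```

For example, evalPoly [1, 2, 3] 2 computes 1·1 + 2·2 + 3·4 = 17.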


Powers — RBin N4

[Figure: circuit diagram of powers on RBin N4, with a 32-bit input (In), a constant 1, × nodes, and 32-bit outputs (Out).]


Polynomial evaluation — RBin N4

[Figure: circuit diagram of polynomial evaluation on RBin N4, with 32-bit inputs (In), + and × nodes, and a 32-bit output (Out).]


Fast Fourier transform

DFT:

\[ X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} n k} \]

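FFT for N = N_1 · N_2 (Gauss / Cooley-Tukey):

\[ X_k = \sum_{n_1=0}^{N_1-1} \left[ e^{-\frac{2\pi i}{N} n_1 k_2} \right] \left( \sum_{n_2=0}^{N_2-1} x_{N_1 n_2 + n_1} \, e^{-\frac{2\pi i}{N_2} n_2 k_2} \right) e^{-\frac{2\pi i}{N_1} n_1 k_1} \]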
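As a reference point for the DFT definition, a direct O(N²) evaluation can be written over lists with Data.Complex. This is a naive sketch for checking results, not the talk's implementation; dft and approxEq are illustrative names.

```haskell
import Data.Complex (Complex (..), cis, magnitude)

-- Direct evaluation of X_k = sum over n of x_n * exp(-2*pi*i*n*k/N).
-- cis t is cos t :+ sin t, i.e. exp(i*t).
dft :: [Complex Double] -> [Complex Double]
dft xs =
  [ sum [ x * cis (-2 * pi * fromIntegral (n * k) / fromIntegral len)
        | (n, x) <- zip [0 :: Int ..] xs ]
  | k <- [0 .. len - 1] ]
  where
    len = length xs

-- Element-wise comparison up to a small floating-point tolerance.
approxEq :: [Complex Double] -> [Complex Double] -> Bool
approxEq as bs =
  length as == length bs && all ((< 1e-9) . magnitude) (zipWith (-) as bs)
```

A unit impulse [1, 0, 0, 0] transforms to four ones, and a constant signal [1, 1, 1, 1] transforms to [4, 0, 0, 0], up to rounding.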


Fast Fourier transform

    class FFT f where
      type FFO f :: ∗ → ∗
      fft :: RealFloat a ⇒ f (Complex a) → FFO f (Complex a)

    instance FFT Id where
      type FFO Id = Id
      fft = id

    instance FFT Pair where
      type FFO Pair = Pair
      fft (a :# b) = (a + b) :# (a − b)


FFT — composition (Gauss / Cooley-Tukey)

    instance ... ⇒ FFT (g ◦ f) where
      type FFO (g ◦ f) = FFO f ◦ FFO g
      fft = O ◦ traverse fft ◦ twiddle ◦ traverse fft ◦ transpose ◦ unO

    twiddle :: ... ⇒ g (f (Complex a)) → g (f (Complex a))
    twiddle = (zipWith ◦ zipWith) (∗) twiddles
      where
        n        = size @(g ◦ f)
        twiddles = fmap powers (powers ω)
        ω        = cis (−2 ∗ π / fromIntegral n)
        cis a    = cos a :+ sin a
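For intuition about the split, sub-transform, twiddle, and combine pattern, here is a conventional list-based radix-2 FFT. It is a different, much less general formulation than the functor-composition instance above and is offered only as a sketch. It assumes the input length is a power of two, and the names fftList, deinterleave, and approxEq are illustrative.

```haskell
import Data.Complex (Complex (..), cis, magnitude)

-- Radix-2 decimation in time. Assumes the length is a power of two.
fftList :: [Complex Double] -> [Complex Double]
fftList []  = []
fftList [x] = [x]
fftList xs  = zipWith (+) es ts ++ zipWith (-) es ts
  where
    n        = length xs
    (ev, od) = deinterleave xs
    es       = fftList ev                    -- transform of even-indexed samples
    os       = fftList od                    -- transform of odd-indexed samples
    -- Twiddle factors exp(-2*pi*i*k/n), applied to the odd half;
    -- zipWith truncates the infinite list to the length of os.
    ts       = zipWith (*)
                 [cis (-2 * pi * fromIntegral k / fromIntegral n) | k <- [0 :: Int ..]]
                 os

-- Split a list into its even-indexed and odd-indexed elements.
deinterleave :: [a] -> ([a], [a])
deinterleave (a : b : rest) = (a : as, b : bs)
  where
    (as, bs) = deinterleave rest
deinterleave as = (as, [])

-- Element-wise comparison up to a small floating-point tolerance.
approxEq :: [Complex Double] -> [Complex Double] -> Bool
approxEq as bs =
  length as == length bs && all ((< 1e-9) . magnitude) (zipWith (-) as bs)
```

The length-2 base step reduces to (a + b, a − b), matching the Pair instance on the previous slide; larger sizes recurse on the two halves and multiply the odd half by twiddle factors before combining.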


FFT — RBin N3 (“Decimation in time”)

[Figure: circuit diagram of the FFT on RBin N3, with 64-bit inputs (In), +, − and × nodes, floating-point constants (approximately ±0.7071, −1.0, and values near zero), and 64-bit outputs (Out).]


FFT — LBin N3 (“Decimation in frequency”)

[Figure: circuit diagram of the FFT on LBin N3, with 64-bit inputs (In), +, − and × nodes, floating-point constants (approximately ±0.7071, −1.0, and values near zero), and 64-bit outputs (Out).]


Bitonic sort


Bitonic sort — depth 1

[Figure: circuit diagram from In to Out, built from if nodes on wires labeled 32.]


Bitonic sort — depth 2

[Figure: circuit diagram from In to Out, built from if nodes on wires labeled 32.]


Bitonic sort — depth 3

[Figure: circuit diagram from In to Out, built from ≤ and if nodes on wires labeled 32.]


Bitonic sort — depth 4

[Figure: circuit diagram from In to Out, built from ≤ and if nodes on wires labeled 32.]
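The bitonic sort slides show the sorting network only as circuits of increasing depth. For readers who want the recursion spelled out, here is a minimal list-based sketch. It assumes a power-of-two input length and is not taken from the talk; `bitonicSort`, `bitonicMerge` and `compareSwap` are hypothetical names. The compare-and-swap step plausibly corresponds to the ≤ and if nodes in the figures, though that reading is an assumption.

```haskell
-- Bitonic sort on a list (illustrative sketch).
-- Assumption: the input length is a power of two.
-- The Bool selects the direction: True = ascending, False = descending.
bitonicSort :: Ord a => Bool -> [a] -> [a]
bitonicSort _  []  = []
bitonicSort _  [x] = [x]
bitonicSort up xs  =
  -- Sorting the halves in opposite directions yields a bitonic sequence,
  -- which bitonicMerge then sorts in the requested direction.
  bitonicMerge up (bitonicSort True as ++ bitonicSort False bs)
  where
    (as, bs) = splitAt (length xs `div` 2) xs

-- Sort a bitonic sequence: one layer of independent compare-and-swaps
-- between the two halves, then merge each half recursively.
bitonicMerge :: Ord a => Bool -> [a] -> [a]
bitonicMerge _  []  = []
bitonicMerge _  [x] = [x]
bitonicMerge up xs  = bitonicMerge up los ++ bitonicMerge up his
  where
    (as, bs)   = splitAt (length xs `div` 2) xs
    (los, his) = unzip (zipWith (compareSwap up) as bs)

-- One comparator: a single comparison feeding two selections.
compareSwap :: Ord a => Bool -> a -> a -> (a, a)
compareSwap up a b
  | (a <= b) == up = (a, b)
  | otherwise      = (b, a)
```

For example, `bitonicSort True [3, 7, 4, 8, 6, 2, 1, 5]` evaluates to `[1, 2, 3, 4, 5, 6, 7, 8]`. Which pairs get compared depends only on the input length, never on the values, so every comparator within a layer can run in parallel.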


Manual vs automatic placement

ENIAC, 1946: [image not reproduced]


Manual vs automatic placement

Programmers used to explicitly place computations in space.

Mainstream programming still manually places in time.

Sequential composition: crude placement tool.

Threads: notationally clumsy & hard to manage correctly.

If we relinquish control, automation can do better.
