Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data...

37
Petalisp A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-N¨ urnberg

Transcript of Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data...

Page 1: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Petalisp

A Common Lisp Library for Data Parallel Programming

Marco Heisig

16.04.2018

Chair for System Simulation

FAU Erlangen-Nurnberg

Page 2: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Petalisp

The library Petalisp1 is a new approach to data parallel computing.

The Goal: Elegant High Performance Computing

• Programs that are beautiful and fast

• A programming model that is safe and productive

Drawbacks:

• Limited to operations on structured data

• Significant run-time overhead

1https://github.com/marcoheisig/Petalisp

1

Page 3: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Table of contents

1. Using Petalisp

2. Implementation

3. Performance

4. Conclusions

2

Page 4: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Using Petalisp

Page 5: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Strided Arrays

A strided array in n dimensions is a function

from elements of the cartesian product of n

ranges to a set of Common Lisp objects.

A range with the lower bound xL, the step size s

and the upper bound xU , with xL, s, xU ∈ Z, is

the set of integers

{ x ∈ Z | xL ≤ x ≤ xU ∧ (∃k ∈ Z) [x = xL +ks] }.

stri

de 3

stride 2stride 1

Objects of type cl:simple-array are a special case of strided arrays.

3

Page 6: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API 1/5 — First Class Index Spaces

To introduce parallelism, Petalisp always operates on index spaces, not

on individual array elements.

Notation: (σ (START [STEP] END) ...)

(σ (0 5) (0 3)) (σ (1 2 3) (1 2 3) (1 2 3))

Implementation detail: Petalisp can compute the union, difference and

intersection of arbitrary index spaces.

4

Page 7: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API 2/5 — Transformations

A transformation is an affine-linear mapping from indices to indices.

Notation: (τ (INDEX ...) (EXPRESSION ...))

(τ (i j)

(0 j (- i 4)))

(τ (0 j k)

((+ k 4) j))

Implementation detail: Petalisp can compute the inverse and

composition of arbitrary transformations.5

Page 8: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API 3/5 — Moving Data

The -> operator allows to select, transform or broadcast data.

(-> 0 (σ (0 9) (0 9))) ; a 10 × 10 array of zeros

(-> #(2 3) (σ (0 0))) ; the first element only

(-> A (τ (i j) (j i))) ; transposing A

Admittedly, -> is a mediocre function name. Better suggestions are most

welcome!

6

Page 9: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API 4/5 — Combining Data

The fuse and fuse* operator combine multiple arrays into one.

For fuse, the arguments must be non-overlapping. For fuse*, the value

of the rightmost array takes precedence on overlap.

(defvar B (-> #(2) (τ (i) ((1+ i)))))

(fuse #(1) B) ; equivalent to (-> #(1 2))

(fuse #(1 3) B) ; an error!

(fuse* #(1 3) B) ; equivalent to (-> #(1 2))

7

Page 10: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API 5/5 — Parallel Operations

The final piece: Application of Common Lisp functions to strided arrays.

• The function α is basically cl:map for strided arrays.

• The function β is basically cl:reduce applied to the last dimension

of a strided array.

(α #’+ 2 3) ; adding two numbers

(α #’+ A B C) ; adding three arrays element-wise

(β #’+ #(2 3)) ; adding two numbers

(defvar B #2A((1 2 3) (4 5 6)))

(β #’+ B) ; summing the rows of B

Remark: No guarantees are made about when and how often the

functions passed to α and β are invoked.

8

Page 11: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API Summary

All core functions at a glance:

• Index spaces, e.g. (σ (a b))

• Transformations, e.g. (τ (x y) (y (- x)))

• Data motion, e.g. (-> A (σ (2 5)))

• Data combination, e.g. (fuse* A B C)

• Parallel map, e.g. (α #’* A B)

• Parallel reduce, e.g. (β #’+ A)

This API is purely functional and declarative.

But how do we obtain values?

9

Page 12: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

API 6/5 — Triggering Evaluation

Petalisp provides two functions to trigger evaluation.

The compute function converts strided arrays into regular Common Lisp

arrays.

(compute (-> 0.0 (σ (0 1)))) => #(0.0 0.0)

(defvar A #(1 2 3))

(compute (β #’+ A)) => 6

(compute (-> A (τ (i) ((- i))))) => #(3 2 1)

Remark: There is also a schedule function for asynchronous evaluation.

10

Page 13: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Example: Matrix Multiplication

The mathematical definition

Cij =n∑

p=1

AipBpj

The corresponding Petalisp code

(β #’+

(α #’*

(-> A (τ (m n) (m 1 n)))

(-> B (τ (n k) (1 k n)))))

mk

n BAC

REDUCTION

11

Page 14: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Implementation

Page 15: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Lazy Arrays are Data Flow Graphs

(jacobi u 1) => #<strided-array-fusion t (σ (0 9) (0 9))>

Internal representation:

fusion(σ (0 1 9) (0 1 9))

reference(τ (i j) (i j))

(σ (0 9 9) (0 9))

reference(τ (i j) (i j))

(σ (1 8) (0 9 9))

application*

(σ (1 8) (1 8))

immediate(σ (0 9) (0 9))

reference(τ (i j) ())

(σ (1 8) (1 8))

application+

(σ (1 8) (1 8))

immediate(σ)

#0A0.25

reference(τ (i j) ((- i 1) j))

(σ (1 8) (1 8))

reference(τ (i j) ((1+ i) j))

(σ (1 8) (1 8))

reference(τ (i j) (i (- j 1)))

(σ (1 8) (1 8))

reference(τ (i j) (i (1+ j)))

(σ (1 8) (1 8))

immediate(σ (0 9) (0 9))

12

Page 16: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

application compute

13

Page 17: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

application compute

13

Page 18: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

application compute

13

Page 19: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

application compute

13

Page 20: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

application compute

13

Page 21: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

application compute

13

Page 22: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application

application

application immediate

referencereference

application compute

fusion

reference

13

Page 23: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application

application

application immediate

referencereference

application compute

fusion

reference

13

Page 24: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application

application

application immediate

reference

application

compute

application

reference reference reference

13

Page 25: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application

application

application immediate

reference

application

compute

application

reference reference reference

13

Page 26: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

a2

[f0 [a2] [a2]]

[f0 [f1 [a0]] [f2 [a0]]

[f0 [a2] [a1]]

a3

a1

a0

13

Page 27: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

a2

[f0 [a2] [a2]]

[f0 [f1 [a0]] [f2 [a0]]

[f0 [a2] [a1]]

a3

a1

a0

13

Page 28: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

[f0 [a2] [a2]]

[f0 [f1 [a0]] [f2 [a0]]

[f0 [a2] [a1]]

a1a0

a2

a3

tim

e

13

Page 29: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

[f0 [a2] [a2]]

[f0 [f1 [a0]] [f2 [a0]]

[f0 [a2] [a1]]

a1a0

a2

a3

tim

e

13

Page 30: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

From Arrays to Executable Code

How do we execute a graph?

1. Determine critical nodes

2. Determine subtrees

3. Lift references

4. Eliminate fusions

5. Construct kernels

6. Schedule & allocate

7. Compile & execute

8. Done!

immediate

application reference

application

reference

application

fusion

immediate

reference

reference

computeimmediate

13

Page 31: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Performance

Page 32: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

On Performance

The long-term goal of Petalisp is to provide a programming model for

Petascale (1015 operations per second) systems.

The constant, high overhead of analysis and JIT-compilation seems to be

at odds with this goal.

However:

• Do not underestimate the power of memoization, hash-consing and

CLOS wizardry.

• Scheduling can often be done asynchronously.

• Petalisp’s analysis is independent of the problem size.

14

Page 33: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Jacobi’s method: Python vs. C++ vs. Petalisp

Hardware: Intel Xeon E3-1275 CPU 3.6GHz

15

Page 34: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Conclusions

Page 35: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Conclusions

Main Result: Our compilation strategy is feasible, with just about

10 – 500 microseconds overhead when calling compute.

Benefits:

• Clean separation between notation and execution.

• Unprecedented potential for optimization.

• Already faster than NumPy.

. . . all in just about 5000 lines of maintainable code.

16

Page 36: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Future Work

My preliminary roadmap for the next years:

• More applications (simulations, image processing, machine learning)

• API finalization

• Sophisticated Scheduling

• Better Shared-Memory Parallelization

• Auto-Vectorization

• Distributed Parallelization

• . . .

• Make this a PhD thesis

17

Page 37: Petalisp - A Common Lisp Library for Data Parallel Programming · A Common Lisp Library for Data Parallel Programming Marco Heisig 16.04.2018 Chair for System Simulation FAU Erlangen-Nurnb

Thank you!

Questions or remarks?

18