Copyright © 2006 ClearSpeed Technology plc. All rights reserved. www.clearspeed.com

Wolfram Technology Conference

ENVISION. ACCELERATE. ARRIVE.

12th October 2007

Real Acceleration for Mathematica®

Simon McIntosh-Smith

VP of Applications, ClearSpeed Technology

[email protected]

Agenda

• Introduction
• Accelerators
• ClearSpeed math acceleration technology
• Accelerating Mathematica
• Summary

Introduction

• Mathematica® is being used to solve more and more computationally intensive problems

• General-purpose CPUs keep getting faster, but a new wave of application accelerators is emerging that could deliver much greater performance
  – Much as GPUs have done for graphics

• ClearSpeed has been developing hardware accelerators focused specifically on scientific computing, which accelerate the low-level math libraries used by Mathematica

Accelerators

Accelerator technologies

• Visualization and media processing
  – Good for graphics, video, game physics, speech, …
  – Graphics Processing Units (GPUs) are well established in the mainstream
  – But not long ago your PC still did all its graphics in software on the main CPU…
  – GPUs can be applied to some 32-bit applications today (64-bit support is coming, at much lower speed), but they are currently fairly hard to program and very power hungry: 200W!

• Embedded content processing
  – Data mining, encryption, XML, compression
  – Field Programmable Gate Arrays (FPGAs) are often used here, mainly to accelerate integer-intensive codes
  – Poor at floating point, especially 64-bit, and they cut corners on precision, so accuracy suffers
  – Very hard to program and to get good performance from

Accelerator technologies continued

• Math accelerators
  – Mostly floating point; 64-bit performance is crucial, along with high precision and true IEEE 754 floating-point support
  – Can accelerate numerically intensive applications in:
    • Finance
    • Oil and gas
    • Economics
    • Electromagnetics
    • Bioinformatics
    • And many, many more
  – This is what ClearSpeed has developed

• To accelerate Mathematica, a true math accelerator is needed…

The other benefit of accelerators – low power

• Running 1 watt for 1 year costs about $1

• Modern CPUs can consume around 100W
  – $100/year running cost for the CPU alone if used 24/7
  – Significant associated CO2 emissions

• Accelerators typically bring significant performance-per-watt gains
  – Examples later in this presentation show 1 CPU plus a 25W ClearSpeed board running as fast as a 4-CPU (8-core) machine
  – This power reduction of around 275W, if applied 24/7, saves about $275 a year in energy costs
  – Not to mention how much smaller and quieter the accelerated system can be…
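The two dollar figures above follow from simple arithmetic. The electricity price of roughly $0.11 per kWh is an assumption chosen to match the slide's "$1 per watt-year", and each CPU is taken at the slide's round 100W:

```latex
\begin{align*}
1\,\mathrm{W} \times 8760\,\mathrm{h/yr} &= 8.76\,\mathrm{kWh/yr} \approx \$1/\mathrm{yr} \quad (\text{at} \approx \$0.11/\mathrm{kWh}) \\
\underbrace{4 \times 100\,\mathrm{W}}_{\text{4-CPU server}} - \underbrace{(100\,\mathrm{W} + 25\,\mathrm{W})}_{\text{1 CPU + board}} &= 275\,\mathrm{W} \;\Rightarrow\; \approx \$275/\mathrm{yr}
\end{align*}
```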

ClearSpeed’s Math Acceleration Technology

What are ClearSpeed’s products?

• Math accelerator boards: ClearSpeed Advance™ e620 & X620
  – Dual ClearSpeed CSX600 coprocessors
  – R∞ ≈ 66 GFLOPS for 64-bit matrix multiply (DGEMM) calls
    • Hardware also supports 32-bit floating point
  – PCI Express x8 and 133 MHz PCI-X 2/3rds support
  – 1 GByte of memory on the board
  – Linux drivers available today for Red Hat and SUSE
  – Low power: 25 to 33 watts

• Significantly accelerates the low-level math library used by Mathematica (MKL)
  – Target functions: Level 3 BLAS and LAPACK
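To relate the 66 GFLOPS figure to a particular host, one can time a large real Dot[], which is expected to reach DGEMM through MKL, and convert the time using the standard count of 2n³ floating-point operations for an n×n matrix multiply. A minimal sketch; the size n = 2000 is an arbitrary choice, and RandomReal requires Mathematica 6 or later.

```mathematica
(* Estimate achieved 64-bit matrix-multiply throughput for Dot[] *)
n = 2000;                                  (* arbitrary test size *)
a = RandomReal[1, {n, n}];                 (* machine-precision reals *)
b = RandomReal[1, {n, n}];
t = First[AbsoluteTiming[a.b;]];           (* wall-clock seconds; result discarded *)
gflops = 2.*n^3/t/10^9                     (* 2 n^3 flops per n x n multiply *)
```

By the same count, a 4,000×4,000 DGEMM is 1.28×10¹¹ operations, or roughly 2 seconds at 66 GFLOPS, before any time spent moving data over the bus.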

Which MKL functions can ClearSpeed accelerate?

Previous release (CSXL 2.51 and before):

• L3 BLAS: large matrix arithmetic (preferably at least 1,000 on a side)
  – DGEMM: real matrix multiply

• LAPACK: factorize and solve large systems of linear equations
  – LU (DGETRF)

New release (CSXL 2.52):

• L3 BLAS
  – ZGEMM: complex matrix multiply
  – DTRSM: triangular solve
  – Future release: DTRMM, DSYRK and others

• LAPACK
  – LU (DGETRS)
  – QR (DGEQRF, DORGQR & DORMQR)
  – Cholesky (DPOTRF & DPOTRS)
  – Future release: complex versions of the above
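The LU entries split into two steps: DGETRF factorizes a matrix and DGETRS solves using that factorization. In Mathematica, LinearSolve[m] with a single argument returns a LinearSolveFunction that factors m once and can then be applied to many right-hand sides. Which LAPACK routine a given call actually reaches depends on Mathematica's internal method choice, so the routine names in the comments are assumptions. A minimal sketch with arbitrary sizes:

```mathematica
(* Factor once, solve many: the pattern behind DGETRF followed by DGETRS *)
n = 1000;
m = RandomReal[1, {n, n}];
rhsList = RandomReal[1, {10, n}];          (* 10 right-hand-side vectors *)

solve = LinearSolve[m];                    (* factorization step (assumed DGETRF) *)
xs = solve /@ rhsList;                     (* each solve reuses it (assumed DGETRS) *)

(* Largest residual over all systems; should be tiny next to the O(1) entries of rhsList *)
residual = Max[Abs[m.Transpose[xs] - Transpose[rhsList]]]
```

Reusing solve avoids repeating the O(n³) factorization for every right-hand side; each additional solve costs only O(n²).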

Software development kit (SDK)

• C compiler with vector extensions (based on a commercial ANSI C compiler), assembler, libraries, ddd/gdb-based debugger, newlib-based C runtime library, etc.

• ClearSpeed Advance development boards

• Available for Linux and Windows

Accelerating Mathematica

Mathematica uses libraries underneath

[Diagram: Mathematica (software) calls the BLAS & LAPACK library, Intel's MKL (software), which runs on the CPU (hardware)]

Mathematica using accelerated libraries

[Diagram: Mathematica calls the BLAS & LAPACK layer, where ClearSpeed's CSXL library sits alongside Intel's MKL (software); MKL runs on the CPU and CSXL runs on the ClearSpeed Advance™ board (hardware)]

Plug-and-Play – No changes to your notebooks

• Mathematica has used MKL since v5.2

• ClearSpeed provides a modified kernel
  – Uses a modified “math” script to launch the kernel
  – Sets the library path to pick up CSXL as well as MKL

• Functions supported in Mathematica today include:
  – Dot[]
  – Det[]
  – LUDecomposition[]
  – LinearSolve[]
  – Inverse[]
  – CholeskyDecomposition[] (new!)
  – QRDecomposition[] (new!)

• If your notebooks spend a high percentage of their total runtime in these functions, and a lot of time in each call, they may be good candidates for ClearSpeed acceleration!

• It is very likely that other functions are also accelerated
  – If you find more, let us know!
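A rough way to check the two conditions above (a high share of total runtime, and long individual calls) is to wrap candidate calls with AbsoluteTiming and tally the times. The helper timed, the list calls, and the workload below are hypothetical, written for illustration; they are not part of Mathematica or CSXL.

```mathematica
(* Hypothetical helper: time each wrapped call and record {label, seconds} *)
calls = {};
SetAttributes[timed, HoldRest];            (* keep expr unevaluated until timed *)
timed[label_String, expr_] := Module[{t, r},
  {t, r} = AbsoluteTiming[expr];
  AppendTo[calls, {label, t}];
  r]

(* Illustrative workload: two candidate calls plus some other work *)
total = First[AbsoluteTiming[
    m = RandomReal[1, {1500, 1500}];
    d = timed["Det", Det[m]];
    inv = timed["Inverse", Inverse[m]];
    s = Total[Flatten[inv]];               (* cheap work outside the candidates *)
   ]];

accel = Total[calls[[All, 2]]];
Print["Share of runtime in candidate functions: ", accel/total];
Print["Longest single call (s): ", Max[calls[[All, 2]]]];
```

A share close to 1 together with calls lasting a second or more matches the profile described above.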

What kind of notebooks could be accelerated?

• ClearSpeed has been collaborating with ScienceOps to discover what kinds of problems are accelerated

• Early results show a good breadth of applications being accelerated
  – Performance improvements
  – Ability to run larger problem sets

• Initial results show speedups ranging from 2X to 5X

Example notebooks

• Benchmarked on a fast server for comparison:
  – 4 processors, each dual core (8 cores total), AMD Opteron 870 (2GHz) with 32GBytes of memory running Linux RHE4-64

• Comparisons are between:
  – Using 2 Opteron cores on their own,
  – Using all 8 Opteron cores on their own, and
  – Using 2 Opteron cores with a single ClearSpeed Advance accelerator board

• We haven't yet re-benchmarked these notebooks on our latest release or on the new PCI Express version of our board, both of which should increase performance

Example notebook descriptions

• ANOVA
  – Analysis of variance: a linear least-squares minimisation, fitting a curve to sampled data

• Microarray
  – Microarray data analysis: determines coexpression networks, i.e. sets of genes that are commonly expressed together under different experimental conditions. Calculates distance metrics

• ImageDecode
  – Progressive decoding of images using the Haar wavelet transform. Grayscale images are used in this example

• Spatial Auto Regression (SAR)
  – Simple regressions iterating on large, dense matrices
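The notebooks themselves are not reproduced here. To show why a least-squares fit like ANOVA can land on the accelerated functions, here is a minimal sketch that solves the normal equations XᵀXβ = Xᵀy with Dot[] and LinearSolve[]. The synthetic data, the sizes, and the normal-equations formulation are assumptions, not necessarily what the ANOVA notebook does.

```mathematica
(* Least squares via the normal equations, using Dot[] and LinearSolve[] *)
nObs = 5000; nPred = 1000;                       (* arbitrary problem size *)
X = RandomReal[{-1, 1}, {nObs, nPred}];          (* synthetic design matrix *)
betaTrue = RandomReal[{-1, 1}, nPred];
y = X.betaTrue + RandomReal[{-0.01, 0.01}, nObs]; (* noisy observations *)

xtx = Transpose[X].X;                            (* large matrix product, the Dot[] case *)
xty = Transpose[X].y;
beta = LinearSolve[xtx, xty];                    (* solve X^T X beta = X^T y *)
maxErr = Max[Abs[beta - betaTrue]]               (* small, limited by the noise level *)
```

Forming XᵀX squares the condition number of X, which is acceptable for a well-conditioned design like this one; QRDecomposition[] would be the more robust route for poorly conditioned data.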

Example – ANOVA

• The ANOVA notebook gets a 2X speedup with 4,000 predictors

• Two cores plus a ClearSpeed accelerator match the performance of an eight-core machine!

[Chart: ANOVA speedup. Time in seconds, lower is better (axis 0 to 100), vs. number of predictors (500, 1000, 2000, 4000); series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator]

Example – Microarray

[Chart: Microarray speedup. Time in seconds, lower is better (axis 0 to 35), vs. yeast size (800^2, 1000^2, 2000^2, 4000^2); series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator]

• The Microarray notebook gets nearly a 3X speedup with 4,000 inputs

• Larger problems may see even more speedup
  – Data sets with over 6,000 expression levels exist for yeast
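One plausible reason this notebook benefits is that the distance metrics it computes can be dominated by a single large matrix product: a Pearson correlation between every pair of genes is Zᵀ Z/(m − 1) for column-standardized data Z with m rows. A minimal sketch; the choice of Pearson correlation as the metric, the data layout (rows are experimental conditions, columns are genes), and the sizes are assumptions.

```mathematica
(* Gene-gene Pearson correlation as a single large Dot[] *)
nCond = 200; nGenes = 2000;                      (* assumed data layout/sizes *)
data = RandomReal[1, {nCond, nGenes}];           (* synthetic expression levels *)

(* Standardize each gene (column) to zero mean and unit sample variance *)
mu = Mean[data];                                 (* column means, length nGenes *)
sd = StandardDeviation[data];                    (* column sample std devs *)
z = (# - mu)/sd & /@ data;                       (* row-wise centering and scaling *)

corr = Transpose[z].z/(nCond - 1);               (* nGenes x nGenes matrix product *)
dist = 1 - corr;                                 (* a common correlation distance *)
```

The cost of the product grows with the square of the number of genes, which is consistent with larger yeast data sets gaining more from acceleration.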

Example – ImageDecode

• The ImageDecode notebook speedup ranges from 2X to 3X, depending on image size

• With tuning, this speedup should also be achievable for images of around 960x960 (currently around 1.6X)

[Chart: ImageDecode speedup. Time in seconds, lower is better (axis 0 to 140), vs. image size (1024x1024, 1600x1200, 3072x2304); series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator]
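A 2D Haar wavelet transform can be written as the matrix product H.img.Transpose[H] with an orthonormal Haar matrix H. This is one plausible route by which an image transform reaches Dot[]. The sketch below is an assumption about formulation, not the notebook's code. It builds H recursively for a power-of-two size, transforms a synthetic grayscale image, and reconstructs a coarse preview from only the low-frequency block of coefficients, mimicking one step of progressive decoding. It requires Mathematica 6 or later.

```mathematica
(* Orthonormal Haar matrix of size n (n a power of 2), built recursively *)
haar[1] = {{1.}};
haar[n_Integer /; n > 1] := haar[n] = Join[
     KroneckerProduct[haar[n/2], {{1., 1.}}],          (* averages: coarse scales *)
     KroneckerProduct[IdentityMatrix[n/2], {{1., -1.}}] (* differences: finest scale *)
    ]/Sqrt[2.];

n = 512;                                   (* assumed image size *)
img = RandomReal[1, {n, n}];               (* synthetic grayscale image *)
h = haar[n];
coeffs = h.img.Transpose[h];               (* forward 2D Haar transform *)

(* Progressive step: keep only the coarsest k x k block of coefficients *)
k = 64;
partial = ConstantArray[0., {n, n}];
partial[[1 ;; k, 1 ;; k]] = coeffs[[1 ;; k, 1 ;; k]];
preview = Transpose[h].partial.h;          (* inverse transform: 8x8 block averages *)
full = Transpose[h].coeffs.h;              (* all coefficients: recovers img *)
```

Doubling k in successive steps sharpens the preview until the full image is recovered, and each step is a pair of n×n matrix products.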

Example – Spatial Auto Regression

• The SAR notebook speedup is nearly 2X

• Larger problems should see even more speedup
  – Run-times are quite substantial too

[Chart: SAR speedup. Time in seconds, lower is better (axis 0 to 1200), vs. problem size (50, 0.5 GBytes); series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator]

New CholeskyDecomposition[] performance

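n = 2000; (* example size, assumed; the slide does not give n *)

A = Table[Random[], {n}, {n}]; (* random n x n real matrix *)

B = Dot[Transpose[A], A]; Clear[A]; (* A^T A: symmetric, almost surely positive definite *)

AbsoluteTiming[CholeskyDecomposition[B];] (* trailing ; discards the factor, keeping only the time *)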
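Since full 64-bit precision is a central claim here, it is worth checking results as well as timings. In Mathematica, CholeskyDecomposition[m] returns an upper-triangular u with ConjugateTranspose[u].u equal to m, so the factor can be multiplied back and compared with the input. A minimal sketch; the explicit symmetrization step and the relative infinity-norm residual are illustrative choices, not part of the benchmark above.

```mathematica
(* Sanity check: rebuild B from its Cholesky factor and measure the relative error *)
n = 1000;
A = RandomReal[1, {n, n}];
B = Transpose[A].A;
B = (B + Transpose[B])/2;                  (* enforce exact symmetry before factoring *)
u = CholeskyDecomposition[B];              (* upper triangular: ConjugateTranspose[u].u == B *)
relResidual = Norm[ConjugateTranspose[u].u - B, Infinity]/Norm[B, Infinity]
```

A sound 64-bit factorization should give a relative residual within a few orders of magnitude of machine epsilon (about 2.2×10⁻¹⁶).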

New QRDecomposition[] performance

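n = 2000; (* example size, assumed; the slide does not give n *)

A = Table[Random[], {n}, {n}]; (* random n x n real matrix *)

B = Dot[Transpose[A], A]; Clear[A]; (* symmetric test matrix, as on the previous slide *)

AbsoluteTiming[QRDecomposition[B];] (* trailing ; discards {q, r}, keeping only the time *)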
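The same kind of accuracy check applies to QRDecomposition[m], which returns {q, r} with ConjugateTranspose[q].r equal to m and q having orthonormal rows. A minimal sketch on a general random matrix (an arbitrary choice):

```mathematica
(* Sanity check: reconstruction error and orthogonality of the QR factors *)
n = 1000;
m = RandomReal[1, {n, n}];
{q, r} = QRDecomposition[m];               (* m == ConjugateTranspose[q].r *)
reconErr = Norm[ConjugateTranspose[q].r - m, Infinity]/Norm[m, Infinity]
orthErr = Norm[q.ConjugateTranspose[q] - IdentityMatrix[n], Infinity]
```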

New complex Dot[] performance

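n = 2000; (* example size, assumed; the slide does not give n *)

A = Table[Complex[1.5, 1.5], {n}, {n}]; (* constant complex matrix *)

AbsoluteTiming[Dot[Transpose[A], A];] (* complex matrix multiply, the ZGEMM case *)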

The Challenge

• Mathematica does a great job of choosing the right method for the right problem…

• … which makes it hard to know which method will be used, and when!

• Consequently, it is proving very difficult to know in advance what will be accelerated, and by how much

• Call to action:
  – Can you think of any applications that should be significantly accelerated by ClearSpeed?
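One concrete source of this uncertainty is that the same call can take different internal paths depending on how the data is stored. The sketch below solves one tridiagonal system twice: once as a SparseArray and once as an ordinary dense matrix. Mathematica is free to choose a different method for each, and only a dense path would be expected to reach the LAPACK routines that CSXL accelerates. The matrix, the size, and that expectation are assumptions made for illustration.

```mathematica
(* Same linear system, two storage formats; the internal method may differ *)
n = 2000;
sparse = SparseArray[
   {{i_, i_} -> 4., {i_, j_} /; Abs[i - j] == 1 -> -1.}, {n, n}, 0.];
dense = Normal[sparse];                    (* identical values, dense storage *)
b = RandomReal[1, n];

{tSparse, xs} = AbsoluteTiming[LinearSolve[sparse, b]];
{tDense, xd} = AbsoluteTiming[LinearSolve[dense, b]];
{tSparse, tDense}                          (* wall-clock seconds for each path *)
```

A large gap between the two timings suggests that different algorithms were used, even though the answers agree.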

Summary

• Accelerators can significantly increase performance, and performance per watt, across a range of interesting applications in Mathematica

• You need a real 64-bit math accelerator for Mathematica to deliver the precision you depend upon

• ClearSpeed can accelerate notebooks making intensive use of Dot[], Det[], LUDecomposition[], LinearSolve[], Inverse[], CholeskyDecomposition[] and QRDecomposition[]
  – More in the future as the libraries are developed

• Plug-and-play: no changes to your notebooks

• What could you do if you added 66 GFLOPS of matrix-crunching power to Mathematica?