Copyright © 2006 ClearSpeed Technology plc. All rights reserved. www.clearspeed.com 1
Wolfram Technology Conference
ENVISION. ACCELERATE. ARRIVE.
12th October 2007
Real Acceleration for Mathematica®
Simon McIntosh-Smith
VP of Applications, ClearSpeed Technology
Agenda
• Introduction
• Accelerators
• ClearSpeed math acceleration technology
• Accelerating Mathematica
• Summary
Introduction
• Mathematica® is being used to solve more and more computationally intensive problems
• General purpose CPUs keep getting faster, but a new wave of application accelerators is emerging that could give much greater performance
  – Much as GPUs have done for graphics
• ClearSpeed has been developing hardware accelerators focused specifically on scientific computing, which accelerate the low-level math libraries used by Mathematica
Accelerators
Accelerator technologies
• Visualization and media processing
  – Good for graphics, video, game physics, speech, …
  – Graphics Processing Units (GPUs) are well established in the mainstream
  – But there was a time not too long ago when your PC still did all the graphics in software on the main CPU…
  – Can be applied to some 32-bit applications today (64-bit coming at much lower speed), but currently they are fairly hard to program and very power hungry – 200W!
• Embedded content processing
  – Data mining, encryption, XML, compression
  – Field Programmable Gate Arrays (FPGAs) are often being used here, mainly to accelerate integer-intensive codes
  – Poor at floating point, especially 64-bit, and cut corners on precision so don’t get good accuracy
  – Very hard to program and get good performance
Accelerator technologies continued
• Math Accelerators
  – Mostly floating point, 64-bit performance is crucial, high precision, supporting true IEEE754 floating point
  – Can accelerate numerically-intensive applications in
    • Finance
    • Oil and Gas
    • Economics
    • Electromagnetics
    • Bioinformatics
    • And many, many more
  – This is what ClearSpeed has developed
• To accelerate Mathematica, a true Math Accelerator is needed…
The other benefit of accelerators – low power
• Running 1 watt for 1 year costs about $1
• Modern CPUs can consume around 100W
  – $100/year running cost for the CPU alone if used 24/7
  – Significant associated CO2 emissions
• Accelerators typically bring significant performance per watt gains
  – Examples later in this presentation show 1 CPU plus a 25W ClearSpeed board running as fast as a 4 CPU (8 core) machine
  – This power consumption reduction of around 275W (400W for 4 CPUs versus 125W for 1 CPU plus the board), if applied 24/7, is a $275 per year energy cost saving
  – Not to mention how much smaller and quieter the accelerated system can be…
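The arithmetic behind these figures can be checked in a few lines of Mathematica. This is a rough sketch: the electricity price of $0.11 per kWh is an assumption chosen to match the slide's rule of thumb, and all variable names are illustrative.

```mathematica
(* Assumption: electricity price in USD per kWh; not stated on the slide *)
pricePerKWh = 0.11;
hoursPerYear = 24*365;

(* Cost of running 1 W continuously for one year: roughly $1 *)
costPerWattYear = pricePerKWh*hoursPerYear/1000

(* Power budgets from the slide: 100 W per CPU, 25 W for the board *)
hostOnlyWatts = 4*100;            (* 4-CPU (8-core) machine *)
acceleratedWatts = 1*100 + 25;    (* 1 CPU plus one ClearSpeed board *)
savedWatts = hostOnlyWatts - acceleratedWatts

(* Approximate yearly saving in USD if run 24/7 *)
savedWatts*costPerWattYear
```

Here `savedWatts` evaluates to 275. With the assumed price the yearly saving comes out a little below the slide's $275, which uses the rounded figure of $1 per watt-year.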
ClearSpeed’s Math Acceleration Technology
What are ClearSpeed’s products?
• Math accelerator boards, ClearSpeed Advance™ e620 & X620
  – Dual ClearSpeed CSX600 coprocessors
  – R∞ ≈ 66 GFLOPS for 64-bit matrix multiply (DGEMM) calls
    • Hardware also supports 32-bit floating point
  – PCI Express x8 and 133 MHz PCI-X 2/3rds support
  – 1 GByte of memory on the board
  – Linux drivers today for RedHat and Suse
  – Low power; 25 to 33 Watts
• Significantly accelerates the low-level math library used by Mathematica (MKL):
  – Target functions: Level 3 BLAS and LAPACK
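To put the 66 GFLOPS figure in context: a dense n × n matrix multiply costs about 2n³ floating-point operations, so an idealised lower bound on DGEMM time follows directly. The sketch below assumes the board sustains its asymptotic rate and ignores host-to-board transfer time; `rInf` and `dgemmSeconds` are hypothetical names introduced for illustration.

```mathematica
(* Asymptotic DGEMM rate quoted on the slide, in FLOP/s *)
rInf = 66.*10^9;

(* Idealised time for an n x n by n x n multiply: 2 n^3 operations *)
dgemmSeconds[n_] := 2.*n^3/rInf

(* Lower-bound estimates for a few matrix sizes *)
Map[{#, dgemmSeconds[#]} & , {1000, 2000, 4000}]
```

Under these assumptions a 4,000 × 4,000 multiply takes just under 2 seconds. Because R∞ is an asymptotic rate, smaller matrices will achieve a lower fraction of it in practice.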
Which MKL functions can ClearSpeed accelerate?
Previous release – CSXL 2.51 and before:
• L3 BLAS: large matrix arithmetic (preferably at least 1,000 on a side):
  – DGEMM – real matrix multiply
• LAPACK: factorize and solve for large systems of linear equations
  – LU (DGETRF)

New release – CSXL 2.52:
• L3 BLAS:
  – ZGEMM – complex matrix multiply
  – DTRSM – triangular solve
  – Future release: DTRMM, DSYRK and others
• LAPACK:
  – LU (DGETRS)
  – QR (DGEQRF, DORGQR & DORMQR)
  – Cholesky (DPOTRF & DPOTRS)
  – Future release – complex versions of the above
Software development kit (SDK)
• C compiler with vector extensions (ANSI-C based commercial compiler), assembler, libraries, ddd/gdb-based debugger, newlib-based C-rtl etc.
• ClearSpeed Advance development boards
• Available for Linux, Windows
Accelerating Mathematica
Mathematica uses libraries underneath
[Diagram: layered stack, top to bottom. Software: Mathematica, then the BLAS & LAPACK library (Intel’s MKL). Hardware: CPU.]
Mathematica using accelerated libraries
[Diagram: layered stack, top to bottom. Software: Mathematica, then the BLAS & LAPACK library layer containing Intel’s MKL and ClearSpeed’s CSXL Library. Hardware: CPU and ClearSpeed Advance™ board.]
Plug-and-Play – No changes to your notebooks
• Mathematica has used MKL since v5.2
• ClearSpeed provides a modified kernel
  – Uses a modified “math” script that launches the kernel
  – Sets the library path to pick up CSXL as well as MKL
• Functions supported in Mathematica today include:
  – Dot[]
  – Det[]
  – LUDecomposition[]
  – LinearSolve[]
  – Inverse[]
  – CholeskyDecomposition[] – new!
  – QRDecomposition[] – new!
• If your notebooks spend a high percentage of their total runtime in these functions, and a lot of time in each call to these functions, then you may have a candidate for ClearSpeed acceleration!
• It is very likely that other functions are also accelerated
  – If you find more, let us know!
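Whether a notebook is a good candidate can be estimated with Amdahl's law: if a fraction p of the runtime is spent in accelerated functions, and those functions run s times faster, the overall speedup is 1/((1 − p) + p/s). The sketch below is illustrative only; `overallSpeedup` is a hypothetical helper and the values of p and s are made up, not measurements from this talk.

```mathematica
(* Amdahl's law: p = fraction of runtime accelerated, s = speedup of that fraction *)
overallSpeedup[p_, s_] := 1/((1 - p) + p/s)

overallSpeedup[0.9, 5]   (* 90% of the runtime accelerated 5X *)
overallSpeedup[0.5, 5]   (* only 50% of the runtime accelerated 5X *)
```

The first case gives an overall speedup of roughly 3.6X and the second only about 1.7X, which is why the share of runtime spent in the supported functions matters so much.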
What kind of notebooks could be accelerated?
• ClearSpeed has been collaborating with ScienceOps to discover what kinds of problems are accelerated
• Early results show a good breadth of applications being accelerated
  – Performance improvements
  – Ability to run larger problem sets
• Initial results show speedups ranging from 2X to 5X
Example notebooks
• Benchmarked on a fast server for comparison:
  – 4 processors, each dual core (8 cores total), AMD Opteron 870 (2GHz) with 32GBytes of memory running Linux RHE4-64
• Comparisons are between:
  – Using 2 Opteron cores on their own
  – Using all 8 Opteron cores on their own, and
  – Using 2 Opteron cores with a single ClearSpeed Advance accelerator board
• We haven’t yet re-benchmarked these notebooks on our latest release or on the new PCI Express version of our board, both of which should increase performance
Example notebook descriptions
• ANOVA
  – Analysis of variance, a linear least squares minimisation, fitting a curve to sampled data
• Microarray
  – Microarray data analysis, determines coexpression networks – sets of genes that are commonly expressed together under different experimental conditions. Calculates distance metrics
• ImageDecode
  – Progressive decoding of images using the Haar wavelet transform. Grayscale images used in this example
• Spatial Auto Regression (SAR)
  – Simple regressions iterating on large, dense matrices
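To see why a least-squares fit such as the ANOVA example can end up in the accelerated routines, consider the normal equations, which reduce the fit to matrix products plus one linear solve. This is a generic sketch and not the benchmarked notebook; the sizes, data, and variable names are illustrative assumptions.

```mathematica
m = 200; p = 3;                       (* samples and predictors; illustrative sizes *)
x = Table[Random[], {m}, {p}];        (* design matrix with random entries *)
betaTrue = {1.0, -2.0, 0.5};          (* coefficients used to synthesise the data *)
y = x . betaTrue;                     (* noise-free responses *)

xt = Transpose[x];
beta = LinearSolve[xt . x, xt . y];   (* solve (X^T X) beta = X^T y *)

Max[Abs[beta - betaTrue]]             (* recovery error; should be tiny *)
```

With thousands of predictors, the `Dot` and `LinearSolve` calls in a fit like this would be expected to dominate the runtime, which is the regime the supported-function list targets.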
Example – ANOVA
• ANOVA notebook benefits from 2X speedup with 4,000 predictors
• Two cores with a ClearSpeed accelerator equivalent in performance to an eight core machine!
[Chart: ANOVA speedup. Time in seconds (lower is better) against number of predictors (500, 1000, 2000, 4000). Series: Host speed (2 cores), Host speed (8 cores), 2 cores + accelerator.]
Example – Microarray
• Microarray notebook benefits from nearly a 3X speedup with 4,000 inputs
• Larger problems may receive even more speedup
  – Data sets with over 6,000 expression levels exist for yeast
[Chart: Microarray speedup. Time in seconds (lower is better) against yeast size (800^2, 1000^2, 2000^2, 4000^2). Series: Host speed (2 cores), Host speed (8 cores), 2 cores + accelerator.]
Example – ImageDecode
• ImageDecode notebook speedup ranges from 2-3X depending on the image size
• When tuned this speedup should also be achieved for images around 960x960 in size (already around 1.6X)
[Chart: ImageDecode speedup. Time in seconds (lower is better) against image size (1024x1024, 1600x1200, 3072x2304). Series: Host speed (2 cores), Host speed (8 cores), 2 cores + accelerator.]
Example – Spatial Auto Regression
• SAR notebook speedup nearly 2X
• Larger problems should receive even more speedup
  – Run-times quite substantial too
[Chart: SAR speedup. Time in seconds (lower is better) at problem size 50 (0.5GBytes). Series: Host speed (2 cores), Host speed (8 cores), 2 cores + accelerator.]
New CholeskyDecomposition[] performance
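(* n is the matrix dimension; set it before evaluating *)
A = Table[Random[], {n}, {n}];
B = Dot[Transpose[A], A]; Clear[A];   (* A^T A: symmetric positive definite test matrix *)
AbsoluteTiming[CholeskyDecomposition[B];]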
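The benchmark above can be wrapped in a function to sweep over several matrix sizes. This is a minimal sketch: `timeCholesky` is a hypothetical helper, and the sizes are small illustrative values rather than those used for the talk's measurements.

```mathematica
(* Time CholeskyDecomposition on an n x n symmetric positive definite matrix *)
timeCholesky[n_] := Module[{a, b},
  a = Table[Random[], {n}, {n}];
  b = Dot[Transpose[a], a];          (* same A^T A construction as on the slide *)
  First[AbsoluteTiming[CholeskyDecomposition[b];]]
]

(* Illustrative sizes; acceleration is expected mainly from about 1,000 on a side *)
sizes = {250, 500, 1000};
Map[{#, timeCholesky[#]} & , sizes]
```

Running the same sweep under the stock kernel and under the ClearSpeed-enabled kernel gives a direct per-size comparison.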
New QRDecomposition[] performance
(* n is the matrix dimension; set it before evaluating *)
A = Table[Random[], {n}, {n}];
B = Dot[Transpose[A], A]; Clear[A];   (* A^T A: dense symmetric test matrix *)
AbsoluteTiming[QRDecomposition[B];]
New complex Dot[] performance
(* n is the matrix dimension; every entry of A is 1.5 + 1.5 I *)
A = Table[Complex[1.5, 1.5], {n}, {n}];
AbsoluteTiming[Dot[Transpose[A], A];]
The Challenge
• Mathematica does a great job of choosing the right method for the right problem…
• … which makes it hard to know which method is going to be used and when!
• Consequently it’s proving very difficult to know in advance what is going to be accelerated and by how much
• Call to action:
  – Can you think of any applications that should be significantly accelerated by ClearSpeed?
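Since the method Mathematica chooses is hard to predict, one practical approach is to measure. The sketch below times each function from the supported list on the same random input; the matrix size and variable names are illustrative assumptions, and the idea is to run it once under the stock kernel and once under the ClearSpeed-enabled kernel and compare.

```mathematica
n = 500;                               (* illustrative size; try 1,000+ to look for acceleration *)
a = Table[Random[], {n}, {n}];
spd = Dot[Transpose[a], a];            (* symmetric positive definite input for Cholesky *)
rhs = Table[Random[], {n}];

(* Wall-clock time of one call to each supported function *)
timings = {
  {"Dot", First[AbsoluteTiming[Dot[a, a];]]},
  {"Det", First[AbsoluteTiming[Det[a];]]},
  {"LUDecomposition", First[AbsoluteTiming[LUDecomposition[a];]]},
  {"LinearSolve", First[AbsoluteTiming[LinearSolve[a, rhs];]]},
  {"Inverse", First[AbsoluteTiming[Inverse[a];]]},
  {"CholeskyDecomposition", First[AbsoluteTiming[CholeskyDecomposition[spd];]]},
  {"QRDecomposition", First[AbsoluteTiming[QRDecomposition[a];]]}
};
TableForm[timings]
```

Functions whose timings drop sharply between the two kernels are the ones being routed through the accelerated library for that input.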
Summary
• Accelerators can be used to significantly increase performance and performance per watt across a range of interesting applications in Mathematica
• You need a real 64-bit math accelerator for Mathematica to deliver the precision you depend upon
• ClearSpeed can accelerate notebooks making intensive use of Dot[], Det[], LUDecomposition[], LinearSolve[], Inverse[], CholeskyDecomposition[] and QRDecomposition[]
  – More in the future as the libraries are developed
• Plug-and-play – no changes to your notebooks
• What could you do if you added 66 GFLOPS of matrix crunching power to your Mathematica performance?