
CISC 879 : Software Support for Multicore Architectures

Presented By: Kanik Sem
Dept of Computer & Information Sciences

University of Delaware

Porting Financial Market Applications to the Cell Broadband Engine Architecture

John Easton, Ingo Meents, Olaf Stephen, Horst Zisgen, Sei Kato


Outline

• Why Cell B.E. for financial markets?
• Porting strategies for the Cell B.E. platform
• Performance results
• Mixed-precision workloads
• Tying it all together
• Conclusions


Why Cell B.E. for financial markets?

• Potential for dramatic impact on financial applications
• Application codes ported to the Cell
• Optimized codes to fully exploit Cell
• Performance improvements of almost 40x


A description of the application

• Code used to price a European option.
• Model based on the Monte Carlo simulation technique.
• Need to generate a large number (200,000,000 in this case) of uniform, pseudo-random numbers.
• Using the random numbers generated, execute the financial model.
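The pricing loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a European call under geometric Brownian motion, uses the Box-Muller transform to turn uniform numbers into normal ones, and all parameter names (spot, strike, rate, vol, maturity) are hypothetical. The closed-form Black-Scholes price is included only as a sanity check on the simulated value.

```python
import math
import random


def price_european_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Monte Carlo estimate of a European call price (illustrative sketch)."""
    rng = random.Random(seed)  # Python's generator is a Mersenne Twister
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Box-Muller: turn two uniform numbers into one standard normal.
        u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is defined
        u2 = rng.random()
        z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes call price, used here as a sanity check."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)
```

Calling `price_european_call(100.0, 100.0, 0.05, 0.2, 1.0, 200_000)` should land close to `black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)`; the gap shrinks roughly with the square root of the number of paths, which is why the application needs so many random numbers.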


Porting strategies for Cell

• Recompilation of existing code for Cell
  • XLC better than gcc
• Make some structural changes
  • Framework to start separate threads on each SPU.
  • Splitting RNG across all cores.
• Make functional changes to the code.
  • Re-engineered functions to exploit vectorization on SPU cores.
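The structural change listed above (one thread per SPU, with random-number generation split across all cores) can be sketched as follows. This is only an illustration of the shape of the work split: the function names and the seeding scheme (one distinct seed per worker) are assumptions, distinct seeds do not by themselves guarantee statistically independent streams, and Python threads will not reproduce the SPU speedups.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def split_work(total, n_workers):
    """Divide `total` evaluations as evenly as possible among the workers."""
    base, extra = divmod(total, n_workers)
    return [base + (1 if i < extra else 0) for i in range(n_workers)]


def worker_sum(job):
    """Each worker owns a private generator, so no RNG state is shared."""
    seed, count = job
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(count))


def parallel_mean(total, n_workers, base_seed=12345):
    """Average of `total` uniform draws, computed in per-worker chunks."""
    counts = split_work(total, n_workers)
    # Hypothetical seeding scheme: one distinct seed per worker.
    jobs = [(base_seed + i, count) for i, count in enumerate(counts)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(worker_sum, jobs))
    # The master combines the partial results, as the PPU would.
    return sum(partial_sums) / total
```

With the slide's figures, `split_work(200_000_000, 16)` gives each of 16 workers 12,500,000 evaluations.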


Analysis of the original code

%time   Seconds   Calls       Function name
62.70   118.32    200000000   getRandom()
37.18    70.16    1           simulateEuropeanOptionValue()
 0.14     0.27    1           hpcMonteCarlo::random()
 0.00     0.00    2           hpcBlackScholes()

• The SDK for Cell provides an optimized RNG.
• It can run 64 number generators at once on a Cell blade.
• Use the gettimeofday() function for timing.


Initial performance results

To run the performance tests, the following parameters were used:

• Compiler used: spuxlc, ppuxlc
• Compiler optimization setting: -O3 -qstrict
• Random-number generation method: sdk
• Precision: single
• Number of evaluations: 200,000,000


Initial performance results

Performance by number of SPUs (single precision)

Number    Elapsed time (s),    Elapsed time (s),    Speedup
of SPUs   2.4 GHz Cell/B.E.    3.2 GHz Cell/B.E.
          (measured)           (estimated)
 1        65.7                 49.27                 1
 2        32.9                 24.6                  1.99
 3        21.9                 16.42                 3
 4        16.4                 12.3                  4
 5        13.18                 9.88                 4.98
 6        10.9                  8.17                 6.02
 7         9.4                  7.05                 6.98
 8         8.2                  6.15                 8.01
 9         7.3                  5.4                  9
10         6.6                  4.95                 9.95
11         6                    4.5                 10.95
12         5.5                  4.12                11.94
13         5.1                  3.8                 12.88
14         4.7                  3.52                13.97
15         4.4                  3.3                 14.93
16         4.1                  3.07                16.02
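The derived columns in the table above can be reproduced with simple arithmetic. The slides do not state how the 3.2 GHz estimates were obtained; the numbers appear consistent with scaling the measured 2.4 GHz times by 2.4/3.2, and the speedup column with dividing the one-SPU time by the n-SPU time. The sketch below works under that assumption, and the function names are illustrative.

```python
def estimate_at_frequency(measured_seconds, measured_ghz=2.4, target_ghz=3.2):
    """Scale a measured run time by the clock ratio (assumed linear in clock speed)."""
    return measured_seconds * measured_ghz / target_ghz


def speedup(time_one_spu, time_n_spus):
    """Speedup relative to the single-SPU run."""
    return time_one_spu / time_n_spus


def parallel_efficiency(time_one_spu, time_n_spus, n_spus):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup(time_one_spu, time_n_spus) / n_spus
```

For the 16-SPU row, `speedup(65.7, 4.1)` is about 16.02 and `estimate_at_frequency(4.1)` is about 3.075, in line with the tabulated 16.02 and 3.07. A speedup this close to the SPU count suggests the workload scales almost linearly.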



Double Precision

Organizations in financial markets require double-precision calculations.

The initial target marketplace for Cell does not need this.

The initial implementation of Cell provides limited double-precision support in hardware:

Single-precision: fully pipelined
Double-precision: partially pipelined


Performance results

Performance by number of SPUs (double precision)

Number    Elapsed time (s),    Elapsed time (s),    Speedup
of SPUs   2.4 GHz Cell/B.E.    3.2 GHz Cell/B.E.
          (measured)           (estimated)
 1        157.3                117.9                 1
 2         78.6                 58.9                 2
 3         52.4                 39.3                 3
 4         39.3                 29.47                4
 5         31.49                23.61                4.99
 6         26.25                19.68                5.99
 7         22.5                 16.8                 6.99
 8         19.7                 14.7                 7.98
 9         17.5                 13.12                8.98
10         15.78                11.8                 9.96
11         14.3                 10.7                11
12         13.1                  9.82               12
13         12.1                  9.1                13
14         11.3                  8.47               13.92
15         10.5                  7.87               14.98
16          9.9                  7.42               15.89


Mersenne-Twister

• Run time with Mersenne-Twister (without optimization): 5 sec
• Run time with the Cell/B.E. SDK: 4.1 sec

Mechanisms to improve the performance still further:
• Optimize Mersenne-Twister code for threading framework.
• Rewrite the code to utilize the SIMD capabilities of SPUs.

Performance comparison between Cell/B.E. SDK and Mersenne-Twister random-number generators

Precision   Runtime (s),   Runtime (s),        Runtime (s),
            SDK RNG        Mersenne-Twister    Mersenne-Twister RNG
            (2.4 GHz)      RNG (2.4 GHz)       (3.2 GHz, estimated)
Single      4.1            1.02                0.76
Double      9.9            2.47                1.85


Mixed-precision workloads

Mixed precision: only those parts that actually need double precision are calculated using double precision.

Disadvantage: makes for a slight increase in the programming effort needed.
• Identify the parts of the code which use this sort of precision.
• Make the appropriate changes to the code.

Advantage: performance improvement.
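A small experiment shows why some parts of a computation need double precision while others do not. The sketch below is illustrative only; the slides do not say which parts of the authors' code were kept in double precision. It emulates single precision in Python by rounding through a 4-byte float, then compares an all-single accumulation with a mixed one (single-precision terms, double-precision accumulator).

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_all_single(value, count):
    """Accumulate entirely in single precision."""
    term = to_single(value)
    total = 0.0
    for _ in range(count):
        total = to_single(total + term)  # round after every addition
    return total


def sum_mixed(value, count):
    """Single-precision terms, double-precision accumulator."""
    term = to_single(value)
    total = 0.0
    for _ in range(count):
        total += term
    return total
```

Adding 0.1 a total of 100,000 times should give 10,000. The all-single sum drifts visibly from that value, while the mixed version stays within a small fraction of a unit, even though every term is still a single-precision number.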


Mixed-precision workloads

The two methods of applying mixed precision to our code are:

(1) Concatenating two single-precision random variables.

(2) Generating one single-precision random variable and then doing a double-precision division.
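The two methods above can be sketched as follows. The slides do not give the bit layout, so this sketch makes two assumptions: each "single-precision random variable" is modelled as a 32-bit integer draw, and the concatenation follows the 53-bit construction used by `genrand_res53` in the Mersenne Twister reference code. The function names are illustrative.

```python
import random

TWO_32 = 4294967296.0        # 2**32
TWO_53 = 9007199254740992.0  # 2**53


def uniform_by_concatenation(hi32, lo32):
    """Method (1): join the bits of two 32-bit draws into one 53-bit fraction."""
    a = hi32 >> 5  # keep the top 27 bits
    b = lo32 >> 6  # keep the top 26 bits
    return (a * 67108864.0 + b) / TWO_53  # 67108864 = 2**26


def uniform_by_division(x32):
    """Method (2): one 32-bit draw, divided in double precision."""
    return x32 / TWO_32


rng = random.Random(7)
u_concat = uniform_by_concatenation(rng.getrandbits(32), rng.getrandbits(32))
u_divide = uniform_by_division(rng.getrandbits(32))
```

Both results lie in [0, 1). Method (1) consumes two draws and fills all 53 mantissa bits of a double; method (2) consumes one draw and resolves values only to steps of 2^-32, trading resolution for fewer random numbers per sample.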


Mixed-precision workloads

# SPU   CC_DP_MT   CC_DP_SDK   M_DP_MT   SP_MT   SP_SDK
 1      40.33      40.33       45.76     12.01   11.16
 2      20.33      20.33       22.88      6.06    5.70
 3      13.56      13.56       15.26      4.05    3.80
 4      10.17      10.17       11.44      3.04    2.85
 5       8.13       8.13        9.16      2.43    2.29
 6       6.78       6.78        7.64      2.03    1.91
 7       5.82       5.82        6.55      1.75    1.64
 8       5.09       5.09        5.75      1.53    1.44
 9       4.53       4.52        5.11      1.36    1.28
10       4.08       4.08        4.60      1.22    1.15
11       3.70       3.70        4.18      1.11    1.05
12       3.40       3.39        3.84      1.02    0.96
13       3.14       3.14        3.54      0.94    0.89
14       2.92       2.92        3.29      0.88    0.83
15       2.72       2.72        3.07      0.82    0.78
16       2.52       2.53        2.88      0.77    0.73

• CC_DP_MT = Concatenation Double-Precision Mersenne-Twister
• CC_DP_SDK = Concatenation Double-Precision SDK
• M_DP_MT = Division Double-Precision Mersenne-Twister
• SP_MT = Single-Precision Mersenne-Twister
• SP_SDK = Single-Precision SDK



Mixed-precision workloads

Additional optimization techniques :

• Unrolling more parts of Mersenne-Twister RNG.

• Additional software pipelining by parallelizing computation.

• Introducing new variables to eliminate dependencies.

• Pre-calculating some items:
    a[0] = <something>;
    for (i = 0; i < N; i++) {
        sinf4(a[0]);
        sinf4(a[i+1]);
        ......
    }
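Two of the techniques listed above, loop unrolling and introducing new variables to eliminate dependencies, can be illustrated with a simple reduction. This is a sketch of the transformation only, not of the authors' Mersenne-Twister changes: Python will not run it faster, whereas in compiled SPU code independent accumulators let a pipelined processor overlap operations that would otherwise wait on each other.

```python
def sum_sequential(values):
    """One accumulator: every addition depends on the previous one."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled(values):
    """Unrolled by four, with a separate accumulator per lane."""
    acc0 = acc1 = acc2 = acc3 = 0
    n = len(values)
    limit = n - n % 4
    for i in range(0, limit, 4):
        # The four updates do not depend on each other within an iteration.
        acc0 += values[i]
        acc1 += values[i + 1]
        acc2 += values[i + 2]
        acc3 += values[i + 3]
    # Handle the leftover elements that do not fill a group of four.
    tail = sum(values[limit:])
    return acc0 + acc1 + acc2 + acc3 + tail
```

For integer input both functions return the same result. With floating-point input the unrolled version adds the terms in a different order, so the two results can differ in the last bits.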


Intel optimizations

• A “master” thread forks “slave” threads to perform RNG.
• The “master” thread corresponds to the part of the Cell/B.E. code that runs on the PPU.
• The “slave” threads correspond to the parts that run on the SPUs.

Difference:
• Work scheduled by the OpenMP runtime shares the same cores as the OS threads.
• The SPUs on the Cell/B.E. version are not running the operating system. This enables them to be used entirely to run the application code.


Intel optimizations

System / CPU         Operating System   Compiler     No. of Threads (Cores)
Speed (GHz)                                          1       2       4       8
x3550 / 3.0          Red Hat Linux      Intel ICPC   31.76   15.9     8.46   -
x336 / 2.8           Red Hat Linux      Intel ICPC   43.27   30.02   22.62   -
HS21 / 2.33          Fedora Core 6      gcc          43.38   21.74   10.88   8.26


Tying it all together


Future Work

Results achieved so far are on a system that many view as being unsuitable for Financial Markets users.

• “Enhanced Double-Precision” version of the Cell Broadband Engine technology.

• Systems based on Cell/B.E. technology are an excellent platform for Financial Markets applications.


Getting the most performance out of Cell/B.E. technology

Offload as much of the computation onto the SPUs as possible.

Write the SIMD code yourself rather than relying on the compiler to do it.
• XLC provides “auto-SIMDize”.
• This may not be a good approximation.

In certain situations, you might find that starting from scratch is a much quicker way to implement application code.


Conclusions

Reasons why general-purpose processors make up the majority of computational infrastructures:

(1) Huge numbers of systems based on these processors.

(2) A large supply of skilled professionals, which leads to lower skills costs.

(3) A lot of application development tooling.

(4) The relatively “easy” code porting to these platforms.


Conclusions

“ESOTERIC” technologies:
• Offer high performance for their chip area.
• Consume much less power per computation.

Disadvantages:
(1) Skills to program them are rare and, hence, expensive.
(2) Lack of application development tooling.
(3) The “porting” process is generally both slow and costly.


Conclusions

Advantages of Cell/B.E. technology:

(1) Consumes less power, space and cooling.
(2) High computational power.
(3) Better data movement and manipulation abilities.
(4) A number of strong customer proof points.
(5) Support from key Independent Software Vendors.
(6) Results of experiments such as this one.


Questions….

Comments….

Caveats ….