ISC Frankfurt 2015: Good, bad and ugly of accelerators and a complementary path

Experts in numerical algorithms and HPC services. NAG for HPC Finance. John Holden, [email protected], 14th July 2015. The good, bad and ugly of accelerators in finance and a complementary path. ISC, Frankfurt.

Experts in numerical algorithms and HPC services

NAG for HPC Finance

John Holden

[email protected]

14th July 2015

The good, bad and ugly of accelerators in finance and a complementary path

ISC, Frankfurt

2

NAG Introduction

NAG for HPC Finance

Why Quants Love NAG

Accelerators NVIDIA

Intel Xeon-Phi

Algorithmic Differentiation

Summary

Agenda

3

Founded 1970

Not-for-profit organisation

Surpluses fund on-going R&D

Mathematical and Statistical Expertise

Numerical Libraries of components

Consulting

HPC Services

Computational Science and Engineering (CSE) support

Procurement advice, market watch, benchmarking

NAG Background

4

HPC Services

Government, Academic and Commercial

Full CSE service

Code porting, tuning, scaling, rewriting…

Training

1-20 FTEs per annum

Procurement advice/benchmarking

ARM

5

Financial Services

Many clients in FSI

Most Tier 1 Banks have licences

> 60% have global licences

Typically the NAG Library is embedded in the bank's own “quant” libraries (C++, .NET, Java, Python, …)

7

Why do Quants use NAG Libraries and Toolboxes?

Global reputation for quality – accuracy, reliability and robustness…

Extensively tested, supported and maintained code

Reduces development time

Allows concentration on your key areas

Components

Fit into your environment

Simple interfaces to your favourite packages

Regular performance improvements!

Give “qualified error” messages, e.g. tolerances of answers

8

from Finance - k Factor Problem

9

from Finance - k Factor Problem

Principal Factors method (Andersen et al., 2003)

does NOT always converge to the correct answer…

(no convergence theory)

Should have come to NAG….

Our* spectral projected gradient method respects constraints, exploits convexity and converges to a feasible stationary point

*NAG Library G02AE - Borsdorf, Higham & Raydan, 2010

10

NAG Library and Toolbox Contents

Root Finding

Summation of Series

Quadrature

Ordinary Differential Equations

Partial Differential Equations

Numerical Differentiation

Integral Equations

Mesh Generation

Interpolation

Curve and Surface Fitting

Optimization

Approximations of Special Functions

Dense Linear Algebra

Sparse Linear Algebra

Correlation & Regression Analysis

Multivariate Methods

Analysis of Variance

Random Number Generators

Univariate Estimation

Nonparametric Statistics

Smoothing in Statistics

Contingency Table Analysis

Survival Analysis

Time Series Analysis

Operations Research

11

Use of NAG Software in Finance

Portfolio allocation / Risk management / Stress testing: Optimization, interpolation, linear algebra, RNGs, distributions, copulas…

Derivative pricing, Hedging: PDEs, RNGs, multivariate normal, curve & surface fitting, quadrature…

Calibration: Optimisation, interpolation, root finders, splines

Data analysis: Time series, GARCH, principal component analysis, data smoothing, data mining…

Monte Carlo simulation: RNGs, Brownian bridge constructor, linear algebra

12

Why Quantitative Analysts Love NAG?

General Problem

To build, in a timely manner, asset models and risk engines that are

Robust

Stable

Quick

Solution

Use robust, well tested, fast numerical components

This allows the “expensive” experts to concentrate on modelling and interpretation, avoiding distraction with low-level numerical components

13

Problem 1: Simulation (Monte Carlo)

Simulation is important for scenario generation

Several different numerical components needed

Random Number Generators

Brownian bridge constructor

Interpolation/Splines

Principal Component Analysis

Cholesky Decomposition

Distributions (uniform, Normal, exponential, gamma, Poisson, Student’s t, Weibull, …)

..

14

Problem 1: Simulation (Monte Carlo)

Simulation is important for scenario generation

NAG to the rescue (CPU or GPU)

Several different numerical components needed

Random Number Generators √

Brownian bridge constructor √

Interpolation/Splines √

Principal Component Analysis √

Cholesky Decomposition √

Distributions (uniform, Normal, exponential, gamma, Poisson, Student’s t, Weibull, …) √

.. √ √
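
As an illustration of how two of the listed components fit together, here is a minimal sketch of generating correlated normal draws via a Cholesky factor. It uses only the Python standard library as a stand-in; it is not NAG code, and the function names (`cholesky`, `correlated_normals`, `sample_correlation`) are hypothetical names chosen for this example.

```python
import math
import random


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(d)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def correlated_normals(lower, rng):
    """Draw one vector of normals whose covariance is lower * lower^T."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]  # independent standard normals
    return [sum(row[k] * z[k] for k in range(i + 1)) for i, row in enumerate(lower)]


def sample_correlation(rho, n_paths, seed=12345):
    """Simulate pairs with target correlation rho and return the sample correlation."""
    rng = random.Random(seed)
    lower = cholesky([[1.0, rho], [rho, 1.0]])
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n_paths):
        x, y = correlated_normals(lower, rng)
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    n = float(n_paths)
    cov = sxy / n - (sx / n) * (sy / n)
    var_x = sxx / n - (sx / n) ** 2
    var_y = syy / n - (sy / n) ** 2
    return cov / math.sqrt(var_x * var_y)
```

Calling `sample_correlation(0.7, 20000)` should return a value close to 0.7. In a production risk engine the random number generator, the factorisation and the distribution transforms would each come from a tested library component rather than hand-written loops like these.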

15

Problem 2: Calibration

Financial institutions all need to calibrate their models

Several different numerical components needed

Optimisation functions (e.g. constrained non-linear optimisers)

Interpolation functions (used intelligently*)

Spline functions

..

*An interpolator must be used carefully: you must know its properties to pick an appropriate method

16

Problem 2: Calibration

Financial institutions all need to calibrate their models

NAG to the rescue

Several different numerical components needed

Optimisation functions (e.g. constrained non-linear optimisers) √

Interpolation functions (used intelligently*) √

Spline functions √

.. √ √

*An interpolator must be used carefully: you must know its properties to pick an appropriate method
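
A very small calibration example, given here only as a sketch: recovering a Black-Scholes implied volatility from a quoted call price with a bracketing root finder. The slides do not name a specific model or routine, so the Black-Scholes formula, the bisection method and all function names (`bs_call`, `implied_vol`) are assumptions made for illustration, written with the Python standard library rather than NAG routines.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, maturity, vol):
    """Black-Scholes price of a European call (no dividends)."""
    sd = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / sd
    d2 = d1 - sd
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_vol(price, spot, strike, rate, maturity, lo=1e-6, hi=5.0, tol=1e-10):
    """Find the volatility that reproduces `price`, by bisection on [lo, hi]."""
    f_lo = bs_call(spot, strike, rate, maturity, lo) - price
    f_hi = bs_call(spot, strike, rate, maturity, hi) - price
    if f_lo * f_hi > 0.0:
        raise ValueError("price is not bracketed by the volatility bounds")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # The call price increases with volatility, so the sign tells us which half to keep.
        if bs_call(spot, strike, rate, maturity, mid) - price > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, pricing a call with `vol=0.2` and passing that price back to `implied_vol` should return 0.2 to within the tolerance. Real calibrations fit many parameters to many instruments at once, which is where constrained non-linear optimisers and carefully chosen interpolators come in.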

18

The escalator: want more performance? Buy the next processor!

To get performance/efficiency we now have to go (massively) parallel

This disruption is prompting a serious look at ‘other’ technologies and algorithms!

Even CPUs with tens of cores per node

Hybrid, shared-memory and distributed-memory parallelism

Painful whichever way we turn!

Where has my Escalator gone?

19

Loose definition: hardware on which to run your software better than on your (general purpose) CPU

Generally NOT an easy win

Significant learning curve and effort

Offload disadvantages

Accelerators

20

ClearSpeed

Similar to GPU

Lacked a good software eco-system

IBM Cell

Lacked a good software eco-system

GPGPU

NVIDIA invested in the software eco-system (AMD did not!)

Intel Phi

Early days – an encouraging start

Expecting a lot more with Knights Landing!

Accelerators

21

We provide

A suite of numerical routines for Monte Carlo simulation, from a collaboration with Professor Mike Giles

MAGMA-based linear algebra from Jack Dongarra

New inverse CDFs (new distributions and speed-ups to existing code), implemented with Professor William Shaw

“Bespoke” consultancy codes

PDE solver for Stochastic Local Volatility

FX Basket Option, Local Vol Model

Solutions combining GPU and Algorithmic Differentiation (more on that later…)

Training courses for CUDA and OpenCL

NAG and GPGPUs (NVIDIA)

22

Relatively easy to take existing OpenMP-based code and port to Phi

Tuning for Phi takes some learning and expertise

… but feedback into Xeon code is often very strong

Performance issues

As always, need large enough problems to make the offload worthwhile

Seems to have significant offload overheads

NAG Library for Intel Xeon-Phi available

NAG and Intel Xeon-Phi

23

[Figure: NAG Distance Matrix (g03ea) – Intel Xeon Phi. Time (s) against problem size (n), for n from 0 to 30000. Series: 32 threads original; Phi offload original; Phi offload opt; 32 threads opt.]

n=30k; m=3k

Xeon 32t: 192s

Xeon 32t*: 75.7s

Phi 240t*: 40.6s

Phi gain ~5x over original or ~2x over optimised
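
The quoted gains can be checked directly from the three timings on the slide. The short sketch below assumes that the asterisk marks the optimised code; the variable names are made up for this example.

```python
# Timings quoted on the slide for n=30k, m=3k (seconds).
# Assumption: "*" denotes the optimised version of the code.
timings_s = {
    "xeon_32t_original": 192.0,
    "xeon_32t_optimised": 75.7,
    "phi_240t_optimised": 40.6,
}


def speedup(baseline_s, accelerated_s):
    """Ratio of baseline runtime to accelerated runtime."""
    return baseline_s / accelerated_s


phi_vs_original = speedup(timings_s["xeon_32t_original"], timings_s["phi_240t_optimised"])
phi_vs_optimised = speedup(timings_s["xeon_32t_optimised"], timings_s["phi_240t_optimised"])
xeon_tuning_gain = speedup(timings_s["xeon_32t_original"], timings_s["xeon_32t_optimised"])

print(f"Phi vs original Xeon code:  {phi_vs_original:.1f}x")   # 4.7x
print(f"Phi vs optimised Xeon code: {phi_vs_optimised:.1f}x")  # 1.9x
print(f"Optimised vs original Xeon: {xeon_tuning_gain:.1f}x")  # 2.5x
```

The last ratio shows the feedback effect: roughly 2.5x of the headline gain is available on the Xeon alone once the code has been tuned.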

24

[Figure: Uniform RNG – NAG Mersenne Twister (g05sa) – Intel Xeon Phi. Time (s) against size of problem (n, log scale). Series: 8 threads original; Native Phi original; Native Phi opt; 8 threads opt.]

n=500m

Xeon 8t: 0.25s

Phi 240t: 1.50s

Xeon 8t*: 0.22s

Phi gain ~3x

25

NAG & AD: Algorithmic Differentiation in a nutshell

Computers can only add, subtract, multiply and divide numbers.

A computer program implementing a model is many of these basic operations strung together

Elementary to compute the derivatives of these

Chain rule + basic derivatives = program derivative

Classes, templates and operator overloading can do this efficiently and non-intrusively
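
To make the chain-rule idea concrete, here is a minimal sketch of forward-mode AD by operator overloading. It is written in Python for brevity (the slides refer to C++ classes and templates), and the `Dual` class and `model` function are hypothetical names for this example, not part of any NAG tool.

```python
class Dual:
    """A number carrying its value and its derivative with respect to one input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Plain numbers are constants: their derivative is zero.
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.value + o.value, self.deriv + o.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        o = Dual._lift(other)
        return Dual(self.value - o.value, self.deriv - o.deriv)

    def __rsub__(self, other):
        return Dual._lift(other) - self

    def __mul__(self, other):
        o = Dual._lift(other)
        # Product rule.
        return Dual(self.value * o.value, self.deriv * o.value + self.value * o.deriv)

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = Dual._lift(other)
        q = self.value / o.value
        # Quotient rule, written in terms of the quotient q.
        return Dual(q, (self.deriv - q * o.deriv) / o.value)

    def __rtruediv__(self, other):
        return Dual._lift(other) / self


def model(x):
    """The 'original' program: only +, -, *, / strung together."""
    return (x * x + 3 * x) / (x + 1)


# Seed the input derivative with 1.0 to obtain d(model)/dx alongside the value.
y = model(Dual(2.0, 1.0))
```

The model code itself is unchanged: passing a `Dual` instead of a float is enough to obtain the derivative, which is the non-intrusive property mentioned above. Here `y.value` is 10/3 and `y.deriv` is 11/9, matching the hand-derived result.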

26

To get acceleration look at the algorithms

[Figure: 5000 gradient evaluations of LIBOR Market Model*. Runtime (s) against number of inputs (size of Delta). Series: using finite differences (bumping); using adjoints; 2nd-order adjoints (projected Hessian).]

*M.B. Giles and P. Glasserman, `Smoking adjoints: fast Monte Carlo Greeks', RISK, January 2006

27

Computing derivatives in finance is important…

Calculating a product’s sensitivities to a range of risk factors (a.k.a. Greeks) creates huge computational demand on risk and price models

Traditional approach: “bumping” (finite differences)

This is computationally very expensive… and needs more hardware!

The alternatives to finite differences are

Write derivative code by hand

Efficient, but difficult to write and highly error-prone (need to develop both original and adjoint models)

Algorithmic Differentiation

Flexible, and you develop only the original model: the obvious choice
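
The cost difference between bumping and adjoints can be seen on a toy model. The sketch below is an assumption-laden illustration in Python, not NAG's tooling: `Var`, `model`, `gradient_by_bumping` and `gradient_by_adjoint` are hypothetical names, and the tape is a plain list. Bumping needs n + 1 model evaluations for n inputs, while the adjoint approach needs one recorded evaluation plus one reverse sweep over the tape.

```python
class Var:
    """A value recorded on a tape with its local partial derivatives."""

    def __init__(self, tape, value, parents=()):
        self.tape = tape
        self.value = value
        self.parents = parents  # pairs (parent Var, d(self)/d(parent))
        self.adjoint = 0.0
        tape.append(self)       # creation order is a valid evaluation order

    def _lift(self, other):
        return other if isinstance(other, Var) else Var(self.tape, float(other))

    def __add__(self, other):
        o = self._lift(other)
        return Var(self.tape, self.value + o.value, ((self, 1.0), (o, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        return Var(self.tape, self.value * o.value, ((self, o.value), (o, self.value)))

    __rmul__ = __mul__


def model(x):
    """Toy scalar model: sum over i of x[i]^2 * x[i+1], with cyclic indexing."""
    n = len(x)
    total = x[0] * x[0] * x[1 % n]
    for i in range(1, n):
        total = total + x[i] * x[i] * x[(i + 1) % n]
    return total


def gradient_by_bumping(f, x, h=1e-6):
    """Forward finite differences; returns (gradient, number of evaluations of f)."""
    base = f(x)
    grad = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += h
        grad.append((f(bumped) - base) / h)
    return grad, len(x) + 1


def gradient_by_adjoint(f, x):
    """Reverse-mode AD; returns (gradient, number of evaluations of f)."""
    tape = []
    inputs = [Var(tape, v) for v in x]
    out = f(inputs)              # one forward evaluation, recorded on the tape
    out.adjoint = 1.0
    for node in reversed(tape):  # one reverse sweep propagates all sensitivities
        for parent, partial in node.parents:
            parent.adjoint += node.adjoint * partial
    return [v.adjoint for v in inputs], 1


bump_grad, bump_evals = gradient_by_bumping(model, [1.0, 2.0, 3.0, 4.0])
adj_grad, adj_evals = gradient_by_adjoint(model, [1.0, 2.0, 3.0, 4.0])
```

Both approaches call the same `model` function, so only the original model has to be written. The price of the adjoint approach is the memory needed to store the tape, which grows with the number of operations recorded.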

28

NAG and AAD

Adjoint Algorithmic Differentiation (AAD) reduces Runtime

With RWTH Aachen University (Prof. Uwe Naumann et al.), NAG is delivering Algorithmic Differentiation (AD) tools and services to the finance community for C/C++/CUDA codes.

Our example codes include

LIBOR Market Model

PDE based Local Volatility model

GPU accelerated Local Vol Basket Option pricer

Our solutions deliver for accelerators

29

A few numbers

Monte Carlo

n     f      cfd       AD               AD/f    cfd/AD
34    0.5s   29.0s     3.0s (2.5MB)     6.0x    9.7x
62    0.7s   80.9s     5.1s (3.2MB)     7.3x    15.9x
142   1.5s   423.5s    12.4s (5.1MB)    8.3x    34.2x
222   2.3s   1010.7s   24.4s (7.1MB)    10.6x   41.4x

PDE

n     f      cfd       AD               AD/f    cfd/AD
34    0.6s   37.7s     11.6s (535MB)    19.3x   3.3x
62    1.0s   119.5s    18.7s (919MB)    18.7x   6.4x
142   2.6s   741.2s    39s (2GB)        15.0x   19x
222   4.1s   1857.3s   60s (3GB)        14.6x   31x

30

AAD and GPUs

“AAD Vs GPUs: banks turn to maths tricks as chips lose appeal” risk.net, Jan 2015 – NONSENSE… surely combining AAD and GPUs makes the ultimate accelerator!

“…. Join the AAD revolution” risk.net, July 15 – making a lot more sense…

“In computational finance, there is no silver bullet. AAD is an algorithmic advance… …GPUs are parallel compute accelerations. The two are complementary” J. Ashley, IBM

NAG is already delivering “combined” solutions to our clients (in FSI and other sectors)

32

NAG is keen to collaborate in building models and risk engines

Requirements are likely to be varied across FSI

We want to make sure we have what you need

The importance of HPC Finance is growing and will involve a LOT of computation (Basel III, CVA,…)

NAG has significant experience in HPC libraries, services, consulting and training

We know how to do large scale computations efficiently

This is non-trivial! Our expertise has been sought out and exploited by organisations such as AMD, HECToR, Microsoft, Oracle, major banks, major oil & gas companies, …

HPC Finance - Summary

33

www.nag.co.uk

AD explained http://www.nag.co.uk/pss/nag-and-algorithmic-differentiation

Adjoint Algorithmic Differentiation Tool Support for Typical Numerical Patterns in Computational Finance http://www.nag.co.uk/doc/techrep/pdf/tr3_14.pdf

Adjoint Algorithmic Differentiation of a GPU Accelerated Application http://www.nag.co.uk/Market/articles/adjoint-algorithmic-differentiation-of-gpu-accelerated-app.pdf

References