GPU Accelerated Machine Learning for Bond Price Prediction

Venkat Bala, Rafael Nicolas Fermin Cota

Motivation

Primary Goals

• Demonstrate potential benefits of using GPUs over CPUs for machine learning

• Exploit inherent parallelism to improve model performance

• Real world application using a bond trade dataset


Highlights

Ensemble

• Bagging: Train independent regressors on equal-sized bags of samples
• Generally, performance is superior to any single individual regressor
• Scalable: Each individual model can be trained independently and in parallel
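The bagging idea above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the base learner (a one-feature least-squares line), the synthetic data, the bag count, and the names `fit_line`, `bagged_fit`, and `bagged_predict` are all assumptions made for the example.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def fit_line(samples):
    """Ordinary least squares fit of y = a + b*x on (x, y) pairs."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in samples)
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def bagged_fit(data, n_models=8, seed=0):
    """Train n_models regressors, each on its own equal-sized bootstrap bag."""
    rng = random.Random(seed)
    bags = [rng.choices(data, k=len(data)) for _ in range(n_models)]
    # No bag depends on another, so the fits can be submitted concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(fit_line, bags))


def bagged_predict(models, x):
    """Average the predictions of the individual regressors."""
    return fmean(a + b * x for a, b in models)


# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
noise = random.Random(1)
data = [(x, 2.0 * x + 1.0 + noise.gauss(0, 0.5)) for x in range(50)]
models = bagged_fit(data)
prediction = bagged_predict(models, 10.0)
```

Because each fit touches only its own bag, the same structure maps onto separate processes, machines, or GPUs without any coordination between models.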

Hardware Specifications

• CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
• GPU: GeForce GTX 1080 Ti
• RAM: 1 TB (DDR4, 2400 MHz)


Bond Trade Dataset

Feature Set

• 100+ features per trade
• Trade Size/Historical Features
• Coupon Rate/Time to Maturity
• Bond Rating
• Trade Type: Buy/Sell
• Reporting Delays
• Current Yield/Yield to Maturity

Response

• Trade Price


Modeling Approach


The Machine Learning Pipeline

• DATA PROCESSING
• TRAINING SET
• CV/TEST SET
• MODEL BUILDING
• EVALUATE
• DEPLOY

Accelerate each stage in the pipeline for maximum performance


Data Preprocessing

Exposing Data Parallelism

• Important stage in the pipeline (garbage in → garbage out)
• Many models rely on input data being on the same scale
• Standardization, log transformations, imputation, polynomial/non-linear feature generation, etc.

• In most cases there is no data dependence, so each operation can be executed independently
• Significant speedups can be obtained using GPUs, given sufficient data/computation
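As a rough illustration of why these steps parallelize well, the sketch below implements mean imputation and standardization in plain Python. The function names and the sample column are hypothetical. Each transform first computes a small summary (a mean, a standard deviation) and then updates every element independently of the others.

```python
from statistics import fmean, pstdev


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(column), pstdev(column)
    # Once mu and sigma are known, each element is transformed independently.
    return [(v - mu) / sigma for v in column]


coupon_rate = [2.0, None, 4.0, 6.0]      # hypothetical feature column
filled = impute_mean(coupon_rate)        # the None becomes 4.0
scaled = standardize(filled)
```

The per-element update is the part that benefits from one-thread-per-element execution; the summary statistics are reductions, which are also parallelizable but need a different kernel pattern.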


Data Preprocessing: Sequential Approach

Apply function F(·) sequentially to each element in a feature column

[Diagram: a single F(·) applied to a_0, a_1, a_2, a_3, ..., a_N one element at a time]


Data Preprocessing: Parallel Approach

Apply function F(·) in parallel to each element in a feature column

[Diagram: one F(·) per element maps a_0, a_1, a_2, a_3, ..., a_N to b_0, b_1, b_2, b_3, ..., b_N at the same time]
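The two approaches can be contrasted with a small sketch. It assumes a chunked decomposition over a thread pool, which is only one of several ways to express the parallel version; `apply_sequential` and `apply_parallel` are names chosen for this example.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def apply_sequential(f, column):
    """Apply f to one element after another."""
    return [f(a) for a in column]


def apply_parallel(f, column, n_workers=4):
    """Split the column into contiguous chunks and transform them concurrently."""
    size = max(1, math.ceil(len(column) / n_workers))
    chunks = [column[i:i + size] for i in range(0, len(column), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves chunk order, so b_i still lines up with a_i.
        parts = pool.map(lambda chunk: [f(a) for a in chunk], chunks)
        return [b for part in parts for b in part]


a = [float(i) for i in range(1, 11)]
b = apply_parallel(math.log, a)
```

In standard CPython the global interpreter lock prevents threads from speeding up CPU-bound work, so this sketch shows the decomposition rather than a performance gain. On a GPU the chunk size shrinks to a single element per thread.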


Programming Details

Implementation Basics

• Task is embarrassingly parallel
• Improve CPU code performance
    • Auto-vectorization + compiler optimizations
    • Using performance libraries (Intel MKL)
    • Adopting threaded (OpenMP)/distributed computing (MPI) approaches
• Great application case for GPUs
    • Offload computations onto the GPU via CUDA kernels
    • Launch as many threads as there are data elements
    • Launch several kernels concurrently using CUDA streams
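Launching one thread per data element comes down to simple index arithmetic. The sketch below emulates that arithmetic on the CPU in Python; it is not CUDA code, and the block size of 256 threads is an assumed, commonly used value rather than one stated in the slides.

```python
import math


def launch_config(n, threads_per_block=256):
    """Return (blocks, threads_per_block) covering n elements, one thread each."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def simulated_log_kernel(data, threads_per_block=256):
    """Emulate the indexing of a one-thread-per-element kernel on the CPU."""
    n = len(data)
    blocks, tpb = launch_config(n, threads_per_block)
    out = [0.0] * n
    for block_idx in range(blocks):
        for thread_idx in range(tpb):
            i = block_idx * tpb + thread_idx  # global element index
            if i < n:                         # threads past the end do nothing
                out[i] = math.log(data[i])
    return out


blocks, tpb = launch_config(2 ** 20)          # (4096, 256)
```

The bounds check matters whenever the element count is not a multiple of the block size: `launch_config(1000)` gives 4 blocks, so 24 of the 1024 launched threads have no element to process.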


Toy Example: Speedup Over Sequential C++

• Log transformation of an array of floats
• N = 2^p is the number of elements, so p = log2(N)

[Figure: speedup over sequential C++ (y-axis, 0 to 10) versus p (x-axis, 18 to 23), for vectorized C++ and CUDA]


Bond Dataset Preprocessing

Applied Transformations

• Log transformation of highly skewed features (Trade Size, Time to Maturity)
• Standardization (Trade Price and historical prices)
• Missing value imputation
• Winsorizing features to handle outliers
• Feature generation (price differences, yield measurements)
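Two of the listed transformations, winsorizing and price-difference features, are sketched below. The quantile levels, the nearest-rank quantile rule, the function names, and the price series are assumptions for illustration; the slides do not state which thresholds were used.

```python
def winsorize(column, lower=0.05, upper=0.95):
    """Clamp values outside the [lower, upper] empirical quantiles."""
    ordered = sorted(column)
    last = len(ordered) - 1
    lo = ordered[round(lower * last)]   # nearest-rank lower threshold
    hi = ordered[round(upper * last)]   # nearest-rank upper threshold
    return [min(max(v, lo), hi) for v in column]


def price_differences(prices):
    """Difference between each trade price and the previous one."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


# Hypothetical trade prices with two obvious outliers (250.0 and 1.0).
prices = [100.0, 100.5, 99.8, 250.0, 100.2, 100.1, 1.0, 100.4, 100.3, 100.0, 99.9]
clipped = winsorize(prices, lower=0.1, upper=0.9)
diffs = price_differences(clipped)
```

With these inputs the outliers are pulled in to 100.5 and 99.8, so the generated differences are no longer dominated by two extreme trades.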

Implementation Details

• CPU: C++ implementation using Intel MKL/Armadillo
• GPU: CUDA


GPU Speedup over CPU implementation

• Nearly 10x speedup obtained after CUDA optimizations

[Figure: speedup over CPU (y-axis, 0 to 10) versus p (x-axis, 20 to 25), for unoptimized CUDA and optimized CUDA]


CUDA Optimizations

Standard Tricks

• Concurrent kernel execution using CUDA streams to maximize GPU utilization
• Use of optimized libraries such as cuBLAS/Thrust
• Coalesced memory access
• Maximizing memory bandwidth for operations with low arithmetic intensity
• Caching using GPU shared memory


Model Building


Ensemble Model

Model Choices

• GBT: XGBoost
• DNN: TensorFlow/Keras

[Diagram: GBT and DNN models combined into an ensemble model]
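The slides do not say how the GBT and DNN outputs are combined. One common choice, assumed here purely for illustration, is a weighted average of the two models' predictions; `ensemble_predict` and the sample values are hypothetical.

```python
def ensemble_predict(gbt_preds, dnn_preds, gbt_weight=0.5):
    """Blend two models' predictions with a convex combination."""
    if len(gbt_preds) != len(dnn_preds):
        raise ValueError("prediction lists must have the same length")
    w = gbt_weight
    return [w * g + (1.0 - w) * d for g, d in zip(gbt_preds, dnn_preds)]


gbt = [101.0, 98.0, 110.0]   # hypothetical GBT price predictions
dnn = [99.0, 100.0, 108.0]   # hypothetical DNN price predictions
blended = ensemble_predict(gbt, dnn)
```

In practice the weight would be chosen on the CV set, like any other hyperparameter.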


Hyperparameter Tuning: Hyperopt

GBT: XGBoost

• Learning Rate
• Max depth
• Minimum child weight
• Subsample, Colsample-bytree
• Regularization parameters

DNN: MLPs

• Learning Rate/Decay Rate
• Batch Size
• Epochs
• Hidden layers/Layer width
• Activations/Dropouts
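To make the tuning loop concrete, here is a sketch of a search over the GBT hyperparameters listed above. It deliberately uses plain random search from the standard library rather than Hyperopt, so it is a stand-in for the idea and not the talk's method. The ranges, the `toy_objective`, and the function names are assumptions; the dictionary keys follow XGBoost's parameter names.

```python
import math
import random


def sample_xgb_params(rng):
    """Draw one candidate from a hypothetical GBT search space."""
    return {
        # Log-uniform draws suit parameters that span orders of magnitude.
        "learning_rate": math.exp(rng.uniform(math.log(1e-3), math.log(0.5))),
        "max_depth": rng.randint(3, 12),
        "min_child_weight": rng.randint(1, 10),
        "subsample": rng.uniform(0.5, 1.0),
        "colsample_bytree": rng.uniform(0.5, 1.0),
        "reg_lambda": math.exp(rng.uniform(math.log(1e-3), math.log(10.0))),
    }


def random_search(objective, n_iter=100, seed=0):
    """Evaluate n_iter random candidates and keep the lowest-loss one."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        params = sample_xgb_params(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


def toy_objective(params):
    """Stand-in for a cross-validated loss; a real run would train a model here."""
    return (math.log10(params["learning_rate"]) + 1.0) ** 2 + abs(params["max_depth"] - 6)


best, loss = random_search(toy_objective)
```

Every objective call is a full model fit in a real run, which is why faster training translates directly into more search iterations in the same time budget.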


Hyperparameter Tuning: Hyperopt

[Figure: learning rate (y-axis, 0.0 to 1.0) versus iterations (x-axis, 0 to 1000)]


XGBoost: Training & Hyperparameter Optimization Time

[Figure: average training time in hours (x-axis, 0 to 8), GPU (GTX 1080 Ti) versus CPU (Intel(R) Xeon(R) E5-2699, 32 cores)]

GBT, Speedup ≈ 3x


TensorFlow/Keras Time Per Epoch

[Figure: time per epoch in seconds (x-axis, 0.00 to 0.30) for p = 15 to 18, GTX 1080 Ti versus Intel(R) Xeon(R) E5-2699, 32 cores]

Speedup ≈ 3x


Model Test Set Performance

[Figure: Valid (y-axis, 20 to 160) plotted against Prediction (x-axis, 20 to 160)]

Test set R² = 0.9858
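For reference, the R² score reported here is the coefficient of determination, 1 − SS_res / SS_tot. A minimal sketch of the computation follows; the sample prices are hypothetical and unrelated to the bond dataset.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 102.0, 98.0, 105.0]       # hypothetical trade prices
predicted = [100.5, 101.5, 98.5, 104.0]    # hypothetical model output
score = r_squared(actual, predicted)
```

A score of 1.0 means the predictions match the observed prices exactly, while 0.0 means the model does no better than always predicting the mean price.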


Summary


Final Remarks

• Leveraging GPU compute power → speedups of roughly 3x (model training) to nearly 10x (preprocessing)
• Maximum performance when GPUs are incorporated into every stage of the pipeline
• Ensembles: bagging/boosting to improve model accuracy/throughput
• Shorter training times allow more experimentation
• Extensive support available
• Deploy this pipeline now on our in-house DGX-1


Questions?