GPU Accelerated Machine Learning for Bond Price Prediction
Venkat Bala, Rafael Nicolas Fermin Cota
Motivation
Primary Goals
• Demonstrate potential benefits of using GPUs over CPUs for machine learning
• Exploit inherent parallelism to improve model performance
• Real world application using a bond trade dataset
Highlights
Ensemble
• Bagging: Train independent regressors on equal sized bags of samples
• Generally, performance is superior to any single individual regressor
• Scalable: Each individual model can be trained independently and in parallel
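A minimal sketch of the bagging idea, assuming NumPy is available. The deck's base learners are XGBoost and DNN models; plain least-squares regressors are swapped in here only to keep the example short. The function names, the synthetic data, and the choice of 10 bags are all illustrative assumptions.

```python
import numpy as np

def fit_bagged_linear(X, y, n_models=10, seed=0):
    """Fit n_models least-squares regressors, each on an equal-sized bootstrap bag."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.column_stack([np.ones(n), X])  # prepend an intercept column
    coefs = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        coefs.append(w)
    return np.array(coefs)

def predict_bagged(coefs, X):
    """Average the predictions of all bagged regressors."""
    Xb = np.column_stack([np.ones(X.shape[0]), X])
    return (Xb @ coefs.T).mean(axis=1)

# Synthetic data (assumption): a linear signal plus small Gaussian noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 2.0 + X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.1, size=500)

coefs = fit_bagged_linear(X, y)
pred = predict_bagged(coefs, X)
```

Each pass through the loop depends only on its own bag, which is why the individual fits could be dispatched to separate workers or devices.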
Hardware Specifications
• CPU: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
• GPU: GeForce GTX 1080 Ti
• RAM: 1 TB (DDR4 2400 MHz)
Bond Trade Dataset
Feature Set
• 100+ features per trade
• Trade Size/Historical Features
• Coupon Rate/Time to Maturity
• Bond Rating
• Trade Type: Buy/Sell
• Reporting Delays
• Current Yield/Yield To Maturity
Response
• Trade Price
Modeling Approach
The Machine Learning Pipeline
DATA PROCESSING
TRAINING SET
CV/TEST SET
MODEL BUILDING
EVALUATE
DEPLOY
Accelerate each stage in the pipeline for maximum performance
Data Preprocessing
Exposing Data Parallelism
• Important stage in the pipeline (garbage in → garbage out)
• Many models rely on input data being on the same scale
• Standardization, log transformations, imputations, polynomial/non-linear feature generation, etc.
• In most cases there is no data dependence, so each operation can be executed independently
• Significant speedups can be obtained using GPUs, given sufficient data/computation
Data Preprocessing: Sequential Approach
Apply function F(·) sequentially to each element in a feature column
[Diagram: F(·) visits a0, a1, a2, a3, ..., aN one element at a time]
Data Preprocessing: Parallel Approach
Apply function F(·) in parallel to each element in a feature column
[Diagram: a separate F(·) maps each input a0, a1, a2, a3, ..., aN to its output b0, b1, b2, b3, ..., bN at the same time]
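The contrast between the two approaches can be sketched on the CPU, assuming NumPy. Here F(·) is taken to be the natural logarithm, and the helper names are hypothetical. NumPy's vectorized call is only an analog of the GPU version: it shows that the elements are independent, not how a CUDA kernel would be written.

```python
import math
import numpy as np

def log_sequential(a):
    """Apply F = log one element at a time, as in the sequential approach."""
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = math.log(a[i])
    return out

def log_vectorized(a):
    """Apply F = log to the whole column in a single data-parallel call."""
    return np.log(a)

# Illustrative feature column with 2**10 strictly positive values.
a = np.linspace(1.0, 100.0, 1 << 10)
b_seq = log_sequential(a)
b_vec = log_vectorized(a)
```

Because no output element depends on any other, both versions produce the same column; only the execution order differs.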
Programming Details
Implementation Basics
• Task is embarrassingly parallel
• Improve CPU code performance
  • Auto vectorizations + compiler optimizations
  • Using performance libraries (Intel MKL)
  • Adopting threaded (OpenMP)/distributed computing (MPI) approaches
• Great application case for GPUs
  • Offload computations onto the GPU via CUDA kernels
  • Launch as many threads as there are data elements
  • Launch several kernels concurrently using CUDA streams
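Launching one thread per data element comes down to simple index arithmetic. The sketch below simulates that arithmetic in plain Python rather than CUDA; the block size of 256 and both function names are assumptions chosen for illustration.

```python
def launch_config(n_elements, threads_per_block=256):
    """Return (blocks, threads_per_block) so that every element gets one thread."""
    # Ceiling division: the last block may be only partly used.
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block

def global_indices(blocks, threads_per_block, n_elements):
    """Yield the element index each simulated thread would process."""
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            i = block_idx * threads_per_block + thread_idx
            if i < n_elements:  # bounds guard for the surplus threads in the last block
                yield i

blocks, tpb = launch_config(1000)
covered = list(global_indices(blocks, tpb, 1000))
```

With 1000 elements and 256 threads per block, four blocks are launched and the guard leaves the final 24 threads idle.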
Toy Example: Speedup Over Sequential C++
• Log transformation of an array of floats
• N = 2^p, number of elements, p = log2(N)
[Figure: speedup over sequential C++ (y-axis, 0 to 10) versus p (x-axis, 18 to 23), for vectorized C++ and CUDA]
Bond Dataset Preprocessing
Applied Transformations
• Log transformation of highly skewed features (Trade Size, Time to Maturity)
• Standardization (Trade Price & historical prices)
• Missing value imputation
• Winsorizing features to handle outliers
• Feature generation (Price differences, Yield measurements)
Implementation Details
• CPU: C++ implementation using Intel MKL/Armadillo
• GPU: CUDA
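A rough NumPy sketch of how several of the listed transformations might be chained on one skewed feature. The slides do not specify the order of operations, the imputation rule, or the winsorizing limits, so median imputation, 1st/99th percentile clipping, and log1p are assumptions here, as are the function names and the sample values. The actual implementation described above uses C++ and CUDA, not Python.

```python
import numpy as np

def impute_median(x):
    """Replace NaNs with the median of the observed values (assumed rule)."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmedian(x)
    return out

def winsorize(x, lower=1.0, upper=99.0):
    """Clip values outside the given percentiles (assumed limits)."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)

def standardize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    return (x - x.mean()) / x.std()

def preprocess_skewed(x):
    """Impute, winsorize, log-transform, then standardize one feature column."""
    return standardize(np.log1p(winsorize(impute_median(x))))

# Made-up trade sizes with missing values and one extreme outlier.
trade_size = np.array([1e3, 5e3, np.nan, 2e4, 1e6, 7.5e3, 1e4, np.nan, 3e3, 5e7])
z = preprocess_skewed(trade_size)
```

Every step is an elementwise map or a simple reduction over the column, which is the property that makes the same pipeline a good fit for GPU kernels.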
GPU Speedup over CPU implementation
• Nearly 10x speedup obtained after CUDA optimizations
[Figure: speedup over CPU (y-axis, 0 to 10) versus p (x-axis, 20 to 25), for unoptimized CUDA and optimized CUDA]
CUDA Optimizations
Standard Tricks
• Concurrent kernel execution using CUDA streams to maximize GPU utilization
• Use of optimized libraries such as cuBLAS/Thrust
• Coalesced memory access
• Maximizing memory bandwidth for operations with low arithmetic intensity
• Caching using GPU shared memory
Model Building
Ensemble Model
Model Choices
• GBT: XGBoost
• DNN: TensorFlow/Keras
[Diagram: ensemble model combining GBT and DNN models]
Hyperparameter Tuning: Hyperopt
GBT: XGBoost
• Learning Rate
• Max depth
• Minimum child weight
• Subsample, Colsample-bytree
• Regularization parameters
DNN: MLPs
• Learning Rate/Decay Rate
• Batch Size
• Epochs
• Hidden layers/Layer width
• Activations/Dropouts
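To make the GBT search space concrete, here is a standard-library sketch that samples the listed hyperparameters. It uses plain random search as a stand-in for Hyperopt's optimizer and does not call Hyperopt or XGBoost. The ranges, the toy objective, and the function names are all assumptions; a real run would score each candidate by cross-validation error.

```python
import math
import random

def sample_gbt_params(rng):
    """Draw one candidate configuration from an illustrative search space."""
    return {
        # Log-uniform draw, so small learning rates are sampled as often as large ones.
        "learning_rate": math.exp(rng.uniform(math.log(1e-3), math.log(0.5))),
        "max_depth": rng.randint(3, 12),
        "min_child_weight": rng.randint(1, 10),
        "subsample": rng.uniform(0.5, 1.0),
        "colsample_bytree": rng.uniform(0.5, 1.0),
        "reg_lambda": math.exp(rng.uniform(math.log(1e-3), math.log(10.0))),
    }

def toy_loss(params):
    """Placeholder objective with its minimum near learning_rate=0.1, max_depth=6."""
    lr_term = (math.log10(params["learning_rate"]) + 1.0) ** 2
    depth_term = (params["max_depth"] - 6) ** 2 / 36.0
    return lr_term + depth_term

def random_search(n_trials=200, seed=0):
    """Keep the best of n_trials independently sampled configurations."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = sample_gbt_params(rng)
        loss = toy_loss(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

best_params, best_loss = random_search()
```

The trials are independent of one another, so with random search they could be evaluated concurrently; a sequential model-based optimizer trades some of that parallelism for better-informed proposals.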
Hyperparameter Tuning: Hyperopt
[Figure: learning rate (y-axis, 0.0 to 1.0) versus iterations (x-axis, 0 to 1000)]
XGBoost: Training & Hyperparameter Optimization Time
[Figure: average training time in hours (x-axis, 0 to 8) for GPU (GTX 1080 Ti) and CPU (Intel(R) Xeon(R) E5-2699, 32 cores)]
• GBT speedup ≈ 3x
TensorFlow/Keras Time Per Epoch
[Figure: time per epoch in seconds (x-axis, 0.00 to 0.30) for p = 15 to 18 (y-axis), GTX 1080 Ti versus Intel(R) Xeon(R) E5-2699, 32 cores]
• Speedup ≈ 3x
Model Test Set Performance
[Figure: valid values (y-axis, 20 to 160) versus predictions (x-axis, 20 to 160)]
Test set R²: 0.9858
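For reference, R² is the coefficient of determination, 1 − SS_res/SS_tot. The small pure-Python helper below shows the calculation; the function is a hypothetical local helper, and the prices and predictions are made-up values, not figures from the bond dataset.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot

# Illustrative values only.
prices = [98.5, 101.2, 104.0, 95.3, 110.7]
predictions = [98.0, 101.5, 103.2, 96.1, 110.0]
score = r2_score(prices, predictions)
```

A perfect model scores 1.0, while a model that always predicts the mean price scores 0.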
Summary
Final Remarks
• Leveraging GPU computation power → dramatic speedups
• Maximum performance when GPUs are incorporated into every stage of the pipeline
• Ensembles: Bagging/Boosting to improve model accuracy/throughput
• Shorter training times allow more experimentation
• Extensive support available
• Deploy this pipeline now on our in-house DGX-1
Questions?