
Page 1

XGBOOST: A SCALABLE TREE BOOSTING SYSTEM (T. CHEN, C. GUESTRIN, 2016)

NATALLIE BAIKEVICH

HARDWARE ACCELERATION FOR DATA PROCESSING SEMINAR

ETH ZÜRICH

Page 2

MOTIVATION

✓ Effective statistical models
✓ Scalable system
✓ Successful real-world applications

XGBoost = eXtreme Gradient Boosting

Page 3

BIAS-VARIANCE TRADEOFF

Random Forest: variance ↓ (trees combined by voting/averaging)

Boosting: bias ↓ (trees added sequentially)
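To make the tradeoff concrete, here is a minimal sketch (not from the slides) contrasting the two ensemble styles with scikit-learn; the dataset and hyperparameter values are illustrative assumptions.

```python
# Illustrative sketch: bagging (Random Forest) vs. boosting on the same data.
# Dataset and hyperparameters are assumptions for demonstration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Random Forest: deep, de-correlated trees averaged together -> variance reduction.
rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Boosting: shallow trees added sequentially, each fitting the remaining errors -> bias reduction.
gbm = GradientBoostingClassifier(n_estimators=200, max_depth=3, learning_rate=0.1, random_state=0)

for name, model in [("random forest", rf), ("gradient boosting", gbm)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```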

Page 4

A BIT OF HISTORY

AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001

Page 5

A BIT OF HISTORY

AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
Various improvements in tree boosting
XGBoost package

Page 6

A BIT OF HISTORY

AdaBoost, 1996
Random Forests, 1999
Gradient Boosting Machine, 2001
Various improvements in tree boosting
XGBoost package

1st Kaggle success: Higgs Boson Challenge
17/29 winning solutions in 2015

Page 7

WHY DOES XGBOOST WIN "EVERY" MACHINE LEARNING COMPETITION? (Master's thesis, D. Nielsen, 2016)

Source: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions

Page 8

TREE ENSEMBLE
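The slide's figure (omitted in this transcript) shows a prediction formed by summing the leaf scores of several trees. A minimal sketch of that idea; the two trees and the example input below are made up for illustration:

```python
# Minimal sketch of tree-ensemble prediction: the model output is the sum of the
# leaf scores of K regression trees. The trees and the input are illustrative only.
def tree_1(x):
    return 2.0 if x["age"] < 15 else -1.0          # split on age

def tree_2(x):
    return 0.9 if x["uses_computer_daily"] else -0.9  # split on daily computer use

def predict(x, trees=(tree_1, tree_2)):
    return sum(t(x) for t in trees)

print(predict({"age": 10, "uses_computer_daily": True}))  # 2.0 + 0.9 = 2.9
```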

Page 9

REGULARIZED LEARNING OBJECTIVE

Prediction (additive ensemble of K trees):
$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$

Objective = loss + regularization:
$\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$

Regularization ($T$ = # of leaves, $w$ = leaf weights):
$\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$

Source: http://xgboost.readthedocs.io/en/latest/model.html
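For reference, the two regularization terms map directly onto training parameters of the xgboost package; a minimal sketch with arbitrary data and parameter values:

```python
# Sketch: how gamma (penalty per leaf, T in Omega) and lambda (L2 penalty on leaf
# weights, ||w||^2 in Omega) appear as XGBoost training parameters. Values are arbitrary.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "reg:squarederror",
    "gamma": 1.0,       # gamma: cost of adding a leaf
    "lambda": 1.0,      # lambda: L2 regularization on leaf weights
    "max_depth": 6,
    "eta": 0.1,         # shrinkage / learning rate
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```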

Page 10

SCORE CALCULATION

1st order gradient g_i and 2nd order gradient h_i of the loss

Statistics aggregated for each leaf j: G_j = Σ g_i, H_j = Σ h_i

Score of a tree structure computed from these statistics
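The slide's formulas are not in the transcript; as given in the paper, the optimal leaf weight, the score of a tree structure, and the gain used for split finding are:

```latex
% Optimal weight of leaf j, with G_j, H_j the gradient sums over the leaf:
w_j^{*} = -\frac{G_j}{H_j + \lambda}

% Score of a tree structure with T leaves (lower is better):
\mathcal{L}^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j + \lambda} + \gamma T

% Gain of splitting a leaf into left (L) and right (R) children:
\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L + \lambda}
  + \frac{G_R^{2}}{H_R + \lambda}
  - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}\right] - \gamma
```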

Page 11

ALGORITHM FEATURES

✓ Regularized objective
✓ Shrinkage and column subsampling
✓ Split finding: exact & approximate, global & local
✓ Weighted quantile sketch
✓ Sparsity-awareness
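Several of these features surface directly as parameters of the xgboost package; a minimal sketch (the data and parameter values are illustrative, not from the paper):

```python
# Sketch: algorithm features exposed as XGBoost parameters. Values are illustrative.
import numpy as np
import xgboost as xgb

X = np.random.rand(2000, 20)
X[np.random.rand(*X.shape) < 0.3] = np.nan   # sparse / missing entries
y = (np.random.rand(2000) > 0.5).astype(int)

clf = xgb.XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,        # shrinkage
    colsample_bytree=0.8,     # column subsampling
    tree_method="approx",     # approximate split finding ("exact" also available)
    reg_lambda=1.0,           # L2 regularization on leaf weights
    gamma=0.5,                # penalty per additional leaf
    missing=np.nan,           # sparsity-aware handling of missing values
)
clf.fit(X, y)
```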

Page 12

SYSTEM DESIGN: BLOCK STRUCTURE

Time complexity of split finding:
without blocks: $O(K d \lVert x \rVert_0 \log n)$
with block structure: $O(K d \lVert x \rVert_0 + \lVert x \rVert_0 \log B)$
($K$ = # trees, $d$ = max depth, $\lVert x \rVert_0$ = # non-missing entries, $B$ = max # rows per block)

Blocks can be
✓ Distributed across machines
✓ Stored on disk in the out-of-core setting

Sorted structure → linear scan
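The point of keeping each feature column pre-sorted inside a block is that the best split can then be found with a single linear scan over the column, accumulating gradient statistics as it goes. A minimal sketch of that scan in plain NumPy (not the paper's C++ implementation; tie handling and threshold placement between values are ignored for brevity):

```python
# Sketch of split finding by a linear scan over one pre-sorted feature column.
# g, h are per-row 1st/2nd order gradients; lam and gamma follow the objective above.
import numpy as np

def best_split_on_column(x_col, g, h, lam=1.0, gamma=0.0):
    order = np.argsort(x_col)        # in XGBoost this order is precomputed per block
    G, H = g.sum(), h.sum()
    G_L = H_L = 0.0
    best_gain, best_thresh = 0.0, None
    for i in order[:-1]:             # move one row at a time from right child to left child
        G_L += g[i]; H_L += h[i]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thresh = gain, x_col[i]
    return best_gain, best_thresh
```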

Page 13

SYSTEM DESIGN: CACHE-AWARE ACCESS

Problem: non-contiguous memory access to gradient statistics during split finding

Improved split finding:
✓ Allocate an internal buffer
✓ Prefetch gradient statistics into it

(Plots: impact on larger vs. smaller datasets.)
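The idea, sketched here in NumPy purely as an analogy (the real implementation is C++ and relies on hardware prefetching), is to gather the scattered gradient statistics into a small contiguous buffer in mini-batches before accumulating them, turning random reads into sequential ones:

```python
# Analogy for cache-aware accumulation: gather gradient stats for a mini-batch of rows
# (given in sorted-feature order) into a contiguous buffer, then accumulate from it.
# The batch size is an assumption for illustration.
import numpy as np

def accumulate_with_buffer(sorted_row_idx, g, h, batch=4096):
    G_prefix, H_prefix = [], []
    G = H = 0.0
    for start in range(0, len(sorted_row_idx), batch):
        idx = sorted_row_idx[start:start + batch]
        g_buf = g[idx]                 # contiguous copies of otherwise scattered reads
        h_buf = h[idx]
        for gb, hb in zip(g_buf, h_buf):
            G += gb; H += hb
            G_prefix.append(G); H_prefix.append(H)
    return np.array(G_prefix), np.array(H_prefix)
```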

Page 14

SYSTEM DESIGN: BLOCK STRUCTURE

✓ Compression by columns (CSC): decompression cost traded against disk reading
✓ Block sharding: use multiple disks
✓ Block size: too large → cache misses; too small → inefficient parallelization
✓ Prefetch in an independent thread
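A minimal sketch of the prefetch-in-an-independent-thread pattern, using Python's threading and queue modules; load_block and process_block are stand-ins for reading/decompressing a block and running split finding on it, not XGBoost APIs:

```python
# Sketch: overlap disk I/O with computation by prefetching blocks in a separate thread.
import queue
import threading
import time

def load_block(block_id):
    time.sleep(0.05)                  # stands in for disk read + decompression
    return f"block-{block_id}"

def process_block(block):
    time.sleep(0.05)                  # stands in for split finding on the block
    print("processed", block)

def process_all(block_ids):
    q = queue.Queue(maxsize=2)        # small in-memory buffer of prefetched blocks

    def prefetcher():
        for bid in block_ids:
            q.put(load_block(bid))    # blocks when the buffer is full
        q.put(None)                   # sentinel: no more blocks

    threading.Thread(target=prefetcher, daemon=True).start()
    while (block := q.get()) is not None:
        process_block(block)          # overlaps with loading of the next block

process_all(range(5))
```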

Page 15

EVALUATION

Single machine: AWS c3.8xlarge (32 virtual cores, 2×320 GB SSD, 60 GB RAM)

Distributed: 32 m3.2xlarge machines, each with 8 virtual cores, 2×80 GB SSD, 30 GB RAM

Page 16

DATASETS

Dataset       n     m     Task
Allstate      10M   4227  Insurance claim classification
Higgs Boson   10M   28    Event classification
Yahoo LTRC    473K  700   Learning to rank
Criteo        1.7B  67    Click-through rate prediction

Page 17

WHAT’S NEXT?

Model extensions: DART (+ dropouts), LinXGBoost

Parallel processing: GPU, FPGA

Tuning: hyperparameter optimization

More applications

Recap — XGBoost scalability: weighted quantiles, sparsity-awareness, cache-awareness, data compression
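Some of these directions already exist as options in the xgboost package; a minimal sketch (parameter values are illustrative, and the GPU path requires a CUDA-enabled build):

```python
# Sketch: DART booster (dropout for trees) and GPU-accelerated histogram training,
# both available as options in the xgboost package. Values are illustrative.
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)
y = np.random.rand(1000)
dtrain = xgb.DMatrix(X, label=y)

dart_params = {
    "booster": "dart",          # MART + dropouts
    "rate_drop": 0.1,           # fraction of trees dropped per boosting round
    "objective": "reg:squarederror",
}
xgb.train(dart_params, dtrain, num_boost_round=50)

gpu_params = {
    "tree_method": "gpu_hist",  # histogram split finding on the GPU
    "objective": "reg:squarederror",
}
# xgb.train(gpu_params, dtrain, num_boost_round=50)  # needs a GPU-enabled build;
# newer releases use tree_method="hist" together with device="cuda"
```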

Page 18

QUICK OVERVIEW

+ Nicely structured paper, easily comprehensible
+ Real framework, widely used for many ML problems
+ Combination of improvements on both the model and implementation sides to achieve scalability
+ Reference point for further research in tree boosting
- The concepts are not that novel themselves
- Does not explain why some of the models are not compared in all experiments
- Is the compression efficient for dense datasets?
- What if there are many columns rather than rows (e.g. medical data)?

Page 19

THANK YOU!