XGBOOST: A SCALABLE TREE BOOSTING SYSTEM (T. CHEN, C. GUESTRIN, 2016)
NATALLIE BAIKEVICH
HARDWARE ACCELERATION FOR DATA PROCESSING SEMINAR
ETH ZÜRICH
MOTIVATION

✓ Effective statistical models
✓ Scalable system
✓ Successful real-world applications
XGBoost = eXtreme Gradient Boosting
BIAS-VARIANCE TRADEOFF

✓ Random Forest (voting over independently grown trees): variance ↓
✓ Boosting (trees added sequentially): bias ↓
A BIT OF HISTORY

AdaBoost, 1996 → Random Forests, 1999 → Gradient Boosting Machine, 2001 → various improvements in tree boosting → XGBoost package

1st Kaggle success: Higgs Boson Challenge
17 of 29 winning solutions on Kaggle in 2015 used XGBoost
WHY DOES XGBOOST WIN "EVERY" MACHINE LEARNING COMPETITION? (Master thesis, D. Nielsen, 2016)
Source: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions
TREE ENSEMBLE
REGULARIZED LEARNING OBJECTIVE
The ensemble of $K$ trees predicts

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

and is trained with a regularized objective, a loss term plus a regularization term over the trees:

$$\mathcal{L} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2$$

where $T$ is the number of leaves of a tree and $w$ its vector of leaf weights.

Source: http://xgboost.readthedocs.io/en/latest/model.html
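These regularization terms surface directly as parameters of the xgboost library. A minimal sketch, assuming the standard xgboost Python package with its scikit-learn wrapper (synthetic data, arbitrary parameter values):

```python
# Sketch: gamma penalizes each additional leaf (the gamma*T term) and
# reg_lambda is the L2 penalty on leaf weights (the 1/2*lambda*||w||^2 term).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = X @ rng.random(10) + 0.1 * rng.standard_normal(1000)

model = xgb.XGBRegressor(
    n_estimators=50,   # K: number of trees f_k in the ensemble
    max_depth=4,
    gamma=1.0,         # cost per leaf: larger gamma -> fewer leaves
    reg_lambda=1.0,    # L2 regularization on the leaf weights w
)
model.fit(X, y)
```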
SCORE CALCULATION

Each boosting round uses the first-order gradient $g_i$ and second-order gradient $h_i$ of the loss. Aggregating them over the instances in each leaf $j$, $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, gives the optimal leaf weight and the score of a tree structure:

$$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \text{score} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
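A toy numpy sketch of these two formulas (illustration only, not the library's internals; the squared-error example data is made up):

```python
import numpy as np

LAM, GAMMA = 1.0, 0.0  # lambda and gamma from the objective

def leaf_weight(g, h):
    # w* = -G / (H + lambda), with G, H summed over the leaf's instances
    return -g.sum() / (h.sum() + LAM)

def structure_score(leaves):
    # score = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    score = sum(-0.5 * g.sum() ** 2 / (h.sum() + LAM) for g, h in leaves)
    return score + GAMMA * len(leaves)

# Squared-error loss gives g_i = yhat_i - y_i and h_i = 1
y = np.array([1.0, 2.0, 3.0])
yhat = np.zeros(3)
g, h = yhat - y, np.ones(3)
print(leaf_weight(g, h))          # 1.5: the mean (2.0) shrunk by lambda
print(structure_score([(g, h)]))  # -4.5
```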
ALGORITHM FEATURES

✓ Regularized objective
✓ Shrinkage and column subsampling
✓ Split finding: exact & approximate, global & local
✓ Weighted quantile sketch
✓ Sparsity-awareness
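Several of these features map directly onto parameters of the xgboost library; a hedged sketch using its native API (synthetic data, arbitrary values):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((500, 20))
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)  # missing entries get a learned default direction

params = {
    "objective": "binary:logistic",
    "eta": 0.1,               # shrinkage of each tree's contribution
    "colsample_bytree": 0.8,  # column subsampling
    "tree_method": "approx",  # approximate split finding; "exact" also available
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```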
SYSTEM DESIGN: BLOCK STRUCTURE

Data is stored in in-memory units called blocks, in compressed column (CSC) format with each column sorted by feature value. Sorting is done once, so split finding reduces to a linear scan over each column (see the sketch below).

Split finding cost, with K = # trees, d = max depth, ‖x‖₀ = # non-missing entries, B = max # rows per block:
• without the block structure: O(K d ‖x‖₀ log n)
• with the block structure: O(K d ‖x‖₀ + ‖x‖₀ log B), where the log B term is a one-time preprocessing cost

Blocks can be:
✓ distributed across machines
✓ stored on disk in the out-of-core setting
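A toy illustration of why the sorted layout matters: once a column is sorted, the best split on it is found by a single linear scan accumulating gradient statistics. A simplified sketch, not xgboost's actual C++ implementation:

```python
import numpy as np

def best_split(values, g, h, lam=1.0, gamma=0.0):
    """Exact split finding on one feature column via a single linear scan."""
    order = np.argsort(values)           # in xgboost, done once per block
    values, g, h = values[order], g[order], h[order]
    G, H = g.sum(), h.sum()
    gl = hl = 0.0
    best_gain, best_thr = 0.0, None
    for i in range(len(values) - 1):
        gl += g[i]; hl += h[i]
        if values[i] == values[i + 1]:   # cannot split between equal values
            continue
        gr, hr = G - gl, H - hl
        # Gain = 1/2 [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam)] - gamma
        gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_thr = (values[i] + values[i + 1]) / 2
    return best_thr, best_gain
```

Feeding in the g, h statistics from the score-calculation slide, this scan returns the threshold that maximizes the paper's split gain on that feature.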
SYSTEM DESIGN: CACHE-AWARE ACCESS

Problem: gradient statistics are read by row index in feature-sorted order, i.e. non-contiguous memory access.

Improved split finding:
✓ allocate an internal buffer and accumulate statistics in mini-batches
✓ prefetch gradient statistics

The benefit is most visible on larger datasets, where the naive access pattern causes more cache misses.
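A rough numpy-level illustration of the buffering idea (the function name and batch size are invented for this sketch; the real implementation lives in xgboost's C++ code):

```python
import numpy as np

def accumulate_buffered(grad, row_index, batch=4096):
    """Gather gradients into a small contiguous buffer, then scan it."""
    total = 0.0
    for start in range(0, len(row_index), batch):
        buf = grad[row_index[start:start + batch]]  # contiguous mini-batch copy
        total += buf.sum()                          # cache-friendly scan
    return total

grad = np.random.randn(1_000_000)
rows = np.random.permutation(1_000_000)  # feature-sorted order: non-contiguous
print(accumulate_buffered(grad, rows))
```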
SYSTEM DESIGN: BLOCK STRUCTURE

Block size tradeoff:
• too large → cache misses
• too small → inefficient parallelization

Out-of-core techniques:
✓ compression by columns (CSC): decompression cost traded against disk-reading time
✓ block sharding: stripe data across multiple disks
✓ prefetching in an independent thread
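Out-of-core training is exposed in the xgboost Python package through a cache-file suffix on the DMatrix path; a minimal sketch, assuming a libsvm-format training file exists at the placeholder path train.libsvm:

```python
import xgboost as xgb

# 'train.libsvm' is a placeholder; the '#dtrain.cache' suffix asks xgboost
# to build on-disk cache blocks instead of loading everything into memory.
dtrain = xgb.DMatrix("train.libsvm#dtrain.cache")
booster = xgb.train({"tree_method": "approx"}, dtrain, num_boost_round=10)
```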
EVALUATION

Single-machine experiments: AWS c3.8xlarge (32 virtual cores, 2× 320 GB SSD, 60 GB RAM)
Distributed experiments: 32 m3.2xlarge machines, each with 8 virtual cores, 2× 80 GB SSD, 30 GB RAM
DATASETS

Dataset     | n    | m    | Task
Allstate    | 10M  | 4227 | Insurance claim classification
Higgs Boson | 10M  | 28   | Event classification
Yahoo LTRC  | 473K | 700  | Learning to rank
Criteo      | 1.7B | 67   | Click-through rate prediction
WHAT’S NEXT?

• Model extensions: DART (boosting + dropouts), LinXGBoost
• Parallel processing: GPU, FPGA
• Tuning: hyperparameter optimization
• More applications
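DART is already shipped inside xgboost via the booster parameter; a minimal sketch with synthetic data:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.random((300, 10))
y = rng.random(300)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "booster": "dart",  # dropout applied to the tree ensemble
    "rate_drop": 0.1,   # fraction of trees dropped each boosting round
    "skip_drop": 0.5,   # probability of skipping dropout in a round
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```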
QUICK OVERVIEW

XGBoost's scalability rests on weighted quantiles, sparsity-awareness, cache-awareness, and data compression.

+ Nicely structured paper, easily comprehensible
+ Real framework, widely used for many ML problems
+ Combines improvements on both the model and the implementation side to achieve scalability
+ Reference point for further research in tree boosting
- The individual concepts are not that novel by themselves
- Does not explain why some of the models are not compared in all experiments
- Is the compression efficient for dense datasets?
- What happens when there are far more columns than rows (e.g., medical data)?
THANK YOU!