Introduction to XGBoost


Transcript of Introduction to XGBoost

Page 1: Introduction to XGBoost

Introduction to XGBoost

School of Computer Science and Engineering

Shuai Zhang, UNSW

Page 2: Introduction to XGBoost

1. Introduction

2. Boosted Tree

3. Tree Ensemble

4. Additive Training

5. Split Algorithm

Page 3: Introduction to XGBoost

1 Introduction

• What can XGBoost do?


• Binary Classification
• Multiclass Classification
• Regression
• Learning to Rank

As of 2 March 2017

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library

Supported languages: Python, R, Java, Scala, C++, and more

Supported platforms: runs on a single machine, Hadoop, Spark, Flink, DataFlow
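As a quick illustration of the binary-classification use case, here is a minimal sketch using XGBoost's Python API; the synthetic data and parameter values are placeholders, not recommendations:

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data (stand-in for a real dataset).
rng = np.random.RandomState(0)
X = rng.randn(1000, 10)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# XGBoost's native data structure.
dtrain = xgb.DMatrix(X[:800], label=y[:800])
dtest = xgb.DMatrix(X[800:], label=y[800:])

# Illustrative parameters: logistic loss for binary classification,
# shallow trees, moderate learning rate (eta).
params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}
bst = xgb.train(params, dtrain, num_boost_round=50)

preds = bst.predict(dtest)  # predicted probabilities in [0, 1]
print("accuracy:", np.mean((preds > 0.5) == y[800:]))
```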

Page 4: Introduction to XGBoost

2 Boosted Tree

• Variants:
  • GBDT: gradient boosted decision tree
  • GBRT: gradient boosted regression tree
  • MART: multiple additive regression trees
  • LambdaMART, for ranking tasks
  • ...


Page 5: Introduction to XGBoost

2.1 CART

• CART: Classification and Regression Tree
• Classification example: three classes, two variables


Page 6: Introduction to XGBoost

2.1 CART: Prediction

• Predicting the price of 1993-model cars
• Inputs standardized (zero mean, unit variance)


[Figure: partition of the input space]

Page 7: Introduction to XGBoost

2.1 CART

• Which variable to use for splitting?
  • Information Gain
  • Gain Ratio
  • Gini Index

• Pruning: prevents overfitting
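For concreteness, the Gini criterion can be written out as follows; this is the standard CART definition, not spelled out in the transcript:

```latex
\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2, \qquad
\Delta\mathrm{Gini} = \mathrm{Gini}(D)
  - \tfrac{|D_L|}{|D|}\,\mathrm{Gini}(D_L)
  - \tfrac{|D_R|}{|D|}\,\mathrm{Gini}(D_R)
```

Here $p_k$ is the fraction of class $k$ in node $D$, and $D_L$, $D_R$ are the children produced by a candidate split; the split with the largest $\Delta\mathrm{Gini}$ is chosen.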

Page 8: Introduction to XGBoost

2.2 CART

• Input: age, gender, occupation
• Goal: does the person like computer games?


Page 9: Introduction to XGBoost

3 Tree Ensemble

• What is a tree ensemble?
  • A single tree is not powerful enough

• Benefits of tree ensembles?
  • Very widely used
  • Invariant to scaling of inputs
  • Learn higher-order interactions between features
  • Scalable


[Figure: tree ensembles include boosted trees and random forests]

Page 10: Introduction to XGBoost

3 Tree Ensemble


The prediction is the sum of the scores predicted by each of the trees: $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$

Page 11: Introduction to XGBoost

3 Tree Ensemble: Elements of Supervised Learning

• Linear model: $\hat{y}_i = \sum_j w_j x_{ij}$
• Objective: $\mathrm{Obj}(\Theta) = L(\Theta) + \Omega(\Theta)$, i.e. training loss plus regularization

Optimizing the training loss $L$ encourages predictive models.

Optimizing the regularization $\Omega$ encourages simple models.
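Typical instantiations of the two terms (standard choices, listed here for reference rather than taken from the slides):

```latex
% Training losses
L = \sum_i (y_i - \hat{y}_i)^2                                   % square loss
L = \sum_i \bigl[ y_i \ln(1 + e^{-\hat{y}_i})
                + (1 - y_i) \ln(1 + e^{\hat{y}_i}) \bigr]        % logistic loss
% Regularizers
\Omega = \lambda \lVert w \rVert^2                               % L2
\Omega = \lambda \lVert w \rVert_1                               % L1
```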

Page 12: Introduction to XGBoost

3 Tree Ensemble

• Assuming we have $K$ trees: $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$

• Parameters
  • Include the structure of each tree and the scores in the leaves
  • Or simply use the functions as parameters: $\Theta = \{f_1, f_2, \dots, f_K\}$

• Instead of learning weights in $\mathbb{R}^d$, we are learning functions (trees)

Page 13: Introduction to XGBoost

3 Tree Ensemble

• How can we learn functions?

[Figure: fitting a step function to points; the parameters are the splitting positions and the height in each segment]

• Training loss: how well does the function fit the points?

• Regularization: how do we define the complexity of the function?

Page 14: Introduction to XGBoost

3 Tree Ensemble

• Training loss: squared error, $L = \sum_i (y_i - \hat{y}_i)^2$

• Regularization: the number of splitting points, and the L2 norm of the heights (leaf weights) in each segment

Page 15: Introduction to XGBoost

3 Tree Ensemble

• We define a tree by a vector of scores in the leaves, and a leaf index mapping function that maps an instance to a leaf:

$f_t(x) = w_{q(x)}, \quad w \in \mathbb{R}^T, \quad q : \mathbb{R}^d \to \{1, 2, \dots, T\}$

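A tiny sketch of this representation in Python, reusing the "likes computer games" tree from page 8 (the split rules and leaf scores here are illustrative, not from the transcript):

```python
import numpy as np

# Leaf scores w and a leaf-index mapping q, so that f(x) = w[q(x)].
w = np.array([2.0, 0.1, -1.0])  # illustrative scores for three leaves

def q(x):
    """Map an instance (a dict of features) to a leaf index."""
    if x["age"] < 15:
        return 0 if x["gender"] == "male" else 1
    return 2

def f(x):
    return w[q(x)]

print(f({"age": 10, "gender": "male"}))    # 2.0
print(f({"age": 40, "gender": "female"}))  # -1.0
```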

Page 16: Introduction to XGBoost

3 Tree Ensemble

• Objective: $\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$

• Definition of complexity: $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$, i.e. the number of leaves plus the L2 norm of the leaf scores


Page 17: Introduction to XGBoost

4 Additive Training (Boosting)

• We cannot use methods such as SGD to find the $f_k$ (since they are trees, not just numerical vectors)

• Instead, start from a constant prediction and add a new function each time:

$\hat{y}_i^{(0)} = 0, \qquad \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$

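To make the additive scheme concrete, here is a minimal sketch of boosting with squared loss, using scikit-learn's DecisionTreeRegressor as a stand-in base learner (an assumption for illustration; XGBoost grows its own trees using the objective derived on the next pages):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(200)

eta = 0.3                 # shrinkage / learning rate
trees = []
y_hat = np.zeros_like(y)  # y_hat^(0) = 0: start from a constant prediction

for t in range(50):
    # With squared loss, the negative gradient is just the residual.
    residual = y - y_hat
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(tree)
    # y_hat^(t) = y_hat^(t-1) + eta * f_t(x)
    y_hat += eta * tree.predict(X)

print("train MSE:", np.mean((y - y_hat) ** 2))
```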

Page 18: Introduction to XGBoost

4 Additive Training (Boosting)

• How do we decide which $f$ to add?

• The prediction at round $t$ is $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, so

$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t) + \mathrm{const}$

• Consider square loss:

$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} \bigl[ 2 (\hat{y}_i^{(t-1)} - y_i) f_t(x_i) + f_t(x_i)^2 \bigr] + \Omega(f_t) + \mathrm{const}$


Page 19: Introduction to XGBoost

4 Additive Training (Boosting)

• Taylor expansion of the objective: recall $f(x + \Delta x) \simeq f(x) + f'(x) \Delta x + \frac{1}{2} f''(x) \Delta x^2$, and define

$g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)}), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$

• Objective after expansion:

$\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^{n} \bigl[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \bigr] + \Omega(f_t)$


Page 20: Introduction to XGBoost

4 Additive Training (Boosting)

• Our new goal, with constants removed:

$\sum_{i=1}^{n} \bigl[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \bigr] + \Omega(f_t)$

• Benefits
  • Theoretical: we know exactly what we are learning
  • Engineering: $g_i$ and $h_i$ are all the learner needs from the loss, so custom loss functions are supported (see the sketch below)
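This is why XGBoost accepts user-defined objectives: you supply only the gradient and Hessian. A minimal sketch with the Python API, re-implementing logistic loss as a custom objective (the function and variable names are mine, and the toy data is illustrative):

```python
import numpy as np
import xgboost as xgb

def logistic_obj(preds, dtrain):
    """Custom objective: return per-instance gradient g_i and Hessian h_i."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))  # raw scores -> probabilities
    grad = p - y                      # g_i = dl / d y_hat
    hess = p * (1.0 - p)              # h_i = d2 l / d y_hat^2
    return grad, hess

# Toy data, for illustration only.
rng = np.random.RandomState(0)
X = rng.randn(500, 5)
y = (X[:, 0] > 0).astype(float)
dtrain = xgb.DMatrix(X, label=y)

bst = xgb.train({"max_depth": 3, "eta": 0.1}, dtrain,
                num_boost_round=20, obj=logistic_obj)
```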


Page 21: Introduction to XGBoost

4 Additive Training (Boosting)

• Define the instance set in leaf $j$ as $I_j = \{\, i \mid q(x_i) = j \,\}$

• Regroup the objective by each leaf:

$\mathrm{Obj}^{(t)} \simeq \sum_{j=1}^{T} \Bigl[ \bigl(\sum_{i \in I_j} g_i\bigr) w_j + \tfrac{1}{2} \bigl(\sum_{i \in I_j} h_i + \lambda\bigr) w_j^2 \Bigr] + \gamma T$

• This is a sum of $T$ independent quadratic functions
• Two facts about a single-variable quadratic $G x + \frac{1}{2} H x^2$ with $H > 0$: the minimizer is $x^* = -G/H$, and the minimum value is $-\frac{1}{2} G^2 / H$


Page 22: Introduction to XGBoost

4 Additive Training (Boosting)

• Let us define $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$

• Results: for a fixed tree structure $q$, the optimal leaf weights and objective value are

$w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad \mathrm{Obj} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$

There can be infinitely many possible tree structures, so we cannot search them exhaustively.

Page 23: Introduction to XGBoost

4 Additive Training (Boosting)

• Greedy learning: we grow the tree greedily, starting from depth 0 and adding the best split one leaf at a time


Page 24: Introduction to XGBoost

5 Splitting Algorithm

• Efficient finding of the best split

• What is the gain of a split rule $x_j < a$? Say $x_j$ is age.

All we need is the sum of $g$ and $h$ on each side; then calculate

$\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma$

• A left-to-right linear scan over the sorted instances is enough to decide the best split
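A minimal sketch of this scan for one feature (pure illustration; XGBoost's actual implementation adds missing-value handling, approximate quantile splits, and parallelism):

```python
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy split search on one feature via a sorted linear scan."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    G, H = gs.sum(), hs.sum()        # sums over the whole node
    GL = HL = 0.0
    best_gain, best_thr = 0.0, None

    def score(G_, H_):
        return G_ * G_ / (H_ + lam)

    for i in range(len(xs) - 1):
        GL += gs[i]
        HL += hs[i]
        if xs[i] == xs[i + 1]:       # can only split between distinct values
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (score(GL, HL) + score(GR, HR) - score(G, H)) - gamma
        if gain > best_gain:
            best_gain = gain
            best_thr = 0.5 * (xs[i] + xs[i + 1])
    return best_gain, best_thr

# Toy usage with squared loss at y_hat = 0: g = -y (up to a factor), h = 1.
rng = np.random.RandomState(0)
x = rng.rand(100)
y = (x > 0.5).astype(float)
print(best_split(x, -y, np.ones_like(y)))  # threshold near 0.5
```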

Page 25: Introduction to XGBoost

5 Splitting Algorithm


Page 26: Introduction to XGBoost

5 Splitting Algorithm


Page 27: Introduction to XGBoost

5 Splitting Algorithm


Page 28: Introduction to XGBoost

References

• http://www.52cs.org/?p=429
• http://www.stat.cmu.edu/~cshalizi/350-2006/lecture-10.pdf
• http://www.sigkdd.org/node/362
• http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf
• http://www.stat.wisc.edu/~loh/treeprogs/guide/wires11.pdf
• https://github.com/dmlc/xgboost/blob/master/demo/README.md
• http://datascience.la/xgboost-workshop-and-meetup-talk-with-tianqi-chen/
• http://xgboost.readthedocs.io/en/latest/model.html
• http://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/


Page 29: Introduction to XGBoost

Supplementary

• Tree models work very well on tabular data; they are easy to use, interpret, and control

• They cannot extrapolate

• Deep Forest: Towards An Alternative to Deep Neural Networks, Zhi-Hua Zhou and Ji Feng, Nanjing University (submitted 28 Feb 2017)
  • Comparable performance and easy to train (fewer parameters)


Page 30: Introduction to XGBoost

Thank you!
