Boosting
Lyle Ungar

Learning objectives
- Review stagewise regression
- Know the AdaBoost and gradient boosting algorithms
Stagewise Regression
- Sequentially learn the weights at
- Never readjust previously learned weights

h(x) = 0 or average(y)
For t = 1:T
  rt = y - h(x)
  regress rt = at ht(x) to find at
  h(x) = h(x) + at ht(x)
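The loop above can be sketched in Python. This is a minimal illustration, not the lecture's code: it assumes scikit-learn regression trees as the weak learners ht, and finds each weight at by a one-variable least-squares fit of the residual on ht(x).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stagewise_regression(X, y, T=30, depth=2):
    """Stagewise regression: fit weak learners to residuals; never revisit old weights."""
    base = y.mean()                            # h(x) = average(y)
    pred = np.full(len(y), base)
    learners, weights = [], []
    for t in range(T):
        r = y - pred                           # r_t = y - h(x)
        h = DecisionTreeRegressor(max_depth=depth).fit(X, r)
        ht = h.predict(X)
        a = ht @ r / (ht @ ht)                 # least-squares a_t for r_t ≈ a_t h_t(x)
        pred = pred + a * ht                   # h(x) = h(x) + a_t h_t(x)
        learners.append(h)
        weights.append(a)
    return base, learners, weights
```

Note that once at is chosen it is frozen; later stages only see what is left in the residual.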
Boosting
- Ensemble method
  - Weighted combination of weak learners ht(x)
- Learned stagewise
  - At each stage, boosting gives more weight to what it got wrong before
AdaBoost

at = ½ ln((1 − εt) / εt)

where εt is the weighted probability of the prediction being wrong, so at is (half) the log-odds of the prediction being correct
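A minimal sketch of discrete AdaBoost, assuming scikit-learn decision stumps as the weak learners and labels in {−1, +1}; the ½ factor in at follows the Freund–Schapire formulation used above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=30):
    """Discrete AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # example weights, initially uniform
    stumps, alphas = [], []
    for t in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()              # weighted error eps_t
        if err == 0 or err >= 0.5:            # perfect, or no better than chance
            break
        a = 0.5 * np.log((1 - err) / err)     # a_t: half log-odds of being correct
        w = w * np.exp(-a * y * pred)         # upweight the mistakes
        w /= w.sum()
        stumps.append(stump)
        alphas.append(a)
    return stumps, alphas

def ada_predict(stumps, alphas, X):
    """Sign of the weighted vote sum_t a_t h_t(x)."""
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```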
AdaBoost example
- https://alliance.seas.upenn.edu/~cis520/dynamic/2019/wiki/index.php?n=Lectures.Boosting
AdaBoost minimizes exponential loss
- And it learns it exponentially fast
- Average (training) error decreases exponentially in the number of stages T and the edge γ of the weak learner
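The standard form of this bound (Freund and Schapire; stated here for reference, not taken from the slide) is: if every weak learner has weighted error εt ≤ ½ − γ, then

```latex
\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\bigl[H_T(x_i)\neq y_i\bigr]
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;\le\; e^{-2\gamma^2 T}
```

so even a slightly-better-than-chance weak learner drives the training error to zero exponentially fast in T.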
Gradient Tree Boosting
- Current state of the art for moderate-sized data sets
  - i.e., on average very slightly better than random forests when you don't have enough data to do deep learning
Gradient Boosting
- Model: F(x) = Σi gi hi(x) + const
- Loss function: L(y, F(x))
  - L2 or logistic or ...
- Base learner: hi(x)
  - Decision tree of specified depth
- Optionally subsample features
  - "stochastic gradient boosting"
- Do stagewise estimation of F(x)
  - Estimate hi(x) and gi at each iteration i

https://en.wikipedia.org/wiki/Gradient_boosting
At each stage the new tree is fit to the pseudo-residual, the negative gradient of the loss with respect to the current prediction F(x); for squared error, this is just the standard residual y − F(x)
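Spelling out the pseudo-residual (notation added here for clarity, matching the model F(x) above):

```latex
r_{it} \;=\; -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{t-1}}
\qquad\text{e.g. } L = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2
\;\Rightarrow\; r_{it} = y_i - F_{t-1}(x_i)
```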
Gradient Tree Boosting for Regression
- Loss function: L2
- Base learners hi(x)
  - Fixed-depth regression tree fit on the pseudo-residual
  - Gives a constant prediction for each leaf of the tree
- Stagewise: find weights on each hi(x)
  - Fancy version: fit different weights for each leaf of the tree
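A sketch of the plain (non-fancy) version under these assumptions: L2 loss, scikit-learn regression trees as base learners, and a single fixed weight g per stage rather than per-leaf weights.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_tree_boost(X, y, n_stages=50, depth=3, g=0.1):
    """L2 gradient tree boosting: each tree is fit to the residual y - F(x)."""
    const = y.mean()                     # initial constant model
    F = np.full(len(y), const)
    trees = []
    for _ in range(n_stages):
        r = y - F                        # pseudo-residual (= ordinary residual for L2)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, r)
        F = F + g * tree.predict(X)      # stagewise update with fixed weight g
        trees.append(tree)
    return const, trees

def boost_predict(const, trees, X, g=0.1):
    """F(x) = const + g * sum_i h_i(x)."""
    return const + g * sum(t.predict(X) for t in trees)
```

Because each leaf of a regression tree fit on the residual already predicts the mean residual of its leaf, the "fancy" per-leaf weights only change things for losses other than L2.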
Regularization helps
http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html
- Subsample < 1 = stochastic gradient boosting
- Learning rate = shrinkage on g
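Both regularizers from the linked example map directly onto scikit-learn parameters; a small configuration sketch (dataset and settings chosen for illustration):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, random_state=0)
model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,   # shrinkage on each stage's weight g
    subsample=0.5,        # < 1.0 turns on stochastic gradient boosting
    max_depth=3,          # depth of the base regression trees
    random_state=0,
).fit(X, y)
```

Lower learning rates typically need more estimators; the linked plot shows the shrinkage + subsampling combination reaching the lowest test deviance.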
What you should know
- Boosting
  - Stagewise regression upweighting previous errors
  - Gives highly accurate ensemble models
  - Relatively fast
  - Tends not to overfit (but still: use early stopping!)
- Gradient Tree Boosting
  - Uses pseudo-residuals
  - Very accurate!!!