Boosting
Lyle Ungar

Learning objectives
- Review stagewise regression
- Know the AdaBoost and gradient boosting algorithms
Stagewise Regression
- Sequentially learn the weights at
- Never readjust previously learned weights

h(x) = 0 or average(y)
For t = 1:T
  rt = y - h(x)
  regress rt = at ht(x) to find at
  h(x) = h(x) + at ht(x)
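The loop above can be sketched in Python. This is a minimal illustration, not the lecture's code: it assumes scikit-learn regression trees as the weak learners ht, and finds each weight at by a one-variable least-squares fit of the residual on ht(x).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def stagewise_regression(X, y, T=30, depth=2):
    """Stagewise regression: fit weak learners to residuals; never revisit old weights."""
    base = y.mean()                            # h(x) = average(y)
    pred = np.full(len(y), base)
    learners, weights = [], []
    for t in range(T):
        r = y - pred                           # r_t = y - h(x)
        h = DecisionTreeRegressor(max_depth=depth).fit(X, r)
        ht = h.predict(X)
        a = ht @ r / (ht @ ht)                 # least-squares a_t for r_t ≈ a_t h_t(x)
        pred = pred + a * ht                   # h(x) = h(x) + a_t h_t(x)
        learners.append(h)
        weights.append(a)
    return base, learners, weights
```

Note that once at is chosen it is frozen; later stages only see what is left in the residual.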
Boosting
- Ensemble method
  - Weighted combination of weak learners ht(x)
- Learned stagewise
  - At each stage, boosting gives more weight to what it got wrong before
AdaBoost

at = ½ ln((1 − εt) / εt)

where εt is the weighted probability of the prediction being wrong, so at is (half) the log-odds of the prediction being correct
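A minimal sketch of discrete AdaBoost, assuming scikit-learn decision stumps as the weak learners and labels in {−1, +1}; the ½ factor in at follows the Freund–Schapire formulation used above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=30):
    """Discrete AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # example weights, initially uniform
    stumps, alphas = [], []
    for t in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()              # weighted error eps_t
        if err == 0 or err >= 0.5:            # perfect, or no better than chance
            break
        a = 0.5 * np.log((1 - err) / err)     # a_t: half log-odds of being correct
        w = w * np.exp(-a * y * pred)         # upweight the mistakes
        w /= w.sum()
        stumps.append(stump)
        alphas.append(a)
    return stumps, alphas

def ada_predict(stumps, alphas, X):
    """Sign of the weighted vote sum_t a_t h_t(x)."""
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```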
AdaBoost example
- https://alliance.seas.upenn.edu/~cis520/dynamic/2019/wiki/index.php?n=Lectures.Boosting
AdaBoost minimizes exponential loss
- And it learns it exponentially fast
- Average (training) error decreases exponentially in the number of stages T and the edge γ of the weak learner
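The standard form of this bound (Freund and Schapire; stated here for reference, not taken from the slide) is: if every weak learner has weighted error εt ≤ ½ − γ, then

```latex
\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\bigl[H_T(x_i)\neq y_i\bigr]
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;\le\; e^{-2\gamma^2 T}
```

so even a slightly-better-than-chance weak learner drives the training error to zero exponentially fast in T.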
Gradient Tree Boosting
- Current state of the art for moderate-sized data sets
  - i.e., on average very slightly better than random forests when you don't have enough data to do deep learning
Gradient Boosting
- Model: F(x) = Σi gi hi(x) + const
- Loss function: L(y, F(x))
  - L2 or logistic or ...
- Base learner: hi(x)
  - Decision tree of specified depth
- Optionally subsample features
  - "stochastic gradient boosting"
- Do stagewise estimation of F(x)
  - Estimate hi(x) and gi at each iteration i

https://en.wikipedia.org/wiki/Gradient_boosting
At each stage the new tree is fit to the pseudo-residual, the negative gradient of the loss with respect to the current prediction F(x); for squared error, this is just the standard residual y − F(x)
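Spelling out the pseudo-residual (notation added here for clarity, matching the model F(x) above):

```latex
r_{it} \;=\; -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F = F_{t-1}}
\qquad\text{e.g. } L = \tfrac{1}{2}\bigl(y_i - F(x_i)\bigr)^2
\;\Rightarrow\; r_{it} = y_i - F_{t-1}(x_i)
```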
Gradient Tree Boosting for Regression
- Loss function: L2
- Base learners hi(x)
  - Fixed-depth regression tree fit on the pseudo-residual
  - Gives a constant prediction for each leaf of the tree
- Stagewise: find weights on each hi(x)
  - Fancy version: fit different weights for each leaf of the tree
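A sketch of the plain (non-fancy) version under these assumptions: L2 loss, scikit-learn regression trees as base learners, and a single fixed weight g per stage rather than per-leaf weights.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_tree_boost(X, y, n_stages=50, depth=3, g=0.1):
    """L2 gradient tree boosting: each tree is fit to the residual y - F(x)."""
    const = y.mean()                     # initial constant model
    F = np.full(len(y), const)
    trees = []
    for _ in range(n_stages):
        r = y - F                        # pseudo-residual (= ordinary residual for L2)
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, r)
        F = F + g * tree.predict(X)      # stagewise update with fixed weight g
        trees.append(tree)
    return const, trees

def boost_predict(const, trees, X, g=0.1):
    """F(x) = const + g * sum_i h_i(x)."""
    return const + g * sum(t.predict(X) for t in trees)
```

Because each leaf of a regression tree fit on the residual already predicts the mean residual of its leaf, the "fancy" per-leaf weights only change things for losses other than L2.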
Regularization helps
http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html
- Subsample < 1 = stochastic gradient boosting
- Learning rate = shrinkage on g
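Both regularizers from the linked example map directly onto scikit-learn parameters; a small configuration sketch (dataset and settings chosen for illustration):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, random_state=0)
model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,   # shrinkage on each stage's weight g
    subsample=0.5,        # < 1.0 turns on stochastic gradient boosting
    max_depth=3,          # depth of the base regression trees
    random_state=0,
).fit(X, y)
```

Lower learning rates typically need more estimators; the linked plot shows the shrinkage + subsampling combination reaching the lowest test deviance.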
What you should know
- Boosting
  - Stagewise regression upweighting previous errors
  - Gives highly accurate ensemble models
  - Relatively fast
  - Tends not to overfit (but still: use early stopping!)
- Gradient Tree Boosting
  - Uses pseudo-residuals
  - Very accurate!!!