
Linear Regression & Gradient Descent

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Regression

Given:
– Data $X = \left\{ x^{(1)}, \ldots, x^{(n)} \right\}$ where $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \left\{ y^{(1)}, \ldots, y^{(n)} \right\}$ where $y^{(i)} \in \mathbb{R}$

[Figure: September Arctic sea ice extent (1,000,000 sq km) vs. year, 1970–2020, with linear and quadratic regression fits. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013).]

Linear Regression

• Hypothesis:
$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j \qquad (\text{assume } x_0 = 1)$$

• Fit the model by minimizing the sum of squared errors

[Figures courtesy of Greg Shakhnarovich]

Least Squares Linear Regression

• Cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

• Fit by solving $\min_\theta \; J(\theta)$
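As a concrete reference, here is a minimal NumPy sketch of the hypothesis and the least squares cost above; the array layout (a leading column of ones for $x_0 = 1$) and the variable names are illustrative assumptions, not part of the slides.

```python
import numpy as np

def h(theta, X):
    # Hypothesis h_theta(x) = sum_{j=0}^{d} theta_j * x_j, assuming X already
    # contains a leading column of ones so that x_0 = 1.
    return X @ theta

def cost(theta, X, y):
    # J(theta) = 1/(2n) * sum_i (h_theta(x^(i)) - y^(i))^2
    n = len(y)
    residuals = h(theta, X) - y
    return np.sum(residuals ** 2) / (2 * n)
```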

Intuition Behind the Cost Function

(For fixed $\theta$, $h_\theta(x)$ is a function of $x$; $J(\theta)$ is a function of the parameters.)

[Figures: several choices of $\theta$ shown as fitted lines over the data, side by side with the corresponding values of $J(\theta)$. Slides by Andrew Ng]

Basic Search Procedure

• Choose an initial value for $\theta$
• Until we reach a minimum:
  – Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$ over $\theta_0, \theta_1$. Figure by Andrew Ng]


Since the least squares objective function is convex, we don't need to worry about local minima in linear regression.

Gradient Descent

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
where $\alpha$ is the learning rate (small), e.g., $\alpha = 0.05$

[Figure: $J(\theta)$ plotted against $\theta$, showing steps descending toward the minimum.]

Gradient Descent

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

For linear regression:
$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \, \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \, \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^{\!2} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
\end{aligned}
$$
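A quick way to sanity-check this derivation is to compare the analytic gradient against finite differences of $J(\theta)$. The sketch below does this on toy data; the helper names and random data are illustrative assumptions, not from the slides.

```python
import numpy as np

def h(theta, X):
    # Hypothesis, assuming X has a leading column of ones (x_0 = 1).
    return X @ theta

def J(theta, X, y):
    # Least squares cost J(theta) = 1/(2n) * sum_i (h_theta(x^(i)) - y^(i))^2
    n = len(y)
    return np.sum((h(theta, X) - y) ** 2) / (2 * n)

def grad_J(theta, X, y):
    # Analytic gradient: dJ/dtheta_j = (1/n) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
    n = len(y)
    return X.T @ (h(theta, X) - y) / n

# Toy data: n = 5 examples, d = 2 features plus the constant x_0 = 1.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y = rng.normal(size=5)
theta = rng.normal(size=3)

# Central finite differences of J along each coordinate direction.
eps = 1e-6
numeric = np.array([
    (J(theta + eps * e, X, y) - J(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])
print(np.allclose(numeric, grad_J(theta, X, y)))   # expect True
```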

Gradient Descent for Linear Regression

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

• To achieve a simultaneous update:
  – At the start of each GD iteration, compute $h_\theta\!\left(x^{(i)}\right)$
  – Use this stored value in the update step loop

• Assume convergence when $\left\lVert \theta_{\text{new}} - \theta_{\text{old}} \right\rVert_2 < \epsilon$, where the L2 norm is
$$\lVert v \rVert_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \ldots + v_{|v|}^2}$$
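Putting the pieces together, here is a minimal sketch of batch gradient descent for linear regression with the simultaneous update and the $\lVert \theta_{\text{new}} - \theta_{\text{old}} \rVert_2 < \epsilon$ convergence test from this slide. The function name, toy data, and default hyperparameters are illustrative assumptions.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=10_000):
    """X is n x (d+1) with a leading column of ones (x_0 = 1); y has length n."""
    n, _ = X.shape
    theta = np.zeros(X.shape[1])
    for _ in range(max_iters):
        # Compute all predictions first, so every theta_j is updated using the
        # same h_theta(x^(i)) values -- the "simultaneous update" from the slide.
        predictions = X @ theta
        gradient = X.T @ (predictions - y) / n
        theta_new = theta - alpha * gradient
        if np.linalg.norm(theta_new - theta) < eps:   # ||theta_new - theta_old||_2 < eps
            return theta_new
        theta = theta_new
    return theta

# Toy usage: y is roughly 3 + 2x plus noise, so theta should approach [3, 2].
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
X = np.column_stack([np.ones(50), x])
print(gradient_descent(X, y))
```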

Gradient Descent (illustration)

(For fixed $\theta$, $h_\theta(x)$ is a function of $x$; $J(\theta)$ is a function of the parameters.)

[Figures: successive gradient descent iterations on the housing data, starting from $h(x) = -900 - 0.1x$, showing the fitted line alongside the corresponding point on the cost contours at each step. Slides by Andrew Ng]

Choosing α

• α too small: slow convergence
• α too large: increasing value for $J(\theta)$
  – May overshoot the minimum
  – May fail to converge
  – May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration
• The value should decrease at each iteration
• If it doesn't, adjust α
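The monitoring advice above can be turned into a few lines of code: run a handful of iterations for several candidate values of α and check whether $J(\theta)$ decreases at every step. The specific α values and toy data below are illustrative assumptions.

```python
import numpy as np

def J(theta, X, y):
    # Least squares cost, used only for monitoring progress.
    n = len(y)
    return np.sum((X @ theta - y) ** 2) / (2 * n)

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.uniform(0, 2, size=30)])
y = 1.0 + 4.0 * X[:, 1] + rng.normal(scale=0.3, size=30)

for alpha in (0.01, 0.05, 1.5):
    theta = np.zeros(2)
    costs = []
    for _ in range(20):
        theta = theta - alpha * X.T @ (X @ theta - y) / len(y)
        costs.append(J(theta, X, y))
    # A healthy learning rate gives a monotonically decreasing cost;
    # an overly large one (here 1.5) makes J(theta) grow.
    decreasing = all(a > b for a, b in zip(costs, costs[1:]))
    print(f"alpha={alpha}: J decreased at every iteration: {decreasing}")
```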

Extending Linear Regression to More Complex Models

• The inputs X for linear regression can be:
  – Original quantitative inputs
  – Transformations of quantitative inputs
    • e.g., log, exp, square root, square, etc.
  – Polynomial transformations
    • example: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$
  – Basis expansions
  – Dummy coding of categorical inputs
  – Interactions between variables
    • example: $x_3 = x_1 \times x_2$

This allows the use of linear regression techniques to fit non-linear datasets (see the feature-construction sketch below).
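To make the transformations above concrete, here is a small sketch that builds polynomial and interaction features so that ordinary linear regression can be run on them; the helper names and chosen degrees are illustrative assumptions.

```python
import numpy as np

def polynomial_features(x, degree):
    """Map a 1-D input x to columns [1, x, x^2, ..., x^degree]."""
    return np.vstack([x ** j for j in range(degree + 1)]).T

def with_interaction(x1, x2):
    """Stack a bias column, x1, x2, and the interaction x3 = x1 * x2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])

x = np.linspace(0, 2, 5)
print(polynomial_features(x, 3).shape)          # (5, 4): columns 1, x, x^2, x^3
print(with_interaction(x, np.log1p(x)).shape)   # (5, 4): bias, x1, x2, x1*x2
```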

Linear Basis Function Models

• Generally,
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$
where $\phi_j(x)$ is a basis function
• Typically, $\phi_0(x) = 1$ so that $\theta_0$ acts as a bias
• In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)

Linear Basis Function Models

• Polynomial basis functions: $\phi_j(x) = x^j$
  – These are global; a small change in x affects all basis functions
• Gaussian basis functions: $\phi_j(x) = \exp\!\left( -\dfrac{(x - \mu_j)^2}{2s^2} \right)$
  – These are local; a small change in x only affects nearby basis functions. μj and s control location and scale (width).

Based on slide by Christopher Bishop (PRML)

Linear Basis Function Models

• Sigmoidal basis functions: $\phi_j(x) = \sigma\!\left( \dfrac{x - \mu_j}{s} \right)$, where $\sigma(a) = \dfrac{1}{1 + e^{-a}}$
  – These are also local; a small change in x only affects nearby basis functions. μj and s control location and scale (slope).

Based on slide by Christopher Bishop (PRML)
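For concreteness, here is a sketch of Gaussian and sigmoidal basis expansions of a 1-D input; the centers μj and scale s used below are arbitrary illustrative choices, not values from the slides.

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), one column per center mu_j."""
    return np.exp(-((x[:, None] - mu[None, :]) ** 2) / (2 * s ** 2))

def sigmoidal_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s) with the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-(x[:, None] - mu[None, :]) / s))

x = np.linspace(0, 1, 100)
mu = np.linspace(0, 1, 9)                                  # 9 evenly spaced centers
Phi = np.hstack([np.ones((100, 1)), gaussian_basis(x, mu, s=0.1)])
print(Phi.shape)                                           # (100, 10): bias column + 9 local features
```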

Example of Fitting a Polynomial Curve with a Linear Model

$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$$

Quality of Fit

Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ...but fails to generalize to new examples

[Figures: price vs. size fits illustrating underfitting (high bias), a correct fit, and overfitting (high variance). Based on example by Andrew Ng]

Regularization

• A method for automatically controlling the complexity of the learned hypothesis
• Idea: penalize large values of $\theta_j$
  – Can incorporate into the cost function
  – Works well when we have a lot of features, each of which contributes a bit to predicting the label
• Can also address overfitting by eliminating features (either manually or via model selection)

Regularization

• Linear regression objective function:
$$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$$
  – $\lambda$ is the regularization parameter ($\lambda \geq 0$)
  – No regularization on $\theta_0$!
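As a concrete counterpart to the objective above, here is a minimal sketch of the regularized cost with $\theta_0$ excluded from the penalty, as noted on the slide; the function and variable names are illustrative assumptions.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = fit term + (lambda/2) * sum_{j>=1} theta_j^2 (theta_0 not penalized)."""
    n = len(y)
    fit = np.sum((X @ theta - y) ** 2) / (2 * n)    # model fit to data
    penalty = (lam / 2) * np.sum(theta[1:] ** 2)    # regularization, skipping theta_0
    return fit + penalty
```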

Understanding Regularization

• Note that
$$\sum_{j=1}^{d} \theta_j^2 = \lVert \theta_{1:d} \rVert_2^2$$
  – This is the squared magnitude of the feature coefficient vector!
• We can also think of this as:
$$\sum_{j=1}^{d} (\theta_j - 0)^2 = \lVert \theta_{1:d} - \vec{0} \rVert_2^2$$
• L2 regularization pulls coefficients toward 0


Understanding Regularization

• What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?
  – The penalty drives $\theta_1, \ldots, \theta_d$ to $\approx 0$, leaving only the constant hypothesis $h_\theta(x) \approx \theta_0$.

[Figure: price vs. size with the higher-order coefficients shrunk to 0, so the fitted curve flattens to a horizontal line. Based on example by Andrew Ng]

Regularized Linear Regression

• Cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• Fit by solving $\min_\theta \; J(\theta)$

• Gradient update (from $\frac{\partial}{\partial \theta_0} J(\theta)$ and $\frac{\partial}{\partial \theta_j} J(\theta)$; the $\lambda\theta_j$ term comes from the regularization):
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$$
$$\theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right]$$

Regularized Linear Regression

• We can rewrite the gradient step for $\theta_j$ as:
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$$
$$\theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right]$$
$$\theta_j \leftarrow \theta_j \,(1 - \alpha\lambda) \; - \; \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$
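Here is a minimal sketch of one regularized gradient descent step in this rewritten "weight decay" form, keeping $\theta_0$ unregularized; the function name and array layout are assumptions for illustration.

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One step: theta_j <- theta_j*(1 - alpha*lam) - alpha*(1/n)*sum_i (h(x^(i)) - y^(i))*x_j^(i),
    with theta_0 left unshrunk (no regularization on the bias)."""
    n = len(y)
    grad_fit = X.T @ (X @ theta - y) / n     # (1/n) * sum_i (h_theta(x^(i)) - y^(i)) x^(i)
    decay = np.ones_like(theta)
    decay[1:] = 1.0 - alpha * lam            # shrink every coefficient except theta_0
    return decay * theta - alpha * grad_fit
```

Writing the update this way makes it explicit that L2 regularization multiplicatively shrinks each coefficient by $(1 - \alpha\lambda)$ on every iteration before applying the usual data-fit correction.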