
Transcript
Page 1:

Linear Regression & Gradient Descent

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with grateful acknowledgement to Eric Eaton and the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2:

Regression

Given:
– Data $X = \{x^{(1)}, \ldots, x^{(n)}\}$ where $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \{y^{(1)}, \ldots, y^{(n)}\}$ where $y^{(i)} \in \mathbb{R}$

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020, with linear and quadratic regression fits. Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)]

Page 3:

Linear Regression

• Hypothesis:

$$y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j \qquad \text{(assume } x_0 = 1\text{)}$$

• Fit model by minimizing sum of squared errors

Figures are courtesy of Greg Shakhnarovich
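The hypothesis above, with the $x_0 = 1$ convention, translates directly to NumPy. A minimal sketch (the function name and example data are illustrative, not from the slides):

```python
import numpy as np

def h(theta, X):
    """Linear hypothesis h_theta(x) = sum_j theta_j * x_j.

    X is an (n, d) matrix of inputs; a column of ones is prepended
    so that theta[0] plays the role of the intercept (x_0 = 1).
    """
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x_0 = 1
    return X1 @ theta

# theta = [1, 2] encodes y = 1 + 2x:
print(h(np.array([1.0, 2.0]), np.array([[0.0], [3.0]])))  # [1. 7.]
```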

Page 4:

Least Squares Linear Regression

• Cost function:

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

• Fit by solving $\min_\theta J(\theta)$
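The least-squares cost can be sketched in a few lines of NumPy (names and data are illustrative):

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost J(theta) = 1/(2n) * sum_i (h_theta(x_i) - y_i)^2."""
    n = X.shape[0]
    X1 = np.hstack([np.ones((n, 1)), X])   # x_0 = 1 convention
    residuals = X1 @ theta - y
    return residuals @ residuals / (2 * n)

# A perfect fit gives J(theta) = 0:
X = np.array([[1.0], [2.0]])
y = np.array([3.0, 5.0])                   # y = 1 + 2x exactly
print(cost(np.array([1.0, 2.0]), X, y))    # 0.0
```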

Page 5:

Intuition Behind Cost Function

Slide by Andrew Ng

Page 6:

Intuition Behind Cost Function

(for fixed $\theta$, $h_\theta(x)$ is a function of $x$)   ($J$ is a function of the parameters $\theta$)

Slide by Andrew Ng

Pages 7–9: Intuition Behind Cost Function (continued; figures only). Slides by Andrew Ng

Page 10:

Basic Search Procedure

• Choose an initial value for $\theta$
• Until we reach a minimum:
  – Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$. Figure by Andrew Ng]

Page 11: Basic Search Procedure (continued; figure only). Figure by Andrew Ng

Page 12:

Basic Search Procedure (continued)

[Figure: surface plot of $J(\theta_0, \theta_1)$. Figure by Andrew Ng]

Since the least squares objective function is convex, we don't need to worry about local minima in linear regression.

Page 13:

Gradient Descent

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):

$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

where $\alpha$ is the learning rate (small), e.g., $\alpha = 0.05$

[Figure: plot of $J(\theta)$ over $\theta$, illustrating descent steps]

Page 14:

Gradient Descent

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):

$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

For linear regression:

$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
\end{aligned}$$
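The last line of the derivation vectorizes across all $j$ at once. A sketch, checked against a finite-difference approximation (all names and data here are illustrative):

```python
import numpy as np

def gradient(theta, X1, y):
    """grad_j = 1/n * sum_i (h_theta(x_i) - y_i) * x_ij, for all j at once.

    X1 is assumed to already include the x_0 = 1 column.
    """
    n = X1.shape[0]
    return X1.T @ (X1 @ theta - y) / n

# Sanity check against a finite-difference approximation of J:
X1 = np.hstack([np.ones((4, 1)), np.arange(4.0).reshape(-1, 1)])
y = np.array([0.0, 1.0, 3.0, 6.0])
theta = np.array([0.5, -0.2])
J = lambda t: np.sum((X1 @ t - y) ** 2) / (2 * len(y))
eps = 1e-6
fd = [(J(theta + eps * np.eye(2)[j]) - J(theta - eps * np.eye(2)[j])) / (2 * eps)
      for j in range(2)]
print(np.allclose(gradient(theta, X1, y), fd))  # True
```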

Page 15:

Gradient Descent for Linear Regression

• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):

$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$

• To achieve a simultaneous update: at the start of each GD iteration, compute $h_\theta\!\left(x^{(i)}\right)$, and use this stored value in the update step loop.
• Assume convergence when $\left\| \theta_{\text{new}} - \theta_{\text{old}} \right\|_2 < \epsilon$, where the L2 norm is

$$\|v\|_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \ldots + v_{|v|}^2}$$
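Putting the pieces together: a minimal batch gradient descent sketch with the stored-predictions simultaneous update and the L2-norm stopping rule described above (the default values and data are illustrative):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=100000):
    """Batch gradient descent for linear regression.

    All h_theta(x^(i)) values are computed once per iteration (the stored
    predictions), giving a simultaneous update of every theta_j.
    Stops when ||theta_new - theta_old||_2 < eps.
    """
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])  # x_0 = 1
    n = X1.shape[0]
    theta = np.zeros(X1.shape[1])
    for _ in range(max_iters):
        preds = X1 @ theta                                   # compute h_theta first...
        new_theta = theta - alpha * X1.T @ (preds - y) / n   # ...then update all j together
        if np.linalg.norm(new_theta - theta) < eps:
            return new_theta
        theta = new_theta
    return theta

# Recover y = 1 + 2x from noiseless data:
X = np.arange(10.0).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel()
print(np.round(gradient_descent(X, y), 3))  # [1. 2.]
```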

Page 16:

Gradient Descent

(for fixed $\theta$, $h_\theta(x)$ is a function of $x$)   ($J$ is a function of the parameters $\theta$)

$h(x) = -900 - 0.1x$

Slide by Andrew Ng

Pages 17–24: Gradient Descent (continued; figures only). Slides by Andrew Ng

Page 25:

Choosing α

• α too small: slow convergence
• α too large: increasing value of $J(\theta)$
  – May overshoot the minimum
  – May fail to converge
  – May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration:
• The value should decrease at each iteration
• If it doesn't, adjust α
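The diagnostic above can be demonstrated in a few lines; the α values and data below are illustrative, chosen so one run converges and the other diverges:

```python
import numpy as np

X1 = np.hstack([np.ones((10, 1)), np.arange(10.0).reshape(-1, 1)])
y = 1.0 + 2.0 * X1[:, 1]
J = lambda t: np.sum((X1 @ t - y) ** 2) / (2 * len(y))

def run(alpha, iters=20):
    """Run gradient descent, recording J(theta) at every iteration."""
    theta = np.zeros(2)
    history = [J(theta)]
    for _ in range(iters):
        theta -= alpha * X1.T @ (X1 @ theta - y) / len(y)
        history.append(J(theta))
    return history

ok = run(0.01)    # small enough: J decreases each iteration
bad = run(0.08)   # too large for this data: J blows up
print(ok[0] > ok[-1], bad[-1] > bad[0])  # True True
```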

Page 26:

Extending Linear Regression to More Complex Models

• The inputs X for linear regression can be:
  – Original quantitative inputs
  – Transformations of quantitative inputs
    • e.g., log, exp, square root, square, etc.
  – Polynomial transformations
    • example: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$
  – Basis expansions
  – Dummy coding of categorical inputs
  – Interactions between variables
    • example: $x_3 = x_1 \times x_2$

This allows use of linear regression techniques to fit non-linear datasets.
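The polynomial transformation above amounts to building new input columns; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def poly_features(x, degree):
    """Map a 1-D input to columns [x, x^2, ..., x^degree].

    Regression on these columns is still linear in the parameters,
    even though the fitted curve is non-linear in x.
    """
    return np.column_stack([x ** j for j in range(1, degree + 1)])

x = np.array([1.0, 2.0, 3.0])
print(poly_features(x, 3))  # columns: x, x^2, x^3
```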

Page 27:

Linear Basis Function Models

• Generally, $h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$, where $\phi_j(x)$ is a basis function
• Typically, $\phi_0(x) = 1$ so that $\theta_0$ acts as a bias
• In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)

Page 28:

Linear Basis Function Models

• Polynomial basis functions: $\phi_j(x) = x^j$
  – These are global; a small change in x affects all basis functions
• Gaussian basis functions: $\phi_j(x) = \exp\left( -\frac{(x - \mu_j)^2}{2s^2} \right)$
  – These are local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (width).

Based on slide by Christopher Bishop (PRML)
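Assuming the standard Gaussian basis $\phi_j(x) = \exp(-(x-\mu_j)^2 / 2s^2)$, the locality is easy to see in code (the centers and width below are illustrative):

```python
import numpy as np

def gaussian_features(x, centers, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)) for each center mu_j.

    Local: inputs far from mu_j contribute almost nothing to phi_j.
    """
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * s ** 2))

x = np.array([0.0, 5.0])
centers = np.array([0.0, 5.0])
phi = gaussian_features(x, centers, s=1.0)
# Each point strongly activates only its own center (near-identity matrix):
print(np.round(phi, 3))
```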

Page 29:

Linear Basis Function Models

• Sigmoidal basis functions: $\phi_j(x) = \sigma\!\left( \frac{x - \mu_j}{s} \right)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
  – These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and $s$ control location and scale (slope).

Based on slide by Christopher Bishop (PRML)

Page 30:

Example of Fitting a Polynomial Curve with a Linear Model

$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$$

Page 31:

Quality of Fit

Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ...but fails to generalize to new examples

[Figure: three Price vs. Size plots showing underfitting (high bias), a correct fit, and overfitting (high variance)]

Based on example by Andrew Ng

Page 32:

Regularization

• A method for automatically controlling the complexity of the learned hypothesis
• Idea: penalize large values of $\theta_j$
  – Can incorporate into the cost function
  – Works well when we have a lot of features, each of which contributes a bit to predicting the label
• Can also address overfitting by eliminating features (either manually or via model selection)

Page 33:

Regularization

• Linear regression objective function:

$$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$$

  – $\lambda$ is the regularization parameter ($\lambda \geq 0$)
  – No regularization on $\theta_0$!
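The regularized objective, with the bias $\theta_0$ excluded from the penalty, can be sketched as follows (names and data are illustrative):

```python
import numpy as np

def cost_reg(theta, X1, y, lam):
    """J(theta) = 1/(2n) sum (h - y)^2 + (lam/2) sum_{j>=1} theta_j^2.

    theta[0] (the bias) is excluded from the penalty.
    """
    n = X1.shape[0]
    fit = np.sum((X1 @ theta - y) ** 2) / (2 * n)
    penalty = lam / 2 * np.sum(theta[1:] ** 2)   # no regularization on theta_0
    return fit + penalty

X1 = np.array([[1.0, 2.0], [1.0, 4.0]])
y = np.array([5.0, 9.0])                         # y = 1 + 2x exactly
theta = np.array([1.0, 2.0])
print(cost_reg(theta, X1, y, lam=0.0))           # 0.0: the fit term vanishes
print(cost_reg(theta, X1, y, lam=1.0))           # 2.0: penalty is (1/2) * 2^2
```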

Page 34:

Understanding Regularization

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• Note that $\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$
  – This is the magnitude of the feature coefficient vector!
• We can also think of this as: $\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$
• L2 regularization pulls coefficients toward 0

Page 35:

Understanding Regularization

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?

[Figure: Price vs. Size, with the penalized coefficients driven to ≈ 0, leaving a flat fit]

Based on example by Andrew Ng

Page 36:

Regularized Linear Regression

• Cost function:

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• Fit by solving $\min_\theta J(\theta)$

• Gradient update:

$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) \qquad \left(\text{uses } \frac{\partial}{\partial \theta_0} J(\theta)\right)$$

$$\theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right] \qquad \left(\text{uses } \frac{\partial}{\partial \theta_j} J(\theta)\right)$$
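One step of the regularized update, with no penalty term for $\theta_0$, can be sketched as follows (names and data are illustrative):

```python
import numpy as np

def reg_gd_step(theta, X1, y, alpha, lam):
    """One regularized gradient step; theta_0 gets no penalty term."""
    n = X1.shape[0]
    residuals = X1 @ theta - y
    grad = X1.T @ residuals / n
    grad[1:] += lam * theta[1:]          # penalty gradient applies for j >= 1 only
    return theta - alpha * grad

# With zero residuals, the update is pure shrinkage on theta_1:
X1 = np.array([[1.0, 2.0], [1.0, 4.0]])
theta = np.array([1.0, 2.0])
y = X1 @ theta                            # residuals are exactly 0
new = reg_gd_step(theta, X1, y, alpha=0.1, lam=0.5)
print(new)  # theta_0 unchanged; theta_1 scaled by (1 - 0.1*0.5) -> [1.  1.9]
```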

Page 37:

Regularized Linear Regression

$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)$$

$$\theta_j \leftarrow \theta_j - \alpha \left[ \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right]$$

• We can rewrite the gradient step as:

$$\theta_j \leftarrow \theta_j (1 - \alpha \lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$