Linear Regression (CIS 519, Fall 2016, Lecture 04)

Transcript of Linear Regression (CIS 519, Fall 2016, Lecture 04)

Page 1: Linear Regression

Linear Regression

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric.

Page 2: Regression

Regression

Given:
–  Data X = \{x^{(1)}, \ldots, x^{(n)}\} where x^{(i)} \in \mathbb{R}^d
–  Corresponding labels y = \{y^{(1)}, \ldots, y^{(n)}\} where y^{(i)} \in \mathbb{R}

[Figure: September Arctic Sea Ice Extent (1,000,000 sq km) versus Year, 1975–2015, with linear regression and quadratic regression fits.]

Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)

Page 3: Prostate Cancer Dataset

Prostate Cancer Dataset

•  97 samples, partitioned into 67 train / 30 test
•  Eight predictors (features):
   –  6 continuous (4 log transforms), 1 binary, 1 ordinal
•  Continuous outcome variable:
   –  lpsa: log(prostate specific antigen level)

Based on slide by Jeff Howbert

Page 4: Linear Regression

Linear Regression

•  Hypothesis:
   y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j
   (assume x_0 = 1)

•  Fit model by minimizing sum of squared errors

Figures are courtesy of Greg Shakhnarovich

Page 5: Least Squares Linear Regression

Least Squares Linear Regression

•  Cost Function:
   J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2

•  Fit by solving \min_\theta J(\theta)
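As a concrete reference, here is a minimal NumPy sketch of this cost written as the literal sum over training instances (the function and variable names are my own; X is assumed to already contain the x_0 = 1 column):

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = 1/(2n) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    n = len(y)
    total = 0.0
    for i in range(n):
        h_i = theta @ X[i]          # h_theta(x^(i)); row X[i] starts with x_0 = 1
        total += (h_i - y[i]) ** 2
    return total / (2 * n)
```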

Page 6: Intuition Behind Cost Function

Intuition Behind Cost Function

For insight on J(θ), let's assume x \in \mathbb{R}, so \theta = [\theta_0, \theta_1]

J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2

Based on example by Andrew Ng

Page 7: Intuition Behind Cost Function

Intuition Behind Cost Function

For insight on J(θ), let's assume x \in \mathbb{R}, so \theta = [\theta_0, \theta_1]

J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2

[Figures: the hypothesis h_θ(x) plotted over the data (for fixed θ, this is a function of x), alongside J plotted against θ_1 (a function of the parameter).]

Based on example by Andrew Ng

Page 8: Intuition Behind Cost Function

Intuition Behind Cost Function

For insight on J(θ), let's assume x \in \mathbb{R}, so \theta = [\theta_0, \theta_1]

J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2

[Figures: the training points with the hypothesis for θ = [0, 0.5] (for fixed θ, this is a function of x), alongside J plotted against θ_1 (a function of the parameter).]

J([0, 0.5]) = \frac{1}{2 \times 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58

Based on example by Andrew Ng
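To make the arithmetic concrete, the following sketch reproduces these cost values for the three training points (1,1), (2,2), (3,3) implied by the worked example (an assumption on my part, read off from the predictions 0.5, 1, 1.5 and targets 1, 2, 3):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta0, theta1):
    # J(theta) = 1/(2n) * sum (h_theta(x) - y)^2 with h_theta(x) = theta0 + theta1 * x
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * len(x))

print(J(0.0, 0.5))  # ~0.583
print(J(0.0, 0.0))  # ~2.333
print(J(0.0, 1.0))  # 0.0, the minimum for this data
```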

Page 9: Intuition Behind Cost Function

Intuition Behind Cost Function

For insight on J(θ), let's assume x \in \mathbb{R}, so \theta = [\theta_0, \theta_1]

J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2

[Figures: the training points with the hypothesis for θ = [0, 0] (for fixed θ, this is a function of x), alongside J plotted against θ_1 (a function of the parameter).]

J([0, 0]) \approx 2.333

J(θ) is convex

Based on example by Andrew Ng

Page 10: Intuition Behind Cost Function

Intuition Behind Cost Function

Slide by Andrew Ng

Pages 11–14: Intuition Behind Cost Function

[Each of these slides pairs the current hypothesis h_θ(x) (for fixed θ, this is a function of x) with a plot of J(θ_0, θ_1) (a function of the parameters), for a sequence of parameter settings.]

Slides by Andrew Ng

Page 15: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

BasicSearchProcedure•  ChooseiniIalvaluefor•  UnIlwereachaminimum:–  Chooseanewvaluefortoreduce

16

✓ J(✓)

�1�0

J(�0,�1)

FigurebyAndrewNg

Pages 16–17: Basic Search Procedure

Basic Search Procedure

•  Choose initial value for θ
•  Until we reach a minimum:
   –  Choose a new value for θ to reduce J(θ)

[Figures: the same surface of J(θ_0, θ_1), with the search descending toward a minimum from the chosen starting point.]

Figures by Andrew Ng

Since the least squares objective function is convex, we don't need to worry about local minima.

Page 18: Gradient Descent

Gradient Descent

•  Initialize θ
•  Repeat until convergence:
   \theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
   (simultaneous update for j = 0 ... d)

α is the learning rate (small), e.g., α = 0.05

[Figure: J(θ) plotted against the parameter, with descent steps moving toward the minimum.]

Page 19: Gradient Descent

Gradient Descent

•  Initialize θ
•  Repeat until convergence:
   \theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
   (simultaneous update for j = 0 ... d)

For Linear Regression:

\frac{\partial}{\partial \theta_j} J(\theta)
  = \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2
  = \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2
  = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)
  = \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)}
  = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}

Pages 20–22: Gradient Descent

(These three slides repeat the derivation above.)

Page 23: Gradient Descent for Linear Regression

Gradient Descent for Linear Regression

•  Initialize θ
•  Repeat until convergence:
   \theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
   (simultaneous update for j = 0 ... d)

•  To achieve the simultaneous update:
   –  At the start of each GD iteration, compute h_\theta\left(x^{(i)}\right)
   –  Use this stored value in the update step loop

•  Assume convergence when \lVert \theta_{\mathrm{new}} - \theta_{\mathrm{old}} \rVert_2 < \epsilon

L2 norm: \lVert v \rVert_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \ldots + v_{|v|}^2}
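A minimal NumPy sketch of this update loop (the function and variable names are my own; X is assumed to already contain the x_0 = 1 bias column):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=10000):
    """Batch gradient descent for least squares linear regression.

    X: (n, d+1) design matrix whose first column is all ones (x_0 = 1).
    y: (n,) targets.
    """
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)
    for _ in range(max_iters):
        predictions = X @ theta                 # h_theta(x^(i)), computed once per iteration
        gradient = X.T @ (predictions - y) / n  # (1/n) sum_i (h - y) x_j^(i), for all j at once
        theta_new = theta - alpha * gradient    # simultaneous update of every theta_j
        if np.linalg.norm(theta_new - theta) < eps:  # ||theta_new - theta_old||_2 < eps
            return theta_new
        theta = theta_new
    return theta
```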

Page 24: Gradient Descent

Gradient Descent

[Left: the hypothesis h(x) = −900 − 0.1x over the data (for fixed θ, this is a function of x). Right: J plotted over the parameters (a function of the parameters).]

Slide by Andrew Ng

Pages 25–32: Gradient Descent

[A sequence of slides stepping through gradient descent: each pairs the current hypothesis h_θ(x) (for fixed θ, this is a function of x) with a plot of J over the parameters (a function of the parameters).]

Slides by Andrew Ng

Page 33: Choosing α

Choosing α

α too small: slow convergence

α too large: increasing value for J(θ)
•  May overshoot the minimum
•  May fail to converge
•  May even diverge

To see if gradient descent is working, print out J(θ) each iteration
•  The value should decrease at each iteration
•  If it doesn't, adjust α

Page 34: Extending Linear Regression to More Complex Models

Extending Linear Regression to More Complex Models

•  The inputs X for linear regression can be:
   –  Original quantitative inputs
   –  Transformations of quantitative inputs
      •  e.g., log, exp, square root, square, etc.
   –  Polynomial transformations
      •  example: y = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3
   –  Basis expansions
   –  Dummy coding of categorical inputs
   –  Interactions between variables
      •  example: x_3 = x_1 \cdot x_2

This allows use of linear regression techniques to fit non-linear datasets, as in the sketch below.
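For instance, a minimal sketch of building such transformed inputs with NumPy (the specific column choices here are illustrative, not from the slides):

```python
import numpy as np

def expand_features(X):
    """Augment two raw inputs with example transformations: log, powers, and an interaction."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([
        np.ones(len(X)),    # x_0 = 1 (bias)
        x1, x2,             # original quantitative inputs
        np.log(x1),         # transformation of a quantitative input (assumes x1 > 0)
        x1 ** 2, x1 ** 3,   # polynomial transformation
        x1 * x2,            # interaction between variables: x_3 = x_1 * x_2
    ])

# The expanded matrix is then used exactly like X in ordinary linear regression.
```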

Page 35: Linear Basis Function Models

Linear Basis Function Models

•  Generally,
   h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)
   where \phi_j(x) is a basis function

•  Typically, \phi_0(x) = 1 so that \theta_0 acts as a bias
•  In the simplest case, we use linear basis functions: \phi_j(x) = x_j

Based on slide by Christopher Bishop (PRML)

Page 36: Linear Basis Function Models

Linear Basis Function Models

•  Polynomial basis functions:
   –  These are global; a small change in x affects all basis functions

•  Gaussian basis functions:
   –  These are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width).

Based on slide by Christopher Bishop (PRML)

Page 37: Linear Basis Function Models

Linear Basis Function Models

•  Sigmoidal basis functions:
   –  These are also local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (slope).

Based on slide by Christopher Bishop (PRML)
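The formulas for these basis functions appeared as figures on the slides and were not transcribed; a sketch using the standard PRML forms (an assumption on my part) might look like:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)): local, centered at mu_j with width s."""
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2))

def sigmoidal_basis(x, mu, s):
    """phi_j(x) = sigma((x - mu_j) / s), sigma(a) = 1 / (1 + exp(-a)): local, slope set by s."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))

# Example: evaluate a bank of Gaussian basis functions at a grid of inputs.
x = np.linspace(0.0, 1.0, 5)
centers = np.array([0.25, 0.5, 0.75])
Phi = gaussian_basis(x[:, None], centers[None, :], s=0.1)  # shape (5, 3)
```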

Page 38: Example of Fitting a Polynomial Curve with a Linear Model

Example of Fitting a Polynomial Curve with a Linear Model

y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j

Page 39: Linear Basis Function Models

Linear Basis Function Models

•  Basic Linear Model:
   h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j

•  Generalized Linear Model:
   h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)

•  Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model
   –  Unless we use the kernel trick (more on that when we cover support vector machines)
   –  Therefore, there is no point in cluttering the math with basis functions

Based on slide by Geoff Hinton

Page 40: Linear Algebra Concepts

Linear Algebra Concepts

•  A vector in \mathbb{R}^d is an ordered set of d real numbers
   –  e.g., v = [1, 6, 3, 4] is in \mathbb{R}^4
   –  "[1, 6, 3, 4]" is a column vector:
      \begin{pmatrix} 1 \\ 6 \\ 3 \\ 4 \end{pmatrix}
   –  as opposed to a row vector:
      \begin{pmatrix} 1 & 6 & 3 & 4 \end{pmatrix}

•  An m-by-n matrix is an object with m rows and n columns, where each entry is a real number:
   \begin{pmatrix} 1 & 2 & 8 \\ 4 & 78 & 6 \\ 9 & 3 & 2 \end{pmatrix}

Based on slides by Joseph Bradley

Page 41: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

•  Transpose:reflectvector/matrixonline:

( )baba T

=⎟⎟⎠

⎞⎜⎜⎝

⎛⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟⎟⎠

⎞⎜⎜⎝

⎛dbca

dcba T

–  Note:(Ax)T=xTAT(We’lldefinemulIplicaIonsoon…)

•  Vectornorms:–  Lpnormofv=(v1,…,vk)is–  Commonnorms:L1,L2–  Linfinity=maxi|vi|

•  LengthofavectorvisL2(v)

X

i

|vi|p! 1

p

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 42: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

•  Vectordotproduct:

–  Note:dotproductofuwithitself=length(u)2=

•  Matrixproduct:

( ) ( ) 22112121 vuvuvvuuvu +=•=•

⎟⎟⎠

⎞⎜⎜⎝

⎛++++

=

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟⎟⎠

⎞⎜⎜⎝

⎛=

2222122121221121

2212121121121111

2221

1211

2221

1211 ,

babababababababa

AB

bbbb

Baaaa

A

kuk22

BasedonslidesbyJosephBradley

LinearAlgebraConcepts

Page 43: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

•  Vectorproducts:–  Dotproduct:

–  Outerproduct:

( ) 22112

121 vuvuvv

uuvuvu T +=⎟⎟⎠

⎞⎜⎜⎝

⎛==•

( ) ⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟⎟⎠

⎞⎜⎜⎝

⎛=

2212

211121

2

1

vuvuvuvu

vvuu

uvT

BasedonslidesbyJosephBradley

LinearAlgebraConcepts
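These operations map directly onto NumPy; a small sketch with example values of my own choosing:

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

print(u @ v)                          # dot product: u1*v1 + u2*v2 = 11.0
print(u @ u, np.linalg.norm(u) ** 2)  # dot of u with itself equals ||u||_2^2
print(np.outer(u, v))                 # outer product u v^T, a 2x2 matrix
print(A @ B)                          # matrix product AB
print(np.allclose(A @ u, u @ A.T))    # (Au)^T = u^T A^T, so these agree
```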

Page 44: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

h(x) = ✓

|x

x

| =⇥1 x1 . . . xd

VectorizaIon•  BenefitsofvectorizaIon– MorecompactequaIons–  Fastercode(usingopImizedmatrixlibraries)

•  Considerourmodel:•  Let

•  Canwritethemodelinvectorizedformas45

h(x) =dX

j=0

✓jxj

✓ =

2

6664

✓0✓1...✓d

3

7775

Page 45: Vectorization

Vectorization

•  Consider our model for n instances:
   h\left(x^{(i)}\right) = \sum_{j=0}^{d} \theta_j x_j^{(i)}

•  Let
   \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times 1}
   \qquad
   X = \begin{bmatrix}
   1 & x_1^{(1)} & \ldots & x_d^{(1)} \\
   \vdots & \vdots & \ddots & \vdots \\
   1 & x_1^{(i)} & \ldots & x_d^{(i)} \\
   \vdots & \vdots & \ddots & \vdots \\
   1 & x_1^{(n)} & \ldots & x_d^{(n)}
   \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}

•  Can write the model in vectorized form as h_\theta(x) = X\theta

Page 46: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

J(✓) =1

2n

nX

i=1

⇣✓

|x

(i) � y(i)⌘2

VectorizaIon•  ForthelinearregressioncostfuncIon:

47

J(✓) =1

2n(X✓ � y)| (X✓ � y)

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2

Rn⇥(d+1)

R(d+1)⇥1

Rn⇥1R1⇥n

Let:

y =

2

6664

y(1)

y(2)

...y(n)

3

7775
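In code, the vectorized form is a one-liner; a sketch (again assuming X carries the bias column):

```python
import numpy as np

def cost_vectorized(theta, X, y):
    """J(theta) = 1/(2n) * (X theta - y)^T (X theta - y)."""
    r = X @ theta - y
    return (r @ r) / (2 * len(y))

# Equivalent to the per-instance sum shown earlier, but the work happens inside
# the optimized matrix library instead of a Python loop.
```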

Page 47: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

ClosedFormSoluIon:

ClosedFormSoluIon•  InsteadofusingGD,solveforopImal analyIcally–  NoIcethatthesoluIoniswhen

•  DerivaIon:

TakederivaIveandsetequalto0,thensolvefor:

48

✓@

@✓J(✓) = 0

J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

1x1J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

J (✓) =1

2n(X✓ � y)| (X✓ � y)

/ ✓|X|X✓ � y|X✓ � ✓|X|y + y|y/ ✓|X|X✓ � 2✓|X|y + y|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

✓@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

Page 48: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

ClosedFormSoluIon•  CanobtainbysimplypluggingXand into

•  IfXTXisnotinverIble(i.e.,singular),mayneedto:–  Usepseudo-inverseinsteadoftheinverse

•  Inpython,numpy.linalg.pinv(a) –  Removeredundant(notlinearlyindependent)features–  Removeextrafeaturestoensurethatd≤n

49

@

@✓(✓|X|X✓ � 2✓|X|y + y|y) = 0

(X|X)✓ �X|y = 0

(X|X)✓ = X|y

✓ = (X|X)�1X|y

y =

2

6664

y(1)

y(2)

...y(n)

3

7775X =

2

66666664

1 x

(1)1 . . . x

(1)d

......

. . ....

1 x

(i)1 . . . x

(i)d

......

. . ....

1 x

(n)1 . . . x

(n)d

3

77777775

✓ y
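A sketch of the closed-form fit in NumPy, using the pseudo-inverse as the slide suggests (the data below is synthetic and only for illustration):

```python
import numpy as np

def fit_closed_form(X, y):
    """theta = (X^T X)^{-1} X^T y, computed with the pseudo-inverse for robustness."""
    return np.linalg.pinv(X.T @ X) @ (X.T @ y)

# Example with a single feature plus the bias column.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
X = np.column_stack([np.ones_like(x), x])   # first column is x_0 = 1
theta = fit_closed_form(X, y)               # approximately [3, 2]
```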

Page 49: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

GradientDescentvsClosedForm

GradientDescentClosedFormSolu+on

50

•  RequiresmulIpleiteraIons•  Needtochooseα•  Workswellwhennislarge•  Cansupportincremental

learning

•  Non-iteraIve•  Noneedforα•  Slowifnislarge

–CompuIng(XTX)-1isroughlyO(n3)

Page 50: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

ImprovingLearning:FeatureScaling

•  Idea:Ensurethatfeaturehavesimilarscales

•  Makesgradientdescentconvergemuchfaster

51

0

5

10

15

20

0 5 10 15 20✓1

✓2

BeforeFeatureScaling

0

5

10

15

20

0 5 10 15 20✓1

✓2

AverFeatureScaling

Page 51: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

FeatureStandardizaIon•  Rescalesfeaturestohavezeromeanandunitvariance

– Letμjbethemeanoffeaturej:

– Replaceeachvaluewith:

•  sjisthestandarddeviaIonoffeaturej •  Couldalsousetherangeoffeaturej (maxj–minj)forsj

•  MustapplythesametransformaIontoinstancesforbothtrainingandpredicIon

•  Outlierscancauseproblems

52

µj =1

n

nX

i=1

x

(i)j

x

(i)j

x

(i)j � µj

sj

forj=1...d(notx0!)
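A minimal sketch of this, fitting μ_j and s_j on the training set and reusing them at prediction time (function names are my own):

```python
import numpy as np

def fit_standardizer(X_train):
    """Per-feature mean and standard deviation, computed on the training set only.

    Assumes X_train holds the raw features x_1..x_d (no x_0 = 1 bias column).
    """
    mu = X_train.mean(axis=0)
    s = X_train.std(axis=0)
    return mu, s

def standardize(X, mu, s):
    """Apply (x_j - mu_j) / s_j using the training-set statistics."""
    return (X - mu) / s

# mu, s = fit_standardizer(X_train)
# X_train_std = standardize(X_train, mu, s)
# X_test_std  = standardize(X_test, mu, s)   # same transformation at prediction time
```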

Page 52: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

QualityofFit

OverfiHng:•  Thelearnedhypothesismayfitthetrainingsetverywell()

•  ...butfailstogeneralizetonewexamples

53

Price

Size

Price

Size

Price

Size

Underfinng(highbias)

Overfinng(highvariance)

Correctfit

J(✓) ⇡ 0

BasedonexamplebyAndrewNg

Page 53: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

RegularizaIon•  AmethodforautomaIcallycontrollingthecomplexityofthelearnedhypothesis

•  Idea:penalizeforlargevaluesof–  CanincorporateintothecostfuncIon– Workswellwhenwehavealotoffeatures,eachthatcontributesabittopredicIngthelabel

•  CanalsoaddressoverfinngbyeliminaIngfeatures(eithermanuallyorviamodelselecIon)

54

✓j

Page 54: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

RegularizaIon•  LinearregressionobjecIvefuncIon

–  istheregularizaIonparameter()– NoregularizaIonon!

55

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+ �

dX

j=1

✓2j

modelfittodata regularizaIon

✓0

� � � 0

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 55: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

UnderstandingRegularizaIon

•  Notethat

–  Thisisthemagnitudeofthefeaturecoefficientvector!

•  Wecanalsothinkofthisas:

•  L2regularizaIonpullscoefficientstoward0

56

dX

j=1

✓2j = k✓1:dk22

dX

j=1

(✓j � 0)2 = k✓1:d � ~0k22

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 56: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

UnderstandingRegularizaIon

•  Whathappensifwesettobehuge(e.g.,1010)?

57

�Price

Size

BasedonexamplebyAndrewNg

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 57: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

UnderstandingRegularizaIon

•  Whathappensifwesettobehuge(e.g.,1010)?

58

�Price

Size0 0 0 0

BasedonexamplebyAndrewNg

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+

2

dX

j=1

✓2j

Page 58: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

RegularizedLinearRegression

59

•  CostFuncIon

•  Fitbysolving

•  Gradientupdate:

min✓

J(✓)

✓j ✓j � ↵

1

n

nX

i=1

⇣h✓

⇣x

(i)⌘� y

(i)⌘x

(i)j

✓0 ✓0 � ↵1

n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

regularizaIon

@

@✓jJ(✓)

@

@✓0J(✓)

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+

2

dX

j=1

✓2j

� ↵�✓j

Page 59: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

RegularizedLinearRegression

60

✓0 ✓0 � ↵1

n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

•  Wecanrewritethegradientstepas:

J(✓) =1

2n

nX

i=1

⇣h✓

⇣x

(i)⌘� y(i)

⌘2+

2

dX

j=1

✓2j

✓j ✓j (1� ↵�)� ↵

1

n

nX

i=1

⇣h✓

⇣x

(i)⌘� y

(i)⌘x

(i)j

✓j ✓j � ↵

1

n

nX

i=1

⇣h✓

⇣x

(i)⌘� y

(i)⌘x

(i)j � ↵�✓j
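Extending the earlier gradient descent sketch with this shrinkage term (again assuming the first column of X is the x_0 = 1 bias, which is not regularized):

```python
import numpy as np

def ridge_gradient_step(theta, X, y, alpha, lam):
    """One regularized step: theta_j <- theta_j*(1 - alpha*lam) - alpha*(1/n)*sum_i (h - y)*x_j."""
    n = len(y)
    grad = X.T @ (X @ theta - y) / n
    shrink = np.full_like(theta, 1.0 - alpha * lam)
    shrink[0] = 1.0                      # no (1 - alpha*lam) factor on theta_0
    return shrink * theta - alpha * grad
```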

Page 60: Linear Regression - Information and Computer Science › ~cis519 › fall2016 › lectures › 04_Linear... · Based on slide by Christopher Bishop (PRML) Example of Fing a Polynomial

RegularizedLinearRegression

61

✓ =

0

BBBBB@X|X + �

2

666664

0 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0...

......

. . ....

0 0 0 . . . 1

3

777775

1

CCCCCA

�1

X|y

•  ToincorporateregularizaIonintotheclosedformsoluIon:

Page 61: Regularized Linear Regression

Regularized Linear Regression

•  To incorporate regularization into the closed form solution:

   \theta = \left( X^T X + \lambda \begin{bmatrix}
   0 & 0 & 0 & \ldots & 0 \\
   0 & 1 & 0 & \ldots & 0 \\
   0 & 0 & 1 & \ldots & 0 \\
   \vdots & \vdots & \vdots & \ddots & \vdots \\
   0 & 0 & 0 & \ldots & 1
   \end{bmatrix} \right)^{-1} X^T y

•  Can derive this the same way, by solving \frac{\partial}{\partial \theta} J(\theta) = 0

•  Can prove that for λ > 0, the inverse in the equation above always exists
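A closing sketch of this regularized closed form in NumPy (the helper name is my own; the first column of X is the unregularized bias):

```python
import numpy as np

def fit_ridge_closed_form(X, y, lam):
    """theta = (X^T X + lam * D)^{-1} X^T y, where D is the identity with D[0, 0] = 0
    so that theta_0 is not regularized."""
    d_plus_1 = X.shape[1]
    D = np.eye(d_plus_1)
    D[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```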