PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 3: LINEAR MODELS FOR REGRESSION

Transcript of Regression Class PPT
7/21/2019  http://slidepdf.com/reader/full/regression-56da9aa4a77a5


Outline

• Discuss tutorial.
• Regression examples.
• The Gaussian distribution.
• Linear regression.
• Maximum likelihood estimation.


Polynomial Curve Fitting


Academia Example

• Predict: final percentage mark for a student.
• Features: 6 assignment grades, midterm exam, final exam, project, age.
• Questions we could ask:
  • I forgot the weights of the components. Can you recover them from a spreadsheet of the final grades?
  • I lost the final exam grades. How well can I still predict the final mark?
  • How important is each component, actually? Could I guess someone's final mark given their assignments? Given their exams?


 The Gaussian Distribution


Central Limit Theorem

The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. Example: N uniform [0,1] random variables.
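A quick simulation (my own NumPy sketch, not from the slides) illustrates this: the mean of N uniform [0,1] variables stays centred at 1/2, its distribution tightens as N grows (variance (1/12)/N), and its histogram looks increasingly Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_of_uniforms(N, samples=100_000):
    """Draw `samples` means of N i.i.d. uniform [0,1] variables."""
    return rng.uniform(0.0, 1.0, size=(samples, N)).mean(axis=1)

for N in (1, 2, 10):
    m = mean_of_uniforms(N)
    # The mean stays at 1/2; the variance of the mean is (1/12)/N.
    print(N, round(m.mean(), 3), round(m.var(), 4))
```

Plotting the histograms for N = 1, 2, 10 reproduces the familiar flat / triangular / bell-shaped progression shown on the slide.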


Reading Exponential Probability Formulas

• In an infinite space, we cannot just form a sum: Σx p(x) grows to infinity.
• Instead, use an exponential, e.g. p(n) = (1/2)^n.
• Suppose there is a relevant feature f(x) and I want to express that "the greater f(x) is, the less probable x is".
• Use p(x) = exp(−f(x)).
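A tiny illustration of this reading (my own sketch; the feature f(x) = x over four states is an arbitrary choice): weights of the form exp(−f(x)) decrease monotonically as f(x) grows, and normalizing them yields a proper distribution with exactly the "greater f(x), less probable x" behaviour.

```python
import math

# Hypothetical feature: f(x) = x on a few discrete states.
states = [0, 1, 2, 3]
weights = [math.exp(-x) for x in states]   # unnormalized probabilities exp(-f(x))
Z = sum(weights)                           # normalizing constant
p = [w / Z for w in weights]               # a proper distribution

# Larger f(x) => smaller probability.
assert all(p[i] > p[i + 1] for i in range(len(p) - 1))
print(p)
```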


Example: Exponential Form and Sample Size

• Fair coin: the longer the sample, the less likely any particular sequence is.
• p(n) = 2^−n.

[Plot: ln p(n) against sample size n.]


Exponential Form: Gaussian Mean

The further x is from the mean, the less likely it is.

[Plot: ln p(x) against (x − μ)².]


Smaller Variance Decreases Probability

The smaller the variance σ², the less likely x is (away from the mean). Or: the greater the precision β = 1/σ², the less likely x is.

[Plot: ln p(x) for increasing precision β = 1/σ².]


Minimal Energy = Max Probability

The greater the energy (of the joint state), the less probable the state is.

[Plot: ln p(x) against energy E(x).]


Linear Basis Function Models (1)

Generally,

    y(x, w) = Σ_{j=0}^{M−1} w_j φ_j(x) = wᵀφ(x),

where the φ_j(x) are known as basis functions.

Typically, φ_0(x) = 1, so that w_0 acts as a bias.

In the simplest case, we use linear basis functions: φ_d(x) = x_d.


Linear Basis Function Models (2)

Polynomial basis functions:

    φ_j(x) = x^j.

These are global: a small change in x affects all basis functions.


Linear Basis Function Models (3)

Gaussian basis functions:

    φ_j(x) = exp(−(x − μ_j)² / (2s²)).

These are local: a small change in x only affects nearby basis functions. μ_j and s control location and scale (width).

Related to kernel methods.


Linear Basis Function Models (4)

Sigmoidal basis functions:

    φ_j(x) = σ((x − μ_j)/s),  where σ(a) = 1/(1 + exp(−a)).

These are also local: a small change in x only affects nearby basis functions. μ_j and s control location and scale (slope).
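The three basis families can be sketched as design-matrix builders (my own NumPy sketch; the grid of centres μ_j and the width s below are arbitrary choices, not from the slides):

```python
import numpy as np

def poly_design(x, M):
    """Polynomial basis: Phi[n, j] = x_n**j for j = 0..M-1 (column 0 is the bias)."""
    return np.vander(x, M, increasing=True)

def gauss_design(x, mus, s):
    """Gaussian basis: local bumps centred at mus with width s."""
    return np.exp(-((x[:, None] - mus[None, :]) ** 2) / (2 * s ** 2))

def sigmoid_design(x, mus, s):
    """Sigmoidal basis: logistic sigmoid of (x - mu_j) / s."""
    a = (x[:, None] - mus[None, :]) / s
    return 1.0 / (1.0 + np.exp(-a))

x = np.linspace(0, 1, 5)
mus = np.linspace(0, 1, 3)
print(poly_design(x, 3).shape, gauss_design(x, mus, 0.2).shape)
```

Each function returns an N×M matrix whose rows are φ(x_n)ᵀ, which is the form the least-squares formulas below operate on.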


Curve Fitting With Noise


Maximum Likelihood and Least Squares (1)

Assume observations from a deterministic function with added Gaussian noise:

    t = y(x, w) + ε,  where p(ε|β) = N(ε|0, β⁻¹),

which is the same as saying

    p(t|x, w, β) = N(t|y(x, w), β⁻¹).

Given observed inputs X = {x_1, …, x_N} and targets t = (t_1, …, t_N)ᵀ, we obtain the likelihood function

    p(t|X, w, β) = Π_{n=1}^{N} N(t_n | wᵀφ(x_n), β⁻¹).


Maximum Likelihood and Least Squares (2)

Taking the logarithm, we get

    ln p(t|w, β) = (N/2) ln β − (N/2) ln 2π − β E_D(w),

where

    E_D(w) = (1/2) Σ_{n=1}^{N} {t_n − wᵀφ(x_n)}²

is the sum-of-squares error.


Maximum Likelihood and Least Squares (3)

Computing the gradient and setting it to zero yields

    ∇_w ln p(t|w, β) = β Σ_{n=1}^{N} {t_n − wᵀφ(x_n)} φ(x_n)ᵀ = 0.

Solving for w, we get

    w_ML = Φ†t = (ΦᵀΦ)⁻¹Φᵀt,

where Φ is the N×M design matrix with elements Φ_nj = φ_j(x_n), and Φ† = (ΦᵀΦ)⁻¹Φᵀ is the Moore–Penrose pseudo-inverse.
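The closed-form solution can be checked numerically (my own sketch; the synthetic straight-line data and the particular weights are arbitrary, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: t = w0 + w1*x + Gaussian noise with precision beta.
true_w = np.array([0.5, -2.0])
beta = 100.0                                   # noise precision, so variance 1/beta
x = rng.uniform(-1, 1, 200)
t = true_w[0] + true_w[1] * x + rng.normal(0, 1 / np.sqrt(beta), x.size)

Phi = np.column_stack([np.ones_like(x), x])    # design matrix, phi_0(x) = 1
w_ml = np.linalg.pinv(Phi) @ t                 # Moore-Penrose pseudo-inverse
print(w_ml)
```

With enough data and modest noise, w_ml recovers the generating weights closely.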


Linear Algebra / Geometry of Least Squares

Consider

    y = Φ w_ML.

S is the subspace spanned by the columns of Φ. w_ML minimizes the distance between the N-dimensional target vector t and its orthogonal projection y onto the M-dimensional subspace S.


Maximum Likelihood and Least Squares (4)

Maximizing with respect to the bias w_0 alone, we see that

    w_0 = t̄ − Σ_{j=1}^{M−1} w_j φ̄_j,

where t̄ is the average target and φ̄_j the data average of basis function j: the bias compensates for the difference between the two.

We can also maximize with respect to β, giving

    1/β_ML = (1/N) Σ_{n=1}^{N} {t_n − w_MLᵀφ(x_n)}².


0th Order Polynomial


3rd Order Polynomial


9th Order Polynomial


Over-fitting

Root-Mean-Square (RMS) Error:

    E_RMS = √(2 E(w*) / N)


Polynomial Coefficients


Data Set Size:

9th Order Polynomial


1st Order Polynomial


Data Set Size:

9th Order Polynomial


Quadratic Regularization

Penalize large coefficient values:

    Ẽ(w) = (1/2) Σ_{n=1}^{N} {t_n − y(x_n, w)}² + (λ/2) ‖w‖²


Regularization:


Regularization:


Regularization: … vs. …


Regularized Least Squares (1)

Consider the error function

    E_D(w) + λ E_W(w)    (data term + regularization term),

where λ is called the regularization coefficient.

With the sum-of-squares error function and a quadratic regularizer, we get

    (1/2) Σ_{n=1}^{N} {t_n − wᵀφ(x_n)}² + (λ/2) wᵀw,

which is minimized by

    w = (λI + ΦᵀΦ)⁻¹Φᵀt.
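The regularized solution can be sketched directly (my own NumPy sketch; the sinusoidal data, the 9th-order polynomial basis, and the λ values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 50)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)
Phi = np.vander(x, 10, increasing=True)        # 9th-order polynomial basis

def ridge(Phi, t, lam):
    """Minimizer of the quadratically regularized sum-of-squares error."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

w0 = ridge(Phi, t, 0.0)       # unregularized least squares
w1 = ridge(Phi, t, 1.0)
# Larger lambda shrinks the coefficient vector.
print(np.linalg.norm(w0), np.linalg.norm(w1))
```

Increasing λ trades a slightly larger training error for much smaller (better-behaved) coefficients, which is exactly the over-fitting control shown on the earlier polynomial slides.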


Regularized Least Squares (2)

With a more general regularizer, we have

    (1/2) Σ_{n=1}^{N} {t_n − wᵀφ(x_n)}² + (λ/2) Σ_{j=1}^{M} |w_j|^q,

where q = 1 gives the lasso and q = 2 the quadratic regularizer.


Regularized Least Squares (3)

The lasso tends to generate sparser solutions than a quadratic regularizer.
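Why the lasso gives sparsity can be seen in the orthonormal-design special case, where (a standard result, not derived on the slides) the lasso solution is a per-coordinate soft-thresholding of the least-squares solution: small coefficients become exactly zero, while the quadratic regularizer only scales every coefficient down. The least-squares coefficients below are hypothetical numbers chosen for illustration.

```python
import numpy as np

def soft_threshold(w, lam):
    """Lasso solution per coordinate when the design matrix is orthonormal."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w_ls = np.array([3.0, 0.4, -0.1, -2.5])   # hypothetical least-squares coefficients
lam = 0.5

w_lasso = soft_threshold(w_ls, lam)        # small entries become exactly zero
w_ridge = w_ls / (1.0 + lam)               # quadratic penalty only shrinks

print(w_lasso)   # [2.5, 0, 0, -2.0]: two coefficients zeroed out
print(w_ridge)   # all entries remain nonzero
```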


Cross-Validation for Regularization
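A minimal K-fold cross-validation loop for choosing λ might look like this (my own sketch; the data, the grid of λ values, and K = 5 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 60)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, x.size)
Phi = np.vander(x, 10, increasing=True)

def ridge_fit(Phi, t, lam):
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)

def cv_error(Phi, t, lam, K=5):
    """Average held-out mean-squared error over K folds."""
    folds = np.array_split(rng.permutation(len(t)), K)
    errs = []
    for hold in folds:
        train = np.setdiff1d(np.arange(len(t)), hold)
        w = ridge_fit(Phi[train], t[train], lam)
        errs.append(np.mean((t[hold] - Phi[hold] @ w) ** 2))
    return float(np.mean(errs))

lams = [0.0, 1e-6, 1e-3, 1.0, 100.0]
scores = {lam: cv_error(Phi, t, lam) for lam in lams}
best = min(scores, key=scores.get)
print(best, scores[best])
```

Each candidate λ is scored only on data it was not fitted to, so the selected value balances the data term against the regularization term rather than just minimizing training error.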


Bayesian Linear Regression (1)

• Define a conjugate shrinkage prior over the weight vector w:

    p(w|α) = N(w|0, α⁻¹I).

• Combining this with the likelihood function, and using the results for marginal and conditional Gaussian distributions, gives a posterior distribution over w.
• Log of the posterior = −(sum of squared errors + quadratic regularization), up to constants.
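The posterior is Gaussian, p(w|t) = N(w|m_N, S_N) with S_N⁻¹ = αI + βΦᵀΦ and m_N = βS_NΦᵀt, and m_N coincides with the quadratically regularized solution for λ = α/β. A sketch (the straight-line data and the values of α and β are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 30)
t = 0.5 - 2.0 * x + rng.normal(0, 0.2, x.size)
Phi = np.column_stack([np.ones_like(x), x])

alpha, beta = 2.0, 25.0                        # prior and noise precisions

S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t                   # posterior mean

# Equivalent regularized least squares with lambda = alpha / beta.
lam = alpha / beta
w_ridge = np.linalg.solve(lam * np.eye(2) + Phi.T @ Phi, Phi.T @ t)
print(m_N, w_ridge)
```

This makes the slide's last bullet concrete: maximizing the log posterior is the same computation as minimizing the regularized sum-of-squares error.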


Bayesian Linear Regression (3)

0 data points observed.

[Panels: Prior | Data Space]


Bayesian Linear Regression (4)

1 data point observed.

[Panels: Likelihood | Posterior | Data Space]


Bayesian Linear Regression (5)

2 data points observed.

[Panels: Likelihood | Posterior | Data Space]


Bayesian Linear Regression (6)

20 data points observed.

[Panels: Likelihood | Posterior | Data Space]


Predictive Distribution (1)

• Predict t for new values of x by integrating over w:

    p(t|x, t, α, β) = ∫ p(t|x, w, β) p(w|t, α, β) dw.

• This integral can be solved analytically.
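The analytic result is p(t|x, t) = N(t | m_Nᵀφ(x), σ_N²(x)) with σ_N²(x) = 1/β + φ(x)ᵀS_Nφ(x): noise variance plus uncertainty in w, so the predictive variance never falls below 1/β. A sketch assuming the Gaussian prior N(w|0, α⁻¹I) and an identity (straight-line) basis, with data and precisions that are my own arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 20)
t = 0.5 - 2.0 * x + rng.normal(0, 0.2, x.size)
Phi = np.column_stack([np.ones_like(x), x])

alpha, beta = 2.0, 25.0
S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predict(x_new):
    """Predictive mean and variance at a new input."""
    phi = np.array([1.0, x_new])
    mean = m_N @ phi
    var = 1.0 / beta + phi @ S_N @ phi     # noise floor + weight uncertainty
    return mean, var

mean, var = predict(0.3)
print(mean, var)
```

As more data points are observed, S_N shrinks and the predictive variance approaches the noise floor 1/β, which is what the sequence of sinusoidal examples on the following slides shows.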


Predictive Distribution (2)

Example: sinusoidal data, Gaussian basis functions, 1 data point.


Predictive Distribution (3)

Example: sinusoidal data, Gaussian basis functions, 2 data points.


Predictive Distribution (4)

Example: sinusoidal data, Gaussian basis functions, 4 data points.


Predictive Distribution (5)

Example: sinusoidal data, Gaussian basis functions, 25 data points.


Limitations of Fixed Basis Functions

• M basis functions along each dimension of a D-dimensional input space require M^D basis functions in total: the curse of dimensionality.
• In later chapters, we shall see how we can get away with fewer basis functions by choosing them using the training data.
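The M^D blow-up is easy to make concrete (my own sketch): a full tensor-product grid of per-dimension basis functions grows exponentially with the input dimension.

```python
# Number of basis functions for a tensor-product grid:
# M functions per dimension, D input dimensions => M**D in total.
def grid_basis_count(M, D):
    return M ** D

for D in (1, 2, 5, 10):
    print(D, grid_basis_count(10, D))
# With M = 10: 10, 100, 100000, 10000000000 -- hopeless beyond a few dimensions.
```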