Transcript of Jeff Howbert, Introduction to Machine Learning (Winter 2012): Machine Learning Math Essentials, Part 2.

Page 1:

Machine Learning

Math Essentials, Part 2

Page 2: Gaussian distribution

Most commonly used continuous probability distribution.

Also known as the normal distribution.

Two parameters define a Gaussian:
– Mean μ: location of the center.
– Variance σ²: width of the curve.

Page 3: Gaussian distribution

In one dimension:

$$ N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Page 4: Gaussian distribution

In one dimension:

$$ N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

– The normalizing constant 1 / (2πσ²)^{1/2} ensures that the distribution integrates to 1.
– The variance σ² controls the width of the curve.
– The exponential term causes the pdf to decrease as distance from the center μ increases.
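As a quick illustration (my addition, not from the slides), this density is easy to evaluate directly in MATLAB:

    % Evaluate and plot the 1-D Gaussian pdf N( x | mu, sigma2 )
    mu = 0;                                   % mean: center of the curve
    sigma2 = 1;                               % variance: width of the curve
    x = -4 : 0.01 : 4;
    p = exp( -( x - mu ).^2 / ( 2 * sigma2 ) ) / sqrt( 2 * pi * sigma2 );
    plot( x, p );                             % bell curve centered at mu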

Page 5: Gaussian distribution

[Figure: 1-D Gaussian pdfs for four parameter settings]
– μ = 0, σ² = 1
– μ = 2, σ² = 1
– μ = 0, σ² = 5
– μ = −2, σ² = 0.3
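A sketch of my own (not the original slide code) that reproduces the four curves:

    % Plot the four example Gaussian pdfs from this slide
    params = [ 0 1; 2 1; 0 5; -2 0.3 ];      % rows are [ mu sigma2 ]
    x = -6 : 0.01 : 6;
    hold on;
    for i = 1 : size( params, 1 )
        mu = params( i, 1 );
        s2 = params( i, 2 );
        plot( x, exp( -( x - mu ).^2 / ( 2 * s2 ) ) / sqrt( 2 * pi * s2 ) );
    end
    hold off;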

Page 6: Multivariate Gaussian distribution

In d dimensions:

$$ N(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} \, |\Sigma|^{1/2}} \, e^{-\frac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})} $$

– x and μ are now d-dimensional vectors; μ gives the center of the distribution in d-dimensional space.
– σ² is replaced by Σ, the d × d covariance matrix, which contains the pairwise covariances of every pair of features.
– The diagonal elements of Σ are the variances σᵢ² of the individual features.
– Σ describes the distribution's shape and spread.
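A minimal sketch (my own) of evaluating this density at a point, reusing the 2-D parameters from page 8 below; the query point is made up:

    % Evaluate the multivariate Gaussian density N( x | mu, Sigma )
    mu = [ 0; 0 ];
    Sigma = [ 0.25 0.3; 0.3 1 ];             % covariance matrix from page 8
    x = [ 0.5; -0.2 ];                       % a hypothetical query point
    d = length( mu );
    v = x - mu;
    p = exp( -0.5 * v' * ( Sigma \ v ) ) / sqrt( ( 2 * pi )^d * det( Sigma ) );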

Page 7: Multivariate Gaussian distribution

Covariance
– Measures the tendency for two variables to deviate from their means in the same (or opposite) directions at the same time.

[Figure: two scatter plots, labeled "no covariance" and "high (positive) covariance"]
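For reference (a standard definition, not printed on the slide), covariance can be written as

$$ \operatorname{cov}(x_i, x_j) = \mathrm{E}\big[ (x_i - \mu_i)(x_j - \mu_j) \big] $$

so entry Σᵢⱼ of the covariance matrix is cov( xᵢ, xⱼ ), and the diagonal entry Σᵢᵢ reduces to the variance σᵢ².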

Page 8: Multivariate Gaussian distribution

In two dimensions:

$$ \boldsymbol{\mu} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 0.25 & 0.3 \\ 0.3 & 1 \end{bmatrix} $$

Page 9: Multivariate Gaussian distribution

In two dimensions:

$$ \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 2 & 0.6 \\ 0.6 & 2 \end{bmatrix} $$

Page 10: Multivariate Gaussian distribution

In three dimensions:

    rng( 1 );
    mu = [ 2; 1; 1 ];
    sigma = [ 0.25 0.30 0.10;
              0.30 1.00 0.70;
              0.10 0.70 2.00 ];
    x = randn( 1000, 3 );              % standard normal draws, one sample per row
    x = x * chol( sigma );             % impose covariance sigma (R' * R = sigma for R = chol( sigma ))
    x = x + repmat( mu', 1000, 1 );    % shift the samples to have mean mu
    scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );
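A quick sanity check (my addition): the sample statistics should approximate the parameters used above.

    mean( x )    % each column mean should be close to mu' = [ 2 1 1 ]
    cov( x )     % sample covariance should be close to sigma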

Page 11: Vector projection

Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2.
– The unit vector in the direction of x is x / || x ||.
– The length of the projection of y in the direction of x is || y || cos( θ ).
– The orthogonal projection of y onto x is the vector

$$ \mathrm{proj}_{\mathbf{x}}(\mathbf{y}) = \frac{\|\mathbf{y}\| \cos\theta}{\|\mathbf{x}\|}\,\mathbf{x} = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|^2}\,\mathbf{x} $$

  (using the dot-product alternate form).

[Figure: vectors x and y with angle θ between them, and projx( y ) along x]
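A minimal sketch (my own, with made-up vectors) of the projection formula:

    % Orthogonal projection of y onto x: projx( y ) = ( dot( x, y ) / norm( x )^2 ) * x
    x = [ 3; 1 ];
    y = [ 2; 2 ];
    proj = ( dot( x, y ) / norm( x )^2 ) * x;
    r = y - proj;           % component of y orthogonal to x
    dot( r, x )             % ~0, confirming orthogonality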

Page 12: Linear models

There are many types of linear models in machine learning.
– Common in both classification and regression.
– A linear model consists of a vector β in d-dimensional feature space.
– The vector β attempts to capture the strongest gradient (rate of change) in the output variable, as seen across all training samples.
– Different linear models optimize β in different ways.
– A point x in feature space is mapped from d dimensions to a scalar (1-dimensional) output z by projection onto β:

$$ z = \boldsymbol{\beta} \cdot \mathbf{x} = \beta_1 x_1 + \cdots + \beta_d x_d $$

Page 13: Linear models

There are many types of linear models in machine learning.
– The projection output z is typically transformed to a final predicted output y by some function f:

$$ y = f( z ) = f( \boldsymbol{\beta} \cdot \mathbf{x} ) = f( \beta_1 x_1 + \cdots + \beta_d x_d ) $$

  Example: for logistic regression, f is the logistic function; for linear regression, f( z ) = z.
– Models are called linear because they are a linear function of the model vector components β₁, …, β_d.
– Key feature of all linear models: no matter what f is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after the transform. A short worked sketch follows below.
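A small end-to-end sketch (my own; the coefficient values are hypothetical) of the two steps just described, with the logistic function as f and an intercept β₀ as used on page 20:

    % Linear model prediction: project onto beta, then apply the logistic function
    beta0 = 1.0;                         % intercept (hypothetical)
    beta = [ 0.8; -0.5 ];                % model vector (hypothetical)
    x = [ 2.0; 1.0 ];                    % a point in feature space
    z = beta0 + beta' * x;               % scalar projection output
    y = 1 / ( 1 + exp( -z ) );           % predicted probability of class 1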

Page 14: Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

[Figure: projection geometry, with labels w and w0]

Page 15: Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 16: Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 17: Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 18: Geometry of projections

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

[Figure: projection geometry, with the margin labeled]

Page 19: From projection to prediction

– positive margin → class 1
– negative margin → class 0

Page 20: Logistic regression in two dimensions

Interpreting the model vector of coefficients:

From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]
β₀ = B( 1 ), β = [ β₁ β₂ ] = B( 2 : 3 )

– β₀ and β define the location and orientation of the decision boundary:
  – −β₀ / || β || is the distance of the decision boundary from the origin.
  – The decision boundary is perpendicular to β.
– The magnitude of β defines the gradient of probabilities between 0 and 1.
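Continuing the slide's MATLAB values, a short check (my addition; the query point is made up) of these interpretations:

    B = [ 13.0460 -1.9024 -0.4047 ];
    beta0 = B( 1 );
    beta = B( 2 : 3 );
    dist = -beta0 / norm( beta )                           % distance of boundary from origin
    p = 1 / ( 1 + exp( -( beta0 + beta * [ 6 2 ]' ) ) )    % probability of class 1 at the point ( 6, 2 )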

Page 21: Logistic function in d dimensions

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

Page 22: Decision boundary for logistic regression

slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)