MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4...

105
Multivariate Methods Slides from Machine Learning by Ethem Alpaydin Expanded by some slides from GutierrezOsuna

Transcript of MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4...

Page 1: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Multivariate  Methods

Slides  from  Machine  Learning  by  Ethem  Alpaydin Expanded  by  some  slides  from  Gutierrez-­‐‑Osuna

Page 2: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Overview

n  We learned how to use the Bayesian approach for classification if we had the probability distribution of the underlying classes (p(x|Ci)).

n  Now going to look into how to estimate those densities from given samples.

2

Page 3: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

3

Expectations

Approximate  Expecta.on  (discrete  and  con.nuous)  

The average value of a function f(x) under a probability distribution p(x) is called the expectation of f(x). Average is weighted by the relative probabilities of different values of x.

Now we are going to look at concepts: variance and co-variance , of 1 or more random variables, using the concept of expectation.

Page 4: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

4

Variance and Covariance

The variance of f(x) provides a measure for how much f(x) varies around its mean E[f(x)]. Given a set of N points {xi} in the1D-space, the variance of the corresponding random variable x is var[x] = E[(x-µ)2] where µ=E[x]. You can estimate the expected value as

var(x) = E[(x-µ)2] ≈ 1/N Σ (xi-µ)2 x

i Remember the definition of expectation:

Page 5: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

5

Variance and Covariance

The variance of x provides a measure for how much x varies around its mean µ=E[x]. var(x) = E[(x-µ)2] Co-variance of two random variables x and y measures the extent to which they vary together.

Page 6: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

6

Variance and Covariance

Co-variance of two random variables x and y measures the extent to which they vary together.

: two random variables x, y : two vector random variables x, y – covariance is a matrix

Page 7: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Multivariate  Normal  Distribution

Page 8: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

8

The Gaussian Distribution

Page 9: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

9

Expectations

For normally distributed x:

Page 10: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

10

n  Assume we have a d-dimensional input (e.g. 2D), x.

n  We will see how can we characterize p(x), assuming x is normally distributed.

¨  For 1-dimension it was the mean (µ) and variance (σ2)

n  Mean=E[x]

n  Variance=E[(x - µ)2]

¨  For d-dimensions, we need n  the d-dimensional mean vector n  dxd dimensional covariance matrix

¨  If x ~ Nd(µ,Σ) then each dimension of x is univariate normal n  Converse is not true

Page 11: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

11

Normal Distribution & Multivariate Normal Distribution

n  For a single variable, the normal density function is:

n  For variables in higher dimensions, this generalizes to:

n  where the mean µ is now a d-dimensional vector, Σ is a d x d covariance matrix

|Σ| is the determinant of Σ:

Page 12: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

12

Multivariate Parameters: Mean, Covariance

( ) TTT

ddd

d

d

EE µµxxµxµxx −=−−=

⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢

=≡Σ ][]))([(Cov

221

22221

11221

σσσ

σσσ

σσσ

!"

!!

[ ] [ ]TdE µµ ,...,:Mean 1== µx

( ) )])([(,Cov jjiijiij xxExx µµσ −−≡≡

Page 13: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Matlab code

n  close all;

n  rand('twister', 1987) % seed

%Define the parameters of two 2d-Normal distribution

n  mu1 = [5 -5];

n  mu2 = [0 0];

n  sigma1 = [2 0; 0 2];

n  sigma2 = [5 5; 5 5];

n  N=500; %Number of samples we want to generate from this distribution

n  samp1 = mvnrnd(mu1,sigma1, N);

n  samp2 = mvnrnd(mu2, sigma2, N);

n 

n  figure; clf;

n  plot(samp1(:,1), samp1(:,2),'.', 'MarkerEdgeColor', 'b');

n  hold on;

n  plot(samp2(:,1), samp2(:,2),'*', 'MarkerEdgeColor', 'r');

n  axis([-20 20 -20 20]); legend('d1', 'd2'); 13

Page 14: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

14

mu1  =  [5  -­‐‑5]; mu2  =  [0    0]; sigma1  =  [2  0;  0  2]; sigma2  =  [5  5;  5  5];

Page 15: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

15

mu1  =  [5  -­‐‑5]; mu2  =  [0    0]; sigma1  =  [2  0;  0  2]; sigma2  =  [5  2;  2  5];

Page 16: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Matlab sample cont.

n  % Lets compute the mean and covariance as if we are given this data

n  sampmu1 = sum(samp1)/N;

n  sampmu2 = sum(samp2)/N;

n  sampcov1 = zeros(2,2);

n  sampcov2 = zeros(2,2);

n  for i =1:N

n  sampcov1 = sampcov1 + (samp1(i,:)-sampmu1)' * (samp1(i,:)-sampmu1);

n  sampcov2 = sampcov2 + (samp2(i,:)-sampmu2)' * (samp2(i,:)-sampmu2);

n  End

n  sampcov1 = sampcov1 /N;

n  sampcov2 = sampcov2 /N;

n  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% n  % Lets compute the mean and covariance as if we are given this data USING MATRIX OPERATIONS n  % Notice that in samp1, samples are given in ROWS – but for this multiplication, columns * rows is req. n  sampcov1 = (samp1'*samp1)/N - sampmu1'*sampmu1; n  %Or simply

n  mu=mean(samp1);

n  cov=cov(samp1); 16

Page 17: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

17

n  Variance: How much X varies around the expected value

n  Covariance is the measure the strength of the linear relationship between two random variables ¨  covariance becomes more positive

for each pair of values which differ from their mean in the same direction

¨  covariance becomes more negative with each pair of values which differ from their mean in opposite directions.

¨  if two variables are independent, then their covariance/correlation is zero (converse is not true).

( )

( )ji

ijijji

jjiijiij

xx

xxExx

σσ

σρ

µµσ

=≡

−−≡≡

,Corr :nCorrelatio

)])([(,Cov:Covariance

n  Correlation is a dimensionless measure of linear dependence. ¨  range between –1 and +1

Page 18: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

18

How  to  characterize  differences  between  these  distributions

Page 19: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

19

Covariance Matrices

Page 20: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

20

Contours of constant probability density for a 2D Gaussian distributions with a) general covariance matrix b) diagonal covariance matrix (covariance of x1,x2 is 0) c) Σ proportional to the identity matrix (covariances are 0, variances of each dimension are the same)

Page 21: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

21

Shape and orientation of the hyper-ellipsoid centered at µ is defined by Σ

v1 v2

Page 22: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

22

Properties of Σ

n  A small value of |Σ| (determinant of the covariance matrix) indicates that samples are close to µ

n  Small |Σ| may also indicate that there is a high correlation between variables

n  If some of the variables are linearly dependent, or

if the variance of one variable is 0, then Σ is singular and |Σ| is 0.

n  Dimensionality should be reduced to get a positive definite matrix

Page 23: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

23

From the equation for the normal density, it is apparent that points which have the same density must have the same constant term: Mahalanobis distance measures the distance from x to μ in terms of ∑

⎥⎦⎤

⎢⎣⎡ −−−= − )()(21exp

)2(1)( 1

2/12/µxΣµx

Σx t

dp

πMahalanobis Distance

Page 24: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

24

Points that are the same distance from µ

n  The

• The ellipse consists of points that are equi-distant to the center w.r.t. Mahalanobis distance. • The circle consists of points that are equi-distant to the center w.r.t. The Euclidian distance.

Page 25: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

25

Why Mahalanobis Distance

It takes into account the covariance of the data. •  Point P is at closer (Euclidean) to the mean for the orange class, but using the Mahalanobis distance, it is found to be closer to 'apple‘ class.

Page 26: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

26

Positive Semi-Definite-Advanced

n  The covariance matrix Σ is symmetric and positive semi-definite

n  An n × n real symmetric matrix M is positive definite if zTMz > 0 for all non-zero vectors z with real entries.

n  An n × n real symmetric matrix M is positive semi-definite

if zTMz >= 0 for all non-zero vectors z with real entries. n  Comes from the requirement that the variance of each dimension is >= 0

and that the matrix is symmetric.

n  When you randomly generate a covariance matrix, it may violate this rule ¨  Test to see if all the eigenvalues are >= 0 ¨  Higham 2002 – how to find nearest valid covariance matrix ¨  Set negative eigenvalues to small positive values

Page 27: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Parameter  Estimation

Covered  only  ML  estimator

Page 28: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

n  You have some samples coming from an unknown distribution and you want to characterize that distribution; i.e. find the necessary parameters.

n  For instance, assuming you are given the samples in Slides 14 or 15, and that you assume that they are normally distributed, you will try to find the parameters µ and Σ of the corresponding normal distribution.

n  We have referred to the sample mean (mean of the samples, m) and sample variance S before, but why use those instead of µ and Σ?

28

Page 29: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

n  Given an i.i.d data set X, sampled from a normal distribution with parameters µ and Σ, we would like to determine these parameters, given the samples.

n  Once we have those, we can estimate p(x) for any given x (little x), given the known distribution.

n  Two approaches in parameter estimation: ¨  Maximum Likelihood approach ¨  Bayesian approach

29

Page 30: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

30

Page 31: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

31

Page 32: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

32

Page 33: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

33

Gaussian Parameter Estimation

Likelihood  func.on  

Assuming iid data

Page 34: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

34

Page 35: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

35

Page 36: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

36

Reminder

n  In order to maximize or minimize a function f(x) w.r.to x, we compute the derivative df(x)/dx and set to 0, since a necessary condition for extrema is that the derivative is 0.

n  Commonly used derivative rules are given on the right.

Page 37: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

37

Derivation – general case-ADVANCED

Page 38: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

38

Maximum (Log) Likelihood for Multivarate Case

µ =

Page 39: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

39

Maximum (Log) Likelihood for a 1D-Gaussian

In other words, maximum likelihood estimates of mean and variance are the same as sample mean and variance.

Page 40: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

40

Sample Mean and Variance for Multivariate Case

( )( )

ji

ijij

jtj

N

t iti

ij

N

tti

i

sss

r

Nmxmx

s

diNx

m

=

−−=

==

=

=

:matrix n Correlatio

:matrix Covariance

,...,1,:mean Sample

1

1

R

S

m

Where N is the number of data points  xt:

Page 41: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

ML  estimates  for  the  mean  and  variance  in  Bernouilli/Multinomial  distributions

Not  covered  in  class  –  but  you  should  be  able  to  do  as  a  take  home  etc.  Same  idea  as  before,  starting  from  the  likelihood,  you  find  the  value  that  maximizes  the  likelihood.  

Page 42: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

42

Examples: Bernoulli/Multinomial

n  Bernouilli: Binary random variable x may take one of the two values: ¨  success/failure, 0/1 with probabilities: ¨  P(x=1)= po

¨  P(x=0)=1-po ¨  Unifying the above, we get: P (x) = po

x (1 – po ) (1 – x)

n  Given a sample set X={x1,x2,…}, we can estimate p using the ML

estimate by maximizing the log-likelihood of the sample set:

Log likelihood: log P(X|po) = log ∏t po

xt (1 – po ) (1 – xt)

Page 43: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

43

Examples: Bernoulli

L=P(X|po) = log ∏t po

xt (1 – po ) (1 – xt) xt in {0,1}

Solving for the necessary condition for extrema, (we must have dL/dp = 0)

MLE: po = ∑

t xt / N

ratio of the number of occurences of the event to the number of experiments

Page 44: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

44

Examples: Multinomial

n  Multinomial: K>2 states, xi in {0,1}

n  Instead of two states, we now have K mutually exclusive and exhaustive events, with probability of occurence of

pi where Σpi = 1.

n  Ex. A dice with 6 outcomes.

P (x1,x2,...,xK) = ∏i pi

xi where xi is 1 if the outcome is state i 0 otherwise

L(p1,p2,...,pK|X) = log ∏

t ∏

i pi

xit

MLE: pi = ∑

t xi

t / N Ratio of experiments with outcome of state i (e.g. 60 dice throws, 15 of them were 6 p6 = 15/60

Page 45: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

45

Discrete Features

n  Binary features:

If xj are independent (Naive Bayes’ assumption)

( ) ( ) ( )( ) ( )[ ] ( )i

jijjijj

iii

CPpxpxCPCpg

log1 log 1 log log| log

+−−+=

+=

∑xx

Estimated parameters

( )ijij Cxpp |1=≡

( ) ( )( )∏=

−−=d

j

xij

xiji

jj ppCxp1

11|

∑∑

=t

ti

t

ti

tj

ij r

rxp̂

ri=1  if  xt  in  Ci

             0  otherwise

Page 46: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

46

Discrete Features

n  Multinomial (1-of-nj) features: xj ∈ {v1, v2,..., vnj}

if xj are independent

( ) ( )ikjijkijk CvxpCzpp ||1 ===≡

( )

( ) ( )

∑∑∑ ∑

∏∏

=

+=

== =

tti

tti

tjk

ijk

iijkj k jki

d

j

n

k

zijki

rrz

p

CPpzg

pCpj

jk

ˆ

log log

|1 1

x

x

Page 47: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Parametric  Classification

We will use the Bayesian decision criteria applied to normally distributed classes, whose parameters are either known or estimated from the sample.

Page 48: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

48

Parametric Classification

n  If p (x | Ci ) ~ N ( μi , ∑i )

n  Discriminant functions are:

( )( )

( ) ( )⎥⎦⎤

⎢⎣

⎡ −−−= −ii

Ti

idiCp µxµxx 1

2/12/ Σ21exp

Σ21|

π

( )( ) ( )

( ) ( ) ( )iiiT

ii

ii

ii

CPdCPCp

CPg

logµΣµ21Σ log

212log

2

log| log )|(log

1 +−−−−−=

+=

=

− xx

xxx

π

Page 49: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

49

Estimation of Parameters

( ) ( ) ( ) ( )iiiT

iii CPg ˆ log21 log

21 1 +−−−−= − mxmxx SS

If we estimate the unknown parameters from the sample, the discriminant function becomes:

Page 50: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

50

Case 1) Different Si (each class has a separate variance)

( ) ( ) ( )

( )iiiiTii

iii

ii

iTii

T

iiiTiii

Ti

Tii

CP̂w

w

CP̂g

log log21

21

21

where

log221

log21

10

1

1

0

111

+−−=

=

−=

++=

++−−−=

−−−

SS

S

SW

W

SSSS

mm

mw

xwxx

mmmxxxx

if we group the terms, we see that there are second order terms, which means the discriminant is quadratic.

Page 51: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

51

likelihoods

posterior for C1

discriminant: P (C1|x ) = 0.5

Page 52: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

52

Two bi-variate normals, with completely different covariance matrix, showing a hyper-quadratic decision boundary.

Page 53: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

53

Page 54: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

54

Hyperbola: A hyperbola is an open curve with two branches, the intersection of a plane with both halves of a double cone. The plane may or may not be parallel to the

axis of the cone. Definition from Wikipedia.

Page 55: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

55

Page 56: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

56

Typical single-variable normal distributions showing a disconnected decision region R2

Page 57: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

57

Notation: Ethem book

( )

( )( )∑

∑∑∑

−−=

=

=

t

ti

T

it

t itt

ii

t

ti

t

tti

i

t

ti

i

r

r

r

rN

rCP̂

mxmx

xm

S

tir

Using the notation in Ethem book, the sample mean and sample covariance… can be estimated as follows:

where is 1 if the tth sample belongs to class i  

Page 58: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

58

n  If d (dimension) is large with respect to N (number of samples), we may have a problem with this approach: ¨  | Σ | may be zero, thus Σ will be singular (inverse does not exist) ¨  | Σ | may be non-zero, but very small, instability would result

n  Small changes in Σ would cause large changes in Σ-1

n  Solutions: ¨  Reduce the dimensionality

n  Feature selection n  Feature extraction: PCA

¨  Pool the data and estimate a common covariance matrix for all classes

Σ = Σi P(Ci) * Σi

Page 59: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

59

Case 2) Common Covariance Matrix S=Si

n  Shared common sample covariance S ¨  An arbitrary covariance matrix – but shared between the classes

n  We had this full discriminant function:

which now reduces to:

which is a linear discriminant

( ) ( ) ( ) ( )iiT

ii CPg ˆ log21 1 +−−−= − mxmxx S

( )

( )iiTiiii

iTii

CPw

wg

ˆ log21

where

10

1

0

+−==

+=

−− mmmw

xwx

SS

( ) ( ) ( ) ( )iiiT

iii CPg ˆ log21 log

21 1 +−−−−= − mxmxx SS

Page 60: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

60

n  Linear discriminant: ¨  Decision boundaries are hyper-planes ¨  Convex decision regions:

n  All points between two arbitrary points chosen from one decision region belongs to the same decision region

n  If we also assume equal class priors, the classifier becomes a minimum Mahalanobis classifier

Page 61: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

61

Case 2) Common Covariance Matrix S

Linear  discriminant

Page 62: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

62

Page 63: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

63

Page 64: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

64

Unequal priors shift the decision boundary towards the less likely class, as before.

Page 65: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

65

Page 66: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

66

Case 3) Common Covariance Matrix S which is Diagonal

n  In the previous case, we had a common, general covariance matrix, resulting in these discriminant functions:

n  When xj (j = 1,..d) are independent, ∑ is diagonal

n  Classification is done based on weighted Euclidean distance (in sj units) to the nearest mean.

Naive Bayes classifier where p(xj|Ci) are univariate Gaussian

p (x|Ci) = ∏j p (xj |Ci) (Naive Bayes’ assumption)

( ) ( )id

j j

ijtj

i CPsmx

g ˆ log21

1

2

+⎟⎟⎠

⎞⎜⎜⎝

⎛ −−= ∑

=

x

( ) ( ) ( ) ( )iiT

ii CPg ˆ log21 1 +−−−= − mxmxx S

Page 67: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

67

Case 3) Common Covariance Matrix S which is Diagonal

variances may be different

Page 68: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

68

Page 69: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

69

Case 4) Common Covariance Matrix S which is Diagonal + equal variances

n  We had this before (S which is diagonal):

n  If the priors are also equal, we have the Nearest Mean classifier: Classify based on Euclidean distance to the nearest mean!

n  Each mean can be considered a prototype or template and this is template matching

( ) ( )

( ) ( )id

jij

tj

ii

i

CPmxs

CPs

g

ˆ log21

ˆ log2

2

12

2

2

+−−=

+−

−=

∑=

mxx

( ) ( )id

j j

ijtj

i CPsmx

g ˆ log21

1

2

+⎟⎟⎠

⎞⎜⎜⎝

⎛ −−= ∑

=

x

Page 70: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

70

Case 4) Common Covariance Matrix S which is Diagonal + equal variances

* ?

Page 71: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

71

Case  4)  Common  Covariance  Matrix  S  which  is  Diagonal  +  equal  variances

Page 72: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

72

A second example where the priors have been changed: The decision boundary has shifted away from the more likely class, although it is still orthogonal to the line joining the 2 means.

Page 73: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

73

Page 74: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

74

Case 5: Si=σi2I

Page 75: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

75

Model Selection

n  As we increase complexity (less restricted S), bias decreases and variance increases

n  Assume simple models (allow some bias) to control variance (regularization)

Assumption Covariance matrix No of parameters

Shared, Hyperspheric Si=S=s2I 1

Shared, Axis-aligned Si=S, with sij=0 d

Shared, Hyperellipsoidal Si=S d(d+1)/2

Different, Hyperellipsoidal Si K d(d+1)/2

Page 76: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

76

Estimation of Missing Values

n  What to do if certain instances have missing attributes? 1.  Ignore those instances: not a good idea if the sample is small 2.  Use ‘missing’ as an attribute: may give information 3.  Imputation: Fill in the missing value

n  Mean imputation: Use the most likely value (e.g., mean) n  Imputation by regression: Predict based on other attributes

n  Another important problem is sensitivity due to outliers

Page 77: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

77

Tuning Complexity

n  When we use Euclidian distance to measure similarity, we are assuming that all variables have the same variance and that they are independent ¨  E.g. Two variables age and yearly income

n  When these assumptions don’t hold, ¨  Normalization may be used (use PCA, whitening, make each dimension

zero mean and unit variance…) to use Euclidian distance ¨  We may still want to use simpler models in order to estimate the related

parameters more accurately

Page 78: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

78

Conclusions

n  The Bayes classifier for normally distributed classes is a quadratic classifier

n  The Bayes classifier for normally distributed classes with equal covariance matrices is a linear classifier

n  The minimum Mahalanobis distance is Bayes-optimal for ¨  Normally distributed classes, having ¨  Equal covariance matrices and ¨  Equal priors

n  The minimum Euclidian distance is Bayes-optimal for ¨  Normally distributed classes -and- ¨  Equal covariance matrices proportional to the identity matrix–and- ¨  Equal priors

n  Both Euclidian and Mahalanobis distance classifiers are linear classifiers

Page 79: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

79

PRTOOLS n  load nutsbolts; //an existing Prtools dataset with 4 classes n  w1=ldc(z); //linear Bayes Normal classifier n  w2=qdc(z); //quadratic Bayes Normal classifier n  figure(1); scatterd(z); hold on; plotc(w1); n  figure(2); scatterd(z); hold on; plotc(w2);

Page 80: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

80

Page 81: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

81

n  >> A = gendath([50 50]); //Generate data with 2 classes x 50 samples each n  >> [C,D] = gendat(A,[20 20]); //Split 20 as C=train; rest becomes D=test n  >> W1 = ldc(C); //linear n  >> W2 = qdc(C); //quadratic n  >> figure(5); scatterd(C); hold on; plotm(W2);

Page 82: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

82

n  scatterd(A); //scatter plot n  plotc({W1,W2});

Page 83: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

83

Page 84: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

84

Page 85: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

85

error  on  test:  9.5%

Page 86: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

86

error on test: 11.5%

Page 87: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

87

with  N=20  points

Page 88: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

88

Page 89: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

89

Page 90: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

90

Page 91: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

91

Page 92: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

92

Page 93: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

93

Page 94: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

Eigenvalue  Decomposition  of  the  Covariance  Matrix

Skip  Until  Dimensionality  Reduction  Topic

Page 95: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

95

Eigenvector and Eigenvalue

¨  Given a linear transformation A, a non-zero vector x is defined to be an eigenvector of the transformation if it satisfies the eigenvalue equation Ax = λ x for some scalar λ.

n  In this situation, the scalar λ is called an eigenvalue of A corresponding to the eigenvector x.

n  (A- λI)x=0 => det(A- λI) must be 0. ¨  Gives the characteristic polynomial whore roots are the eigenvalues of A

Page 96: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

96

The covariance matrix is symmetrical and it can always be diagonalized as:

TΦΛΦ=Σ

where is the column matrix consisting of the eigenvectors of Σ, ΦT=Φ-­‐‑1 and Λ is the diagonal matrix whose elements are the eigenvalues of Σ.

],...,,[ 21 lυυυ=Φ

Eigenvalues of the covariance matrix

Page 97: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

97

Contours of equal Mahalanobis distance:

( ) cxxd iT

im =−Σ−= − 2/11 )()( µµ

Define x’= ΦTx. The coordinates of x’ are equal to νkTx, k=1,2,…,l

that is, the projections of x onto the eigenvectors.

21 )()( cxx iTT

i =−ΦΦΛ− − µµ

22

1

211 )(...)( cxx

l

illi =ʹ′−ʹ′

++ʹ′−ʹ′

λµ

λµ

Thus all points having the same Mahalabonis distance from a specific point are located on an ellipse with center of mass at µi, and the principle axes are aligned with the corresponding eigenvectors and have lengths ckλ2

Page 98: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

98

Page 99: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

99

n  Eigenvalue decomposition of the covariance matrix is very useful: ¨  PCA ¨  Decorrelate data (Whitening transform) ¨  …

Page 100: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

100

Matrix Transpose

n  Given the matrix A, the transpose of A is the , denoted AT, whose columns are formed from the corresponding rows of A.

n  Some properties of transpose 1.  (AT)T = A

2.  (A + B)T = AT + BT

3.  (rA)T = rAT where r is any scalar.

4.  (AB)T = BTAT

5.  ATB=BTA where A and B are vectors

6.  …

Page 101: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

101

Matrix Derivation

n  If y = xTAx where A is a square matrix ¨  ∂y/∂x = Ax + AT x

n  If y = xTAx and A is symmetric ¨  ∂y/∂x = 2Ax.

n  If y = xTx ¨  ∂y/∂x = 2x.

Page 102: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

102

n  If x ~ Nd(µ,Σ) then each dimension of x is univariate normal ¨  Converse is not true – counterexample?

n  The projection of a d-dimensional normal distribution onto the vector w is univariate normal

wTx ~ N (wTµ, wTΣw)

n  More generally, when W is a dxk matrix with k < d, then the k-dimensional Wx is k-variate normal with:

WTx ~ Nk(WTµ, WTΣW)

Page 103: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

103

Univariate normal – transformed x

n  E[wTx] = wT E[x] = wT µ

n  Var[wTx] = E[ (wTx - wT µ) (wTx - wT µ)Τ]

= E[ (wTx - wT µ)2] since (wTx - wT µ) is a scalar = E[ (wTx - wT µ) (wTx - wT µ) ]

= Ε[ wT(x-µ) (x-µ)Τ w ] based on slide 27.rule 5 = wTΣ w move out and use defin. of Σ

Page 104: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

104

Page 105: MultivariateMethods - Sabancı Üniversitesipeople.sabanciuniv.edu/berrin/cs512/lectures/... · 4 Variance and Covariance The variance of f(x) provides a measure for how much f(x)

105

Whitening Transform

n  We can decorrelate variables and obtain Σ=I by using a whitening transform Aw=Λ-1/2ΦT where ¨  Λ is the diagonal matrix of the eigenvalues of the original distribution

and ¨  Φ is the matrix composed of eigenvectors as its columns