
Multidimensional Gaussian distribution and classification with Gaussians

Guido Sanguinetti

Informatics 2B: Learning and Data, Lecture 9, 6 March 2012

Informatics 2B: Learning and Data, Lecture 9: Multidimensional Gaussian distribution and classification with Gaussians

Overview

Today’s lecture

Gaussians

The multidimensional Gaussian distribution

Bayes theorem and probability density functions

The Gaussian classifier


(One-dimensional) Gaussian distribution

One-dimensional Gaussian with zero mean and unit variance (µ = 0, σ² = 1):

[Figure: pdf of the Gaussian distribution with mean 0 and variance 1, p(x | µ, σ) plotted for −4 ≤ x ≤ 4; the peak value is about 0.4.]

p(x | µ, σ²) = N(x; µ, σ²) = (1 / √(2πσ²)) exp( −(x − µ)² / (2σ²) )

The multidimensional Gaussian distribution

The d-dimensional vector x is multivariate Gaussian if it has a probability density function of the following form:

p(x | µ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) )

The pdf is parameterized by the mean vector µ and the covariance matrix Σ.

The 1-dimensional Gaussian is a special case of this pdf.

The argument of the exponential involves the quadratic form (x − µ)ᵀ Σ⁻¹ (x − µ).
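The density formula above can be written out directly in code. Below is a minimal Python sketch (not part of the lecture, which uses Matlab later), restricted to d = 2 so that the determinant and matrix inverse can be written by hand; the function name `gauss2d_pdf` is my own:

```python
import math

def gauss2d_pdf(x, mu, Sigma):
    """Density of a 2-d Gaussian N(mu, Sigma) at point x.

    x, mu: length-2 sequences; Sigma: 2x2 nested list.
    Implements p(x | mu, Sigma) = (2*pi)^(-d/2) |Sigma|^(-1/2)
                                  * exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu)).
    """
    d = 2
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    # Closed-form inverse of a 2x2 matrix.
    inv = [[ Sigma[1][1] / det, -Sigma[0][1] / det],
           [-Sigma[1][0] / det,  Sigma[0][0] / det]]
    v = [x[0] - mu[0], x[1] - mu[1]]          # x - mu
    # Quadratic form (x-mu)^T Sigma^{-1} (x-mu).
    q = (v[0] * (inv[0][0] * v[0] + inv[0][1] * v[1])
         + v[1] * (inv[1][0] * v[0] + inv[1][1] * v[1]))
    norm = 1.0 / ((2 * math.pi) ** (d / 2) * math.sqrt(det))
    return norm * math.exp(-0.5 * q)

# At the mean of a spherical Gaussian (Sigma = I), the density is 1/(2*pi).
print(gauss2d_pdf([0, 0], [0, 0], [[1, 0], [0, 1]]))
```

With Σ = I the value at the mean is 1/(2π) ≈ 0.159, matching the peak of the spherical-Gaussian surface plot later in the lecture.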


Covariance matrix

The mean vector µ is the expectation of x:

µ = E[x]

The covariance matrix Σ is the expectation of the outer product of the deviation of x from the mean:

Σ = E[(x − µ)(x − µ)ᵀ]

Σ is a d × d symmetric matrix:

Σij = E[(xi − µi)(xj − µj)] = E[(xj − µj)(xi − µi)] = Σji

The sign of the covariance helps to determine the relationship between two components:

If xj is large when xi is large, then (xj − µj)(xi − µi) will tend to be positive.
If xj is small when xi is large, then (xj − µj)(xi − µi) will tend to be negative.


Correlation matrix

The covariance matrix is not scale-independent. Define the correlation coefficient:

ρ(xj, xk) = ρjk = Sjk / √(Sjj Skk)

This is scale-independent (i.e. independent of the measurement units) and location-independent, i.e. (for a, s > 0):

ρ(xj, xk) = ρ(a xj + b, s xk + t)

The correlation coefficient satisfies −1 ≤ ρ ≤ 1, and

ρ(x, y) = +1 if y = ax + b with a > 0
ρ(x, y) = −1 if y = ax + b with a < 0
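The definition of ρ can be checked numerically. A small Python sketch (the function name `correlation` is my own) that reproduces the ρ = +1 case for an exactly linear relation with positive slope:

```python
import math

def correlation(xs, ys):
    """Sample correlation coefficient rho = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0]
# y = 2x + 1: exactly linear with a > 0, so rho should be +1.
print(correlation(xs, [2 * x + 1 for x in xs]))
```

Rescaling or shifting either variable (with positive scale factors) leaves the result unchanged, which is the scale- and location-independence claimed above.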


Spherical Gaussian

[Figure: surface and contour plots of p(x1, x2) for a spherical Gaussian; the contours are circles centred on the origin.]

µ = (0, 0)ᵀ   Σ = [1 0; 0 1]   ρ12 = 0

Diagonal Covariance Gaussian

[Figure: surface and contour plots of p(x1, x2) for a diagonal-covariance Gaussian; the contours are axis-aligned ellipses.]

µ = (0, 0)ᵀ   Σ = [1 0; 0 4]   ρ12 = 0

Full covariance Gaussian

[Figure: surface and contour plots of p(x1, x2) for a full-covariance Gaussian; the contours are ellipses whose axes are not aligned with the coordinate axes.]

µ = (0, 0)ᵀ   Σ = [1 −1; −1 4]   ρ12 = −0.5

Parameter estimation

It is possible to show that the mean vector µ̂ and covariance matrix Σ̂ that maximize the likelihood of the training data are given by:

µ̂ = (1/N) ∑ₙ xₙ

Σ̂ = (1/N) ∑ₙ (xₙ − µ̂)(xₙ − µ̂)ᵀ

where the sums run over n = 1, …, N.

The mean of the distribution is estimated by the sample mean and the covariance by the sample covariance.
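These two estimators are straightforward to implement. A Python sketch for 2-d data (the function name `fit_gaussian` is my own; note the 1/N normalizer of the maximum-likelihood estimate, rather than the unbiased 1/(N−1) variant):

```python
def fit_gaussian(data):
    """Maximum-likelihood mean and covariance of a list of 2-d points.

    mu_hat    = (1/N) sum_n x_n
    Sigma_hat = (1/N) sum_n (x_n - mu_hat)(x_n - mu_hat)^T
    """
    n = len(data)
    mu = [sum(x[0] for x in data) / n, sum(x[1] for x in data) / n]
    sigma = [[0.0, 0.0], [0.0, 0.0]]
    for x in data:
        d = [x[0] - mu[0], x[1] - mu[1]]   # deviation from the mean
        for i in range(2):
            for j in range(2):
                sigma[i][j] += d[i] * d[j] / n
    return mu, sigma

# Four points at the corners of a square centred on (1, 1):
mu, sigma = fit_gaussian([[0, 0], [2, 0], [0, 2], [2, 2]])
print(mu, sigma)
```

For this symmetric toy data the fit is µ̂ = (1, 1)ᵀ with unit diagonal covariance and zero off-diagonal terms.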

Example data

[Figure: scatter plot of the example data, X1 against X2.]

Maximum likelihood fit to a Gaussian

[Figure: the example data, X1 against X2, with the maximum-likelihood Gaussian fit.]

Bayes theorem and probability densities

Rules for probability densities are similar to those for probabilities:

p(x, y) = p(x | y) p(y)

p(x) = ∫ p(x, y) dy

We may mix probabilities of discrete variables and probability densities of continuous variables:

p(x, Z) = p(x | Z) P(Z)

Bayes' theorem for continuous data x and class C:

P(C | x) = p(x | C) P(C) / p(x)

P(C | x) ∝ p(x | C) P(C)


Bayes theorem and univariate Gaussians

If p(x | C) is Gaussian with mean µc and variance σ²c:

P(C | x) ∝ p(x | C) P(C)
         ∝ N(x; µc, σ²c) P(C)
         ∝ (1 / √(2πσ²c)) exp( −(x − µc)² / (2σ²c) ) P(C)

Taking logs, we have the log likelihood LL(x | C):

LL(x | C) = ln p(x | µc, σ²c)
          = ½ ( −ln(2π) − ln σ²c − (x − µc)²/σ²c )

The log posterior probability LP(C | x) is:

LP(C | x) ∝ LL(x | C) + ln P(C)
          ∝ ½ ( −ln(2π) − ln σ²c − (x − µc)²/σ²c ) + ln P(C)


Example: 1-dimensional Gaussian classifier

Two classes, S and T, with some observations:

Class S: 10 8 10 10 11 11
Class T: 12 9 15 10 13 13

Assume that each class may be modelled by a Gaussian. The mean and variance of each pdf are estimated by the sample mean and sample variance:

µ(S) = 10   σ²(S) = 1
µ(T) = 12   σ²(T) = 4
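The quoted estimates can be verified directly from the observations. A small Python check (the helper name `ml_mean_var` is my own; it uses the 1/N maximum-likelihood variance, as the lecture does):

```python
def ml_mean_var(xs):
    """Sample mean and maximum-likelihood (1/N) variance of a 1-d sample."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

print(ml_mean_var([10, 8, 10, 10, 11, 11]))   # class S
print(ml_mean_var([12, 9, 15, 10, 13, 13]))   # class T
```

This recovers µ(S) = 10, σ²(S) = 1 and µ(T) = 12, σ²(T) = 4 exactly.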


Gaussian pdfs for S and T

[Figure: the two class pdfs p(x | S) and p(x | T) plotted for 0 ≤ x ≤ 25.]

Example: 1-dimensional Gaussian classifier

The following unlabelled data points are available:

x1 = 10   x2 = 11   x3 = 6

To which class should each of the data points be assigned? Assume the two classes have equal prior probabilities.

Log odds

Take the log odds (posterior probability ratio):

ln [P(S | X = x) / P(T | X = x)]
  = −½ ( (x − µS)²/σ²S − (x − µT)²/σ²T + ln σ²S − ln σ²T ) + ln P(S) − ln P(T)

In the example the priors are equal, so:

ln [P(S | X = x) / P(T | X = x)]
  = −½ ( (x − µS)²/σ²S − (x − µT)²/σ²T + ln σ²S − ln σ²T )
  = −½ ( (x − 10)² − (x − 12)²/4 − ln 4 )

If the log odds are less than 0, assign to T; otherwise assign to S.
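Applying this rule to the three unlabelled points is a one-liner per point. A Python sketch of the equal-prior decision rule (the function name `log_odds` is my own; the class parameters are those estimated above):

```python
import math

def log_odds(x, mu_s=10.0, var_s=1.0, mu_t=12.0, var_t=4.0):
    """Equal-prior log odds ln P(S|x)/P(T|x):
    -0.5 * ((x-mu_s)^2/var_s - (x-mu_t)^2/var_t + ln var_s - ln var_t)."""
    return -0.5 * ((x - mu_s) ** 2 / var_s - (x - mu_t) ** 2 / var_t
                   + math.log(var_s) - math.log(var_t))

for x in (10, 11, 6):
    label = "S" if log_odds(x) >= 0 else "T"
    print(x, round(log_odds(x), 3), label)
```

The log odds are positive at x1 = 10 and x2 = 11 (assign to S) and negative at x3 = 6 (assign to T): although 6 is closer to µ(S), class T has the larger variance.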


Log odds

[Figure: log odds ln P(S | x)/P(T | x) plotted against x for 5 ≤ x ≤ 15.]

Example: unequal priors

Now assume P(S) = 0.3 and P(T) = 0.7. Including this prior information, to which class should each of the above test data points (x1, x2, x3) be assigned?

Again compute the log odds:

ln [P(S | X = x) / P(T | X = x)]
  = −½ ( (x − µS)²/σ²S − (x − µT)²/σ²T + ln σ²S − ln σ²T ) + ln P(S) − ln P(T)
  = −½ ( (x − 10)² − (x − 12)²/4 − ln 4 ) + ln P(S) − ln P(T)
  = −½ ( (x − 10)² − (x − 12)²/4 − ln 4 ) + ln(3/7)
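The prior term simply shifts the equal-prior log odds by ln(3/7) ≈ −0.847. A Python sketch of the unequal-prior rule (the name `log_odds_prior` is my own; parameters as estimated above):

```python
import math

def log_odds_prior(x, p_s=0.3, p_t=0.7):
    """ln P(S|x)/P(T|x) with priors P(S) and P(T) included."""
    mu_s, var_s, mu_t, var_t = 10.0, 1.0, 12.0, 4.0
    ll = -0.5 * ((x - mu_s) ** 2 / var_s - (x - mu_t) ** 2 / var_t
                 + math.log(var_s) - math.log(var_t))
    return ll + math.log(p_s) - math.log(p_t)

for x in (10, 11, 6):
    print(x, "S" if log_odds_prior(x) >= 0 else "T")
```

With these priors only x1 = 10 is still assigned to S; x2 = 11, which was (just) on the S side with equal priors, is now assigned to T.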


Log odds

[Figure: log odds against x for P(S) = 0.5 and P(S) = 0.3; the P(S) = 0.3 curve is shifted down by |ln(3/7)| ≈ 0.85.]

Multivariate Gaussian classifier

Multivariate Gaussian (in d dimensions):

p(x | µ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp( −½ (x − µ)ᵀ Σ⁻¹ (x − µ) )

Log likelihood:

LL(x | µ, Σ) = −(d/2) ln(2π) − ½ ln |Σ| − ½ (x − µ)ᵀ Σ⁻¹ (x − µ)

If p(x | C) ∼ p(x | µ, Σ), the log posterior probability is:

ln P(C | x) ∝ −½ (x − µ)ᵀ Σ⁻¹ (x − µ) − ½ ln |Σ| + ln P(C)
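The classifier picks the class with the largest log posterior. A minimal 2-d Python sketch (the function name `log_posterior` and the two class parameter sets are my own illustrative choices, not the lecture's data):

```python
import math

def log_posterior(x, mu, Sigma, prior):
    """Unnormalised log posterior for a 2-d Gaussian class:
    -0.5 (x-mu)^T Sigma^{-1} (x-mu) - 0.5 ln|Sigma| + ln P(C)."""
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    inv = [[ Sigma[1][1] / det, -Sigma[0][1] / det],
           [-Sigma[1][0] / det,  Sigma[0][0] / det]]
    v = [x[0] - mu[0], x[1] - mu[1]]
    q = (v[0] * (inv[0][0] * v[0] + inv[0][1] * v[1])
         + v[1] * (inv[1][0] * v[0] + inv[1][1] * v[1]))
    return -0.5 * q - 0.5 * math.log(det) + math.log(prior)

# Two made-up classes with equal priors, for illustration only.
classes = {
    "A": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "B": ([3.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
}
x = [2.5, 2.8]
best = max(classes, key=lambda c: log_posterior(x, *classes[c]))
print(best)
```

With equal priors and equal covariances the decision reduces to distance from the class means, so the test point near (3, 3) is assigned to class B.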


Example

2-dimensional data from three classes (A, B, C).

The classes have equal prior probabilities.

200 points in each class.

Load into Matlab (n × 2 matrices, each row is a data point) and display using a scatter plot:

xa = load('trainA.dat');
xb = load('trainB.dat');
xc = load('trainC.dat');
hold on;
scatter(xa(:,1), xa(:,2), 'r', 'o');
scatter(xb(:,1), xb(:,2), 'b', 'x');
scatter(xc(:,1), xc(:,2), 'c', '*');


Training data

[Figure: scatter plot of the training data for the three classes.]

Gaussians estimated from training data

[Figure: the training data with the contours of the Gaussian estimated for each class.]

Testing data


Testing data — with estimated class distributions

[Figure: the test data with the estimated class distributions.]

Testing data — with true classes indicated

[Figure: the test data with the true classes indicated.]

Classifying test data from class A


Classifying test data from class B

[Figure: classification of the test data from class B.]

Classifying test data from class C


Results

Analyze results by percent correct, and in more detail with a confusion matrix:

Rows of a confusion matrix correspond to the predicted classes (classifier outputs).
Columns correspond to the true class labels.
Element (r, c) is the number of patterns from true class c that were classified as class r.
The total number of correctly classified patterns is obtained by summing the numbers on the leading diagonal.

Confusion matrix in this case:

                       True class
  Predicted class     A    B    C
        A            77    5    9
        B            15   88    2
        C             8    7   89

The overall proportion of test patterns correctly classified is (77 + 88 + 89)/300 = 254/300 ≈ 0.85.
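The leading-diagonal computation is easy to mechanise. A Python sketch (the function name `accuracy` is my own) applied to the confusion matrix above:

```python
def accuracy(confusion):
    """Proportion correct: sum of the leading diagonal over the grand total.
    Rows are predicted classes, columns are true classes (as on the slide)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

conf = [[77, 5, 9],    # predicted A
        [15, 88, 2],   # predicted B
        [8, 7, 89]]    # predicted C
print(round(accuracy(conf), 4))
```

This reproduces 254/300 ≈ 0.8467, the roughly 85% quoted above.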


Decision Regions

[Figure: decision regions for the 3-class example in the (x1, x2) plane.]

Example: Classifying spoken vowels

10 spoken vowels in American English.

Vowels can be characterised by formant frequencies (resonances of the vocal tract):
there are usually three or four identifiable formants;
the first two formants are written as F1 and F2.

Peterson-Barney data: recordings of spoken vowels by American men, women, and children:
two examples of each vowel per person;
for this example, the data are split into training and test sets;
children's data are not used in this example;
different speakers in the training and test sets.

(See http://en.wikipedia.org/wiki/Vowel for more.)

Classify the data using a Gaussian classifier.

Assume equal priors.

The data

Ten steady-state vowels, frequencies of F1 and F2 at their centre:

IY — “bee”

IH — “big”

EH — “red”

AE — “at”

AH — “honey”

AA — “heart”

AO — “frost”

UH — “could”

UW — “you”

ER — “bird”


Vowel data — 10 classes

[Figure: Peterson-Barney F1-F2 vowel training data, F1 (Hz) against F2 (Hz), with the ten vowel classes labelled.]

Gaussian for class 1 (IY)

[Figure: the vowel training data with the fitted Gaussian for class 1 (IY).]

Gaussian for class 2 (IH)

[Figure: the vowel training data with the fitted Gaussian for class 2 (IH).]

Gaussian for class 3 (EH)

[Figure: the vowel training data with the fitted Gaussian for class 3 (EH).]

Gaussian for class 4 (AE)

[Figure: the vowel training data with the fitted Gaussian for class 4 (AE).]

Gaussian for class 5 (AH)

[Figure: the vowel training data with the fitted Gaussian for class 5 (AH).]

Gaussian for class 6 (AA)

[Figure: the vowel training data with the fitted Gaussian for class 6 (AA).]

Gaussian for class 7 (AO)

[Figure: Peterson-Barney F1-F2 Vowel Training Data (F2 / Hz against F1 / Hz), with the fitted Gaussian for AO overlaid]

Gaussian for class 8 (UH)

[Figure: Peterson-Barney F1-F2 Vowel Training Data (F2 / Hz against F1 / Hz), with the fitted Gaussian for UH overlaid]

Gaussian for class 9 (UW)

[Figure: Peterson-Barney F1-F2 Vowel Training Data (F2 / Hz against F1 / Hz), with the fitted Gaussian for UW overlaid]

Gaussian for class 10 (ER)

[Figure: Peterson-Barney F1-F2 Vowel Training Data (F2 / Hz against F1 / Hz), with the fitted Gaussian for ER overlaid]

Gaussians for each class

[Figure: Peterson-Barney F1-F2 vowel data (F2 / Hz against F1 / Hz) with a fitted Gaussian for each of the ten vowel classes]
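Each class Gaussian is fitted by maximum likelihood: the class mean is the sample mean and the covariance is the sample covariance of that class's training points. A minimal NumPy sketch of this (using synthetic 2-D data, not the actual Peterson-Barney measurements; the function names are illustrative):

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean vector and covariance matrix
    for the rows of X (shape n x d)."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)  # bias=True -> divide by n (ML)
    return mu, Sigma

def log_gaussian_pdf(x, mu, Sigma):
    """Log of the multivariate Gaussian density N(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x-mu)^T Sigma^{-1} (x-mu)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)

# Fit one Gaussian per class from labelled 2-D data (stand-in vowel clusters).
rng = np.random.default_rng(0)
X1 = rng.normal([300.0, 2300.0], 50.0, size=(100, 2))
X2 = rng.normal([600.0, 1200.0], 50.0, size=(100, 2))
params = [fit_gaussian(X) for X in (X1, X2)]
```

Classification then assigns a point to whichever class gives the largest log p(x | class) + log P(class).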

Test data for class 1 (IY)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for IY]

Confusion matrix

Predicted    True class
                 IY
IY               20
IH                0
EH                0
AE                0
AH                0
AA                0
AO                0
UH                0
UW                0
ER                0
% corr.         100

Test data for class 2 (IH)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for IH]

Confusion matrix

Predicted    True class
                 IY    IH
IY               20     0
IH                0    20
EH                0     0
AE                0     0
AH                0     0
AA                0     0
AO                0     0
UH                0     0
UW                0     0
ER                0     0
% corr.         100   100

Test data for class 3 (EH)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for EH]

Confusion matrix

Predicted    True class
                 IY    IH    EH
IY               20     0     0
IH                0    20     0
EH                0     0    15
AE                0     0     1
AH                0     0     0
AA                0     0     0
AO                0     0     0
UH                0     0     0
UW                0     0     0
ER                0     0     4
% corr.         100   100    75

Test data for class 4 (AE)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for AE]

Confusion matrix

Predicted    True class
                 IY    IH    EH    AE
IY               20     0     0     0
IH                0    20     0     0
EH                0     0    15     3
AE                0     0     1    16
AH                0     0     0     1
AA                0     0     0     0
AO                0     0     0     0
UH                0     0     0     0
UW                0     0     0     0
ER                0     0     4     0
% corr.         100   100    75    80

Test data for class 5 (AH)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for AH]

Confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH
IY               20     0     0     0     0
IH                0    20     0     0     0
EH                0     0    15     3     0
AE                0     0     1    16     0
AH                0     0     0     1    18
AA                0     0     0     0     2
AO                0     0     0     0     0
UH                0     0     0     0     0
UW                0     0     0     0     0
ER                0     0     4     0     0
% corr.         100   100    75    80    90

Test data for class 6 (AA)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for AA]

Confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH    AA
IY               20     0     0     0     0     0
IH                0    20     0     0     0     0
EH                0     0    15     3     0     0
AE                0     0     1    16     0     0
AH                0     0     0     1    18     2
AA                0     0     0     0     2    17
AO                0     0     0     0     0     1
UH                0     0     0     0     0     0
UW                0     0     0     0     0     0
ER                0     0     4     0     0     0
% corr.         100   100    75    80    90    85

Test data for class 7 (AO)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for AO]

Confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH    AA    AO
IY               20     0     0     0     0     0     0
IH                0    20     0     0     0     0     0
EH                0     0    15     3     0     0     0
AE                0     0     1    16     0     0     0
AH                0     0     0     1    18     2     0
AA                0     0     0     0     2    17     4
AO                0     0     0     0     0     1    16
UH                0     0     0     0     0     0     0
UW                0     0     0     0     0     0     0
ER                0     0     4     0     0     0     0
% corr.         100   100    75    80    90    85    80

Test data for class 8 (UH)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for UH]

Confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH    AA    AO    UH
IY               20     0     0     0     0     0     0     0
IH                0    20     0     0     0     0     0     0
EH                0     0    15     3     0     0     0     0
AE                0     0     1    16     0     0     0     0
AH                0     0     0     1    18     2     0     2
AA                0     0     0     0     2    17     4     0
AO                0     0     0     0     0     1    16     0
UH                0     0     0     0     0     0     0    18
UW                0     0     0     0     0     0     0     0
ER                0     0     4     0     0     0     0     0
% corr.         100   100    75    80    90    85    80    90

Test data for class 9 (UW)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for UW]

Confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH    AA    AO    UH    UW
IY               20     0     0     0     0     0     0     0     0
IH                0    20     0     0     0     0     0     0     0
EH                0     0    15     3     0     0     0     0     0
AE                0     0     1    16     0     0     0     0     0
AH                0     0     0     1    18     2     0     2     0
AA                0     0     0     0     2    17     4     0     0
AO                0     0     0     0     0     1    16     0     0
UH                0     0     0     0     0     0     0    18     5
UW                0     0     0     0     0     0     0     0    15
ER                0     0     4     0     0     0     0     0     0
% corr.         100   100    75    80    90    85    80    90    75

Test data for class 10 (ER)

[Figure: Peterson-Barney F1-F2 Vowel Test Data (F2 / Hz against F1 / Hz), test items for ER]

Final confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH    AA    AO    UH    UW    ER
IY               20     0     0     0     0     0     0     0     0     0
IH                0    20     0     0     0     0     0     0     0     0
EH                0     0    15     3     0     0     0     0     0     0
AE                0     0     1    16     0     0     0     0     0     0
AH                0     0     0     1    18     2     0     2     0     0
AA                0     0     0     0     2    17     4     0     0     0
AO                0     0     0     0     0     1    16     0     0     0
UH                0     0     0     0     0     0     0    18     5     2
UW                0     0     0     0     0     0     0     0    15     0
ER                0     0     4     0     0     0     0     0     0    18
% corr.         100   100    75    80    90    85    80    90    75    90

Total: 86.5% correct
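A confusion matrix like the one above can be accumulated in a few lines. In this sketch (illustrative code, not the lecture's actual script) rows index the predicted class and columns the true class, matching the slides' layout; the overall accuracy is the trace divided by the total count:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """C[p, t] counts items of true class t that were predicted as class p."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[p, t] += 1
    return C

# Toy example with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
C = confusion_matrix(y_true, y_pred, 3)

per_class = 100.0 * C.diagonal() / C.sum(axis=0)  # % correct per true class
total = 100.0 * C.trace() / C.sum()               # overall % correct
```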

Classifying the training set

[Figure: Peterson-Barney F1-F2 Vowel Training Data (F2 / Hz against F1 / Hz)]

Training set confusion matrix

Predicted    True class
                 IY    IH    EH    AE    AH    AA    AO    UH    UW    ER
IY               99     8     0     0     0     0     0     0     0     0
IH                3    85    15     0     0     0     0     0     0     3
EH                0     7    69    11     0     0     0     0     0    11
AE                0     0     5    86     4     0     0     0     0     4
AH                0     0     0     3    87     8     3     2     0     1
AA                0     0     0     0     4    82    10     0     0     0
AO                0     0     0     0     5    12    86     2     0     0
UH                0     0     0     0     0     0     2    73    19    10
UW                0     0     0     0     0     0     1    15    79     1
ER                0     2    13     2     2     0     0    10     4    72
% corr.        97.1  83.3  67.6  84.3  85.3  80.4  84.3  71.6  77.5  70.6

Total: 80.2% correct

Decision Regions

[Figure: Peterson-Barney F1-F2 Gaussian Decision Regions in the (F1 / Hz, F2 / Hz) plane]
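Decision regions like these can be rendered by evaluating the classifier at every point of a grid and recording the winning class. A self-contained sketch with two hand-picked Gaussians and equal priors (illustrative parameters, not the fitted vowel models):

```python
import numpy as np

# Two illustrative class models with equal priors.
mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]

def classify(x):
    """Index of the class with the highest log density at x
    (constant terms shared by all classes are dropped)."""
    scores = []
    for mu, Sigma in zip(mus, Sigmas):
        diff = x - mu
        _, logdet = np.linalg.slogdet(Sigma)
        scores.append(-0.5 * (logdet + diff @ np.linalg.solve(Sigma, diff)))
    return int(np.argmax(scores))

# Label every grid point; contiguous runs of equal labels are the
# decision regions (plot with e.g. matplotlib's pcolormesh).
xs = np.linspace(-4.0, 8.0, 60)
ys = np.linspace(-4.0, 8.0, 60)
regions = np.array([[classify(np.array([x, y])) for x in xs] for y in ys])
```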

Summary

Using Bayes’ theorem with pdfs

The Gaussian classifier: 1-dimensional and multi-dimensional

Vowel classification example
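The first bullet in one formula: with densities, Bayes' theorem reads P(C | x) = p(x | C) P(C) / p(x), where the evidence p(x) = Σ_c p(x | c) P(c) merely normalises the posteriors. A small sketch in log space (illustrative values and function name):

```python
import numpy as np

def posteriors(log_likelihoods, priors):
    """Bayes' theorem with pdfs: P(c | x) = p(x | c) P(c) / p(x),
    computed in log space; the evidence p(x) is the normaliser."""
    log_post = np.array(log_likelihoods) + np.log(np.array(priors))
    log_post -= log_post.max()      # guard against underflow before exp
    post = np.exp(log_post)
    return post / post.sum()

# Two classes with equal priors; class 0 has twice the density at x.
p = posteriors([np.log(0.4), np.log(0.2)], [0.5, 0.5])  # -> [2/3, 1/3]
```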
