Multidimensional Gaussian distribution and classification with Gaussians

Guido Sanguinetti

Informatics 2B: Learning and Data, Lecture 9
6 March 2012


Overview

Today’s lecture

Gaussians

The multidimensional Gaussian distribution

Bayes theorem and probability density functions

The Gaussian classifier


(One-dimensional) Gaussian distribution

One-dimensional Gaussian with zero mean and unit variance ($\mu = 0$, $\sigma^2 = 1$):

[Figure: pdf of the Gaussian distribution with mean 0 and variance 1, plotted for x from -4 to 4]

$$p(x \mid \mu, \sigma^2) = N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
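The one-dimensional density can be evaluated directly from its formula. The following is a minimal Python sketch, not part of the original slides; the function name `gaussian_pdf` is chosen here for illustration.

```python
import math


def gaussian_pdf(x, mu=0.0, var=1.0):
    """Density of a one-dimensional Gaussian N(x; mu, var) at the point x."""
    if var <= 0:
        raise ValueError("variance must be positive")
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Peak of the standard Gaussian (mu = 0, var = 1), equal to 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0)
```

The peak value is about 0.399, which agrees with the top of the plotted curve (the vertical axis runs up to 0.4).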

The multidimensional Gaussian distribution

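The d-dimensional vector x is multivariate Gaussian if it has a probability density function of the following form:

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

The pdf is parameterized by the mean vector µ and the covariance matrix Σ.

The 1-dimensional Gaussian is a special case of this pdf (d = 1, Σ = σ²).

The expression $\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$ in the argument to the exponential is referred to as a quadratic form.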
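As an illustration of the density above, here is a minimal Python sketch restricted to d = 2, so that the determinant and inverse of Σ can be written out explicitly. The function name `gaussian_pdf_2d` and the restriction to two dimensions are assumptions made for this example, not part of the lecture.

```python
import math


def gaussian_pdf_2d(x, mu, cov):
    """Density of a 2-dimensional Gaussian at x, with mean mu and 2x2 covariance cov."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance must be positive definite")
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T cov^{-1} (x - mu).
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # Normalisation constant (2 pi)^{d/2} |cov|^{1/2} with d = 2.
    norm = (2.0 * math.pi) * math.sqrt(det)
    return math.exp(-0.5 * quad) / norm


# Density at the mean for zero mean and identity covariance: 1 / (2 * pi).
p_at_mean = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

When the covariance is diagonal, the value returned equals the product of two one-dimensional Gaussian densities, one per component.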


Covariance matrix

The mean vector µ is the expectation of x:

$$\boldsymbol{\mu} = E[\mathbf{x}]$$

The covariance matrix Σ is the expectation of the outer product of the deviation of x from the mean with itself:

$$\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$$

Σ is a d × d symmetric matrix:

$$\Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)] = E[(x_j - \mu_j)(x_i - \mu_i)] = \Sigma_{ji}$$

The sign of the covariance helps to determine the relationship between two components:

If $x_j$ is large when $x_i$ is large, then $(x_j - \mu_j)(x_i - \mu_i)$ will tend to be positive;

If $x_j$ is small when $x_i$ is large, then $(x_j - \mu_j)(x_i - \mu_i)$ will tend to be negative.


Correlation matrix

The covariance matrix is not scale-independent. Define the correlation coefficient:

$$\rho(x_j, x_k) = \rho_{jk} = \frac{S_{jk}}{\sqrt{S_{jj} S_{kk}}}$$

The correlation coefficient is scale-independent (i.e. independent of the measurement units) and location-independent, i.e. for a > 0 and s > 0:

$$\rho(x_j, x_k) = \rho(a x_j + b, s x_k + t)$$

The correlation coefficient satisfies −1 ≤ ρ ≤ 1, and

ρ(x, y) = +1 if y = ax + b, a > 0

ρ(x, y) = −1 if y = ax + b, a < 0
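The invariance properties above can be checked numerically. The sketch below is an illustration only: the data values are hypothetical, and the covariance uses the 1/N normalisation (an assumption, though any common normalisation cancels in the ratio).

```python
import math


def covariance(xs, ys):
    """Covariance of two equal-length samples, using the 1/N normalisation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Correlation coefficient S_jk / sqrt(S_jj * S_kk)."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical paired measurements, for illustration only.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 6.0]

rho = correlation(xs, ys)
# Rescale and shift both variables (a = 100, b = 5, s = 2.2, t = -1).
rho_rescaled = correlation([100.0 * x + 5.0 for x in xs], [2.2 * y - 1.0 for y in ys])
```

Up to floating-point rounding, `rho` and `rho_rescaled` agree, and replacing `ys` by an exact linear function of `xs` gives +1 or −1 depending on the sign of the slope.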


Spherical Gaussian

[Figure: surface plot and contour plot of p(x1, x2), with axes x1 and x2 from -2 to 2]

$$\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad \rho_{12} = 0$$

Diagonal Covariance Gaussian

[Figure: surface plot and contour plot of p(x1, x2), with axes x1 and x2 from -4 to 4]

$$\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \qquad \rho_{12} = 0$$

Full covariance Gaussian

[Figure: surface plot and contour plot of p(x1, x2), with axes x1 and x2 from -4 to 4]

$$\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & -1 \\ -1 & 4 \end{pmatrix} \qquad \rho_{12} = -0.5$$
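The values of ρ12 quoted for the spherical, diagonal and full covariance examples follow from the definition of the correlation coefficient applied to the entries of Σ. A minimal Python check (the helper name `correlation_from_cov` is made up for this sketch):

```python
import math


def correlation_from_cov(cov, j, k):
    """Correlation coefficient rho_jk computed from a covariance matrix."""
    return cov[j][k] / math.sqrt(cov[j][j] * cov[k][k])


spherical = [[1.0, 0.0], [0.0, 1.0]]
diagonal = [[1.0, 0.0], [0.0, 4.0]]
full = [[1.0, -1.0], [-1.0, 4.0]]

# Indices are 0-based here, so (0, 1) corresponds to rho_12.
rhos = [correlation_from_cov(c, 0, 1) for c in (spherical, diagonal, full)]
```

For the full covariance case this is −1 / √(1 · 4) = −0.5, and both zero off-diagonal cases give 0.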

Parameter estimation

It is possible to show that the mean vector $\hat{\boldsymbol{\mu}}$ and covariance matrix $\hat{\Sigma}$ that maximize the likelihood of the training data are given by:

$$\hat{\boldsymbol{\mu}} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$$

$$\hat{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \hat{\boldsymbol{\mu}})(\mathbf{x}_n - \hat{\boldsymbol{\mu}})^T$$

The mean of the distribution is estimated by the sample mean and the covariance by the sample covariance.
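The two estimators can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the function name `ml_estimates` and the four training points are hypothetical, and the covariance is divided by N (the maximum likelihood form given above), not by N − 1.

```python
def ml_estimates(data):
    """Return (mean vector, covariance matrix) maximising the Gaussian likelihood.

    data is a list of N points, each a list of d numbers.
    """
    n = len(data)
    d = len(data[0])
    mean = [sum(x[i] for x in data) / n for i in range(d)]
    cov = [
        [sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in data) / n for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


# Four hypothetical 2-dimensional training points.
points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mean, cov = ml_estimates(points)
```

For these four points the estimated mean is (1, 1) and the estimated covariance is the identity matrix.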

Example data

[Figure: scatter plot of the example data, with axes X1 and X2]

Maximum likelihood fit to a Gaussian

[Figure: the example data with the maximum likelihood Gaussian fit, with axes X1 and X2]

Bayes theorem and probability densities

Rules for probability densities are similar to those for probabilities:

$$p(x, y) = p(x \mid y)\, p(y)$$

$$p(x) = \int p(x, y)\, dy$$

We may mix probabilities of discrete variables and probability densities of continuous variables:

$$p(x, Z) = p(x \mid Z)\, P(Z)$$

Bayes' theorem for continuous data x and class C:

$$P(C \mid x) = \frac{p(x \mid C)\, P(C)}{p(x)}$$

$$P(C \mid x) \propto p(x \mid C)\, P(C)$$
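To make the role of p(x) concrete, the sketch below computes normalised posteriors for a discrete class variable and a continuous observation. It assumes Gaussian class-conditional densities; the two classes "A" and "B" and their parameters are hypothetical. Because the class is discrete, p(x) is obtained by summing p(x | C) P(C) over the classes.

```python
import math


def gaussian_pdf(x, mu, var):
    """One-dimensional Gaussian density N(x; mu, var)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, class_params, priors):
    """P(C | x) for every class C, via Bayes' theorem.

    class_params maps a class label to (mean, variance) of p(x | C);
    priors maps a class label to P(C).
    """
    joint = {c: gaussian_pdf(x, *class_params[c]) * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(x) = sum over C of p(x | C) P(C)
    return {c: joint[c] / evidence for c in joint}


# Hypothetical two-class problem, for illustration only.
params = {"A": (0.0, 1.0), "B": (2.0, 1.0)}
priors = {"A": 0.5, "B": 0.5}
post = posteriors(1.0, params, priors)
```

At x = 1, halfway between the two means and with equal variances and priors, both posteriors are 0.5. Dropping the division by `evidence` leaves the proportional form, which is enough to decide which class is more probable.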


Bayes theorem and univariate Gaussians

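If p(x | C) is Gaussian with mean $\mu_c$ and variance $\sigma_c^2$:

$$\begin{aligned}
P(C \mid x) &\propto p(x \mid C)\, P(C) \\
&\propto N(x; \mu_c, \sigma_c^2)\, P(C) \\
&\propto \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left(-\frac{(x - \mu_c)^2}{2\sigma_c^2}\right) P(C)
\end{aligned}$$

Taking logs, we have the log likelihood LL(x | C):

$$\begin{aligned}
LL(x \mid C) &= \ln p(x \mid \mu_c, \sigma_c^2) \\
&= \frac{1}{2}\left(-\ln(2\pi) - \ln\sigma_c^2 - \frac{(x - \mu_c)^2}{\sigma_c^2}\right)
\end{aligned}$$

The log posterior probability LP(C | x) is:

$$\begin{aligned}
LP(C \mid x) &\propto LL(x \mid C) + LP(C) \\
&\propto \frac{1}{2}\left(-\ln(2\pi) - \ln\sigma_c^2 - \frac{(x - \mu_c)^2}{\sigma_c^2}\right) + \ln P(C)
\end{aligned}$$

Here ∝ is used loosely: the two sides differ by the additive term $-\ln p(x)$, which is the same for every class.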
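The log likelihood and the unnormalised log posterior translate directly into code. A minimal Python sketch, with function names chosen for this example:

```python
import math


def log_likelihood(x, mu, var):
    """LL(x | C) = 0.5 * (-ln(2 pi) - ln(var) - (x - mu)^2 / var)."""
    return 0.5 * (-math.log(2.0 * math.pi) - math.log(var) - (x - mu) ** 2 / var)


def log_posterior_unnormalised(x, mu, var, prior):
    """LL(x | C) + ln P(C): the log posterior up to the additive term -ln p(x)."""
    return log_likelihood(x, mu, var) + math.log(prior)
```

Working with logs replaces the product of a small density and a prior by a sum, which is numerically better behaved when densities become very small.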


Example: 1-dimensional Gaussian classifier

Two classes, S and T, with some observations:

Class S: 10, 8, 10, 10, 11, 11

Class T: 12, 9, 15, 10, 13, 13

Assume that each class may be modelled by a Gaussian. The mean and variance of each pdf are estimated by the sample mean and sample variance:

$\mu(S) = 10$, $\sigma^2(S) = 1$

$\mu(T) = 12$, $\sigma^2(T) = 4$
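The quoted estimates can be reproduced from the observations. The sketch below assumes the maximum likelihood form of the sample variance (dividing by N), which is what yields exactly 1 and 4 for these data; the function name is made up for illustration.

```python
def sample_mean_and_variance(values):
    """Maximum likelihood estimates: sample mean and variance (1/N normalisation)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


class_s = [10, 8, 10, 10, 11, 11]
class_t = [12, 9, 15, 10, 13, 13]

mu_s, var_s = sample_mean_and_variance(class_s)  # (10.0, 1.0)
mu_t, var_t = sample_mean_and_variance(class_t)  # (12.0, 4.0)
```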


Gaussian pdfs for S and T

[Figure: the class-conditional pdfs p(x|S) and p(x|T), plotted as p(x) against x for x from 0 to 25]

Example: 1-dimensional Gaussian classifier


The following unlabelled data points are available:

$x_1 = 10$, $x_2 = 11$, $x_3 = 6$

To which class should each of the data points be assigned? Assume the two classes have equal prior probabilities.

Log odds

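Take the log odds (posterior probability ratios):

$$\ln\frac{P(S \mid X = x)}{P(T \mid X = x)} = -\frac{1}{2}\left(\frac{(x - \mu_S)^2}{\sigma_S^2} - \frac{(x - \mu_T)^2}{\sigma_T^2} + \ln\sigma_S^2 - \ln\sigma_T^2\right) + \ln P(S) - \ln P(T)$$

In the example the priors are equal, so:

$$\begin{aligned}
\ln\frac{P(S \mid X = x)}{P(T \mid X = x)} &= -\frac{1}{2}\left(\frac{(x - \mu_S)^2}{\sigma_S^2} - \frac{(x - \mu_T)^2}{\sigma_T^2} + \ln\sigma_S^2 - \ln\sigma_T^2\right) \\
&= -\frac{1}{2}\left((x - 10)^2 - \frac{(x - 12)^2}{4} - \ln 4\right)
\end{aligned}$$

If log odds are less than 0 assign to T, otherwise assign to S.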
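Applying the decision rule to the three unlabelled points x = 10, 11 and 6 can be sketched as follows. The function names are chosen for this example; the class parameters are the estimates from the slides (mean 10, variance 1 for S; mean 12, variance 4 for T).

```python
import math


def log_odds(x, mu_s, var_s, mu_t, var_t, prior_s=0.5, prior_t=0.5):
    """ln P(S | x) - ln P(T | x) for two univariate Gaussian classes."""
    quad = (x - mu_s) ** 2 / var_s - (x - mu_t) ** 2 / var_t
    log_var = math.log(var_s) - math.log(var_t)
    return -0.5 * (quad + log_var) + math.log(prior_s) - math.log(prior_t)


def classify(x, **kwargs):
    """Assign to T if the log odds are negative, otherwise to S."""
    return "T" if log_odds(x, 10.0, 1.0, 12.0, 4.0, **kwargs) < 0 else "S"


labels = {x: classify(x) for x in (10, 11, 6)}
```

With equal priors the log odds are about 1.19 at x = 10, 0.32 at x = 11 and −2.81 at x = 6, so the first two points are assigned to S and the third to T. The point x = 6 goes to T even though it is closer to the mean of S, because T has the larger variance.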


[Figure: the log odds ln P(S|x)/P(T|x) plotted against x, for x from 5 to 15]

Example: unequal priors

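Now, assume P(S) = 0.3, P(T) = 0.7. Including this prior information, to which class should each of the above test data points ($x_1$, $x_2$, $x_3$) be assigned?

Again compute the log odds:

$$\begin{aligned}
\ln\frac{P(S \mid X = x)}{P(T \mid X = x)} &= -\frac{1}{2}\left(\frac{(x - \mu_S)^2}{\sigma_S^2} - \frac{(x - \mu_T)^2}{\sigma_T^2} + \ln\sigma_S^2 - \ln\sigma_T^2\right) + \ln P(S) - \ln P(T) \\
&= -\frac{1}{2}\left((x - 10)^2 - \frac{(x - 12)^2}{4} - \ln 4\right) + \ln P(S) - \ln P(T) \\
&= -\frac{1}{2}\left((x - 10)^2 - \frac{(x - 12)^2}{4} - \ln 4\right) + \ln(3/7)
\end{aligned}$$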
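The effect of the prior term ln(3/7) on the three test points can be checked with a short Python sketch. The function name and the default arguments (the class estimates from the example) are choices made for this illustration.

```python
import math


def log_odds(x, prior_s, prior_t, mu_s=10.0, var_s=1.0, mu_t=12.0, var_t=4.0):
    """ln P(S | x) - ln P(T | x), with the class parameters of the example."""
    likelihood_term = -0.5 * (
        (x - mu_s) ** 2 / var_s
        - (x - mu_t) ** 2 / var_t
        + math.log(var_s)
        - math.log(var_t)
    )
    return likelihood_term + math.log(prior_s) - math.log(prior_t)


for x in (10, 11, 6):
    equal = log_odds(x, 0.5, 0.5)
    unequal = log_odds(x, 0.3, 0.7)  # shifted down by ln(7/3)
    print(x, "S" if equal >= 0 else "T", "S" if unequal >= 0 else "T")
```

The prior shifts every log odds value down by ln(7/3), roughly 0.85. The point x = 10 stays in S (log odds about 0.35), x = 6 stays in T, and x = 11 moves from S to T (log odds about −0.53).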


[Figure: the log odds ln P(S|x)/P(T|x) plotted against x, for x from 5 to 15, with one curve for P(S) = 0.5 and one for P(S) = 0.3]

Multivariate Gaussian classifier

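Multivariate Gaussian (in d dimensions):

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$$

Log likelihood:

$$LL(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = -\frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

If p(x | C) ∼ p(x | µ, Σ), the log posterior probability is:

$$\ln P(C \mid \mathbf{x}) \propto -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) - \frac{1}{2}\ln|\Sigma| + \ln P(C)$$

Here ∝ is used loosely: the terms $-\frac{d}{2}\ln(2\pi)$ and $-\ln p(\mathbf{x})$ are the same for every class and have been dropped.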
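A classifier based on this log posterior picks the class with the largest value. The sketch below is restricted to d = 2 so that the inverse and determinant can be written explicitly; the function names and the two class models "A" and "B" are hypothetical and used only for illustration.

```python
import math


def log_posterior_2d(x, mu, cov, prior):
    """-0.5 * (x - mu)^T cov^{-1} (x - mu) - 0.5 * ln|cov| + ln P(C), for d = 2."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the explicit inverse of a 2x2 matrix.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * quad - 0.5 * math.log(det) + math.log(prior)


def classify(x, classes):
    """classes maps a label to (mean, covariance, prior); return the best label."""
    return max(classes, key=lambda label: log_posterior_2d(x, *classes[label]))


# Hypothetical class models, for illustration only.
classes = {
    "A": ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 0.5),
    "B": ([3.0, 3.0], [[1.0, -1.0], [-1.0, 4.0]], 0.5),
}
label = classify([0.5, -0.2], classes)
```

The point (0.5, −0.2) lies close to the mean of class A and is assigned to it; a point at (3, 3) would be assigned to B.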


Example

2-dimensional data from three classes (A, B, C ).

The classes have equal prior probabilities.

200 points in each class

Load into Matlab (n × 2 matrices, each row is a data point) and display using a scatter plot:

% Load the training data for classes A, B and C
xa = load('trainA.dat');
xb = load('trainB.dat');
xc = load('trainC.dat');
% Overlay the three scatter plots on the same axes
hold on;
scatter(xa(:, 1), xa(:, 2), 'r', 'o');
scatter(xb(:, 1), xb(:, 2), 'b', 'x');
scatter(xc(:, 1), xc(:, 2), 'c', '*');
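The full procedure for this example (estimate one Gaussian per class, then assign each test point to the class with the highest log posterior) can be sketched in Python. This is an illustration only: the tiny training sets below are hypothetical stand-ins for the contents of trainA.dat, trainB.dat and trainC.dat, which are not reproduced here, and the function names are made up for the sketch.

```python
import math


def fit_gaussian(points):
    """Maximum likelihood mean and 2x2 covariance of a list of 2-d points."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(2)]
    cov = [
        [sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / n for j in range(2)]
        for i in range(2)
    ]
    return mean, cov


def log_score(x, mean, cov, prior):
    """Log posterior up to a class-independent constant (d = 2)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return -0.5 * quad - 0.5 * math.log(det) + math.log(prior)


# Tiny hypothetical training sets standing in for the three data files.
train = {
    "A": [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [-0.5, 0.2], [0.2, -0.6]],
    "B": [[4.0, 4.0], [5.0, 4.5], [4.5, 5.2], [3.6, 4.4], [4.3, 3.5]],
    "C": [[-4.0, 3.0], [-3.2, 3.6], [-4.5, 2.4], [-3.8, 4.1], [-4.4, 3.3]],
}
models = {label: fit_gaussian(pts) for label, pts in train.items()}
prior = 1.0 / len(models)  # equal priors, as in the example


def classify(x):
    """Return the label whose fitted Gaussian gives the highest log posterior."""
    return max(models, key=lambda label: log_score(x, *models[label], prior))


predictions = [classify(x) for x in ([0.3, 0.1], [4.4, 4.2], [-4.0, 3.2])]
```

With equal priors the prior term does not change the ranking of the classes; it is kept so that the same code works if the priors differ.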


Training data

−8 −6 −4 −2 0 2 4 6−8

−6

−4

−2

0

2

4

Informatics 2B: Learning and Data Lecture 9 Multidimensional Gaussian distribution and classification with Gaussians24

Page 47: Multidimensional Gaussian distribution and classification ...€¦ · The multidimensional Gaussian distribution The d-dimensional vector x is multivariate Gaussian if it has a probability

Gaussians estimated from training data

−8 −6 −4 −2 0 2 4 6−8

−6

−4

−2

0

2

4

Informatics 2B: Learning and Data Lecture 9 Multidimensional Gaussian distribution and classification with Gaussians25

Page 48: Multidimensional Gaussian distribution and classification ...€¦ · The multidimensional Gaussian distribution The d-dimensional vector x is multivariate Gaussian if it has a probability

Testing data


Testing data — with estimated class distributions

[Figure: test data overlaid on the estimated class distributions.]

Testing data — with true classes indicated

[Figure: test data marked by true class.]

Classifying test data from class A


Classifying test data from class B

[Figure: classification of the test data from class B.]

Classifying test data from class C

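The classification step behind these figures can be sketched as follows: evaluate the log of each class Gaussian at a test point and, since the priors are equal, pick the class with the largest value. The means, covariances and test points below are hypothetical values chosen for illustration, not the estimates from the lecture's data.

```matlab
% Hypothetical class parameters (illustrative values, not the lecture's estimates).
mu    = {[0 0], [3 3], [-3 2]};                      % means for classes A, B, C
Sigma = {eye(2), [2 0.5; 0.5 1], [1 -0.3; -0.3 1.5]};

% Two made-up test points, one per row.
X = [0.2 -0.1; 2.8 3.4];

n  = size(X, 1);
K  = numel(mu);
ll = zeros(n, K);                                    % log p(x | class)
for k = 1:K
    d = X - repmat(mu{k}, n, 1);                     % deviations from the class mean
    maha = sum((d / Sigma{k}) .* d, 2);              % (x - mu) * inv(Sigma) * (x - mu)'
    ll(:, k) = -0.5 * (maha + log(det(Sigma{k})) + size(X, 2) * log(2 * pi));
end

% With equal priors the posterior is proportional to the likelihood,
% so each point is assigned to the class with the largest log-likelihood.
[~, pred] = max(ll, [], 2);
disp(pred');
```

Working with log-likelihoods rather than the densities themselves avoids numerical underflow and leaves the decision unchanged.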

Results

Analyze results by percent correct, and in more detail with a confusion matrix

Rows of a confusion matrix correspond to the predicted classes (classifier outputs)
Columns correspond to the true class labels
Element (r, c) is the number of patterns from true class c that were classified as class r
The total number of correctly classified patterns is obtained by summing the numbers on the leading diagonal

Confusion matrix in this case:

                      True class
Test data              A     B     C
Predicted class A     77     5     9
                B     15    88     2
                C      8     7    89

Overall proportion of test patterns correctly classified is (77 + 88 + 89)/300 = 254/300 ≈ 0.85.
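As a small sketch of the bookkeeping described above: the first part builds a confusion matrix from label vectors, and the second recomputes the overall proportion correct from the matrix on the slide. The label vectors true_class and predicted are hypothetical, and the coding 1 = A, 2 = B, 3 = C is an assumption made for this example.

```matlab
% Hypothetical label vectors (1 = A, 2 = B, 3 = C) for six test patterns.
true_class = [1 1 2 2 3 3];
predicted  = [1 2 2 2 3 1];

% Element (r, c) counts the patterns of true class c that were classified as r.
C_small = accumarray([predicted(:), true_class(:)], 1, [3 3]);
disp(C_small);

% Confusion matrix from the slide: rows = predicted class, columns = true class.
C = [77  5  9;
     15 88  2;
      8  7 89];

n_correct = trace(C);        % sum of the leading diagonal: 77 + 88 + 89
n_total   = sum(C(:));       % number of test patterns
accuracy  = n_correct / n_total;
fprintf('%d / %d = %.4f\n', n_correct, n_total, accuracy);   % 254 / 300 = 0.8467
```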


Decision Regions

[Figure: decision regions for the 3-class example, plotted over x1 (horizontal axis) and x2 (vertical axis), each running from −8 to 8.]
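One way a decision-region plot of this kind might be produced is to classify every point of a dense grid and colour it by the winning class. The sketch below assumes equal priors and uses hypothetical Gaussian parameters, so the regions it draws are illustrative only and will not match the lecture's figure.

```matlab
% Hypothetical Gaussian parameters for three classes (illustrative values only).
mu    = {[-3 0], [2 3], [1 -3]};
Sigma = {[2 0.8; 0.8 1], eye(2), [1.5 -0.5; -0.5 2]};

% Grid covering the plotted range of x1 and x2.
[g1, g2] = meshgrid(-8:0.1:8, -8:0.1:8);
G = [g1(:), g2(:)];

% Log-likelihood of every grid point under each class Gaussian.
% The term D*log(2*pi) is the same for all classes, so it is left out.
ll = zeros(size(G, 1), numel(mu));
for k = 1:numel(mu)
    d = G - repmat(mu{k}, size(G, 1), 1);
    ll(:, k) = -0.5 * (sum((d / Sigma{k}) .* d, 2) + log(det(Sigma{k})));
end

% Equal priors: label each grid point with the most likely class.
[~, label] = max(ll, [], 2);
regions = reshape(label, size(g1));

% Draw the regions as a colour-coded image.
imagesc([-8 8], [-8 8], regions);
axis xy;
xlabel('x1'); ylabel('x2');
title('Decision regions (illustrative parameters)');
```

Because each class has its own covariance matrix, the boundaries between regions are in general curved (quadratic) rather than straight lines.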

Example: Classifying spoken vowels

10 spoken vowels in American English

Vowels can be characterised by formant frequencies: resonances of the vocal tract
  there are usually three or four identifiable formants
  the first two formants are written as F1 and F2

Peterson-Barney data: recordings of spoken vowels by American men, women, and children
  two examples of each vowel per person
  for this example, the data are split into training and test sets
  children's data are not used in this example
  different speakers in the training and test sets

(see http://en.wikipedia.org/wiki/Vowel for more)

Classify the data using a Gaussian classifier

Assume equal priors
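Why equal priors simplify the decision rule can be written out with Bayes' theorem for pdfs. The notation here (class c, prior P(c), class-conditional density p(x | c), K classes) is assumed for this sketch and may differ from the symbols used elsewhere in the lecture.

```latex
P(c \mid \mathbf{x})
  = \frac{p(\mathbf{x} \mid c)\, P(c)}{\sum_{k=1}^{K} p(\mathbf{x} \mid k)\, P(k)}
\quad\Longrightarrow\quad
\hat{c} = \arg\max_{c} P(c \mid \mathbf{x}) = \arg\max_{c} p(\mathbf{x} \mid c)
\quad \text{when } P(c) = \tfrac{1}{K} \text{ for all } c .
```

For the ten vowel classes K = 10, so each prior is 1/10 and the classifier simply picks the class whose Gaussian gives the highest density at the observed (F1, F2) pair.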


The data

Ten steady-state vowels, frequencies of F1 and F2 at their centre:

IY — “bee”

IH — “big”

EH — “red”

AE — “at”

AH — “honey”

AA — “heart”

AO — “frost”

UH — “could”

UW — “you”

ER — “bird”


Vowel data — 10 classes

[Figure: Peterson-Barney F1-F2 vowel training data; F1 / Hz (horizontal axis, 0 to 1200) against F2 / Hz (vertical axis, 500 to 3500), with a legend for the ten classes IY, IH, EH, AE, AH, AA, AO, UH, UW, ER.]

Gaussian for class 1 (IY)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 2 (IH)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 3 (EH)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 4 (AE)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 5 (AH)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 6 (AA)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 7 (AO)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 8 (UH)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 9 (UW)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussian for class 10 (ER)

[Figure: Peterson-Barney F1-F2 vowel training data with the Gaussian for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Gaussians for each class

[Figure: Gaussians for all ten classes, on a plot titled "Peterson-Barney F1-F2 Vowel Test Data"; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Test data for class 1 (IY)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 2 (IH)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 3 (EH)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 4 (AE)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 5 (AH)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 6 (AA)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 7 (AO)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 8 (UH)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 9 (UW)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]


Test data for class 10 (ER)

[Figure: Peterson-Barney F1-F2 vowel test data for this class; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Final confusion matrix

True class
            IY    IH    EH    AE    AH    AA    AO    UH    UW    ER
IY          20     0     0     0     0     0     0     0     0     0
IH           0    20     0     0     0     0     0     0     0     0
EH           0     0    15     3     0     0     0     0     0     0
AE           0     0     1    16     0     0     0     0     0     0
AH           0     0     0     1    18     2     0     2     0     0
AA           0     0     0     0     2    17     4     0     0     0
AO           0     0     0     0     0     1    16     0     0     0
UH           0     0     0     0     0     0     0    18     5     2
UW           0     0     0     0     0     0     0     0    15     0
ER           0     0     4     0     0     0     0     0     0    18
% corr.    100   100    75    80    90    85    80    90    75    90

Total: 86.5% correct
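The "% corr." row and the total can be recomputed from the matrix, and the off-diagonal entries show which vowels are confused most often. The sketch below assumes that rows are classifier outputs and columns are true classes, following the convention stated for the three-class example; the variable names are illustrative.

```matlab
vowels = {'IY','IH','EH','AE','AH','AA','AO','UH','UW','ER'};

% Final test-set confusion matrix from the slide
% (assumed layout: rows = classifier output, columns = true class).
C = [20  0  0  0  0  0  0  0  0  0;
      0 20  0  0  0  0  0  0  0  0;
      0  0 15  3  0  0  0  0  0  0;
      0  0  1 16  0  0  0  0  0  0;
      0  0  0  1 18  2  0  2  0  0;
      0  0  0  0  2 17  4  0  0  0;
      0  0  0  0  0  1 16  0  0  0;
      0  0  0  0  0  0  0 18  5  2;
      0  0  0  0  0  0  0  0 15  0;
      0  0  4  0  0  0  0  0  0 18];

% Per-class accuracy: diagonal entry divided by the column total.
pct = 100 * diag(C)' ./ sum(C, 1);
for k = 1:numel(vowels)
    fprintf('%s: %.0f%%\n', vowels{k}, pct(k));
end

% Overall accuracy: correctly classified patterns over all test patterns.
total_pct = 100 * trace(C) / sum(C(:));
fprintf('Total: %.1f%% correct\n', total_pct);

% Largest off-diagonal entry: the most frequent confusion.
offdiag = C - diag(diag(C));
[worst, idx] = max(offdiag(:));
[r, c] = ind2sub(size(C), idx);
fprintf('Most frequent confusion: true %s classified as %s (%d times)\n', ...
        vowels{c}, vowels{r}, worst);
```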


Classifying the training set

[Figure: classification of the Peterson-Barney F1-F2 vowel training data; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis), with a legend for the ten vowel classes.]

Training set confusion matrix

True class
            IY    IH    EH    AE    AH    AA    AO    UH    UW    ER
IY          99     8     0     0     0     0     0     0     0     0
IH           3    85    15     0     0     0     0     0     0     3
EH           0     7    69    11     0     0     0     0     0    11
AE           0     0     5    86     4     0     0     0     0     4
AH           0     0     0     3    87     8     3     2     0     1
AA           0     0     0     0     4    82    10     0     0     0
AO           0     0     0     0     5    12    86     2     0     0
UH           0     0     0     0     0     0     2    73    19    10
UW           0     0     0     0     0     0     1    15    79     1
ER           0     2    13     2     2     0     0    10     4    72
%         97.1  83.3  67.6  84.3  85.3  80.4  84.3  71.6  77.5  70.6

Total: 80.2% correct


Decision Regions

[Figure: Peterson-Barney F1-F2 Gaussian decision regions; F1 / Hz (horizontal axis) against F2 / Hz (vertical axis).]

Summary

Using Bayes’ theorem with pdfs

The Gaussian classifier: 1-dimensional and multi-dimensional

Vowel classification example
