Multidimensional Gaussian distribution and classification with Gaussians
Guido Sanguinetti
Informatics 2B: Learning and Data — Lecture 9, 6 March 2012
Overview
Today’s lecture
Gaussians
The multidimensional Gaussian distribution
Bayes' theorem and probability density functions
The Gaussian classifier
(One-dimensional) Gaussian distribution
One-dimensional Gaussian with zero mean and unit variance (µ = 0, σ² = 1):

[Figure: pdf of the Gaussian distribution with mean 0 and variance 1, plotted for x from −4 to 4.]

$$p(x \mid \mu, \sigma^2) = N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$
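The pdf above can be evaluated directly; a small sketch (standard library only, not part of the original slides):

```python
import math

def gaussian_pdf(x, mu=0.0, var=1.0):
    """Evaluate the univariate Gaussian pdf N(x; mu, sigma^2)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# At the mean of a standard Gaussian the density is 1/sqrt(2*pi), roughly 0.3989,
# matching the peak of the curve in the figure.
print(gaussian_pdf(0.0))
```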
The multidimensional Gaussian distribution
The d-dimensional vector x is multivariate Gaussian if it has a probability density function of the following form:

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

The pdf is parameterized by the mean vector µ and the covariance matrix Σ.

The 1-dimensional Gaussian is a special case of this pdf.

The argument to the exponential, −½(x − µ)ᵀΣ⁻¹(x − µ), is referred to as a quadratic form.
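An illustrative sketch of evaluating this density (assuming numpy; not from the slides):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the d-dimensional Gaussian pdf p(x | mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    # The quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed with a
    # linear solve rather than an explicit matrix inverse.
    quad = diff @ np.linalg.solve(Sigma, diff)
    norm = (2.0 * np.pi) ** (d / 2.0) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# With d = 1, mu = 0, Sigma = [[1]] this reduces to the one-dimensional
# standard Gaussian, whose density at 0 is 1/sqrt(2*pi).
print(mvn_pdf(np.array([0.0]), np.array([0.0]), np.array([[1.0]])))
```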
Covariance matrix
The mean vector µ is the expectation of x:

$$\boldsymbol{\mu} = E[\mathbf{x}]$$

The covariance matrix Σ is the expectation of the deviation of x from the mean:

$$\Sigma = E[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]$$

Σ is a d × d symmetric matrix:

$$\Sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)] = E[(x_j - \mu_j)(x_i - \mu_i)] = \Sigma_{ji}$$

The sign of the covariance helps to determine the relationship between two components:

If x_j is large when x_i is large, then (x_j − µ_j)(x_i − µ_i) will tend to be positive.

If x_j is small when x_i is large, then (x_j − µ_j)(x_i − µ_i) will tend to be negative.
Correlation matrix
The covariance matrix is not scale-independent. Define the correlation coefficient:

$$\rho(x_j, x_k) = \rho_{jk} = \frac{S_{jk}}{\sqrt{S_{jj} S_{kk}}}$$

It is scale-independent (i.e. independent of the measurement units) and location-independent, i.e.:

$$\rho(x_j, x_k) = \rho(a x_j + b,\; s x_k + t) \quad (a, s > 0)$$

The correlation coefficient satisfies −1 ≤ ρ ≤ 1, and

$$\rho(x, y) = +1 \;\text{ if }\; y = ax + b,\; a > 0$$
$$\rho(x, y) = -1 \;\text{ if }\; y = ax + b,\; a < 0$$
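A quick numerical check of this invariance (a sketch assuming numpy, with synthetic data; not from the slides):

```python
import numpy as np

def corr(a, b):
    """Correlation coefficient rho = S_ab / sqrt(S_aa * S_bb)."""
    S = np.cov(a, b)
    return S[0, 1] / np.sqrt(S[0, 0] * S[1, 1])

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

r1 = corr(x, y)
# Positive rescaling and shifting of either variable leaves rho unchanged.
r2 = corr(3.0 * x + 5.0, 0.1 * y - 2.0)
print(r1, r2)
```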
Spherical Gaussian
[Figure: surface and contour plots of p(x1, x2) for x1, x2 from −2 to 2.]

$$\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad \rho_{12} = 0$$
Diagonal Covariance Gaussian
[Figure: surface and contour plots of p(x1, x2) for x1, x2 from −4 to 4.]

$$\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix} \qquad \rho_{12} = 0$$
Full covariance Gaussian
[Figure: surface and contour plots of p(x1, x2) for x1, x2 from −4 to 4.]

$$\boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \Sigma = \begin{pmatrix} 1 & -1 \\ -1 & 4 \end{pmatrix} \qquad \rho_{12} = -0.5$$
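The quoted ρ12 follows from Σ via the correlation coefficient definition; a one-line check (assuming numpy):

```python
import numpy as np

# Covariance matrix of the full-covariance example.
Sigma = np.array([[1.0, -1.0],
                  [-1.0, 4.0]])

# rho_12 = Sigma_12 / sqrt(Sigma_11 * Sigma_22)
rho12 = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(rho12)  # -0.5
```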
Parameter estimation
It is possible to show that the mean vector µ̂ and covariance matrix Σ̂ that maximize the likelihood of the training data are given by:

$$\hat{\boldsymbol{\mu}} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n$$

$$\hat{\Sigma} = \frac{1}{N} \sum_{n=1}^{N} (\mathbf{x}_n - \hat{\boldsymbol{\mu}})(\mathbf{x}_n - \hat{\boldsymbol{\mu}})^T$$

The mean of the distribution is estimated by the sample mean and the covariance by the sample covariance.
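These estimators are straightforward to compute; a sketch (assuming numpy, with synthetic data) that also checks the result against numpy's biased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=[1.0, -2.0], size=(1000, 2))  # N = 1000 samples, d = 2

N = X.shape[0]
mu_hat = X.mean(axis=0)              # sample mean
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N        # ML estimate: divide by N, not N - 1
print(mu_hat)
print(Sigma_hat)
```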
Example data
[Figure: scatter plot of the example data; X1 from −4 to 10, X2 from −5 to 10.]
Maximum likelihood fit to a Gaussian
[Figure: the same data with the maximum-likelihood Gaussian fit superimposed.]
Bayes' theorem and probability densities
Rules for probability densities are similar to those for probabilities:

$$p(x, y) = p(x \mid y)\, p(y)$$

$$p(x) = \int p(x, y)\, dy$$

We may mix probabilities of discrete variables and probability densities of continuous variables:

$$p(x, Z) = p(x \mid Z)\, P(Z)$$

Bayes' theorem for continuous data x and class C:

$$P(C \mid x) = \frac{p(x \mid C)\, P(C)}{p(x)}, \qquad P(C \mid x) \propto p(x \mid C)\, P(C)$$
Bayes' theorem and univariate Gaussians
If p(x | C) is Gaussian with mean µ_c and variance σ_c²:

$$P(C \mid x) \propto p(x \mid C)\, P(C) \propto N(x; \mu_c, \sigma_c^2)\, P(C) \propto \frac{1}{\sqrt{2\pi\sigma_c^2}} \exp\left( -\frac{(x - \mu_c)^2}{2\sigma_c^2} \right) P(C)$$

Taking logs, we have the log likelihood LL(x | C):

$$LL(x \mid C) = \ln p(x \mid \mu_c, \sigma_c^2) = \frac{1}{2}\left( -\ln(2\pi) - \ln\sigma_c^2 - \frac{(x - \mu_c)^2}{\sigma_c^2} \right)$$

The log posterior probability LP(C | x) is:

$$LP(C \mid x) \propto LL(x \mid C) + \ln P(C) \propto \frac{1}{2}\left( -\ln(2\pi) - \ln\sigma_c^2 - \frac{(x - \mu_c)^2}{\sigma_c^2} \right) + \ln P(C)$$
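In code, these two quantities are one-liners; a sketch (standard library only, not from the slides):

```python
import math

def log_likelihood(x, mu, var):
    """LL(x | C) = ln N(x; mu_c, sigma_c^2)."""
    return 0.5 * (-math.log(2.0 * math.pi) - math.log(var) - (x - mu) ** 2 / var)

def log_posterior(x, mu, var, prior):
    """LP(C | x), up to an additive constant: LL(x | C) + ln P(C)."""
    return log_likelihood(x, mu, var) + math.log(prior)

# Exponentiating the log likelihood recovers the Gaussian pdf value.
print(math.exp(log_likelihood(0.0, 0.0, 1.0)))
```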
Example: 1-dimensional Gaussian classifier
Two classes, S and T, with some observations:

Class S: 10 8 10 10 11 11
Class T: 12 9 15 10 13 13

Assume that each class may be modelled by a Gaussian. The mean and variance of each pdf are estimated by the sample mean and sample variance:

µ(S) = 10, σ²(S) = 1
µ(T) = 12, σ²(T) = 4
Gaussian pdfs for S and T
[Figure: the class-conditional Gaussian pdfs p(x | S) and p(x | T) plotted for x from 0 to 25.]
Example: 1-dimensional Gaussian classifier
Two classes, S and T, with some observations:

Class S: 10 8 10 10 11 11
Class T: 12 9 15 10 13 13

Assume that each class may be modelled by a Gaussian. The mean and variance of each pdf are estimated by the sample mean and sample variance:

µ(S) = 10, σ²(S) = 1
µ(T) = 12, σ²(T) = 4

The following unlabelled data points are available:

x1 = 10, x2 = 11, x3 = 6

To which class should each of the data points be assigned? Assume the two classes have equal prior probabilities.
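With equal priors the question reduces to comparing the class likelihoods p(x|S) and p(x|T); a quick check (standard library only, not part of the slides):

```python
import math

def gauss(x, mu, var):
    """Univariate Gaussian pdf N(x; mu, var)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# mu(S) = 10, var(S) = 1; mu(T) = 12, var(T) = 4; equal priors,
# so assign each point to the class with the larger likelihood.
for x in (10, 11, 6):
    pS, pT = gauss(x, 10.0, 1.0), gauss(x, 12.0, 4.0)
    print(x, "S" if pS > pT else "T")
# x1 = 10 and x2 = 11 are assigned to S; x3 = 6 is assigned to T.
```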
Log odds
Take the log odds (ratio of posterior probabilities):

$$\ln\frac{P(S \mid X = x)}{P(T \mid X = x)} = -\frac{1}{2}\left( \frac{(x - \mu_S)^2}{\sigma_S^2} - \frac{(x - \mu_T)^2}{\sigma_T^2} + \ln\sigma_S^2 - \ln\sigma_T^2 \right) + \ln P(S) - \ln P(T)$$

In the example the priors are equal, so:

$$\ln\frac{P(S \mid X = x)}{P(T \mid X = x)} = -\frac{1}{2}\left( \frac{(x - \mu_S)^2}{\sigma_S^2} - \frac{(x - \mu_T)^2}{\sigma_T^2} + \ln\sigma_S^2 - \ln\sigma_T^2 \right) = -\frac{1}{2}\left( (x - 10)^2 - \frac{(x - 12)^2}{4} - \ln 4 \right)$$

If the log odds are less than 0, assign to T; otherwise assign to S.
Log odds
[Figure: the log odds ln P(S|x)/P(T|x) plotted for x from 5 to 15.]
Example: unequal priors
Now assume P(S) = 0.3 and P(T) = 0.7. Including this prior information, to which class should each of the above test data points (x1, x2, x3) be assigned?

Again compute the log odds:

$$\ln\frac{P(S \mid X = x)}{P(T \mid X = x)} = -\frac{1}{2}\left( \frac{(x - \mu_S)^2}{\sigma_S^2} - \frac{(x - \mu_T)^2}{\sigma_T^2} + \ln\sigma_S^2 - \ln\sigma_T^2 \right) + \ln P(S) - \ln P(T)$$

$$= -\frac{1}{2}\left( (x - 10)^2 - \frac{(x - 12)^2}{4} - \ln 4 \right) + \ln P(S) - \ln P(T) = -\frac{1}{2}\left( (x - 10)^2 - \frac{(x - 12)^2}{4} - \ln 4 \right) + \ln(3/7)$$
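The prior only adds the constant ln(3/7) ≈ −0.85 to the log odds, but that is enough to flip borderline points; a sketch (standard library only, not part of the slides):

```python
import math

def log_odds(x, prior_S=0.5):
    """ln P(S|x)/P(T|x) for the running example (mu_S=10, var_S=1, mu_T=12, var_T=4)."""
    prior_term = math.log(prior_S) - math.log(1.0 - prior_S)
    return -0.5 * ((x - 10.0) ** 2 - (x - 12.0) ** 2 / 4.0 - math.log(4.0)) + prior_term

# With P(S) = 0.3, x2 = 11 switches from S to T; x1 = 10 stays with S.
for x in (10, 11, 6):
    print(x, "S" if log_odds(x, prior_S=0.3) > 0.0 else "T")
```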
Log odds
[Figure: the log odds curves for P(S) = 0.5 and P(S) = 0.3, plotted for x from 5 to 15.]
Multivariate Gaussian classifier
Multivariate Gaussian (in d dimensions):

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

Log likelihood:

$$LL(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = -\frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

If p(x | C) ∼ p(x | µ, Σ), the log posterior probability is:

$$\ln P(C \mid \mathbf{x}) \propto -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) - \frac{1}{2}\ln|\Sigma| + \ln P(C)$$
Example
2-dimensional data from three classes (A, B, C).

The classes have equal prior probabilities.

200 points in each class.

Load into Matlab (n × 2 matrices, each row is a data point) and display using a scatter plot:

xa = load('trainA.dat');
xb = load('trainB.dat');
xc = load('trainC.dat');
hold on;
scatter(xa(:,1), xa(:,2), 'r', 'o');
scatter(xb(:,1), xb(:,2), 'b', 'x');
scatter(xc(:,1), xc(:,2), 'c', '*');
Training data
[Figure: scatter plot of the training data for the three classes; x1 from −8 to 6, x2 from −8 to 4.]
Gaussians estimated from training data
[Figure: the training data with the estimated Gaussian for each class superimposed.]
Testing data

[Figure: scatter plot of the test data.]
Testing data — with estimated class distributions
[Figure: the test data with the estimated class distributions superimposed.]
Testing data — with true classes indicated
[Figure: the test data with the true classes indicated.]
Classifying test data from class A

[Figure: test points from class A shown against the estimated class distributions.]
Classifying test data from class B
[Figure: test points from class B shown against the estimated class distributions; x1 from −8 to 6, x2 from −8 to 4.]
Classifying test data from class C

[Figure: test points from class C shown against the estimated class distributions.]
Results
Analyze results by percent correct, and in more detail with a confusion matrix:

Rows of a confusion matrix correspond to the predicted classes (classifier outputs).
Columns correspond to the true class labels.
Element (r, c) is the number of patterns from true class c that were classified as class r.
The total number of correctly classified patterns is obtained by summing the numbers on the leading diagonal.

Confusion matrix in this case:

                   True class
  Predicted        A    B    C
  class       A   77    5    9
              B   15   88    2
              C    8    7   89

The overall proportion of test patterns correctly classified is (77 + 88 + 89)/300 = 254/300 ≈ 0.85.
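The bookkeeping above is easy to mechanise; a sketch (assuming numpy) using the confusion matrix from the slide:

```python
import numpy as np

# Rows = predicted class (A, B, C); columns = true class.
conf = np.array([[77,  5,  9],
                 [15, 88,  2],
                 [ 8,  7, 89]])

correct = int(np.trace(conf))   # sum of the leading diagonal
total = int(conf.sum())
print(correct, total, correct / total)  # 254 300 0.8466...
```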
Decision Regions
[Figure: decision regions for the 3-class example; x1 and x2 from −8 to 8.]
Example: Classifying spoken vowels
10 spoken vowels in American English.

Vowels can be characterised by formant frequencies (resonances of the vocal tract):
there are usually three or four identifiable formants;
the first two formants are written as F1 and F2.

Peterson–Barney data: recordings of spoken vowels by American men, women, and children:
two examples of each vowel per person;
for this example, the data are split into training and test sets;
children's data are not used in this example;
different speakers in the training and test sets.

(See http://en.wikipedia.org/wiki/Vowel for more.)

Classify the data using a Gaussian classifier.

Assume equal priors.
The data
Ten steady-state vowels, frequencies of F1 and F2 at their centre:
IY — “bee”
IH — “big”
EH — “red”
AE — “at”
AH — “honey”
AA — “heart”
AO — “frost”
UH — “could”
UW — “you”
ER — “bird”
Vowel data — 10 classes
[Figure: scatter plot of the Peterson–Barney F1–F2 vowel training data; F1 from 0 to 1200 Hz, F2 from 500 to 3500 Hz, with points labelled by the ten vowel classes IY, IH, EH, AE, AH, AA, AO, UH, UW, ER.]
Gaussian for class 1 (IY)
[Figure: the vowel training data with the fitted Gaussian for class IY superimposed.]
Gaussian for class 2 (IH)
[Figure: the vowel training data with the fitted Gaussian for class IH superimposed.]
Gaussian for class 3 (EH)
[Figure: the vowel training data with the fitted Gaussian for class EH superimposed.]
Gaussian for class 4 (AE)
[Figure: the vowel training data with the fitted Gaussian for class AE superimposed.]
Gaussian for class 5 (AH)
[Figure: the vowel training data with the fitted Gaussian for class AH superimposed.]
Gaussian for class 6 (AA)
[Figure: the vowel training data with the fitted Gaussian for class AA superimposed.]
Gaussian for class 7 (AO)
[Figure: the vowel training data with the fitted Gaussian for class AO superimposed.]
Gaussian for class 8 (UH)
[Figure: the vowel training data with the fitted Gaussian for class UH superimposed.]
Gaussian for class 9 (UW)
[Figure: the vowel training data with the fitted Gaussian for class UW superimposed.]
Gaussian for class 10 (ER)
[Figure: the vowel training data with the fitted Gaussian for class ER superimposed.]
Gaussians for each class
[Figure: all ten fitted class Gaussians on the F1 / Hz vs F2 / Hz plane (Peterson−Barney vowel data)]
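Not from the lecture itself — a minimal numpy sketch of the step the slides above illustrate: fitting one Gaussian per class by maximum likelihood (sample mean and covariance of that class's training vectors), and evaluating the log of the multivariate Gaussian pdf from slide 4.

```python
import numpy as np

def fit_gaussian(X):
    """ML estimates for rows of X: mean vector and covariance matrix
    (divide by N, not N-1, for the maximum-likelihood estimate)."""
    mu = X.mean(axis=0)
    diff = X - mu
    sigma = diff.T @ diff / X.shape[0]
    return mu, sigma

def log_gaussian_pdf(x, mu, sigma):
    """log N(x; mu, sigma) for a d-dimensional Gaussian."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)      # the quadratic form
    logdet = np.linalg.slogdet(sigma)[1]            # log |sigma|
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

For the vowel data, `fit_gaussian` would be called once per class on that class's (F1, F2) training pairs, giving the ten ellipses shown above.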
Test data for class 1 (IY)
[Figure: Peterson−Barney F1−F2 vowel test data for class 1 (IY), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY
IY       20
IH        0
EH        0
AE        0
AH        0
AA        0
AO        0
UH        0
UW        0
ER        0
% corr. 100
Test data for class 2 (IH)
[Figure: Peterson−Barney F1−F2 vowel test data for class 2 (IH), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH
IY       20    0
IH        0   20
EH        0    0
AE        0    0
AH        0    0
AA        0    0
AO        0    0
UH        0    0
UW        0    0
ER        0    0
% corr. 100  100
Test data for class 3 (EH)
[Figure: Peterson−Barney F1−F2 vowel test data for class 3 (EH), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH
IY       20    0    0
IH        0   20    0
EH        0    0   15
AE        0    0    1
AH        0    0    0
AA        0    0    0
AO        0    0    0
UH        0    0    0
UW        0    0    0
ER        0    0    4
% corr. 100  100   75
Test data for class 4 (AE)
[Figure: Peterson−Barney F1−F2 vowel test data for class 4 (AE), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH   AE
IY       20    0    0    0
IH        0   20    0    0
EH        0    0   15    3
AE        0    0    1   16
AH        0    0    0    1
AA        0    0    0    0
AO        0    0    0    0
UH        0    0    0    0
UW        0    0    0    0
ER        0    0    4    0
% corr. 100  100   75   80
Test data for class 5 (AH)
[Figure: Peterson−Barney F1−F2 vowel test data for class 5 (AH), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH   AE   AH
IY       20    0    0    0    0
IH        0   20    0    0    0
EH        0    0   15    3    0
AE        0    0    1   16    0
AH        0    0    0    1   18
AA        0    0    0    0    2
AO        0    0    0    0    0
UH        0    0    0    0    0
UW        0    0    0    0    0
ER        0    0    4    0    0
% corr. 100  100   75   80   90
Test data for class 6 (AA)
[Figure: Peterson−Barney F1−F2 vowel test data for class 6 (AA), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH   AE   AH   AA
IY       20    0    0    0    0    0
IH        0   20    0    0    0    0
EH        0    0   15    3    0    0
AE        0    0    1   16    0    0
AH        0    0    0    1   18    2
AA        0    0    0    0    2   17
AO        0    0    0    0    0    1
UH        0    0    0    0    0    0
UW        0    0    0    0    0    0
ER        0    0    4    0    0    0
% corr. 100  100   75   80   90   85
Test data for class 7 (AO)
[Figure: Peterson−Barney F1−F2 vowel test data for class 7 (AO), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH   AE   AH   AA   AO
IY       20    0    0    0    0    0    0
IH        0   20    0    0    0    0    0
EH        0    0   15    3    0    0    0
AE        0    0    1   16    0    0    0
AH        0    0    0    1   18    2    0
AA        0    0    0    0    2   17    4
AO        0    0    0    0    0    1   16
UH        0    0    0    0    0    0    0
UW        0    0    0    0    0    0    0
ER        0    0    4    0    0    0    0
% corr. 100  100   75   80   90   85   80
Test data for class 8 (UH)
[Figure: Peterson−Barney F1−F2 vowel test data for class 8 (UH), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH   AE   AH   AA   AO   UH
IY       20    0    0    0    0    0    0    0
IH        0   20    0    0    0    0    0    0
EH        0    0   15    3    0    0    0    0
AE        0    0    1   16    0    0    0    0
AH        0    0    0    1   18    2    0    2
AA        0    0    0    0    2   17    4    0
AO        0    0    0    0    0    1   16    0
UH        0    0    0    0    0    0    0   18
UW        0    0    0    0    0    0    0    0
ER        0    0    4    0    0    0    0    0
% corr. 100  100   75   80   90   85   80   90
Test data for class 9 (UW)
[Figure: Peterson−Barney F1−F2 vowel test data for class 9 (UW), F1 / Hz vs F2 / Hz]
Confusion matrix
        True class
         IY   IH   EH   AE   AH   AA   AO   UH   UW
IY       20    0    0    0    0    0    0    0    0
IH        0   20    0    0    0    0    0    0    0
EH        0    0   15    3    0    0    0    0    0
AE        0    0    1   16    0    0    0    0    0
AH        0    0    0    1   18    2    0    2    0
AA        0    0    0    0    2   17    4    0    0
AO        0    0    0    0    0    1   16    0    0
UH        0    0    0    0    0    0    0   18    5
UW        0    0    0    0    0    0    0    0   15
ER        0    0    4    0    0    0    0    0    0
% corr. 100  100   75   80   90   85   80   90   75
Test data for class 10 (ER)
[Figure: Peterson−Barney F1−F2 vowel test data for class 10 (ER), F1 / Hz vs F2 / Hz]
Final confusion matrix
        True class
         IY   IH   EH   AE   AH   AA   AO   UH   UW   ER
IY       20    0    0    0    0    0    0    0    0    0
IH        0   20    0    0    0    0    0    0    0    0
EH        0    0   15    3    0    0    0    0    0    0
AE        0    0    1   16    0    0    0    0    0    0
AH        0    0    0    1   18    2    0    2    0    0
AA        0    0    0    0    2   17    4    0    0    0
AO        0    0    0    0    0    1   16    0    0    0
UH        0    0    0    0    0    0    0   18    5    2
UW        0    0    0    0    0    0    0    0   15    0
ER        0    0    4    0    0    0    0    0    0   18
% corr. 100  100   75   80   90   85   80   90   75   90
Total: 86.5% correct
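The matrices above can be tallied mechanically: each column is a true class, each row the class the Gaussian classifier assigned, and the percent correct is the trace over the total. A small sketch (not the lecture's code; labels are integer class indices):

```python
import numpy as np

def confusion_matrix(true_labels, pred_labels, n_classes):
    """counts[p, t] = number of class-t examples classified as class p."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        counts[p, t] += 1
    return counts

def percent_correct(counts):
    """Diagonal entries are correct classifications."""
    return 100.0 * np.trace(counts) / counts.sum()
```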
Classifying the training set
[Figure: Peterson−Barney F1−F2 vowel training data, F1 / Hz vs F2 / Hz, classes IY, IH, EH, AE, AH, AA, AO, UH, UW, ER]
Training set confusion matrix
         True class
          IY    IH    EH    AE    AH    AA    AO    UH    UW    ER
IY        99     8     0     0     0     0     0     0     0     0
IH         3    85    15     0     0     0     0     0     0     3
EH         0     7    69    11     0     0     0     0     0    11
AE         0     0     5    86     4     0     0     0     0     4
AH         0     0     0     3    87     8     3     2     0     1
AA         0     0     0     0     4    82    10     0     0     0
AO         0     0     0     0     5    12    86     2     0     0
UH         0     0     0     0     0     0     2    73    19    10
UW         0     0     0     0     0     0     1    15    79     1
ER         0     2    13     2     2     0     0    10     4    72
% corr. 97.1  83.3  67.6  84.3  85.3  80.4  84.3  71.6  77.5  70.6
Total: 80.2% correct
Decision Regions
[Figure: Peterson−Barney F1−F2 Gaussian decision regions, F1 / Hz vs F2 / Hz]
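A decision-region plot like the one above comes from evaluating the Bayes numerator log p(x | c) + log P(c) for every class at each point of a grid and colouring by the argmax. A minimal sketch of the per-point decision rule, assuming equal priors so the prior term drops out (illustrative means and covariances, not the lecture's fitted values):

```python
import numpy as np

def classify(x, means, covs):
    """Assign x to the class with the largest log Gaussian likelihood.
    With equal priors, terms constant across classes can be dropped."""
    scores = []
    for mu, sigma in zip(means, covs):
        diff = x - mu
        quad = diff @ np.linalg.solve(sigma, diff)  # quadratic form
        logdet = np.linalg.slogdet(sigma)[1]
        scores.append(-0.5 * (quad + logdet))
    return int(np.argmax(scores))
```

Sweeping `classify` over a grid of (F1, F2) values and shading each cell by the returned class index reproduces a decision-region picture of this kind.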
Summary
Using Bayes’ theorem with pdfs
The Gaussian classifier: 1-dimensional and multi-dimensional
Vowel classification example