Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s...

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Student’s T Distribution

Prof. Nicholas Zabaras

School of Engineering

University of Warwick

Coventry CV4 7AL

United Kingdom

Email: [email protected]

URL: http://www.zabaras.com/

August 7, 2014

1

mailto:[email protected]

http://mpdc.mae.cornell.edu/


Gamma Distribution as a Conjugate prior for the precision of a Gaussian

Student’s T, Student’s T Approaching a Gaussian

Robustness of Student’s T to Outliers

Multivariate Student’s T

The Laplace Distribution

2

Contents

• Following closely Chris Bishops’ PRML book, Chapter 2

• Kevin Murphy’s, Machine Learning: A probablistic perspective, Chapter 2

http://research.microsoft.com/en-us/um/people/cmbishop/PRML/index.htm

http://www.cs.ubc.ca/~murphyk/MLbook/





1/2

1/2 1

0

1/2

1/2 1

1/2 1 1

0

1( | , , ) exp

( ) 2

1 1exp

( ) 2

aa

aa

a

bp x a b z d

a

bz z z dz

a A

Gamma as a Conjugate Prior

3

We have seen that the conjugate prior for the precision of a Gaussian

is given by a Gamma distribution. If we have a univariate Gaussian

N(x|μ, τ−1) together with a prior Gamma(τ |a, b) and we integrate out the

precision, we obtain the marginal distribution of x

Introduce the transformation to simplify as:

1

0

1/22 1

0

( | , , ) | , | ,

exp2 2 ( )

aa b

p x a b x a b d

bx e d

a

N Gamma

21

2

A

z b x

http://www.zabaras.com/Courses/BayesianComputing/GaussianModels-BayesianInference.pdf



4

Recalling the definition of the Gamma function:

It is common to redefine the parameters in this distribution as:

1/2

1/2 1

1/2 1 1

0

1/2 1/22 1/2

0

1 1( | , , ) exp

( ) 2

1 1exp

( ) 2 2

aa

a

aaa

bp x a b z z z dz

a A

bb x z z dz

a

1

0

( ) exp aa z z dz

1/2 1/2

21 1 1( | , , ) ( )

( ) 2 2 2

aabp x a b b x a

a

2 ,a

ab

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

http://mathworld.wolfram.com/GammaFunction.html



5

The parameter λ is called the precision of the t-distribution, even

though it is not in general equal to the inverse of the variance (see

below on behavior as ν →∞).

The parameter ν is called the degrees of freedom.

For the particular case of ν = 1, the T-distribution reduces to the

Cauchy distribution.

In the limit ν →∞, the t-distribution T(x|μ, λ, ν) becomes a Gaussian

N(x|μ, λ−1) with mean μ and precision λ.

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

http://en.wikipedia.org/wiki/Cauchy_distribution


For ν →∞, T(x|μ, λ, ν) Becomes a Gaussian

6

We first write the distribution as follows:

For large n we can approximate the log as follows:

In the limit ν →∞, the T-distribution T(x|μ, λ, ν) is indeed a Gaussian

N(x|μ, λ−1) with mean μ and precision λ. The normalization of the T is

valid in this limit as well (so the Gaussian obtained is normalized).

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

/2 1/2

2 2

1( | , , ) 1 exp ln 1

2

x xx

T

2 2

2 11( | , , ) exp ( ) exp ( )

2 2

x xx O O

T



7

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

-5 -4 -3 -2 -1 0 1 2 3 4 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4student distribution

v=10

v=1.0

v=0.1

1

0, 1

,

( , )

For we

obtain

n

N

MatLab Code

2

2

: , 1

:

:2

, 22

Mean

Mode

Var

http://www.zabaras.com/Courses/BayesianComputing/Software/Fig15Chapter2Bishop.m


We plot:

The mean and variance of the

Student’s is undefined for n=1.

Logs of the PDFs. The Student’s

is NOT log concave.

Run MatLab function studentLaplacePdfPlot

from Kevin Murphys’ PMTK

When n=1, the distribution is

known as Cauchy or Lorentz.

Due to its heavy tails, the mean

does not converge.

Recommended to use n=4.

Student’s T Vs the Gaussian

8

-4 -3 -2 -1 0 1 2 3 40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8prob. density functions

Gauss

Student

Laplace

-4 -3 -2 -1 0 1 2 3 4-9

-8

-7

-6

-5

-4

-3

-2

-1

0log prob. density functions

Gauss

Student

Laplace

| 0,1 , ( | 0,1,1), | 0,1/ 2x x xN T Lap

http://pmtk3.googlecode.com/svn/trunk/demos/studentLaplacePdfPlot.m

http://pmtk3.googlecode.com/svn/trunk/demos/studentLaplacePdfPlot.m

https://code.google.com/p/pmtk3/



http://en.wikipedia.org/wiki/Cauchy_distribution



9

The Student’s T distribution can be seen from the equation

above is a mixture of infinite Gaussian each of them

with different precision.

The result is a distribution that in general has longer ‘tails’

than a Gaussian.

This gives the T-distribution robustness, i.e. the T-

distribution is much less sensitive than the Gaussian to the

presence of outliers.

1

0

1/22 1

0

( | , , ) | , | ,

exp2 2 ( )

aa b

p x a b x a b d

bx e d

a

N Gamma


-10 -8 -6 -4 -2 0 2 4 6 8 10 120

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Robustness of Student’s T Distribution

10

The robustness of the T-distribution is illustrated here by comparing

the maximum likelihood solutions for a Gaussian and a T-distribution

(30 data points from the Gaussian are used). The effect of a small

number of outliers is less significant for the T-distribution than for the

Gaussian.

-10 -8 -6 -4 -2 0 2 4 6 8 10 120

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

t-distribution &

Gaussian nearly

overlap

T-distribution

Gaussian

MatLab Code

http://www.zabaras.com/Courses/BayesianComputing/Software/Fig16Chapter2Bishop.m


Robustness of Student’s T Distribution

11

The earlier simulation is repeated here with the MPTK toolbox.

-5 0 5 100

0.1

0.2

0.3

0.4

0.5

gaussian

student T

laplace

-5 0 5 100

0.1

0.2

0.3

0.4

0.5

gaussian

student T

laplace

Run MatLab function robustDemo

from Kevin Murphys’ PMTK

http://searchcode.com/?q=robustDemo





Multivariate Student’s T Distribution

12

If we return to the prior above and substitute ,

and use

we can write the Student’s T distribution as :

This form is useful in providing generalization to a multivatiate Student’s

T

1

0

( | , , ) | , | ,p x a b x a b d

N Gamma

2 , , /a

a b ab

1

0

( | , , ) | , | / 2, / 2x x d

T N Gamma

1

0

( | , , ) | , | / 2, / 2 d

x x T N Gamma

1| ,( )

aa bb

a b ea

Gamma



13

This integral can be computed analytically as:

One can derive the above form of the distribution by substitution in the

Eq. on the top.

Normalization proof is immediate from the normalization of the normal &

Gamma distributions

1

0

( | , , ) | , | / 2, / 2 d

x x T N Gamma

/2 /21/2 2

/2

2

( )| |2 2( | , , ) 1

( )2

( )

D

D

T

D

x

x - x -

T

Mahalanobis Distance

2

1/2/2

/2 /2 1 /2 /2

/2

0

1/2 1/2/2/2 /2 /2 /2

2 /2 /2 1 2

/2 /2

0

/ 2( | , , )

/ 2 2

/ 2 / 2 / 2/ 2 / 2 1 /

/ 2 / 22

D

D

D DD

D D

e e d

de d

x

T 2/ 2 / 2Use



14

Some useful results of the multivatiate Student’s T are given below:

One can show easily the expression for the mean by using x=z+:

The 1st term drops out since is even. The 2nd term gives

from the normalization of the distribution.

The covariance is computed as:

/2 /21/2 2

/2

( )| |2 2( | , , ) 1

( )2

D

D

D

x

T

11, 2,2

if cov if mode

x x x

/2 /2

1/2 2

/2

( )| |2 2 1

( )2

D

D

D

d

x z z

( | , , )z T

/2

1 1 /2 1 /2

0 0

/2

1 1 1 1

/2 /2 1

/ 2cov | , | / 2, / 2

/ 2

/ 2 / 2 1 / 2 / 2 1 / 2

/ 2 / 2 / 2 1 2/ 2 / 2

Td d e d

x

x x x - x - x

N Gamma



15

Differentiation with respect to x also shows the mode being :

The Student’s T has fatter tails than a Gaussian. The smaller n is the

fatter the tails.

For n ∞, the distribution approaches a Gaussian. Indeed note that:

The distribution can also be written in terms of S=-1 (scale matrix – not

the covariance) or V=nS.

/2 /21/2 2

/2

( )| |2 2( | , , ) 1

( )2

D

D

D

x

T

11, 2,2

if cov if mode

x x x

/2 /2 2

2 2 2 2 211

1 exp ln 1 exp exp2 2 2 2 2

D

DO

http://en.wikipedia.org/wiki/Natural_logarithm


The Laplace Distribution Another distribution with heavy tails is the Laplace distribution, also

known as the double sided exponential distribution. It has the following

pdf:

is a location parameter and b > 0 is a scale parameter

Its robust to outliers (see an earlier demonstration).

It puts mores probability density at 0 than the Gaussian. This property

is a useful way to encourage sparsity in a model.

1

| ,2

x

bx b eb

Lap

2, , 2Mean Mode Var b

16

http://en.wikipedia.org/wiki/Laplace_distribution

Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s...

Documents

Transcript of Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s...