Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s...

16
Bayesian Scientific Computing, Spring 2013 (N. Zabaras) Student’s T Distribution Prof. Nicholas Zabaras School of Engineering University of Warwick Coventry CV4 7AL United Kingdom Email: [email protected] URL: http://www.zabaras.com/ August 7, 2014 1

Transcript of Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s...

Page 1: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Student’s T Distribution

Prof. Nicholas Zabaras

School of Engineering

University of Warwick

Coventry CV4 7AL

United Kingdom

Email: [email protected]

URL: http://www.zabaras.com/

August 7, 2014

1

Page 2: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Gamma Distribution as a Conjugate prior for the precision of a Gaussian

Student’s T, Student’s T Approaching a Gaussian

Robustness of Student’s T to Outliers

Multivariate Student’s T

The Laplace Distribution

2

Contents

• Following closely Chris Bishops’ PRML book, Chapter 2

• Kevin Murphy’s, Machine Learning: A probablistic perspective, Chapter 2

Page 3: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

1/2

1/2 1

0

1/2

1/2 1

1/2 1 1

0

1( | , , ) exp

( ) 2

1 1exp

( ) 2

aa

aa

a

bp x a b z d

a

bz z z dz

a A

Gamma as a Conjugate Prior

3

We have seen that the conjugate prior for the precision of a Gaussian

is given by a Gamma distribution. If we have a univariate Gaussian

N(x|μ, τ−1) together with a prior Gamma(τ |a, b) and we integrate out the

precision, we obtain the marginal distribution of x

Introduce the transformation to simplify as:

1

0

1/22 1

0

( | , , ) | , | ,

exp2 2 ( )

aa b

p x a b x a b d

bx e d

a

N Gamma

21

2

A

z b x

Page 4: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Student’s T Distribution

4

Recalling the definition of the Gamma function:

It is common to redefine the parameters in this distribution as:

1/2

1/2 1

1/2 1 1

0

1/2 1/22 1/2

0

1 1( | , , ) exp

( ) 2

1 1exp

( ) 2 2

aa

a

aaa

bp x a b z z z dz

a A

bb x z z dz

a

1

0

( ) exp aa z z dz

1/2 1/2

21 1 1( | , , ) ( )

( ) 2 2 2

aabp x a b b x a

a

2 ,a

ab

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

Page 5: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Student’s T Distribution

5

The parameter λ is called the precision of the t-distribution, even

though it is not in general equal to the inverse of the variance (see

below on behavior as ν →∞).

The parameter ν is called the degrees of freedom.

For the particular case of ν = 1, the T-distribution reduces to the

Cauchy distribution.

In the limit ν →∞, the t-distribution T(x|μ, λ, ν) becomes a Gaussian

N(x|μ, λ−1) with mean μ and precision λ.

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

Page 6: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

For ν →∞, T(x|μ, λ, ν) Becomes a Gaussian

6

We first write the distribution as follows:

For large n we can approximate the log as follows:

In the limit ν →∞, the T-distribution T(x|μ, λ, ν) is indeed a Gaussian

N(x|μ, λ−1) with mean μ and precision λ. The normalization of the T is

valid in this limit as well (so the Gaussian obtained is normalized).

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

/2 1/2

2 2

1( | , , ) 1 exp ln 1

2

x xx

T

2 2

2 11( | , , ) exp ( ) exp ( )

2 2

x xx O O

T

Page 7: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Student’s T Distribution

7

/2 1/2

21/21

( )2 2( | , , ) 1

( )2

xp x

-5 -4 -3 -2 -1 0 1 2 3 4 50

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4student distribution

v=10

v=1.0

v=0.1

1

0, 1

,

( , )

For we

obtain

n

N

MatLab Code

2

2

: , 1

:

:2

, 22

Mean

Mode

Var

Page 8: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

We plot:

The mean and variance of the

Student’s is undefined for n=1.

Logs of the PDFs. The Student’s

is NOT log concave.

Run MatLab function studentLaplacePdfPlot

from Kevin Murphys’ PMTK

When n=1, the distribution is

known as Cauchy or Lorentz.

Due to its heavy tails, the mean

does not converge.

Recommended to use n=4.

Student’s T Vs the Gaussian

8

-4 -3 -2 -1 0 1 2 3 40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8prob. density functions

Gauss

Student

Laplace

-4 -3 -2 -1 0 1 2 3 4-9

-8

-7

-6

-5

-4

-3

-2

-1

0log prob. density functions

Gauss

Student

Laplace

| 0,1 , ( | 0,1,1), | 0,1/ 2x x xN T Lap

Page 9: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Student’s T Distribution

9

The Student’s T distribution can be seen from the equation

above is a mixture of infinite Gaussian each of them

with different precision.

The result is a distribution that in general has longer ‘tails’

than a Gaussian.

This gives the T-distribution robustness, i.e. the T-

distribution is much less sensitive than the Gaussian to the

presence of outliers.

1

0

1/22 1

0

( | , , ) | , | ,

exp2 2 ( )

aa b

p x a b x a b d

bx e d

a

N Gamma

Page 10: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

-10 -8 -6 -4 -2 0 2 4 6 8 10 120

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Robustness of Student’s T Distribution

10

The robustness of the T-distribution is illustrated here by comparing

the maximum likelihood solutions for a Gaussian and a T-distribution

(30 data points from the Gaussian are used). The effect of a small

number of outliers is less significant for the T-distribution than for the

Gaussian.

-10 -8 -6 -4 -2 0 2 4 6 8 10 120

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

t-distribution &

Gaussian nearly

overlap

T-distribution

Gaussian

MatLab Code

Page 11: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Robustness of Student’s T Distribution

11

The earlier simulation is repeated here with the MPTK toolbox.

-5 0 5 100

0.1

0.2

0.3

0.4

0.5

gaussian

student T

laplace

-5 0 5 100

0.1

0.2

0.3

0.4

0.5

gaussian

student T

laplace

Run MatLab function robustDemo

from Kevin Murphys’ PMTK

Page 12: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Multivariate Student’s T Distribution

12

If we return to the prior above and substitute ,

and use

we can write the Student’s T distribution as :

This form is useful in providing generalization to a multivatiate Student’s

T

1

0

( | , , ) | , | ,p x a b x a b d

N Gamma

2 , , /a

a b ab

1

0

( | , , ) | , | / 2, / 2x x d

T N Gamma

1

0

( | , , ) | , | / 2, / 2 d

x x T N Gamma

1| ,( )

aa bb

a b ea

Gamma

Page 13: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Multivariate Student’s T Distribution

13

This integral can be computed analytically as:

One can derive the above form of the distribution by substitution in the

Eq. on the top.

Normalization proof is immediate from the normalization of the normal &

Gamma distributions

1

0

( | , , ) | , | / 2, / 2 d

x x T N Gamma

/2 /21/2 2

/2

2

( )| |2 2( | , , ) 1

( )2

( )

D

D

T

D

x

x - x -

T

Mahalanobis Distance

2

1/2/2

/2 /2 1 /2 /2

/2

0

1/2 1/2/2/2 /2 /2 /2

2 /2 /2 1 2

/2 /2

0

/ 2( | , , )

/ 2 2

/ 2 / 2 / 2/ 2 / 2 1 /

/ 2 / 22

D

D

D DD

D D

e e d

de d

x

T 2/ 2 / 2Use

Page 14: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Multivariate Student’s T Distribution

14

Some useful results of the multivatiate Student’s T are given below:

One can show easily the expression for the mean by using x=z+:

The 1st term drops out since is even. The 2nd term gives

from the normalization of the distribution.

The covariance is computed as:

/2 /21/2 2

/2

( )| |2 2( | , , ) 1

( )2

D

D

D

x

T

11, 2,2

if cov if mode

x x x

/2 /2

1/2 2

/2

( )| |2 2 1

( )2

D

D

D

d

x z z

( | , , )z T

/2

1 1 /2 1 /2

0 0

/2

1 1 1 1

/2 /2 1

/ 2cov | , | / 2, / 2

/ 2

/ 2 / 2 1 / 2 / 2 1 / 2

/ 2 / 2 / 2 1 2/ 2 / 2

Td d e d

x

x x x - x - x

N Gamma

Page 15: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

Multivariate Student’s T Distribution

15

Differentiation with respect to x also shows the mode being :

The Student’s T has fatter tails than a Gaussian. The smaller n is the

fatter the tails.

For n ∞, the distribution approaches a Gaussian. Indeed note that:

The distribution can also be written in terms of S=-1 (scale matrix – not

the covariance) or V=nS.

/2 /21/2 2

/2

( )| |2 2( | , , ) 1

( )2

D

D

D

x

T

11, 2,2

if cov if mode

x x x

/2 /2 2

2 2 2 2 211

1 exp ln 1 exp exp2 2 2 2 2

D

DO

Page 16: Student’s T Distribution - Purdue University€¦ · Student’s T Distribution 9 The Student’s T distribution can be seen from the equation above is a mixture of infinite Gaussian

Bayesian Scientific Computing, Spring 2013 (N. Zabaras)

The Laplace Distribution Another distribution with heavy tails is the Laplace distribution, also

known as the double sided exponential distribution. It has the following

pdf:

is a location parameter and b > 0 is a scale parameter

Its robust to outliers (see an earlier demonstration).

It puts mores probability density at 0 than the Gaussian. This property

is a useful way to encourage sparsity in a model.

1

| ,2

x

bx b eb

Lap

2, , 2Mean Mode Var b

16