Transcript of t4003specialdistributions_16
Course 003: Basic Econometrics, 2015-2016
Topic 4: Some Special Distributions
Rohini Somanathan
Parametric Families of Distributions
• Some families of probability distributions are frequently used because they have a small
number of parameters and are good approximations for the experiments and events we are
interested in analyzing.
• Examples:
– For modeling the distribution of income or consumption expenditure, we want a density
which is skewed to the right (gamma, Weibull, lognormal, ...)
– IQs, heights, weights, arm circumference are quite symmetric around a mode (normal
or truncated normal)
– number of successes in a given number of trials (binomial)
– the time to failure for a machine or person (gamma, exponential)
• We refer to these probability density functions by f(x;θ) where θ refers to a parameter
vector.
• A given choice of θ therefore leads to a given probability density function.
• Ω is used to denote the parameter space.
The discrete uniform distribution
• Parameter: N
• Probability function: f(x;N) = (1/N) I{1,2,...,N}(x)
• Moments:
µ = ∑x f(x) = (1/N) · N(N + 1)/2 = (N + 1)/2
σ² = ∑x² f(x) − µ² = (1/N) · N(N + 1)(2N + 1)/6 − [(N + 1)/2]² = (N² − 1)/12
• MGF: M_X(t) = ∑_{j=1}^{N} e^{jt}/N
• Applications: experiments with equally likely outcomes (dice, coins, ...). Can you think of
applications in economics?
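The two closed forms above are easy to verify by brute force. None of this code is from the slides; it is a quick exact check using only the Python standard library, with illustrative names.

```python
from fractions import Fraction

def discrete_uniform_moments(N):
    # Mean and variance of the uniform distribution on {1, ..., N}, computed
    # directly from the definitions mu = sum x f(x) and var = sum x^2 f(x) - mu^2
    mean = Fraction(sum(range(1, N + 1)), N)
    var = Fraction(sum(x * x for x in range(1, N + 1)), N) - mean**2
    return mean, var

# a fair die: mean should be (N+1)/2 = 7/2, variance (N^2 - 1)/12 = 35/12
mean, var = discrete_uniform_moments(6)
```

Exact fractions sidestep floating-point rounding, so the match with the closed forms is exact rather than approximate.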
The Bernoulli distribution
• Parameter: p ∈ Ω, 0 ≤ p ≤ 1
• Probability function: f(x;p) = p^x (1 − p)^{1−x} I{0,1}(x)
• Moments:
µ = ∑x f(x) = 1·p¹(1 − p)⁰ + 0·p⁰(1 − p)¹ = p
σ² = ∑x² f(x) − µ² = p(1 − p)
• MGF: M_X(t) = e^t·p + e⁰·(1 − p) = pe^t + (1 − p)
• Applications: experiments with two possible outcomes: success or failure, defective or not
defective, male or female, etc.
The Binomial distribution.
• Parameters: (n,p) ∈Ω, 0≤ p≤ 1 and n is a positive integer
• Probability function: An observed sequence of n Bernoulli trials can be represented by an
n-tuple of zeros and ones. The number of ways to achieve x ones is given by
C(n,x) = n!/[x!(n − x)!]. The probability of x successes in n trials is therefore:
f(x;n,p) = C(n,x) p^x (1 − p)^{n−x} for x = 0, 1, 2, ..., n, and 0 otherwise
Notice that since ∑_{x=0}^{n} C(n,x) a^x b^{n−x} = (a + b)^n, we have
∑_{x=0}^{n} f(x) = [p + (1 − p)]^n = 1, so we have a valid density function.
• MGF: The MGF is given by:
∑_x e^{tx} f(x) = ∑_{x=0}^{n} e^{tx} C(n,x) p^x (1 − p)^{n−x} = ∑_{x=0}^{n} C(n,x) (pe^t)^x (1 − p)^{n−x} = [(1 − p) + pe^t]^n
• Moments: The MGF can be used to derive µ = np and σ2 = np(1 −p)
• Result: If X1, . . .Xk are independent random variables and if each Xi has a binomial
distribution with parameters ni and p, then the sum X1 + · · ·+Xk has a binomial
distribution with parameters n = n1 + · · ·+nk and p.
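The pmf, its normalization, and the quoted moments can all be checked numerically. This is an illustrative sketch (not part of the slides), assuming a Python interpreter with the standard library only.

```python
from math import comb

def binom_pmf(x, n, p):
    # f(x; n, p) = C(n, x) p^x (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.3
total = sum(binom_pmf(x, n, p) for x in range(n + 1))                  # should be 1
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))               # np = 3
var = sum(x * x * binom_pmf(x, n, p) for x in range(n + 1)) - mean**2  # np(1-p) = 2.1
```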
The Multinomial distribution
Suppose there are a small number of different outcomes (methods of public transport, water
purification, etc.). The multinomial distribution gives us the probability associated with a
particular vector of counts of these outcomes:
• Parameters: (n, p1, ..., pm) ∈ Ω, 0 ≤ pi ≤ 1, ∑_i pi = 1, and n is a positive integer
• Probability function:
f(x1, ..., xm; n, p1, ..., pm) = [n!/(∏_{i=1}^{m} xi!)] ∏_{i=1}^{m} pi^{xi} for xi ∈ {0, 1, 2, ..., n} with ∑_{i=1}^{m} xi = n, and 0 otherwise
• MGF: M_X(t) = (∑_{i=1}^{m} pi e^{ti})^n
• Moments: µi = n pi, σi² = n pi(1 − pi)
Geometric and Negative Binomial distributions
• The Negative Binomial (or Pascal) distribution gives us the probability that x failures will
occur before r successes are achieved. This means that the rth success occurs on the
(x+ r)th trial.
– Parameters: (r,p) ∈ Ω, 0 ≤ p ≤ 1 and r is a positive integer
– Density: For the rth success to occur on the (x + r)th trial, we require (r − 1) successes in
the first (x + r − 1) trials. We therefore obtain the density (writing q = 1 − p):
f(x;r,p) = C(r + x − 1, x) p^r q^x, x = 0, 1, 2, 3, ...
• The geometric distribution is a special case of the negative binomial with r = 1.
– The density in this case takes the form f(x;1,p) = p q^x for x = 0, 1, 2, ...
– The MGF is given by E(e^{tX}) = p ∑_{x=0}^{∞} (qe^t)^x = p/(1 − qe^t) for t < log(1/q)
– We can use this function to get the mean and variance, µ = q/p and σ² = q/p²
– The negative binomial is just a sum of r independent geometric variables, so its MGF is
[p/(1 − qe^t)]^r and the corresponding mean and variance are µ = rq/p and σ² = rq/p²
– The geometric distribution is memoryless: the conditional probability of k + t failures
given at least k failures is the unconditional probability of t failures,
P(X = k + t | X ≥ k) = P(X = t)
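The memoryless identity can be checked numerically from the pmf alone. A minimal sketch, not from the slides; the parameter values p = 0.3, k = 4, t = 2 are arbitrary illustrations.

```python
def geom_pmf(x, p):
    # P(X = x): probability of x failures before the first success
    return p * (1 - p) ** x

def geom_sf(k, p):
    # P(X >= k): the first k trials are all failures, so this is q^k
    return (1 - p) ** k

p, k, t = 0.3, 4, 2
lhs = geom_pmf(k + t, p) / geom_sf(k, p)  # P(X = k + t | X >= k)
rhs = geom_pmf(t, p)                      # P(X = t)
```

Algebraically lhs = p·q^{k+t}/q^k = p·q^t = rhs, which is exactly the memoryless property.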
Discrete Distributions: Poisson
• Parameter: λ ∈ Ω, λ > 0
• Probability function:
f(x;λ) = e^{−λ} λ^x / x! for x = 0, 1, 2, ..., and 0 otherwise
Using the result that the series 1 + λ + λ²/2! + λ³/3! + ... converges to e^λ,
∑_x f(x) = ∑_{x=0}^{∞} e^{−λ} λ^x / x! = e^{−λ} ∑_{x=0}^{∞} λ^x / x! = e^{−λ} e^λ = 1, so we have a valid density.
• Moments: µ = λ = σ²
• MGF: E(e^{tX}) = ∑_{x=0}^{∞} e^{tx} e^{−λ} λ^x / x! = e^{−λ} ∑_{x=0}^{∞} (λe^t)^x / x! = e^{λ(e^t − 1)}
• The MGF can be used to get the first and second moments about the origin, λ and λ² + λ,
so the mean and the variance are both λ.
• We can also use the product of k such MGFs to show that the sum of k independently
distributed Poisson variables has a Poisson distribution with mean λ1 + ... + λk.
A Poisson process
Suppose that the number of type A outcomes that occur over a fixed interval of time [0, t]
follows a process in which:
1. The probability that precisely one type A outcome will occur in a small interval of time ∆t
is approximately proportional to the length of the interval:
g(1, ∆t) = γ∆t + o(∆t)
where o(∆t) denotes a function of ∆t having the property that lim_{∆t→0} o(∆t)/∆t = 0.
2. The probability that two or more type A outcomes will occur in a small interval of time ∆t
is negligible:
∑_{x=2}^{∞} g(x, ∆t) = o(∆t)
3. The numbers of type A outcomes that occur in nonoverlapping time intervals are
independent events.
These conditions imply a process which is stationary over the period of observation, i.e. the
probability of an occurrence is the same over the entire period, with neither busy nor quiet
intervals.
Poisson densities representing Poisson processes
RESULT: Consider a Poisson process with rate γ per unit of time. The number of events in
a time interval of length t has a Poisson distribution with mean λ = γt.
Applications:
• the number of weaving defects in a yard of handloom cloth or stitching defects in a shirt
• the number of traffic accidents on a motorway in an hour
• the number of particles of a noxious substance that come out of a chimney in a given period
of time
• the number of times a machine breaks down each week
Example:
• let the probability of exactly one blemish in a foot of wire be 1/1000 and that of two or more
blemishes in a foot be zero.
• we’re interested in the number of blemishes in 3,000 feet of wire.
• if the numbers of blemishes in non-overlapping intervals are assumed to be independently
distributed, then our random variable X follows a Poisson distribution with λ = γt = 3 and
P(X = 5) = 3⁵ e^{−3}/5!
• you can plug this into a computer, or alternatively use tables, to compute f(5;3) = .101
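"Plugging this into a computer" can be as simple as the following sketch (not from the slides; stdlib Python, illustrative names):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # f(x; lambda) = e^{-lambda} lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

p5 = poisson_pmf(5, 3.0)  # blemish example: lambda = gamma * t = 3
```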
The Poisson as a limiting distribution
We can show that a binomial distribution with large n and small p can be approximated by a
Poisson (which is computationally easier).
• Useful result: e^v = lim_{n→∞} (1 + v/n)^n
• We can rewrite the binomial density for non-zero values as
f(x;n,p) = {[∏_{i=1}^{x} (n − i + 1)] / x!} p^x (1 − p)^{n−x}
If np = λ, we can substitute λ/n for p to get
lim_{n→∞} f(x;n,p) = lim_{n→∞} {[∏_{i=1}^{x} (n − i + 1)] / x!} (λ/n)^x (1 − λ/n)^{n−x}
= lim_{n→∞} {[∏_{i=1}^{x} (n − i + 1)] / n^x} (λ^x/x!) (1 − λ/n)^n (1 − λ/n)^{−x}
= lim_{n→∞} [(n/n) · ((n − 1)/n) · ... · ((n − x + 1)/n) · (λ^x/x!) (1 − λ/n)^n (1 − λ/n)^{−x}]
= e^{−λ} λ^x / x!
(using the above result and the property that the limit of a product is the product of the
limits: each factor (n − i + 1)/n → 1, (1 − λ/n)^n → e^{−λ}, and (1 − λ/n)^{−x} → 1)
Poisson as a limiting distribution...example
• We have a 300-page novel with 1,500 letters on each page.
• Typing errors are as likely to occur for one letter as for another, and the probability of such
an error is given by p = 10⁻⁵.
• The total number of letters is n = (300)(1500) = 450,000.
• Using λ = np = 4.5, the Poisson distribution function gives us the probability of the number
of errors being less than or equal to 10 as:
P(X ≤ 10) ≈ ∑_{x=0}^{10} e^{−4.5} (4.5)^x / x! = .9933
Rules of Thumb: the Poisson approximation is close to binomial probabilities when n ≥ 20 and
p ≤ .05, and excellent when n ≥ 100 and np ≤ 10.
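The quality of the approximation in this example can be seen by computing both tail probabilities directly. A sketch using only the Python standard library (not part of the slides); exact binomial arithmetic with these n and p is perfectly feasible in Python.

```python
from math import comb, exp, factorial

n, p = 450_000, 1e-5
lam = n * p  # 4.5

# P(X <= 10) under the exact binomial and under the Poisson approximation
binom_cdf = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(11))
poisson_cdf = sum(exp(-lam) * lam**x / factorial(x) for x in range(11))
```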
Discrete distributions: Hypergeometric
• Suppose, as in the case of the binomial, there are two possible outcomes and we’re
interested in the probability of x values of a particular outcome, but we are drawing
randomly without replacement, so our trials are not independent.
• In particular, suppose there are A + B objects from which we pick n; A of the total number
available are of one type (red balls) and the rest are of the other (blue balls).
• If the random variable is the total number of red balls selected, then, for appropriate values
of x, we have
f(x;A,B,n) = C(A,x) C(B,n−x) / C(A+B,n)
• Over what values of x is this defined? max{0, n − B} ≤ x ≤ min{n, A}
• The multivariate extension is (for xi ∈ {0, 1, 2, ..., n} with ∑_{i=1}^{m} xi = n and
∑_{i=1}^{m} Ki = M):
f(x1, ..., xm; K1, ..., Km, n) = [∏_{j=1}^{m} C(Kj, xj)] / C(M, n)
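The support given above is easy to confirm by summing the pmf over exactly that range. An illustrative sketch (not from the slides), with arbitrary small values A = 5, B = 7, n = 4:

```python
from math import comb

def hypergeom_pmf(x, A, B, n):
    # x red balls when drawing n without replacement from A red and B blue
    return comb(A, x) * comb(B, n - x) / comb(A + B, n)

A, B, n = 5, 7, 4
support = range(max(0, n - B), min(n, A) + 1)
total = sum(hypergeom_pmf(x, A, B, n) for x in support)     # should be 1
mean = sum(x * hypergeom_pmf(x, A, B, n) for x in support)  # n*A/(A+B) = 5/3
```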
Continuous distributions: uniform or rectangular
• Parameters: (a,b) ∈ Ω, −∞ < a < b < ∞
• Density: f(x;a,b) = [1/(b − a)] I(a,b)(x) (hence the name rectangular)
• Moments: µ = (a + b)/2, σ² = (b − a)²/12
• Applications:
– to construct the probability space of an experiment in which any outcome in the
interval [a,b] is equally likely.
– to generate random samples from other distributions (based on the probability integral
transformation). This is part of your first lab assignment.
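The probability integral transformation mentioned here can be sketched in a few lines. This is not the lab assignment itself, just an illustration for the exponential case, assuming stdlib Python; the function name and seed are arbitrary.

```python
import random
from math import log

def exponential_sample(beta, rng):
    # Probability integral transform: if U ~ Uniform(0, 1), then
    # X = F^{-1}(U) = -beta * log(1 - U) has the Exponential(beta) distribution
    u = rng.random()
    return -beta * log(1 - u)

rng = random.Random(0)
draws = [exponential_sample(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # should be close to beta = 2
```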
The gamma function
The gamma function is a special mathematical function that is widely used in statistics. The
gamma function of α is defined as
Γ(α) = ∫₀^∞ y^{α−1} e^{−y} dy   (1)
• If α = 1, Γ(α) = ∫₀^∞ e^{−y} dy = [−e^{−y}]₀^∞ = 1
• If α > 1, we can integrate (1) by parts, setting u = y^{α−1} and dv = e^{−y} dy, and using the
formula ∫₀^∞ u dv = [uv]₀^∞ − ∫₀^∞ v du to get:
Γ(α) = [−y^{α−1} e^{−y}]₀^∞ + (α − 1) ∫₀^∞ y^{α−2} e^{−y} dy
• The first term in the above expression is zero because the exponential function goes to zero
faster than any polynomial grows, and we obtain
Γ(α) = (α − 1) Γ(α − 1)
and for any integer α > 1, we have
Γ(α) = (α − 1)(α − 2)(α − 3) ... (3)(2)(1) Γ(1) = (α − 1)!
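Both the recursion and the factorial identity can be spot-checked numerically; `math.gamma` implements Γ. A quick sketch, not part of the slides; the check points 2.5, 4.0, 7.3 are arbitrary.

```python
from math import gamma, factorial, isclose

# recursion Gamma(a) = (a - 1) * Gamma(a - 1), including non-integer points
recursion_ok = all(isclose(gamma(a), (a - 1) * gamma(a - 1)) for a in (2.5, 4.0, 7.3))
# and Gamma(n) = (n - 1)! at the integers
factorial_ok = all(isclose(gamma(n), factorial(n - 1)) for n in range(1, 10))
```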
The gamma distribution
Define the variable x by y = x/β, where β > 0. Then dy = (1/β) dx and we can rewrite Γ(α) as
Γ(α) = ∫₀^∞ (x/β)^{α−1} e^{−x/β} (1/β) dx
or as
1 = ∫₀^∞ [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} dx
This shows that for α, β > 0,
f(x;α,β) = [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} I(0,∞)(x)
is a valid density and is known as a gamma-type probability density function.
Features of the gamma density
This is a valuable distribution because it can take a variety of shapes depending on the values of
the parameters α and β
• It is skewed to the right
• It is strictly decreasing when α≤ 1
• If α = 1, we have the exponential density, which is memory-less.
• For α > 1 the density attains its maximum at x = (α − 1)β
[Figure 5.7 from DeGroot and Schervish: graphs of the p.d.f.’s of several different gamma
distributions with a common mean of 1.]
Moments of the gamma distribution
• Parameters: (α,β) ∈ Ω, α > 0, β > 0
• Moments: µ = αβ, σ² = αβ²
• MGF: M_X(t) = (1 − βt)^{−α} for t < 1/β, which can be derived as follows:
M_X(t) = ∫₀^∞ e^{tx} [1/(Γ(α) β^α)] x^{α−1} e^{−x/β} dx
= ∫₀^∞ [1/(Γ(α) β^α)] x^{α−1} e^{−(1/β − t)x} dx
= [1/(Γ(α) β^α)] · Γ(α)/(1/β − t)^α   (by setting y = (1/β − t)x in the expression for Γ(α))
= 1/(1 − βt)^α
Gamma applications
• Survival analysis
– The waiting time till the rth event/success: If X is the time that passes until the first
success, then X has a gamma distribution with α = 1 and β = 1/γ. This is known as an
exponential distribution. If, instead, we are interested in the time taken for the rth
success, this has a gamma density with α = r and β = 1/γ.
– Relation to the Poisson distribution: If Y, the number of events in a given time period t,
has a Poisson density with parameter λ, the rate of success is given by γ = λ/t.
Example: A bottling plant breaks down, on average, twice every four weeks. We want the
probability that the number of breakdowns X ≤ 3 in the next four weeks. We have λ = 2
and the breakdown rate γ = 1/2 per week.
P(X ≤ 3) = ∑_{i=0}^{3} e^{−2} 2^i / i! = .135 + .271 + .271 + .180 = .857
Suppose we wanted the probability that the machine does not break down in the next four
weeks. The time taken until the first breakdown, X, must therefore be more than four
weeks. This follows a gamma (exponential) distribution with α = 1 and β = 1/γ = 2:
P(X ≥ 4) = ∫₄^∞ (1/2) e^{−x/2} dx = [−e^{−x/2}]₄^∞ = e^{−2} = .135
• Income distributions that are uni-modal
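The two numbers in the bottling-plant example are quick to reproduce; a sketch (not from the slides) using only the Python standard library:

```python
from math import exp, factorial

lam = 2.0  # expected breakdowns in a four-week period
# Poisson probability of at most 3 breakdowns in four weeks
p_at_most_3 = sum(exp(-lam) * lam**i / factorial(i) for i in range(4))
# no breakdown in four weeks: exponential survival with beta = 1/gamma = 2 weeks
p_no_breakdown = exp(-4 / 2)
```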
Gamma distributions: some useful properties
• Gamma Additivity: Let X1, ..., Xn be independently distributed random variables with
respective gamma densities Gamma(αi, β). Then
Y = ∑_{i=1}^{n} Xi ∼ Gamma(∑_{i=1}^{n} αi, β)
• Scaling Gamma Random Variables: Let X be distributed with gamma density
Gamma(α,β) and let c > 0. Then
Y = cX ∼ Gamma(α, βc)
Both of these can be easily proved using the gamma MGF and applying the MGF uniqueness
theorem. In the first case the MGF of Y is the product of the individual MGFs, i.e.
M_Y(t) = ∏_{i=1}^{n} M_{Xi}(t) = ∏_{i=1}^{n} (1 − βt)^{−αi} = (1 − βt)^{−∑_{i=1}^{n} αi} for t < 1/β
For the second result, M_Y(t) = M_{cX}(t) = M_X(ct) = (1 − βct)^{−α} for t < 1/(βc)
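Gamma additivity can also be seen in simulation. A sketch, not from the slides, assuming stdlib Python; note that `random.gammavariate(alpha, beta)` uses the same shape/scale convention as these slides, and the shapes (1, 2, 1.5) and β = 2 are arbitrary illustrations.

```python
import random

rng = random.Random(42)
n = 50_000
alphas, beta = (1.0, 2.0, 1.5), 2.0
# Sum of three independent Gamma(alpha_i, beta) draws; by additivity the sum
# should behave like Gamma(4.5, 2): mean alpha*beta = 9, variance alpha*beta^2 = 18
sums = [sum(rng.gammavariate(a, beta) for a in alphas) for _ in range(n)]
sample_mean = sum(sums) / n
sample_var = sum((s - sample_mean) ** 2 for s in sums) / n
```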
The gamma family: exponential distributions
An exponential distribution is simply a gamma distribution with α = 1
• Parameters: β ∈Ω, β > 0
• Density: f(x;β) = (1/β) e^{−x/β} I(0,∞)(x)
• Moments: µ = β, σ² = β²
• MGF: M_X(t) = (1 − βt)^{−1} for t < 1/β
• Applications: As discussed above, the most important application is the representation of
operating lives. The exponential is memoryless, so if failure hasn’t occurred, the object
(or person, or animal) is as good as new. The risk of failure at any point t is given by the
hazard rate,
h(t) = f(t)/S(t)
where S(t) is the survival function, 1 − F(t). Verify that the hazard rate in this case is a
constant, 1/β.
If we would like to model wear-out effects, we should use a gamma with α > 1, and for
work-hardening effects, a gamma with α < 1.
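The constant-hazard claim is immediate once f(t) and S(t) are written out, and easy to confirm numerically. An illustrative sketch (not from the slides), with β = 3 chosen arbitrarily:

```python
from math import exp

def hazard(t, beta):
    # h(t) = f(t) / S(t) with f(t) = (1/beta) e^{-t/beta} and S(t) = e^{-t/beta}
    f = (1 / beta) * exp(-t / beta)
    S = exp(-t / beta)
    return f / S

# the exponential hazard should be flat at 1/beta regardless of t
rates = [hazard(t, 3.0) for t in (0.1, 1.0, 5.0, 20.0)]
```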
The gamma family: chi-square distributions
A chi-square distribution is simply a gamma distribution with α = v/2 and β = 2
• Parameters: v ∈ Ω, v a positive integer (referred to as the degrees of freedom)
• Density: f(x;v) = [1/(2^{v/2} Γ(v/2))] x^{v/2 − 1} e^{−x/2} I(0,∞)(x)
• Moments: µ = v, σ² = 2v
• MGF: M_X(t) = (1 − 2t)^{−v/2} for t < 1/2
• Applications:
• Notice that for v = 2, the chi-square density is equivalent to the exponential density with
β = 2. It is therefore decreasing for this value of v and hump-shaped for higher values of v.
• The χ² is especially useful in problems of statistical inference because if we have v
independent random variables Xi ∼ N(0,1), their sum of squares ∑_{i=1}^{v} Xi² ∼ χ²_v. Many
of the estimators we use in our models fit this case (i.e. they can be expressed in terms of
sums of squares of independent normal variables).
The Normal (or Gaussian) distribution
This symmetric bell-shaped density is widely used because:
1. Outcomes of certain types of continuous random variables can be shown to follow this type
of distribution; this is the motivation we’ve used for most parametric distributions we’ve
considered so far (heights of humans, animals and plants; weights; strength of physical
materials; the distance from the centre of a target when errors in both directions are
independent).
2. It has nice mathematical properties: many functions of a set of normally distributed
random variables have distributions that take simple forms.
3. Central Limit Theorems: The sample mean of a random sample from any distribution with
finite variance is approximately normal.
The Normal density
• Parameters: (µ,σ2) ∈Ω, µ ∈ (−∞,∞), σ > 0
• Density: f(x;µ,σ²) = [1/(σ√(2π))] e^{−(1/2)((x−µ)/σ)²} I(−∞,+∞)(x)
• MGF: M_X(t) = e^{µt + σ²t²/2}
• The MGF can be used to derive the moments: E(X) = µ and the variance is σ².
• As can be seen from the p.d.f., the distribution is symmetric around µ, where it achieves its
maximum value. This is therefore also the median and the mode of the distribution.
• The normal distribution with zero mean and unit variance is known as the standard normal
distribution and has the form f(x;0,1) = [1/√(2π)] e^{−x²/2} I(−∞,+∞)(x)
• The tails of the distribution are thin: 68% of the total probability lies within one σ of the
mean, 95.4% within 2σ and 99.7% within 3σ.
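These tail figures can be reproduced from the standard normal CDF, which is expressible through the error function. A sketch (not from the slides) using only the Python standard library:

```python
from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + erf(z / sqrt(2)))

# probability mass within k standard deviations of the mean
within = {k: std_normal_cdf(k) - std_normal_cdf(-k) for k in (1, 2, 3)}
```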
The Normal distribution: deriving the MGF
By the definition of the MGF:
M(t) = ∫_{−∞}^{∞} e^{tx} [1/(σ√(2π))] e^{−(x−µ)²/(2σ²)} dx
= ∫_{−∞}^{∞} [1/(σ√(2π))] e^{[tx − (x−µ)²/(2σ²)]} dx
We can rewrite the term inside the square brackets to obtain:
tx − (x−µ)²/(2σ²) = µt + (1/2)σ²t² − [x − (µ + σ²t)]²/(2σ²)
The MGF can now be written as:
M_X(t) = C e^{µt + (1/2)σ²t²}
where C = ∫_{−∞}^{∞} [1/(σ√(2π))] e^{−[x − (µ+σ²t)]²/(2σ²)} dx = 1, because the integrand is a normal p.d.f. with parameter µ
replaced by (µ + σ²t).
The Normal distribution: computing moments
• First taking derivatives of the MGF:
M(t) = e^{µt + σ²t²/2}
M′(t) = M(t)(µ + σ²t)
M″(t) = M(t)σ² + M(t)(µ + σ²t)²
(obtained by differentiating M(t) with respect to t and substituting for M′(t))
• Evaluating these at t = 0, we get M′(0) = µ and M″(0) = σ² + µ², so the variance is
M″(0) − [M′(0)]² = σ².
Transformations of Normally Distributed Variables...1
RESULT 1: Let X ∼ N(µ,σ²). Then Z = (X − µ)/σ ∼ N(0,1)
Proof: Z is of the form aX + b with a = 1/σ and b = −µ/σ. Therefore
M_Z(t) = e^{bt} M_X(at) = e^{−µt/σ} e^{µt/σ + σ²t²/(2σ²)} = e^{t²/2}
which is the MGF of a standard normal distribution.
An important implication of the above result is that if we are interested in any distribution in
this class of normal distributions, we only need to be able to compute integrals for the standard
normal; these are the tables you’ll see at the back of most textbooks.
Example: The kilometres per litre of fuel achieved by a new Maruti model X ∼ N(17, .25), so
σ = .5. What is the probability that a new car will achieve between 16 and 18 kilometres per litre?
Answer: P(16 ≤ X ≤ 18) = P((16 − 17)/.5 ≤ Z ≤ (18 − 17)/.5) = P(−2 ≤ Z ≤ 2) = 1 − 2(.0228) = .9544
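Instead of tables, the same number comes straight from the error function. A sketch, not from the slides, assuming stdlib Python:

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ N(mu, sigma^2), via the standard normal CDF
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Maruti example: X ~ N(17, 0.25), so sigma = 0.5
prob = normal_cdf(18, 17, 0.5) - normal_cdf(16, 17, 0.5)
```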
Transformations of Normals...2
• RESULT 2: Let X ∼ N(µ,σ²) and Y = aX + b, where a and b are given constants and a ≠ 0.
Then Y has a normal distribution with mean aµ + b and variance a²σ².
Proof: The MGF of Y can be expressed as M_Y(t) = e^{bt} e^{µat + (1/2)σ²a²t²} = e^{(aµ+b)t + (1/2)(aσ)²t²}.
This is simply the MGF for a normal distribution with mean aµ + b and variance a²σ².
• RESULT 3: If X1, ..., Xk are independent and Xi has a normal distribution with mean µi
and variance σi², then Y = X1 + ··· + Xk has a normal distribution with mean µ1 + ··· + µk
and variance σ1² + ··· + σk².
Proof: Write the MGF of Y as the product of the MGFs of the Xi’s and gather linear and
squared terms separately to get the desired result.
• We can combine these two results to derive the distribution of the sample mean:
RESULT 4: Suppose that the random variables X1, ..., Xn form a random sample from a
normal distribution with mean µ and variance σ², and let X̄n denote the sample mean.
Then X̄n has a normal distribution with mean µ and variance σ²/n.
Transformations of Normals to χ2 distributions
RESULT 5: If X ∼ N(0,1), then Y = X² has a χ² distribution with one degree of freedom.
Proof:
M_Y(t) = ∫_{−∞}^{∞} e^{x²t} [1/√(2π)] e^{−x²/2} dx
= ∫_{−∞}^{∞} [1/√(2π)] e^{−(1/2)x²(1−2t)} dx
= [1/√(1 − 2t)] ∫_{−∞}^{∞} [√(1 − 2t)/√(2π)] e^{−(1/2)x²(1−2t)} dx
= (1 − 2t)^{−1/2} for t < 1/2
(the integrand in the last integral is a normal density with µ = 0 and σ² = 1/(1 − 2t)).
The MGF obtained is that of a χ² random variable with v = 1, since the χ² MGF is given by
M_X(t) = (1 − 2t)^{−v/2} for t < 1/2.
Normals and χ2 distributions...
RESULT 6: Let X1, ..., Xn be independent random variables with each Xi ∼ N(0,1). Then
Y = ∑_{i=1}^{n} Xi² has a χ² distribution with n degrees of freedom.
Proof:
M_Y(t) = ∏_{i=1}^{n} M_{Xi²}(t) = ∏_{i=1}^{n} (1 − 2t)^{−1/2} = (1 − 2t)^{−n/2} for t < 1/2
which is the MGF of a χ² random variable with v = n. This is the reason that the parameter v is
called the degrees of freedom: there are n freely varying random variables whose sum of squares
represents a χ²_v-distributed random variable. This also follows directly from gamma additivity.
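Result 6 is also easy to see in simulation: sums of squares of standard normals should show the χ² moments µ = v and σ² = 2v. A sketch, not from the slides, assuming stdlib Python; v = 5 and the seed are arbitrary.

```python
import random

rng = random.Random(7)
n, v = 40_000, 5
# each sample is a sum of v squared independent N(0,1) draws;
# the sample moments should be near v and 2v
samples = [sum(rng.gauss(0, 1) ** 2 for _ in range(v)) for _ in range(n)]
sample_mean = sum(samples) / n
sample_var = sum((s - sample_mean) ** 2 for s in samples) / n
```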
The Bivariate Normal distribution
The bivariate normal has the density:
f(x,y) = [1/(2πσ1σ2√(1 − ρ²))] e^{−q/2}
where
q = [1/(1 − ρ²)] [((x − µ1)/σ1)² − 2ρ((x − µ1)/σ1)((y − µ2)/σ2) + ((y − µ2)/σ2)²]
• E(Xi) = µi, Var(Xi) = σi², and the correlation coefficient ρ(X1,X2) = ρ
• Verify that in this case, X1 and X2 are independent iff they are uncorrelated.
• Applications: heights of couples, scores on tests...
The Multivariate Normal distribution
• Parameters: (a, B) ∈ Ω, a ∈ ℝⁿ, B a symmetric positive definite matrix.
• Density: f(x; a, B) = [1/(|B|^{1/2} (2π)^{n/2})] e^{−(1/2)(x−a)′B⁻¹(x−a)}
• Moments: µ = a, Cov(X) = B
• MGF: M_X(t) = e^{a′t + (1/2)t′Bt}. Note: there are n + n(n+1)/2 parameters, n means and
n(n+1)/2 unique elements in the variance-covariance matrix B.
• Applications: statistical inference in the classical linear regression model...and with large
samples in other models.
Additional distributions that we’ll use mainly for inference are the Student’s t-distribution and
F-distribution. We’ll introduce these in the second half of the course.