    Lecture 6: Probability Distributions

    Matt Golder & Sona Golder

    Pennsylvania State University

    Probability Distributions

We're now going to take a closer look at particular probability distributions.

1. Discrete
2. Continuous

    Bernoulli Distribution

Imagine a binary X ∈ {0, 1} with:

X = 1 with probability π
X = 0 with probability 1 − π

That is, X's PMF is:

f(x) = π for x = 1; 1 − π for x = 0

which we can usefully rewrite as

f(x) = π^x (1 − π)^(1−x), x ∈ {0, 1}.
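As a quick R sketch (not from the original slides) of this PMF: since a Bernoulli is just a binomial with one trial, dbinom() serves as a check. We use p rather than pi as the variable name so as not to mask R's built-in constant.

p <- 0.3                               # an illustrative value of pi
bern_pmf <- function(x, p) p^x * (1 - p)^(1 - x)
bern_pmf(c(0, 1), p)                   # 0.7 0.3
dbinom(c(0, 1), size = 1, prob = p)    # same values: Bernoulli = binomial with n = 1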


    Bernoulli Distribution

X is a Bernoulli variable, and we say that X is distributed Bernoulli. We write this as:

X ∼ Bernoulli(π)

The Bernoulli is a one-parameter (π) distribution for a discrete variable that can take on only two values.

It is therefore the natural choice for binary / dichotomous variables.

Because X can only take on two values, we say that X has support only in

{0, 1}, i.e., the only possible values X can take on are 0 and 1.

    Bernoulli Distribution

So, the PMF of X is:

f(x) = π for x = 1; 1 − π for x = 0

and the cumulative probability function (CPF) of X is:

F(x) = Σ_{X ≤ x} f(x) = 1 − π for x = 0; 1 for x = 1

    Bernoulli Distribution

It is also the case that:

E(X) = Σ_x x f(x) = (0)(1 − π) + (1)(π) = π


    Bernoulli Distribution

And it is also the case that:

Var(X) = Σ_x [X − E(X)]^2 f(x)
       = Σ_x [X − π]^2 f(x)
       = (0 − π)^2 (1 − π) + (1 − π)^2 π
       = π^2 − π^3 + π − 2π^2 + π^3
       = π − π^2
       = π(1 − π)

The variance will be at its largest when π = 0.5, which is when we'd see the greatest amount of variation between 0s and 1s.
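As an aside (not in the original handout), a small R simulation confirms both moments; the 100,000 draws are an arbitrary choice.

x <- rbinom(1e5, size = 1, prob = 0.5)   # 100,000 Bernoulli(0.5) draws
mean(x)   # close to pi = 0.5
var(x)    # close to pi * (1 - pi) = 0.25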

    Binomial Distribution

The Bernoulli is the simplest discrete probability distribution and is the building block for a large set of important discrete distributions.

Probably the most important is the binomial distribution, which is most easily thought of as the number of 1s (successes) in n independent Bernoulli trials, each with identical probability π:

f(x) = (n choose x) π^x (1 − π)^(n−x)

where π ∈ [0, 1] is the probability of success, n ∈ {0, 1, 2, ...} is the number of trials, and

(n choose x) = n! / [x!(n − x)!]

    Binomial Distribution

X is a Binomial variable, and we say that X is binomial-distributed. We write this as:

X ∼ binomial(n, π).

The Binomial is a two-parameter (n, π) distribution.

If X is a binomial variable, it has support only in the set {0, 1, 2, ..., n}.


    Binomial Distribution

The binomial distribution is named after the binomial theorem for the expansion of powers of sums in mathematics, which states that:

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^(n−k).

For example,

(a + b)^2 = a^2 + 2ab + b^2,
(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3,
(a + b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4,

and so forth.

    Binomial Distribution

    Deriving the binomial distribution:

Each sample point in the sample space can be characterized by an n-tuple involving the letters S and F, i.e., SSFSFFFSFS...FS for n trials.

For example, we might have a sample point with x successes, i.e., S = x:

SSSSS...SSS (x of these)  FFF...FF (n − x of these)

    Binomial Distribution

    Deriving the binomial distribution:

Because the Bernoulli trials are independent, the probability of this sample point is:

ppppp...ppp (x of these) × qqq...qq (n − x of these) = π^x (1 − π)^(n−x)

(where p = π and q = 1 − π). Every other sample point can be represented by a similar n-tuple.


    Binomial Distribution

Deriving the binomial distribution:

Because the number of distinct n-tuples is

(n choose x) = n! / [x!(n − x)!],

it follows that the event S = x is made up of (n choose x) sample points, each with probability π^x (1 − π)^(n−x).

As a result,

f(x) = (n choose x) π^x (1 − π)^(n−x)

    Binomial Distribution

Example (Coin Toss): What is the probability of getting exactly 2 heads in 6 tosses of a fair coin?

Pr(X = 2) = (6 choose 2) (1/2)^2 (1/2)^4 = [6!/(2!4!)] (1/2)^2 (1/2)^4 = 15/64

Example (Defective Fuses): Suppose that a lot of 5000 electrical fuses contains 5% defectives. If a sample of 5 fuses is tested, what is the probability of finding at least one defective?


Pr(at least one defective) = 1 − p(0) = 1 − (5 choose 0) π^0 (1 − π)^5

= 1 − (0.95)^5

= 1 − 0.774 = 0.226
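Both answers are easy to verify in R (a sketch, not part of the original slides):

choose(6, 2) * (1/2)^2 * (1/2)^4      # 0.234375, i.e., 15/64
dbinom(2, size = 6, prob = 0.5)       # same value
1 - dbinom(0, size = 5, prob = 0.05)  # about 0.226, the fuses answer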

    Binomial Distribution

Example (Three-Child Family): What's the probability of a certain number of girls in a three-child family?

Pr(X = 1) = (3 choose 1) (1/2)^1 (1/2)^2 = 3(0.5)(0.25) = 0.375 = 3/8

Pr(X = 2) = (3 choose 2) (1/2)^2 (1/2)^1 = 3(0.25)(0.5) = 0.375 = 3/8

Pr(X = 3) = (3 choose 3) (1/2)^3 (1/2)^0 = 1(0.125)(1) = 0.125 = 1/8

You can also use a tabulation of binomial probabilities from a statistics textbook. For example, we could find the probability of getting exactly 2 girls from 3 children by looking for n = 3 in the first column, s = 2 in the second column, and then going across the row until we found π = 0.5.

    Binomial Distribution

    The binomial PMF is:

f(x) = (n choose x) π^x (1 − π)^(n−x)

And the binomial CPF, the probability of observing x or fewer successes in n Bernoulli trials with probability of success π, is:

F(x) = Σ_{X ≤ x} f(x) = Σ_{j=0}^{x} (n choose j) π^j (1 − π)^(n−j)
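A sketch in R (not from the handout) showing that the CPF is just the sum of PMF terms:

n <- 6; p <- 0.5; x <- 2
sum(dbinom(0:x, size = n, prob = p))   # 0.34375
pbinom(x, size = n, prob = p)          # same value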


    Binomial Distribution

Figure: Binomial PMF with Different Parameters (π = 0.5, n = 20; π = 0.7, n = 20; π = 0.5, n = 40; x from 0 to 40)

Figure: Binomial CPF with Different Parameters (the same three binomials; cumulative probability from 0 to 1)

    Binomial Distribution

The expected value of a binomial variable X is:

E(X) = nπ,

which is pretty intuitive, since n is the number of trials and π is the probability of success.

    Binomial Distribution

The variance of a binomial variable X is:

Var(X) = Σ_x [X − E(X)]^2 f(x)
       = Σ_x (X − nπ)^2 (n choose x) π^x (1 − π)^(n−x)
       = nπ(1 − π).

This means (again, intuitively) that the variability in a binomial variable is

increasing in n, and

largest when π = 0.5 for a fixed value of n.


    Binomial Distribution

A binomial variable is necessarily unimodal (except for the special case of a Bernoulli variable with π = 0.5).

It can also be skewed, depending on the value of π.

Because the binomial is used to model the number of successes out of a known number of trials, it is widely used in the social sciences.

For example, it is useful for modeling proportions or percentages where the denominator is known.

Example: We might believe that the number of yea votes for bills in the U.S. House of Representatives (out of n = 435) follows a binomial distribution.

    Geometric Distribution

Another thought experiment is to consider repeating (independent) Bernoulli trials with probability of success π until we observe the first success.

The number of independent Bernoulli trials needed to achieve one success is a geometric random variable.

If X is a geometric random variable with parameter π, then

f(x) = π(1 − π)^(x−1).

The geometric is thus a one-parameter distribution, with π ∈ [0, 1]. We write this as:

X ∼ geometric(π)

    Geometric Distribution

The geometric PMF is:

f(x) = π(1 − π)^(x−1)

And the geometric CPF, the cumulative probability of a first success in x ∈ {1, 2, ...} trials, is:

F(x) = Σ_{j=1}^{x} π(1 − π)^(j−1) = 1 − (1 − π)^x
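For reference (not in the original slides): R's dgeom()/pgeom() count the failures before the first success rather than the trials, so the slide's x corresponds to x − 1 in R.

p <- 0.25; x <- 4
p * (1 - p)^(x - 1)      # PMF above: about 0.1055
dgeom(x - 1, prob = p)   # same
1 - (1 - p)^x            # CPF above: about 0.6836
pgeom(x - 1, prob = p)   # same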


    Geometric Distribution

The expected value of the geometric distribution is:

E(X) = 1/π

This suggests that as the probability of success declines, we expect to undertake more and more trials before observing our first success.

The variance of X is:

Var(X) = (1 − π)/π^2

In other words, the variance gets arbitrarily close to zero as the probability of success approaches 1.0, and arbitrarily large as π → 0.

    Negative Binomial Distribution

The negative binomial can be thought of as a generalization or variant of the geometric distribution.

Imagine we begin conducting independent Bernoulli trials with probability of success π, and stop the trials upon observing r successes.

The distribution of the number of failures we observe (X) before achieving the r-th success is distributed according to a negative binomial distribution.

Technically, the special case where r is an integer value is known as the Pascal distribution (and the real-valued case the Polya distribution).

    Negative Binomial Distribution

    A negative binomial variable X has support on the nonnegative integers.

    A simple expression of the PMF for a negative binomial variable is:

f(x) = ((r + x − 1) choose (r − 1)) π^r (1 − π)^x

It turns out that there are lots of ways of formulating the negative binomial distribution.


    Negative Binomial Distribution

The negative binomial PMF is:

f(x) = ((r + x − 1) choose (r − 1)) π^r (1 − π)^x

And the negative binomial CPF, the probability of observing x or fewer failures before the r-th success, is:

F(x) = Σ_{j=0}^{x} ((r + j − 1) choose (r − 1)) π^r (1 − π)^j.

Importantly, one can show via a bit of algebra that this value is equal to one minus the CPF of the binomial distribution (hence the name).
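A sketch of that connection in R (not from the handout): x or fewer failures before the r-th success is the same event as at least r successes in the first r + x trials.

r <- 3; p <- 0.4; x <- 5
pnbinom(x, size = r, prob = p)             # negative binomial CPF
1 - pbinom(r - 1, size = r + x, prob = p)  # one minus a binomial CPF: same value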

    Negative Binomial Distribution

Similar to a geometric variable, the expected value of a negative binomial variable is:

E(X) = r(1 − π)/π

and the variance is:

Var(X) = r(1 − π)/π^2

    Negative Binomial Distribution

There are at least two ways of thinking about the negative binomial distribution:

1. As a generalization of the geometric (and, in fact, the negative binomial distribution reduces to the geometric when r = 1).

2. Or as a Poisson variable with heterogeneity.


    Poisson Distribution

Now consider a very large number n of Bernoulli trials, where the probability of an event in any one trial is small.

In such a situation, the total number of events observed will follow a Poisson distribution.

    Poisson Distribution

Formally, for n independent Bernoulli trials with (sufficiently small) probability of success π, and where nπ = λ > 0, the probability of observing exactly x total successes as the number of trials grows without limit is:

f(x) = lim_{n→∞} (n choose x) (λ/n)^x (1 − λ/n)^(n−x) = λ^x exp(−λ) / x!

This is sometimes known as the Law of Rare Events motivation for the Poisson distribution, and comes from Siméon-Denis Poisson in 1837.

    Poisson Distribution

The Poisson PMF is:

f(x) = λ^x exp(−λ) / x!

And the Poisson CPF is:

F(x) = Σ_{j=0}^{x} λ^j exp(−λ) / j!
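A sketch of the law-of-rare-events limit in R (not in the original handout); the particular n values are arbitrary:

lambda <- 2; x <- 3
dbinom(x, size = 10,   prob = lambda / 10)    # about 0.201
dbinom(x, size = 1000, prob = lambda / 1000)  # about 0.180
dpois(x, lambda)                              # 0.1804, the limiting Poisson value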


    Poisson Distribution

An alternative way to think about the Poisson distribution is with an abstract model of event counts.

Suppose we are interested in studying events, and that those events occur over time.

We might consider the constant rate at which events occur; call this rate λ. It's useful to think of λh as the expected number of events in any particular time period of length h.

Imagine further that the events in question are independent; that is, the occurrence of one event has no bearing on the probability that another will occur.

Poisson Distribution

If the process that gives rise to the events in question (the event process) conforms to these assumptions, then it can be shown that, as the length of the interval h → 0,

the probability of an event occurring in the interval (t, t + h] = λh

the probability of no event occurring in the interval (t, t + h] = 1 − λh

Such a variable is known as a Poisson process: events occur independently with a constant probability equal to λ times the length of the interval, i.e., λh.

    Poisson Distribution

Now, consider our outcome variable X_t as the number of events that have occurred in the interval t of length h.

For such a process, the probability that the number of events occurring in (t, t + h] is equal to some value x ∈ {0, 1, 2, 3, ...} is:

f(x) = exp(−λh)(λh)^x / x!

If all the intervals are of the same length (and equal to 1), this reduces to:

f(x) = exp(−λ)λ^x / x!,

which is the Poisson distribution.


    Poisson Distribution

f(x) = exp(−λh)(λh)^x / x!

This is the way we typically see the Poisson distribution written.

By this logic, the Poisson distribution is the limiting distribution for the number of independent (Poisson) events occurring in some fixed period of length h.

The assumptions underlying the event process (constant arrival rates, and independence across events) are key to deriving the Poisson distribution in this way.

If we relax these assumptions, the resulting distribution(s) are not Poisson.

    Poisson Distribution

The Poisson distribution has several important characteristics:

It is a discrete probability distribution, with support on the non-negative integers.

The rate λ can be interpreted as the expected number of events during an observation period t. In fact, E(X) = λ.

As λ increases, several interesting things happen:

1. The mean/mode of the distribution gets bigger.

2. The variance gets larger as well; since the variable is bounded from below, its variability will necessarily get larger with its mean. In fact, E(X) = Var(X) = λ.

3. The distribution becomes more Normal.

    Poisson Distribution

Figure: Empirical Poisson Variates, with Varying λs


    Poisson Distribution

The Poisson distribution is often used as a baseline for time series and spatial count data: if the events are relatively rare and independent, they should be Poisson-distributed.

If a process is not consistent with the Poisson's E(X) = Var(X) = λ, then it is either overdispersed or underdispersed.

    Poisson Distribution

Overdispersion: E(X) < Var(X)

This occurs when the occurrence of one event makes the occurrence of subsequent or adjacent events more likely, so that events bunch together.

Because they are bunched, some sample intervals will have an unusually large number of events, and others will have an unusually small number of events.

This leads back to a Negative Binomial count model.

    Poisson Distribution

Underdispersion: E(X) > Var(X)

This occurs when the occurrence of one event makes the occurrence of subsequent or adjacent events less likely, so that events are spaced out evenly.

Because they are regularly spaced, all sample intervals will have about the same number of events, so Var(X) will be small.

Examples: Political (or marital...) honeymoon periods, where a government is unlikely to fall/resign immediately after taking office.


    Poisson Distribution

Note as well that the Poisson distribution...

...is not preserved under affine transformations; that is, affine transformations of Poisson variables are not themselves (necessarily) Poisson variables as well.

...is preserved under addition (convolution), provided that the components are independent. That is, for two Poisson variables X1 ∼ Poisson(λ1) and X2 ∼ Poisson(λ2), Z = X1 + X2 ∼ Poisson(λ1 + λ2) iff X1 and X2 are independent. However...

...the same is not true for differences of Poisson variables.
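A sketch (not from the slides) checking the addition property by direct convolution:

l1 <- 1.5; l2 <- 2.5; z <- 4
sum(dpois(0:z, l1) * dpois(z - (0:z), l2))  # Pr(X1 + X2 = z) by convolution
dpois(z, l1 + l2)                           # same: about 0.195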

    Multinomial Distribution

All the distributions we've talked about so far have their roots in the Bernoulli, which means that there have been only two potential outcomes (we've called them success and failure).

But suppose that instead of just two possibilities, we instead had K possible distinct outcomes for each trial, where each possible outcome k has some corresponding probability π_k of happening on each trial, and Σ_{k=1}^{K} π_k = 1.

    Multinomial Distribution

We can think of a multi-outcome analogue to the binomial, where the variable X_k denotes the number of times we observe outcome k out of n trials.

The K-vector X denotes these K distinct variables:

X = (X_1, X_2, ..., X_K)′
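A sketch of the multinomial in R (not part of the transcribed slides); the probabilities here are illustrative:

probs <- c(0.2, 0.3, 0.5)                       # pi_k, summing to 1
rmultinom(1, size = 10, prob = probs)           # one draw: counts over K = 3 outcomes
dmultinom(c(2, 3, 5), size = 10, prob = probs)  # PMF at one particular outcome vector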


    Discrete Distributions

Figure: Relationship Among Discrete Distributions

(n = # of trials; π = Pr(success); r = # of successes; k = # of outcomes. The multinomial reduces to the binomial when k = 2; the binomial reduces to the Bernoulli when n = 1; the negative binomial reduces to the geometric when r = 1; and the binomial tends to the Poisson as n → ∞ with π → 0.)

    Uniform Distribution

If a variable X is uniformly distributed on the range [a, b], then its PDF is:

f(x) = 1/(b − a) for a ≤ x ≤ b; 0 for x < a or x > b

We write this as X ∼ U(a, b).

The Uniform is a two-parameter distribution, where the parameters are the minimum and the maximum (the bounds), which may fall anywhere in R.

We can most easily think of this as a rectangular shape of probability, located between a and b on the real number line, with length b − a and height equal to 1/(b − a).

    Uniform Distribution

Figure: Various Uniform PDFs (density against x value)


    Uniform Distribution

The uniform PDF is:

f(x) = 1/(b − a) for a ≤ x ≤ b; 0 for x < a or x > b

And since a draw from a uniform distribution has equal probability of falling anywhere on the real line between a and b, the CDF takes on an especially simple form:

F(x) = ∫ f(x) dx = 0 for x < a; (x − a)/(b − a) for a ≤ x < b; 1 for x ≥ b

    Uniform Distribution

Figure: Various Uniform CDFs (cumulative probability against x value)

Because we are just integrating over a constant value of f(x), the CDF looks like a sloped line extending from 0 to 1 over the range from a to b.

    Uniform Distribution

The expected value of a uniform variable X is:

E(X) = X_Med = (a + b)/2

and the variance is:

Var(X) = (b − a)^2 / 12.

Note as well that the mode of a uniform variable is any value in [a, b], and that its skewness is zero.


    Standard Uniform Distribution

    A uniform variable with a = 0 and b = 1 is often referred to as a standard uniform variable.

The standard uniform has the property that:

1 − X ∼ U(0, 1)

In other words, the variable X and its complement have the same distribution.

The standard uniform distribution turns out to be very useful in generating other random variables, since it is the range over which a probability varies.

Thus, if we want to generate random (equiprobable) data from some distribution, we start with a standard uniform variable, and then transform that variable by the inverse of the relevant CDF.
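A sketch of that inverse-CDF idea (not from the handout), assuming an Exponential(rate = 2) target whose inverse CDF is −log(1 − u)/rate:

set.seed(1)
u <- runif(10000)            # standard uniform draws
x <- -log(1 - u) / 2         # transform by the Exponential(2) inverse CDF
mean(x)                      # close to the Exponential mean, 1/2
mean(rexp(10000, rate = 2))  # comparable to R's own generator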

    Normal Distribution

If X is a normally distributed variable with mean μ and variance σ^2, then:

f(x) = [1 / (σ√(2π))] exp[−(x − μ)^2 / (2σ^2)]

The Normal, sometimes called the Gaussian, is a two-parameter distribution, and we write it as X ∼ N(μ, σ^2); we say X is distributed normally with mean mu and variance sigma squared.

The symbol φ (little phi) is often used as a shorthand to represent the normal density: φ_{μ,σ^2}

    Normal Distribution

The corresponding normal CDF, denoted by the symbol Φ (big phi), the probability of a normal random variable taking on a value less than or equal to some specified number, is the indefinite integral of the normal density:

F(x) ≡ Φ_{μ,σ^2}(x) = ∫ φ_{μ,σ^2}(x) dx.

Unfortunately, there is no simple closed-form expression for this integral, and its evaluation requires the use of numerical integration techniques.


    Normal Distribution

Figure: Various Normal Densities. Comparison of Normal Distributions (density against x value): mean = 0, var = 1; mean = 2, var = 1; mean = −3, var = 2; mean = 5, var = 4.

    Normal Distribution

Figure: Various Normal CDFs (parameters as in the previous figure; cumulative probability against x value)

    Normal Distribution

The most common justification for the normal distribution has its roots in the central limit theorem.

Consider i = {1, 2, ..., N} independent, real-valued random variables X_i, each with finite mean μ_i and variance σ^2_i > 0.

Consider a new variable X defined as the sum of these variables:

X = Σ_{i=1}^{N} X_i


    Normal Distribution

Then we know that

E(X) = Σ_{i=1}^{N} μ_i = μ < ∞ and

Var(X) = Σ_{i=1}^{N} σ^2_i = σ^2 < ∞.

    Normal Distribution

The central limit theorem states that:

lim_{N→∞} X = lim_{N→∞} Σ_{i=1}^{N} X_i →^D N(μ, σ^2)

where the notation →^D indicates convergence in distribution.

In other words, as N gets sufficiently large, the distribution of the sum of N independent random variables with finite mean and variance will converge to a normal distribution.

Thus, a normal distribution is appropriate when the observed variable X can take on a range of continuous values, and when the observed value of X can be thought of as the sum of a large number of relatively small, independent shocks or perturbations.
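Not on the original slides, but a quick R simulation makes the convergence visible (the 10,000 replications and 50 summands are arbitrary choices):

set.seed(42)
sums <- replicate(10000, sum(runif(50)))  # sums of 50 independent Uniform(0,1) draws
mean(sums)   # close to 50 * 0.5 = 25
var(sums)    # close to 50 * (1/12), about 4.17
# hist(sums) displays the familiar bell shape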

    Normal Distribution

    Properties of the Normal distribution:

A normal variable X has support in R.

The normal is a two-parameter distribution, where μ ∈ (−∞, ∞) and σ^2 ∈ (0, ∞).

The normal distribution is always symmetrical (mean, mode, and median are the same) and mesokurtic.

The maximum height of the normal is attained at X = μ, and the points of inflection are at μ ± σ.

The normal distribution is preserved under a linear transformation, i.e., if X ∼ N(μ, σ^2), then aX + b ∼ N(aμ + b, a^2σ^2).


    Normal Distribution

The importance of the Normal distribution lies in its relationship to the central limit theorem.

The central limit theorem basically notes that as one's sample size increases, the distribution of sample means (or other estimates) approaches a normal distribution.

Due to the complexity of its functional form, we often work with something known as the standard Normal distribution rather than the Normal distribution.

    Standard Normal Distribution

One linear transformation of the normal distribution is particularly useful:

b = −μ/σ, a = 1/σ.

This yields:

aX + b ∼ N(aμ + b, a^2σ^2) = N(0, 1)

This is the standard Normal density function. We often denote this φ(·), and say that X is distributed as standard normal.

    Standard Normal Distribution

We can also get the standard Normal distribution by transforming (standardizing) the normal variable X...

If X ∼ N(μ, σ^2), then Z = (x − μ)/σ ∼ N(0, 1).

The density function then reduces to:

f(z) ≡ φ(z) = (1/√(2π)) exp(−z^2/2)

We often write the CDF for the standard Normal as Φ(·). The standard Normal is sometimes called the Z distribution.
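In R (a sketch, not from the slides), pnorm() makes the equivalence between the general and standardized forms explicit:

mu <- 5; sigma <- 2; x <- 3
pnorm(x, mean = mu, sd = sigma)  # about 0.1587
pnorm((x - mu) / sigma)          # same: Phi((x - mu)/sigma)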


    Standard Normal Distribution

    Figure: Standard Normal Distribution

(shaded areas at z = 1.0 and z = 1.7)

The Z value is the number of standard deviations away from the mean.

Because the mean of the standard Normal is 0, the standard deviation is 1, and the distribution is symmetric, we can define values along the Z distribution with regard to the distance from the mean.

    Standard Normal Distribution

We will often want to calculate cumulative probabilities for standard Normal (and Normal) distributions.

Most statistics books include a table at the back where the cumulative probabilities are calculated for different z scores.

Let's look at how this works.



    Standard Normal Distribution

    Figure: Standard Normal Distribution

    z = 1.5

What is the probability that Z ≥ 1.5? 0.067, or 6.7%.


    Standard Normal Distribution

Example: What is Pr(0 ≤ z ≤ 1)?

We can solve this by taking the probability above 0 and subtracting from it the probability above 1.

Pr(0 ≤ z ≤ 1) = Pr(z > 0) − Pr(z > 1) = 0.5 − 0.159 = 0.341 ≈ 34%

Example: What is Pr(1 < z < 1.5)?



We can solve this by taking the probability above 1 and subtracting from it the probability above 1.5.

Pr(1 < z < 1.5) = Pr(z > 1) − Pr(z > 1.5) = 0.159 − 0.067 = 0.092 ≈ 9%


    Standard Normal Distribution

Example: What is Pr(−1 < z < 2)?

We can solve this by subtracting the two tail areas from the total area of 1.

Pr(z > 2) = 0.023
Pr(z < −1) = Pr(z > 1) = 0.159
Pr(−1 < z < 2) = 1 − 0.023 − 0.159 = 0.818 ≈ 82%

Example: What is Pr(−2 < z < 2)?


We can solve this by subtracting the two tail areas from the total area of 1.

Pr(−2 < z < 2) = 1 − Pr(z < −2) − Pr(z > 2) = 1 − 2(0.023) = 0.954 ≈ 95%
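These table lookups can be reproduced in R (not part of the original slides):

pnorm(1) - pnorm(0)     # 0.3413, Pr(0 < z < 1)
pnorm(1.5) - pnorm(1)   # 0.0919, Pr(1 < z < 1.5)
pnorm(2) - pnorm(-1)    # 0.8186, Pr(-1 < z < 2)
pnorm(2) - pnorm(-2)    # 0.9545, Pr(-2 < z < 2)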

    Standardization and Z Scores

But what if we're not dealing with a standard Normal distribution?

Suppose we want to know the area under a general normal curve, i.e., Pr(x1 < x < x2), where x1 and x2 are the lower and upper bounds of some interval.

To calculate this, we simply need to know the relationship between Pr(z1 < z < z2) and Pr(x1 < x < x2).

    Standardization and Z Scores

Note that we already showed how we could take a normally distributed X and turn it into a standard Normal Z through a linear transformation.

Recall that if X ∼ N(μ, σ^2), then Z = (x − μ)/σ ∼ N(0, 1).

We know how to find cumulative probabilities using a standard Normal table.

It follows that we can re-transform these probabilities back into the original variable X.


    Standardization and Z Scores

The fact that

(x − μ)/σ = z

implies that

x = zσ + μ

Inserting the two endpoints of our interval of interest produces

x1 = z1σ + μ and x2 = z2σ + μ

And substituting the values into our probability statement,

Pr(x1 < x < x2) = Pr(z1σ + μ < zσ + μ < z2σ + μ)

    Standardization and Z Scores

Canceling out the common terms gives us:

Pr(x1 < x < x2) = Pr(z1 < z < z2),

where

z1 = (x1 − μ)/σ

z2 = (x2 − μ)/σ

    Standardization and Z Scores

Example: Consider a random variable X which is distributed with mean five and standard deviation 2. We can determine the probability that 2 < x < 3.

z1 = (2 − 5)/2 = −3/2

z2 = (3 − 5)/2 = −1

This implies that Pr(2 < x < 3) = Pr(−3/2 < z < −1).


    Standardization and Z Scores

Pr(2 < x < 3) = Pr(−3/2 < z < −1). So we just subtract the probability that z < −3/2 from the probability that z < −1.

Pr(2 ≤ x ≤ 3) = Pr(z < −1) − Pr(z < −3/2)

We can solve this using symmetry:

Pr(2 ≤ x ≤ 3) = Pr(z < −1) − Pr(z < −3/2)
             = Pr(z > 1) − Pr(z > 3/2)
             = 0.159 − 0.067 = 0.092 = 9.2%


    Standardization and Z Scores

Example: Suppose the trout in a lake have lengths that are approximately normally distributed with a mean of 9.5 and a standard deviation of 1.4. What proportion of them have a length greater than 12? What proportion of them have a length greater than 10?

We can start by standardizing the score x = 12.

z = (x − μ)/σ = (12 − 9.5)/1.4 = 1.79

Thus

Pr(x > 12) = Pr(z > 1.79) = 0.037 ≈ 4%
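A sketch in R (not from the handout), including the second question, which the transcript leaves unanswered:

1 - pnorm(12, mean = 9.5, sd = 1.4)  # 0.0371, about 4%
1 - pnorm((12 - 9.5) / 1.4)          # same via the z score
1 - pnorm(10, mean = 9.5, sd = 1.4)  # about 0.36 for lengths greater than 10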


    Standardization and Z Scores

The proportion of non-defective washers is the area under the standard normal curve between −1.2 and 1.2.

Pr(−1.2 < z < 1.2) = 2 Pr(0 < z < 1.2) = 2 (Pr(z > 0) − Pr(z > 1.2)) = 2 (0.5 − 0.115) = 2 (0.385) = 0.77 = 77%

Thus, the percentage of defective washers is 100% − 77% = 23%.



    Standardization and Z Scores

Example: If X is a random normal variable with mean μ and variance σ^2, find Pr(μ − σ < x < μ + σ).

z1 = (x1 − μ)/σ = ((μ − σ) − μ)/σ = −1

z2 = (x2 − μ)/σ = ((μ + σ) − μ)/σ = 1

Pr(−1 < z < 1) = Pr(−1 < z < 0) + Pr(0 < z < 1) = 0.3413 + 0.3413 = 0.6826.


Standardization and Z Scores

Example: If X is a random normal variable with mean μ and variance σ^2, find Pr(μ − 2σ < x < μ + 2σ).

Pr(μ − 2σ < x < μ + 2σ) = Pr(−2 < z < 2) = 1 − 2(0.023) = 0.954 ≈ 95%.

In R, qnorm() goes the other way, returning the z value associated with a given cumulative probability, e.g., the 90th percentile of the standard Normal:

> qnorm(0.9)
[1] 1.281552


    Generating Random Variables (R )

Suppose you wanted to generate 100 observations from a standard normal distribution:

> rnorm(100)
[1] 1.796717125 -0.144005643 -0.095465392 1.821761827 0.720954702
. . .
[96] 0.346642718 -0.562705574 2.106901016 -0.313824110 -1.116299545

Each function has parameters specific to that distribution.

Example: rnorm(100, m=50, sd=10) generates 100 random variables from a normal distribution with mean 50 and standard deviation 10.

Generating Random Variables (R)

Example: For a Normal distribution with μ = 5 and σ = 1.414, we have:

> Xnorm <- rnorm(1000, m=5, sd=1.414)

and for a Binomial(5, 0.2):

> Xbinom5point2 <- rbinom(10000, 5, 0.2)


Generating Random Variables (Stata)

Drawing random variables in Stata is similar to R, except that Stata works in observations, whereas R works with vectors.

The following will give you 1000 normal variables with mean 5 and variance 2 (i.e., standard deviation 1.414):

. clear
. set obs 1000
. gen Xnorm = rnormal(5, 1.414)

For the binomial:

. clear
. set obs 10000
. gen Xbin = rbinomial(5, 0.2)

Generating Random Variables (Stata)

Figure: Ten Thousand Draws from a Binomial(5, 0.2) Distribution (histogram of Xbinom5point2, percent of total against the values 0 through 5)

    Generating Random Variables: R and Stata commands

Table: Commands for Generating Random Variates

Distribution               R           Stata
Binomial(n, π)             rbinom()    rndbin*
Geometric(π)               rgeom()     ?
Negative Binomial(n, π)    rnbinom()   ?
Poisson(λ)                 rpois()     rndpoi*
Uniform(0, 1)              runif()     uniform()
Normal(0, 1)               rnorm()     invnorm(uniform())
Lognormal(0, 1)            rlnorm()    xlgn*
Student's t(k)             rt()        rndt*
Chi-Square(k)              rchisq()    rndchi*
F(k, l)                    rf()        rndf*

Stata commands marked with an asterisk are from Hilbe's rnd* group of commands. Question marks indicate that I'm not aware of any canned way of doing this, though one can always generate them by hand using the appropriate PDF function.


    Generating Random Variables

When we generate random draws from a distribution, they're not really random in the truest sense.

    Instead, the values generated by random number generators (RNGs) are(usually) what we refer to as pseudorandom (PRNG).

This means that they start with some original number or set of numbers (called a seed) and then use deterministic functions of that number to generate random numbers.

    Generating Random Variables

The key constraint of a PRNG is the cycle (also known as a period or modulus).

Since PRNGs are generated by a deterministic process, once a particular number r_k is encountered again, every subsequent random number r_t will equal r_{t−k}.

The 1997 invention of the Mersenne twister algorithm, by Makoto Matsumoto and Takuji Nishimura, avoids many of the problems with earlier generators.

It has the colossal cycle of 2^19937 − 1 iterations (> 43 × 10^6000), it is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and it runs faster than other statistically reasonable generators.

Note that the theoretical maximum can't be obtained since computers have finite memory for the storage of numbers.

    Generating True Random Variables

In some applications, notably cryptography and lotteries, it is useful to have true random number generators that use physical processes that are chaotic or otherwise known to be truly random. These numbers do not repeat.

    Examples include:

Lava lamps: An early system by Silicon Graphics, Inc. used lava lamps, which are chaotic, and photodetectors to generate true random numbers.

    Radioactive decay.

    Atmospheric noise.

    Noise generated by loose electrical connections.

You can find true random number generators at http://random.org.


    Random Seeds

Given a particular seed, all pseudo-random numbers generated from that seed will occur in exactly the same order, i.e., the seed determines the sequence of random numbers.

An important property of this is that it allows one to go back and replicate exactly what one has done before, even though the values generated are random, so long as we set the seed to some known number(s) at the outset (and use the same pseudo-random number generation algorithm).

    Random Seeds: R

> seed <- ...        # (seed value elided in the transcript)
> set.seed(seed)     # setting the system seed
> rt(3, 1)           # three draws from a t distrib. w/ 1 d.f.
[1] -0.1113 -0.7306 1.9839
> seed <- ...        # a different seed
> set.seed(seed)     # resetting the seed
> rt(3, 1)           # different values for the draws
[1] -0.5211 7.9161 -155.3186
> seed <- ...        # the first seed again
> set.seed(seed)
> rt(3, 1)           # identical values of the draws
[1] -0.1113 -0.7306 1.9839

    Random Seeds: Stata

. clear
. set obs 3
obs was 0, now 3
. set seed 3229
. gen T1 = invttail(1, uniform())
. set seed 1077
. gen T2 = invttail(1, uniform())
. set seed 3229
. gen T3 = invttail(1, uniform())
. list

+-----------------------------------+
|       T1        T2        T3      |
|-----------------------------------|
