    Lecture 6: Probability Distributions

    Matt Golder & Sona Golder

    Pennsylvania State University

    Probability Distributions

We're now going to take a closer look at particular probability distributions.

1. Discrete
2. Continuous

    Bernoulli Distribution

Imagine a binary X ∈ {0, 1} with:

X = 1 with probability π
X = 0 with probability 1 − π

That is, X's PMF is:

f(x) = π for x = 1; 1 − π for x = 0

which we can usefully rewrite as

f(x) = π^x (1 − π)^(1−x), x ∈ {0, 1}.
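As a quick R sketch (not from the original slides) of this PMF: since a Bernoulli is just a binomial with one trial, dbinom() serves as a check. We use p rather than pi as the variable name so as not to mask R's built-in constant.

p <- 0.3                               # an illustrative value of pi
bern_pmf <- function(x, p) p^x * (1 - p)^(1 - x)
bern_pmf(c(0, 1), p)                   # 0.7 0.3
dbinom(c(0, 1), size = 1, prob = p)    # same values: Bernoulli = binomial with n = 1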


    Bernoulli Distribution

X is a Bernoulli variable, and we say that X is distributed Bernoulli. We write this as:

X ∼ Bernoulli(π)

The Bernoulli is a one-parameter (π) distribution for a discrete variable that can take on only two values.

It is therefore the natural choice for binary / dichotomous variables.

Because X can only take on two values, we say that X has support only in

{0, 1}, i.e., the only possible values X can take on are 0 and 1.

    Bernoulli Distribution

So, the PMF of X is:

f(x) = π for x = 1; 1 − π for x = 0

and the cumulative probability function (CPF) of X is:

F(x) = Σ_{X ≤ x} f(x) = 1 − π for x = 0; 1 for x = 1

    Bernoulli Distribution

It is also the case that:

E(X) = Σ_x x f(x) = (0)(1 − π) + (1)(π) = π


    Bernoulli Distribution

And it is also the case that:

Var(X) = Σ_x [X − E(X)]^2 f(x)
       = Σ_x [X − π]^2 f(x)
       = (0 − π)^2 (1 − π) + (1 − π)^2 π
       = π^2 − π^3 + π − 2π^2 + π^3
       = π − π^2
       = π(1 − π)

The variance will be at its largest when π = 0.5, which is when we'd see the greatest amount of variation between 0s and 1s.
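As an aside (not in the original handout), a small R simulation confirms both moments; the 100,000 draws are an arbitrary choice.

x <- rbinom(1e5, size = 1, prob = 0.5)   # 100,000 Bernoulli(0.5) draws
mean(x)   # close to pi = 0.5
var(x)    # close to pi * (1 - pi) = 0.25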

    Binomial Distribution

The Bernoulli is the simplest discrete probability distribution and is the building block for a large set of important discrete distributions.

Probably the most important is the binomial distribution, which is most easily thought of as the number of 1s (successes) in n independent Bernoulli trials, each with identical probability π:

f(x) = (n choose x) π^x (1 − π)^(n−x)

where π ∈ [0, 1] is the probability of success, n ∈ {0, 1, 2, ...} is the number of trials, and

(n choose x) = n! / [x!(n − x)!]

    Binomial Distribution

X is a Binomial variable, and we say that X is binomial-distributed. We write this as:

X ∼ binomial(n, π).

The Binomial is a two-parameter (n, π) distribution.

If X is a binomial variable, it has support only in the set {0, 1, 2, ..., n}.


    Binomial Distribution

The binomial distribution is named after the binomial theorem for the expansion of powers of sums in mathematics, which states that:

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^(n−k).

For example,

(a + b)^2 = a^2 + 2ab + b^2,
(a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3,
(a + b)^4 = a^4 + 4a^3 b + 6a^2 b^2 + 4ab^3 + b^4,

and so forth.

    Binomial Distribution

    Deriving the binomial distribution:

Each sample point in the sample space can be characterized by an n-tuple involving the letters S and F, i.e., SSFSFFFSFS...FS for n trials.

For example, we might have a sample point with x successes, i.e., S = x:

SSSSS...SSS (x of these)  FFF...FF (n − x of these)

    Binomial Distribution

    Deriving the binomial distribution:

Because the Bernoulli trials are independent, the probability of this sample point is:

ppppp...ppp (x of these) × qqq...qq (n − x of these) = π^x (1 − π)^(n−x)

(where p = π and q = 1 − π). Every other sample point can be represented by a similar n-tuple.


    Binomial Distribution

Deriving the binomial distribution:

Because the number of distinct n-tuples is

(n choose x) = n! / [x!(n − x)!],

it follows that the event S = x is made up of (n choose x) sample points, each with probability π^x (1 − π)^(n−x).

As a result,

f(x) = (n choose x) π^x (1 − π)^(n−x)

    Binomial Distribution

Example (Coin Toss): What is the probability of getting exactly 2 heads in 6 tosses of a fair coin?

Pr(X = 2) = (6 choose 2) (1/2)^2 (1/2)^4 = [6!/(2!4!)] (1/2)^2 (1/2)^4 = 15/64

Example (Defective Fuses): Suppose that a lot of 5000 electrical fuses contains 5% defectives. If a sample of 5 fuses is tested, what is the probability of finding at least one defective?


Pr(at least one defective) = 1 − p(0) = 1 − (5 choose 0) π^0 (1 − π)^5

= 1 − (0.95)^5

= 1 − 0.774 = 0.226
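Both answers are easy to verify in R (a sketch, not part of the original slides):

choose(6, 2) * (1/2)^2 * (1/2)^4      # 0.234375, i.e., 15/64
dbinom(2, size = 6, prob = 0.5)       # same value
1 - dbinom(0, size = 5, prob = 0.05)  # about 0.226, the fuses answer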

    Binomial Distribution

Example (Three-Child Family): What's the probability of a certain number of girls in a three-child family?

Pr(X = 1) = (3 choose 1) (1/2)^1 (1/2)^2 = 3(0.5)(0.25) = 0.375 = 3/8

Pr(X = 2) = (3 choose 2) (1/2)^2 (1/2)^1 = 3(0.25)(0.5) = 0.375 = 3/8

Pr(X = 3) = (3 choose 3) (1/2)^3 (1/2)^0 = 1(0.125)(1) = 0.125 = 1/8

You can also use a tabulation of binomial probabilities from a statistics textbook. For example, we could find the probability of getting exactly 2 girls from 3 children by looking for n = 3 in the first column, s = 2 in the second column, and then going across the row until we found π = 0.5.

    Binomial Distribution

    The binomial PMF is:

f(x) = (n choose x) π^x (1 − π)^(n−x)

And the binomial CPF, the probability of observing x or fewer successes in n Bernoulli trials with probability of success π, is:

F(x) = Σ_{X ≤ x} f(x) = Σ_{j=0}^{x} (n choose j) π^j (1 − π)^(n−j)
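A sketch in R (not from the handout) showing that the CPF is just the sum of PMF terms:

n <- 6; p <- 0.5; x <- 2
sum(dbinom(0:x, size = n, prob = p))   # 0.34375
pbinom(x, size = n, prob = p)          # same value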


    Binomial Distribution

Figure: Binomial PMF with Different Parameters (π = 0.5, n = 20; π = 0.7, n = 20; π = 0.5, n = 40; x from 0 to 40)

Figure: Binomial CPF with Different Parameters (the same three binomials; cumulative probability from 0 to 1)

    Binomial Distribution

The expected value of a binomial variable X is:

E(X) = nπ,

which is pretty intuitive, since n is the number of trials and π is the probability of success.

    Binomial Distribution

The variance of a binomial variable X is:

Var(X) = Σ_x [X − E(X)]^2 f(x)
       = Σ_x (X − nπ)^2 (n choose x) π^x (1 − π)^(n−x)
       = nπ(1 − π).

This means (again, intuitively) that the variability in a binomial variable is

increasing in n, and

largest when π = 0.5 for a fixed value of n.


    Binomial Distribution

A binomial variable is necessarily unimodal (except for the special case of a Bernoulli variable with π = 0.5).

It can also be skewed, depending on the value of π.

Because the binomial is used to model the number of successes out of a known number of trials, it is widely used in the social sciences.

For example, it is useful for modeling proportions or percentages where the denominator is known.

Example: We might believe that the number of yea votes for bills in the U.S. House of Representatives (out of n = 435) follows a binomial distribution.

    Geometric Distribution

Another thought experiment is to consider repeating (independent) Bernoulli trials with probability of success π until we observe the first success.

The number of independent Bernoulli trials needed to achieve one success is a geometric random variable.

If X is a geometric random variable with parameter π, then

f(x) = π(1 − π)^(x−1).

The geometric is thus a one-parameter distribution, with π ∈ [0, 1]. We write this as:

X ∼ geometric(π)

    Geometric Distribution

The geometric PMF is:

f(x) = π(1 − π)^(x−1)

And the geometric CPF, the cumulative probability of a first success in x ∈ {1, 2, ...} trials, is:

F(x) = Σ_{j=1}^{x} π(1 − π)^(j−1) = 1 − (1 − π)^x
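For reference (not in the original slides): R's dgeom()/pgeom() count the failures before the first success rather than the trials, so the slide's x corresponds to x − 1 in R.

p <- 0.25; x <- 4
p * (1 - p)^(x - 1)      # PMF above: about 0.1055
dgeom(x - 1, prob = p)   # same
1 - (1 - p)^x            # CPF above: about 0.6836
pgeom(x - 1, prob = p)   # same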


    Geometric Distribution

The expected value of the geometric distribution is:

E(X) = 1/π

This suggests that as the probability of success declines, we expect to undertake more and more trials before observing our first success.

The variance of X is:

Var(X) = (1 − π)/π^2

In other words, the variance gets arbitrarily close to zero as the probability of success approaches 1.0, and arbitrarily large as π → 0.

    Negative Binomial Distribution

The negative binomial can be thought of as a generalization or variant of the geometric distribution.

Imagine we begin conducting independent Bernoulli trials with probability of success π, and stop the trials upon observing r successes.

The distribution of the number of failures we observe (X) before achieving the r-th success is distributed according to a negative binomial distribution.

Technically, the special case where r is an integer value is known as the Pascal distribution (and the real-valued case the Polya distribution).

    Negative Binomial Distribution

    A negative binomial variable X has support on the nonnegative integers.

    A simple expression of the PMF for a negative binomial variable is:

f(x) = ((r + x − 1) choose (r − 1)) π^r (1 − π)^x

It turns out that there are lots of ways of formulating the negative binomial distribution.


    Negative Binomial Distribution

The negative binomial PMF is:

f(x) = ((r + x − 1) choose (r − 1)) π^r (1 − π)^x

And the negative binomial CPF, the probability of observing x or fewer failures before the r-th success, is:

F(x) = Σ_{j=0}^{x} ((r + j − 1) choose (r − 1)) π^r (1 − π)^j.

Importantly, one can show via a bit of algebra that this value is equal to one minus the CPF of the binomial distribution (hence the name).
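A sketch of that connection in R (not from the handout): x or fewer failures before the r-th success is the same event as at least r successes in the first r + x trials.

r <- 3; p <- 0.4; x <- 5
pnbinom(x, size = r, prob = p)             # negative binomial CPF
1 - pbinom(r - 1, size = r + x, prob = p)  # one minus a binomial CPF: same value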

    Negative Binomial Distribution

Similar to a geometric variable, the expected value of a negative binomial variable is:

E(X) = r(1 − π)/π

and the variance is:

Var(X) = r(1 − π)/π^2

    Negative Binomial Distribution

There are at least two ways of thinking about the negative binomial distribution:

1. As a generalization of the geometric (and, in fact, the negative binomial distribution reduces to the geometric when r = 1).

2. Or as a Poisson variable with heterogeneity.


    Poisson Distribution

Now consider a very large number n of Bernoulli trials, where the probability of an event in any one trial is small.

In such a situation, the total number of events observed will follow a Poisson distribution.

    Poisson Distribution

Formally, for n independent Bernoulli trials with (sufficiently small) probability of success π, and where nπ = λ > 0, the probability of observing exactly x total successes as the number of trials grows without limit is:

f(x) = lim_{n→∞} (n choose x) (λ/n)^x (1 − λ/n)^(n−x) = λ^x exp(−λ) / x!

This is sometimes known as the Law of Rare Events motivation for the Poisson distribution, and comes from Siméon-Denis Poisson in 1837.

    Poisson Distribution

The Poisson PMF is:

f(x) = λ^x exp(−λ) / x!

And the Poisson CPF is:

F(x) = Σ_{j=0}^{x} λ^j exp(−λ) / j!
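A sketch of the law-of-rare-events limit in R (not in the original handout); the particular n values are arbitrary:

lambda <- 2; x <- 3
dbinom(x, size = 10,   prob = lambda / 10)    # about 0.201
dbinom(x, size = 1000, prob = lambda / 1000)  # about 0.180
dpois(x, lambda)                              # 0.1804, the limiting Poisson value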


    Poisson Distribution

An alternative way to think about the Poisson distribution is with an abstract model of event counts.

Suppose we are interested in studying events, and that those events occur over time.

We might consider the constant rate at which events occur; call this rate λ. It's useful to think of λh as the expected number of events in any particular time period of length h.

Imagine further that the events in question are independent; that is, the occurrence of one event has no bearing on the probability that another will occur.

Poisson Distribution

If the process that gives rise to the events in question (the event process) conforms to these assumptions, then it can be shown that, as the length of the interval h → 0,

the probability of an event occurring in the interval (t, t + h] = λh

the probability of no event occurring in the interval (t, t + h] = 1 − λh

Such a variable is known as a Poisson process: events occur independently with a constant probability equal to λ times the length of the interval, i.e., λh.

    Poisson Distribution

Now, consider our outcome variable X_t as the number of events that have occurred in the interval t of length h.

For such a process, the probability that the number of events occurring in (t, t + h] is equal to some value x ∈ {0, 1, 2, 3, ...} is:

f(x) = exp(−λh)(λh)^x / x!

If all the intervals are of the same length (and equal to 1), this reduces to:

f(x) = exp(−λ)λ^x / x!,

which is the Poisson distribution.


    Poisson Distribution

f(x) = exp(−λh)(λh)^x / x!

This is the way we typically see the Poisson distribution written.

By this logic, the Poisson distribution is the limiting distribution for the number of independent (Poisson) events occurring in some fixed period of length h.

The assumptions underlying the event process (constant arrival rates, and independence across events) are key to deriving the Poisson distribution in this way.

If we relax these assumptions, the resulting distribution(s) are not Poisson.

    Poisson Distribution

The Poisson distribution has several important characteristics:

It is a discrete probability distribution, with support on the non-negative integers.

The rate λ can be interpreted as the expected number of events during an observation period t. In fact, E(X) = λ.

As λ increases, several interesting things happen:

1. The mean/mode of the distribution gets bigger.

2. The variance gets larger as well; since the variable is bounded from below, its variability will necessarily get larger with its mean. In fact, E(X) = Var(X) = λ.

3. The distribution becomes more Normal.

    Poisson Distribution

Figure: Empirical Poisson Variates, with Varying λs


    Poisson Distribution

The Poisson distribution is often used as a baseline for time series and spatial count data: if the events are relatively rare and independent, they should be Poisson-distributed.

If a process is not consistent with the Poisson's E(X) = Var(X) = λ, then it is either overdispersed or underdispersed.

    Poisson Distribution

Overdispersion: E(X) < Var(X)

This occurs when the occurrence of one event makes the occurrence of subsequent or adjacent events more likely, so that events bunch together.

Because they are bunched, some sample intervals will have an unusually large number of events, and others will have an unusually small number of events.

This leads back to a Negative Binomial count model.

    Poisson Distribution

Underdispersion: E(X) > Var(X)

This occurs when the occurrence of one event makes the occurrence of subsequent or adjacent events less likely, so that events are spaced out evenly.

Because they are regularly spaced, all sample intervals will have about the same number of events, so Var(X) will be small.

Examples: Political (or marital...) honeymoon periods, where a government is unlikely to fall/resign immediately after taking office.


    Poisson Distribution

Note as well that the Poisson distribution...

...is not preserved under affine transformations; that is, affine transformations of Poisson variables are not themselves (necessarily) Poisson variables as well.

...is preserved under addition (convolution), provided that the components are independent. That is, for two Poisson variables X1 ∼ Poisson(λ1) and X2 ∼ Poisson(λ2), Z = X1 + X2 ∼ Poisson(λ1 + λ2) iff X1 and X2 are independent. However...

...the same is not true for differences of Poisson variables.
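A sketch (not from the slides) checking the addition property by direct convolution:

l1 <- 1.5; l2 <- 2.5; z <- 4
sum(dpois(0:z, l1) * dpois(z - (0:z), l2))  # Pr(X1 + X2 = z) by convolution
dpois(z, l1 + l2)                           # same: about 0.195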

    Multinomial Distribution

All the distributions we've talked about so far have their roots in the Bernoulli, which means that there have been only two potential outcomes (we've called them success and failure).

But suppose that instead of just two possibilities, we instead had K possible distinct outcomes for each trial, where each possible outcome k has some corresponding probability π_k of happening on each trial, and Σ_{k=1}^{K} π_k = 1.

    Multinomial Distribution

We can think of a multi-outcome analogue to the binomial, where the variable X_k denotes the number of times we observe outcome k out of n trials.

The K-vector X denotes these K distinct variables:

X = (X_1, X_2, ..., X_K)′
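A sketch of the multinomial in R (not part of the transcribed slides); the probabilities here are illustrative:

probs <- c(0.2, 0.3, 0.5)                       # pi_k, summing to 1
rmultinom(1, size = 10, prob = probs)           # one draw: counts over K = 3 outcomes
dmultinom(c(2, 3, 5), size = 10, prob = probs)  # PMF at one particular outcome vector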


    Discrete Distributions

Figure: Relationship Among Discrete Distributions

(n = # of trials; π = Pr(success); r = # of successes; k = # of outcomes. The multinomial reduces to the binomial when k = 2; the binomial reduces to the Bernoulli when n = 1; the negative binomial reduces to the geometric when r = 1; and the binomial tends to the Poisson as n → ∞ with π → 0.)

    Uniform Distribution

If a variable X is uniformly distributed on the range [a, b], then its PDF is:

f(x) = 1/(b − a) for a ≤ x ≤ b; 0 for x < a or x > b

We write this as X ∼ U(a, b).

The Uniform is a two-parameter distribution, where the parameters are the minimum and the maximum (the bounds), which may fall anywhere in R.

We can most easily think of this as a rectangular shape of probability, located between a and b on the real number line, with length b − a and height equal to 1/(b − a).

    Uniform Distribution

Figure: Various Uniform PDFs (density against x value)


    Uniform Distribution

The uniform PDF is:

f(x) = 1/(b − a) for a ≤ x ≤ b; 0 for x < a or x > b

And since a draw from a uniform distribution has equal probability of falling anywhere on the real line between a and b, the CDF takes on an especially simple form:

F(x) = ∫ f(x) dx = 0 for x < a; (x − a)/(b − a) for a ≤ x < b; 1 for x ≥ b

    Uniform Distribution

Figure: Various Uniform CDFs (cumulative probability against x value)

Because we are just integrating over a constant value of f(x), the CDF looks like a sloped line extending from 0 to 1 over the range from a to b.

    Uniform Distribution

The expected value of a uniform variable X is:

E(X) = X_Med = (a + b)/2

and the variance is:

Var(X) = (b − a)^2 / 12.

Note as well that the mode of a uniform variable is any value in [a, b], and that its skewness is zero.


    Standard Uniform Distribution

    A uniform variable with a = 0 and b = 1 is often referred to as a standard uniform variable.

The standard uniform has the property that:

1 − X ∼ U(0, 1)

In other words, the variable X and its complement have the same distribution.

The standard uniform distribution turns out to be very useful in generating other random variables, since it is the range over which a probability varies.

Thus, if we want to generate random (equiprobable) data from some distribution, we start with a standard uniform variable, and then transform that variable by the inverse of the relevant CDF.
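A sketch of that inverse-CDF idea (not from the handout), assuming an Exponential(rate = 2) target whose inverse CDF is −log(1 − u)/rate:

set.seed(1)
u <- runif(10000)            # standard uniform draws
x <- -log(1 - u) / 2         # transform by the Exponential(2) inverse CDF
mean(x)                      # close to the Exponential mean, 1/2
mean(rexp(10000, rate = 2))  # comparable to R's own generator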

    Normal Distribution

If X is a normally distributed variable with mean μ and variance σ^2, then:

f(x) = [1 / (σ√(2π))] exp[−(x − μ)^2 / (2σ^2)]

The Normal, sometimes called the Gaussian, is a two-parameter distribution, and we write it as X ∼ N(μ, σ^2); we say X is distributed normally with mean mu and variance sigma squared.

The symbol φ (little phi) is often used as a shorthand to represent the normal density: φ_{μ,σ^2}

    Normal Distribution

The corresponding normal CDF, denoted by the symbol Φ (big phi), the probability of a normal random variable taking on a value less than or equal to some specified number, is the indefinite integral of the normal density:

F(x) ≡ Φ_{μ,σ^2}(x) = ∫ φ_{μ,σ^2}(x) dx.

Unfortunately, there is no simple closed-form expression for this integral, and its evaluation requires the use of numerical integration techniques.


    Normal Distribution

Figure: Various Normal Densities. Comparison of Normal Distributions (density against x value): mean = 0, var = 1; mean = 2, var = 1; mean = −3, var = 2; mean = 5, var = 4.

    Normal Distribution

Figure: Various Normal CDFs (parameters as in the previous figure; cumulative probability against x value)

    Normal Distribution

The most common justification for the normal distribution has its roots in the central limit theorem.

Consider i = {1, 2, ..., N} independent, real-valued random variables X_i, each with finite mean μ_i and variance σ^2_i > 0.

Consider a new variable X defined as the sum of these variables:

X = Σ_{i=1}^{N} X_i


    Normal Distribution

Then we know that

E(X) = Σ_{i=1}^{N} μ_i = μ < ∞ and

Var(X) = Σ_{i=1}^{N} σ^2_i = σ^2 < ∞.

    Normal Distribution

The central limit theorem states that:

lim_{N→∞} X = lim_{N→∞} Σ_{i=1}^{N} X_i →^D N(μ, σ^2)

where the notation →^D indicates convergence in distribution.

In other words, as N gets sufficiently large, the distribution of the sum of N independent random variables with finite mean and variance will converge to a normal distribution.

Thus, a normal distribution is appropriate when the observed variable X can take on a range of continuous values, and when the observed value of X can be thought of as the sum of a large number of relatively small, independent shocks or perturbations.
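Not on the original slides, but a quick R simulation makes the convergence visible (the 10,000 replications and 50 summands are arbitrary choices):

set.seed(42)
sums <- replicate(10000, sum(runif(50)))  # sums of 50 independent Uniform(0,1) draws
mean(sums)   # close to 50 * 0.5 = 25
var(sums)    # close to 50 * (1/12), about 4.17
# hist(sums) displays the familiar bell shape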

    Normal Distribution

    Properties of the Normal distribution:

A normal variable X has support in R.

The normal is a two-parameter distribution, where μ ∈ (−∞, ∞) and σ^2 ∈ (0, ∞).

The normal distribution is always symmetrical (mean, mode, and median are the same) and mesokurtic.

The maximum height of the normal is attained at X = μ, and the points of inflection are at μ ± σ.

The normal distribution is preserved under a linear transformation, i.e., if X ∼ N(μ, σ^2), then aX + b ∼ N(aμ + b, a^2σ^2).


    Normal Distribution

The importance of the Normal distribution lies in its relationship to the central limit theorem.

The central limit theorem basically notes that as one's sample size increases, the distribution of sample means (or other estimates) approaches a normal distribution.

Due to the complexity of its functional form, we often work with something known as the standard Normal distribution rather than the Normal distribution.

    Standard Normal Distribution

One linear transformation of the normal distribution is particularly useful:

b = −μ/σ, a = 1/σ.

This yields:

aX + b ∼ N(aμ + b, a^2σ^2) = N(0, 1)

This is the standard Normal density function. We often denote this φ(·), and say that X is distributed as standard normal.

    Standard Normal Distribution

We can also get the standard Normal distribution by transforming (standardizing) the normal variable X...

If X ∼ N(μ, σ^2), then Z = (x − μ)/σ ∼ N(0, 1).

The density function then reduces to:

f(z) ≡ φ(z) = (1/√(2π)) exp(−z^2/2)

We often write the CDF for the standard Normal as Φ(·). The standard Normal is sometimes called the Z distribution.
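In R (a sketch, not from the slides), pnorm() makes the equivalence between the general and standardized forms explicit:

mu <- 5; sigma <- 2; x <- 3
pnorm(x, mean = mu, sd = sigma)  # about 0.1587
pnorm((x - mu) / sigma)          # same: Phi((x - mu)/sigma)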


    Standard Normal Distribution

    Figure: Standard Normal Distribution

(shaded areas at z = 1.0 and z = 1.7)

The Z value is the number of standard deviations away from the mean.

Because the mean of the standard Normal is 0, the standard deviation is 1, and the distribution is symmetric, we can define values along the Z distribution with regard to the distance from the mean.

    Standard Normal Distribution

We will often want to calculate cumulative probabilities for standard Normal (and Normal) distributions.

Most statistics books include a table at the back where the cumulative probabilities are calculated for different z scores.

Let's look at how this works.



    Standard Normal Distribution

    Figure: Standard Normal Distribution

    z = 1.5

What is the probability that Z ≥ 1.5? 0.067, or 6.7%.


    Standard Normal Distribution

Example: What is Pr(0 ≤ z ≤ 1)?

We can solve this by taking the probability above 0 and subtracting from it the probability above 1.

Pr(0 ≤ z ≤ 1) = Pr(z > 0) − Pr(z > 1) = 0.5 − 0.159 = 0.341 ≈ 34%

Example: What is Pr(1 < z < 1.5)?



We can solve this by taking the probability above 1 and subtracting from it the probability above 1.5.

Pr(1 < z < 1.5) = Pr(z > 1) − Pr(z > 1.5) = 0.159 − 0.067 = 0.092 ≈ 9%


    Standard Normal Distribution

Example: What is Pr(−1 < z < 2)?

We can solve this by subtracting the two tail areas from the total area of 1.

Pr(z > 2) = 0.023
Pr(z < −1) = Pr(z > 1) = 0.159
Pr(−1 < z < 2) = 1 − 0.023 − 0.159 = 0.818 ≈ 82%

Example: What is Pr(−2 < z < 2)?


We can solve this by subtracting the two tail areas from the total area of 1.

Pr(−2 < z < 2) = 1 − Pr(z < −2) − Pr(z > 2) = 1 − 2(0.023) = 0.954 ≈ 95%
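These table lookups can be reproduced in R (not part of the original slides):

pnorm(1) - pnorm(0)     # 0.3413, Pr(0 < z < 1)
pnorm(1.5) - pnorm(1)   # 0.0919, Pr(1 < z < 1.5)
pnorm(2) - pnorm(-1)    # 0.8186, Pr(-1 < z < 2)
pnorm(2) - pnorm(-2)    # 0.9545, Pr(-2 < z < 2)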

    Standardization and Z Scores

But what if we're not dealing with a standard Normal distribution?

Suppose we want to know the area under a general normal curve, i.e., Pr(x1 < x < x2), where x1 and x2 are the lower and upper bounds of some interval.

To calculate this, we simply need to know the relationship between Pr(z1 < z < z2) and Pr(x1 < x < x2).

    Standardization and Z Scores

Note that we already showed how we could take a normally distributed X and turn it into a standard Normal Z through a linear transformation.

Recall that if X ∼ N(μ, σ^2), then Z = (x − μ)/σ ∼ N(0, 1).

We know how to find cumulative probabilities using a standard Normal table.

It follows that we can re-transform these probabilities back into the original variable X.


    Standardization and Z Scores

The fact that

(x − μ)/σ = z

implies that

x = zσ + μ

Inserting the two endpoints of our interval of interest produces

x1 = z1σ + μ and x2 = z2σ + μ

And substituting the values into our probability statement,

Pr(x1 < x < x2) = Pr(z1σ + μ < zσ + μ < z2σ + μ)

    Standardization and Z Scores

Canceling out the common terms gives us:

Pr(x1 < x < x2) = Pr(z1 < z < z2),

where

z1 = (x1 − μ)/σ

z2 = (x2 − μ)/σ

    Standardization and Z Scores

Example: Consider a random variable X which is distributed with mean five and standard deviation 2. We can determine the probability that 2 < x < 3.

z1 = (2 − 5)/2 = −3/2

z2 = (3 − 5)/2 = −1

This implies that Pr(2 < x < 3) = Pr(−3/2 < z < −1).


    Standardization and Z Scores

Pr(2 < x < 3) = Pr(−3/2 < z < −1). So we just subtract the probability that z < −3/2 from the probability that z < −1.

Pr(2 ≤ x ≤ 3) = Pr(z < −1) − Pr(z < −3/2)

We can solve this using symmetry:

Pr(2 ≤ x ≤ 3) = Pr(z < −1) − Pr(z < −3/2)
             = Pr(z > 1) − Pr(z > 3/2)
             = 0.159 − 0.067 = 0.092 = 9.2%


    Standardization and Z Scores

Example: Suppose the trout in a lake have lengths that are approximately normally distributed with a mean of 9.5 and a standard deviation of 1.4. What proportion of them have a length greater than 12? What proportion of them have a length greater than 10?

We can start by standardizing the score x = 12.

z = (x − μ)/σ = (12 − 9.5)/1.4 = 1.79

Thus

Pr(x > 12) = Pr(z > 1.79) = 0.037 ≈ 4%
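A sketch in R (not from the handout), including the second question, which the transcript leaves unanswered:

1 - pnorm(12, mean = 9.5, sd = 1.4)  # 0.0371, about 4%
1 - pnorm((12 - 9.5) / 1.4)          # same via the z score
1 - pnorm(10, mean = 9.5, sd = 1.4)  # about 0.36 for lengths greater than 10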


    Standardization and Z Scores

The proportion of non-defective washers is the area under the standard normal curve between −1.2 and 1.2.

Pr(−1.2 < z < 1.2) = 2 Pr(0 < z < 1.2) = 2 (Pr(z > 0) − Pr(z > 1.2)) = 2 (0.5 − 0.115) = 2 (0.385) = 0.77 = 77%

Thus, the percentage of defective washers is 100% − 77% = 23%.



    Standardization and Z Scores

Example: If X is a random normal variable with mean μ and variance σ^2, find Pr(μ − σ < x < μ + σ).

z1 = (x1 − μ)/σ = ((μ − σ) − μ)/σ = −1

z2 = (x2 − μ)/σ = ((μ + σ) − μ)/σ = 1

Pr(−1 < z < 1) = Pr(−1 < z < 0) + Pr(0 < z < 1) = 0.3413 + 0.3413 = 0.6826.


Standardization and Z Scores

Example: If X is a random normal variable with mean μ and variance σ^2, find Pr(μ − 2σ < x < μ + 2σ).

Pr(μ − 2σ < x < μ + 2σ) = Pr(−2 < z < 2) = 1 − 2(0.023) = 0.954 ≈ 95%.

In R, qnorm() goes the other way, returning the z value associated with a given cumulative probability, e.g., the 90th percentile of the standard Normal:

> qnorm(0.9)
[1] 1.281552


    Generating Random Variables (R )

Suppose you wanted to generate 100 observations from a standard normal distribution:

> rnorm(100)
[1] 1.796717125 -0.144005643 -0.095465392 1.821761827 0.720954702
. . .
[96] 0.346642718 -0.562705574 2.106901016 -0.313824110 -1.116299545

Each function has parameters specific to that distribution.

Example: rnorm(100, m=50, sd=10) generates 100 random variables from a normal distribution with mean 50 and standard deviation 10.

Generating Random Variables (R)

Example: For a Normal distribution with μ = 5 and σ = 1.414, we have:

> Xnorm <- rnorm(1000, m=5, sd=1.414)

and for a Binomial(5, 0.2):

> Xbinom5point2 <- rbinom(10000, 5, 0.2)


Generating Random Variables (Stata)

Drawing random variables in Stata is similar to R, except that Stata works in observations, whereas R works with vectors.

The following will give you 1000 normal variables with mean 5 and variance 2 (i.e., standard deviation 1.414):

. clear
. set obs 1000
. gen Xnorm = rnormal(5, 1.414)

For the binomial:

. clear
. set obs 10000
. gen Xbin = rbinomial(5, 0.2)

Generating Random Variables (Stata)

Figure: Ten Thousand Draws from a Binomial(5, 0.2) Distribution (histogram of Xbinom5point2, percent of total against the values 0 through 5)

    Generating Random Variables: R and Stata commands

Table: Commands for Generating Random Variates

Distribution               R           Stata
Binomial(n, π)             rbinom()    rndbin*
Geometric(π)               rgeom()     ?
Negative Binomial(n, π)    rnbinom()   ?
Poisson(λ)                 rpois()     rndpoi*
Uniform(0, 1)              runif()     uniform()
Normal(0, 1)               rnorm()     invnorm(uniform())
Lognormal(0, 1)            rlnorm()    xlgn*
Student's t(k)             rt()        rndt*
Chi-Square(k)              rchisq()    rndchi*
F(k, l)                    rf()        rndf*

Stata commands marked with an asterisk are from Hilbe's rnd* group of commands. Question marks indicate that I'm not aware of any canned way of doing this, though one can always generate them by hand using the appropriate PDF function.


    Generating Random Variables

When we generate random draws from a distribution, they're not really random in the truest sense.

    Instead, the values generated by random number generators (RNGs) are(usually) what we refer to as pseudorandom (PRNG).

This means that they start with some original number or set of numbers (called a seed) and then use deterministic functions of that number to generate random numbers.

    Generating Random Variables

The key constraint of a PRNG is the cycle (also known as a period or modulus).

Since PRNGs are generated by a deterministic process, once a particular number r_k is encountered again, every subsequent random number r_t will equal r_{t−k}.

The 1997 invention of the Mersenne twister algorithm, by Makoto Matsumoto and Takuji Nishimura, avoids many of the problems with earlier generators.

It has the colossal cycle of 2^19937 − 1 iterations (> 43 × 10^6000), it is proven to be equidistributed in (up to) 623 dimensions (for 32-bit values), and it runs faster than other statistically reasonable generators.

Note that the theoretical maximum can't be obtained since computers have finite memory for the storage of numbers.

    Generating True Random Variables

In some applications, notably cryptography and lotteries, it is useful to have true random number generators that use physical processes that are chaotic or otherwise known to be truly random. These numbers do not repeat.

    Examples include:

Lava lamps: An early system by Silicon Graphics, Inc. used lava lamps, which are chaotic, and photodetectors to generate true random numbers.

    Radioactive decay.

    Atmospheric noise.

    Noise generated by loose electrical connections.

You can find true random number generators at http://random.org.


    Random Seeds

Given a particular seed, all pseudo-random numbers generated from that seed will occur in exactly the same order, i.e., the seed determines the sequence of random numbers.

An important property of this is that it allows one to go back and replicate exactly what one has done before, even though the values generated are random, so long as we set the seed to some known number(s) at the outset (and use the same pseudo-random number generation algorithm).

    Random Seeds: R

> seed <- ...        # (seed value elided in the transcript)
> set.seed(seed)     # setting the system seed
> rt(3, 1)           # three draws from a t distrib. w/ 1 d.f.
[1] -0.1113 -0.7306 1.9839
> seed <- ...        # a different seed
> set.seed(seed)     # resetting the seed
> rt(3, 1)           # different values for the draws
[1] -0.5211 7.9161 -155.3186
> seed <- ...        # the first seed again
> set.seed(seed)
> rt(3, 1)           # identical values of the draws
[1] -0.1113 -0.7306 1.9839

    Random Seeds: Stata

. clear
. set obs 3
obs was 0, now 3
. set seed 3229
. gen T1 = invttail(1, uniform())
. set seed 1077
. gen T2 = invttail(1, uniform())
. set seed 3229
. gen T3 = invttail(1, uniform())
. list

+-----------------------------------+
|       T1        T2        T3      |
|-----------------------------------|
