non_unif_HO


  • 8/11/2019 non_unif_HO


    1 Introduction

    2 Inversion method

    3 Rejection method

    4 R functions

        Important families of discrete random variables
        Important families of continuous random variables

    5 The bootstrap

    6 Gaussian vectors

    7 Exercises

    G. Guillot (DTU) Non uniform random numbers - 02443 2 / 58


    Introduction

    From today on, we assume that we have a reliable (pseudo-)RNG that produces uniform random variables on [0, 1]. We can use runif() in R for this purpose.

    How to simulate non-uniform

        discrete random variables, e.g. from the geometric distribution (waiting time of the first head):
        $P(X = k) = (1 - p)^{k-1} p$, $X \sim \mathrm{Geom}(p)$, $p \in [0, 1]$, $k \in \mathbb{N}$, $k \ge 1$

        continuous random variables, e.g. the Gaussian (or normal) distribution with pdf
        $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$

        random vectors, i.e. random variables with values in $\mathbb{R}^p$

    We can do that by simulating uniform numbers and applying a transformation to them...


    Inversion method

    Anamorphosis by the cdf

    Theorem: image of a r.v. through its one-to-one c.d.f.
    Let $X$ be a continuous r.v. with a one-to-one (bijective) c.d.f. $F$. Let us define $Y$ as $Y = F(X)$. Then $Y \sim U([0, 1])$.

    ## Illustration of theorem on inversion (single draw)
    h = seq(0,10,.01) ; lambda = 1/2
    plot(h, dexp(h,rate=lambda), type="l", col=2, lwd=2, xlab="", ylab="", ylim=c(0,1))
    abline(v=0) ; abline(h=0)
    lines(h, pexp(h,rate=lambda), lty=1, lwd=2)
    x = rexp(n=1,rate=lambda) ; y = pexp(x,rate=lambda)
    points(x, 0, col=2, lwd=2, pch=16, cex=2)
    points(x, y, col=1, lwd=2, pch=16, cex=2)
    arrows(x0=x, y0=0, x1=x, y1=y, lty=2, angle=15)
    points(0, y, col=3, lwd=2, pch=16, cex=2)
    arrows(x0=x, y0=y, x1=0, y1=y, lty=2, angle=15)

    ## Same illustration with n = 100 draws
    n = 100
    h = seq(0,10,.01) ; lambda = 1/2
    plot(h, dexp(h,rate=lambda), type="l", col=2, lwd=2, xlab="", ylab="", ylim=c(0,1))
    abline(v=0) ; abline(h=0)
    lines(h, pexp(h,rate=lambda), lty=1, lwd=2)
    x = rexp(n=n,rate=lambda) ; y = pexp(x,rate=lambda)
    points(x, rep(0,n), col=2, lwd=2, pch=1, cex=2)
    points(x, y, col=1, lwd=2, pch=1, cex=2)
    points(rep(0,n), y, col=3, lwd=2, pch=1, cex=2)


    The proof...

    Theorem: image of a r.v. through its one-to-one c.d.f.

    Let $X$ be a continuous r.v. with a one-to-one (bijective) c.d.f. $F$. Let us define $Y$ as $Y = F(X)$. Then $Y \sim U([0, 1])$.

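    Proof: for $y \in (0, 1)$ we have

    $P(Y \le y) = P(F(X) \le y) = P(X \le F^{-1}(y)) = F(F^{-1}(y)) = y$

    This property suggests a method to simulate an arbitrary distribution with a one-to-one c.d.f. $F$ from a $U([0, 1])$: read in reverse, if $U \sim U([0, 1])$ then $F^{-1}(U)$ has c.d.f. $F$, since $P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$.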
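A quick numerical check of the theorem and of its converse can be sketched as follows, using the exponential distribution as an example. The sample size, the seed and the use of `ks.test` as a diagnostic are illustrative assumptions, not part of the original slides.

```r
## Sketch: checking Y = F(X) ~ U([0,1]) and its converse on an exponential example
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 10000 ; lambda <- 1/2

## Direct statement: push X through its own c.d.f.
x <- rexp(n, rate = lambda)
y <- pexp(x, rate = lambda)      # y should look uniform on [0,1]
ks_unif <- ks.test(y, "punif")

## Converse: push a uniform through the inverse c.d.f. (quantile function)
u <- runif(n)
x_inv <- qexp(u, rate = lambda)  # x_inv should look Exp(lambda)
ks_exp <- ks.test(x_inv, "pexp", rate = lambda)

## p-values of the two Kolmogorov-Smirnov checks
c(uniform_check = ks_unif$p.value, exponential_check = ks_exp$p.value)
```

Small p-values would flag a mismatch; large ones are merely compatible with the theorem, they do not prove it.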


    Simulation of an exponential rv by inversion

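    Imagine we want to simulate an exponential r.v., i.e. with pdf

    $f(x) = \lambda \exp(-\lambda x) I_{[0,+\infty)}(x)$, $\lambda > 0$

    The corresponding cdf is $F(x) = 1 - \exp(-\lambda x)$

    $F$ has an inverse $F^{-1}(y) = -\frac{1}{\lambda} \ln(1 - y)$

    Then $X$ defined as $X = -\frac{1}{\lambda} \ln(1 - U)$ satisfies $X \sim \mathrm{Exp}(\lambda)$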

    ## Simulation of an exponential rv by inversion
    lambda = 1/2 ; n = 1000 ; u = runif(n)
    h = seq(0,10,.1)
    plot(h, dexp(h,rate=lambda), type="l", lwd=3, col=3, xlab="", ylab="")
    x = (-1/lambda) * log(1-u)
    points(x, rep(0,n), col=3)
    hist(x, breaks=seq(-0.1,max(x)+.1,.1), add=TRUE, prob=TRUE, col=3)

    ## the right way to code this in R without using rexp:
    qexp(runif(n), rate=lambda)

    Rejection method


    Points uniform in a subset of $\mathbb{R}^2$

    Definition

    Let $D$ be a subset of $\mathbb{R}^2$ of finite area. A random vector $U = (U_1, U_2)$ is uniform in $D$ if for any $A \subset D$, $P(U \in A) = |A| / |D|$.

    The associated density is $f(u) = f(u_1, u_2) = \frac{1}{|D|} I_D(u_1, u_2)$


    Points uniform between a density curve and the x-axis

    Theorem

    Let $g$ be a pdf on $\mathbb{R}$ and $S_g$ the subset of $\mathbb{R}^2$ defined as $S_g = \{(x, y);\ 0 \le y \le g(x)\}$

    i) If $(X, Y)$ is uniformly distributed in $S_g$ then $X$ has $g$ as pdf.

    ii) If $X \sim g$, $U \sim U([0, 1])$ and $Y = U g(X)$ then $(X, Y)$ is uniformly distributed in $S_g$.


    Proof

    i) If $(X, Y) \sim U(S_g)$ then

    $P(X \le x) = \mathrm{Area}(\{(u, v);\ u \le x;\ 0 \le v \le g(u)\}) = \int_{-\infty}^{x} g(t)\,dt$, i.e. $X \sim g$

    ii) If $X \sim g$, $U \sim U([0, 1])$ and $Y = U g(X)$, then $(X, Y)$ has a joint density $h(x, y)$ defined as

    $h(x, y) = g(x) \frac{1}{g(x)} I_{[0, g(x)]}(y) = I_{[0, g(x)]}(y)$

    Hence for $B \subset S_g$,

    $P((X, Y) \in B) = \int_B h(x, y)\,dx\,dy = \int_B I_{[0, g(x)]}(y)\,dx\,dy = |B|$

    i.e. $(X, Y)$ is uniformly distributed in $S_g$.

    [Figure] Target distribution with density $f$.

    [Figure] An auxiliary distribution $g$ from which we know how to sample.

    [Figure] We assume that, after rescaling by a constant $C$, $g$ dominates $f$ uniformly: $f(x) \le C\,g(x)$ for all $x$.

    [Figure] Generate some points $(X_i, Y_i)$ in the domain under the graph of the rescaled auxiliary density $C g$.

    [Figure] Throw away points $(X_i, Y_i)$ outside $S_f$. The x coordinates of the remaining points follow the target distribution $f$.


    Remarks on the rejection method

    All $X_i$ values produced initially are drawn from $g$, but after dropping the rejected ones, the remaining values are distributed according to $f$

    The method generalizes easily to higher dimensions


    Pseudo-code for the rejection algorithm

    Init:
        X = x0; Y = f(x0) + 1

    While Y > f(X) do:
        X ~ g
        Y ~ U([0, C g(X)])
    End do

    Deliver X
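The pseudo-code above can be turned into a minimal R sketch. The target (a Beta(2, 5) density), the uniform auxiliary density, the constant `C = 2.5` and the function name `rreject` are all illustrative assumptions chosen so that $f(x) \le C g(x)$ holds on [0, 1]; they are not taken from the slides.

```r
## Sketch of the rejection algorithm (target and proposal are illustrative choices)
f <- function(x) dbeta(x, 2, 5)   # target density on [0,1], maximum about 2.46
g <- function(x) dunif(x)         # auxiliary density we know how to sample
C <- 2.5                          # constant such that f(x) <= C * g(x) on [0,1]

rreject <- function(n) {          # hypothetical helper name
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      x <- runif(1)               # X ~ g
      y <- runif(1, 0, C * g(x))  # Y ~ U([0, C g(X)])
      if (y <= f(x)) break        # keep the point if it lies under the graph of f
    }
    out[i] <- x
  }
  out
}

set.seed(1)                       # arbitrary seed
x <- rreject(5000)
c(sample_mean = mean(x), target_mean = 2/7)
```

Each proposal is accepted with probability 1/C, so a smaller valid `C` wastes fewer draws. The result can be inspected with `hist(x, prob=TRUE)` followed by `curve(f, add=TRUE)`.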

    R functions

    Important families of discrete random variables

    Back to initial examples

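    $P(X = k) = (1 - p)^{k-1} p$, $X \sim \mathrm{Geom}(p)$, $p \in [0, 1]$, $k \in \mathbb{N}$, $k \ge 1$

    $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$

    These distributions have nice concrete interpretations, which can be used for simulation.

    The geometric distribution: waiting time of the first head

    The Gaussian (or normal) distribution can be seen as the limiting distribution of the suitably normalized sum of $n$ i.i.d. random variables (Central Limit Theorem)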
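Both interpretations can be used directly for simulation, as in the following sketch. The helper name `rgeom_wait`, the value `p = 0.3` and the choice of `m = 12` uniforms in the sum are illustrative assumptions.

```r
set.seed(1)                          # arbitrary seed

## Geometric: count coin tosses until the first head (head with probability p)
rgeom_wait <- function(p) {          # hypothetical helper name
  k <- 1
  while (runif(1) > p) k <- k + 1    # tail (probability 1 - p): toss again
  k
}
p <- 0.3
k <- replicate(10000, rgeom_wait(p))
mean(k)                              # should be close to 1/p
## Note: R's rgeom() counts the failures before the first success, i.e. k - 1

## Gaussian: normalized sum of m i.i.d. uniforms (CLT approximation)
m <- 12
z <- replicate(10000, (sum(runif(m)) - m/2) / sqrt(m/12))
c(mean(z), var(z))                   # should be close to 0 and 1
```

The CLT construction is only approximate (its support is bounded), which is why `rnorm` is preferred in practice.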


    Important families of continuous random variables

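    Continuous uniform: all subsets of equal size are equally likely

    $f_X(x) = \frac{1}{b - a} I_{[a, b]}(x)$, $X \sim U([a, b])$

    Normal (or Gaussian) distribution

    $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2\right]$

    $X \sim N(\mu, \sigma)$ or $X \sim N(\mu, \sigma^2)$, $\mu \in \mathbb{R}$, $\sigma \in \mathbb{R}_+$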


    Common R simulation functions for continuous simulation

    runif, rnorm, rexp, rgamma, rbeta, rlnorm, rf, rchisq

    See ?Distributions

    The bootstrap

    Assessing the value of a sample

    Blurb on opinion polls & quality control


    Reminder: estimation, estimator, bias, variance, MSE


    Now imagine for a while that we do not know that $V(\bar X_n) = \sigma^2 / n$

    or that $\hat\sigma^2 = \sum_{i=1}^{n} (X_i - \bar X_n)^2 / (n - 1)$


    A fairly general situation:

    An unknown parameter $\theta$

    An estimator $\hat\theta$

    No known formula for the variance (or MSE) of $\hat\theta$


    Back to the variance of $\bar X_n$

    By definition: variance = expected square of the difference with expectation

    $V(\bar X_n)$ can be thought of as the number obtained as follows

    Take a 1st i.i.d. sample of size $n$, compute $\bar X_n^{(1)}$

    Take a 2nd i.i.d. sample of size $n$, compute $\bar X_n^{(2)}$

    ...

    Take a $B$-th i.i.d. sample of size $n$, compute $\bar X_n^{(B)}$

    Compute $\hat V(\bar X_n) = \frac{1}{B} \sum_{j=1}^{B} \left( \bar X_n^{(j)} - \frac{1}{B} \sum_{j'=1}^{B} \bar X_n^{(j')} \right)^2$
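When the sampling distribution is known, this recipe can be run literally. The sketch below assumes Gaussian data with `sigma = 2`, `n = 50` and `B = 5000`; these values are arbitrary and only serve to compare the result with the known value $\sigma^2 / n$.

```r
set.seed(1)                               # arbitrary seed
n <- 50 ; B <- 5000 ; sigma <- 2

## B independent samples of size n, one sample mean per sample
xbar <- replicate(B, mean(rnorm(n, mean = 0, sd = sigma)))

## average squared deviation of the B sample means from their own average
V_hat <- mean((xbar - mean(xbar))^2)

c(monte_carlo = V_hat, theory = sigma^2 / n)
```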


    The previous quantity estimates $V(\bar X_n)$ more and more accurately as $B$ increases (law of large numbers)

    Little problem: we do not have $B$ samples of size $n$, we have only one!

    There is a workaround... just pull on your bootstraps!


    Estimating $V(\bar X_n)$ with the so-called bootstrap estimator (Efron, 1979)

    We have a single dataset $(X_1, ..., X_n)$

    We can pretend to have $B$ different samples of size $n$:

    sample $Y_1^{(1)}, ..., Y_n^{(1)}$ in $(X_1, ..., X_n)$ uniformly with replacement, compute $\bar Y_n^{(1)}$

    sample $Y_1^{(2)}, ..., Y_n^{(2)}$ in $(X_1, ..., X_n)$ uniformly with replacement, compute $\bar Y_n^{(2)}$

    ...

    sample $Y_1^{(B)}, ..., Y_n^{(B)}$ in $(X_1, ..., X_n)$ uniformly with replacement, compute $\bar Y_n^{(B)}$

    Note that $Y_1^{(b)}, ..., Y_n^{(b)}$ usually have ties

    Compute $\hat V(\bar X_n) = \frac{1}{B} \sum_{j=1}^{B} \left( \bar Y_n^{(j)} - \frac{1}{B} \sum_{j'=1}^{B} \bar Y_n^{(j')} \right)^2$
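The resampling scheme above can be sketched in a few lines of R. The dataset is simulated here (exponential, `n = 200`) purely for illustration, and `B = 2000` is an arbitrary choice; in practice `x` would be the observed data.

```r
set.seed(1)                                # arbitrary seed
n <- 200 ; B <- 2000
x <- rexp(n, rate = 1/2)                   # the single available dataset (simulated here)

## B resamples of size n drawn from x uniformly with replacement,
## one resample mean per resample
ybar <- replicate(B, mean(sample(x, size = n, replace = TRUE)))

## bootstrap estimate of the variance of the sample mean
V_boot <- mean((ybar - mean(ybar))^2)

c(bootstrap = V_boot, classical = var(x) / n)
```

For the sample mean the classical formula is available, so the two numbers can be compared; the point of the bootstrap is that the same code works when no such formula exists.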


    Numerical comparison


    Bootstrap-based confidence intervals

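    Problem: give a C.I. for a parameter $\theta$ on the basis of a sample $X_1, ..., X_n$

    Solution:

    Build an estimator $\hat\theta$

    Assume that $\hat\theta \sim N$ (normally distributed)

    Estimate $V(\hat\theta)$ by the bootstrap estimator $\hat V(\hat\theta)$

    Build the normal-based C.I. $= \left[ \hat\theta + z_{\alpha/2} \sqrt{\hat V(\hat\theta)} \ ;\ \hat\theta + z_{1-\alpha/2} \sqrt{\hat V(\hat\theta)} \right]$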
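A minimal sketch of this construction, assuming the parameter of interest is a mean and the estimator is the sample mean. The simulated data, `alpha = 0.05` and `B = 2000` are illustrative choices.

```r
set.seed(1)                              # arbitrary seed
x <- rexp(200, rate = 1/2)               # observed sample (simulated here)
alpha <- 0.05 ; B <- 2000

theta_hat <- mean(x)                     # estimator of the parameter

## bootstrap replicates of the estimator and their variance
theta_star <- replicate(B, mean(sample(x, replace = TRUE)))
V_boot <- mean((theta_star - mean(theta_star))^2)

## normal-based confidence interval; qnorm(alpha/2) is negative
ci <- theta_hat + qnorm(c(alpha/2, 1 - alpha/2)) * sqrt(V_boot)
ci
```

Replacing `mean` by another statistic in both places gives the corresponding interval, provided the normality assumption is reasonable for that estimator.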


    Take home message about the bootstrap

    A collection of several samples can be mimicked by resampling the data

    A theoretical parameter that can be expressed as an expectation can be estimated by an average over the fake samples

    The idea is very general (see example for the median below)

    Gaussian vectors

    Multivariate Gaussian (or normal) random vectors

    Examples of bivariate normal densities (n= 2)


    Definition

    Let $X = (X_1, X_2, ..., X_n)^T$ be a random vector

    $X$ has an $n$-dimensional multivariate normal distribution if there is

        a $k$-dimensional random vector $Z$ with i.i.d. $N(0, 1)$ entries
        an $n \times k$ deterministic matrix $A$
        an $n$-dimensional deterministic vector $b$

    such that $X$ and $AZ + b$ have the same distribution

    Since $A$ and $b$ are fixed, we have $E[X] = b$ and $Var[X] = A A^T$.

    Reminder: the expectation (or mean) of a random vector

    Definition: expectation of a random vector

    The mean vector $\mu = (\mu_i)$ of a random vector $X = (X_1, X_2, ..., X_n)^T$ is defined as the vector whose entries $\mu_i$ are $\mu_i = E(X_i)$, $i = 1, ..., n$

    The density of a multivariate normal vector

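    Let us assume that $Y$ has a multivariate normal distribution in $\mathbb{R}^n$. If the variance-covariance matrix $\Sigma$ of $Y$ is of full rank, then $Y$ has a density on $\mathbb{R}^n$ as follows:

    $f_Y(y) = \frac{1}{(2\pi)^{n/2} \sqrt{\det \Sigma}} \exp\left[ -\frac{1}{2} (y - \mu)^T \Sigma^{-1} (y - \mu) \right]$

    $\mu$ is the expectation of $Y$

    We write $Y \sim N_n(\mu, \Sigma)$.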
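The density formula can be evaluated directly in base R. The function name `dmvn` is a hypothetical helper introduced for this sketch (it is not a base R function), and the test points are arbitrary.

```r
## Sketch: multivariate normal density at a single point y
dmvn <- function(y, mu, Sigma) {        # hypothetical helper name
  n <- length(mu)
  d <- y - mu
  q <- drop(t(d) %*% solve(Sigma, d))   # quadratic form (y - mu)^T Sigma^{-1} (y - mu)
  exp(-q / 2) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)))
}

## With Sigma = identity, the density factorizes into univariate N(0,1) densities
a <- dmvn(c(0.3, -1), mu = c(0, 0), Sigma = diag(2))
b <- dnorm(0.3) * dnorm(-1)
c(a, b)

## With n = 1, it reduces to the usual normal density (variance 4, i.e. sd 2)
c(dmvn(1.2, mu = 0.5, Sigma = matrix(4)), dnorm(1.2, mean = 0.5, sd = 2))
```

Using `solve(Sigma, d)` avoids forming the inverse matrix explicitly, which is a numerical convenience rather than a requirement of the formula.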

    Interest of the multivariate Gaussian distribution

    Flexible: many results can be derived analytically

    Central Limit Theorem (CLT): many processes exhibit a Gaussian-like distribution

    No obvious alternative in multivariate statistics


    The Choleski factorization

    Let $\Sigma$ be the covariance matrix of a random vector.

    Being a symmetric positive-definite matrix (assuming it is of full rank), $\Sigma$ can be written $\Sigma = LU$

    where $L$ is lower triangular and $U = L^T$.

    This is known as the Choleski factorization, a special case of the LU factorization

    Useful everywhere in numerical analysis because computations with triangular matrices are fast
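In R the factorization is available through `chol()`, which returns the upper triangular factor. The sketch below uses an arbitrary 2 x 2 positive-definite matrix and shows how a linear system in $\Sigma$ reduces to two triangular solves; the matrix and right-hand side are illustrative values.

```r
Sigma <- matrix(c(4, 1.2, 1.2, 1), nrow = 2)  # arbitrary symmetric positive-definite matrix

U <- chol(Sigma)           # upper triangular factor, Sigma = t(U) %*% U
L <- t(U)                  # lower triangular factor, Sigma = L %*% U
max(abs(L %*% U - Sigma))  # reconstruction error, should be ~ 0

## Solving Sigma z = b with two triangular solves instead of a general solve
b <- c(1, 2)
w <- forwardsolve(L, b)    # solves L w = b
z <- backsolve(U, w)       # solves U z = w
max(abs(Sigma %*% z - b))  # residual, should be ~ 0
```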

    The LU algorithm for simulating general random vectors

    Algorithm to simulate a random vector with mean $\mu$ and variance matrix $\Sigma$

    1 Simulate $X = (X_1, ..., X_n)^T$ centred ($E[X_i] = 0$) and with $Var[X] = I_n$

    2 Find $L$ such that $\Sigma = LU$

    3 Compute $Y = \mu + LX$

    The LU algorithm for simulating MVN random vectors

    Algorithm to simulate a multivariate Gaussian random vector with mean $\mu$ and variance matrix $\Sigma$

    1 Simulate $X = (X_1, ..., X_n)^T$ i.i.d. $N(0, 1)$

    2 Find $L$ such that $\Sigma = LU$

    3 Compute $Y = (Y_1, ..., Y_n)^T = \mu + LX$

    R code for simulation of a bivariate Gaussian vector with the LU method

    ## Define target density
    sd = 1 ; rho = .8
    ## covariance matrix: sd^2 on the diagonal, rho*sd^2 off the diagonal
    Sigma = matrix(nrow=2, ncol=2, data=c(sd^2, rho*sd^2, rho*sd^2, sd^2))
    Sigma

    ## Choleski factorisation (chol returns the upper triangular factor)
    U = chol(Sigma) ; L = t(U)
    L
    L %*% U ## just checking

    ## simulation
    n = 1000
    x1 = rnorm(n) ; x2 = rnorm(n)
    X = rbind(x1,x2)
    Y = L %*% X
    Y = t(Y)
    plot(Y, asp=1)

    ## checking that the empirical var-covar matrix is close to the target
    var(Y)
    Sigma

    ## checking the empirical bivariate density
    library(MASS)
    image(kde2d(Y[,1],Y[,2]), asp=1) ; points(Y, cex=.1, col=3)
    contour(kde2d(Y[,1],Y[,2]), add=TRUE)


    Exercises

    Exercises II


    5 You want to sell your car. You receive a first offer at a price of 12 (say in K Euro). Since it is the first offer, you decide to reject it and wait a little to get a feel for the market. You reject offers and wait until you get the first offer strictly larger than 12. Study by simulation the waiting time (counted in number of offers until the sale). You can for instance estimate the expectation of the waiting time. Assume that the $X_i$ are i.i.d. Poisson(10). Consider the solution conditionally on $X_1 = 12$, then unconditionally (i.e. on average over all possible $X_1$ values under the assumption that the $X_i$ are i.i.d. Poisson(10)). The use of rpois is allowed in this exercise.

    6 Variance in the estimation of the median of a Poisson distribution

        Simulate a single dataset of, say, $n = 200$ observations from a Poisson distribution with parameter 10
        Estimate the theoretical median from your sample
        Implement the bootstrap estimation technique to evaluate the variance of your estimator

    7 Simulate a sample of size 50000 from a bivariate centred, standardized Gaussian vector with correlation coefficient $\rho = 0.7$. Plot this sample. Extract a sample of a random variable approximately distributed as $Y_2 | Y_1 = 0.5$. Estimate its mean and variance. Plot its empirical histogram and compare it to a normal density with the same parameters as the one simulated above.
