14_RandomNumber

  • 7/28/2019 14_RandomNumber

    1/140

    BSTA 670 Statistical Computing

    27 October 2010

    Lecture 14: Random Number Generation

    Presented by: Paul Wileyto, Ph.D.

  • 7/28/2019 14_RandomNumber

    2/140

    "Anyone who uses software to produce random numbers is in a state of sin."

    John von Neumann

    A good analog random number generator.

  • 7/28/2019 14_RandomNumber

    3/140

    Why do we need Random Numbers?

    Simulation Input
    Statistical Sampling
    Assignment in Trials
    Games

  • 7/28/2019 14_RandomNumber

    4/140

    Where do you get random numbers?

    Uniform Random Numbers
        Published Tables
        Make Them
            Computer Algorithms
            Harvest from Nature

    Random draws from a specific distribution
        Make them from Uniform Random Numbers

  • 7/28/2019 14_RandomNumber

    5/140

    Software Random Number Generators

    There are no true random number generators, but there are Pseudo-Random Number Generators.

    Computers have only a limited number of bits to represent a number.

    Sooner or later, the sequence of random numbers will repeat itself (the period of the generator).

    The trick is to be good enough to look like random numbers.

  • 7/28/2019 14_RandomNumber

    6/140

    Algorithms for Uniform Random Numbers

  • 7/28/2019 14_RandomNumber

    7/140

    Good pseudo-random numbers:

    Independent of the previous number

    Long period

    Sequence reproducible if started with same initial conditions

    Fast

  • 7/28/2019 14_RandomNumber

    8/140

    Good pseudo-random numbers:

    Equal probability for any number inside interval [a,b]

    Probability Density:

    $$ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases} $$

  • 7/28/2019 14_RandomNumber

    9/140

    We are interested primarily in uniform random numbers in the interval [0,1].

    We'll refer to the realization of a uniform random number over [0,1] as U.

    Many of the algorithms produce integer-valued random numbers over the interval [0,b]. Transform these to the interval [0,1], for example by dividing by b.

  • 7/28/2019 14_RandomNumber

    10/140

    Linear Congruential Generator (LCG)

    Most common

    $$ X_n = (a X_{n-1} + c) \bmod m $$

    X_0 = seed, modulus m (large prime), multiplier a, and increment c

    Repeats due to the modular arithmetic that forces wrapping of values into the desired range.

    Mod in SAS

    proc iml; /* begin IML session */
    q={20,30,40,50,70,90,160};
    t=mod(q,7);      /* remainder after division by 7 */
    qt=q||t;         /* values and remainders side by side */
    print qt; /* print matrix */
    quit;

    qt
     20  6
     30  2
     40  5
     50  1
     70  0
     90  6
    160  6

    SAS

  • 7/28/2019 14_RandomNumber

    11/140

    Linear Congruential Generator (LCG)

    Most common

    $$ X_n = (a X_{n-1} + c) \bmod m $$

    X_0 = seed, modulus m (large prime), multiplier a, and increment c

    Repeats due to the modular arithmetic that forces wrapping of values into the desired range.

    Mod in R

    q
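The R code for this slide is cut off in the transcript. A minimal sketch of the same computation, assuming the vector q from the SAS example and using R's `%%` modulo operator, might look like this:

```r
# Assumed input: the same values as in the SAS example
q <- c(20, 30, 40, 50, 70, 90, 160)

# %% is the modulo operator in R
t <- q %% 7

# Place the values and their remainders side by side
qt <- cbind(q, t)
print(qt)
```

The remainders agree with the SAS output shown earlier.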

  • 7/28/2019 14_RandomNumber

    12/140

    Unit Random Variates in SAS

    proc iml; /* begin IML session */
    seed = 123456;
    c = j(5,1,seed);   /* 5 x 1 matrix filled with the seed */
    b = uniform(c);
    print b;
    quit;

    b
    0.73902
    0.2724794
    0.7095326
    0.3191636
    0.367853

    SAS

  • 7/28/2019 14_RandomNumber

    13/140

    > RNGkind()
    [1] "Mersenne-Twister" "Inversion"

    set.seed(as.integer(format(Sys.time(), "%S%M%H")))

    c

  • 7/28/2019 14_RandomNumber

    14/140

    Unit Random Variates in SAS

    Set seed to 0 to grab a seed value from the system clock.

    proc iml; /* begin IML session */
    seed = 0;
    c = j(5,1,seed);   /* 5 x 1 matrix filled with the seed */
    b = uniform(c);
    print b;
    quit;

    b
    0.73902
    0.2724794
    0.7095326
    0.3191636
    0.367853

    SAS

  • 7/28/2019 14_RandomNumber

    15/140

    RANUNI() and IML UNIFORM() use a multiplicative linear congruential generator (from SAS docs) where

    SEED = mod( SEED * 397204094, 2**31-1 )

    and then returns

    SEED / (2**31-1)

    SAS
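The recurrence above can be sketched in R. This is an illustration under assumptions, not SAS's implementation, and it is not guaranteed to reproduce SAS output. The helper names `mulmod` and `ranuni_sketch` are hypothetical. Because R doubles are exact only up to 2^53, the product SEED * 397204094 is computed in two exact pieces:

```r
m <- 2^31 - 1        # modulus quoted on the slide
a <- 397204094       # multiplier quoted on the slide

# (a * x) mod m without exceeding 2^53 (hypothetical helper)
mulmod <- function(x, a, m) {
  a_hi <- a %/% 65536            # upper part of the multiplier
  a_lo <- a %% 65536             # lower 16 bits of the multiplier
  hi <- (a_hi * x) %% m          # product stays below 2^44
  (hi * 65536 + a_lo * x) %% m   # both terms stay below 2^48
}

# Multiplicative LCG returning values in (0, 1) (hypothetical helper)
ranuni_sketch <- function(n, seed) {
  u <- numeric(n)
  for (i in seq_len(n)) {
    seed <- mulmod(seed, a, m)
    u[i] <- seed / m
  }
  u
}

u <- ranuni_sketch(5, seed = 123456)
print(u)
```

Calling `ranuni_sketch` twice with the same seed returns the same sequence, which is the reproducibility property listed earlier.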

  • 7/28/2019 14_RandomNumber

    16/140

    Testing Randomness

    Is it Uniform?

    [Figure: two histograms of uniform draws over [0,1]; counts reach about 300 in the first and about 3 x 10^4 in the second.]
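One way to put a number on "Is it Uniform?" is a chi-square goodness-of-fit test on binned draws. The sketch below is an illustration only; the seed, sample size, and the choice of 10 bins are assumptions, not values from the lecture.

```r
set.seed(1)                       # arbitrary seed, for reproducibility
u <- runif(10000)

# Count draws in 10 equal-width bins over [0, 1]
counts <- table(cut(u, breaks = seq(0, 1, by = 0.1)))
print(counts)

# With a single vector of counts, chisq.test() tests equal bin probabilities
test <- chisq.test(counts)
print(test$p.value)

# hist(u, breaks = 10)            # optional visual check
```

A small p-value would suggest the bins are not equally likely; a large one is merely consistent with uniformity.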

  • 7/28/2019 14_RandomNumber

    17/140

    Testing Randomness

    Generate two sets and plot against each other

    Might see correlation in higher dimensions

    Plot X_i versus X_{i+k} for serial correlation

    [Figure: two scatter plots of uniform draws on the unit square, x versus y and x1 versus y2.]
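The lag-k comparison can be sketched as follows. The seed, sample size, and lag k = 1 are assumptions chosen for illustration.

```r
set.seed(2)                 # arbitrary seed
x <- runif(5000)
k <- 1                      # lag to examine
n <- length(x)

x_now   <- x[1:(n - k)]     # X_i
x_later <- x[(k + 1):n]     # X_{i+k}

# Sample correlation between X_i and X_{i+k}
lag_cor <- cor(x_now, x_later)
print(lag_cor)

# plot(x_now, x_later, pch = ".")   # optional: look for bands or stripes
```

A correlation near zero does not rule out structure in higher dimensions, which is why the scatter plot is still worth looking at.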

  • 7/28/2019 14_RandomNumber

    18/140

  • 7/28/2019 14_RandomNumber

    19/140

    Linear Congruential Generator

    The Good
        Fast
        Up to period of m random numbers

    The Bad
        Sequential correlation
        Plots in more than 1 dimension do not fill in the space uniformly, but tend to form bands
        Not cryptographically secure
        Selections of m, a, and c are important

  • 7/28/2019 14_RandomNumber

    20/140

    Linear Congruential Generator

    Good magic numbers for the linear congruential method:

    a = 16,807 (= 7^5), c = 0, M = 2,147,483,647 (= 2^31 - 1)
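A minimal sketch of a linear congruential generator with these constants follows. The function name `lcg` and the seed are assumptions for illustration. With a = 16807 the product a * seed stays below 2^53, so ordinary R doubles hold it exactly.

```r
# Linear congruential generator: X_n = (a * X_{n-1} + c) mod m
lcg <- function(n, seed, a = 16807, c = 0, m = 2147483647) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    seed <- (a * seed + c) %% m   # exact in double precision for these constants
    x[i] <- seed
  }
  x
}

x <- lcg(10000, seed = 1)
u <- x / 2147483647               # transform integers to (0, 1)
print(head(u))
```

Restarting with the same seed reproduces the same sequence.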

  • 7/28/2019 14_RandomNumber

    21/140

    Overflow Method for integers

    Multiply two 32-bit numbers to get a 64-bit integer that cannot be represented in 32-bit space.

    The low-order 32 bits remain after the overflow.

    Divide by 2^{32} to get floating point values between 0 and 1.

    Very Fast

    $$ I_{j+1} = a I_j + c $$

  • 7/28/2019 14_RandomNumber

    22/140

    Blum, Blum, Shub

    Very slow
    Not suited to simulation
    Passes all tests
    Cryptographically secure

    $$ X_{n+1} = X_n^2 \bmod M, \quad M = pq, \text{ where } p \text{ and } q \text{ are large primes} $$
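The recurrence can be illustrated with toy values. The primes p = 11 and q = 23, the seed 3, and the function name `bbs` are assumptions chosen so the arithmetic is easy to follow; they are far too small for any real use.

```r
# Blum Blum Shub recurrence: X_{n+1} = X_n^2 mod M, with M = p * q
bbs <- function(n, seed, p = 11, q = 23) {
  M <- p * q
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (state * state) %% M
    x[i] <- state
  }
  x
}

x <- bbs(6, seed = 3)
print(x)         # successive states
print(x %% 2)    # least significant bit of each state
```

In practice the output is usually taken a bit at a time from each state, which is one reason the method is slow.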

  • 7/28/2019 14_RandomNumber

    23/140

    Mersenne Twister

    By Matsumoto and Nishimura (1997)
    Caused a great deal of excitement in 1997.
    Good statistical properties
    Not good for cryptography
    SAS IML RANDGEN function
    Default technique for R runif()

  • 7/28/2019 14_RandomNumber

    24/140

    Mersenne Twister

    I'm just going to give you the flavor of it
    It's a bit shifting algorithm

    32 bit word:

    0000 1111

  • 7/28/2019 14_RandomNumber

    25/140

    Mersenne Twister

    XOR: logical bitwise comparison function
    Compares two bits
    If they are different, value is 1
    If they are the same, value is zero

    >> a=[0 0 1 1]
    a =
         0 0 1 1
    >> b=[0 1 1 0]
    b =
         0 1 1 0
    >> c=xor(a,b)
    c =
         0 1 0 1
    >>

    MATLAB

  • 7/28/2019 14_RandomNumber

    26/140

    Mersenne Twister

    XOR: logical bitwise comparison function
    Compares two bits
    If they are different, value is 1
    If they are the same, value is zero

    > a <- c(0, 0, 1, 1)
    > b <- c(0, 1, 1, 0)
    > c <- xor(a, b)
    > c
    [1] FALSE TRUE FALSE TRUE
    > as.integer(c)
    [1] 0 1 0 1

    R

  • 7/28/2019 14_RandomNumber

    27/140

    Mersenne Twister

    Bit shifting algorithm

    Use XOR function to flip values

    32 bit word:

    0001 1111

    XOR

  • 7/28/2019 14_RandomNumber

    28/140

    Mersenne Twister

    Use 624 32-bit words to make one 19937-bit word (623*32 + 1)
    XOR flip function in each 32-bit word

    32 bit word:

    0000 1111

    [Figure: XOR feedback within the word, with bits arriving from the last word and passing to the next word.]

  • 7/28/2019 14_RandomNumber

    29/140

    Mersenne Twister

    From: John Savard's Cryptology Page

    http://www.quadibloc.com

  • 7/28/2019 14_RandomNumber

    30/140

    Mersenne Twister

    By Matsumoto and Nishimura (1997)
    Mersenne prime numbers (powers of 2, minus 1) give the period length: 2^{19937}-1 for 32-bit numbers

    Free C source code

    Fast

    Passes all randomness smell tests

    Not cryptographically secure

  • 7/28/2019 14_RandomNumber

    31/140

    proc iml; /* begin IML session */
    r = j(10,1,.);                 /* 10 x 1 matrix of missing values to fill */
    call randgen(r,'uniform');
    print r;
    quit;

    r
    0.0151013
    0.5743561
    0.5829185
    0.6437729
    0.1823678
    0.3977417
    0.476881
    0.9845982
    0.3211301
    0.9623223

    SAS

  • 7/28/2019 14_RandomNumber

    32/140

    > RNGkind()

    [1] "Mersenne-Twister" "Inversion"

    > r=matrix(runif(10), 10,1)
    > r
                [,1]
     [1,] 0.14645262
     [2,] 0.04558767
     [3,] 0.79254901
     [4,] 0.57810786
     [5,] 0.57831079
     [6,] 0.30258424
     [7,] 0.08682622
     [8,] 0.77980499
     [9,] 0.34161593
    [10,] 0.98705945

    R

  • 7/28/2019 14_RandomNumber

    33/140

    Grabbing a Seed from the System Clock (SAS)

    Both R and SAS automatically grab a seed value from the system clock at first use, unless you call set.seed (in R) or randseed (in SAS) to set a specific starting point.

    SAS

  • 7/28/2019 14_RandomNumber

    34/140

    proc iml; /* begin IML session */
    call randseed(12345);          /* fix the starting point */
    r = j(10,1,.);
    call randgen(r,'uniform');
    print r;
    quit;

    r
    0.5832971
    0.9936254
    0.5878877
    0.8574689
    0.8246889
    0.2805668
    0.6473969
    0.3819192
    0.4489572
    0.8757847

    SAS

  • 7/28/2019 14_RandomNumber

    35/140

    > set.seed(12345)
    > r=matrix(runif(10), 10,1)
    > r
               [,1]
     [1,] 0.7209039
     [2,] 0.8757732
     [3,] 0.7609823
     [4,] 0.8861246
     [5,] 0.4564810
     [6,] 0.1663718
     [7,] 0.3250954
     [8,] 0.5092243
     [9,] 0.7277053
    [10,] 0.9897369

    R

  • 7/28/2019 14_RandomNumber

    36/140

    Obtaining Random Numbers from Specific Distributions

    Inverse Probability Transform Methods
    Rejection Methods
    Mixed Rejection and Transform
    Methods for Correlated Random Numbers

  • 7/28/2019 14_RandomNumber

    37/140

    Obtaining Random Numbers from Specific Distributions

    Inverse Probability Transform methods

    Let X be a random variable described by CDF F(X).

    We wish to generate values of X distributed according to F(X).

    Given a continuous Uniform Random Variable U in [0,1], the Random Variable X = F^{-1}(U) has CDF F.

    $$ F^{-1}(u) = \inf\{\, x \mid F(x) = u \,\}, \quad 0 < u < 1 $$

    > r=matrix(runif(10000), 10000,1)
    > exrand=-log(1-r)/.04
    > hist(exrand, freq = FALSE)
    > mean(exrand)
    [1] 24.55222
    > 1/mean(exrand)
    [1] 0.04072951
    > hist(exrand, freq = FALSE)
    > help.search("means")

    R
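The exponential case shows the inverse transform step by step. For rate lambda, F(x) = 1 - exp(-lambda x), so solving F(x) = u gives x = -ln(1 - u) / lambda, which is the expression used in the R session above with lambda = 0.04. The wrapper name `rexp_inverse` and the seed below are assumptions for illustration.

```r
# Inverse transform for the exponential distribution:
# F(x) = 1 - exp(-lambda * x)  =>  F^{-1}(u) = -log(1 - u) / lambda
rexp_inverse <- function(n, lambda) {
  u <- runif(n)
  -log(1 - u) / lambda
}

set.seed(3)                        # arbitrary seed
x <- rexp_inverse(10000, lambda = 0.04)
print(mean(x))                     # theoretical mean is 1 / 0.04 = 25
print(1 / mean(x))                 # estimate of the rate
```

Base R's `rexp()` offers the same distribution directly; the point of the sketch is only to show the transform.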

  • 7/28/2019 14_RandomNumber

    45/140

    [Figure: "Histogram of exrand"; x-axis exrand from 0 to 250, y-axis Density from 0.000 to 0.020.]

    R

  • 7/28/2019 14_RandomNumber

    46/140

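    Weibull Survival

    Survival Time:

    $$ S(t) = U = \exp(-\lambda t^{\gamma}) $$

    Inverse Prob Transform:

    $$ t = \left( \frac{-\ln U}{\lambda} \right)^{1/\gamma} $$

    $$ \gamma = 1.5, \quad \lambda = 0.001 $$

    The code uses 1 - U in place of U; both are uniform on [0,1].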

    SAS

  • 7/28/2019 14_RandomNumber

    47/140

    proc iml; /* begin IML session */
    u = j(2000,1,.);
    call randgen(u,'uniform');
    wrand=(-log(1-u)/.001)##(1/1.5);   /* inverse transform to Weibull times */
    tbl=u||wrand;
    print tbl;
    varnames={"u","weibrand"};
    create wrand from tbl [colname=varnames];
    append from tbl;
    quit;

    proc means data=wrand;
    var u weibrand;
    run;

    title 'Analysis of Weibull RVs';
    proc univariate data=wrand noprint;
    histogram weibrand / midpoints=5 to 205 by 10 weibull;
    run;

    SAS

  • 7/28/2019 14_RandomNumber

    48/140

    SAS

  • 7/28/2019 14_RandomNumber

    49/140

    Weibull Survival

    Survival Time:

    $$ S(t) = U = \exp(-\lambda t^{\gamma}) $$

    Inverse Prob Transform:

    $$ t = \left( \frac{-\ln U}{\lambda} \right)^{1/\gamma} $$

    $$ \gamma = 1.5, \quad \lambda = 0.001 $$

    R

  • 7/28/2019 14_RandomNumber

    50/140

    > r=matrix(runif(10000), 10000,1)

    > wrand=(-log(1-r)/.001)^(1/1.5)

    > hist(wrand, freq = FALSE, main = paste("Histogram of Survival Times"), breaks=50, xlab = "Survival Time")

    R

  • 7/28/2019 14_RandomNumber

    51/140

    R

  • 7/28/2019 14_RandomNumber

    52/140

    proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;

    time WEIBRAND * CENS (0);

    run; quit;

    goptions reset=all device=WIN;

    data work._surv; set work._surv;

    if survival > 0 then _lsurv = -log(survival);

    if _lsurv > 0 then _llsurv = log(_lsurv);

    run;

    ** Survival plots **;

    goptions reset=symbol;

    goptions ftext=SWISS ctext=BLACK htext=1 cells;

    proc gplot data=work._surv ;

    label weibrand = 'Survival Time';

    SAS

  • 7/28/2019 14_RandomNumber

    53/140

    SAS

  • 7/28/2019 14_RandomNumber

    54/140

    > r=matrix(runif(1000), 1000,1)

    > wrand=(-log(1-r)/.001)^(1/1.5)

    > event=wrand<200
    > wrand2=wrand*(event)+200*(1-event)

    > fit plot(fit, lty = 2:3,xlab = "Days", ylab="Survival")>

    R

  • 7/28/2019 14_RandomNumber

    55/140

    R

  • 7/28/2019 14_RandomNumber

    56/140

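    Simulating Weibull Regression Data, with Proportional Hazards

    Survival Time:

    $$ S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta}) $$

    Inverse Prob Transform:

    $$ t = \left( \frac{-\ln P}{\lambda} \right)^{1/\gamma} $$

    $$ \gamma = 1.5, \quad \lambda = \exp(\beta_0 + \beta_1 x) $$

    $$ \beta_0 = \ln(0.001) \approx -6.91 $$

    $$ HR(\text{Drug}) = 0.5, \quad \beta_1 = -0.69 $$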

    SAS

  • 7/28/2019 14_RandomNumber

    57/140

    proc iml; /* begin IML session */

    u = j(400,1,.);

    d = j(200,1,0) // j(200,1,1);

    call randgen(u,'uniform');

    wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);
    c = wrand

  • 7/28/2019 14_RandomNumber

    58/140

    options pageno=1;

    proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;

    time WEIBRAND * CENS (0); strata TREAT;

    run; quit;

    goptions reset=all device=WIN;

    data work._surv; set work._surv;

    if survival > 0 then _lsurv = -log(survival);

    if _lsurv > 0 then _llsurv = log(_lsurv);

    run;

    ** Survival plots **;

    title;

    footnote;

    goptions reset=symbol;

    goptions ftext=SWISS ctext=BLACK htext=1 cells;

    proc gplot data=work._surv ;

    label weibrand = 'Survival Time';

    axis2 minor=none major=(number=6)

    label=(angle=90 'Survival Distribution Function');

    symbol1 i=stepj l=1 width=1;
    symbol2 i=stepj l=2 width=1;
    symbol3 i=stepj l=3 width=1;

    SAS

  • 7/28/2019 14_RandomNumber

    59/140

    SAS

  • 7/28/2019 14_RandomNumber

    60/140

    Testing Global Null Hypothesis: BETA=0

    Test Chi-Square DF Pr > ChiSq

    Likelihood Ratio 202.5356 1

  • 7/28/2019 14_RandomNumber

    61/140

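    Simulating Weibull Regression Data, with Proportional Hazards

    Survival Time:

    $$ S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta}) $$

    Inverse Prob Transform:

    $$ t = \left( \frac{-\ln P}{\lambda} \right)^{1/\gamma} $$

    $$ \gamma = 1.5, \quad \lambda = \exp(\beta_0 + \beta_1 x) $$

    $$ \beta_0 = \ln(0.001) \approx -6.91 $$

    $$ HR(\text{Drug}) = 0.5, \quad \beta_1 = -0.69 $$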

    R

  • 7/28/2019 14_RandomNumber

    62/140

    r=matrix(runif(400), 400,1)

    drug=rbind(matrix(1,200,1),matrix(0,200,1))

    wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)

    event = wrand

  • 7/28/2019 14_RandomNumber

    63/140

    R

    Package eha

  • 7/28/2019 14_RandomNumber

    64/140

    Call:
    phreg(formula = Surv(enter, wrand, event) ~ drug)

    Covariate     W.mean     Coef  Exp(Coef)  se(Coef)  Wald p
    drug           0.586   -0.731      0.481     0.113   0.000
    log(scale)              4.663    105.903     0.050   0.000
    log(shape)              0.402      1.495     0.047   0.000

    Events                  327
    Total time at risk      44359
    Max. log. likelihood    -1886.9
    LR test statistic       42.4
    Degrees of freedom      1
    Overall p-value         7.38224e-11

    > enter=matrix(0,400,1)
    > fit <- phreg(Surv(enter, wrand, event) ~ drug)
    > fit
    > plot.phreg(fit, fn="sur")

    R

    Package eha

  • 7/28/2019 14_RandomNumber

    65/140

    R

  • 7/28/2019 14_RandomNumber

    66/140

    Generating Numbers from Specific Distributions

    Rejection Method
        Fast
        Good for Count Models
        Good when you cannot find F^{-1}, but have f(x)
        Generally Use Pairs of Random Numbers
        Just like playing the game Battleship

  • 7/28/2019 14_RandomNumber

    67/140

    The Rejection Method is Like Playing the Game Battleship

  • 7/28/2019 14_RandomNumber

    68/140

    Rejection

    Choose pairs of uniform random numbers
        x_U between X_min and X_max
        y_U between Y_min and Y_max

    Reject x_U if y_U > f(x) at x_U
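The steps above can be sketched in R. The target density f(x) = 6x(1 - x) on [0,1], the bound Y_max = 1.5, Y_min = 0, the seed, and the function name `reject_sample` are all assumptions chosen for illustration.

```r
# Simple rejection sampler: uniform points in the box, keep those under f
reject_sample <- function(n, f, xmin, xmax, ymax) {
  out <- numeric(0)
  while (length(out) < n) {
    xu <- runif(n, xmin, xmax)        # candidate x values
    yu <- runif(n, 0, ymax)           # uniform heights (Ymin assumed 0)
    out <- c(out, xu[yu <= f(xu)])    # hits are kept, misses are rejected
  }
  out[seq_len(n)]
}

f <- function(x) 6 * x * (1 - x)      # assumed target density, maximum 1.5
set.seed(4)
x <- reject_sample(5000, f, xmin = 0, xmax = 1, ymax = 1.5)
print(mean(x))                        # this density is symmetric about 0.5
```

The fraction of candidates kept equals the area under f divided by the area of the box, which is why a loose box wastes draws.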

  • 7/28/2019 14_RandomNumber

    69/140

    Rejection

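    [Figure: the box from Xmin to Xmax and Ymin to Ymax containing the curve f(x); a point under the curve is a Hit, points above it are Misses.]

    Sample the area (two dimensions) containing the probability distribution or density function uniformly.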

  • 7/28/2019 14_RandomNumber

    70/140

    Rejection

    Simple version becomes inefficient if the rejection area is large.

    [Figure: g(x) inside its bounding box, with a large dead zone.]

  • 7/28/2019 14_RandomNumber

    71/140

    [Figure (Matlab): rejection sampling of a Binomial count, p=0.2, Trials=20; axes Count versus Frequency; 239 In, 761 Rejected.]

  • 7/28/2019 14_RandomNumber

    72/140

    Rejection

    Can be made more efficient by uniform sampling over a smaller target area.

    [Figure: g(x) with a smaller dead zone.]

    The trick is to sample uniformly over the smaller area.

  • 7/28/2019 14_RandomNumber

    73/140

    Rejection

    Can be made more efficient by uniform sampling over a smaller target area.

    First, define a "dominating function" f(x), and the corresponding integral or Cumulative Distribution F(x). F(x) need not be normalized.

    [Figure: g(x) under the dominating function f(x), with a smaller dead zone.]

  • 7/28/2019 14_RandomNumber

    74/140

    Rejection

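    Can be made more efficient by uniform sampling over a smaller target area.

    [Figure: g(x) under the dominating function f(x), with a smaller dead zone.]

    $$ f(x) = a - bx, \quad 0 \le x \le \frac{a}{b} $$

    $$ F(x) = ax - \frac{b}{2}x^2 $$

    $$ F_{Max} = F\!\left(\frac{a}{b}\right) = \frac{a^2}{2b} $$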

  • 7/28/2019 14_RandomNumber

    75/140

    Rejection

    Choose x_U based on inverse transform of the integrated dominance function F(x).

    Choose a uniform random number U_1 in the range:

    $$ 0 \le U_1 \le \frac{a^2}{2b} $$

    Calculate x by setting F(x) = U_1 and solving (the quadratic) for x.

    [Figure: g(x) under f(x), with x_U marked.]

  • 7/28/2019 14_RandomNumber

    76/140

    Rejection

    Evaluate f(x_U), choose a second uniform random number U_2 between 0 and f(x_U). Reject if U_2 > g(x_U).

    [Figure: g(x) under f(x), with x_U marked.]
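A sketch of the two-step procedure with a linear dominating function f(x) = a - bx follows. The target g(x) = 3(1 - x)^2 on [0,1], the constants a = b = 3, the seed, and the function name `reject_dominated` are assumptions chosen so that g(x) <= f(x) everywhere on the support.

```r
a <- 3; b <- 3
f <- function(x) a - b * x            # dominating function on [0, a/b]
g <- function(x) 3 * (1 - x)^2        # assumed target density, g <= f on [0, 1]

reject_dominated <- function(n) {
  # Step 1: U1 uniform on [0, F_max], then invert F(x) = a*x - (b/2)*x^2
  u1 <- runif(n, 0, a^2 / (2 * b))
  xu <- (a - sqrt(a^2 - 2 * b * u1)) / b   # root of the quadratic lying in [0, a/b]
  # Step 2: U2 uniform on [0, f(xu)]; keep xu only if U2 falls under g
  u2 <- runif(n, 0, f(xu))
  xu[u2 <= g(xu)]
}

set.seed(5)                           # arbitrary seed
n_try <- 30000
x <- reject_dominated(n_try)
print(length(x) / n_try)              # acceptance rate
print(mean(x))
```

Here the area under g is 1 and the area under f is 1.5, so about two thirds of the candidates should be accepted, compared with one third for a flat box of height 3.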

  • 7/28/2019 14_RandomNumber

    77/140

    [Figure: rejection sampling of the Binomial count with a smaller sampling area; axes Count versus Frequency; 315 In, 685 Rejected.]

  • 7/28/2019 14_RandomNumber

    78/140

    Weibull Function?

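    $$ f(x) = \frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1}\exp\!\left(-\left(\frac{x}{\beta}\right)^{\gamma}\right) \text{ times } 2 $$

    $$ F(x) = 1 - \exp\!\left(-\left(\frac{x}{\beta}\right)^{\gamma}\right) $$

    $$ F^{-1}(u) = \beta\left(-\ln(1-u)\right)^{1/\gamma}, \quad \beta = 6.5, \quad \gamma = 1.8 $$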

  • 7/28/2019 14_RandomNumber

    79/140

    [Figure: repeated frames of the Count versus Frequency plot; final frame 574 In, 426 Rejected.]

    Binomial Distribution

  • 7/28/2019 14_RandomNumber

    80/140

    (Bernoulli Trials are the simplest example of the rejection method.)

    Probability Pr(X=1): p

    proc iml; /* begin IML session */

  • 7/28/2019 14_RandomNumber

    81/140

    r = j(10,1,.);
    call randgen(r,'uniform');
    b=r>0.5;                  /* 1 when the uniform draw exceeds 0.5, else 0 */
    print b;
    quit;

    b
    1
    0
    1
    1
    0
    0
    0
    0
    0
    1

    SAS

  • 7/28/2019 14_RandomNumber

    82/140

    > r=matrix(runif(10), 10,1)
    > b=r<0.5
    > cbind(r,b)
               [,1] [,2]
     [1,] 0.4919652    1
     [2,] 0.5088624    0
     [3,] 0.5955355    0
     [4,] 0.5243394    0
     [5,] 0.5923056    0
     [6,] 0.1610980    1
     [7,] 0.9663659    0
     [8,] 0.2548106    1
     [9,] 0.4582953    1
    [10,] 0.1170421    1
    >

    R

  • 7/28/2019 14_RandomNumber

    83/140

    Simulating Outcomes from a Logistic Model

    But then, you never have just one value of p for your Bernoulli Trials

    Placebo Controlled Drug Trial
        25% Success for Placebo
        Odds Ratio of 2.0 for Treatment

    Two different success probabilities, based on logistic model
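The two success probabilities follow directly from the placebo rate and the odds ratio. A minimal sketch, using base R's `qlogis()` (log odds) and `plogis()` (inverse logit), with the group sizes and seed assumed for illustration:

```r
p_placebo  <- 0.25
odds_ratio <- 2.0

beta0 <- qlogis(p_placebo)     # log odds of success on placebo
beta1 <- log(odds_ratio)       # log odds ratio for drug
print(c(beta0, beta1))

# Success probability for drug = 0 and drug = 1
p <- plogis(beta0 + beta1 * c(0, 1))
print(p)

# Bernoulli outcomes: success when a uniform draw falls below p
set.seed(7)                                  # arbitrary seed
drug <- rep(c(0, 1), each = 200)             # assumed 200 subjects per arm
outcome <- as.integer(runif(400) < p[drug + 1])
print(tapply(outcome, drug, mean))           # observed success rates by arm
```

Doubling the placebo odds of 1/3 gives odds of 2/3, which corresponds to a success probability of 0.4 on the drug.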

  • 7/28/2019 14_RandomNumber

    84/140

    Logistic Model

    $$ \text{Placebo: } \frac{\exp(\beta_0)}{1+\exp(\beta_0)} = 0.25, \quad \beta_0 = -1.0986 $$

    Drug (0,1): OR=2.0, ln(OR)=0.6931

    $$ CDF_{Success} = \frac{\exp(-1.0986 + 0.6931 \cdot Drug)}{1 + \exp(-1.0986 + 0.6931 \cdot Drug)} $$

    proc iml; /* begin IML session */

  • 7/28/2019 14_RandomNumber

    85/140

    u = j(400,1,.);

    d = j(400,1,1)||(j(200,1,0) // j(200,1,1));

    bta= {-1.0986 , 0.6931};

    call randgen(u,'uniform');

    expit=exp(d*bta)/(1+exp(d*bta));
    outcome=u

  • 7/28/2019 14_RandomNumber

    86/140

    SAS

    The LOGISTIC Procedure

    Analysis of Maximum Likelihood Estimates

    Standard Wald

    Parameter DF Estimate Error Chi-Square Pr > ChiSq

    Intercept 1 -1.2367 0.1693 53.3383

  • 7/28/2019 14_RandomNumber

    87/140

    r=matrix(runif(400), 400,1)

    drug=rbind(matrix(0,200,1),matrix(1,200,1))

    d=cbind(matrix(1,400,1), drug)

    parms=matrix(c(-1.0986 , 0.6931),2,1)

    expit=exp(d%*%parms)/(1+exp(d%*%parms))

    outcome=r<expit

  • 7/28/2019 14_RandomNumber

    88/140

    R

    > drugtrial <- glm(outcome ~ drug, family = binomial(link = "logit"))

    > summary(drugtrial)

    Call:

    glm(formula = outcome ~ drug, family = binomial(link = "logit"))

    Deviance Residuals:

    Min 1Q Median 3Q Max

    -1.036 -1.036 -0.776 1.326 1.641

    Coefficients:

    Estimate Std. Error z value Pr(>|z|)

    (Intercept) -1.0460 0.1612 -6.488 8.68e-11 ***

    drug 0.7026 0.2158 3.255 0.00113 **

    ---

    Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

    Null deviance: 511.49 on 399 degrees of freedom
    Residual deviance: 500.67 on 398 degrees of freedom
    AIC: 504.67

    Number of Fisher Scoring iterations: 4

  • 7/28/2019 14_RandomNumber

    89/140

    Normally Distributed Random Numbers

    Inverse transform methods are inefficient for normal random numbers

    Box-Muller Transform
        z transformation of two random uniform variates [X1, X2 ~ U(0,1)]
        Random radius, random angle

    Get two z variates from two uniform random numbers, X_1 and X_2:

    $$ z_1 = r\cos(\theta) = \sqrt{-2\ln(X_1)}\,\cos(2\pi X_2) $$

    $$ z_2 = r\sin(\theta) = \sqrt{-2\ln(X_1)}\,\sin(2\pi X_2) $$
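The two formulas translate almost line for line into R. The function name `box_muller`, the seed, and the sample size are assumptions for illustration.

```r
# Box-Muller: two independent N(0,1) variates from two U(0,1) variates
box_muller <- function(n) {
  x1 <- runif(n)
  x2 <- runif(n)
  r     <- sqrt(-2 * log(x1))    # random radius
  theta <- 2 * pi * x2           # random angle
  cbind(z1 = r * cos(theta), z2 = r * sin(theta))
}

set.seed(6)                      # arbitrary seed
z <- box_muller(5000)
print(colMeans(z))               # both should be near 0
print(apply(z, 2, sd))           # both should be near 1
print(cor(z[, 1], z[, 2]))       # should be near 0
```

Each pair of uniforms yields a pair of normals, so nothing is wasted, at the cost of one logarithm, one square root, and two trigonometric calls per pair.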

  • 7/28/2019 14_RandomNumber

    90/140

    Normally Distributed Random Numbers

    Box-Muller Transform
        X1, X2 specify a position within the unit circle
        Random angle, random radius
        Would be more efficient if it did not make calls to trigonometric functions.

    Marsaglia Method
        Places the Unit Circle within a square, -1 to +1, and samples the square uniformly.
        Rejects draws that fall outside the circle.
        But it avoids calls to trig functions.

    $$ s = X_1^2 + X_2^2 \le 1 $$

    $$ z_1 = X_1\sqrt{\frac{-2\ln(s)}{s}}, \quad z_2 = X_2\sqrt{\frac{-2\ln(s)}{s}} $$
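A sketch of the Marsaglia (polar) method follows. The function name `marsaglia_polar`, the seed, and the batch-style loop are assumptions for illustration; here X1 and X2 are drawn uniformly on (-1, 1).

```r
# Marsaglia polar method: sample the square, reject points outside the unit circle
marsaglia_polar <- function(n) {
  z <- numeric(0)
  while (length(z) < n) {
    x1 <- runif(n, -1, 1)
    x2 <- runif(n, -1, 1)
    s  <- x1^2 + x2^2
    keep <- s > 0 & s < 1                       # inside the unit circle
    scale <- sqrt(-2 * log(s[keep]) / s[keep])  # no trig functions needed
    z <- c(z, x1[keep] * scale, x2[keep] * scale)
  }
  z[seq_len(n)]
}

set.seed(10)                     # arbitrary seed
z <- marsaglia_polar(5000)
print(mean(z))                   # should be near 0
print(sd(z))                     # should be near 1
```

About pi/4 of the candidate points fall inside the circle, so roughly 21% of the draws are rejected in exchange for avoiding the sine and cosine calls.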

  • 7/28/2019 14_RandomNumber

    91/140

    Generating Numbers from Specific Distributions

    Normal, Using CLT (quick & dirty)
        Sum several iterations of u
        Standardize
        Recall that Var(u) = 1/12

    $$ X = \sum_{i=1}^{12} u_i - 6 $$
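With 12 uniforms the sum has mean 12 x 0.5 = 6 and variance 12 x 1/12 = 1, so subtracting 6 standardizes it with no division. A minimal sketch (the function name `clt_normal` and the seed are assumptions):

```r
# Approximate N(0,1): sum of 12 U(0,1) draws, minus 6
clt_normal <- function(n) {
  u <- matrix(runif(12 * n), nrow = n)   # one row of 12 uniforms per variate
  rowSums(u) - 6
}

set.seed(11)                    # arbitrary seed
x <- clt_normal(5000)
print(mean(x))                  # should be near 0
print(var(x))                   # should be near 1
print(range(x))                 # can never leave [-6, 6]
```

The hard limits at -6 and +6 are one reason this is only a quick and dirty approximation: the tails are truncated.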

  • 7/28/2019 14_RandomNumber

    92/140

    Correlated Multivariate Random Numbers

    Simulating panel data, repeated

    measures

    Mixture distributions

  • 7/28/2019 14_RandomNumber

    93/140

    Generating Multivariate Normal

    Random Numbers

    Desired Covariance Matrix: V

    $$ V = R'R, \quad R \text{ is the Cholesky Decomposition of } V $$

    Begin with independent standard normal RVs:

    $$ Z \sim N(0,1) $$

    Correlated (Multivariate) Normal RVs:

    $$ X = \mu + R'Z $$
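The recipe can be sketched in R with `chol()`, which returns the upper-triangular R with V = R'R. The correlation matrix and standard deviations below are taken from the SAS example in this lecture; the seed, the zero mean vector, and the use of `rnorm()` instead of a hand-rolled Box-Muller step are assumptions.

```r
# Correlation matrix and standard deviations (from the SAS example)
rmat <- matrix(c(1, .3, .2, .1,
                 .3, 1, .3, .2,
                 .2, .3, 1, .3,
                 .1, .2, .3, 1), nrow = 4, byrow = TRUE)
sigvec <- c(53, 36, 12, 47)

V <- rmat * (sigvec %o% sigvec)   # covariance: r_ij * sd_i * sd_j
R <- chol(V)                      # upper triangular, V = R'R
print(R)

set.seed(8)                       # arbitrary seed
Z <- matrix(rnorm(1000 * 4), nrow = 1000)   # rows are independent N(0, I) draws
X <- Z %*% R                      # row form of X = R'Z (mean vector assumed zero)

print(round(cor(X), 2))           # compare with rmat
print(round(apply(X, 2, sd), 1))  # compare with sigvec
```

Because each draw is stored as a row, the multiplication is Z R rather than R'Z; the two are transposes of each other.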

    Generating Multivariate Normal

  • 7/28/2019 14_RandomNumber

    94/140

    Random Numbers

    proc iml; /* begin IML session */

    rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};

    sigvec={53 36 12 47};

    cvmat=rmat#(sigvec`*sigvec);

    upr=half(cvmat);

    print rmat;

    print sigvec;

    print cvmat;

    print upr;

    r1 = j(1000,4,.);

    r2 = j(1000,4,.);

    call randgen(r1,'uniform');

    call randgen(r2,'uniform');

    pi= 4*atan(1);

    print pi;

    /* Let's be wasteful: keep only the cosine variate from each Box-Muller pair */

    z1=sqrt(-2*log(r1))#cos(2*pi*r2);

    z1=z1*upr;

    varnames={"x1","x2","x3","x4"};

    create nrand from z1 [colname=varnames];

    append from z1;

    quit;

    proc corr data=work.nrand pearson;

    var x1 x2 x3 x4;

    run;

    SAS

    rmat


  • 7/28/2019 14_RandomNumber

    95/140

    1 0.3 0.2 0.1

    0.3 1 0.3 0.2

    0.2 0.3 1 0.3

    0.1 0.2 0.3 1

    sigvec

    53 36 12 47

    cvmat

    2809 572.4 127.2 249.1

    572.4 1296 129.6 338.4

    127.2 129.6 144 169.2

    249.1 338.4 169.2 2209

    upr

    53 10.8 2.4 4.7

    0 34.341811 3.0190603 8.3757958

    0 0 11.36333 11.672016

    0 0 0 44.503035

    SAS

    The CORR Procedure

  • 7/28/2019 14_RandomNumber

    96/140

    4 Variables: x1 x2 x3 x4

    Simple Statistics

    Variable N Mean Std Dev Sum Minimum Maximum

    x1 1000 -0.86090 51.78291 -860.89502 -167.70178 157.51299

    x2 1000 -0.21592 36.41244 -215.92386 -122.58068 120.05335

    x3 1000 -0.06176 11.60953 -61.75755 -37.09589 43.83908

    x4 1000 0.46483 46.63351 464.82762 -152.65527 143.41509

    Pearson Correlation Coefficients, N = 1000

    Prob > |r| under H0: Rho=0

    x1 x2 x3 x4

    x1 1.00000 0.30338 0.20341 0.11397

  • 7/28/2019 14_RandomNumber

    97/140

    Random Numbers

    cmat cmat

    [,1] [,2] [,3] [,4]

    [1,] 1.0 0.3 0.2 0.1

    [2,] 0.3 1.0 0.3 0.2

    [3,] 0.2 0.3 1.0 0.3

    [4,] 0.1 0.2 0.3 1.0

    > vv

    [,1] [,2] [,3] [,4]

    [1,] 2809.0 572.4 127.2 249.1

    [2,] 572.4 1296.0 129.6 338.4

    [3,] 127.2 129.6 144.0 169.2

    [4,] 249.1 338.4 169.2 2209.0

    > rr

  • 7/28/2019 14_RandomNumber

    98/140

    R

    [,1] [,2] [,3] [,4]

    [1,] 53 10.80000 2.400000 4.700000

    [2,] 0 34.34181 3.019060 8.375796

    [3,] 0 0.00000 11.363330 11.672016

    [4,] 0 0.00000 0.000000 44.503035

    > cov(rvs)

    [,1] [,2] [,3] [,4]

    [1,] 2832.4200 561.2585 134.0656 533.7351

    [2,] 561.2585 1235.7616 124.2373 382.5441
    [3,] 134.0656 124.2373 127.4132 160.2173

    [4,] 533.7351 382.5441 160.2173 2205.5903

    > cor(rvs)

    [,1] [,2] [,3] [,4]

    [1,] 1.0000000 0.2999969 0.2231676 0.2135426

    [2,] 0.2999969 1.0000000 0.3130961 0.2317137

    [3,] 0.2231676 0.3130961 1.0000000 0.3022317

    [4,] 0.2135426 0.2317137 0.3022317 1.0000000

    > sd(rvs)

    [1] 53.22048 35.15340 11.28774 46.96371

  • 7/28/2019 14_RandomNumber

    99/140

    Subject-specific Random Effects

    We have an error term (e_ij) for measurement j in subject i.

    We also have a subject-specific random effect (k_i).

    For the i-th subject in the j-th measurement:

    $$ y_{ij} = \mathbf{x}\boldsymbol{\beta} + e_{ij} + k_i $$

  • 7/28/2019 14_RandomNumber

    100/140

    Recipe for Subject-specific Random Effects

    Create subjects for study (N)
        Assign treatment, covariates
    Give each subject a random effect
        Drawn from, say, N(0,V)
    Generate predicted values based on regression + random effects
    Generate outcomes for each repeated measure from specific distribution
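The recipe can be sketched for a logistic model with a random intercept. The design (200 subjects, 3 time points, half on drug) and the coefficients mirror the logistic example in this lecture; the seed, the variable names, and the use of `rnorm()` and `plogis()` are assumptions for illustration.

```r
set.seed(9)                               # arbitrary seed
n_subj <- 200
n_time <- 3

# Step 1: create subjects, assign treatment and covariates (long format)
id   <- rep(seq_len(n_subj), times = n_time)
drug <- rep(rep(c(0, 1), each = n_subj / 2), times = n_time)
time <- rep(0:(n_time - 1), each = n_subj)

# Step 2: one random effect per subject, repeated on each of its measurements
k_subj <- rnorm(n_subj, mean = 0, sd = 1)
k <- k_subj[id]

# Step 3: predicted values from regression + random effect
eta <- -1.0986 + 0.6931 * drug + 0.4055 * time + k
p <- plogis(eta)                          # exp(eta) / (1 + exp(eta))

# Step 4: Bernoulli outcome for each repeated measure
outcome <- as.integer(runif(length(p)) < p)

sim <- data.frame(id, drug, time, outcome)
print(head(sim[order(sim$id, sim$time), ]))
```

Sharing `k_subj[id]` across a subject's rows is what induces the within-subject correlation that a mixed model later tries to recover.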

  • 7/28/2019 14_RandomNumber

    101/140

    Logistic Model

    $$ \text{Placebo: } \frac{\exp(\beta_0)}{1+\exp(\beta_0)} = 0.25, \quad \beta_0 = -1.0986 $$

    Drug: OR=2.0, ln(OR)=0.6931

    Time (0,1,2): OR 1.5, ln(OR)=0.4055

    $$ K_i \sim N(0,1) $$

    $$ CDF_{Success} = \frac{\exp(-1.0986 + 0.6931 \cdot Drug + 0.4055 \cdot Time + K_i)}{1 + \exp(-1.0986 + 0.6931 \cdot Drug + 0.4055 \cdot Time + K_i)} $$

    proc iml; /* begin IML session */

  • 7/28/2019 14_RandomNumber

    102/140

    u = j(600,1,.);

    d1=j(100,1,0)//j(100,1,1);

    d1=d1//d1//d1;

    id=j(200,1,0);

    do i=1 to 200 by 1;

    id[i,1]=i;

    end;

    id=id//id//id;

    t=j(200,1,0)//j(200,1,1)//j(200,1,2);

    k=j(200,1,.);

    call randgen(k,'normal');

    k=k//k//k;

    bta= {-1.0986 , 0.6931,.4055,1};
    d = j(600,1,1)||d1||t||k;

    call randgen(u,'uniform');

    expit=exp(d*bta)/(1+exp(d*bta));

    y=u

  • 7/28/2019 14_RandomNumber

    103/140

    Random effects logistic regression        Number of obs    = 600
    Group variable: id                        Number of groups = 200

    Random effects u_i ~ Gaussian             Obs per group: min = 3
                                                             avg = 3.0
                                                             max = 3

                                              Wald chi2(2)     = 11.07
    Log likelihood = -394.754                 Prob > chi2      = 0.0040

    ------------------------------------------------------------------------------
         outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           treat |   .6023501   .2279107     2.64   0.008     .1556533    1.049047
               t |    .235297   .1123071     2.10   0.036      .015179    .4554149
           _cons |  -.9334699   .2040373    -4.57   0.000    -1.333376   -.5335642
    -------------+----------------------------------------------------------------
        /lnsig2u |  -.1281394   .3971684                     -.9065751    .6502964
    -------------+----------------------------------------------------------------
         sigma_u |   .9379396     .18626                      .6355353    1.384236
             rho |   .2109869   .0661172                      .1093476    .3680594
    ------------------------------------------------------------------------------
    Likelihood-ratio test of rho=0: chibar2(01) = 13.01  Prob >= chibar2 = 0.000

    . di -394.754*(-2)
    789.508

    Stata, SAS data

    Finally got this to run in SAS. I had forgotten that SAS requires you to sort.

    Stata does not require sorted data for their mixed models.


    proc sort data=erand;

    by id t;

    run;

    proc nlmixed data=erand qpoints=5 ;

    parms b0=0 b1=-.7 b2=.6 sig=0 ;

    theta2 = b0+b1*treat+b2*t+u;

    prb= exp(theta2)/(1+exp(theta2));

    model outcome ~ binary(prb);

    random u ~normal(0,sig) subject=id ;

    run;

    SAS


    The NLMIXED Procedure

    Fit Statistics

    -2 Log Likelihood 789.5

    AIC (smaller is better) 797.5

    AICC (smaller is better) 797.6

    BIC (smaller is better) 810.7

    Parameter Estimates

    Standard

    Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient

    b0 -0.9332 0.2039 199 -4.58

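    id=matrix(1:200,200,1)                    # subject ids 1..200 (definition not shown on the slide; assumed)
    k1=matrix(runif(200), 200,1)
    k2=matrix(runif(200), 200,1)
    k=sqrt(-2*log(k1))*cos(2*pi*k2)           # Box-Muller: two uniforms give one N(0,1) per subject
    id=rbind(id,id,id)                        # stack ids for the three repeated measures
    k=rbind(k,k,k)                            # same random intercept at each measure
    drug =rbind(matrix(0,100,1), matrix(1,100,1))
    drug=rbind(drug,drug,drug)
    d=cbind(matrix(1,600,1),drug,k)           # design matrix: intercept, drug, random intercept
    parms=matrix(c(-1.0986 , 0.6931,1),3,1)
    expit=exp(d%*%parms)/(1+exp(d%*%parms))   # success probability for each row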

    outcome=r

    Group variable: id                          Number of groups   =    200

    Random effects u_i ~ Gaussian               Obs per group: min =      3
                                                               avg =    3.0
                                                               max =      3

                                                Wald chi2(1)       =  14.55
    Log likelihood = -368.112                   Prob > chi2        = 0.0001

    ------------------------------------------------------------------------------
         outcome |     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            drug |  .8331252   .2183881     3.81   0.000     .4050924    1.261158
           _cons | -1.240213   .1708219    -7.26   0.000    -1.575018   -.9054085
    -------------+----------------------------------------------------------------
        /lnsig2u | -.6325958   .5624138                     -1.734907     .469715
    -------------+----------------------------------------------------------------
         sigma_u |  .7288423   .2049555                      .4200198    1.264729
             rho |  .1390212   .0673177                       .050895    .3271436
    ------------------------------------------------------------------------------
    Likelihood-ratio test of rho=0: chibar2(01) = 5.08   Prob >= chibar2 = 0.012

    R

    Got this to run in R, using a mixed effects package called Zelig.

    z.out1


    Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic

    model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical

    Software," http://gking.harvard.edu/zelig

    summary(z.out1)

    Generalized linear mixed model fit by the Laplace approximation
    Formula: outcome ~ drug + tag(1 | id)
       AIC   BIC logLik deviance
     743.4 756.6 -368.7    737.4
    Random effects:
     Groups Name        Variance Std.Dev.
     id     (Intercept) 0.39486  0.62838
    Number of obs: 600, groups: id, 200

    Fixed effects:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)  -1.2172     0.1511  -8.054 8.02e-16 ***
    drug          0.8174     0.2023   4.041 5.32e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Correlation of Fixed Effects:
         (Intr)
    drug -0.747

    R

    General Approach to Correlated Multivariate Random Numbers

    Copulas allow us to draw correlated random numbers from different distributions

    Random effects in Mixture Models

    They use CDF probabilities of correlated variables on the inside to map to correlated uniform random numbers on the margins

    Those correlated uniform RVs may be used to marry vastly different distributions.

    Maintain Marginal Distributions

    Generating Multivariate Random Numbers

    From SAS documentation, a Gaussian Copula:

    Independent Normal (N(0,1)) random variables are generated.

    These variables are transformed to a correlated set of z-scores by using the Cholesky decomposition of the covariance matrix.

    These correlated normal RVs are transformed to uniforms by using $\Phi(z)$, the standard normal CDF.

    $F^{-1}(\cdot)$ is used to compute the final sample value.
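The four steps above can be sketched for the bivariate case in Python. This is a minimal illustration under stated assumptions: the correlation of 0.4, the exponential marginals (chosen because their inverse CDF has a closed form), the rates, the seed, and the names `gaussian_copula_pairs` and `pearson` are all hypothetical and do not come from the SAS documentation. For two variables the Cholesky step reduces to z2 = rho*e1 + sqrt(1 - rho^2)*e2.

```python
import math
import random
from statistics import NormalDist

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def gaussian_copula_pairs(n, rho, rate1=1.0, rate2=0.2, seed=2010):
    """Draw n correlated pairs with exponential marginals via a Gaussian copula."""
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF
    u1s, u2s, x1s, x2s = [], [], [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # independent N(0,1)
        z1 = e1
        z2 = rho * e1 + math.sqrt(1.0 - rho * rho) * e2    # 2 x 2 Cholesky factor applied
        u1, u2 = phi(z1), phi(z2)                          # correlated uniforms
        u1s.append(u1)
        u2s.append(u2)
        x1s.append(-math.log(1.0 - u1) / rate1)            # exponential inverse CDF
        x2s.append(-math.log(1.0 - u2) / rate2)
    return u1s, u2s, x1s, x2s

u1, u2, x1, x2 = gaussian_copula_pairs(10000, 0.4)
print(round(pearson(u1, u2), 3), round(pearson(x1, x2), 3))
```

The correlation of the uniforms and of the final variables is typically somewhat lower than the normal-scale correlation, which is the same pattern the SAS and Matlab runs show.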

    Generating Multivariate Random Numbers

    proc iml;/* begin IML session */

    rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};

    sigvec={1 1 1 1};

    cvmat=rmat#(sigvec`*sigvec);

    /* # is element-wise multiplication */

    upr=half(cvmat);

    print rmat;

    print sigvec;

    print cvmat;

    print upr;

    r1 = j(1000,4,.);

    r2 = j(1000,4,.);

    call randgen(r1,'uniform');

    call randgen(r2,'uniform');

    pi= 4*atan(1);

    print pi;

    z1=sqrt(-2*log(r1))#cos(2*pi*r2);

    /* Note I could have gotten another z here */

    z1=z1*upr;

    z1=cdf('Normal',z1);

    z1=gaminv(z1,3.0);

    /* Standardized gamma parameter, also the

    mean */

    varnames={"x1","x2","x3","x4"};

    create nrand from z1 [colname=varnames];

    append from z1;

    quit;

    proc corr data=work.nrand pearson;

    var x1 x2 x3 x4;

    run;

    SAS


    rmat

    1 0.3 0.2 0.1

    0.3 1 0.3 0.2

    0.2 0.3 1 0.3

    0.1 0.2 0.3 1

    sigvec

    1 1 1 1

    cvmat

    1 0.3 0.2 0.1

    0.3 1 0.3 0.2

    0.2 0.3 1 0.3

    0.1 0.2 0.3 1

    SAS

    The CORR Procedure


    4 Variables: x1 x2 x3 x4

    Simple Statistics

    Variable N Mean Std Dev Sum Minimum Maximum

    x1 1000 2.96320 1.73566 2963 0.11528 12.19072

    x2 1000 3.01249 1.68236 3012 0.14039 10.20117

    x3 1000 3.00336 1.68803 3003 0.34496 13.72023

    x4 1000 3.08106 1.79858 3081 0.11148 13.25409

    Pearson Correlation Coefficients, N = 1000

    Prob > |r| under H0: Rho=0

    x1 x2 x3 x4

    x1 1.00000 0.25874 0.19005 0.10052

    Generating Multivariate Random Numbers

    >> U = copularnd('Gaussian',.4,10)

    U =

    0.8017 0.9388

    0.3650 0.2250

    0.8104 0.6253

    0.3467 0.0988

    0.6067 0.6561

    0.4743 0.6723

    0.6273 0.7427

    0.9905 0.8249

    0.4427 0.6925

    0.3443 0.2711

    >> U = copularnd('Gaussian',.4,10000);

    >> corr(U)

    ans =

    1.0000 0.3765

    0.3765 1.0000

    >> X = norminv(U,0,1);

    >> corr(X)

    ans =

    1.0000 0.3901

    0.3901 1.0000

    >> Xg = gaminv(U,2,3);

    >> corr(Xg)

    ans =

    1.0000 0.3721

    0.3721 1.0000

    >>

    Matlab (a little more clear)

    Old Slides

    Grabbing a Seed from the System Clock (Stata)

    program define seedset
        * c(current_time) holds the clock time as "hh:mm:ss"
        local ct =c(current_time)
        local s1=substr("`ct'",7,2)    // seconds
        local s2=substr("`ct'",4,2)    // minutes
        local s3=substr("`ct'",2,1)    // last digit of the hour
        * concatenate the pieces and convert to a number, e.g. 23491
        global newseed=real("`s1'" +"`s2'" +"`s3'")
        di $newseed
        set seed $newseed
    end

    LCG is default for Stata
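For readers who want to see what a linear congruential generator does, here is a minimal Python sketch. The constants a = 1664525, c = 1013904223, m = 2^32 are a commonly published textbook choice and are used here only as an illustrative assumption; they are not claimed to be the constants used by Stata, and the class name `LCG` is hypothetical.

```python
class LCG:
    """Linear congruential generator: x_{n+1} = (a * x_n + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self):
        """Advance the state and return it as an integer in [0, m)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def uniform(self):
        """Return the next draw scaled to [0, 1)."""
        return self.next_int() / self.m

# Two generators started from the same seed reproduce the same sequence.
gen1 = LCG(seed=23491)
gen2 = LCG(seed=23491)
draws1 = [gen1.uniform() for _ in range(5)]
draws2 = [gen2.uniform() for _ in range(5)]
print(draws1 == draws2)
```

Reproducibility from the seed is the property that makes a clock-derived seed worth displaying and recording, as the `seedset` program does.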

    . set obs 100
    obs was 0, now 100

    . gen x0=ceil(uniform()*100)

    . gen m=ceil(uniform()*10)

    . gen x1=mod(x0,m)

    . list in 1/10

         +------------+
         | x0   m  x1 |
         |------------|
      1. | 70   2   0 |
      2. | 62   7   6 |
      3. | 92   7   1 |
      4. | 53   1   0 |
      5. | 37   3   1 |
         |------------|
      6. | 78   1   0 |
      7. | 47   2   1 |
      8. | 91   2   1 |
      9. | 98   1   0 |
     10. | 71   9   8 |
         +------------+

    Testing Randomness (Stata)

    Correlogram of X_i versus X_{i+k} for serial correlation

    . gen tv=_n

    . tsset tv
            time variable:  tv, 1 to 20000
                    delta:  1 unit

    . corrgram x, lags(40)

                                               -1       0       1 -1       0       1
     LAG       AC       PAC       Q     Prob>Q  [Autocorrelation]  [Partial Autocor]
    -------------------------------------------------------------------------------
     1       0.0026   0.0026   .13551   0.7128          |                  |
     2      -0.0011  -0.0011   .15995   0.9231          |                  |
     3      -0.0004  -0.0004   .16301   0.9833          |                  |
     4      -0.0131  -0.0131   3.5987   0.4630          |                  |
     5      -0.0008  -0.0007   3.6115   0.6066          |                  |
     6       0.0119   0.0118   6.4238   0.3774          |                  |
     7      -0.0060  -0.0061   7.1533   0.4131          |                  |
     8       0.0004   0.0003   7.1571   0.5198          |                  |
     9       0.0057   0.0057    7.815   0.5529          |                  |

    (Stata)
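The serial-correlation check in the correlogram above can be sketched in Python. This is an assumption-laden illustration: it uses Python's built-in generator rather than Stata's, a hypothetical helper `lag_autocorr`, and an arbitrary seed, so the numbers will not match the Stata output.

```python
import random

def lag_autocorr(x, k):
    """Sample autocorrelation of the sequence x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k))
    return num / denom

rng = random.Random(1)
x = [rng.random() for _ in range(20000)]  # 20000 uniform draws, as in the Stata run
acf = [lag_autocorr(x, k) for k in range(1, 10)]
for k, r in enumerate(acf, start=1):
    print(k, round(r, 4))
```

For independent draws each lag's autocorrelation should be close to zero, with a standard error of roughly 1/sqrt(n), about 0.007 for n = 20000.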

    . seedset
    23491

    . set obs 2000
    obs was 0, now 2000

    . gen P=uniform()

    . gen enum=-ln(P)/.04

    . ci

        Variable |   Obs       Mean   Std. Err.    [95% Conf. Interval]
    -------------+------------------------------------------------------
               P |  2000   .5030619   .0065298     .4902559   .5158678
            enum |  2000   24.79822    .553316     23.71309   25.88336

    (Stata)

    . set obs 200
    obs was 0, now 200

    . gen P=uniform()

    . gen tte=(-ln(P)/0.1)^1.5

    . gen fail=1

    . replace fail=0 if tte>200

    . replace tte=200 if tte>200

    . stset tte, fail(fail)

         failure event:  fail != 0 & fail < .
    obs. time interval:  (0, tte]
     exit on or before:  failure

    -------------------------------------------------------------------
          200  total obs.
            0  exclusions
    -------------------------------------------------------------------
          200  obs. remaining, representing
          188  failures in single record/single failure data
     9019.163  total analysis time at risk, at risk from t = 0
                                 earliest observed entry t = 0
                                      last observed exit t = 200

    (Stata)
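The inverse-probability transforms used in the Stata sessions, `-ln(P)/.04` for exponential times and `(-ln(P)/0.1)^1.5` for Weibull times censored at 200, can be sketched in Python as follows. The function names, the seed, and the use of `1 - random()` to keep P strictly positive are assumptions made for this illustration.

```python
import math
import random

def weibull_time(p, scale, power=1.5):
    """Invert S(t) = exp(-scale * t**(1/power)) at survival probability p."""
    return (-math.log(p) / scale) ** power

def simulate_censored(n, drug, seed, base=0.1, ratio=0.5, censor_at=200.0):
    """Draw n Weibull times for one arm and censor them administratively."""
    rng = random.Random(seed)
    scale = base * ratio ** drug  # exp(log(0.1) + log(0.5)*drug)
    out = []
    for _ in range(n):
        p = 1.0 - rng.random()  # uniform on (0, 1], so log(p) is always defined
        t = weibull_time(p, scale)
        fail = 1 if t <= censor_at else 0  # fail = 0 marks a censored time
        out.append((min(t, censor_at), fail))
    return out

placebo = simulate_censored(200, drug=0, seed=23491)
treated = simulate_censored(200, drug=1, seed=23492)
print(sum(f for _, f in placebo), sum(f for _, f in treated))
```

Setting `power=1.0` gives the exponential case, whose mean is 1/scale (25 when scale is 0.04).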

    [Figure: Kaplan-Meier survival estimate; vertical axis 0.00 to 1.00, horizontal axis analysis time 0 to 200]

    (Stata)

    . streg, d(w) nohr

             failure _d:  fail
       analysis time _t:  tte

    Weibull regression -- log relative-hazard form

    No. of subjects =          200             Number of obs =        200
    No. of failures =          188
    Time at risk    =  9019.163067
                                               LR chi2(0)    =       0.00
    Log likelihood  =   -403.39593             Prob > chi2   =          .

    --------------------------------------------------------------------
              _t |   Coef.      SE       z    P>|z|       [95% CI]
    -------------+------------------------------------------------------
           _cons |  -2.245   0.167  -13.47   0.000   -2.572   -1.918
    -------------+------------------------------------------------------
           delta |   0.625   0.036                    0.558    0.701
    --------------------------------------------------------------------

    (Stata)

    . gen P=uniform()

    . gen tte=(-ln(P)/(exp(log(0.1)+log(0.5)*drug)))^1.5

    . gen fail=1

    . replace fail=0 if tte>200

    (39 real changes made)

    . replace tte=200 if tte>200
    (39 real changes made)

    . stset tte, fail(fail)

         failure event:  fail != 0 & fail < .
    obs. time interval:  (0, tte]
     exit on or before:  failure

    ------------------------------------------------------------------------------
          400  total obs.
            0  exclusions
    ------------------------------------------------------------------------------
          400  obs. remaining, representing
          361  failures in single record/single failure data
     25170.04  total analysis time at risk, at risk from t = 0
                                 earliest observed entry t = 0
                                      last observed exit t = 200

    .

    . list drug P tte fail in 1/15

         +-----------------------------------+
         | drug          P        tte   fail |
         |-----------------------------------|
      1. |    1    .842721   6.331312      1 |
      2. |    0   .3839878    29.6119      1 |
      3. |    1   .3483792    96.8484      1 |
      4. |    0   .6035132   11.34804      1 |
      5. |    1   .8460417   6.114305      1 |
         |-----------------------------------|
      6. |    1   .4935982   53.06192      1 |
      7. |    1   .5173908   47.84433      1 |
      8. |    1    .385052   83.39208      1 |
      9. |    0   .8726683   1.589515      1 |
     10. |    0   .0356283   192.5611      1 |
         |-----------------------------------|
     11. |    0   .8018837   3.280757      1 |
     12. |    0   .6059877   11.21039      1 |
     13. |    1   .7919235   10.07838      1 |
     14. |    0   .1920578   67.02081      1 |
     15. |    0   .0819428   125.1301      1 |
         +-----------------------------------+

    [Figure: Kaplan-Meier survival estimates for drug = 0 and drug = 1; vertical axis 0.00 to 1.00, horizontal axis analysis time 0 to 200]

    (Stata)

    . streg drug, d(w) nohr

             failure _d:  fail
       analysis time _t:  tte

    Weibull regression -- log relative-hazard form

    No. of subjects =          400             Number of obs =        400
    No. of failures =          361
    Time at risk    =  25170.03819
                                               LR chi2(1)    =      53.70
    Log likelihood  =   -757.69677             Prob > chi2   =     0.0000

    --------------------------------------------------------------------
              _t |   Coef.      SE       z    P>|z|       [95% CI]
    -------------+------------------------------------------------------
            drug |  -0.788   0.107   -7.34   0.000   -0.998   -0.577
           _cons |  -2.458   0.143  -17.19   0.000   -2.738   -2.177
    -------------+------------------------------------------------------
           delta |   0.706   0.031                    0.648    0.768
    --------------------------------------------------------------------

    Generating Multivariate Normal Random Numbers

    (Stata)

    In Stata, gennorm (use webseek to download):

    Typing

    . gennorm a b c, corr(.2 .3 .4)

    creates a, b, and c with values drawn from an N(0,S) distribution, where

    +- -+

    | 1 |

    S = | .2 1 |

    | .3 .4 1 |

    +- -+

    That is, corr(a,b)=.2, corr(a,c)=.3, and corr(b,c)=.4

    CONTINUED NEXT PAGE

    Generating Multivariate Normal Random Numbers

    (Stata)

    In Stata:

    Example
    -------

    . set obs 10000

    obs was 0, now 10000

    . set seed 6819

    . gennorm a b c, corr(.2 .3 .4)

    . summarize a b c

    Variable | Obs Mean Std. Dev. Min Max

    -------------+----------------------------------------------------------------------------

    a | 10000 -.0105333 1.005723 -3.694448 3.775433

    b | 10000 -.0042212 1.000254 -3.695302 3.648826

    c | 10000 -.0069625 .9989002 -3.996779 3.606923

    . corr a b c

    (obs=10000)

    | a b c

    -------------+---------------------------------

    a | 1.0000

    b | 0.2137 1.0000

    c | 0.3035 0.3952 1.0000

    Generating Multivariate Normal Random Numbers

    (Stata)

    In Stata, drawnorm:

    . clear

    . matrix C=(1, 0.2, 0.3 \ 0.2, 1, 0.4 \ 0.3, 0.4, 1)

    . drawnorm a b c, n(10000) corr(C)

    (obs 10000)

    . summarize a b c

    Variable | Obs Mean Std. Dev. Min Max

    -------------+--------------------------------------------------------

    a | 10000 -.0176275 .9920181 -3.701594 3.7838

    b | 10000 .0009005 1.003002 -3.709259 3.518793

    c | 10000 -.0149926 .9925292 -3.716346 4.009713

    . corr a b c

    (obs=10000)

    | a b c

    -------------+---------------------------

    a | 1.0000

    b | 0.1937 1.0000

    c | 0.3051 0.4056 1.0000
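What `gennorm` and `drawnorm` produce can be sketched in Python by multiplying independent N(0,1) draws by the Cholesky factor of the target correlation matrix. This is an illustration only: the helper names `cholesky`, `draw_mvn`, and `pearson` and the seed are assumptions, and nothing here is claimed about how the Stata commands are implemented internally.

```python
import math
import random

def cholesky(a):
    """Return lower-triangular L with L * L^T = a, for a positive definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def draw_mvn(n, corr, seed):
    """Draw n rows from N(0, corr) by transforming independent N(0,1) draws."""
    rng = random.Random(seed)
    low = cholesky(corr)
    dim = len(corr)
    rows = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append([sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(dim)])
    return rows

def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxy / math.sqrt(sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y))

C = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]]
draws = draw_mvn(10000, C, seed=6819)
a, b, c = (list(col) for col in zip(*draws))
print(round(pearson(a, b), 3), round(pearson(a, c), 3), round(pearson(b, c), 3))
```

With 10000 draws the sample correlations should land within a few hundredths of 0.2, 0.3, and 0.4, as they do in the Stata output.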

    Simulating Weibull Regression Data, with Time-Dependency in Drug Effect

    Survival Time: $S(t) = P = \exp(-\lambda t^{\gamma})$, $\lambda = \exp(x\beta)$

    $x\beta = -2.30 + 0.3 \cdot drug - 0.004 \cdot drug \cdot t$

    Inverse Prob Transform: t = ?????????

    How do you solve for t? (Not all answers are in the book.)

    Remember Newton's Method?

    $$f(t) = \exp\left(-\exp(-2.30 + 0.3 \cdot drug - 0.004 \cdot drug \cdot t)\, t^{\gamma}\right) - P, \qquad f(t) = 0$$

    $$f'(t) \approx \frac{f(t + d) - f(t)}{d}$$

    $$t_1 = t_0 - \frac{f(t_0)}{f'(t_0)}$$

    [Figure: one Newton step from t0 to t1]

    (Stata)

    clear
    set obs 400
    gen drug=_n>200
    gen double P=uniform()
    * starting value for t and a nearby point for the numerical slope
    gen double t=1
    gen double tpd=t+.0001
    gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
    gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
    gen double slope=(fp-f)/0.0001

    forvalues i=1/50 {
        * re-evaluate f and its slope at the current t, then take one Newton step
        qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
        qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
        qui replace slope=(fp-f)/0.0001
        qui replace t=t-f/slope
        qui replace tpd=t+.0001

    }
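The Newton iteration in the Stata loop can be sketched for a single observation in Python. The finite-difference slope, the starting value t = 1, and the 50 iterations follow the Stata code; the function names and the step-halving safeguard (used when a step would make t non-positive, where t^0.67 is undefined) are assumptions added for this sketch and are not part of the slide.

```python
import math

def surv(t, drug, shape=0.67):
    """S(t) = exp(-exp(-2.30 + 0.3*drug - 0.004*drug*t) * t**shape)."""
    return math.exp(-math.exp(-2.30 + 0.3 * drug - 0.004 * drug * t) * t ** shape)

def solve_time(p, drug, t0=1.0, d=1e-4, iters=50):
    """Solve surv(t, drug) = p for t by Newton's method with a numerical slope."""
    t = t0
    for _ in range(iters):
        f = surv(t, drug) - p
        slope = (surv(t + d, drug) - surv(t, drug)) / d  # forward-difference estimate of f'(t)
        t_new = t - f / slope
        if t_new <= 0.0:
            t_new = t / 2.0  # safeguard (assumption): stay in the region where t**shape is defined
        t = t_new
    return t

for p, drug in [(0.5, 0), (0.2, 0), (0.9, 0), (0.5, 1), (0.9, 1)]:
    t = solve_time(p, drug)
    print(p, drug, round(t, 4), round(surv(t, drug), 6))
```

One caveat worth checking before using this for drug = 1: with the time-dependent term the function S(t) stops decreasing at large t, so very small values of P may have no solution and the iteration cannot be expected to converge for them.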

    Matlab

    >> drug=[zeros(1000,1);ones(1000,1)];

    >> P=rand(2000,1);

    >> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));

    >> outcome=P<cdf0;

    >> b = glmfit(drug,outcome,'binomial')

    b =

    -1.0616

    0.6562

    [Figure: Kaplan-Meier survival estimates for drug = 0 and drug = 1; vertical axis 0.00 to 1.00, horizontal axis analysis time 0 to 200]

    (Stata)

    . gen P=uniform()

    . gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))

    (Stata)

    . list in 1/10

         +----------------------------+
         | drug          P       cdf0 |
         |----------------------------|
      1. |    0   .2865897   .2500023 |
      2. |    0   .3788754   .2500023 |
      3. |    1   .3597057   .3999916 |
      4. |    1   .7182508   .3999916 |
      5. |    1   .4315197   .3999916 |
         |----------------------------|
      6. |    1   .2963237   .3999916 |
      7. |    1   .7961193   .3999916 |
      8. |    0    .056983   .2500023 |
      9. |    0   .4622037   .2500023 |
     10. |    0   .5336403   .2500023 |
         +----------------------------+

    . gen outcome=P<cdf0

    . list in 1/10

         +--------------------------------------+
         | drug          P       cdf0   outcome |
         |--------------------------------------|
      1. |    0   .2865897   .2500023         0 |
      2. |    0   .3788754   .2500023         0 |
      3. |    1   .3597057   .3999916         1 |
      4. |    1   .7182508   .3999916         0 |
      5. |    1   .4315197   .3999916         0 |
         |--------------------------------------|
      6. |    1   .2963237   .3999916         1 |
      7. |    1   .7961193   .3999916         0 |
      8. |    0    .056983   .2500023         1 |
      9. |    0   .4622037   .2500023         0 |
     10. |    0   .5336403   .2500023         0 |
         +--------------------------------------+

                                                   Prob > chi2   =  0.0000
    Log likelihood = -1245.3138                    Pseudo R2     =  0.0230

    --------------------------------------------------------------------
         outcome |      OR      SE       z    P>|z|       [95% CI]
    -------------+------------------------------------------------------
            drug |   2.084   0.202    7.57   0.000    1.723    2.519
    --------------------------------------------------------------------
           _cons |  -1.077   0.073  -14.83   0.000   -1.220   -0.935
    --------------------------------------------------------------------

    .