14_RandomNumber


1/140

BSTA 670 Statistical Computing

27 October 2010

Lecture 14:

Random Number Generation

Presented by: Paul Wileyto, Ph.D.


2/140

Anyone who uses software to produce random

numbers is in a state of sin.

John von Neumann

A good analog random number generator.


3/140

Why do we need Random

Numbers?

Simulation Input
Statistical Sampling
Assignment in Trials
Games


4/140

Where do you get random numbers?

Uniform Random Numbers
  Published Tables
  Make Them
    Computer Algorithms
    Harvest from Nature
Random draws from a specific distribution
  Make them from Uniform Random Numbers


5/140

Software Random Number Generators

There are no true random number generators, but there are Pseudo-Random Number Generators.
Computers have only a limited number of bits to represent a number.
Sooner or later, the sequence of random numbers will repeat itself (the period of the generator).
The trick is to be good enough to look like random numbers.


6/140

Algorithms for Uniform Random Numbers


7/140

Good pseudo-random numbers:

Independent of the previous number

Long period

Sequence reproducible if started with

same initial conditions

Fast


8/140

Good pseudo-random numbers:

Equal probability for any number inside the interval [a,b]
Probability Density:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases}$$


9/140

We are interested primarily in uniform random numbers in the interval [0,1].
We'll refer to the realization of a uniform random number over [0,1] as U.
Many of the algorithms produce integer-valued random numbers over the interval [0,b]. Transform to the interval [0,1] by dividing by b.


10/140

Linear Congruential Generator (LCG)

Most common
$$X_n = (a X_{n-1} + c) \bmod m$$
$X_0$ = seed, modulus $m$ (large prime), multiplier $a$, and increment $c$
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.

Mod in SAS

proc iml; /* begin IML session */

q={20,30,40,50,70,90,160};

t=mod(q,7);

qt=q||t;

print qt; /* print matrix */

quit;

qt

20 6

30 2

40 5

50 1

70 0

90 6

160 6

SAS
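The recurrence above can be sketched in a few lines of Python. This is only an illustration: the toy parameters (a = 3, c = 0, m = 31) and the helper names `lcg` and `period` are assumptions chosen so the full cycle is short enough to inspect, and they are not values from the lecture.

```python
def lcg(seed, a, c, m):
    """Yield X_n = (a * X_{n-1} + c) mod m forever, starting from X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def period(seed, a, c, m):
    """Count the steps between two visits to the same state (cycle length)."""
    seen = {}
    for n, x in enumerate(lcg(seed, a, c, m)):
        if x in seen:
            return n - seen[x]
        seen[x] = n

# Toy parameters, assumed for illustration only.
gen = lcg(seed=1, a=3, c=0, m=31)
first = [next(gen) for _ in range(5)]
uniforms = [x / 31 for x in first]      # rescale the integers to [0, 1)
cycle = period(seed=1, a=3, c=0, m=31)  # how long before the sequence repeats

print(first, cycle)
```

With these toy values the generator visits every nonzero residue before repeating, which is the wrapping behaviour the slide describes.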


11/140

Linear Congruential Generator (LCG)

Most common
$$X_n = (a X_{n-1} + c) \bmod m$$
$X_0$ = seed, modulus $m$ (large prime), multiplier $a$, and increment $c$
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.

Mod in R

q


12/140


proc iml;
seed = 123456;
c = j(5,1,seed);

b = uniform(c);

print b;

quit;

b

0.73902

0.2724794

0.7095326

0.3191636

0.367853

Unit Random Variates in SAS

SAS


13/140

> RNGkind()
[1] "Mersenne-Twister" "Inversion"

set.seed(as.integer(format(Sys.time(), "%S%M%H")))

c


14/140


proc iml;
seed = 0;
c = j(5,1,seed);

b = uniform(c);

print b;

quit;

b

0.73902

0.2724794

0.7095326

0.3191636

0.367853

Unit Random Variates in SAS

Set seed to 0 to grab a seed value from the system clock.

SAS


15/140

RANUNI() and IML UNIFORM() use a multiplicative

linear congruential generator (from SAS docs) where

SEED = mod( SEED * 397204094, 2**31-1 )

and then returns

SEED / (2**31-1)

SAS


16/140

Testing Randomness

Is it Uniform?

[Figure: two histograms of generated values on the interval 0 to 1; counts run to about 300 in the first panel and to about 3 x 10^4 in the second.]
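One way to put a number on "Is it Uniform?" is to bin the draws and compare the bin counts with their expected value. The sketch below is an assumption about how such a check might look, not the method used to make the slide's histograms; it uses Python's built-in generator, and `chi_square_uniform` is a hypothetical helper name.

```python
import random

def chi_square_uniform(values, bins=10):
    """Bin values from [0, 1) and return (counts, Pearson chi-square statistic)."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1
    expected = len(values) / bins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return counts, stat

rng = random.Random(12345)
draws = [rng.random() for _ in range(10000)]
counts, stat = chi_square_uniform(draws)

# Under uniformity the statistic is roughly chi-square with bins - 1 = 9
# degrees of freedom, so values far above about 20 would be suspicious.
print(counts, round(stat, 2))
```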


17/140

Testing Randomness

Generate two sets and plot against each other
Might see correlation in higher dimensions
Plot X_i versus X_{i+k} for serial correlation
[Figure: two scatter plots on the unit square, x versus y and x1 versus y2.]
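The plot of X_i against X_{i+k} has a numerical counterpart: the lag-k correlation. The following is a minimal sketch under the assumption that a plain Pearson correlation is an acceptable summary; `lag_correlation` is a hypothetical helper, and Python's built-in generator stands in for whichever generator is under test.

```python
import random

def lag_correlation(x, k):
    """Pearson correlation between x[i] and x[i + k]."""
    a, b = x[:-k], x[k:]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

rng = random.Random(2010)
x = [rng.random() for _ in range(20000)]
lags = {k: lag_correlation(x, k) for k in (1, 2, 5)}

# For a good generator every lag correlation should sit near zero.
print(lags)
```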


18/140


19/140

Linear Congruential Generator

The good

Fast

Up to period of m random numbers

The Bad

Sequential correlation

Plots in more than 1 dimension do not fill in the

space uniformly, but tend to form bands

Not cryptographically secure

Selections of m, a, and c are important


20/140

Linear Congruential Generator

Good magic numbers for the linear congruential method:

a = 16,807, c = 0, M = 2,147,483,647


21/140

Overflow Method for integers

Multiply two 32-bit numbers to get a 64-bit integer that cannot be represented in 32-bit space.
The low-order 32 bits remain after the overflow.
Divide by $2^{32}$ to get floating point values between 0 and 1.
Very Fast
$$I_{j+1} = a I_j + c$$
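Python integers do not overflow, so the sketch below imitates 32-bit overflow by masking off everything above the low-order 32 bits. The multiplier and increment are assumptions: they are the "quick and dirty" constants published in Numerical Recipes, not values given in the lecture.

```python
MASK32 = 0xFFFFFFFF   # keeps only the low-order 32 bits
A = 1664525           # assumed multiplier (Numerical Recipes constant)
C = 1013904223        # assumed increment (Numerical Recipes constant)

def next_state(i):
    """I_{j+1} = a * I_j + c, truncated to 32 bits as hardware overflow would."""
    return (A * i + C) & MASK32

def uniform_stream(seed, n):
    """Return n values in [0, 1) by dividing each 32-bit state by 2**32."""
    out = []
    state = seed & MASK32
    for _ in range(n):
        state = next_state(state)
        out.append(state / 2**32)
    return out

u = uniform_stream(seed=0, n=5)
print(u)
```

In C the mask would be unnecessary with an unsigned 32-bit type, which is where the speed of the method comes from.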


22/140

Blum, Blum, Shub

Very slow
Not suited to simulation
Passes all tests
Cryptographically secure
$$X_{n+1} = X_n^2 \bmod M, \quad M = pq, \text{ where } p \text{ and } q \text{ are large primes}$$
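A toy version of the recurrence is sketched below. The primes 11 and 23 are assumptions chosen so the arithmetic can be followed by hand (both are congruent to 3 mod 4, the usual requirement for this generator); a real use needs very large primes. Taking the lowest bit of each state is the common way to extract output and is likewise an assumption here.

```python
def bbs_bits(seed, p, q, n):
    """Return n bits from X_{k+1} = X_k**2 mod (p*q), using each state's parity."""
    m = p * q
    x = seed % m
    bits = []
    for _ in range(n):
        x = (x * x) % m
        bits.append(x & 1)   # lowest-order bit of the new state
    return bits

# Toy primes for illustration only; the seed must share no factor with p*q.
bits = bbs_bits(seed=3, p=11, q=23, n=6)
print(bits)   # states visited are 9, 81, 236, 36, 31, 202
```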


23/140

Mersenne Twister

By Matsumoto and Nishimura (1997)
Caused a great deal of excitement in 1997.
Good statistical properties
Not good for cryptography
SAS IML RANDGEN function
Default technique for R runif()


24/140

Mersenne Twister

I'm just going to give you the flavor of it
It's a bit-shifting algorithm

32 bit word:

0000 1111


25/140

Mersenne Twister

XOR: logical bitwise comparison function
Compares two bits
If they are different, the value is 1
If they are the same, the value is zero

>> a=[0 0 1 1]

a =

0 0 1 1

>> b=[0 1 1 0]

b =

0 1 1 0

>> c=xor(a,b)

c =

0 1 0 1

>> MATLAB


26/140

Mersenne Twister

XOR: logical bitwise comparison function
Compares two bits
If they are different, the value is 1
If they are the same, the value is zero

> a <- c(0, 0, 1, 1)
> b <- c(0, 1, 1, 0)
> c <- xor(a, b)
> c

[1] FALSE TRUE FALSE TRUE

> as.integer(c)

[1] 0 1 0 1

R


27/140

Mersenne Twister

Bit shifting algorithm

Use XOR function to flip values

32 bit word:

0001 1111

XOR


28/140

Mersenne Twister

Use 624 32-bit words to make one 19937-bit word (623*32 + 1)
XOR flip function in each 32-bit word
[Diagram: a 32-bit word (0000 ... 1111) with an XOR step, fed from the last word and passed to the next word.]


29/140

Mersenne Twister
From: John Savard's Cryptology Page, http://www.quadibloc.com


30/140

Mersenne Twister

By Matsumoto and Nishimura (1997)
Mersenne Prime Numbers (powers of 2, minus 1) give the period length: $2^{19937}-1$ for 32-bit numbers

Free C source code

Fast

Passes all randomness smell tests

Not cryptographically secure


31/140


proc iml;
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;

r
0.0151013
0.5743561
0.5829185
0.6437729
0.1823678
0.3977417
0.476881
0.9845982
0.3211301
0.9623223

SAS


32/140

> RNGkind()

[1] "Mersenne-Twister" "Inversion"

> r=matrix(runif(10), 10,1)

> r[,1]

[1,] 0.14645262

[2,] 0.04558767

[3,] 0.79254901

[4,] 0.57810786

[5,] 0.57831079
[6,] 0.30258424

[7,] 0.08682622

[8,] 0.77980499

[9,] 0.34161593

[10,] 0.98705945

R


33/140

Both R and SAS automatically grab a seed value from the system clock at first use, unless you call set.seed (in R) or randseed (in SAS) to set a specific starting point.
Grabbing a Seed from the System Clock (SAS)

SAS


34/140


proc iml;
call randseed(12345);
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;

r
0.5832971
0.9936254
0.5878877
0.8574689
0.8246889
0.2805668
0.6473969
0.3819192
0.4489572
0.8757847

SAS


35/140

> set.seed(12345)


> r[,1]

[1,] 0.7209039

[2,] 0.8757732

[3,] 0.7609823

[4,] 0.8861246

[5,] 0.4564810
[6,] 0.1663718

[7,] 0.3250954

[8,] 0.5092243

[9,] 0.7277053

[10,] 0.9897369

R


36/140

Obtaining Random Numbers from Specific Distributions
Inverse Probability Transform Methods
Rejection Methods
Mixed Rejection and Transform
Methods for Correlated Random Numbers


37/140

Obtaining Random Numbers from Specific Distributions
Inverse Probability Transform methods
Let X be a random variable described by CDF F(x).
We wish to generate values of X distributed according to F(x).
Given a continuous Uniform Random Variable U in [0,1], the Random Variable X = F^{-1}(U) has CDF F.
$$F^{-1}(u) = \inf\{x \mid F(x) = u\}, \quad 0 < u < 1$$

> r=matrix(runif(10000), 10000,1)
> exrand=-log(1-r)/.04
> hist(exrand, freq = FALSE)
> mean(exrand)
[1] 24.55222
> 1/mean(exrand)
[1] 0.04072951
> hist(exrand, freq = FALSE)
> help.search("means")

R
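The R session above draws exponential values with rate 0.04 by inverting the CDF F(x) = 1 - exp(-rate * x). A comparable Python sketch follows; the function name `exponential_inverse_transform` and the seed are assumptions made for this illustration.

```python
import math
import random

def exponential_inverse_transform(u, rate):
    """Solve F(x) = 1 - exp(-rate * x) = u for x."""
    return -math.log(1.0 - u) / rate

rng = random.Random(670)
sample = [exponential_inverse_transform(rng.random(), 0.04) for _ in range(10000)]
mean = sum(sample) / len(sample)

# The theoretical mean is 1 / rate = 25, so 1 / mean should be close to 0.04.
print(round(mean, 3), round(1 / mean, 5))
```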


45/140

[Figure: Histogram of exrand; x-axis exrand (0 to 250), y-axis Density (0.000 to 0.020).]

R


46/140

Weibull Survival
Survival Time: $S(t) = 1 - U = \exp(-\lambda t^{\gamma})$
Inverse Prob Transform: $t = \left( \dfrac{-\ln(1-U)}{\lambda} \right)^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = 0.001$

SAS


47/140


proc iml;
u = j(2000,1,.);
call randgen(u,'uniform');
wrand=(-log(1-u)/.001)##(1/1.5);
tbl=u||wrand;
print tbl;
varnames={"u","weibrand"};
create wrand from tbl [colname=varnames];
append from tbl;
quit;

proc means data=wrand;
var u weibrand;
run;

title 'Analysis of Weibull RVs';
proc univariate data=wrand noprint;
histogram weibrand / midpoints=5 to 205 by 10 weibull;
run;

SAS


48/140

SAS


49/140

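Weibull Survival
Survival Time: $S(t) = 1 - U = \exp(-\lambda t^{\gamma})$
Inverse Prob Transform: $t = \left( \dfrac{-\ln(1-U)}{\lambda} \right)^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = 0.001$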

R


50/140

> r=matrix(runif(10000), 10000,1)

> wrand=(-log(1-r)/.001)^(1/1.5)

> hist(wrand, freq = FALSE, main = paste("Histogram of Survival Times"), breaks=50, xlab = "Survival Time")

R
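The same Weibull draw can be sketched in Python. The parameter names `lam` (0.001) and `gamma` (1.5) are labels assumed here for the two constants that appear in the R and SAS code; the check against the theoretical median is an addition for illustration.

```python
import math
import random
import statistics

def weibull_inverse_transform(u, lam=0.001, gamma=1.5):
    """Solve 1 - exp(-lam * t**gamma) = u for t."""
    return (-math.log(1.0 - u) / lam) ** (1.0 / gamma)

rng = random.Random(1.5)
wrand = [weibull_inverse_transform(rng.random()) for _ in range(10000)]

# Setting u = 0.5 gives the theoretical median survival time.
theory_median = (math.log(2) / 0.001) ** (1 / 1.5)
sample_median = statistics.median(wrand)
print(round(theory_median, 2), round(sample_median, 2))
```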


51/140

R


52/140

proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;

time WEIBRAND * CENS (0);

run; quit;

goptions reset=all device=WIN;

data work._surv; set work._surv;

if survival > 0 then _lsurv = -log(survival);

if _lsurv > 0 then _llsurv = log(_lsurv);

run;

** Survival plots **;

goptions reset=symbol;

goptions ftext=SWISS ctext=BLACK htext=1 cells;

proc gplot data=work._surv ;

label weibrand = 'Survival Time';

SAS


53/140

SAS


54/140

> r=matrix(runif(1000), 1000,1)

> wrand=(-log(1-r)/.001)^(1/1.5)

> event=wrand<200
> wrand2=wrand*(event)+200*(1-event)

> fit plot(fit, lty = 2:3,xlab = "Days", ylab="Survival")>

R


55/140

R


56/140

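Simulating Weibull Regression Data, with Proportional Hazards
Survival Time: $S(t) = 1 - P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta})$
$t = \left( \dfrac{-\ln(1-P)}{\lambda} \right)^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = \exp(\beta_0 + \beta_1 \cdot Drug)$
$\beta_0 = \ln(0.001) \approx -6.91$
$HR(Drug) = 0.5, \quad \beta_1 = \ln(0.5) \approx -0.69$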

SAS


57/140


u = j(400,1,.);

d = j(200,1,0) // j(200,1,1);


wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);
c = wrand


58/140

options pageno=1;

proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;

time WEIBRAND * CENS (0); strata TREAT;

run; quit;

goptions reset=all device=WIN;

data work._surv; set work._surv;

if survival > 0 then _lsurv = -log(survival);

if _lsurv > 0 then _llsurv = log(_lsurv);

run;

** Survival plots **;

title;

footnote;

goptions reset=symbol;

goptions ftext=SWISS ctext=BLACK htext=1 cells;

proc gplot data=work._surv ;

label weibrand = 'Survival Time';

axis2 minor=none major=(number=6)

label=(angle=90 'Survival Distribution Function');

symbol1 i=stepj l=1 width=1;
symbol2 i=stepj l=2 width=1;
symbol3 i=stepj l=3 width=1;

SAS


59/140

SAS


60/140

Testing Global Null Hypothesis: BETA=0

Test Chi-Square DF Pr > ChiSq

Likelihood Ratio 202.5356 1


61/140

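Simulating Weibull Regression Data, with Proportional Hazards
Survival Time: $S(t) = 1 - P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta})$
$t = \left( \dfrac{-\ln(1-P)}{\lambda} \right)^{1/\gamma}$
$\gamma = 1.5, \quad \lambda = \exp(\beta_0 + \beta_1 \cdot Drug)$
$\beta_0 = \ln(0.001) \approx -6.91$
$HR(Drug) = 0.5, \quad \beta_1 = \ln(0.5) \approx -0.69$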

R


62/140

r=matrix(runif(400), 400,1)

drug=rbind(matrix(1,200,1),matrix(0,200,1))

wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)

event = wrand
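A Python sketch of the same simulation is given below. It reuses the constants visible in the R code (log(.001), -0.69, shape 1.5). The censoring time of 200 is an assumption carried over from the earlier single-sample example, and `simulate_ph` is a hypothetical helper name.

```python
import math
import random
import statistics

def simulate_ph(n_per_arm, rng, beta0=math.log(0.001), beta1=-0.69,
                gamma=1.5, censor=200.0):
    """Return (drug, time, event) rows from a Weibull proportional hazards model."""
    rows = []
    for drug in (0, 1):
        lam = math.exp(beta0 + beta1 * drug)          # hazard scale for this arm
        for _ in range(n_per_arm):
            t = (-math.log(1.0 - rng.random()) / lam) ** (1.0 / gamma)
            event = t < censor                         # assumed administrative censoring
            rows.append((drug, min(t, censor), int(event)))
    return rows

rng = random.Random(62)
rows = simulate_ph(2000, rng)
median_placebo = statistics.median(t for d, t, e in rows if d == 0)
median_drug = statistics.median(t for d, t, e in rows if d == 1)

# A hazard ratio below 1 for the drug should lengthen the median survival time.
print(round(median_placebo, 1), round(median_drug, 1))
```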


63/140

R

Package eha


64/140

Call:phreg(formula = Surv(enter, wrand, event) ~ drug)

Covariate W.mean Coef Exp(Coef) se(Coef) Wald p

drug 0.586 -0.731 0.481 0.113 0.000

log(scale) 4.663 105.903 0.050 0.000

log(shape) 0.402 1.495 0.047 0.000

Events 327

Total time at risk 44359

Max. log. likelihood -1886.9

LR test statistic 42.4

Degrees of freedom 1

Overall p-value 7.38224e-11

> enter=matrix(0,400,1)

> fit fit

> plot.phreg(fn="sur)

R

Package eha


65/140

R


66/140

Generating Numbers from Specific Distributions
Rejection Method
Fast
Good for Count Models
Good when you cannot find F^{-1}, but have f(x)
Generally Use Pairs of Random Numbers
Just like playing the game Battleship


67/140

The Rejection Method is Like Playing the Game Battleship


68/140

Rejection

Choose pairs of uniform random numbers
  x_U between X_min and X_max
  y_U between Y_min and Y_max
Reject x_U if y_U > f(x) at x_U


69/140

Rejection

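[Diagram: the curve f(x) inside the box from Xmin to Xmax and from Ymin to Ymax; a point under the curve is a Hit, points above it are Misses.]
Sample the area (two dimensions) containing the probability distribution or density function uniformly.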
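The box-sampling idea can be sketched for a binomial count with p = 0.2 and 20 trials. The box height of 0.25 and the helper names are assumptions for this illustration; the only requirement is that the box top lies above the largest probability.

```python
import math
import random

def binom_pmf(k, n=20, p=0.2):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def rejection_sample(n_pairs, rng, n=20, p=0.2, y_max=0.25):
    """Throw (x, y) pairs at the box and keep x whenever y lands under f(x)."""
    accepted = []
    for _ in range(n_pairs):
        x = rng.randint(0, n)            # x_U uniform over the support 0..n
        y = rng.uniform(0.0, y_max)      # y_U uniform over [0, Ymax]
        if y <= binom_pmf(x, n, p):      # a hit: the point is under the curve
            accepted.append(x)
    return accepted

rng = random.Random(69)
hits = rejection_sample(20000, rng)
accept_rate = len(hits) / 20000
mean_count = sum(hits) / len(hits)       # should be near n * p = 4

print(len(hits), round(accept_rate, 3), round(mean_count, 3))
```

Most pairs are thrown away here, which is the inefficiency that motivates a tighter dominating function.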


70/140

Rejection

Simple version becomes inefficient if the rejection area is large.
[Diagram: g(x) with a Large Dead Zone above the curve.]


71/140

[Figure (Matlab): rejection sampling for a Binomial Count, p=0.2, Trials=20; x-axis Count, y-axis Frequency; 239 In, 761 Rejected.]


72/140

Rejection

Can be made more efficient by uniform sampling over a smaller target area.
The trick is to sample uniformly over the smaller area.
[Diagram: g(x) with a Smaller Dead Zone.]


73/140

Rejection



First, define a "dominating function" f(x), and its corresponding integral or Cumulative Distribution F(x). F(x) need not be normalized.
[Diagram: g(x) lying under f(x), with a Smaller Dead Zone between them.]


74/140

Rejection



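[Diagram: g(x) lying under the line f(x), with a Smaller Dead Zone between them.]
$$f(x) = a - bx, \quad 0 \le x \le \frac{a}{b}$$
$$F(x) = ax - \frac{b}{2}x^2$$
$$F_{Max} = F\!\left(\frac{a}{b}\right) = \frac{a^2}{2b}$$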


75/140

Rejection

Choose x_U based on the inverse transform of the integrated dominating function (F(x)).
Choose a uniform random number U_1 in the range: $0 \le U_1 \le \dfrac{a^2}{2b}$
Calculate x by setting F(x) = U_1, and solving (the quadratic) for x.
[Diagram: g(x) lying under f(x), with x_U marked on the x-axis.]


76/140

Rejection

Evaluate f(x_U), choose a second uniform random number U_2 between 0 and f(x_U).
Reject x_U if U_2 > g(x_U), the target function.
[Diagram: g(x) lying under f(x), with x_U marked on the x-axis.]
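The two-step recipe can be sketched with a linear dominating function f(x) = a - bx. The target g(x) = 3(1 - x)^2 on [0, 1] and the choice a = b = 3 are assumptions made for this illustration so that g never exceeds f; they are not the functions plotted on the slides.

```python
import math
import random

def sample_under_line(n_draws, rng, a=3.0, b=3.0):
    """Rejection sampling with dominating line f(x) = a - b*x over [0, a/b]."""
    f = lambda x: a - b * x                  # dominating function
    g = lambda x: 3.0 * (1.0 - x) ** 2       # assumed target density, g <= f
    f_max = a * a / (2.0 * b)                # F(a/b), total area under f
    accepted, tried = [], 0
    while len(accepted) < n_draws:
        tried += 1
        u1 = rng.uniform(0.0, f_max)
        # Solve a*x - (b/2)*x**2 = u1 for the root that lies in [0, a/b].
        x = (a - math.sqrt(max(0.0, a * a - 2.0 * b * u1))) / b
        u2 = rng.uniform(0.0, f(x))
        if u2 <= g(x):                       # keep x only if the point is under g
            accepted.append(x)
    return accepted, tried

rng = random.Random(76)
draws, tried = sample_under_line(5000, rng)
mean_x = sum(draws) / len(draws)             # the assumed target has mean 1/4
accept_rate = len(draws) / tried             # area under g over area under f = 2/3

print(round(mean_x, 3), round(accept_rate, 3))
```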


77/140

[Figure: two panels, each plotting Frequency (0 to 0.25) against Count (-5 to 25).]



78/140

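Weibull Function?
$$f(x) = \frac{\gamma}{\alpha}\left(\frac{x}{\alpha}\right)^{\gamma-1} \exp\!\left(-\left(\frac{x}{\alpha}\right)^{\gamma}\right) \ \text{times } 2$$
$$F(x) = 1 - \exp\!\left(-\left(\frac{x}{\alpha}\right)^{\gamma}\right)$$
$$F(x) = u \ \Rightarrow \ x_u = \alpha\left(-\ln(1-u)\right)^{1/\gamma}, \quad \alpha = 6.5, \ \gamma = 1.8$$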


79/140

[Figure: a series of panels, each plotting Frequency (0 to 0.25) against Count (-5 to 25).]


Binomial Distribution


80/140

(Bernoulli Trials are the simplest example of the rejection method.)

Probability Pr(X=1): p

proc iml; /* begin IML session */


81/140

r = j(10,1,.);

call randgen(r,'uniform');

b=r>0.5;

print b;
quit;

b

1

0

1

1

0

0

0

0

0

1

SAS


82/140


> b=r<0.5
> cbind(r,b)
           [,1] [,2]
 [1,] 0.4919652    1
 [2,] 0.5088624    0
 [3,] 0.5955355    0
 [4,] 0.5243394    0
 [5,] 0.5923056    0
 [6,] 0.1610980    1
 [7,] 0.9663659    0
 [8,] 0.2548106    1
 [9,] 0.4582953    1
[10,] 0.1170421    1
>

R


83/140

Simulating Outcomes from a

Logistic Model

But then, you never have just one value of p for your Bernoulli Trials
Placebo Controlled Drug Trial
25% Success for Placebo
Odds Ratio of 2.0 for Treatment

Two different success probabilities,

based on logistic model


84/140

Logistic Model

$$\text{Placebo: } 0.25 = \frac{\exp(\beta_0)}{1 + \exp(\beta_0)}, \quad \beta_0 = -1.0986$$
Drug (0,1): OR=2.0, ln(OR)=0.6931
$$CDF_{Success} = \frac{\exp(-1.0986 + 0.6931 \cdot Drug)}{1 + \exp(-1.0986 + 0.6931 \cdot Drug)}$$


85/140

u = j(400,1,.);

d = j(400,1,1)||(j(200,1,0) // j(200,1,1));

bta= {-1.0986 , 0.6931};


expit=exp(d*bta)/(1+exp(d*bta));
outcome=u


86/140

SAS

The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Standard Wald

Parameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -1.2367 0.1693 53.3383


87/140

r=matrix(runif(400), 400,1)

drug=rbind(matrix(0,200,1),matrix(1,200,1))

d=cbind(matrix(1,400,1), drug)

parms=matrix(c(-1.0986 , 0.6931),2,1)

expit=exp(d%*%parms)/(1+exp(d%*%parms))

outcome=r<expit
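The logistic simulation amounts to comparing a uniform draw with each subject's success probability. A Python sketch under that reading follows; `expit` and `simulate_trial` are names assumed for this illustration, and the arm size is enlarged so the sample proportions settle near their targets.

```python
import math
import random

def expit(eta):
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_trial(n_per_arm, rng, beta0=-1.0986, beta1=0.6931):
    """Return (drug, outcome) rows; outcome is 1 when the uniform draw falls below p."""
    rows = []
    for drug in (0, 1):
        p = expit(beta0 + beta1 * drug)
        for _ in range(n_per_arm):
            rows.append((drug, int(rng.random() < p)))
    return rows

rng = random.Random(87)
rows = simulate_trial(5000, rng)
rate_placebo = sum(y for d, y in rows if d == 0) / 5000
rate_drug = sum(y for d, y in rows if d == 1) / 5000

# Placebo odds of 1/3 doubled gives 2/3, that is, a success probability of 0.4.
print(round(rate_placebo, 3), round(rate_drug, 3))
```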


88/140

R

> drugtrial <- glm(outcome ~ drug, family = binomial(link = "logit"))

> summary(drugtrial)

Call:

glm(formula = outcome ~ drug, family = binomial(link = "logit"))

Deviance Residuals:

Min 1Q Median 3Q Max

-1.036 -1.036 -0.776 1.326 1.641

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -1.0460 0.1612 -6.488 8.68e-11 ***

drug 0.7026 0.2158 3.255 0.00113 **

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 511.49 on 399 degrees of freedom

Residual deviance: 500.67 on 398 degrees of freedom
AIC: 504.67

Number of Fisher Scoring iterations: 4

Normally Distributed Random Numbers


89/140


Inverse transform methods are inefficient for normal random numbers
Box-Muller Transform
  z transformation of two random uniform variates [X1, X2 ~ U(0,1)]
  Random radius, random angle
Get two z variates from two uniform random numbers, $X_1$ and $X_2$:
$$z_1 = r\cos(\theta) = \sqrt{-2\ln(X_1)}\,\cos(2\pi X_2)$$
$$z_2 = r\sin(\theta) = \sqrt{-2\ln(X_1)}\,\sin(2\pi X_2)$$
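The pair of formulas translates directly into code. In this sketch the first uniform is taken as 1 - random() so that it is never exactly zero; that guard and the function name `box_muller` are assumptions added for the illustration.

```python
import math
import random

def box_muller(x1, x2):
    """Turn two uniforms (x1 in (0, 1], x2 in [0, 1)) into two standard normals."""
    r = math.sqrt(-2.0 * math.log(x1))   # random radius
    theta = 2.0 * math.pi * x2           # random angle
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(89)
z = []
for _ in range(10000):
    z1, z2 = box_muller(1.0 - rng.random(), rng.random())
    z.extend((z1, z2))

mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(round(mean_z, 3), round(var_z, 3))   # should be near 0 and 1
```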


90/140


Box-Muller Transform
  X1, X2 specify a position within the unit circle
  Random angle, random radius
  Would be more efficient if it did not make calls to trigonometric functions.
Marsaglia Method
  Places the Unit Circle within a square, -1 to +1, and samples the square uniformly.
  Rejects draws that fall outside the circle.
  But it avoids calls to trig functions.
$$s = X_1^2 + X_2^2, \quad s < 1$$
$$z_1 = X_1\sqrt{\frac{-2\ln(s)}{s}}, \quad z_2 = X_2\sqrt{\frac{-2\ln(s)}{s}}$$
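A sketch of the Marsaglia (polar) variant follows. It samples the square from -1 to +1, discards points outside the unit circle, and needs no trigonometric calls; the function name `marsaglia_polar` is an assumption for this illustration.

```python
import math
import random

def marsaglia_polar(rng):
    """Return two standard normals using rejection from the enclosing square."""
    while True:
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        s = x1 * x1 + x2 * x2
        if 0.0 < s < 1.0:                # keep only points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x1 * factor, x2 * factor

rng = random.Random(90)
z = []
for _ in range(10000):
    z.extend(marsaglia_polar(rng))

mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(round(mean_z, 3), round(var_z, 3))   # should be near 0 and 1
```

About 21% of the pairs are rejected (the part of the square outside the circle), which is the price paid for avoiding the sine and cosine calls.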


91/140

Generating Numbers from Specific Distributions
Normal, Using CLT (quick & dirty)
  Sum several iterations of u
  Standardize
  Recall that Var(u)=1/12
$$X = \sum_{i=1}^{12} u_i - 6$$
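Because twelve uniforms have total variance 12 * (1/12) = 1 and total mean 6, subtracting 6 is all the standardizing that is needed. A brief Python sketch, with `clt_normal` as an assumed helper name:

```python
import random

def clt_normal(rng):
    """Approximate a standard normal as the sum of 12 uniforms minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(91)
x = [clt_normal(rng) for _ in range(20000)]
mean_x = sum(x) / len(x)
var_x = sum((v - mean_x) ** 2 for v in x) / (len(x) - 1)

# The approximation can never produce values beyond -6 or +6.
print(round(mean_x, 3), round(var_x, 3), round(min(x), 2), round(max(x), 2))
```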


92/140

Correlated Multivariate Random Numbers

Simulating panel data, repeated

measures

Mixture distributions


93/140

Generating Multivariate Normal

Random Numbers

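Desired Covariance Matrix $\mathbf{V}$
$\mathbf{V} = \mathbf{R}'\mathbf{R}$, where $\mathbf{R}$ is the Cholesky Decomposition of $\mathbf{V}$
Begin with independent standard normal RVs $\mathbf{Z} \sim N(0,1)$
Correlated (Multivariate) Normal RVs: $\mathbf{X} = \boldsymbol{\mu} + \mathbf{R}'\mathbf{Z}$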
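The recipe can be sketched in plain Python with a hand-written Cholesky factorization. The correlation matrix and standard deviations are the ones used in the lecture's SAS and R examples; the function names, the zero mean, and the use of `random.gauss` for the independent normals are assumptions made for this illustration.

```python
import math
import random

def cholesky_upper(v):
    """Return upper-triangular R with V = R'R for a positive definite matrix V."""
    n = len(v)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = v[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(s) if i == j else s / r[i][i]
    return r

rmat = [[1, .3, .2, .1], [.3, 1, .3, .2], [.2, .3, 1, .3], [.1, .2, .3, 1]]
sigvec = [53, 36, 12, 47]
cvmat = [[rmat[i][j] * sigvec[i] * sigvec[j] for j in range(4)] for i in range(4)]
upr = cholesky_upper(cvmat)

def draw_row(rng):
    """One correlated draw: a row of independent N(0,1) values times R."""
    z = [rng.gauss(0.0, 1.0) for _ in range(4)]
    return [sum(z[i] * upr[i][j] for i in range(4)) for j in range(4)]

rng = random.Random(93)
rows = [draw_row(rng) for _ in range(2000)]
col1 = [row[0] for row in rows]
m1 = sum(col1) / len(col1)
sd1 = math.sqrt(sum((v - m1) ** 2 for v in col1) / (len(col1) - 1))
print([round(v, 4) for v in upr[1]], round(sd1, 2))
```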



94/140

Random Numbers


rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};

sigvec={53 36 12 47};

cvmat=rmat#(sigvec`*sigvec);

upr=half(cvmat);

print rmat;

print sigvec;

print cvmat;

print upr;

r1 = j(1000,4,.);

r2 = j(1000,4,.);

call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);

print pi;

/* Let's be wasteful: only one of the two Box-Muller variates is used */

z1=sqrt(-2*log(r1))#cos(2*pi*r2);

z1=z1*upr;

varnames={"x1","x2","x3","x4"};

create nrand from z1 [colname=varnames];

append from z1;

quit;

proc corr data=work.nrand pearson;

var x1 x2 x3 x4;

run;

SAS



95/140

1 0.3 0.2 0.1

0.3 1 0.3 0.2

0.2 0.3 1 0.3

0.1 0.2 0.3 1

sigvec

53 36 12 47

cvmat

2809 572.4 127.2 249.1

572.4 1296 129.6 338.4

127.2 129.6 144 169.2

249.1 338.4 169.2 2209

upr

53 10.8 2.4 4.7

0 34.341811 3.0190603 8.3757958

0 0 11.36333 11.672016

0 0 0 44.503035

SAS

The CORR Procedure


96/140

4 Variables: x1 x2 x3 x4

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

x1 1000 -0.86090 51.78291 -860.89502 -167.70178 157.51299

x2 1000 -0.21592 36.41244 -215.92386 -122.58068 120.05335

x3 1000 -0.06176 11.60953 -61.75755 -37.09589 43.83908

x4 1000 0.46483 46.63351 464.82762 -152.65527 143.41509

Pearson Correlation Coefficients, N = 1000

Prob > |r| under H0: Rho=0

x1 x2 x3 x4

x1 1.00000 0.30338 0.20341 0.11397


97/140

Random Numbers

cmat cmat

[,1] [,2] [,3] [,4]

[1,] 1.0 0.3 0.2 0.1

[2,] 0.3 1.0 0.3 0.2

[3,] 0.2 0.3 1.0 0.3

[4,] 0.1 0.2 0.3 1.0

> vv

[,1] [,2] [,3] [,4]

[1,] 2809.0 572.4 127.2 249.1

[2,] 572.4 1296.0 129.6 338.4

[3,] 127.2 129.6 144.0 169.2

[4,] 249.1 338.4 169.2 2209.0

> rr


98/140

R

[,1] [,2] [,3] [,4]

[1,] 53 10.80000 2.400000 4.700000

[2,] 0 34.34181 3.019060 8.375796

[3,] 0 0.00000 11.363330 11.672016

[4,] 0 0.00000 0.000000 44.503035

> cov(rvs)

[,1] [,2] [,3] [,4]

[1,] 2832.4200 561.2585 134.0656 533.7351

[2,] 561.2585 1235.7616 124.2373 382.5441
[3,] 134.0656 124.2373 127.4132 160.2173

[4,] 533.7351 382.5441 160.2173 2205.5903

> cor(rvs)

[,1] [,2] [,3] [,4]

[1,] 1.0000000 0.2999969 0.2231676 0.2135426

[2,] 0.2999969 1.0000000 0.3130961 0.2317137

[3,] 0.2231676 0.3130961 1.0000000 0.3022317

[4,] 0.2135426 0.2317137 0.3022317 1.0000000

> sd(rvs)

[1] 53.22048 35.15340 11.28774 46.96371


99/140

Subject-specific Random Effects

We have an error term ($e_{ij}$) for measurement j in subject i.
We also have a subject-specific random effect ($k_i$)
For the $i^{th}$ subject in the $j^{th}$ measurement:
$$y_{ij} = \mathbf{x}\boldsymbol{\beta} + e_{ij} + k_i$$


100/140

Recipe for Subject-specific Random Effects

Create N subjects for the study
Assign treatment, covariates
Give each subject a random effect
  Drawn from, say, N(0,V)
Generate predicted values based on regression + random effects
Generate outcomes for each repeated measure from the specific distribution


101/140

Logistic Model

$$\text{Placebo: } 0.25 = \frac{\exp(\beta_0)}{1 + \exp(\beta_0)}, \quad \beta_0 = -1.0986$$
Drug: OR=2.0, ln(OR)=0.6931
Time (0,1,2): OR 1.5, ln(OR)=0.4055
$K_i \sim N(0,1)$
$$CDF_{Success} = \frac{\exp(-1.0986 + 0.6931 \cdot Drug + 0.4055 \cdot Time + K_i)}{1 + \exp(-1.0986 + 0.6931 \cdot Drug + 0.4055 \cdot Time + K_i)}$$


102/140

u = j(600,1,.);

d1=j(100,1,0)//j(100,1,1);

d1=d1//d1//d1;

id=j(200,1,0);

do i=1 to 200 by 1;

id[i,1]=i;

end;

id=id//id//id;

t=j(200,1,0)//j(200,1,1)//j(200,1,2);

k=j(200,1,.);

call randgen(k,'normal');

k=k//k//k;

bta= {-1.0986 , 0.6931,.4055,1};
d = j(600,1,1)||d1||t||k;


expit=exp(d*bta)/(1+exp(d*bta));

y=u
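The recipe for subject-specific random effects can be sketched as follows. The coefficients are the ones in the SAS code (intercept, drug, time, and a unit weight on the random effect); the row layout and the name `simulate_panel` are assumptions for this illustration.

```python
import math
import random

def simulate_panel(n_subjects, rng, beta=(-1.0986, 0.6931, 0.4055), times=(0, 1, 2)):
    """Return (id, drug, time, outcome) rows with one N(0,1) effect per subject."""
    rows = []
    for i in range(n_subjects):
        drug = int(i >= n_subjects // 2)       # second half of the subjects get the drug
        k = rng.gauss(0.0, 1.0)                # subject-specific effect, reused at every time
        for t in times:
            eta = beta[0] + beta[1] * drug + beta[2] * t + k
            p = 1.0 / (1.0 + math.exp(-eta))
            rows.append((i + 1, drug, t, int(rng.random() < p)))
    return rows

rng = random.Random(102)
rows = simulate_panel(200, rng)
print(len(rows), rows[:3])
```

Reusing k across a subject's three measurements is what induces the within-subject correlation that the mixed models on the following slides estimate.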


103/140

Random effects logistic regression              Number of obs      =       600
Group variable: id                              Number of groups   =       200

Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3

                                                Wald chi2(2)       =     11.07
Log likelihood  = -394.754                      Prob > chi2        =    0.0040

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .6023501   .2279107     2.64   0.008     .1556533    1.049047
           t |    .235297   .1123071     2.10   0.036      .015179    .4554149
       _cons |  -.9334699   .2040373    -4.57   0.000    -1.333376   -.5335642
-------------+----------------------------------------------------------------
    /lnsig2u |  -.1281394   .3971684                     -.9065751    .6502964
-------------+----------------------------------------------------------------
     sigma_u |   .9379396     .18626                      .6355353    1.384236
         rho |   .2109869   .0661172                      .1093476    .3680594
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =    13.01 Prob >= chibar2 = 0.000

. di -394.754*(-2)
789.508

Stata, SAS data

Finally got this to run in SAS. I had forgotten that SAS requires you to sort.

Stata does not require sorted data for their mixed models.


104/140

proc sort data=erand;

by id t;

run;

proc nlmixed data=erand qpoints=5 ;

parms b0=0 b1=-.7 b2=.6 sig=0 ;

theta2 = b0+b1*treat+b2*t+u;

prb= exp(theta2)/(1+exp(theta2));

model outcome ~ binary(prb);

random u ~normal(0,sig) subject=id ;

run;

SAS



105/140

The NLMIXED Procedure

Fit Statistics

-2 Log Likelihood 789.5

AIC (smaller is better) 797.5
AICC (smaller is better) 797.6

BIC (smaller is better) 810.7

Parameter Estimates

Standard

Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient

b0 -0.9332 0.2039 199 -4.58


106/140

q

k1=matrix(runif(200), 200,1)

k2=matrix(runif(200), 200,1)

k=sqrt(-2*log(k1))*cos(2*pi*k2)

id=rbind(id,id,id)
k=rbind(k,k,k)

drug =rbind(matrix(0,100,1), matrix(1,100,1))

drug=rbind(drug,drug,drug)

d=cbind(matrix(1,600,1),drug,k)

parms=matrix(c(-1.0986 , 0.6931,1),3,1)

expit=exp(d%*%parms)/(1+exp(d%*%parms))

outcome=r


107/140

Group variable: id                              Number of groups   =       200

Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3

                                                Wald chi2(1)       =     14.55
Log likelihood  = -368.112                      Prob > chi2        =    0.0001

------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .8331252   .2183881     3.81   0.000     .4050924    1.261158
       _cons |  -1.240213   .1708219    -7.26   0.000    -1.575018   -.9054085
-------------+----------------------------------------------------------------
    /lnsig2u |  -.6325958   .5624138                     -1.734907     .469715
-------------+----------------------------------------------------------------
     sigma_u |   .7288423   .2049555                      .4200198    1.264729
         rho |   .1390212   .0673177                       .050895    .3271436
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =     5.08 Prob >= chibar2 = 0.012

R

Got this to run in R, using a mixed effects package called Zelig.

z.out1


108/140


Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic

model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical

Software," http://gking.harvard.edu/zelig

summary(z.out1)

Generalized linear mixed model fit by the Laplace approximation

Formula: outcome ~ drug + tag(1 | id)

AIC BIC logLik deviance

743.4 756.6 -368.7 737.4

Random effects:

Groups Name        Variance Std.Dev.
id     (Intercept) 0.39486  0.62838

Number of obs: 600, groups: id, 200

Fixed effects:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -1.2172 0.1511 -8.054 8.02e-16 ***

drug 0.8174 0.2023 4.041 5.32e-05 ***

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:

(Intr)

drug -0.747

R



109/140

General Approach to Correlated Multivariate Random Numbers
Copulas allow us to draw correlated random numbers from different distributions
  Random effects in Mixture Models
They use CDF probabilities of correlated variables on the inside to map to correlated uniform random numbers on the margins
Those correlated uniform RVs may be used to marry vastly different distributions.
Maintain Marginal Distributions



110/140

Generating Multivariate Random Numbers
From SAS documentation, a Gaussian Copula:
Independent Normal (N(0,1)) random variables are generated
These variables are transformed to a correlated set of z-scores by using the Cholesky Decomposition of the covariance matrix.
These correlated normal RVs are transformed to a uniform by using $\Phi(z)$.
$F^{-1}(\cdot)$ is used to compute the final sample value
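The four steps can be sketched for two variables. In the 2 x 2 case the Cholesky step reduces to z2 = rho * z1 + sqrt(1 - rho^2) * e. The correlation of 0.4, the exponential and Weibull margins, and all function names are assumptions chosen for this illustration.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal CDF

def gaussian_copula_pair(rho, rng):
    """Return two correlated uniforms built from correlated standard normals."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return PHI(z1), PHI(z2)

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

rng = random.Random(110)
pairs = [gaussian_copula_pair(0.4, rng) for _ in range(5000)]
u1 = [p[0] for p in pairs]
u2 = [p[1] for p in pairs]

# Different margins from the same correlated uniforms (assumed choices).
x_exp = [-math.log(1.0 - u) / 0.04 for u in u1]                    # exponential, rate 0.04
x_wei = [(-math.log(1.0 - u) / 0.001) ** (1 / 1.5) for u in u2]    # Weibull

print(round(pearson(u1, u2), 3), round(pearson(x_exp, x_wei), 3))
```

The correlation between the uniforms comes out a little below the 0.4 used for the normals, and it shifts again after the marginal transforms, as the MATLAB output later in the lecture also shows.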



111/140


Generating Multivariate Random Numbers

proc iml;/* begin IML session */

rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};

sigvec={1 1 1 1};

cvmat=rmat#(sigvec`*sigvec);

/* # is element-wise multiplication */

upr=half(cvmat);

print rmat;
print sigvec;

print cvmat;

print upr;

r1 = j(1000,4,.);

r2 = j(1000,4,.);


call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);

print pi;

z1=sqrt(-2*log(r1))#cos(2*pi*r2);

/* Note I could have gotten another z here */

z1=z1*upr;

z1=cdf('Normal',z1);

z1=gaminv(z1,3.0);

/* Standardized gamma parameter, also the

mean */

varnames={"x1","x2","x3","x4"};

create nrand from z1 [colname=varnames];

append from z1;

quit;

proc corr data=work.nrand pearson;
var x1 x2 x3 x4;

run;

SAS


112/140

rmat

1 0.3 0.2 0.1

0.3 1 0.3 0.2

0.2 0.3 1 0.3

0.1 0.2 0.3 1

sigvec

1 1 1 1

cvmat

1 0.3 0.2 0.1

0.3 1 0.3 0.2
0.2 0.3 1 0.3

0.1 0.2 0.3 1

SAS

The CORR Procedure


113/140

4 Variables: x1 x2 x3 x4

Simple Statistics

Variable N Mean Std Dev Sum Minimum Maximum

x1 1000 2.96320 1.73566 2963 0.11528 12.19072

x2 1000 3.01249 1.68236 3012 0.14039 10.20117

x3 1000 3.00336 1.68803 3003 0.34496 13.72023

x4 1000 3.08106 1.79858 3081 0.11148 13.25409

Pearson Correlation Coefficients, N = 1000

Prob > |r| under H0: Rho=0

x1 x2 x3 x4

x1 1.00000 0.25874 0.19005 0.10052


114/140


>> U = copularnd('Gaussian',.4,10)


115/140

U =

0.8017 0.9388

0.3650 0.2250
0.8104 0.6253

0.3467 0.0988

0.6067 0.6561

0.4743 0.6723

0.6273 0.7427

0.9905 0.8249

0.4427 0.6925

0.3443 0.2711

>> U = copularnd('Gaussian',.4,10000);

>> corr(U)

ans =

1.0000 0.3765
0.3765 1.0000

>> X = norminv(U,0,1);

>> corr(X)

ans =

1.0000 0.3901

0.3901 1.0000

>> Xg = gaminv(U,2,3);

>> corr(Xg)

ans =

1.0000 0.3721

0.3721 1.0000

>>

Matlab (a little more clear)


116/140


117/140



118/140

Old Slides



119/140

program define seedset

local ct =c(current_time)

local s1=substr("`ct'",7,2)



global newseed=real("`s1'" +"`s2'" +"`s3'")

di $newseed

set seed $newseed

end

Grabbing a Seed from the System Clock (Stata)

LCG is default for Stata


120/140

. set obs 100
obs was 0, now 100

. gen x0=ceil(uniform()*100)

. gen m=ceil(uniform()*10)

. gen x1=mod(x0,m)

. list in 1/10

     +-------------+
     | x0   m   x1 |
     |-------------|
  1. | 70   2    0 |
  2. | 62   7    6 |
  3. | 92   7    1 |
  4. | 53   1    0 |
  5. | 37   3    1 |
     |-------------|
  6. | 78   1    0 |
  7. | 47   2    1 |
  8. | 91   2    1 |
  9. | 98   1    0 |
 10. | 71   9    8 |
     +-------------+

Testing Randomness (Stata)


121/140

Correlogram of X_i versus X_{i+k} for serial correlation

. gen tv=_n
. tsset tv
        time variable:  tv, 1 to 20000
                delta:  1 unit

. corrgram x, lags(40)

                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial Autocor]
-------------------------------------------------------------------------------
   1     0.0026   0.0026   .13551  0.7128          |                  |
   2    -0.0011  -0.0011   .15995  0.9231          |                  |
   3    -0.0004  -0.0004   .16301  0.9833          |                  |
   4    -0.0131  -0.0131   3.5987  0.4630          |                  |
   5    -0.0008  -0.0007   3.6115  0.6066          |                  |
   6     0.0119   0.0118   6.4238  0.3774          |                  |
   7    -0.0060  -0.0061   7.1533  0.4131          |                  |
   8     0.0004   0.0003   7.1571  0.5198          |                  |
   9     0.0057   0.0057    7.815  0.5529          |                  |

(Stata)


122/140

. seedset
23491

. set obs 2000
obs was 0, now 2000

. gen P=uniform()

. gen enum=-ln(P)/.04

. ci

    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
           P |       2000    .5030619    .0065298        .4902559    .5158678
        enum |       2000    24.79822     .553316        23.71309    25.88336

. set obs 200
obs was 0, now 200

(Stata)


123/140

. gen P=uniform()

. gen tte=(-ln(P)/0.1)^1.5

. gen fail=1

. replace fail=0 if tte>200

. replace tte=200 if tte>200

. stset tte, fail(fail)

     failure event:  fail != 0 & fail < .
obs. time interval:  (0, tte]
 exit on or before:  failure

-------------------------------------------------------------------
      200  total obs.
        0  exclusions
-------------------------------------------------------------------
      200  obs. remaining, representing
      188  failures in single record/single failure data
 9019.163  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       200

(Stata)


124/140

[Figure: Kaplan-Meier survival estimate; x-axis analysis time (0 to 200), y-axis survival (0.00 to 1.00).]

. streg, d(w) nohr

(Stata)


125/140

failure _d: failanalysis time _t: tte

Weibull regression -- log relative-hazard form

No. of subjects = 200 Number of obs = 200No. of failures = 188Time at risk = 9019.163067

LR chi2(0) = 0.00Log likelihood = -403.39593 Prob > chi2 = .

--------------------------------------------------------------------_t | Coef. SE z P>|z| [95% CI]

-------------+------------------------------------------------------_cons | -2.245 0.167 -13.47 0.000 -2.572 -1.918

-------------+------------------------------------------------------delta | 0.625 0.036 0.558 0.701

--------------------------------------------------------------------

.

. gen P=uniform()

. gen tte=(-ln(P)/(exp(log(0.1)+log(0.5)*drug)))^1.5

(Stata)


126/140

. gen fail=1

. replace fail=0 if tte>200
(39 real changes made)

. replace tte=200 if tte>200
(39 real changes made)

. stset tte, fail(fail)

     failure event:  fail != 0 & fail < .
obs. time interval:  (0, tte]
 exit on or before:  failure

------------------------------------------------------------------------------
        400  total obs.
          0  exclusions
------------------------------------------------------------------------------
        400  obs. remaining, representing
        361  failures in single record/single failure data
   25170.04  total analysis time at risk, at risk from t =         0
                             earliest observed entry t =         0
                                  last observed exit t =       200

.

. list drug P tte fail in 1/15

(Stata)


127/140

     +-----------------------------------+
     | drug          P        tte   fail |
     |-----------------------------------|
  1. |    1    .842721   6.331312      1 |
  2. |    0   .3839878    29.6119      1 |
  3. |    1   .3483792    96.8484      1 |
  4. |    0   .6035132   11.34804      1 |
  5. |    1   .8460417   6.114305      1 |
     |-----------------------------------|
  6. |    1   .4935982   53.06192      1 |
  7. |    1   .5173908   47.84433      1 |
  8. |    1    .385052   83.39208      1 |
  9. |    0   .8726683   1.589515      1 |
 10. |    0   .0356283   192.5611      1 |
     |-----------------------------------|
 11. |    0   .8018837   3.280757      1 |
 12. |    0   .6059877   11.21039      1 |
 13. |    1   .7919235   10.07838      1 |
 14. |    0   .1920578   67.02081      1 |
 15. |    0   .0819428   125.1301      1 |
     +-----------------------------------+

Kaplan-Meier survival estimates

(Stata)


128/140

[Figure: Kaplan-Meier survival estimates by group (drug = 0, drug = 1). x-axis: analysis time (0 to 200); y-axis ticks from 0.00 to 1.00.]

. streg drug, d(w) nohr


(Stata)


129/140

         failure _d:  fail
   analysis time _t:  tte

Weibull regression -- log relative-hazard form

No. of subjects =          400               Number of obs   =       400
No. of failures =          361
Time at risk    =  25170.03819
                                             LR chi2(1)      =     53.70
Log likelihood  =   -757.69677               Prob > chi2     =    0.0000

--------------------------------------------------------------------
          _t |    Coef.      SE        z    P>|z|       [95% CI]
-------------+------------------------------------------------------
        drug |   -0.788   0.107    -7.34   0.000    -0.998   -0.577
       _cons |   -2.458   0.143   -17.19   0.000    -2.738   -2.177
-------------+------------------------------------------------------
       delta |    0.706   0.031                      0.648    0.768
--------------------------------------------------------------------
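A short derivation, added as a reading aid, of why the `gen tte=...` transformation produces data that `streg` can recover. It assumes Stata's log relative-hazard Weibull form S(t) = exp(-exp(xβ) t^p) and that the row labelled delta is the shape parameter p; the slides state neither explicitly, and the symbols λ and p are introduced here only for illustration.

```latex
\begin{align*}
S(t) &= \exp\!\left(-\lambda\, t^{p}\right),
  \qquad \lambda = \exp\!\left(\log 0.1 + \log 0.5 \cdot \mathrm{drug}\right) \\
P = S(t) \;&\Longrightarrow\; t = \left(\frac{-\ln P}{\lambda}\right)^{1/p} \\
1/p = 1.5 \;&\Longrightarrow\; p \approx 0.667 \\
\log 0.1 &\approx -2.303, \qquad \log 0.5 \approx -0.693
\end{align*}
```

Under those assumptions the targets are _cons ≈ -2.303, drug ≈ -0.693, and delta ≈ 0.667. Each falls inside the corresponding 95% CI in the table above (fitted values -2.458, -0.788, and 0.706).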

.


(Stata)


130/140

Generating Multivariate Normal

Random Numbers

In Stata, gennorm (use webseek to download):

Typing

. gennorm a b c, corr(.2 .3 .4)

creates a, b, and c with values drawn from an N(0,S) distribution, where

+- -+

| 1 |

S = | .2 1 |

| .3 .4 1 |

+- -+

That is, corr(a,b)=.2, corr(a,c)=.3, and corr(b,c)=.4

CONTINUED NEXT PAGE


(Stata)


131/140

Generating Multivariate Normal

Random Numbers

In Stata:

Example
-------

. set obs 10000

obs was 0, now 10000

. set seed 6819

. gennorm a b c, corr(.2 .3 .4)

. summarize a b c

Variable | Obs Mean Std. Dev. Min Max

-------------+----------------------------------------------------------------------------

a | 10000 -.0105333 1.005723 -3.694448 3.775433

b | 10000 -.0042212 1.000254 -3.695302 3.648826

c | 10000 -.0069625 .9989002 -3.996779 3.606923

. corr a b c
(obs=10000)

| a b c

-------------+---------------------------------

a | 1.0000

b | 0.2137 1.0000

c | 0.3035 0.3952 1.0000

(Stata)


132/140

Generating Multivariate Normal

Random Numbers

In Stata, drawnorm:

. clear

. matrix C=(1, 0.2, 0.3 \ 0.2, 1, 0.4 \ 0.3, 0.4, 1)

. drawnorm a b c, n(10000) corr(C)

(obs 10000)

. summarize a b c

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

a | 10000 -.0176275 .9920181 -3.701594 3.7838

b | 10000 .0009005 1.003002 -3.709259 3.518793

c | 10000 -.0149926 .9925292 -3.716346 4.009713

. corr a b c

(obs=10000)

| a b c

-------------+---------------------------

a | 1.0000

b | 0.1937 1.0000

c | 0.3051 0.4056 1.0000
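One common way to produce correlated normal draws like those above is to multiply independent standard normals by a Cholesky factor of the target correlation matrix. The slides do not say how `gennorm` or `drawnorm` work internally, so the sketch below only illustrates the general technique in plain Python; all function names are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular L with L * L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def draw_correlated_normals(corr, n, seed=6819):
    """Return n rows of N(0, corr) draws built from independent standard normals."""
    lower = cholesky(corr)
    rng = random.Random(seed)
    dim = len(corr)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append([sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)])
    return rows


def sample_corr(rows, i, j):
    """Pearson correlation between columns i and j."""
    n = len(rows)
    mean_i = sum(r[i] for r in rows) / n
    mean_j = sum(r[j] for r in rows) / n
    s_ij = sum((r[i] - mean_i) * (r[j] - mean_j) for r in rows)
    s_ii = sum((r[i] - mean_i) ** 2 for r in rows)
    s_jj = sum((r[j] - mean_j) ** 2 for r in rows)
    return s_ij / math.sqrt(s_ii * s_jj)


C = [[1.0, 0.2, 0.3], [0.2, 1.0, 0.4], [0.3, 0.4, 1.0]]
rows = draw_correlated_normals(C, n=10000)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(f"corr({'abc'[i]},{'abc'[j]}) = {sample_corr(rows, i, j):.4f}")
```

The printed correlations should sit close to 0.2, 0.3, and 0.4, as in the Stata output.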




133/140

Time-Dependency in Drug Effect

Survival Time:

\[ S(t) = P = \exp\left(-\exp(x\beta)\, t^{0.67}\right) \]

\[ x\beta = -2.30 + 0.3\,\mathrm{drug} - 0.004\,\mathrm{drug}\cdot t \]

Inverse Prob Transform: ?????????

How do you solve for t? (Not all answers are in the book.)

Remember Newton's Method?


134/140

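\[ f(t) = \exp\left(-\exp(-2.30 + 0.3\,\mathrm{drug} - 0.004\,\mathrm{drug}\cdot t)\, t^{0.67}\right) - P, \qquad f(t) = 0 \]

\[ f'(t) \approx \frac{f(t + d) - f(t)}{d} \]

\[ t_1 = t_0 - \frac{f(t_0)}{f'(t_0)} \]

[Figure: one Newton step from t0 to t1]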

clear
set obs 400
gen drug=_n>200

(Stata)


135/140

* Target survival probability and starting guess for t
gen double P=uniform()
gen double t=1
gen double tpd=t+.0001

* f is S(t)-P; fp is the same function evaluated at t+0.0001
gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
gen double slope=(fp-f)/0.0001

* Newton iterations using the finite-difference slope
forvalues i=1/50 {
    qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
    qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
    qui replace slope=(fp-f)/0.0001
    qui replace t=t-f/slope
    qui replace tpd=t+.0001
}
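For readers who want to trace the iteration for a single subject, here is a minimal Python sketch of the same Newton scheme with a forward-difference slope. The function names are hypothetical. The clamp that keeps t positive is an added safeguard that is not in the Stata code; it is there because Python returns a complex number for a negative base raised to 0.67.

```python
import math


def survival(t, drug):
    """S(t) = exp(-exp(-2.30 + 0.3*drug - 0.004*drug*t) * t**0.67), as in the Stata code."""
    return math.exp(-math.exp(-2.30 + 0.3 * drug - 0.004 * drug * t) * t ** 0.67)


def solve_time(p, drug, t_start=1.0, step=1e-4, iterations=50):
    """Solve S(t) = p for t by Newton's method with a forward-difference slope."""
    t = t_start
    for _ in range(iterations):
        f = survival(t, drug) - p
        f_plus = survival(t + step, drug) - p
        slope = (f_plus - f) / step
        if slope == 0.0:
            break  # flat spot: no usable Newton step
        # Safeguard (not in the original): keep t positive so t**0.67 stays real.
        t = max(t - f / slope, 1e-8)
    return t


for drug in (0, 1):
    t_hat = solve_time(0.5, drug)
    print(f"drug={drug}: t={t_hat:.4f}, S(t)={survival(t_hat, drug):.6f}")
```

For drug = 0 the model has no time-dependent term, so the result can be checked against the closed form t = (-ln P / exp(-2.30))^(1/0.67).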

Matlab


136/140

>> drug=[zeros(1000,1);ones(1000,1)];

>> P=rand(2000,1);

>> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));

>> outcome=P<cdf0;

>> b = glmfit(drug,outcome,'binomial')

b =

-1.0616

0.6562


Kaplan-Meier survival estimates

(Stata)


137/140

[Figure: Kaplan-Meier survival estimates by group (drug = 0, drug = 1). x-axis: analysis time (0 to 200); y-axis ticks from 0.00 to 1.00.]

. gen P=uniform()

. gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))

(Stata)


138/140

. list in 1/10

     +----------------------------+
     | drug          P       cdf0 |
     |----------------------------|
  1. |    0   .2865897   .2500023 |
  2. |    0   .3788754   .2500023 |
  3. |    1   .3597057   .3999916 |
  4. |    1   .7182508   .3999916 |
  5. |    1   .4315197   .3999916 |
     |----------------------------|
  6. |    1   .2963237   .3999916 |
  7. |    1   .7961193   .3999916 |
  8. |    0    .056983   .2500023 |
  9. |    0   .4622037   .2500023 |
 10. |    0   .5336403   .2500023 |
     +----------------------------+

. gen outcome=P<cdf0


139/140

. list in 1/10

     +--------------------------------------+
     | drug          P       cdf0   outcome |
     |--------------------------------------|
  1. |    0   .2865897   .2500023         0 |
  2. |    0   .3788754   .2500023         0 |
  3. |    1   .3597057   .3999916         1 |
  4. |    1   .7182508   .3999916         0 |
  5. |    1   .4315197   .3999916         0 |
     |--------------------------------------|
  6. |    1   .2963237   .3999916         1 |
  7. |    1   .7961193   .3999916         0 |
  8. |    0    .056983   .2500023         1 |
  9. |    0   .4622037   .2500023         0 |
 10. |    0   .5336403   .2500023         0 |
     +--------------------------------------+

140/140

. gen outcome=P<cdf0

[...]

                                             Prob > chi2     =    0.0000
Log likelihood = -1245.3138                  Pseudo R2       =    0.0230

--------------------------------------------------------------------
     outcome |       OR      SE        z    P>|z|       [95% CI]
-------------+------------------------------------------------------
        drug |    2.084   0.202     7.57   0.000     1.723    2.519
--------------------------------------------------------------------
       _cons |   -1.077   0.073   -14.83   0.000    -1.220   -0.935
--------------------------------------------------------------------

.
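The binary-outcome simulation above sets `outcome` to 1 whenever the uniform draw falls below the logistic probability for that subject's group. The Python sketch below repeats that step and recovers the odds ratio from the 2x2 table; with a single binary covariate this sample odds ratio equals the logistic regression estimate. Function names and the seed are hypothetical.

```python
import math
import random


def simulate_binary(n_per_group, intercept=-1.0986, log_or=0.6931, seed=1):
    """Return (drug, outcome) pairs, with outcome = 1 when a uniform draw is below the logistic probability."""
    rng = random.Random(seed)
    data = []
    for drug in (0, 1):
        eta = intercept + log_or * drug
        prob = math.exp(eta) / (1.0 + math.exp(eta))   # about 0.25 and 0.40
        for _ in range(n_per_group):
            outcome = 1 if rng.random() < prob else 0
            data.append((drug, outcome))
    return data


def odds_ratio(data):
    """Cross-product odds ratio from the 2x2 table of drug by outcome."""
    counts = {(d, y): 0 for d in (0, 1) for y in (0, 1)}
    for d, y in data:
        counts[(d, y)] += 1
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


data = simulate_binary(n_per_group=1000)
print(f"sample odds ratio = {odds_ratio(data):.3f} (target = {math.exp(0.6931):.3f})")
```

The target odds ratio is exp(0.6931), about 2, which is what the estimate of 2.084 in the Stata output is approximating.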
