7/28/2019 14_RandomNumber
1/140
BSTA 670 Statistical Computing
27 October 2010
Lecture 14:
Random Number Generation
Presented by: Paul Wileyto, Ph.D.
-
Anyone who uses software to produce random
numbers is in a state of sin.
John von Neumann
A good analog random number generator.
-
Why do we need Random Numbers?
Simulation Input
Statistical Sampling
Assignment in Trials
Games
-
Where do you get random numbers?
Uniform Random Numbers
  Published Tables
  Make Them
    Computer Algorithms
    Harvest from Nature
Random draws from a specific distribution
  Make them from Uniform Random Numbers
-
Software Random Number Generators
There are no true random number generators, but
there are Pseudo-Random Number Generators
Computers have only a limited number of bits to represent a number
Sooner or later, the sequence of random numbers will repeat itself (period of the generator)
The trick is to be good enough to look like random numbers
-
Algorithms for Uniform Random Numbers
-
Good pseudo-random numbers:
Independent of the previous number
Long period
Sequence reproducible if started with
same initial conditions
Fast
-
Good pseudo-random numbers:
Equal probability for any number inside interval [a,b]
Probability Density:
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & x < a \text{ or } x > b \end{cases}$$
-
We are interested primarily in uniform random numbers in the interval [0,1].
We'll refer to the realization of a uniform random number over [0,1] as U.
Many of the algorithms produce integer-valued random numbers over the interval [0,b]. Transform to the interval [0,1] (for example, by dividing by b).
-
Linear Congruential Generator (LCG)
Most common
$$X_n = (a X_{n-1} + c) \bmod m$$
$X_0$ = seed, modulus $m$ (large prime), multiplier $a$, and increment $c$
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.
Mod in SAS
proc iml; /* begin IML session */
q={20,30,40,50,70,90,160};
t=mod(q,7);
qt=q||t;
print qt; /* print matrix */
quit;
qt
20 6
30 2
40 5
50 1
70 0
90 6
160 6
SAS
-
Linear Congruential Generator (LCG)
Most common
$$X_n = (a X_{n-1} + c) \bmod m$$
$X_0$ = seed, modulus $m$ (large prime), multiplier $a$, and increment $c$
Repeats due to the modular arithmetic that forces wrapping of values into the desired range.
Mod in R
q
-
proc iml; /* begin IML session */
seed = 123456;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
Unit Random Variates in SAS
SAS
-
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
set.seed(as.integer(format(Sys.time(), "%S%M%H")))
c
-
proc iml; /* begin IML session */
seed = 0;
c = j(5,1,seed);
b = uniform(c);
print b;
quit;
b
0.73902
0.2724794
0.7095326
0.3191636
0.367853
Unit Random Variates in SAS
Set seed to 0 to
grab a seed value
from the system
clock.
SAS
-
RANUNI() and IML UNIFORM() use a multiplicative
linear congruential generator (from SAS docs) where
SEED = mod( SEED * 397204094, 2**31-1 )
and then returns
SEED / (2**31-1)
SAS
-
Testing Randomness
Is it Uniform?
[Figure: two histograms of generated values on [0, 1]; the first has a count axis from 0 to 300, the second from 0 to 3 x 10^4]
-
Testing Randomness
Generate two sets and plot against each other
Might see correlation in higher dimensions
Plot Xi versus Xi+k for serial correlation
[Figure: two scatter plots on the unit square, x versus y and x1 versus y2]
-
Linear Congruential Generator
The good
Fast
Up to period of m random numbers
The Bad
Sequential correlation
Plots in more than 1 dimension do not fill in the
space uniformly, but tend to form bands
Not cryptographically secure
Selections of m, a, and c are important
-
Linear Congruential Generator
Good magic numbers for the linear congruential method:
a = 16,807, c = 0, M = 2,147,483,647 (that is, 2^31 - 1)
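The recurrence with these magic numbers can be sketched in a few lines of Python. This is only an illustration: the function names `lcg` and `lcg_uniform` are made up for this example, and dividing by m to reach the unit interval follows the SAS description given earlier in these slides.

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Yield integers X_n = (a * X_{n-1} + c) mod m, starting from the seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def lcg_uniform(seed, n, a=16807, c=0, m=2**31 - 1):
    """Return n values in (0, 1) by dividing each integer by the modulus."""
    gen = lcg(seed, a, c, m)
    return [next(gen) / m for _ in range(n)]


ints = lcg(seed=1)
first_three = [next(ints) for _ in range(3)]
print(first_three)               # integer states
print(lcg_uniform(seed=1, n=3))  # the same states mapped to (0, 1)
```

Starting from the same seed always reproduces the same sequence, which is the reproducibility property listed under "Good pseudo-random numbers".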
-
Overflow Method for integers
Multiply two 32-bit numbers to get a 64-bit integer that cannot be represented in 32-bit space.
Low-order 32 bits remain after the overflow.
Divide by 2^32 to get floating point values between 0 and 1.
Very Fast
$$I_{j+1} = a I_j + c$$
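A rough Python sketch of the overflow idea follows. Python integers never overflow, so a 32-bit mask stands in for the hardware wraparound. The multiplier and increment used here are illustrative assumptions (the slide does not give values), and the names `overflow_step` and `overflow_uniforms` are invented for the example.

```python
MASK32 = 0xFFFFFFFF  # bit mask that keeps only the low-order 32 bits


def overflow_step(state, a=1664525, c=1013904223):
    """One step of I_{j+1} = a * I_j + c, keeping only the low 32 bits.

    The constants a and c are assumed for illustration, not from the slides.
    """
    return (a * state + c) & MASK32


def overflow_uniforms(seed, n):
    """Return n floats in [0, 1) obtained by dividing each state by 2**32."""
    state = seed & MASK32
    values = []
    for _ in range(n):
        state = overflow_step(state)
        values.append(state / 2**32)
    return values


print(overflow_uniforms(seed=2010, n=3))
```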
-
Blum, Blum, Shub
Very slow
Not suited to simulation
Passes all tests
Cryptographically secure
$$X_{n+1} = X_n^2 \bmod M, \quad M = pq, \text{ where } p \text{ and } q \text{ are large primes}$$
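The squaring recurrence is easy to trace with toy numbers. The sketch below uses p = 11 and q = 23, which are far too small for any real use and are chosen only so the arithmetic can be checked by hand; the function name is made up for the example.

```python
def blum_blum_shub(seed, p=11, q=23, n=6):
    """Return n successive states of X_{n+1} = X_n**2 mod M with M = p * q.

    p and q are toy primes for illustration; real use needs large primes.
    """
    m = p * q
    x = seed
    states = []
    for _ in range(n):
        x = (x * x) % m
        states.append(x)
    return states


print(blum_blum_shub(seed=3))
```

In practice only a few low-order bits of each state are used as output, which is part of why the generator is slow.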
-
Mersenne Twister
By Matsumoto and Nishimura (1997)
Caused a great deal of excitement in 1997.
Good statistical properties
Not good for cryptography
SAS IML RANDGEN function
Default technique for R runif()
-
Mersenne Twister
I'm just going to give you the flavor of it
It's a bit-shifting algorithm
32 bit word:
0000 1111
-
Mersenne Twister
XOR
Logical bitwise comparison function
Compares two bits
If they are different, value is 1
If they are the same, value is zero
>> a=[0 0 1 1]
a =
0 0 1 1
>> b=[0 1 1 0]
b =
0 1 1 0
>> c=xor(a,b)
c =
0 1 0 1
>>
MATLAB
-
Mersenne Twister
XOR
Logical bitwise comparison function
Compares two bits
If they are different, value is 1
If they are the same, value is zero
> a <- c(0, 0, 1, 1)
> b <- c(0, 1, 1, 0)
> c <- xor(a, b)
> c
[1] FALSE TRUE FALSE TRUE
> as.integer(c)
[1] 0 1 0 1
R
-
Mersenne Twister
Bit shifting algorithm
Use XOR function to flip values
32 bit word:
0001 1111
XOR
-
Mersenne Twister
Use 624 32-bit words to make one 19937-bit word (623*32 + 1)
XOR flip function in each 32-bit word
32 bit word:
0000 1111
XOR
To next word
From last word
-
Mersenne Twister
From: John Savard's Cryptology Page
http://www.quadibloc.com
-
Mersenne Twister
By Matsumoto and Nishimura (1997)
Mersenne Prime Numbers (powers of 2, minus 1) give the period length: 2^19937 - 1 for 32-bit numbers
Free C source code
Fast
Passes all randomness smell tests
Not cryptographically secure
-
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r
0.0151013
0.5743561
0.5829185
0.6437729
0.1823678
0.3977417
0.476881
0.9845982
0.3211301
0.9623223
SAS
-
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
> r=matrix(runif(10), 10,1)
> r
[,1]
[1,] 0.14645262
[2,] 0.04558767
[3,] 0.79254901
[4,] 0.57810786
[5,] 0.57831079
[6,] 0.30258424
[7,] 0.08682622
[8,] 0.77980499
[9,] 0.34161593
[10,] 0.98705945
R
-
Both R and SAS automatically grab a seed value from the system clock at first use, unless you call set.seed (in R) or randseed (in SAS) to set a specific starting point
Grabbing a Seed from the System Clock (SAS)
SAS
-
proc iml; /* begin IML session */
call randseed(12345);
r = j(10,1,.);
call randgen(r,'uniform');
print r;
quit;
r
0.5832971
0.9936254
0.5878877
0.8574689
0.8246889
0.2805668
0.6473969
0.3819192
0.4489572
0.8757847
SAS
-
> set.seed(12345)
> r=matrix(runif(10), 10,1)
> r
[,1]
[1,] 0.7209039
[2,] 0.8757732
[3,] 0.7609823
[4,] 0.8861246
[5,] 0.4564810
[6,] 0.1663718
[7,] 0.3250954
[8,] 0.5092243
[9,] 0.7277053
[10,] 0.9897369
R
-
Obtaining Random Numbers from Specific Distributions
Inverse Probability Transform Methods
Rejection Methods
Mixed Rejection and Transform
Methods for Correlated Random Numbers
-
Obtaining Random Numbers from Specific Distributions
Inverse Probability Transform methods
Let X be a random variable described by CDF F(X)
We wish to generate values of X distributed according to F(X).
Given a continuous Uniform Random Variable U in [0,1], the Random Variable X = F^{-1}(U) has CDF F.
$$F^{-1}(u) = \inf\{x \mid F(x) = u\}, \quad 0 < u < 1$$
> r=matrix(runif(10000), 10000,1)
> exrand=-log(1-r)/.04
> hist(exrand, freq = FALSE)
> mean(exrand)
[1] 24.55222
> 1/mean(exrand)
[1] 0.04072951
> hist(exrand, freq = FALSE)
> help.search("means")
R
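The same exponential draw can be sketched in Python. For an exponential with rate 0.04 the CDF is F(x) = 1 - exp(-0.04 x), so F^{-1}(u) = -ln(1 - u)/0.04, which is the expression used in the R code above. The function name and the seed are arbitrary choices for this illustration.

```python
import math
import random


def exponential_inverse_transform(rate, n, rng):
    """Draw n exponential variates by applying F^{-1}(u) = -ln(1 - u) / rate."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


rng = random.Random(12345)
draws = exponential_inverse_transform(rate=0.04, n=10000, rng=rng)
sample_mean = sum(draws) / len(draws)
print(sample_mean, 1.0 / sample_mean)  # mean should be near 1/0.04 = 25
```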
-
[Figure: Histogram of exrand; density (0.000 to 0.020) versus exrand (0 to 250)]
R
-
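$$\text{Survival Time: } S(t) = U = \exp(-\lambda t^{\gamma})$$
$$\text{Inverse Prob Transform: } t = \left(\frac{-\ln U}{\lambda}\right)^{1/\gamma}$$
$$\gamma = 1.5, \quad \lambda = 0.001$$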
Weibull Survival
SAS
-
proc iml; /* begin IML session */
u = j(2000,1,.);
call randgen(u,'uniform');
wrand=(-log(1-u)/.001)##(1/1.5);
tbl=u||wrand;
print tbl;
varnames={"u","weibrand"};
create wrand from tbl [colname=varnames];
append from tbl;
Quit;
proc means data=wrand;
var u weibrand;
run;
title 'Analysis of Weibull RVs';
proc univariate data=wrand noprint;
histogram weibrand / midpoints=5 to 205 by 10 weibull;
run;
SAS
-
SAS
-
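$$\text{Survival Time: } S(t) = U = \exp(-\lambda t^{\gamma})$$
$$\text{Inverse Prob Transform: } t = \left(\frac{-\ln U}{\lambda}\right)^{1/\gamma}$$
$$\gamma = 1.5, \quad \lambda = 0.001$$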
Weibull Survival
R
-
> r=matrix(runif(10000), 10000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> hist(wrand, freq = FALSE, main = paste("Histogram of
Survival Times"), breaks=50, xlab = "Survival Time")
R
-
R
-
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
time WEIBRAND * CENS (0);
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
if survival > 0 then _lsurv = -log(survival);
if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
label weibrand = 'Survival Time';
SAS
-
SAS
-
> r=matrix(runif(1000), 1000,1)
> wrand=(-log(1-r)/.001)^(1/1.5)
> event=wrand<200
> wrand2=wrand*(event)+200*(1-event)
> fit plot(fit, lty = 2:3,xlab = "Days", ylab="Survival")>
R
-
R
-
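$$\text{Survival Time: } S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta})$$
$$\text{Inverse Prob Transform: } t = \left(\frac{-\ln P}{\lambda}\right)^{1/\gamma}$$
$$\gamma = 1.5, \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta})$$
$$\beta_0 = \ln(0.001) = -6.91$$
$$HR(\text{Drug}) = 0.5, \quad \beta_1 = \ln(0.5) = -0.69$$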
Simulating Weibull Regression Data, with
Proportional Hazards
SAS
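As a language-neutral illustration of the proportional hazards recipe, here is a small Python sketch. It uses shape 1.5, baseline rate 0.001, and log hazard ratio -0.69 as in the code on the following slides; the function name, sample size, and seed are invented for the example. It uses 1 - U in place of U inside the log, as the SAS and R code do, which gives the same distribution.

```python
import math
import random


def weibull_ph_time(u, drug, beta0=math.log(0.001), beta1=-0.69, gamma=1.5):
    """Invert S(t) = exp(-lambda * t**gamma) with lambda = exp(beta0 + beta1 * drug)."""
    lam = math.exp(beta0 + beta1 * drug)
    return (-math.log(1.0 - u) / lam) ** (1.0 / gamma)


rng = random.Random(2010)
placebo = [weibull_ph_time(rng.random(), drug=0) for _ in range(200)]
treated = [weibull_ph_time(rng.random(), drug=1) for _ in range(200)]
print(sum(placebo) / len(placebo), sum(treated) / len(treated))
```

Because the hazard ratio is below 1, the treated arm should show longer survival times on average.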
-
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(200,1,0) // j(200,1,1);
call randgen(u,'uniform');
wrand=(-log(1-u)/exp(log(.001)-0.69*d))##(1/1.5);
c = wrand
-
options pageno=1;
proc lifetest data=Work.Wrand method=pl OUTSURV=work._surv;
time WEIBRAND * CENS (0); strata TREAT;
run; quit;
goptions reset=all device=WIN;
data work._surv; set work._surv;
if survival > 0 then _lsurv = -log(survival);
if _lsurv > 0 then _llsurv = log(_lsurv);
run;
** Survival plots **;
title;
footnote;
goptions reset=symbol;
goptions ftext=SWISS ctext=BLACK htext=1 cells;
proc gplot data=work._surv ;
label weibrand = 'Survival Time';
axis2 minor=none major=(number=6)
label=(angle=90 'Survival Distribution Function');
symbol1 i=stepj l=1 width=1;
symbol2 i=stepj l=2 width=1;
symbol3 i=stepj l=3 width=1;
SAS
-
SAS
-
Testing Global Null Hypothesis: BETA=0
Test Chi-Square DF Pr > ChiSq
Likelihood Ratio 202.5356 1
-
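$$\text{Survival Time: } S(t) = P = \exp(-\lambda t^{\gamma}), \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta})$$
$$\text{Inverse Prob Transform: } t = \left(\frac{-\ln P}{\lambda}\right)^{1/\gamma}$$
$$\gamma = 1.5, \quad \lambda = \exp(\mathbf{x}\boldsymbol{\beta})$$
$$\beta_0 = \ln(0.001) = -6.91$$
$$HR(\text{Drug}) = 0.5, \quad \beta_1 = \ln(0.5) = -0.69$$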
Simulating Weibull Regression Data, with
Proportional Hazards
R
-
r=matrix(runif(400), 400,1)
drug=rbind(matrix(1,200,1),matrix(0,200,1))
wrand=(-log(1-r)/exp(log(.001)-0.69*drug))^(1/1.5)
event = wrand
-
R
Package eha
-
Call:
phreg(formula = Surv(enter, wrand, event) ~ drug)

Covariate     W.mean    Coef  Exp(Coef)  se(Coef)  Wald p
drug           0.586  -0.731      0.481     0.113   0.000
log(scale)             4.663    105.903     0.050   0.000
log(shape)             0.402      1.495     0.047   0.000

Events                 327
Total time at risk     44359
Max. log. likelihood   -1886.9
LR test statistic      42.4
Degrees of freedom     1
Overall p-value        7.38224e-11
> enter=matrix(0,400,1)
> fit <- phreg(Surv(enter, wrand, event) ~ drug)
> fit
> plot(fit, fn = "sur")
R
Package eha
-
R
-
Generating Numbers from Specific Distributions
Rejection Method
Fast
Good for Count Models
Good when you cannot find F^{-1}, but have f(x)
Generally Use Pairs of Random Numbers
Just like playing the game Battleship
-
The Rejection Method is Like Playing the Game Battleship
-
Rejection
Choose pairs of uniform random numbers
xU between Xmin and Xmax
yU between Ymin and Ymax
Reject xU if yU > f(x) at xU
-
Rejection
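[Figure: the curve f(x) inside the rectangle from Xmin to Xmax and Ymin to Ymax; a point under the curve is a Hit, points above it are Misses]
Sample the area (two dimensions) containing the probability distribution or density function uniformly.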
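The hit-or-miss idea can be sketched in Python as follows. The target density f(x) = 6x(1 - x) on [0, 1] with Ymax = 1.5 is an arbitrary choice for illustration, Ymin is taken as 0, and the function names are made up for the example.

```python
import random


def rejection_sample(f, xmin, xmax, ymax, n, rng):
    """Collect n accepted x values; also return the number of pairs tried."""
    accepted = []
    tries = 0
    while len(accepted) < n:
        tries += 1
        x_u = rng.uniform(xmin, xmax)  # horizontal coordinate of the shot
        y_u = rng.uniform(0.0, ymax)   # vertical coordinate of the shot
        if y_u <= f(x_u):              # a hit: the point is under the curve
            accepted.append(x_u)
    return accepted, tries


def target_density(x):
    """Illustrative target: f(x) = 6x(1 - x) on [0, 1], with maximum 1.5."""
    return 6.0 * x * (1.0 - x)


rng = random.Random(14)
hits, tries = rejection_sample(target_density, 0.0, 1.0, 1.5, 5000, rng)
print(sum(hits) / len(hits), len(hits) / tries)  # mean and acceptance rate
```

The acceptance rate is the area under the curve divided by the area of the rectangle, which is why a large dead zone makes the simple version wasteful.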
-
Rejection
Simple version becomes inefficient if
the rejection area is large.
[Figure: target g(x) inside the sampling rectangle, leaving a Large Dead Zone]
-
[Figure: Binomial Count, p = 0.2, Trials = 20; frequency (0 to 0.25) versus count (-5 to 25); 239 In, 761 Rejected]
Matlab
-
Rejection
Can be made more efficient by uniform sampling over a smaller target area.
[Figure: g(x) with a Smaller Dead Zone]
The trick is to sample uniformly over the smaller area.
-
Rejection
Can be made more efficient by uniform sampling over a smaller target area.
First, define a "dominating function" f(x), and the corresponding integral or Cumulative Distribution F(x). F(x) need not be normalized.
[Figure: g(x) under the dominating function f(x), with a Smaller Dead Zone]
-
Rejection
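Can be made more efficient by uniform sampling over a smaller target area.
[Figure: g(x) under the dominating function f(x), with a Smaller Dead Zone]
$$f(x) = a - bx, \quad 0 \le x \le \frac{a}{b}$$
$$F(x) = ax - \frac{b}{2}x^2$$
$$F_{Max} = \frac{a^2}{2b} \quad \text{at } x = \frac{a}{b}$$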
-
Rejection
Choose xU based on inverse transform of the integrated dominating function (F(x)).
Choose a uniform random number U1 in the range: $0 \le U_1 \le \frac{a^2}{2b}$
Calculate x by setting F(x) = U1, and solving (the quadratic) for x.
[Figure: g(x) under f(x), with xU marked on the x axis]
-
Rejection
Evaluate f(x), choose a second uniform random number U2 between 0 and f(x).
Reject if U2 > g(x)
[Figure: g(x) under f(x), with xU marked on the x axis]
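The two-step recipe (draw x under the dominating line, then accept or reject against the target) can be sketched in Python. The target g(x) = (1 - x)^2 and the line f(x) = 1 - x (so a = 1, b = 1) are assumptions chosen only because g stays below f on [0, 1]; the function name is invented. Solving F(x) = a x - (b/2) x^2 = U1 for the root inside [0, a/b] gives x = (a - sqrt(a^2 - 2 b U1)) / b.

```python
import math
import random


def sample_with_linear_envelope(g, a, b, n, rng):
    """Rejection sampling under the dominating function f(x) = a - b*x."""
    f_max = a * a / (2.0 * b)  # total area under f on [0, a/b]
    accepted = []
    tries = 0
    while len(accepted) < n:
        tries += 1
        u1 = rng.uniform(0.0, f_max)
        # Invert F(x) = a*x - (b/2)*x**2 = u1, taking the root in [0, a/b].
        disc = max(0.0, a * a - 2.0 * b * u1)
        x = (a - math.sqrt(disc)) / b
        u2 = rng.uniform(0.0, a - b * x)  # uniform height under f at x
        if u2 <= g(x):                    # keep x only if the point is under g
            accepted.append(x)
    return accepted, tries


def target(x):
    """Illustrative unnormalized target, bounded by 1 - x on [0, 1]."""
    return (1.0 - x) ** 2


rng = random.Random(76)
xs, tries = sample_with_linear_envelope(target, a=1.0, b=1.0, n=5000, rng=rng)
print(sum(xs) / len(xs), len(xs) / tries)
```

With this toy target the acceptance rate is the ratio of the two areas, (1/3) / (1/2) = 2/3.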
-
[Figure: two panels of frequency (0 to 0.25) versus count (-5 to 25); 315 In, 685 Rejected]
-
Weibull Function?
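$$f(x) = \frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1} \exp\!\left(-\left(\frac{x}{\beta}\right)^{\gamma}\right) \text{ times } 2$$
$$F(x) = 1 - \exp\!\left(-\left(\frac{x}{\beta}\right)^{\gamma}\right)$$
$$F^{-1}(u) = x = \beta\left(-\ln(1-u)\right)^{1/\gamma}, \quad \beta = 6.5, \; \gamma = 1.8$$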
-
[Figure: animation frames of frequency (0 to 0.25) versus count (-5 to 25) as points are sampled; final frame: 574 In, 426 Rejected]
Binomial Distribution
-
(Bernoulli Trials are the simplest example of the rejection method.)
Probability Pr(X=1): p
-
proc iml; /* begin IML session */
r = j(10,1,.);
call randgen(r,'uniform');
b=r>0.5;
print b;
quit;
b
1
0
1
1
0
0
0
0
0
1
SAS
-
> r=matrix(runif(10), 10,1)
> b=r<0.5
> cbind(r,b)
[,1] [,2]
[1,] 0.4919652 1
[2,] 0.5088624 0
[3,] 0.5955355 0
[4,] 0.5243394 0
[5,] 0.5923056 0
[6,] 0.1610980 1
[7,] 0.9663659 0
[8,] 0.2548106 1
[9,] 0.4582953 1
[10,] 0.1170421 1
>
R
-
Simulating Outcomes from a
Logistic Model
But then, you never have just one value of p for your Bernoulli Trials
Placebo Controlled Drug Trial
25% Success for Placebo
Odds Ratio of 2.0 for Treatment
Two different success probabilities,
based on logistic model
-
Logistic Model
$$\text{Placebo: } 0.25 = \frac{\exp(\beta_0)}{1+\exp(\beta_0)}, \quad \beta_0 = -1.0986$$
Drug (0,1): OR = 2.0, ln(OR) = 0.6931
$$\text{CDF: } \Pr(\text{Success}) = \frac{\exp(-1.0986 + 0.6931 \cdot \text{Drug})}{1 + \exp(-1.0986 + 0.6931 \cdot \text{Drug})}$$
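A minimal Python sketch of this simulation: compute each arm's success probability from the logistic model, then turn a uniform draw into a Bernoulli outcome by comparing it with that probability. The function names, arm size, and seed are arbitrary choices for the example.

```python
import math
import random


def expit(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_trial(n_per_arm, rng, beta0=-1.0986, beta1=0.6931):
    """Return (drug, outcome) pairs for a two-arm placebo-controlled trial."""
    rows = []
    for drug in (0, 1):
        p = expit(beta0 + beta1 * drug)
        for _ in range(n_per_arm):
            outcome = int(rng.random() < p)  # success when the uniform falls below p
            rows.append((drug, outcome))
    return rows


rng = random.Random(84)
trial = simulate_trial(n_per_arm=20000, rng=rng)
for arm in (0, 1):
    outcomes = [y for d, y in trial if d == arm]
    print(arm, sum(outcomes) / len(outcomes))
```

An odds ratio of 2.0 on top of 25% placebo success corresponds to a 40% success probability in the drug arm.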
-
proc iml; /* begin IML session */
u = j(400,1,.);
d = j(400,1,1)||(j(200,1,0) // j(200,1,1));
bta= {-1.0986 , 0.6931};
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
outcome=u
-
SAS
The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates
Standard Wald
Parameter DF Estimate Error Chi-Square Pr > ChiSq
Intercept 1 -1.2367 0.1693 53.3383
-
r=matrix(runif(400), 400,1)
drug=rbind(matrix(0,200,1),matrix(1,200,1))
d=cbind(matrix(1,400,1), drug)
parms=matrix(c(-1.0986 , 0.6931),2,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
outcome=r<expit
-
R
> drugtrial <- glm(outcome ~ drug, family = binomial(link = "logit"))
> summary(drugtrial)
Call:
glm(formula = outcome ~ drug, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.036 -1.036 -0.776 1.326 1.641
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.0460 0.1612 -6.488 8.68e-11 ***
drug 0.7026 0.2158 3.255 0.00113 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 511.49 on 399 degrees of freedom
Residual deviance: 500.67 on 398 degrees of freedom
AIC: 504.67
Number of Fisher Scoring iterations: 4
-
Normally Distributed Random Numbers
Inverse transform methods inefficient for normal random numbers
Box-Muller Transform
z transformation of two random uniform variates [X1, X2 ~ U(0,1)]
Random radius, random angle
Get two z variates from two uniform random numbers, X1 and X2:
$$z_1 = r\cos(\theta) = \sqrt{-2\ln(X_1)}\,\cos(2\pi X_2)$$
$$z_2 = r\sin(\theta) = \sqrt{-2\ln(X_1)}\,\sin(2\pi X_2)$$
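The transform translates almost directly into Python. This sketch is an illustration only; the function name and seed are arbitrary, and X1 is drawn from (0, 1] so that the logarithm is always finite.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal variates from two uniforms."""
    x1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    x2 = rng.random()
    r = math.sqrt(-2.0 * math.log(x1))  # random radius
    theta = 2.0 * math.pi * x2          # random angle
    return r * math.cos(theta), r * math.sin(theta)


rng = random.Random(89)
z = [v for _ in range(20000) for v in box_muller(rng)]
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(mean_z, var_z)  # should be near 0 and 1
```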
-
Normally Distributed Random Numbers
Box-Muller Transform
X1, X2 specify a position within the unit circle
Random angle, random radius
Would be more efficient if it did not make calls to trigonometric functions.
Marsaglia Method
Places the Unit Circle within a square, -1 to +1, and samples the square uniformly.
Rejects draws that fall outside the circle.
But it avoids calls to trig functions.
$$s = X_1^2 + X_2^2$$
$$z_1 = X_1\sqrt{\frac{-2\ln(s)}{s}}, \quad z_2 = X_2\sqrt{\frac{-2\ln(s)}{s}}$$
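A short Python sketch of the Marsaglia (polar) method, for illustration: points are drawn uniformly in the square from -1 to +1, and any point outside the unit circle (or exactly at the origin) is thrown away before the formulas are applied. The function name and seed are arbitrary.

```python
import math
import random


def marsaglia_polar(rng):
    """Return two independent standard normal variates without trig calls."""
    while True:
        x1 = rng.uniform(-1.0, 1.0)
        x2 = rng.uniform(-1.0, 1.0)
        s = x1 * x1 + x2 * x2
        if 0.0 < s < 1.0:  # keep only points strictly inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return x1 * factor, x2 * factor


rng = random.Random(90)
z = [v for _ in range(20000) for v in marsaglia_polar(rng)]
mean_z = sum(z) / len(z)
var_z = sum((v - mean_z) ** 2 for v in z) / (len(z) - 1)
print(mean_z, var_z)  # should be near 0 and 1
```

About pi/4 of the candidate points (roughly 79%) fall inside the circle, so the rejection step costs little compared with the trigonometric calls it replaces.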
-
Generating Numbers from Specific Distributions
Normal, Using CLT (quick & dirty)
Sum several iterations of u
Standardize
Recall that Var(u) = 1/12
$$X = \sum_{i=1}^{12} u_i - 6$$
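The quick and dirty version is a one-liner in Python. Twelve uniforms have total variance 12 x 1/12 = 1 and total mean 6, so subtracting 6 standardizes the sum. The function name and seed are arbitrary choices for this sketch.

```python
import random


def clt_normal(rng):
    """Approximate a standard normal draw as the sum of 12 uniforms minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0


rng = random.Random(91)
x = [clt_normal(rng) for _ in range(20000)]
mean_x = sum(x) / len(x)
var_x = sum((v - mean_x) ** 2 for v in x) / (len(x) - 1)
print(mean_x, var_x)  # should be near 0 and 1
```

The result can never fall outside [-6, 6], so the tails are truncated compared with a true normal.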
-
Correlated Multivariate Random Numbers
Simulating panel data, repeated
measures
Mixture distributions
-
Generating Multivariate Normal
Random Numbers
V: Desired Covariance Matrix
V = R'R, where R is the Cholesky Decomposition of V
Begin with independent standard normal RVs Z ~ N(0,1)
Correlated (Multivariate) Normal RVs: X = R'Z
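To make the recipe concrete without a matrix library, here is a plain-Python sketch. It builds V from the correlation matrix and standard deviations used in the SAS example in these slides, computes the upper-triangular factor R with V = R'R, and forms X = R'Z. The function names are invented for the example, and the standard library's `gauss` stands in for the independent N(0,1) draws.

```python
import math
import random


def cholesky_upper(v):
    """Return upper-triangular R with R'R = v (v symmetric positive definite)."""
    n = len(v)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = v[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            if i == j:
                r[i][i] = math.sqrt(s)
            else:
                r[i][j] = s / r[i][i]
    return r


def mvn_draw(r, rng):
    """Return one correlated draw X = R'Z from independent standard normals Z."""
    n = len(r)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(r[i][j] * z[i] for i in range(n)) for j in range(n)]


corr = [[1.0, 0.3, 0.2, 0.1],
        [0.3, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.3],
        [0.1, 0.2, 0.3, 1.0]]
sig = [53.0, 36.0, 12.0, 47.0]
v = [[corr[i][j] * sig[i] * sig[j] for j in range(4)] for i in range(4)]
upr = cholesky_upper(v)
rng = random.Random(93)
print(upr[0])
print(mvn_draw(upr, rng))
```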
-
Generating Multivariate Normal Random Numbers
proc iml; /* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={53 36 12 47};
cvmat=rmat#(sigvec`*sigvec);
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
/* Let's be wasteful */
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
z1=z1*upr;
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
-
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
53 36 12 47
cvmat
2809 572.4 127.2 249.1
572.4 1296 129.6 338.4
127.2 129.6 144 169.2
249.1 338.4 169.2 2209
upr
53 10.8 2.4 4.7
0 34.341811 3.0190603 8.3757958
0 0 11.36333 11.672016
0 0 0 44.503035
SAS
The CORR Procedure
-
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 -0.86090 51.78291 -860.89502 -167.70178 157.51299
x2 1000 -0.21592 36.41244 -215.92386 -122.58068 120.05335
x3 1000 -0.06176 11.60953 -61.75755 -37.09589 43.83908
x4 1000 0.46483 46.63351 464.82762 -152.65527 143.41509
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
x1 x2 x3 x4
x1 1.00000 0.30338 0.20341 0.11397
-
Random Numbers
cmat cmat
[,1] [,2] [,3] [,4]
[1,] 1.0 0.3 0.2 0.1
[2,] 0.3 1.0 0.3 0.2
[3,] 0.2 0.3 1.0 0.3
[4,] 0.1 0.2 0.3 1.0
> vv
[,1] [,2] [,3] [,4]
[1,] 2809.0 572.4 127.2 249.1
[2,] 572.4 1296.0 129.6 338.4
[3,] 127.2 129.6 144.0 169.2
[4,] 249.1 338.4 169.2 2209.0
> rr
-
R
[,1] [,2] [,3] [,4]
[1,] 53 10.80000 2.400000 4.700000
[2,] 0 34.34181 3.019060 8.375796
[3,] 0 0.00000 11.363330 11.672016
[4,] 0 0.00000 0.000000 44.503035
> cov(rvs)
[,1] [,2] [,3] [,4]
[1,] 2832.4200 561.2585 134.0656 533.7351
[2,] 561.2585 1235.7616 124.2373 382.5441
[3,] 134.0656 124.2373 127.4132 160.2173
[4,] 533.7351 382.5441 160.2173 2205.5903
> cor(rvs)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.2999969 0.2231676 0.2135426
[2,] 0.2999969 1.0000000 0.3130961 0.2317137
[3,] 0.2231676 0.3130961 1.0000000 0.3022317
[4,] 0.2135426 0.2317137 0.3022317 1.0000000
> sd(rvs)
[1] 53.22048 35.15340 11.28774 46.96371
-
Subject-specific Random Effects
We have an error term (eij) for measurement j in subject i.
We also have a subject-specific random effect (ki)
For the ith subject in the jth measurement:
$$y_{ij} = \mathbf{x}\boldsymbol{\beta} + e_{ij} + k_i$$
-
Recipe for Subject-specific Random Effects
Create subjects for study N
Assign treatment, covariates
Give each subject a random effect
Drawn from, say, N(0,V)
Generate predicted values based on regression + random effects
Generate outcomes for each repeated measure from specific distribution
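The recipe can be sketched in Python for a logistic outcome. The coefficient values are borrowed from the logistic model in these slides (intercept -1.0986, drug 0.6931, time 0.4055, random intercept from N(0,1)); the function name, the half-and-half treatment split, and the seed are assumptions made for this illustration.

```python
import math
import random


def simulate_panel(n_subjects, n_times, rng, beta0=-1.0986,
                   beta_drug=0.6931, beta_time=0.4055, sd_k=1.0):
    """Return (subject, drug, time, outcome) rows with a subject-level intercept."""
    rows = []
    for subj in range(n_subjects):
        drug = int(subj >= n_subjects // 2)  # second half of subjects treated
        k = rng.gauss(0.0, sd_k)             # one random effect per subject
        for t in range(n_times):
            eta = beta0 + beta_drug * drug + beta_time * t + k
            p = 1.0 / (1.0 + math.exp(-eta))
            rows.append((subj, drug, t, int(rng.random() < p)))
    return rows


rng = random.Random(100)
panel = simulate_panel(n_subjects=200, n_times=3, rng=rng)
print(len(panel), panel[:3])
```

The same k is reused for every measurement on a subject, which is what induces correlation among that subject's repeated outcomes.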
-
Logistic Model
$$\text{Placebo: } 0.25 = \frac{\exp(\beta_0)}{1+\exp(\beta_0)}, \quad \beta_0 = -1.0986$$
Drug: OR = 2.0, ln(OR) = 0.6931
Time (0,1,2): OR = 1.5, ln(OR) = 0.4055
$$K_i \sim N(0,1)$$
$$\text{CDF: } \Pr(\text{Success}) = \frac{\exp(-1.0986 + 0.6931 \cdot \text{Drug} + 0.4055 \cdot \text{Time} + K_i)}{1 + \exp(-1.0986 + 0.6931 \cdot \text{Drug} + 0.4055 \cdot \text{Time} + K_i)}$$
-
proc iml; /* begin IML session */
u = j(600,1,.);
d1=j(100,1,0)//j(100,1,1);
d1=d1//d1//d1;
id=j(200,1,0);
do i=1 to 200 by 1;
id[i,1]=i;
end;
id=id//id//id;
t=j(200,1,0)//j(200,1,1)//j(200,1,2);
k=j(200,1,.);
call randgen(k,'normal');
k=k//k//k;
bta= {-1.0986 , 0.6931,.4055,1};
d = j(600,1,1)||d1||t||k;
call randgen(u,'uniform');
expit=exp(d*bta)/(1+exp(d*bta));
y=u
-
Random-effects logistic regression              Number of obs      =       600
Group variable: id                              Number of groups   =       200
Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3
                                                Wald chi2(2)       =     11.07
Log likelihood  = -394.754                      Prob > chi2        =    0.0040
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   .6023501   .2279107     2.64   0.008     .1556533    1.049047
           t |    .235297   .1123071     2.10   0.036      .015179    .4554149
       _cons |  -.9334699   .2040373    -4.57   0.000    -1.333376   -.5335642
-------------+----------------------------------------------------------------
    /lnsig2u |  -.1281394   .3971684                     -.9065751    .6502964
-------------+----------------------------------------------------------------
     sigma_u |   .9379396     .18626                      .6355353    1.384236
         rho |   .2109869   .0661172                      .1093476    .3680594
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =    13.01 Prob >= chibar2 = 0.000
. di -394.754*(-2)
789.508
Stata, SAS data
Finally got this to run in SAS. I had forgotten that SAS requires you to sort.
Stata does not require sorted data for their mixed models.
-
proc sort data=erand;
by id t;
run;
proc nlmixed data=erand qpoints=5 ;
parms b0=0 b1=-.7 b2=.6 sig=0 ;
theta2 = b0+b1*treat+b2*t+u;
prb= exp(theta2)/(1+exp(theta2));
model outcome ~ binary(prb);
random u ~normal(0,sig) subject=id ;
run;
SAS
-
The NLMIXED Procedure
Fit Statistics
-2 Log Likelihood 789.5
AIC (smaller is better) 797.5
AICC (smaller is better) 797.6
BIC (smaller is better) 810.7
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower Upper Gradient
b0 -0.9332 0.2039 199 -4.58
-
q
k1=matrix(runif(200), 200,1)
k2=matrix(runif(200), 200,1)
k=sqrt(-2*log(k1))*cos(2*pi*k2)
id=rbind(id,id,id)
k=rbind(k,k,k)
drug =rbind(matrix(0,100,1), matrix(1,100,1))
drug=rbind(drug,drug,drug)
d=cbind(matrix(1,600,1),drug,k)
parms=matrix(c(-1.0986 , 0.6931,1),3,1)
expit=exp(d%*%parms)/(1+exp(d%*%parms))
outcome=r
-
Group variable: id                              Number of groups   =       200
Random effects u_i ~ Gaussian                   Obs per group: min =         3
                                                               avg =       3.0
                                                               max =         3
                                                Wald chi2(1)       =     14.55
Log likelihood  = -368.112                      Prob > chi2        =    0.0001
------------------------------------------------------------------------------
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        drug |   .8331252   .2183881     3.81   0.000     .4050924    1.261158
       _cons |  -1.240213   .1708219    -7.26   0.000    -1.575018   -.9054085
-------------+----------------------------------------------------------------
    /lnsig2u |  -.6325958   .5624138                     -1.734907     .469715
-------------+----------------------------------------------------------------
     sigma_u |   .7288423   .2049555                      .4200198    1.264729
         rho |   .1390212   .0673177                       .050895    .3271436
------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =     5.08 Prob >= chibar2 = 0.012
R
Got this to run in R, using a mixed effects package called Zelig.
z.out1
-
Delia Bailey and Ferdinand Alimadhi. 2007. "logit.mixed: Mixed effects logistic
model" in Kosuke Imai, Gary King, and Olivia Lau, "Zelig: Everyone's Statistical
Software," http://gking.harvard.edu/zelig
summary(z.out1)
Generalized linear mixed model fit by the Laplace approximation
Formula: outcome ~ drug + tag(1 | id)
AIC BIC logLik deviance
743.4 756.6 -368.7 737.4
Random effects:
Groups Name        Variance Std.Dev.
id     (Intercept) 0.39486  0.62838
Number of obs: 600, groups: id, 200
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.2172 0.1511 -8.054 8.02e-16 ***
drug 0.8174 0.2023 4.041 5.32e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
drug -0.747
R
-
General Approach to Correlated Multivariate Random Numbers
Copulas allow us to draw correlated random numbers from different distributions
Random effects in Mixture Models
They use CDF probabilities of correlated variables on the inside to map to correlated uniform random numbers on the margins
Those correlated uniform RVs may be used to marry vastly different distributions.
Maintain Marginal Distributions
-
Generating Multivariate Random
Numbers
From SAS documentation, a Gaussian Copula
Independent Normal (N(0,1)) random variables are generated
These variables are transformed to a correlated set of z-scores by using the Cholesky Decomposition of the covariance matrix.
These correlated normal RVs are transformed to a uniform by using $\Phi(z)$.
$F^{-1}(\cdot)$ is used to compute the final sample value
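The four steps can be sketched in Python for a pair of variables. To keep the example free of external libraries, the margins are exponential (whose inverse CDF has a closed form) rather than the gamma margins used in the SAS code that follows; that substitution, the rate values, the correlation of 0.4, and all function names are assumptions for this illustration.

```python
import math
import random


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_copula_pair(rho, rng):
    """Return two correlated uniforms built from correlated standard normals."""
    z1 = rng.gauss(0.0, 1.0)
    # The 2x2 Cholesky factor of [[1, rho], [rho, 1]] gives the second z-score.
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return phi(z1), phi(z2)


def exponential_inverse_cdf(u, rate):
    """F^{-1}(u) for an exponential margin; u is clamped just below 1."""
    return -math.log(1.0 - min(u, 1.0 - 1e-12)) / rate


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(110)
pairs = [gaussian_copula_pair(0.4, rng) for _ in range(10000)]
u1 = [p[0] for p in pairs]
u2 = [p[1] for p in pairs]
x1 = [exponential_inverse_cdf(u, rate=0.04) for u in u1]
x2 = [exponential_inverse_cdf(u, rate=0.10) for u in u2]
print(pearson(u1, u2), pearson(x1, x2))
```

The two margins keep their own distributions (means near 25 and 10 here) while remaining positively correlated.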
-
Generating Multivariate Random
Numbers
proc iml;/* begin IML session */
rmat={1 .3 .2 .1, .3 1 .3 .2, .2 .3 1 .3 , .1 .2 .3 1};
sigvec={1 1 1 1};
cvmat=rmat#(sigvec`*sigvec);
/* # is element-wise multiplication */
upr=half(cvmat);
print rmat;
print sigvec;
print cvmat;
print upr;
r1 = j(1000,4,.);
r2 = j(1000,4,.);
call randgen(r1,'uniform');
call randgen(r2,'uniform');
pi= 4*atan(1);
print pi;
z1=sqrt(-2*log(r1))#cos(2*pi*r2);
/* Note I could have gotten another z here */
z1=z1*upr;
z1=cdf('Normal',z1);
z1=gaminv(z1,3.0);
/* Standardized gamma parameter, also the
mean */
varnames={"x1","x2","x3","x4"};
create nrand from z1 [colname=varnames];
append from z1;
quit;
proc corr data=work.nrand pearson;
var x1 x2 x3 x4;
run;
SAS
-
rmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
sigvec
1 1 1 1
cvmat
1 0.3 0.2 0.1
0.3 1 0.3 0.2
0.2 0.3 1 0.3
0.1 0.2 0.3 1
SAS
The CORR Procedure
-
4 Variables: x1 x2 x3 x4
Simple Statistics
Variable N Mean Std Dev Sum Minimum Maximum
x1 1000 2.96320 1.73566 2963 0.11528 12.19072
x2 1000 3.01249 1.68236 3012 0.14039 10.20117
x3 1000 3.00336 1.68803 3003 0.34496 13.72023
x4 1000 3.08106 1.79858 3081 0.11148 13.25409
Pearson Correlation Coefficients, N = 1000
Prob > |r| under H0: Rho=0
x1 x2 x3 x4
x1 1.00000 0.25874 0.19005 0.10052
-
Generating Multivariate Random Numbers
cmat
-
>> U = copularnd('Gaussian',.4,10)
U =
0.8017 0.9388
0.3650 0.2250
0.8104 0.6253
0.3467 0.0988
0.6067 0.6561
0.4743 0.6723
0.6273 0.7427
0.9905 0.8249
0.4427 0.6925
0.3443 0.2711
>> U = copularnd('Gaussian',.4,10000);
>> corr(U)
ans =
1.0000 0.3765
0.3765 1.0000
>> X = norminv(U,0,1);
>> corr(X)
ans =
1.0000 0.3901
0.3901 1.0000
>> Xg = gaminv(U,2,3);
>> corr(Xg)
ans =
1.0000 0.3721
0.3721 1.0000
>>
Matlab (a little more clear)
-
Old Slides
-
program define seedset
local ct =c(current_time)
local s1=substr("`ct'",7,2)
local s2=substr("`ct'",4,2)
local s3=substr("`ct'",2,1)
global newseed=real("`s1'" +"`s2'" +"`s3'")
di $newseed
set seed $newseed
end
Grabbing a Seed from the System Clock (Stata)
LCG is default for Stata
-
. set obs 100
obs was 0, now 100
. gen x0=ceil(uniform()*100)
. gen m=ceil(uniform()*10)
. gen x1=mod(x0,m)
. list in 1/10
     +------------+
     | x0   m  x1 |
     |------------|
  1. | 70   2   0 |
  2. | 62   7   6 |
  3. | 92   7   1 |
  4. | 53   1   0 |
  5. | 37   3   1 |
     |------------|
  6. | 78   1   0 |
  7. | 47   2   1 |
  8. | 91   2   1 |
  9. | 98   1   0 |
 10. | 71   9   8 |
     +------------+
Testing Randomness (Stata)
Correlogram of Xi versus Xi+k for serial correlation
. gen tv=_n
. tsset tv
        time variable:  tv, 1 to 20000
                delta:  1 unit
. corrgram x, lags(40)
                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial Autocor]
-------------------------------------------------------------------------------
1        0.0026   0.0026   .13551  0.7128          |                  |
2       -0.0011  -0.0011   .15995  0.9231          |                  |
3       -0.0004  -0.0004   .16301  0.9833          |                  |
4       -0.0131  -0.0131   3.5987  0.4630          |                  |
5       -0.0008  -0.0007   3.6115  0.6066          |                  |
6        0.0119   0.0118   6.4238  0.3774          |                  |
7       -0.0060  -0.0061   7.1533  0.4131          |                  |
8        0.0004   0.0003   7.1571  0.5198          |                  |
9        0.0057   0.0057    7.815  0.5529          |                  |
(Stata)
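The AC column of the correlogram above is the sample autocorrelation of the sequence at each lag. A small Python sketch of that calculation on 20,000 uniform draws is given here; the helper name autocorrelation and the seed are illustrative choices, and Python's built-in generator stands in for Stata's.

```python
import random


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (lag 0 gives 1)."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    numer = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return numer / denom


rng = random.Random(2010)  # arbitrary fixed seed
draws = [rng.random() for _ in range(20000)]
acs = [autocorrelation(draws, k) for k in range(1, 10)]

for lag, ac in enumerate(acs, start=1):
    print(lag, round(ac, 4))
```

For a sequence with no serial correlation, each value should be within a few multiples of 1/sqrt(n), about 0.007 here, which is the pattern the Stata output shows.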
. seedset
23491
. set obs 2000
obs was 0, now 2000
. gen P=uniform()
. gen enum=-ln(P)/.04
. ci
    Variable |   Obs       Mean   Std. Err.   [95% Conf. Interval]
-------------+-----------------------------------------------------
           P |  2000   .5030619    .0065298    .4902559   .5158678
        enum |  2000   24.79822     .553316    23.71309   25.88336
. set obs 200
obs was 0, now 200
(Stata)
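The line gen enum=-ln(P)/.04 is the inverse probability transform for an exponential distribution with rate 0.04, whose mean is 1/0.04 = 25. A minimal Python sketch of the same draw follows; the function name and the seed are illustrative.

```python
import math
import random


def exponential_draw(rate, rng):
    """One exponential draw by inverse transform: -ln(U) / rate."""
    u = 1.0 - rng.random()  # lies in (0, 1], so log(0) cannot occur
    return -math.log(u) / rate


rng = random.Random(23491)  # seed value borrowed from the seedset display above
draws = [exponential_draw(0.04, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
print("sample mean:", round(mean, 3), "(theoretical mean 25)")
```

With 2000 draws the standard error of the mean is about 0.56, so a sample mean near 24.8, as in the Stata output, is unremarkable.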
. gen P=uniform()
. gen tte=(-ln(P)/0.1)^1.5
. gen fail=1
. replace fail=0 if tte>200
. replace tte=200 if tte>200
. stset tte, fail(fail)
     failure event:  fail != 0 & fail < .
obs. time interval:  (0, tte]
 exit on or before:  failure
-------------------------------------------------------------------
      200  total obs.
        0  exclusions
-------------------------------------------------------------------
      200  obs. remaining, representing
      188  failures in single record/single failure data
 9019.163  total analysis time at risk, at risk from t = 0
                             earliest observed entry t = 0
                                  last observed exit t = 200
(Stata)
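The commands above generate Weibull failure times as (-ln(P)/0.1)^1.5 and then censor them administratively at t = 200. The sketch below repeats those two steps in Python; the function names are illustrative, and a larger sample than the slide's 200 is used so the censoring fraction is stable.

```python
import math
import random


def weibull_time(rng, rate=0.1, power=1.5):
    """Failure time by inverse transform: (-ln(U) / rate) ** power."""
    u = 1.0 - rng.random()  # lies in (0, 1]
    return (-math.log(u) / rate) ** power


def censor(t, limit=200.0):
    """Return (observed time, fail indicator) with censoring at the limit."""
    if t > limit:
        return limit, 0
    return t, 1


rng = random.Random(123)  # arbitrary fixed seed
records = [censor(weibull_time(rng)) for _ in range(20000)]
censored_fraction = sum(1 for _, fail in records if fail == 0) / len(records)

# Under this model S(200) = exp(-0.1 * 200 ** (1 / 1.5)), roughly 0.033.
expected = math.exp(-0.1 * 200.0 ** (1.0 / 1.5))
print(round(censored_fraction, 4), round(expected, 4))
```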
[Figure: Kaplan-Meier survival estimate; x-axis: analysis time, 0 to 200; y-axis: 0.00 to 1.00]
. streg, d(w) nohr
(Stata)
         failure _d:  fail
   analysis time _t:  tte
Weibull regression -- log relative-hazard form
No. of subjects =          200              Number of obs  =       200
No. of failures =          188
Time at risk    =  9019.163067
                                            LR chi2(0)     =      0.00
Log likelihood  =   -403.39593              Prob > chi2    =         .
--------------------------------------------------------------------
          _t |   Coef.     SE        z     P>|z|      [95% CI]
-------------+------------------------------------------------------
       _cons |  -2.245   0.167   -13.47   0.000    -2.572   -1.918
-------------+------------------------------------------------------
       delta |   0.625   0.036                      0.558    0.701
--------------------------------------------------------------------
.
. gen P=uniform()
. gen tte=(-ln(P)/(exp(log(0.1)+log(0.5)*drug)))^1.5
(Stata)
. gen fail=1
. replace fail=0 if tte>200
(39 real changes made)
. replace tte=200 if tte>200
(39 real changes made)
. stset tte, fail(fail)
     failure event:  fail != 0 & fail < .
obs. time interval:  (0, tte]
 exit on or before:  failure
------------------------------------------------------------------------------
      400  total obs.
        0  exclusions
------------------------------------------------------------------------------
      400  obs. remaining, representing
      361  failures in single record/single failure data
 25170.04  total analysis time at risk, at risk from t = 0
                             earliest observed entry t = 0
                                  last observed exit t = 200
.
. list drug P tte fail in 1/15
(Stata)
     +-----------------------------------+
     | drug          P        tte   fail |
     |-----------------------------------|
  1. |    1    .842721   6.331312      1 |
  2. |    0   .3839878    29.6119      1 |
  3. |    1   .3483792    96.8484      1 |
  4. |    0   .6035132   11.34804      1 |
  5. |    1   .8460417   6.114305      1 |
     |-----------------------------------|
  6. |    1   .4935982   53.06192      1 |
  7. |    1   .5173908   47.84433      1 |
  8. |    1    .385052   83.39208      1 |
  9. |    0   .8726683   1.589515      1 |
 10. |    0   .0356283   192.5611      1 |
     |-----------------------------------|
 11. |    0   .8018837   3.280757      1 |
 12. |    0   .6059877   11.21039      1 |
 13. |    1   .7919235   10.07838      1 |
 14. |    0   .1920578   67.02081      1 |
 15. |    0   .0819428   125.1301      1 |
     +-----------------------------------+
(Stata)
[Figure: Kaplan-Meier survival estimates by group (drug = 0, drug = 1); x-axis: analysis time, 0 to 200; y-axis: 0.00 to 1.00]
. streg drug, d(w) nohr
(Stata)
         failure _d:  fail
   analysis time _t:  tte
Weibull regression -- log relative-hazard form
No. of subjects =          400              Number of obs  =       400
No. of failures =          361
Time at risk    =  25170.03819
                                            LR chi2(1)     =     53.70
Log likelihood  =   -757.69677              Prob > chi2    =    0.0000
--------------------------------------------------------------------
          _t |   Coef.     SE        z     P>|z|      [95% CI]
-------------+------------------------------------------------------
        drug |  -0.788   0.107    -7.34   0.000    -0.998   -0.577
       _cons |  -2.458   0.143   -17.19   0.000    -2.738   -2.177
-------------+------------------------------------------------------
       delta |   0.706   0.031                      0.648    0.768
--------------------------------------------------------------------
.
(Stata)
Generating Multivariate Normal Random Numbers
In Stata, gennorm (use webseek to download):
Typing
. gennorm a b c, corr(.2 .3 .4)
creates a, b, and c with values drawn from a N(0,S) distribution where
+- -+
| 1 |
S = | .2 1 |
| .3 .4 1 |
+- -+
That is, corr(a,b)=.2, corr(a,c)=.3, and corr(b,c)=.4
CONTINUED NEXT PAGE
(Stata)
Generating Multivariate Normal Random Numbers
In Stata:
Example
-------
. set obs 10000
obs was 0, now 10000
. set seed 6819
. gennorm a b c, corr(.2 .3 .4)
. summarize a b c
Variable | Obs Mean Std. Dev. Min Max
-------------+----------------------------------------------------------------------------
a | 10000 -.0105333 1.005723 -3.694448 3.775433
b | 10000 -.0042212 1.000254 -3.695302 3.648826
c | 10000 -.0069625 .9989002 -3.996779 3.606923
. corr a b c
(obs=10000)
| a b c
-------------+---------------------------------
a | 1.0000
b | 0.2137 1.0000
c | 0.3035 0.3952 1.0000
(Stata)
Generating Multivariate Normal Random Numbers
In Stata, drawnorm:
. clear
. matrix C=(1, 0.2, 0.3 \ 0.2, 1, 0.4 \ 0.3, 0.4, 1)
. drawnorm a b c, n(10000) corr(C)
(obs 10000)
. summarize a b c
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
a | 10000 -.0176275 .9920181 -3.701594 3.7838
b | 10000 .0009005 1.003002 -3.709259 3.518793
c | 10000 -.0149926 .9925292 -3.716346 4.009713
. corr a b c
(obs=10000)
| a b c
-------------+---------------------------
a | 1.0000
b | 0.1937 1.0000
c | 0.3051 0.4056 1.0000
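One standard way to obtain draws with a target correlation matrix, like the matrix C passed to drawnorm above, is to multiply independent standard normals by a Cholesky factor of C. The Python sketch below shows that construction with the standard library only. It is an illustration of the technique, not a statement about how gennorm or drawnorm are implemented, and all function names are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Return lower-triangular L such that L times its transpose equals matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def draw_row(lower, rng):
    """One correlated normal vector: L times a vector of independent N(0,1)."""
    z = [rng.gauss(0.0, 1.0) for _ in lower]
    return [sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(len(lower))]


def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


C = [[1.0, 0.2, 0.3],
     [0.2, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
L = cholesky(C)

rng = random.Random(6819)  # same seed value as the gennorm example, different generator
rows = [draw_row(L, rng) for _ in range(20000)]
a, b, c = (list(col) for col in zip(*rows))

r_ab = correlation(a, b)
r_ac = correlation(a, c)
r_bc = correlation(b, c)
print(round(r_ab, 4), round(r_ac, 4), round(r_bc, 4))
```

The three printed values should land near 0.2, 0.3, and 0.4, with sampling error of the same order as in the Stata output.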
Simulating Weibull Regression Data, with Time-Dependency in Drug Effect
Survival Time: $S(t) = \exp(-\lambda t^{\delta})$, $\lambda = \exp(x\beta)$
$x\beta = -2.30 + 0.3\cdot\mathrm{drug} - 0.08\cdot\mathrm{drug}\cdot t$
Inverse Prob Transform: set $S(t) = P$; $t = $ ?????????
How do you solve for $t$? (Not all answers are in the book.)
Remember Newton's Method?
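$f(t) = \exp\left(-\exp(-2.30 + 0.3\cdot\mathrm{drug} - 0.004\cdot\mathrm{drug}\cdot t)\, t^{0.67}\right) - P$, and we want $f(t) = 0$
$f'(t) \approx \dfrac{f(t + d) - f(t)}{d}$
$t_1 = t_0 - \dfrac{f(t_0)}{f'(t_0)}$
[Figure: one Newton step from t0 to t1]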
clear
set obs 400
gen drug=_n>200
(Stata)
gen double P=uniform()
* Starting value for t, and t plus a small step for the numerical derivative
gen double t=1
gen double tpd=t+.0001
gen double f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
gen double fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
gen double slope=(fp-f)/0.0001
* 50 Newton updates, applied to every observation at once
forvalues i=1/50 {
    qui replace f=exp(-exp(-2.30+0.3*drug-0.004*drug*t)*(t^0.67))-P
    qui replace fp=exp(-exp(-2.30+0.3*drug-0.004*drug*tpd)*(tpd^0.67))-P
    qui replace slope=(fp-f)/0.0001
    qui replace t=t-f/slope
    qui replace tpd=t+.0001
}
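The Stata loop above applies Newton's method with a forward-difference slope to solve S(t) = P for each observation. A scalar Python sketch of the same iteration follows. The function names are illustrative, and the floor that keeps t positive is an added safeguard that is not in the slide code: without it, a large P can push the iterate below zero, where t^0.67 is undefined.

```python
import math


def survival_gap(t, drug, p):
    """f(t) = S(t) - P for the time-dependent Weibull model used in the loop."""
    log_rate = -2.30 + 0.3 * drug - 0.004 * drug * t
    return math.exp(-math.exp(log_rate) * t ** 0.67) - p


def solve_time(p, drug, start=1.0, step=1e-4, iterations=50):
    """Newton iteration with a forward-difference estimate of the slope."""
    t = start
    for _ in range(iterations):
        f = survival_gap(t, drug, p)
        slope = (survival_gap(t + step, drug, p) - f) / step
        if slope == 0.0:
            break  # flat spot: no usable update
        # Assumed safeguard (not in the slides): keep t strictly positive.
        t = max(t - f / slope, 1e-8)
    return t


t_control = solve_time(0.5, drug=0)
t_treated = solve_time(0.5, drug=1)
print(round(t_control, 3), round(t_treated, 3))
```

One caveat worth checking before using this model: with drug = 1 the cumulative hazard exp(-2.0 - 0.004 t) t^0.67 stops increasing near t = 167.5, so S(t) never falls below roughly 0.12 and very small values of P have no solution.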
Matlab
>> drug=[zeros(1000,1);ones(1000,1)];
>> P=rand(2000,1);
>> cdf0=exp(-1.0986+0.6931*drug)./(1+exp(-1.0986+0.6931*drug));
>> outcome=P<cdf0;
>> b = glmfit(drug,outcome,'binomial')
b =
-1.0616
0.6562
[Figure: Kaplan-Meier survival estimates by group (drug = 0, drug = 1); x-axis: analysis time, 0 to 200; y-axis: 0.00 to 1.00]
(Stata)
. gen P=uniform()
. gen cdf0=exp(-1.0986+0.6931*drug)/(1+exp(-1.0986+0.6931*drug))
(Stata)
. list in 1/10
     +----------------------------+
     | drug          P       cdf0 |
     |----------------------------|
  1. |    0   .2865897   .2500023 |
  2. |    0   .3788754   .2500023 |
  3. |    1   .3597057   .3999916 |
  4. |    1   .7182508   .3999916 |
  5. |    1   .4315197   .3999916 |
     |----------------------------|
  6. |    1   .2963237   .3999916 |
  7. |    1   .7961193   .3999916 |
  8. |    0    .056983   .2500023 |
  9. |    0   .4622037   .2500023 |
 10. |    0   .5336403   .2500023 |
     +----------------------------+
. gen outcome=P<cdf0
. list in 1/10
     +--------------------------------------+
     | drug          P       cdf0   outcome |
     |--------------------------------------|
  1. |    0   .2865897   .2500023         0 |
  2. |    0   .3788754   .2500023         0 |
  3. |    1   .3597057   .3999916         1 |
  4. |    1   .7182508   .3999916         0 |
  5. |    1   .4315197   .3999916         0 |
     |--------------------------------------|
  6. |    1   .2963237   .3999916         1 |
  7. |    1   .7961193   .3999916         0 |
  8. |    0    .056983   .2500023         1 |
  9. |    0   .4622037   .2500023         0 |
 10. |    0   .5336403   .2500023         0 |
     +--------------------------------------+
. gen outcome=P<cdf0
...
Prob > chi2 = 0.0000
Log likelihood = -1245.3138                 Pseudo R2      =    0.0230
--------------------------------------------------------------------
     outcome |     OR      SE        z     P>|z|      [95% CI]
-------------+------------------------------------------------------
        drug |   2.084   0.202     7.57   0.000     1.723    2.519
--------------------------------------------------------------------
       _cons |  -1.077   0.073   -14.83   0.000    -1.220   -0.935
--------------------------------------------------------------------
.
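The binary outcomes in the Matlab and Stata examples above are generated by comparing a uniform draw P with the logistic probability cdf0, so that outcome = 1 whenever P < cdf0. A Python sketch of that step is given here; the function names are illustrative, and a larger sample per group is used than in the slides so the simulated rates are stable.

```python
import math
import random


def outcome_probability(drug):
    """Logistic probability with intercept -1.0986 and drug effect 0.6931."""
    eta = -1.0986 + 0.6931 * drug
    return math.exp(eta) / (1.0 + math.exp(eta))


def simulate_outcomes(drugs, rng):
    """outcome = 1 when a uniform draw falls below the model probability."""
    return [1 if rng.random() < outcome_probability(d) else 0 for d in drugs]


rng = random.Random(140)  # arbitrary fixed seed
drugs = [0] * 10000 + [1] * 10000
outcomes = simulate_outcomes(drugs, rng)

rate_control = sum(outcomes[:10000]) / 10000
rate_treated = sum(outcomes[10000:]) / 10000
odds_ratio = (rate_treated / (1 - rate_treated)) / (rate_control / (1 - rate_control))
print(round(rate_control, 3), round(rate_treated, 3), round(odds_ratio, 3))
```

The model probabilities are 0.25 and 0.40, so the true odds ratio is exp(0.6931), about 2, which is what the fitted coefficient and odds ratio in the output above are estimating.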