Computing for Research I Spring 2013 Presented by: Liqiong Fan R: Random number generation & Simulations April 7

Computing for Research I

Spring 2013

Presented by: Liqiong Fan

R: Random number generation & Simulations

April 7

Outline

How to sample from common distributions:

• Uniform distribution

• Binomial distribution

• Normal distribution

• Pre-specified vector

Examples:

• Randomization code generation

• Simulation 1 (explore the relationship between power and effect size)

• Simulation 2 (explore the relationship between power and sample size)

Syntax for random number generation in R

1. Sample from a known distribution: “r” + name of distribution

e.g., runif() Uniform, rbinom() Binomial, rnorm() Normal, …

2. Sample from a vector: sample()

e.g., extract two numbers from {1,2,3,4,5,6} with replacement
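
A minimal sketch of the second case. The seed value is arbitrary and not from the slides; it only makes the draw reproducible.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Draw two numbers from {1, ..., 6} with replacement
draw <- sample(1:6, size = 2, replace = TRUE)
print(draw)
```

Because the draw is with replacement, the same number can appear twice.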

Uniform distribution (continuous)

For X ~ Uniform(a, b):

PDF: f(x) = 1/(b - a) for a <= x <= b, and 0 otherwise

Mean: (a + b)/2

Variance: (b - a)^2/12

[R] Uniform distribution

runif(n, min=0, max=1)

See R code …
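
A short sketch of how runif() might be called. The seed and the bounds 2 and 10 are illustrative choices, not values from the slides.

```r
set.seed(1)

u <- runif(5)                        # five draws from Uniform(0, 1), the defaults
v <- runif(1000, min = 2, max = 10)  # Uniform(2, 10); bounds are illustrative
print(u)

# Sample moments should be close to (a + b)/2 = 6 and (b - a)^2/12 = 5.33
print(c(mean = mean(v), var = var(v)))
```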

[R] Uniform distribution

Use UNIFORM distribution to generate BERNOULLI distribution

See R code …

Basic idea:

[Figure: the Uniform(0, 1) axis, with ticks from 0.0 to 1.0 and a cut point a, mapped to the Bernoulli outcomes 0 and 1]
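
One way this idea could be coded, as a sketch: it assumes the cut point equals the success probability p and that draws below it are mapped to 1. The value p = 0.6 and the seed are illustrative.

```r
set.seed(1)

p <- 0.6                 # success probability (illustrative value)
u <- runif(10000)        # Uniform(0, 1) draws
x <- as.integer(u < p)   # 1 if the draw falls below the cut point, else 0

print(mean(x))           # proportion of 1s, should be close to p
```

Since P(U < p) = p for U ~ Uniform(0, 1), each x is a Bernoulli(p) draw.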

[R] Binomial distribution

rbinom(n, size, prob)

See R code …

e.g. generate 10 Binomial random numbers from Binom(100, 0.6)

n = 10, size = 100, prob = 0.6

rbinom(10, 100, 0.6)

e.g. generate 100 Bernoulli random numbers with p = 0.6

n = 100, size = 1, prob = 0.6

rbinom(100, 1, 0.6)
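
The two calls above can be run as a small reproducible sketch. The seed is an arbitrary addition, and the arguments are named to make their roles explicit.

```r
set.seed(1)

b <- rbinom(10, size = 100, prob = 0.6)  # 10 draws from Binom(100, 0.6)
z <- rbinom(100, size = 1, prob = 0.6)   # 100 Bernoulli(0.6) draws

print(b)          # each value is a count between 0 and 100
print(table(z))   # counts of 0s and 1s
```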

[R] Normal distribution

See R code …

rnorm(n, mean, sd) #random number
dnorm(x, mean, sd) #density
pnorm(q, mean, sd) #P(X<=q) cdf
qnorm(p, mean, sd) #quantile
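
A brief sketch of how the four functions relate, using the standard normal. The evaluation points 0 and 1.96 are illustrative choices.

```r
d <- dnorm(0, mean = 0, sd = 1)      # density at 0, equals 1/sqrt(2*pi)
p <- pnorm(1.96, mean = 0, sd = 1)   # P(X <= 1.96), about 0.975
q <- qnorm(p, mean = 0, sd = 1)      # inverse of pnorm, recovers 1.96

set.seed(1)
r <- rnorm(5, mean = 0, sd = 1)      # five random draws

print(c(density = d, cdf = p, quantile = q))
```

Note that qnorm() undoes pnorm(): feeding a probability back in returns the original quantile.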

[R] Normal distribution

dnorm(x, mean, sd) #density

e.g. plot a standard normal curve
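
A possible way to draw the curve, as a sketch. The plotting range of -4 to 4 and the grid of 200 points are assumptions made for illustration.

```r
x <- seq(-4, 4, length.out = 200)   # grid of x values; range is illustrative
y <- dnorm(x, mean = 0, sd = 1)     # standard normal density on the grid

plot(x, y, type = "l", xlab = "x", ylab = "Density",
     main = "Standard normal density")
```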

pnorm(q, mean, sd) #probability P(X<=q)

e.g. calculate the p-value for a one-sided test with a standardized test statistic

H0: X<=0
H1: X>0

Reject H0 if “Z” is very large

If the one-sided test gives Z = 3.0, what’s the p-value?

P-value = P(Z>=z) = 1 - P(Z<=z)

1 - pnorm(3, 0, 1)
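
The same calculation as a runnable sketch. The lower.tail = FALSE form is an equivalent alternative added here for comparison; it is not on the slide.

```r
z <- 3.0

p_value <- 1 - pnorm(z, mean = 0, sd = 1)   # upper-tail probability P(Z >= 3)
p_upper <- pnorm(z, lower.tail = FALSE)     # same quantity, computed directly

print(p_value)   # about 0.00135
```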

[R] Normal distribution

qnorm(p, mean, sd) #quantile

See R code …

rnorm(n, mean, sd) #random number

See R code …
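
A small sketch combining the two functions. The mean of 120 and sd of 7 are illustrative parameters, and the seed is arbitrary.

```r
# Two-sided 5% critical values of the standard normal
crit <- qnorm(c(0.025, 0.975), mean = 0, sd = 1)
print(crit)

set.seed(1)
y <- rnorm(10000, mean = 120, sd = 7)   # parameters chosen for illustration

# The empirical 97.5% quantile should be near the theoretical one
print(c(empirical = unname(quantile(y, 0.975)),
        theoretical = qnorm(0.975, mean = 120, sd = 7)))
```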

[R] Another useful command for sampling from a vector – “sample()”

e.g. randomly choose two numbers from {2,4,6,8,10} with/without replacement

sample(x, size, replace = FALSE, prob = NULL)

sample(c(2,4,6,8,10), 2, replace = FALSE)
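
A sketch contrasting the two modes. The seed is arbitrary and only there for reproducibility.

```r
set.seed(1)
x <- c(2, 4, 6, 8, 10)

without_repl <- sample(x, 2, replace = FALSE)  # always two distinct values
with_repl    <- sample(x, 2, replace = TRUE)   # the same value may appear twice

print(without_repl)
print(with_repl)
```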

[R] Another useful command for sampling from a vector – “sample()”

e.g. A question from our THEORY I CLASS:

“Draw a histogram of all possible averages of 6 numbers selected from {1,2,7,8,14,20} with replacement”

Answer: A quick way to solve this question is to do a simulation. That is, we repeat the selection of 6 balls with replacement from the urn {1,2,7,8,14,20} many times, and plot their averages. The R code looks like:

a <- numeric(10000)  # pre-allocate the vector of averages
for (i in 1:10000) {
  a[i] <- mean(sample(c(1,2,7,8,14,20), 6, replace = TRUE))
}
hist(a)
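
As an alternative to simulating, the urn is small enough to enumerate every ordered draw exactly. The sketch below uses expand.grid() for that; this is a swapped-in technique, not the approach taken on the slide.

```r
v <- c(1, 2, 7, 8, 14, 20)

# All 6^6 = 46656 ordered draws of 6 numbers with replacement
all_draws <- expand.grid(rep(list(v), 6))
all_means <- rowMeans(all_draws)

hist(all_means, main = "Averages of all possible draws", xlab = "Average")
```

Each ordered draw is equally likely, so this histogram is the exact distribution that the simulated one approximates.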

[R] Another useful command for sampling from a vector – “sample()”

e.g. Generate 1000 Bernoulli random numbers with P = 0.6

sample(x, size, replace = T, prob =)

Answer:
Let x = (0, 1),
Let size = 1,
Let replace = T/F (with size = 1 the choice does not matter),
Let prob = (0.4, 0.6).

Repeat 1000 times.
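
The recipe above could be written as the following sketch. The single-call variant with size = 1000 is an equivalent shortcut added for comparison, and the seed is arbitrary.

```r
set.seed(1)

# One draw from {0, 1} with P(1) = 0.6, repeated 1000 times
x1 <- replicate(1000, sample(c(0, 1), size = 1, prob = c(0.4, 0.6)))

# Equivalent single call: 1000 draws with replacement
x2 <- sample(c(0, 1), size = 1000, replace = TRUE, prob = c(0.4, 0.6))

print(c(mean(x1), mean(x2)))   # both should be near 0.6
```

In the single-call version replace = TRUE is required, because 1000 values cannot be drawn from two without replacement.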

Example 1: Generate randomization sequence

Goal: randomize 100 patients to TRT A and B

1. Simple randomization (like flipping a coin) – Bernoulli distribution

runif(), rbinom(), sample().

0 0 1 0 0 1 0 1 0 0 …. 1 0 1 0

See R code …
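
A sketch of simple randomization with each of the three functions. It assumes equal allocation probability (0.5); the variable names and the seed are illustrative.

```r
set.seed(1)
n <- 100   # number of patients

arm_runif  <- ifelse(runif(n) < 0.5, "A", "B")
arm_rbinom <- ifelse(rbinom(n, size = 1, prob = 0.5) == 1, "A", "B")
arm_sample <- sample(c("A", "B"), size = n, replace = TRUE)

print(table(arm_sample))   # group sizes are random, not necessarily 50/50
```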

Example 1: Generate randomization sequence

Goal: randomize 100 patients to TRT A and B

2. Random allocation rule (RAL)

Unlike simple randomization, the number of allocations for each treatment needs to be fixed in advance

Again, think about the urn model!

[Figure: an urn holding 50 balls for each treatment]

Draw the balls without replacement

RAL only guarantees that treatment allocation is balanced at the end of the sequence.
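
The urn model can be sketched as a random permutation of 50 "A" and 50 "B" labels. The seed and variable names are illustrative.

```r
set.seed(1)

urn <- rep(c("A", "B"), each = 50)   # 50 balls per treatment
ral <- sample(urn)                   # random permutation = drawing without replacement

print(table(ral))         # exactly 50 A and 50 B overall
print(table(ral[1:20]))   # an early segment may be unbalanced
```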

Example 1: Generate randomization sequence

Goal: randomize 100 patients to TRT A and B

3. Permuted block randomization

Block size = 4

AABB BABA BBAA BABA BAAB … BBAA

sample()

Think about a multi-urn model!

[Figure: urn diagram labeled 50, 50, and 25]
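
A sketch of permuted block randomization with block size 4, treating each block as its own small urn of two "A" and two "B" balls. Variable names and the seed are illustrative.

```r
set.seed(1)

block_size <- 4
n_blocks   <- 100 / block_size                       # 25 blocks for 100 patients
one_block  <- rep(c("A", "B"), each = block_size / 2)

# Each column is one independently shuffled block
blocks <- replicate(n_blocks, sample(one_block))
alloc  <- as.vector(blocks)                          # read block by block

print(paste(apply(blocks, 2, paste, collapse = ""), collapse = " "))
```

Balance holds after every completed block, not only at the end of the sequence.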

Example 2: Investigate the relationship between effect size and power – drug increases SBP

Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error, with variance equal to var(Y)

Linear model: Y = b0 + b1X + e

b1 represents the effect size of the new drug relative to the control. For instance, assuming that SBP in the control population is distributed as N(120, 49), what is the power if the new drug truly increases SBP by 0, 1, 2, 3, 4, or 5 units in a study with a sample size of 100 (50 on drug, 50 on placebo)?

Important information:
Y (placebo) ~ N(120, 49)
b0 = 120
e ~ N(0, 49)

When X=0, E(Y) = b0, effect of control;
When X=1, E(Y) = b0 + b1, effect of drug;
The between-group difference is represented by b1
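
A sketch of generating one data set from this model. It assumes N(120, 49) means variance 49, so rnorm() gets sd = 7. The choice b1 = 2 is one of the listed effect sizes, and the seed is arbitrary.

```r
set.seed(1)

b0 <- 120
b1 <- 2                                    # one of the effect sizes in the example
x  <- rep(c(1, 0), each = 50)              # 50 drug, 50 control
e  <- rnorm(100, mean = 0, sd = sqrt(49))  # N(0, 49) read as variance 49, so sd = 7
y  <- b0 + b1 * x + e

print(tapply(y, x, mean))   # group means, near 120 (control) and 122 (drug)
```

Passing 49 directly as sd would simulate far noisier data than intended, since rnorm() expects a standard deviation.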

Example 2: Investigate the relationship between effect size and power – drug increases SBP

Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error, with variance equal to var(Y)

Linear model: Y = b0 + b1X + e

Important information:
Y (placebo) ~ N(120, 49)
b0 = 120
e ~ N(0, 49)

We try to answer: What’s the power given that b1 (the real effect size of the treatment) is 0, 1, 2, 3, 4 or 5?

If we run the simulation N times, power is estimated by the proportion of the N simulations in which b1 (the treatment effect) is significant (P<0.05) in the linear regression test.

Definition of Power: Probability of rejecting the NULL when the ALTERNATIVE IS TRUE (i.e., b1 = some non-zero value).

Example 2: Investigate the relationship between effect size and power – drug increases SBP

Y: Systolic Blood Pressure (response)
X: intervention (1 = drug vs. 0 = control)
e: random error, with variance equal to var(Y)

Linear model: Y = b0 + b1X + e

Simulation steps (e.g. sample size = 50 per group, 1000 simulations):

1. Generate X according to the study design (50 “1”s and 50 “0”s);
2. Generate 100 “e” from N(0, 49);
3. Given b0 and b1, generate Y using Y = b0 + b1X + e;
4. Use the 100 pairs of (Y, X) to refit a new linear model, and get the new b0 and b1 and their p-values;
5. Repeat these steps 1000 times;
6. If the type I error is 0.05, for a two-sided test:

Power = (number of simulations with p-value for b1 < 0.05) / 1000
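
The simulation steps can be sketched as below. The function name sim_power and its argument names are hypothetical, N(0, 49) is read as variance 49 (sd = 7), and the seed is arbitrary.

```r
set.seed(2013)

# Estimate power for one effect size b1 by simulation (hypothetical helper).
sim_power <- function(b1, n_per_group = 50, n_sim = 1000,
                      b0 = 120, sd_e = 7, alpha = 0.05) {
  x <- rep(c(1, 0), each = n_per_group)                # step 1: design
  p_values <- replicate(n_sim, {                       # step 5: repeat
    e <- rnorm(2 * n_per_group, mean = 0, sd = sd_e)   # step 2: errors
    y <- b0 + b1 * x + e                               # step 3: response
    fit <- lm(y ~ x)                                   # step 4: refit the model
    summary(fit)$coefficients["x", "Pr(>|t|)"]         # p-value for b1
  })
  mean(p_values < alpha)                               # step 6: proportion significant
}

effect_sizes <- 0:5
power_by_effect <- sapply(effect_sizes, sim_power)
names(power_by_effect) <- effect_sizes
print(power_by_effect)
```

For b1 = 0 the null is true, so the reported proportion estimates the type I error rate rather than power and should sit near 0.05.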

Example 3: Investigate the relationship between sample size and power

We try to answer: What’s the power given b1 = 2 and sample size = 25, 50, 75, 100, 125, and 150 per group?

Linear model: Y = b0 + b1X + e
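
A sketch of this simulation, assuming the same setup as Example 2 (b0 = 120, error sd = 7, 1000 simulations, two-sided alpha = 0.05). The comparison with power.t.test() is an added cross-check, not part of the slides; it applies because a regression on a binary X is equivalent to an equal-variance two-sample t test.

```r
set.seed(2013)

b0 <- 120; b1 <- 2; sd_e <- 7; n_sim <- 1000   # assumed to match Example 2
sizes <- c(25, 50, 75, 100, 125, 150)          # sample size per group

power_sim <- sapply(sizes, function(n) {
  x <- rep(c(1, 0), each = n)
  p <- replicate(n_sim, {
    y <- b0 + b1 * x + rnorm(2 * n, mean = 0, sd = sd_e)
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  })
  mean(p < 0.05)
})

# Analytical power of the equivalent two-sample t test, as a cross-check
power_exact <- sapply(sizes, function(n)
  power.t.test(n = n, delta = b1, sd = sd_e, sig.level = 0.05)$power)

print(data.frame(n_per_group = sizes, simulated = power_sim,
                 analytical = round(power_exact, 3)))
```

The simulated and analytical columns should agree up to Monte Carlo error, and both should rise with the sample size.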

Some recommendations

1. Try not to “fix” (hard-code) the parameters in your simulation

2. Always test your code with a small number of iterations before you actually start your simulation

3. Use append / write.table (… append = T …) to save the results or simulated data

4. Print the number of iterations / scenarios. Code:

print(c)
flush.console()
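
These recommendations could look like the sketch below. The parameter values, the temporary output file, and the column names are all illustrative assumptions.

```r
set.seed(7)

# Parameters kept in one place rather than hard-coded inside the loop
n_sim  <- 5      # small number of iterations for a test run
n_obs  <- 10
mean_y <- 120
sd_y   <- 7
out_file <- tempfile(fileext = ".txt")   # illustrative output location

for (i in seq_len(n_sim)) {
  y <- rnorm(n_obs, mean = mean_y, sd = sd_y)
  result <- data.frame(iteration = i, mean_y = mean(y))

  # Append one row per iteration so partial results survive a crash
  write.table(result, file = out_file, append = TRUE,
              row.names = FALSE, col.names = FALSE)

  print(i)          # progress indicator
  flush.console()   # force the console to show it immediately
}

saved <- read.table(out_file, header = FALSE,
                    col.names = c("iteration", "mean_y"))
print(saved)
```

Column names are left out of the file here because write.table() warns when a header is written in append mode; they are supplied again when reading the results back.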