How to Speed Up Your R Code With Virtually No Effort


Page 1: How to Speed Up Your R Code With Virtually No Effort

How to Speed Up Your R Code With Virtually No Effort

Alec Stephenson

Page 2: How to Speed Up Your R Code With Virtually No Effort

What I am Talking About

• Basic Principles: system.time(), benchmark() in rbenchmark

• Profiling R Code: Rprof()

• Parallel Programming: mclapply() and parLapply() in parallel

• Using R With Amazon EC2

Page 3: How to Speed Up Your R Code With Virtually No Effort

What I am Not Talking About

• Writing compiled code (e.g. C/C++)
• Parallelizing compiled code (e.g. OpenMP)
• The byte code compiler for R
• Is code optimization worth your time?
• Is code optimization worth turning your code into an unreadable mess?

Page 4: How to Speed Up Your R Code With Virtually No Effort

Basic Principles

• Vectorize rather than loop
• Allocate memory up front
• Get to the compiled code faster:
.C .Fortran .External .Internal .Call .Primitive

• R can be unpredictable
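The two timing tools named on the earlier slide can be used as follows. This is a minimal sketch with my own example data; it assumes the rbenchmark package is installed for benchmark().

```r
x <- rnorm(1e5); y <- rnorm(1e5)

# system.time() reports user/system/elapsed seconds for a single expression
system.time(z <- x + y)

# benchmark() runs each expression repeatedly and tabulates relative timings
if (requireNamespace("rbenchmark", quietly = TRUE)) {
  rbenchmark::benchmark(
    vectorized = x + y,
    looped = { z <- numeric(1e5); for (i in 1:1e5) z[i] <- x[i] + y[i] },
    replications = 5
  )
}
```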

Page 5: How to Speed Up Your R Code With Virtually No Effort

Vectorize rather than loop

x <- rnorm(1e06); y <- rnorm(1e06)

0.01s, Good:
z <- x + y

6.23s, Bad:
z <- numeric(1e06)
for(i in 1:1e06) z[i] <- x[i] + y[i]

Page 6: How to Speed Up Your R Code With Virtually No Effort

Allocate memory up front

x <- rnorm(1e06); y <- rnorm(1e06)

0.01s, Good:
z <- x + y

One Hour, Terrible:
z <- NULL
for(i in 1:1e06) z[i] <- x[i] + y[i]


Page 9: How to Speed Up Your R Code With Virtually No Effort

Get to the compiled code faster

Could you:

• Use pmax() rather than ifelse()
• Use tabulate() rather than table()
• Use lapply(split()) rather than tapply()
• etc...
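The pmax() swap is timed on the next slide; the other two swaps can be sketched like this (my own example data, not the speaker's, using only base R):

```r
f <- sample(1:5, 1e5, replace = TRUE)   # integer group codes
x <- rnorm(1e5)

# tabulate() counts positive integers directly; table() does far more work
counts1 <- tabulate(f)
counts2 <- as.vector(table(f))
identical(counts1, counts2)  # TRUE (every level 1..5 occurs here)

# lapply(split()) computes per-group results with less overhead than tapply()
m1 <- unlist(lapply(split(x, f), mean))
m2 <- tapply(x, f, mean)
all.equal(as.vector(m1), as.vector(m2))  # TRUE
```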

Page 10: How to Speed Up Your R Code With Virtually No Effort

Get to the compiled code faster

x <- rnorm(1e07); y <- rnorm(1e07)

0.42s, Good:
pmax(x, y)

3.09s, Bad:
ifelse(x < y, y, x)

Page 11: How to Speed Up Your R Code With Virtually No Effort

R can be unpredictable

x <- rnorm(1e08); y <- rnorm(50); n <- 100000

0.14s, 0.11s, Good:
sum(x)/length(x)
for(i in 1:n) sum(y)/length(y)

0.28s, 0.98s, Not So Good:
mean(x)
for(i in 1:n) mean(y)

Page 12: How to Speed Up Your R Code With Virtually No Effort

R can be unpredictable

x <- rnorm(1e08); y <- rnorm(50); n <- 100000

0.28s, 0.98s, Not So Good:
mean(x)
for(i in 1:n) mean(y)

0.28s, 0.26s, Improves In Loop:
mean.default(x)
for(i in 1:n) mean.default(y)

0.28s, 0.06s, Improves A Lot In Loop:
.Internal(mean(x))
for(i in 1:n) .Internal(mean(y))

Page 13: How to Speed Up Your R Code With Virtually No Effort

R can be unpredictable

x <- rnorm(50); y <- rnorm(50); n <- 100000

0.06s:
for(i in 1:n) x + y

0.12s, Seriously!:
for(i in 1:n) (((((x + y)))))

Page 14: How to Speed Up Your R Code With Virtually No Effort

Profiling R Code

• Very simple to do and always useful
• Often surprising (R can be unpredictable)
• Indicates where to concentrate your optimization efforts
• Large potential gains for virtually no effort
• A small piece of code often takes up virtually all the running time

Page 15: How to Speed Up Your R Code With Virtually No Effort

Profiling R Code

Rprof()
## run some code here e.g. BadIdea(100000)
Rprof(NULL)
summaryRprof()

Page 16: How to Speed Up Your R Code With Virtually No Effort

This function is stupid

BadIdea <- function(n) {
  x <- seq(-1, 1, length = n)
  y <- numeric(n)
  for (i in 1:n)
    y[i] <- ((det(as.matrix(x[i])) * 2)^1.1) / n
  y
}
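The profile on the following slides shows det() and as.matrix() dominating the running time. Since det() of a 1x1 matrix is just the scalar itself, the whole loop collapses to one vectorized line. This rewrite is my own sketch, not from the slides:

```r
# The slide's BadIdea, reproduced here for comparison
BadIdea <- function(n) {
  x <- seq(-1, 1, length = n)
  y <- numeric(n)
  for (i in 1:n) y[i] <- ((det(as.matrix(x[i])) * 2)^1.1) / n
  y
}

# Vectorized rewrite (my sketch): det() of a 1x1 matrix returns the scalar,
# so the per-element det(as.matrix()) calls can be dropped entirely.
# (Negative bases raised to 1.1 give NaN in both versions.)
GoodIdea <- function(n) {
  x <- seq(-1, 1, length = n)
  ((x * 2)^1.1) / n
}
```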

Page 17: How to Speed Up Your R Code With Virtually No Effort

Profiling R Code

$by.self
                   self.time self.pct total.time total.pct
as.matrix               1.14    20.00       1.96     34.39
determinant             0.96    16.84       3.78     66.32
determinant.matrix      0.76    13.33       0.86     15.09
det                     0.72    12.63       5.16     90.53
$                       0.66    11.58       0.66     11.58
array                   0.62    10.88       0.70     12.28
BadIdea                 0.44     7.72       5.70    100.00
as.matrix.default       0.12     2.11       0.82     14.39
^                       0.10     1.75       0.10      1.75
as.vector               0.08     1.40       0.08      1.40
ncol                    0.06     1.05       0.06      1.05
nrow                    0.04     0.70       0.04      0.70

Page 18: How to Speed Up Your R Code With Virtually No Effort

Profiling R Code

$by.total
                   total.time total.pct self.time self.pct
BadIdea                  5.70    100.00      0.44     7.72
det                      5.16     90.53      0.72    12.63
determinant              3.78     66.32      0.96    16.84
as.matrix                1.96     34.39      1.14    20.00
determinant.matrix       0.86     15.09      0.76    13.33
as.matrix.default        0.82     14.39      0.12     2.11
array                    0.70     12.28      0.62    10.88
$                        0.66     11.58      0.66    11.58
^                        0.10      1.75      0.10     1.75
as.vector                0.08      1.40      0.08     1.40
ncol                     0.06      1.05      0.06     1.05
nrow                     0.04      0.70      0.04     0.70

Page 19: How to Speed Up Your R Code With Virtually No Effort

Parallel Programming

multicore: single machine, not Windows
snow: multiple machines, any OS
parallel: snow plus multicore
  mclapply(x, FUN, ..., mc.cores)
  parLapply(cl, x, FUN)
foreach: front-end for easier coding
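The foreach front-end mentioned above is not shown on the later slides; a minimal sketch with my own example (assumes the foreach and doParallel packages are installed):

```r
library(foreach)
library(doParallel)

# register a small cluster of workers for %dopar%
cl <- makeCluster(2)
registerDoParallel(cl)

# %dopar% runs each iteration on a worker; .combine collects the results
res <- foreach(a = c(22, 56, 70, 43), .combine = c) %dopar% sum(a * 1:10)

stopCluster(cl)

res  # same answers as sapply(c(22,56,70,43), function(a) sum(a * 1:10))
```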

Page 20: How to Speed Up Your R Code With Virtually No Effort

Parallel Programming

fn <- function(a) a * 1:10
lapply(c(22, 56, 70, 43), fn)

mclapply(c(22, 56, 70, 43), fn, mc.cores = 4)

cl <- makeCluster(4)
parLapply(cl, c(22, 56, 70, 43), fn)
stopCluster(cl)

Page 21: How to Speed Up Your R Code With Virtually No Effort

Parallel Programming

Parallel random forest function (multicore type):

parRandomForest <- function(xx, ..., ntree = 500, mc = 1) {
  rfwrap <- function(ntree, xx, ...) randomForest(x = xx, ntree = ntree, ...)
  rfpar <- mclapply(rep(ceiling(ntree/mc), mc), rfwrap, xx = xx, ..., mc.cores = mc)
  do.call(combine, rfpar)  # not very fast
}
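A hypothetical usage sketch for the function above, with the slide's function reproduced so it runs standalone. This is my own example on the built-in iris data; it assumes the randomForest package is installed, and mc > 1 requires a non-Windows machine since mclapply() relies on forking:

```r
library(parallel)
library(randomForest)

# the slide's parallel random forest wrapper, reproduced
parRandomForest <- function(xx, ..., ntree = 500, mc = 1) {
  rfwrap <- function(ntree, xx, ...) randomForest(x = xx, ntree = ntree, ...)
  rfpar <- mclapply(rep(ceiling(ntree/mc), mc), rfwrap, xx = xx, ..., mc.cores = mc)
  do.call(combine, rfpar)  # not very fast
}

# grow 500 trees split across 2 cores (my example; iris is built into R)
rf <- parRandomForest(iris[, 1:4], y = iris$Species, ntree = 500, mc = 2)
rf$ntree  # 500
```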

Page 22: How to Speed Up Your R Code With Virtually No Effort

Amazon EC2

• Like hiring one or more powerful computers by the hour
• Start Amazon EC2 instance(s)
• AMIs available with R and RStudio Server pre-installed
• $1.80 per hour (or part hour) for a quadruple extra large high-memory instance with 64-bit Linux:
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores)

Page 23: How to Speed Up Your R Code With Virtually No Effort

Important Talk Clarification:

When I answered ‘it’s important to turn it off’ I should have said ‘it’s important to turn off the instance explicitly by terminating (or stopping) it’. If you only turn off your computer, the EC2 instance will continue to run and you will continue to be charged by the hour.

Thanks for attending, Alec.