plyr: One data-analytic strategy

Hadley Wickham, Rice University

Friday, 29 May 2009

1. Motivation: Deseasonalising ozone measurements

2. Outline of strategy: split-apply-combine

3. Specifics: input vs. output

4. Fiddly details

5. Thoughts on data analysis

[Figure: ozone measurements over a 24 x 24 grid of locations; axis ticks −110, −85, −60]

24 x 24 x 72 = 41,472

[Figure: resid plotted against time]

How can we do this for all 24 x 24 locations?

(assume ozone levels stored in a 24 x 24 x 72 array)

# Pre-allocate a 24 x 24 list-matrix of models and an array of residuals
models <- as.list(rep(NA, 24 * 24))
dim(models) <- c(24, 24)

deseas <- array(NA, c(24, 24, 72))
dimnames(deseas) <- dimnames(ozone)

# Fit one model per location and store its residuals
for (i in seq_len(24)) {
  for (j in seq_len(24)) {
    mod <- deseasf(ozone[i, j, ])
    models[[i, j]] <- mod
    deseas[i, j, ] <- resid(mod)
  }
}

With a for loop


models <- apply(ozone, 1:2, deseasf)

resids <- unlist(lapply(models, resid))

dim(resids) <- c(72, 24, 24)

deseas <- aperm(resids, c(2, 3, 1))

With apply


models <- aaply(ozone, 1:2, deseasf)

deseas <- aaply(models, 1:2, resid)

With plyr

Succinct, but you need to know what aaply does

cf. onomatopoeia, schadenfreude, soliloquy
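To make "what aaply does" concrete, here is a minimal sketch on a toy array. The array `toy` and the function `centre` are hypothetical stand-ins for `ozone` and `deseasf`, chosen only so that the example runs without the real data. It compares the shape of the result from base `apply` with the shape from `aaply`.

```r
library(plyr)

# Hypothetical stand-in for the ozone array: 2 x 3 locations, 4 time points.
set.seed(1)
toy <- array(rnorm(2 * 3 * 4), c(2, 3, 4))

# Illustrative per-location function (not deseasf): centre a series on its mean.
centre <- function(v) v - mean(v)

# Base R: apply() puts the result dimension first, so it has to be permuted back.
centred_base <- aperm(apply(toy, 1:2, centre), c(2, 3, 1))

# plyr: aaply() keeps the split dimensions first and appends the result dimension.
centred_plyr <- aaply(toy, 1:2, centre)

dim(centred_base)
dim(centred_plyr)
```

Both results hold the same values in the same layout. The plyr call simply removes the `aperm` bookkeeping step.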

[Figure: locations (axis ticks −110, −85, −60) with legend avg, ranging from 250 to 310]

Many problems involve splitting up a large data structure, operating on each piece and joining the results back together:

split-apply-combine


How you split up depends on the type of input: arrays, data frames, lists

How you combine depends on the type of output: arrays, data frames, lists, nothing

Input \ Output   array   data frame   list    nothing
array            aaply   adply        alply   a_ply
data frame       daply   ddply        dlply   d_ply
list             laply   ldply        llply   l_ply
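The naming scheme in the table can be tried out on a small list. This is a minimal sketch: the list `pieces` and the variable names are made up for illustration. The same input is combined into each of the four output types by changing only the second letter of the function name.

```r
library(plyr)

# Hypothetical input: a named list of two numeric vectors.
pieces <- list(a = 1:3, b = 4:10)

# list -> list
as_list <- llply(pieces, mean)

# list -> array (here a plain numeric vector)
as_array <- laply(pieces, mean)

# list -> data frame (the function returns a one-row data frame per piece)
as_df <- ldply(pieces, function(v) data.frame(mean = mean(v)))

# list -> nothing: called only for its side effect
l_ply(pieces, function(v) cat("mean:", mean(v), "\n"))
```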

The same table, with base R equivalents filled in where they exist:

Input \ Output   array    data frame   list     nothing
array            apply    adply        alply    a_ply
data frame       daply    aggregate    by       d_ply
list             sapply   ldply        lapply   l_ply


Split: array, data frame, list

[Figure: splitting a 3d array by dimensions 1,2; 1,3; 2,3; and 1,2,3]

models <- aaply(ozone, 1:2, deseasf)

deseas <- aaply(models, 1:2, resid)

Take the 3d array and split it up by its first two dimensions.

Splitting up ozone gives 576 vectors of length 72. Splitting up models gives 576 rlm models.

How are they combined?
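Before turning to the combine step, the split itself can be inspected. The sketch below uses a hypothetical array of zeros, `ozone_like`, with the same shape as the ozone data. It uses `alply` with `identity` so that the pieces come back unchanged in a list.

```r
library(plyr)

# Hypothetical placeholder with the same shape as the ozone array.
ozone_like <- array(0, c(24, 24, 72))

# Split by the first two dimensions and return each piece untouched.
pieces <- alply(ozone_like, 1:2, identity)

length(pieces)                              # number of pieces
unique(vapply(pieces, length, integer(1)))  # length of each piece
```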


Combine: array, data frame, list

The full data frame:

name     age  sex
John     13   Male
Mary     15   Female
Alice    14   Female
Peter    13   Male
Roger    14   Male
Phyllis  13   Female

Split by .(sex):

name     age  sex
John     13   Male
Peter    13   Male
Roger    14   Male

name     age  sex
Mary     15   Female
Alice    14   Female
Phyllis  13   Female

Split by .(age):

name     age  sex
John     13   Male
Peter    13   Male
Phyllis  13   Female

name     age  sex
Mary     15   Female

name     age  sex
Alice    14   Female
Roger    14   Male

Applying nrow to each piece, for the splits .(sex), .(age) and .(sex, age). For example, under .(sex, age) the piece for Female, 13 has 1 row.
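The counts can be reproduced with a short sketch that rebuilds the six-row data frame from the slide. The variable names `people`, `by_sex`, `by_age` and `by_sex_age` are chosen here for illustration.

```r
library(plyr)

# The example data frame from the slide.
people <- data.frame(
  name = c("John", "Mary", "Alice", "Peter", "Roger", "Phyllis"),
  age  = c(13, 15, 14, 13, 14, 13),
  sex  = c("Male", "Female", "Female", "Male", "Male", "Female")
)

# Split by one or more variables, apply nrow to each piece,
# and combine the counts into a data frame.
by_sex     <- ddply(people, .(sex), nrow)
by_age     <- ddply(people, .(age), nrow)
by_sex_age <- ddply(people, .(sex, age), nrow)

by_sex_age
```

Each result has one row per piece: the splitting variables as labels, followed by the count for that piece.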


Case study: Baseball

id        year  team    g   ab    r    h
ruthba01  1914  BOS     5   10    1    2
ruthba01  1915  BOS    42   92   16   29
ruthba01  1916  BOS    67  136   18   37
ruthba01  1917  BOS    52  123   14   40
ruthba01  1918  BOS    95  317   50   95
ruthba01  1919  BOS   130  432  103  139
ruthba01  1920  NYA   142  457  158  172
ruthba01  1921  NYA   152  540  177  204
ruthba01  1922  NYA   110  406   94  128
ruthba01  1923  NYA   152  522  151  205
ruthba01  1924  NYA   153  529  143  200
ruthba01  1925  NYA    98  359   61  104
ruthba01  1926  NYA   152  495  139  184
ruthba01  1927  NYA   151  540  158  192
ruthba01  1928  NYA   154  536  163  173
ruthba01  1929  NYA   135  499  121  172

21,699 records

1228 players

15-31 years for each player

How does performance (rbi/ab) change over the course of a career?

First need to add column that gives “career year”

Easy for a single player.

baberuth <- subset(baseball, id == "ruthba01")

baberuth <- transform(baberuth, cyear = year - min(year) + 1)

For many players, use ddply + transform

baseball <- ddply(baseball, "id", transform, cyear = year - min(year) + 1)

# Keep only seasons with at least 25 at-bats
baseball <- subset(baseball, ab >= 25)

# Common axis limits so every player's plot is comparable
xlim <- range(baseball$cyear, na.rm = TRUE)
ylim <- range(baseball$rbi / baseball$ab, na.rm = TRUE)

plotpattern <- function(df) {
  qplot(cyear, rbi / ab, data = df, geom = "line",
    xlim = xlim, ylim = ylim)
}

# One page per player; failwith() returns NA instead of stopping on an error
pdf("paths.pdf", width = 8, height = 4)
d_ply(baseball, .(reorder(id, rbi / ab)), failwith(NA, plotpattern),
  .print = TRUE)
dev.off()

Draw time series for all 1228 players
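The role of `failwith` in the call above may be easier to see in isolation. In this minimal sketch, `risky` and `safe` are hypothetical names. `failwith(NA, f)` wraps `f` so that an error yields the default value instead of aborting the whole loop over players.

```r
library(plyr)

# Hypothetical function that fails on some inputs.
risky <- function(x) {
  if (x < 0) stop("negative input")
  sqrt(x)
}

# Wrapped version: returns NA on error; quiet = TRUE suppresses the error message.
safe <- failwith(NA, risky, quiet = TRUE)

safe(4)
safe(-1)
```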

[Figure: distribution of rsquare, from 0.0 to 1.0]

[Figure: two panels with intercept on the vertical axis, shaded by rsquare from 0.00 to 1.00]

Fiddly details

Labelling

Progress bars

Consistent argument names

Missing values / Nulls
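Two of these details can be shown in a few lines. The sketch uses a made-up function, `slow_square`, to show the `.progress` argument. It also relies on the convention that plyr's own arguments start with a dot (`.data`, `.fun`, `.progress`), which keeps them from clashing with arguments passed through to the applied function.

```r
library(plyr)

# Hypothetical slow function, so the progress bar has something to show.
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# .progress = "text" draws a text progress bar while the pieces are processed.
squares <- laply(.data = 1:20, .fun = slow_square, .progress = "text")
```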


Data analysis

What other patterns of data analysis are waiting to be discovered?

How can we identify these strategies and then develop software to support them?

Does teaching these patterns make it easier for novices to become experts?


http://had.co.nz/plyr
