Programming and Simulations Frank Witmer 6 January 2011.

Outline

• General programming tips
• Programming loops
• Simulation
  – Distributions
  – Sampling
  – Bootstrapping

General Programming Tips

• Use meaningful variable names
• Include more comments than you think necessary
• Debugging your code
  – Since R is interpreted, variables defined outside functions remain available for inspection if execution terminates
  – Built-in debugging support: debug(), browser(), trace()
  – But generally adding print statements in functions is sufficient
• Syntax highlighting!
  – http://sourceforge.net/projects/npptor/

Loops

• Because R is an interpreted language, every statement inside a loop is re-evaluated by the interpreter on each iteration, which adds overhead at every step
• So avoid explicit loops for computationally intensive analysis
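
To illustrate the point about loops, here is a minimal sketch that computes the same sum of squares with an explicit loop and with a vectorized call. The variable names and the size n are arbitrary choices for illustration, and actual timings depend on the machine and R version.

```r
# Sum of squares of 1..n computed two ways.
n <- 100000
x <- as.numeric(seq_len(n))

# Explicit loop: the interpreter evaluates the body n times.
loop_total <- 0
for (value in x) {
  loop_total <- loop_total + value^2
}

# Vectorized: a single call, with the iteration done in compiled code.
vec_total <- sum(x^2)

# Both approaches give the same answer.
stopifnot(isTRUE(all.equal(loop_total, vec_total)))

# system.time() reports how long each version takes on this machine.
loop_time <- system.time(for (value in x) value^2)
vec_time  <- system.time(x^2)
print(loop_time)
print(vec_time)
```

On most systems the vectorized version is noticeably faster, though the gap varies.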

For & While loop syntax

for (variable in sequence) {
    expression
    expression
}

while (condition) {
    expression
    expression
}
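
As a concrete instance of the two templates above, the following sketch fills a vector with a for loop and counts halvings with a while loop. The values used are made up for illustration.

```r
# for loop: iterate over a known sequence.
squares <- numeric(5)          # pre-allocate the result vector
for (i in seq_len(5)) {
  squares[i] <- i^2
}
print(squares)

# while loop: repeat until the condition becomes FALSE.
# Count how many times 100 can be halved before it drops below 1.
value <- 100
halvings <- 0
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}
print(halvings)
```

Pre-allocating squares avoids growing the vector inside the loop, which is a common source of slowness in R.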

if/else control statements

if ( condition1 ) {
    expression1
} else if ( condition2 ) {
    expression2
} else {
    expression3
}
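
A small sketch of the if/else template in use. The function name classify_sign is hypothetical and chosen only for this example.

```r
# Hypothetical helper that labels a single number by its sign.
classify_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

print(classify_sign(3))
print(classify_sign(-1))
print(classify_sign(0))

# if () expects a single TRUE/FALSE value; for a whole vector,
# ifelse() applies the test element by element.
labels <- ifelse(c(-2, 0, 5) > 0, "positive", "not positive")
print(labels)
```

At the top level of a script or the console, else must appear on the same line as the closing brace of the preceding block, as in the template above; otherwise R treats the if statement as already complete.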

Ways to avoid loops (sometimes)

• tapply: apply a function (FUN) to a variable, separately for each level of a grouping variable
• lapply: apply a function (FUN) to each element of a given list
  – sapply: same as lapply, but the output is simplified to a vector or matrix where possible, which is more user-friendly
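
The three functions above can be sketched on a tiny made-up dataset. The scores, groups, and list contents are invented for illustration.

```r
# Made-up example data: six scores split across two groups.
scores <- c(70, 85, 90, 60, 75, 95)
group  <- c("a", "b", "a", "b", "a", "b")

# tapply: mean score within each level of the grouping variable.
group_means <- tapply(scores, group, mean)
print(group_means)

# lapply: apply a function to each list element; always returns a list.
samples <- list(x = 1:4, y = c(10, 20), z = 5)
list_means <- lapply(samples, mean)
print(list_means)

# sapply: same computation, simplified here to a named numeric vector.
vec_means <- sapply(samples, mean)
print(vec_means)
```

Each call replaces what would otherwise be an explicit loop over groups or list elements.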

Data simulation

• Can simulate data using the standard distribution functions, e.g. the core names norm and pois
• Use the ‘r’ prefix to generate random values from the distribution
  – rnorm(numVals, mean, sd)
  – rpois(numVals, mean)
• Use set.seed() if you want your simulated data to be reproducible
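
A minimal sketch of simulating data with rnorm() and rpois(). The seed, sample size, and distribution parameters are arbitrary assumptions; note that in R the mean of the Poisson distribution is passed as the argument lambda.

```r
set.seed(42)  # arbitrary seed, chosen only for reproducibility

# 1000 draws from a normal distribution with mean 170 and sd 10.
heights <- rnorm(1000, mean = 170, sd = 10)

# 1000 draws from a Poisson distribution with mean 3.
counts <- rpois(1000, lambda = 3)

# Sample statistics should be close to the parameters used above.
print(mean(heights))
print(sd(heights))
print(mean(counts))

# Re-setting the same seed reproduces exactly the same draws.
set.seed(42)
heights_again <- rnorm(1000, mean = 170, sd = 10)
stopifnot(identical(heights, heights_again))
```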

Standard distribution functions
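
Base R pairs each distribution's core name with four prefixes: d (density), p (cumulative probability), q (quantile), and r (random draws). A minimal sketch using norm and pois, with arbitrary argument values:

```r
# d: density of the standard normal at 0.
density_at_zero <- dnorm(0)

# p: cumulative probability P(Z <= 1.96).
prob_below <- pnorm(1.96)

# q: quantile, the value with 97.5% of the distribution below it.
upper_quantile <- qnorm(0.975)

# r: three random draws from the standard normal.
draws <- rnorm(3)

# The p and q functions are inverses of each other.
stopifnot(isTRUE(all.equal(qnorm(pnorm(1.5)), 1.5)))

# The same prefixes work for other core names, e.g. the Poisson:
prob_two_events <- dpois(2, lambda = 3)   # P(X = 2) when the mean is 3

print(c(density_at_zero, prob_below, upper_quantile, prob_two_events))
```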

Sampling

• Sample from a dataset using: sample(dataset, numItems, replace?)
  – replace? is TRUE to sample with replacement, FALSE (the default) to sample without
• Can use to simulate survey results or to bootstrap statistical estimates
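
A short sketch of sample() with and without replacement. The population vector and the data frame df are stand-ins invented for this example.

```r
set.seed(1)  # arbitrary seed

population <- 1:100   # stand-in for a dataset

# Without replacement: each item can be drawn at most once,
# as when surveying 10 distinct respondents.
survey <- sample(population, size = 10, replace = FALSE)

# With replacement: same size as the data, duplicates allowed,
# which is the kind of draw used in bootstrapping.
resample <- sample(population, size = length(population), replace = TRUE)

# To sample rows of a data frame, sample the row indices.
df <- data.frame(id = 1:5, value = c(2.1, 3.4, 1.8, 4.0, 2.9))
rows <- sample(nrow(df), size = 3)
df_subset <- df[rows, ]
print(df_subset)
```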

Bootstrap overview

• Method to measure the accuracy of estimates from a sample empirically
• For a sample of size n, draw many random samples, also of size n, with replacement
• Two ways to bootstrap regression estimates
  – residual resampling: add resampled regression residuals to the fitted values of the dependent variable and re-estimate
  – data resampling: sample complete cases of the original data and estimate coefficients
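
The bootstrap ideas above can be sketched on simulated data. Everything here is an assumption made for illustration: the data-generating model, the seed, and the number of bootstrap replicates (500). In the residual version, the resampled residuals are added to the fitted values from the original regression, which is the usual form of the residual bootstrap.

```r
set.seed(2011)  # arbitrary seed

# Simulated data (assumption): y depends linearly on x plus noise.
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)
n_boot <- 500

# Basic bootstrap: resample y with replacement (size n) and
# recompute the mean each time.
boot_means <- replicate(n_boot, mean(sample(y, replace = TRUE)))

# Original regression fit.
fit <- lm(y ~ x, data = dat)

# Data resampling: draw complete rows with replacement and refit.
slope_data <- replicate(n_boot, {
  rows <- sample(n, size = n, replace = TRUE)
  coef(lm(y ~ x, data = dat[rows, ]))[["x"]]
})

# Residual resampling: keep x fixed, build a new response from the
# fitted values plus resampled residuals, and refit.
fitted_y <- fitted(fit)
res <- resid(fit)
slope_resid <- replicate(n_boot, {
  y_star <- fitted_y + sample(res, size = n, replace = TRUE)
  coef(lm(y_star ~ x, data = dat))[["x"]]
})

# The standard deviation of the replicates estimates the standard error.
print(sd(boot_means))
print(sd(slope_data))
print(sd(slope_resid))
print(summary(fit)$coefficients["x", "Std. Error"])
```

The two bootstrap standard errors for the slope should be of similar magnitude to the one reported by summary(), though they will not match exactly.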

Recall: Boston Metadata

CRIM      per capita crime rate by town
ZN        proportion of residential land zoned for lots over 25,000 sq. ft
INDUS     proportion of non-retail business acres per town
CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX       nitrogen oxide concentration (parts per 10 million)
RM        average number of rooms per dwelling
AGE       proportion of owner-occupied units built prior to 1940
DIS       weighted distances to five Boston employment centres
RAD       index of accessibility to radial highways
TAX       full-value property-tax rate per $10,000
PTRATIO   pupil-teacher ratio by town
B         1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
LSTAT     % lower status of the population
MEDV      median value of owner-occupied homes in $1000's