Programming and Simulations
Frank Witmer, 6 January 2011
Outline
• General programming tips
• Programming loops
• Simulation
  – Distributions
  – Sampling
  – Bootstrapping
General Programming Tips
• Use meaningful variable names
• Include more comments than you think necessary
• Debugging your code
  – Since R is interpreted, variables created outside of functions remain available for inspection if execution stops with an error
  – Built-in debugging support: debug(), browser(), trace()
  – But adding print statements inside functions is generally sufficient
• Use an editor with syntax highlighting!
  – http://sourceforge.net/projects/npptor/
Loops
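• Because R is an interpreted language, every loop iteration is evaluated one step at a time by the interpreter, which adds overhead; growing objects inside a loop also forces them to be copied repeatedly
• So avoid explicit loops in computationally intensive analysis and prefer vectorized functions where they exist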
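To make the advice above concrete, here is a minimal sketch comparing an explicit loop with R's built-in vectorized arithmetic for the same sum of squares. The vector x is arbitrary random data chosen for illustration; wrapping each version in system.time() is one way to compare speeds on your own machine, and results will vary.

```r
# Arbitrary example data: 100,000 uniform random values
set.seed(1)
x <- runif(1e5)

# Loop version: preallocate the result vector instead of growing it with c()
sq_loop <- numeric(length(x))
for (i in seq_along(x)) {
  sq_loop[i] <- x[i]^2
}
total_loop <- sum(sq_loop)

# Vectorized version: the squaring and summing happen inside compiled code
total_vec <- sum(x^2)

# Both approaches give the same answer
all.equal(total_loop, total_vec)  # TRUE
```

Even when a loop is unavoidable, preallocating the output (as with numeric() above) avoids the repeated copying that comes from extending a vector on every iteration.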
For & While loop syntax
for (variable in sequence) {
  expression
  expression
}

while (condition) {
  expression
  expression
}
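As a concrete instance of the syntax templates above, the following sketch fills in the for loop with a calculation of squares and the while loop with repeated doubling. The variable names are illustrative only.

```r
# for: iterate over each element of the sequence 1:5
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}
print(squares)  # values 1, 4, 9, 16, 25

# while: keep doubling n until the condition n < 100 becomes FALSE
n <- 1
while (n < 100) {
  n <- n * 2
}
print(n)  # 128
```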
if/else control statements
if (condition1) {
  expression1
} else if (condition2) {
  expression2
} else {
  expression3
}
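A small worked example of the if / else if / else chain follows; the function name classify is hypothetical. Note that if() expects a single TRUE or FALSE, so for whole vectors the vectorized ifelse() is usually the better tool, which also ties in with avoiding loops.

```r
# Hypothetical helper: classify a single number by its sign
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

classify(-3)  # "negative"
classify(0)   # "zero"
classify(7)   # "positive"

# Vectorized alternative: ifelse() tests every element at once
signs <- ifelse(c(-3, 0, 7) < 0, "negative", "non-negative")
signs  # "negative" "non-negative" "non-negative"
```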
Ways to avoid loops (sometimes)
• tapply: apply a function (FUN) to subsets of a vector defined by a grouping variable (factor)
• lapply: apply a function (FUN) to each element of a list, returning a list
  – sapply: same as lapply but simplifies the output to a vector or matrix when possible, which is more user-friendly
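The sketch below shows each of these functions on small, made-up data (the income and region values are hypothetical). tapply groups income by region before taking the mean; lapply and sapply apply mean() to each element of a list, differing only in the shape of the result.

```r
# Hypothetical data: incomes (in $1000s) recorded in three regions
income <- c(30, 45, 28, 52, 61, 40)
region <- factor(c("A", "B", "A", "C", "C", "B"))

# tapply: mean income within each level of the grouping factor
group_means <- tapply(income, region, mean)
group_means  # A = 29, B = 42.5, C = 56.5

# A list whose elements have different lengths
vals <- list(a = 1:3, b = c(2.5, 3.5), c = 10)

# lapply: apply mean() to each element; the result is a list
list_means <- lapply(vals, mean)

# sapply: same computation, simplified to a named numeric vector
vec_means <- sapply(vals, mean)
vec_means  # a = 2, b = 3, c = 10
```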
Data simulation
• Can simulate data using R's standard distribution functions, which share core names, e.g. norm (normal) and pois (Poisson)
• Use the ‘r’ prefix to generate random values from a distribution
  – rnorm(numVals, mean, sd)
  – rpois(numVals, lambda), where lambda is the mean
• Use set.seed() if you want your simulated data to be reproducible
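Here is a minimal sketch of simulating data with these functions. The particular seed, sample sizes, and parameter values are arbitrary choices for illustration. Re-running set.seed() with the same value restores the random number generator's state, so the second rnorm() call reproduces the first draw exactly.

```r
set.seed(42)                                 # fix the RNG state for reproducibility
heights <- rnorm(100, mean = 170, sd = 10)   # 100 draws from a normal distribution
counts  <- rpois(100, lambda = 3)            # 100 Poisson counts with mean 3

# Resetting the same seed reproduces exactly the same normal draws
set.seed(42)
heights2 <- rnorm(100, mean = 170, sd = 10)
identical(heights, heights2)  # TRUE
```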
Standard distribution functions
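Besides the ‘r’ prefix, each core distribution name in base R also takes the prefixes ‘d’ (density or probability mass), ‘p’ (cumulative probability), and ‘q’ (quantile). The sketch below applies them to the normal and Poisson distributions; the specific arguments are arbitrary examples.

```r
# The four prefixes applied to the normal distribution (core name "norm")
dnorm(0)        # density at 0: about 0.3989
pnorm(1.96)     # P(Z <= 1.96): about 0.975
qnorm(0.975)    # the 97.5% quantile: about 1.96
rnorm(3)        # three random draws

# p and q are inverses of each other
all.equal(pnorm(qnorm(0.3)), 0.3)  # TRUE

# The same prefixes work with other core names, e.g. the Poisson
dpois(2, lambda = 3)   # P(X = 2) when the mean is 3: about 0.224
ppois(2, lambda = 3)   # P(X <= 2) when the mean is 3
```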
Sampling
• Sample from a dataset using: sample(dataset, numItems, replace = TRUE/FALSE)
• Can be used to simulate survey results or to bootstrap statistical estimates
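As an illustration of simulating survey results, the sketch below draws 500 answers to a three-option question. The response probabilities are hypothetical and are passed through sample()'s prob argument; sampling with replacement lets each option appear many times. The last line shows sampling without replacement, which picks distinct items.

```r
set.seed(1)

# Simulate 500 survey answers with assumed (hypothetical) response probabilities
responses <- sample(c("yes", "no", "undecided"), size = 500,
                    replace = TRUE, prob = c(0.5, 0.4, 0.1))
table(responses)   # counts of each answer

# Without replacement: choose 5 distinct values from 1:20
chosen <- sample(1:20, 5)
```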
Bootstrap overview
• An empirical method for measuring the accuracy (e.g. the standard error) of estimates computed from a sample
• For a sample of size n, draw many random resamples from it, also of size n, with replacement, and recompute the estimate on each one
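The procedure above can be sketched for the sample mean as follows. The original sample x is simulated exponential data standing in for real observations, and B = 2000 resamples is an arbitrary choice. The spread of the bootstrap means estimates the standard error, which for the mean can be checked against the usual formula sd(x)/sqrt(n).

```r
set.seed(123)
x <- rexp(50, rate = 1)   # hypothetical original sample, n = 50
n <- length(x)
B <- 2000                 # number of bootstrap resamples

# Each resample has size n and is drawn with replacement from x
boot_means <- replicate(B, mean(sample(x, n, replace = TRUE)))

boot_se <- sd(boot_means)                         # bootstrap standard error of the mean
ci <- quantile(boot_means, c(0.025, 0.975))       # simple percentile 95% interval

# For comparison: the textbook standard error of the mean
formula_se <- sd(x) / sqrt(n)
```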
• Two ways to bootstrap regression estimates
  – residual resampling: add resampled regression residuals to the fitted values of the original regression to form a new dependent variable, then re-estimate
  – data resampling: sample complete cases (rows) of the original data with replacement and re-estimate the coefficients
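Both approaches can be sketched with lm() on simulated data; the data-generating model y = 2 + 0.5x + noise is a hypothetical stand-in for a real dataset. Each replicate() call returns a 2 × B matrix of coefficients (intercept and slope), and the standard deviation of each row is the bootstrap standard error, which can be compared with the standard errors reported by summary().

```r
set.seed(2011)
# Hypothetical simulated data: y depends linearly on x plus normal noise
n <- 100
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(n, sd = 1)

fit <- lm(y ~ x, data = dat)
B <- 1000
fitted_y <- fitted(fit)
res <- residuals(fit)

# Residual resampling: new response = fitted values + resampled residuals
resid_boot <- replicate(B, {
  dat_star <- dat
  dat_star$y <- fitted_y + sample(res, n, replace = TRUE)
  coef(lm(y ~ x, data = dat_star))
})

# Data (case) resampling: resample whole rows with replacement and refit
case_boot <- replicate(B, {
  rows <- sample(n, n, replace = TRUE)
  coef(lm(y ~ x, data = dat[rows, ]))
})

# Bootstrap standard errors (one per coefficient) vs. the OLS formula
se_resid <- apply(resid_boot, 1, sd)
se_case  <- apply(case_boot, 1, sd)
se_ols   <- coef(summary(fit))[, "Std. Error"]
```

Residual resampling keeps the x values fixed and assumes the model's error structure is correct, while case resampling makes fewer assumptions but varies the design from resample to resample.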
Recall: Boston Metadata
CRIM      per capita crime rate by town
ZN        proportion of residential land zoned for lots over 25,000 ft²
INDUS     proportion of non-retail business acres per town
CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX       nitrogen oxide concentration (parts per 10 million)
RM        average number of rooms per dwelling
AGE       proportion of owner-occupied units built prior to 1940
DIS       weighted distances to five Boston employment centres
RAD       index of accessibility to radial highways
TAX       full-value property-tax rate per $10,000
PTRATIO   pupil-teacher ratio by town
B         1000(Bk − 0.63)^2, where Bk is the proportion of blacks by town
LSTAT     % lower status of the population
MEDV      median value of owner-occupied homes in $1000s