Introduction To Programming In R
Last updated November 20 2013
The Institute for Quantitative Social Science at Harvard University
Outline
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplemental)
Workshop overview and materials
Workshop description
This is an intermediate/advanced R course
Appropriate for those with basic knowledge of R
Learning objectives:
  Index data objects by position, name or logical condition
  Understand looping and branching
  Write your own simple functions
  Debug functions
  Understand and use the S3 object system
Running example
Throughout this workshop we will return to a running example that involves calculating descriptive statistics for every column of a data.frame. We will often use the built-in iris data set. You can load the iris data by evaluating data(iris) at the R prompt.
Our main example today consists of writing a statistical summary function that calculates the min, mean, median, max, sd, and n for all numeric columns in a data.frame, the correlations among these variables, and the counts and proportions for all categorical columns. Typically I will describe a topic and give some generic examples, then ask you to use the technique to start building the summary.
Materials and setup
Lab computer users:
  USERNAME dataclass
  PASSWORD dataclass
Download materials from http://projects.iq.harvard.edu/rtc/r-prog
Scroll to the bottom of the page and download the r-programming.zip file
Move it to your desktop and extract
Data types
Vectors and data classes
Values can be combined into vectors using the c() function
> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>
Vectors have a class which determines how functions treat them
> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"
Vector conversion and info
Vectors can be converted from one class to another
> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA
In addition to class, you can examine the length() and str() (structure) of vectors
> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for:
  Modeling (automatically contrast coded)
  Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors
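The factor slide has no code, so here is a minimal sketch of the two points it makes: factors are stored as integer codes with character labels, and they sort in level order. The variable name size and its values are made up for illustration.

```r
# A small factor with an explicit level order (hypothetical values)
size <- factor(c("small", "large", "medium", "small"),
               levels = c("small", "medium", "large"))

levels(size)       # the character labels, in the order we chose
as.integer(size)   # the underlying integer codes
sort(size)         # sorted by level order, not alphabetically
table(size)        # counts per level
```

Because the level order is set explicitly, "medium" sorts before "large" even though it comes later in the alphabet.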
Lists and dataframes
A data.frame is a list of vectors, each of the same length
A list is a collection of objects, each of which can be almost anything
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points:
  vector classes include numeric, logical, character, and factors
  vectors can be combined into lists or data.frames
  a data.frame can almost always be thought of as a list of vectors of equal length
  a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section:
  c            combine elements
  as.numeric   convert an object (e.g., a character vector) to numeric
  data.frame   combine objects into a data.frame
  ls           list the objects in the workspace
  class        get the class of an object
  str          get the structure of an object
  length       get the number of elements in an object
  mean         calculate the mean of a vector
Exercise 0
1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "tests", as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1 Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2 Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3 Determine the class of "students" and "test"
class(students)
class(test)
4 Create a data frame containing two columns, "students" and "tests", as defined above
testScores <- data.frame(students, tests = test) # the scores vector is named test; the column is named tests
5 Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name
> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
  ==     equal to
  !=     not equal to
  >      greater than
  <      less than
  >=     greater than or equal to
  <=     less than or equal to
  %in%   is included in
  &      and
  |      or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
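The slide notes that logical vectors can be used to replace elements as well as select them, but only shows selection. A minimal sketch of replacement, starting from a fresh copy of x (the replacement values 0 and -1 are arbitrary):

```r
x <- 101:110
names(x) <- letters[1:10]

big <- x > 106        # logical vector marking the elements to change
x[big] <- 0           # replace only the elements where big is TRUE
x

# %in% builds a logical vector from set membership
x[names(x) %in% c("a", "b")] <- -1
x
```

Only the positions where the logical index is TRUE are overwritten; all other elements keep their values and names.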
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns
> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index in M[1:3, ] above) return all values
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
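The examples above index the list by position. Because the elements of L are named, the same single- and double-bracket forms also work with names. A short sketch that rebuilds DF and L from the earlier slide so it runs on its own:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])
L <- list(x = 1:5, y = 1:3, z = DF)

L["y"]            # single brackets: still a list, with one element
L[["y"]]          # double brackets: the integer vector itself
L$y               # $ is shorthand for [[ ]] with a name
L[["z"]][["x"]]   # indexing can be chained to reach nested elements
```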
Indexing dataframes
A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
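If you want matrix-style indexing but need to keep the data.frame structure, the drop argument of the [ method for data.frames controls this. A small sketch, with DF rebuilt so the block is self-contained:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

v  <- DF[, 1]                 # matrix-style, one column: dropped to a vector
d1 <- DF[1]                   # list-style: stays a data.frame
d2 <- DF[, 1, drop = FALSE]   # matrix-style, but asked not to drop

class(v)
class(d1)
class(d2)
```

This matters mostly inside functions, where a column selection that happens to match a single column would otherwise silently change the type of the result.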
Extraction/replacement summary
Key points:
  elements of objects can be extracted or replaced using the [ operator
  objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  vectors and lists have only one dimension, and hence only one index is used
  matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section:
  [       extraction operator, used to extract/replace object elements
  names   get the names of an object, usually a vector, list, or data.frame
  print   print an object
Exercise 1
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2 Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, print(mean(Sepal.Length[Species == "setosa"])))
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
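Both calls above use a margin of 2 (columns). To make the row/column distinction concrete, here is a short sketch comparing the two margins on the same matrix; rowMeans() is a base R shortcut for the row case.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)   # MARGIN = 1: one value per row
col_means <- apply(M, 2, mean)   # MARGIN = 2: one value per column

row_means
col_means
rowMeans(M)                      # same result as apply(M, 1, mean)
```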
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column
Applying functions summary
Key points:
  R has convenient methods for applying functions to matrices, lists, and data.frames
  other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
Functions introduced in this section:
  matrix       create a matrix (vector with two dimensions)
  apply        apply a function to the rows or columns of a matrix
  sapply       apply a function to the elements of a list
  is.numeric   returns TRUE or FALSE, depending on the type of object
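The summary names lapply, tapply, and mapply without showing them. A brief sketch of each on the built-in iris data; the anonymous function passed to mapply is made up for illustration.

```r
data(iris)

# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to groups of a vector defined by a factor
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function over several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

str(col_classes)
mean_by_species
sums
```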
Writing functions
Functions
A function is a collection of commands that takes input(s) and returns output
If you have a specific analysis or transformation you want to do on different data, use a function
Functions are defined using the function() function
Functions can be defined with any number of named arguments
Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
Function return value
The return value of a function can be:
  The value of the last expression evaluated in the body of the function
  Objects explicitly returned with the return() function
Other function output can come from:
  Calls to print(), message() or cat() in the function body
  Error messages
Assignment inside the body of a function takes place in a local environment
Example:
> f <- function() { # define function f
+ print("setting x to 1") # print a text string
+ x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
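A compact sketch of the same ideas: the value of the last expression is returned, return() exits early, and assignment inside the body does not touch a global variable of the same name. The function names last_value and early_exit are hypothetical.

```r
x <- 100                     # a global x

last_value <- function() {
  x <- 1                     # local x; the global x is untouched
  x + 1                      # last evaluated expression is the return value
}

early_exit <- function(v) {
  if (length(v) == 0) return(NA)   # return() leaves the function immediately
  max(v)
}

last_value()
x                            # still 100
early_exit(numeric(0))
early_exit(c(3, 7, 5))
```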
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+ return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
Stepping through functions and setting breakpoints
Use traceback() to see what went wrong after the fact
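The debugging tools are interactive, so the sketch below keeps those calls commented out and only runs the non-interactive parts. scale_to_one is a hypothetical helper written for this illustration; debug(), undebug(), browser(), and traceback() are the base R functions listed in the section summary.

```r
# Hypothetical helper used only to illustrate the debugging tools
scale_to_one <- function(v) {
  if (!is.numeric(v)) stop("v must be numeric")
  # browser()             # uncomment to pause here and inspect v interactively
  v / max(v)
}

scale_to_one(c(2, 4, 8))

# Interactive use (commented out so the script runs unattended):
# debug(scale_to_one)      # step through the function on its next call
# scale_to_one(c(2, 4, 8))
# undebug(scale_to_one)    # turn stepping off again
# scale_to_one("a")        # triggers the error
# traceback()              # then shows the calls that led to the error
```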
Writing functions summary
Key points:
  writing new functions is easy!
  most functions will have a return value, but functions can also print things, write things to file, etc.
  functions can be stepped through to facilitate debugging
Functions introduced in this section:
  function    defines a new function
  return      used inside a function definition to set the return value
  browser     sets a break point
  debug       turns on the debugging flag of a function so you can step through it
  undebug     turns off the debugging flag
  traceback   shows the error stack (call after an error to see what went wrong)
Exercise 2
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable
Exercise 2 prototype
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
Basic idea: if some condition is true, do one thing. If false, do something else
Carried out in R using if() and else statements, which can be nested if necessary
Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+ if(!is.numeric(x) | length(x) > 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
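A variant of the same idea, sketched under two assumptions that go beyond the slide: the function signals a real error with stop() instead of printing a message, and it returns a string instead of printing with cat(), which makes the result easier to reuse. The name check_sign is hypothetical.

```r
check_sign <- function(x) {
  # || evaluates the second test only if the first is FALSE
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

check_sign(10)
check_sign(0)
check_sign(-1)
```

Calling check_sign("a") stops with an error rather than continuing, so code that depends on the result cannot silently proceed with bad input.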
Control flow summary
Key points:
  code can be conditionally executed
  conditions can be nested
  conditional execution is often used for argument checking, among other things
Functions [1] introduced in this section:
  cat    concatenates and prints R objects
  if     execute code only if condition is met
  else   used with if; code to execute if condition is not met
[1] Technically if and else are not functions, but this need not concern us at the moment
Exercise 3
1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)
2 Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system
R has two major object systems:
  Relatively informal "S3" classes
  Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
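When an object has several classes, inherits() is the usual base R way to ask whether one of them is present, and unclass() strips the class attribute again. A minimal sketch using the same made-up class names as above:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")     # TRUE: "A" is one of the classes
inherits(x, "B")     # TRUE: so is "B"
inherits(x, "foo")   # FALSE

unclass(x)           # back to a plain integer vector
```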
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) # use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
S3 classes summary
Key points:
  there are several class systems in R, of which S3 is the oldest and simplest
  objects have class and functions have corresponding methods
  the class of an object can be set by simple assignment
  S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
  plot        creates a graphical display, the type of which depends on the class of the object being plotted
  methods     lists the methods defined for a function or class
  UseMethod   the body of a generic function
  invisible   returns an object but does not print it
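The key points can be pulled together in one small sketch: a new generic, a default method, and a class-specific method chosen by the naming scheme generic.class. The generic describe and the class temperature are hypothetical names invented for this illustration.

```r
# A new generic: the body just dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# Method for objects of (hypothetical) class "temperature"
describe.temperature <- function(x, ...) {
  paste(length(x), "temperature readings")
}

temps <- c(20.5, 22.1)
class(temps) <- "temperature"   # set the class by simple assignment

describe(temps)   # dispatches to describe.temperature
describe(1:3)     # no integer method, so describe.default is used
```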
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class
Exercise 4 prototype
1 Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)
2 Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))
3 Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas
There are an unfortunately large number of surprises in R programming
Some of these "gotchas" are common problems in other languages, many are unique to R
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
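One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description of the difference rather than FALSE, so it is normally wrapped in isTRUE(). A short sketch:

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # FALSE: exact comparison of doubles
isTRUE(all.equal(a, b))      # TRUE: equal within numerical tolerance

all.equal(1, 1.1)            # a character string describing the difference
isTRUE(all.equal(1, 1.1))    # FALSE: safe to use as an if() condition
```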
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing
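The two solutions named above, sketched on the same vector:

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # drop missing values before averaging

is.na(x)                 # logical vector, TRUE where x is missing
sum(is.na(x))            # how many values are missing
which(is.na(x))          # where they are
x[!is.na(x)]             # x with the missing values removed
```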
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
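The practical consequence of the two signatures: mean() takes its data as a single argument x, so the values need to be combined with c() first, while sum() accepts them through the dots. A short sketch:

```r
mean(1, 2, 3, 4, 5)      # only the first argument is treated as x
mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
sum(1, 2, 3, 4, 5)       # sum() takes its values through ...
sum(c(1, 2, 3, 4, 5))    # a single vector works too
```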
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
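An equivalent way to recover the original numbers is to convert the levels once and then index them by the factor's integer codes. This is offered as an alternative sketch, not as the approach used in the workshop:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes     <- as.numeric(x)                 # integer codes: 2 2 1 1
via_chr   <- as.numeric(as.character(x))   # labels converted to numbers
via_level <- as.numeric(levels(x))[x]      # convert the levels, index by code

codes
via_chr
via_level
```

Both routes give 5 5 6 6; the second converts each distinct level only once, which can matter for long factors with few levels.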
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
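A common variation on this pattern, sketched here as an assumption rather than as part of the workshop code, is to create the result object at its final length before the loop starts. vector("list", n) and seq_len(n) are base R functions; n and lengths_of_result are names chosen for this example.

```r
n <- 5

# Preallocate a list with n empty (NULL) slots
Result <- vector("list", n)

# seq_len(n) is 1, 2, ..., n and is empty when n is 0,
# so the loop body is skipped instead of running for 1 and 0
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# Check how many values ended up in each slot
lengths_of_result <- sapply(Result, length)
```

For small loops the difference is negligible; preallocating mainly avoids growing the object on every iteration.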
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns
> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data
> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
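In keeping with the earlier caution about overusing loops, the same three steps can be sketched without any explicit loop by using sapply(), which the workshop covers in the section on applying functions. The object names below are chosen for this example only.

```r
data(iris)

# Step 1: class of each column
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris_numeric <- iris[classes == "numeric"]

# Step 3: mean of each numeric column
iris_means <- sapply(iris_numeric, mean)
```

The result is a named numeric vector with one mean per numeric column, matching the loop-based prototype.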
Workshop overview and materials
Materials and setup
Lab computer users:
USERNAME: dataclass
PASSWORD: dataclass
Download materials from http://projects.iq.harvard.edu/rtc/r-prog
Scroll to the bottom of the page and download the r-programming.zip file
Move it to your desktop and extract
Data types
Vectors and data classes
Values can be combined into vectors using the c() function
> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>
Vectors have a class which determines how functions treat them
> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"
Vector conversion and info
Vectors can be converted from one class to another
> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA
In addition to class, you can examine the length() and str()ucture of vectors
> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for
Modeling (automatically contrast coded)
Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors
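The point about presenting values in arbitrary order can be sketched with a small made-up example. The values and variable names below are hypothetical; factor(), levels and sort() are base R.

```r
# Hypothetical ratings, entered in no particular order
sizes <- c("low", "high", "medium", "low")

# The levels argument fixes the order we want, not alphabetical order
f <- factor(sizes, levels = c("low", "medium", "high"))

# Internally each value is stored as the position of its level
codes <- as.integer(f)

# Sorting a factor follows the level order
sorted_labels <- as.character(sort(f))

# Sorting the plain character vector is alphabetical
sorted_alpha <- sort(sizes)
```

The factor sorts as low, low, medium, high, while the character vector sorts with "high" first.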
Lists and dataframes
A data.frame is a list of vectors, each of the same length
A list is a collection of objects each of which can be almost anything
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points
vector classes include numeric, logical, character, and factors
vectors can be combined into lists or data.frames
a data.frame can almost always be thought of as a list of vectors of equal length
a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section
c: combine elements
as.numeric: convert an object (e.g., a character vector) to numeric
data.frame: combine objects into a data.frame
ls: list the objects in the workspace
class: get the class of an object
str: get the structure of an object
length: get the number of elements in an object
mean: calculate the mean of a vector
Exercise 0
1. Create a new vector called test containing five numbers of your choice [ c(), <- ]
2. Create a second vector called students containing five common names of your choice [ c(), <- ]
3. Determine the class of students and test [ class() or str() ]
4. Create a data frame containing two columns, students and tests as defined above [ data.frame ]
5. Convert test to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called test containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2. Create a second vector called students containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3. Determine the class of students and test
class(students)
class(test)
4. Create a data frame containing two columns, students and tests as defined above
testScores <- data.frame(students, tests = test) # the vector is named test; the column is named tests
5. Convert test to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name
> # indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
  ==     equal to
  !=     not equal to
  >      greater than
  <      less than
  >=     greater than or equal to
  <=     less than or equal to
  %in%   is included in
  &      and
  |      or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
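The slide shows logical vectors used for extraction; the same logical vectors also work for counting and for replacement. A minimal sketch, with variable names chosen for illustration:

```r
x <- 101:110
names(x) <- letters[1:10]

# A logical vector: TRUE where the condition holds
big <- x > 106

# which() gives the positions of the TRUE values; sum() counts them
positions <- which(big)
n_big <- sum(big)

# %in% tests membership, element by element
first_two <- names(x) %in% c("a", "b")

# Replacement: assign to the selected elements only
x[big] <- 0L
```

After the last line the four elements that were greater than 106 are zero and the rest are unchanged.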
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns
> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values.
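One detail worth a sketch: when a single row or column is extracted, R drops the matrix dimensions and returns a plain vector (as in the last example above). The drop = FALSE argument of the [ operator keeps the result a matrix. The names col_vec, col_mat and row_mat are illustrative.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

# Default: a single column comes back as a vector without dimensions
col_vec <- M[, 2]

# drop = FALSE keeps a 5 x 1 matrix, including the column name
col_mat <- M[, 2, drop = FALSE]

# The same applies to a single row: a 1 x 3 matrix
row_mat <- M[1, , drop = FALSE]
```

This matters mostly inside functions, where the number of selected rows or columns is not known in advance.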
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
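Lists can also be indexed by name, just like vectors. The sketch below rebuilds the DF and L objects from the earlier slides so that it runs on its own; the result names are illustrative.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])
L <- list(x = 1:5, y = 1:3, z = DF)

# Double brackets with a name extract the element itself
by_name <- L[["y"]]

# The $ operator is a shortcut for the same thing
by_dollar <- L$y

# Double brackets can be chained to reach into nested objects
nested <- L[["z"]][["x"]]

# Single brackets with a name still return a list
sub_list <- L["y"]
```

As with positions, single brackets keep the list wrapper and double brackets remove it.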
Indexing dataframes
A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
Extractionreplacement summary
Key points
elements of objects can be extracted or replaced using the [ operator
objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
vectors and lists have only one dimension, and hence only one index is used
matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section
[: extraction operator, used to extract/replace object elements
names: get the names of an object, usually a vector, list, or data.frame
print: print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[, "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
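Both examples above use 2 as the second argument, which means "work on columns". A sketch of the other case, with 1 meaning "work on rows", together with the base R shortcuts rowSums() and colMeans() for comparison; the result names are illustrative.

```r
M <- matrix(1:20, ncol = 4)

# Second argument 1: apply the function to each row
row_sums <- apply(M, 1, sum)

# Second argument 2: apply the function to each column
col_means <- apply(M, 2, mean)

# Base R shortcuts for these two common cases
row_sums_fast <- rowSums(M)
col_means_fast <- colMeans(M)
```

For sums and means the shortcuts give the same numbers; apply() is the general tool for any other function.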
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
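The same idea applied to the running iris example might look like the sketch below. It uses vapply(), a base R relative of sapply() that additionally checks that every result has the stated type and length (here logical(1)); this is an optional substitution, not something the workshop requires. Object names are illustrative.

```r
data(iris)

# One TRUE/FALSE per column; vapply() errors if a result is not a single logical
is_num <- vapply(iris, is.numeric, logical(1))

# List-style indexing keeps the result a data.frame in both cases
iris_numeric <- iris[is_num]
iris_other <- iris[!is_num]
```

Negating the logical vector with ! selects the remaining, non-numeric columns.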
Applying functions summary
Key points
R has convenient methods for applying functions to matrices, lists, and data.frames
other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
Functions introduced in this section
matrix: create a matrix (vector with two dimensions)
apply: apply a function to the rows or columns of a matrix
sapply: apply a function to the elements of a list
is.numeric: returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function.
Functions are defined using the function() function.
Functions can be defined with any number of named arguments.
Arguments can be of any type (e.g., vectors, data.frames, lists...).
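As a small sketch of these points, here is a hypothetical function (rescale is not part of the workshop materials) with one required argument and one named argument that has a default value.

```r
# Hypothetical example: standardize a numeric vector
# x       required, a numeric vector
# center  optional, defaults to TRUE
rescale <- function(x, center = TRUE) {
  if (center) {
    x <- x - mean(x)   # subtract the mean when centering is requested
  }
  x / sd(x)            # the last evaluated expression is the return value
}

# Use the default for center
z <- rescale(c(2, 4, 6))

# Override the default by naming the argument
raw <- rescale(c(2, 4, 6), center = FALSE)
```

Naming the optional argument in the call makes it clear which setting is being changed.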
Function return value
The return value of a function can be:
The value of the last expression evaluated in the body of the function
Objects explicitly returned with the return() function
Other function output can come from:
Calls to print(), message() or cat() in function body
Error messages
Assignment inside the body of a function takes place in a local environment
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
Stepping through functions and setting breakpoints
Use traceback() to see what went wrong after the fact
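A rough sketch of how these tools fit together is given below. Stepping through code is interactive, so the interactive calls are shown as comments; the only part that runs is switching the debugging flag on and off, which base R lets us inspect with isdebugged(). The square function here is the one from the earlier example.

```r
square <- function(x) {
  # browser()   # uncomment to pause here and inspect x interactively
  x * x
}

# Interactive use (not run here):
# debug(square)     # the next call to square() is stepped through line by line
# square(3)
# undebug(square)   # back to normal
# traceback()       # after an error, shows the sequence of calls that led to it

# Non-interactive check that the debugging flag is set and cleared
debug(square)
flag_on <- isdebugged(square)
undebug(square)
flag_off <- isdebugged(square)
```

Remember to call undebug(), or remove browser(), once the problem has been found.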
Writing functions summary
Key points
writing new functions is easy!
most functions will have a return value, but functions can also print things, write things to file, etc.
functions can be stepped through to facilitate debugging
Functions introduced in this section
function: defines a new function
return: used inside a function definition to set the return value
browser: sets a break point
debug: turns on the debugging flag of a function so you can step through it
undebug: turns off the debugging flag
traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if() and else statements, which can be nested if necessary.
Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not numeric or not of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
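A variation on the same branching logic is sketched below. It is a hypothetical helper (signLabel is not from the workshop) that returns a label instead of printing it, and that signals a real error with stop() when the argument check fails. It also uses ||, which stops evaluating as soon as the first condition is TRUE.

```r
# Hypothetical helper: returns "positive", "zero" or "negative"
signLabel <- function(x) {
  # || short-circuits: length(x) is only checked when x is numeric
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

# try() lets us capture the error without stopping the script
bad_input <- try(signLabel("a"), silent = TRUE)
```

Returning a value rather than printing makes the function easier to reuse and to test.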
Control flow summary
Key points
code can be conditionally executed
conditions can be nested
conditional execution is often used for argument checking, among other things
Functions (1) introduced in this section
cat: concatenates and prints R objects
if: execute code only if condition is met
else: used with if; code to execute if condition is not met
(1) Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
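Because an object can have several classes, comparing class(x) to a single string gives one result per class. The base R function inherits() asks the question "is this class among them?" and always returns a single TRUE or FALSE. A minimal sketch with illustrative variable names:

```r
x <- 1:10
class(x) <- c("A", "B")

# inherits() returns one TRUE/FALSE regardless of how many classes x has
is_A <- inherits(x, "A")
is_B <- inherits(x, "B")
is_foo <- inherits(x, "foo")

# A direct comparison returns one value per class
cmp <- class(x) == "A"
```

This is why inherits() (or a helper such as is.data.frame()) is the safer choice inside an if() condition.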
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted()
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
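A generic with only one method fails for every other class. One common remedy is a default method; the sketch below extends the disp() example with a disp.default() fallback, which is an addition for illustration and not part of the original slides:

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices, as in the slide
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No disp method for class", class(x)[1], "- using print()\n")
  print(x)
}

disp(matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2))  # dispatches to disp.matrix
disp(letters[1:3])                                      # falls back to disp.default
```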
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have a class, and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions written in R call UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class
Functions introduced in this section
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: dispatches to the appropriate method; forms the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class
Exercise 4 prototype
1 Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  # is.data.frame() also works for objects that carry more than one class
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))}
statsum(iris)
2 Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))   # prepend the new class
  return(R)}
str(statsum(iris))
3 Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these gotchas are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
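One detail worth knowing: when two values differ, all.equal() returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE() inside conditions. A short sketch (the variable names a and b and the tolerance 1e-8 are arbitrary choices for illustration):

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # exact comparison of two nearly identical doubles
isTRUE(all.equal(a, b))      # comparison within the default numerical tolerance

# all.equal() returns a character description (not FALSE) when values differ,
# so wrap it in isTRUE() before using it in if()
if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}

# an explicit tolerance check is another option
abs(a - b) < 1e-8
```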
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
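The two solutions can be sketched on the same vector as above (x_complete is a name introduced here only for illustration):

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)        # drop missing values before averaging
sd(x, na.rm = TRUE)
sum(is.na(x))                # how many values are missing
which(is.na(x))              # positions of the missing values
x_complete <- x[!is.na(x)]   # keep only the observed values
length(x_complete)
```

Note that is.na(x) is the right test here; x == NA would itself return NA for every element.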
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
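The practical consequence is that mean() averages only its first argument, so the values should be combined into one vector first. A minimal sketch (vals is a name introduced for illustration):

```r
vals <- c(1, 2, 3, 4, 5)

mean(vals)                # the whole vector is the x argument
sum(vals)                 # sum of the vector
sum(1, 2, 3, 4, 5)        # sum() adds up everything passed through ...

# mean(1, 2, 3, 4, 5) treats only the first value as x; the remaining values
# are matched to other arguments of the method (such as trim), not averaged
```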
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters, which can be confusing:
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
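An equivalent idiom converts the levels once and then indexes them by the factor's integer codes. The sketch below compares the three conversions on the same factor; the variable names codes, via_char, and via_levels are introduced here for illustration:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes      <- as.numeric(x)                 # internal integer codes, not the labels
via_char   <- as.numeric(as.character(x))   # labels converted back to numbers
via_levels <- as.numeric(levels(x))[x]      # same result, converting each level only once

codes
via_char
via_levels
```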
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you: tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
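A while loop keeps running for as long as its condition is TRUE, so it can be worth adding a safety limit when the stopping event is random. The following sketch counts the rolls and gives up after max_rolls attempts; max_rolls and n_rolls are names added for illustration and are not part of the original example:

```r
set.seed(15)                 # any seed works; it only makes the run repeatable
dice <- 1:6
max_rolls <- 1000            # assumed safety limit
n_rolls <- 0
roll <- 0
while (roll < 12 && n_rolls < max_rolls) {
  roll <- sample(dice, 1) + sample(dice, 1)   # sum of two dice
  n_rolls <- n_rolls + 1                      # count this attempt
}
cat("Stopped after", n_rolls, "rolls; last roll was", roll, "\n")
```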
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:
> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
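Growing an object inside a loop works, but when the number of results is known in advance it is common to create the container at full size first. A minimal sketch of that variation (n and squares are names introduced here):

```r
n <- 5
Result <- vector("list", n)      # a list with n empty (NULL) slots
for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)
}
length(Result)
Result[[3]]

squares <- numeric(n)            # same idea for a numeric vector
for (i in seq_len(n)) squares[i] <- i^2
squares
```

Using seq_len(n) rather than 1:n avoids the surprise that 1:0 counts down through 1 and 0 instead of producing an empty sequence.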
Word of caution: don't overuse loops!
Most operations in R are vectorized; this makes loops unnecessary in many cases
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
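The list-filling loop from the previous slide can also be written without an explicit loop. This is one possible sketch using lapply() and sprintf(); msg is a name introduced for illustration:

```r
# the list-filling loop, rewritten with lapply()
Result <- lapply(1:5, function(i) 1:i)
Result[[3]]

# vectorized string building with sprintf()
msg <- sprintf("%d squared is %d", 1:5, as.integer((1:5)^2))
msg
```

lapply() always returns a list with one element per input value, so no container has to be created beforehand.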
Exercise 5
1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1 use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2 use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3 use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
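For comparison, the same three steps can be done without explicit loops. This is a sketch of one alternative, not the workshop's official solution; num_cols and iris_means are names introduced here:

```r
data(iris)
num_cols <- sapply(iris, is.numeric)        # logical vector: TRUE for numeric columns
iris_means <- sapply(iris[num_cols], mean)  # mean of each numeric column
iris_means
colMeans(iris[num_cols])                    # built-in shortcut giving the same values
```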
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Workshop overview and materials
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 3
71
Workshop overview and materials
Workshop description
This is an intermediateadvanced R courseAppropriate for those with basic knowledge of RLearning objectives
Index data objects by position name or logical conditionUnderstand looping and branchingWrite your own simple functionsDebug functionsUnderstand and use the S3 object system
Introduction To Programming In RLast updated November 20 2013 4
71
Workshop overview and materials
Running example
Throughout this workshop we will return to a running example that involvescalculating descriptive statistics for every column of a dataframe We willoften use the built-in iris data set You can load the iris data by evaluatingdata(iris) at the R promptOur main example today consists of writing a statistical summary functionthat calculates the min mean median max sd and n for all numericcolumns in a dataframe the correlations among these variables and thecounts and proportions for all categorical columns Typically I will describe atopic and give some generic examples then ask you to use the technique tostart building the summary
Introduction To Programming In RLast updated November 20 2013 5
71
Workshop overview and materials
Materials and setup
Lab computer usersUSERNAME dataclassPASSWORD dataclass
Download materials fromhttpprojectsiqharvardedurtcr-progScroll to the bottom of the page and download ther-programmingzip fileMove it to your desktop and extract
Introduction To Programming In RLast updated November 20 2013 6
71
Data types
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 7
71
Data types
Vectors and data classes
Values can be combined into vectors using the c() function
gt numvar lt- c(1 2 3 4) numeric vectorgt charvar lt- c(1 2 3 4) character vectorgt logvar lt- c(TRUE TRUE FALSE TRUE) logical vectorgt charvar2 lt- c(numvar charvar) numbers coverted to charactergt
Vectors have a class which determines how functions treat themgt class(numvar)[1] numericgt mean(numvar) take the mean of a numeric vector[1] 25gt class(charvar)[1] charactergt mean(charvar) cannot average characters[1] NAgt class(charvar2)[1] character
Introduction To Programming In RLast updated November 20 2013 8
71
Data types
Vector conversion and info
Vectors can be converted from one class to anothergt class(charvar2)[1] charactergt numvar2 lt- asnumeric(charvar2) convert to numericgt class(numvar2)[1] numericgt mean(asnumeric(charvar2)) now we can calculate the mean[1] 25gt asnumeric(c(a b c)) cannot convert letters to numeric[1] NA NA NA
In addition to class you can examine the length()and str() ucture of vectors
gt ls() list objects in our workspace[1] charvar charvar2 logvar numvar numvar2gt length(charvar) how many elements in charvar[1] 4gt str(numvar2) what is the structure of numvar2num [18] 1 2 3 4 1 2 3 4
Introduction To Programming In RLast updated November 20 2013 9
71
Data types
Factor vectors
Factors are stored as numbers but have character labels Factors are usefulfor
Modeling (automatically contrast coded)Sortingpresenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors
Introduction To Programming In RLast updated November 20 2013 10
71
Data types
Lists and dataframes
A dataframe is a list of vectors each of the same lengthA list is a collection of objects each of which can be almost anything
gt DF lt- dataframe(x=15 y=letters[15])gt DF dataframe with two columns and 5 rows
x y1 1 a2 2 b3 3 c4 4 d5 5 egtgt DF lt- dataframe(x=110 y=17) illegal becase lengths differgt L lt- list(x=15 y=13 z = DF)gt L lists are much more flexible$x[1] 1 2 3 4 5
$y[1] 1 2 3
$zx y
1 1 a2 2 b3 3 c4 4 d5 5 e
Introduction To Programming In RLast updated November 20 2013 11
71
Data types
Data types summary
Key pointsvector classes include numeric logical character and factorsvectors can be combined into lists or dataframesa dataframe can almost always be thought of as a list of vectors ofequal lengtha list is a collection of objects each of which can by of almost any type
Functions introduced in this sectionc combine elements
asnumeric convert an object (eg a character verctor) to numericdataframe combine oject into a dataframe
ls list the objects in the workspaceclass get the class of an objectstr get the structure of an object
length get the number of elements in an objectmean calculate the mean of a vector
Introduction To Programming In RLast updated November 20 2013 12
71
Data types
Exercise 0
1 Create a new vector called test containing five numbers of your choice[ c() lt- ]
2 Create a second vector called students containing five common namesof your choice [ c() lt- ]
3 Determine the class of students and test [ class() or str() ]4 Create a data frame containing two columns students and tests as
defined above [ dataframe ]5 Convert test to character class and confirm that you were successful [
asnumeric() lt- str() ]
Introduction To Programming In RLast updated November 20 2013 13
71
Data types
Exercise 0 prototype
1 Create a new vector called test containing five numbers of your choicetest lt- c(1 2 3 4 5)
2 a Create a second vector called students containing five commonnames of your choice
students lt- c(Mary Joan Steve Alex Suzy)
3 Determine the class of students and testclass(students)class(test)
4 Create a data frame containing two columns students and tests asdefined above
testScores lt- dataframe(students tests)
5 Convert test to character class and confirm that you were successfultest lt- ascharacter(test)class(test)
Introduction To Programming In RLast updated November 20 2013 14
71
Extracting and replacing object elements
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 15
71
Extracting and replacing object elements
Indexing by position or name
Parts of vectors matricies dataframes and lists can be extracted or replacedbased on position or name
gt indexing vectors by positiongt x lt- 101110 Creat a vector of integers from 101 to 110gt x[c(4 5)] extract the fourth and fifth values of x[1] 104 105gt x[4] lt- 1 change the 4th value to 1gt x print x[1] 101 102 103 1 105 106 107 108 109 110
gtgt indexing vectors by namegt names(x) lt- letters[110] give x namesgt print(x) print x
a b c d e f g h i j101 102 103 1 105 106 107 108 109 110gt x[c(a f)] extract the values of a and f from x
a f101 106
Introduction To Programming In RLast updated November 20 2013 16
71
Extracting and replacing object elements
Logical indexing
Elements can also be selected or replaced based on logical (TRUEFALSE)vectorsgt x gt 106 shows which elements of x are gt 106
a b c d e f g h i jFALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUEgt x[x gt 106] selects elements of x where x gt 106
g h i j107 108 109 110
Additional operators useful for logical indexing== equal to= not equal togt greater thanlt less than
gt= greater than or equal tolt= less than or equal to
in is included inamp and| or
gt x[x gt 106 amp x lt= 108]g h
107 108gt x[x gt 106 | names(x) in c(a b c)]
a b c g h i j101 102 103 107 108 109 110
Introduction To Programming In RLast updated November 20 2013 17
71
Extracting and replacing object elements
Indexing matrices
Extraction on matrices operate in two dimensions first dimension refers torows second dimension refers to columns
gt indexing matriciesgt create a matrixgt (M lt- cbind(x = 15 y = -1-5 z = c(6 3 4 2 8)))
x y z[1] 1 -1 6[2] 2 -2 3[3] 3 -3 4[4] 4 -4 2[5] 5 -5 8gt M[13 ] extract rows 1 through 3 all columns
x y z[1] 1 -1 6[2] 2 -2 3[3] 3 -3 4gt M[c(5 3 1) 23] rows 5 3 and 1 columns 2 and 3
y z[1] -5 8[2] -3 4[3] -1 6gt M[M[ 1] in 42 2] second column where first column lt=4 amp gt= 2[1] -2 -3 -4
Note that unspecified indexrsquos (as in the column index in the example above )return all values
Introduction To Programming In RLast updated November 20 2013 18
71
Extracting and replacing object elements
Indexing lists
Lists can be indexed in the same way as vectors with the following extension
gt Lists can be indexed with single brackets similar to vector indexinggt L[c(1 2)] the first two elements of L$x[1] 1 2 3 4 5
$y[1] 1 2 3
gt L[1] a list with one element$x[1] 1 2 3 4 5
gt double brackets select the content of a single selected elementgt effectively taking it out of the listgt L[[1]] a vector[1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 19
71
Extracting and replacing object elements
Indexing dataframes
A dataframe can be indexed in the same ways as a matrix and also the sameways as a list
gt DF[c(3 1 2) c(1 2)] rows 3 1 and 2 columns 1 and 2x y
3 3 c1 1 a2 2 bgt DF[[1]] column 1 as a vector[1] 1 2 3 4 5
There is a subtle but important difference between [ n] and [n] whenindexing dataframes the first form returns a vector the second returns adataframe with one columngt str(DF[1]) a dataframe with one columnrsquodataframersquo 5 obs of 1 variable$ x int 1 2 3 4 5
gt str(DF[ 1]) a vectorint [15] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 20
71
Extracting and replacing object elements
Extractionreplacement summary
Key points
elements of objects can be extracted or replaced using the [ operatorobjects can be indexed by position name or logical (TRUEFALSE)vectorsvectors and lists have only one dimension and hence only one index isusedmatricies and dataframes have two dimensions and extraction methodsfor these objects use two indices
Functions introduced in this section[ extraction operator used to extractreplace object elements
names get the names of an object usually a vector list or dataframeprint print an object
Introduction To Programming In RLast updated November 20 2013 21
71
Extracting and replacing object elements
Exercise 1
1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2
2 Calculate the mean of the SepalLength column in iris23 BONUS (optional) Calculate the mean of SepalLength but only for
the setosa species4 BONUS (optional) Calculate the number of sepal lengths that are more
than one standard deviation below the average sepal length
Introduction To Programming In RLast updated November 20 2013 22
71
Extracting and replacing object elements
Exercise 1 prototype
1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2
data(iris)iris2 lt- iris[c(SepalLength Species)]str(iris2)
2 Calculate the mean of the SepalLength column in iris2mean(iris2[ SepalLength])
3 [3] BONUS (optional) Calculate the mean of SepalLength but only forthe setosa speciesmean(iris2[iris2[[Species]] == setosa SepalLength]) shortcutwith(iris2
print(mean(SepalLength[Species == setosa])))
4 [4] BONUS (optional) Calculate the number of sepal lengths that aremore than one standard deviation below the average sepal lengthmminussd lt- mean(iris2[[SepalLength]]) - sd(iris2[[SepalLength]])length(iris2[iris2[[SepalLength]] lt mminussd SepalLength])
Introduction To Programming In RLast updated November 20 2013 23
71
Applying functions to list elements
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 24
71
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90
Introduction To Programming In RLast updated November 20 2013 25
71
Applying functions to list elements
The sapply function
It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this
gt sapply(DF class) get the class of each column in the DF dataframex y
integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric
x yTRUE FALSE
Introduction To Programming In RLast updated November 20 2013 26
71
Applying functions to list elements
Combining sapply and indexing
The sapply function can be used in combination with indexing to extractelements that meet certain criteria
Recall that we can index using logical vectors
gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y
TRUE FALSEgt DF[DFwhichnum] select the numeric columns
x1 12 23 34 45 5
Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column
Introduction To Programming In RLast updated November 20 2013 27
71
Applying functions to list elements
Applying functions summary
Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details
Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list
isnumeric returns TRUE or FALSE depending on the type of object
Introduction To Programming In RLast updated November 20 2013 28
71
Writing functions
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 29
71
Writing functions
Functions
A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )
Introduction To Programming In RLast updated November 20 2013 30
71
Writing functions
Function return value
The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function
Other function output can come fromCalls to print() message() or cat() in function bodyError messages
Assignment inside the body of a function takes place in a localenvironmentExample
gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1
a b c d e f g h i j101 102 103 1 105 106 107 108 109 110
Introduction To Programming In RLast updated November 20 2013 31
71
Writing functions
Writing functions example
Goal write a function that returns the square of itrsquos argument
gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25
Introduction To Programming In RLast updated November 20 2013 32
71
Writing functions
Debugging basics
Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact
Introduction To Programming In RLast updated November 20 2013 33
71
Writing functions
Writing functions summary
Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging
Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value
browser sets a break pointdebug turns on the debugging flag of a function so you can step
through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went
wrong)
Introduction To Programming In RLast updated November 20 2013 34
71
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
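The version above only prints a message when the argument is bad. A minimal sketch of a stricter variant follows, assuming we would rather signal an error with stop(). The name isPositiveStrict is made up for this illustration and is not part of the workshop materials. It also uses ||, which expects a single TRUE/FALSE on each side and stops evaluating as soon as the answer is known.

```r
# Hypothetical variant of isPositive(): signal an error instead of printing
isPositiveStrict <- function(x) {
  # `||` evaluates left to right and stops early, so length(x) is
  # only checked when x is already known to be numeric
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

isPositiveStrict(10)
isPositiveStrict(0)

# try() lets the script continue after the error
res <- try(isPositiveStrict("a"), silent = TRUE)
inherits(res, "try-error")
```

Returning a value instead of calling cat() is a design choice for this sketch: the result can then be stored or tested by other code.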
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions* introduced in this section
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

* Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
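The check class(df) != "data.frame" works for iris, but class() can return more than one value, and then the comparison yields several TRUE/FALSE values instead of one. A small sketch of a more robust check, assuming we want the same error message; checkDF and the class name "myDF" are invented for this example.

```r
# Sketch: a class check that still works when an object has several classes
checkDF <- function(df) {
  # is.data.frame() returns a single TRUE/FALSE even if class(df) has length > 1
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# a data.frame carrying an extra (made-up) class
df2 <- iris
class(df2) <- c("myDF", "data.frame")

class(df2) != "data.frame"    # two values, not usable directly in if()
checkDF(df2)                  # passes silently
inherits(df2, "data.frame")   # inherits() is the general-purpose class test
```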
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo"
> mean.foo <- function(x) { # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n") # otherwise say x not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
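With only a matrix method defined, calling disp() on any other kind of object fails because no applicable method is found. A minimal sketch of one way to handle that, assuming a fallback message is acceptable: define a disp.default method, which S3 dispatch uses when no class-specific method exists. The message text is invented for this example.

```r
# generic and matrix method, as in the example above
disp <- function(x, ...) UseMethod("disp")
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No disp method for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2))  # uses disp.matrix
disp("hello")                                           # uses disp.default
```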
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
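One detail worth knowing when using all.equal() inside if(): when the values differ it returns a text description of the difference rather than FALSE. A short sketch of the usual idiom, wrapping the call in isTRUE(); the variable names a and b are just for illustration.

```r
a <- 0.1
b <- 0.3 / 3

a == b                      # FALSE: exact comparison of floating point values
isTRUE(all.equal(a, b))     # TRUE: equal up to a small tolerance

# all.equal() returns a character description when values differ,
# so wrap it in isTRUE() before using it as a condition
all.equal(1, 1.1)
if (isTRUE(all.equal(1, 1.1))) {
  cat("equal\n")
} else {
  cat("not equal\n")
}
```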
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
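A brief sketch of both solutions, reusing the vector x from the example above:

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # missing values are dropped before averaging

is.na(x)                 # logical vector, TRUE where x is missing
sum(is.na(x))            # number of missing values
x[!is.na(x)]             # keep only the non-missing values
```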
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
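The practical takeaway, sketched below: mean() averages only its first argument, so the values need to be combined into one vector with c() first.

```r
v <- c(1, 2, 3, 4, 5)

mean(v)                  # averages all five values
mean(v) * 5 == sum(v)    # now the two functions agree

# without c(), only the first argument is averaged; the remaining
# values are matched to other arguments of mean.default()
mean(1, 2, 3, 4, 5)
```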
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
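An equivalent idiom, mentioned in the R documentation for factor, converts the levels once and then indexes them by the factor. A short sketch using the same x as above:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

# convert the two level labels to numbers, then look each element up
# by its underlying integer code
as.numeric(levels(x))[x]

# same result as the two-step conversion shown above
identical(as.numeric(levels(x))[x], as.numeric(as.character(x)))
```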
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> ## For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
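Growing a list one element at a time works, but when the number of iterations is known in advance the result object can be created at its final size first. A minimal sketch of that variant, assuming n iterations; n is a name chosen for this example.

```r
n <- 5
Result <- vector("list", n)   # a list of n empty (NULL) slots

# seq_len(n) gives 1, 2, ..., n and is empty when n is 0,
# so the loop body is skipped instead of running for 1:0
for (i in seq_len(n)) {
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

length(Result)
Result[[3]]
```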
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Workshop overview and materials
Materials and setup
Lab computer users:
USERNAME dataclass
PASSWORD dataclass

Download materials from http://projects.iq.harvard.edu/rtc/r-prog
- Scroll to the bottom of the page and download the r-programming.zip file
- Move it to your desktop and extract
Data types
Vectors and data classes
Values can be combined into vectors using the c() function:

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class which determines how functions treat them:

> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"
Vector conversion and info
Vectors can be converted from one class to another:

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and structure (str()) of vectors:

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for
- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.
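To make the "numbers with character labels" idea concrete, here is a small sketch. The vector sizes and its values are invented for this illustration.

```r
# a factor whose levels are given in a deliberately non-alphabetical order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)        # the character labels, in the order we chose
as.integer(sizes)    # the underlying integer codes
sort(sizes)          # sorted by level order, not alphabetically
table(sizes)         # counts are also reported in level order
```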
Lists and data.frames

- A data.frame is a list of vectors, each of the same length
- A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section
c           combine elements
as.numeric  convert an object (e.g., a character vector) to numeric
data.frame  combine objects into a data.frame
ls          list the objects in the workspace
class       get the class of an object
str         get the structure of an object
length      get the number of elements in an object
mean        calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
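The examples above select elements; the same logical index can sit on the left of an assignment to replace them. A minimal sketch, using a shorter made-up vector and an invented rule that values below 100 are implausible:

```r
x <- c(a = 101, b = 102, c = 103, d = 1, e = 105)

x[x < 100] <- NA                       # replace implausible values with NA
x

x[is.na(x)] <- mean(x, na.rm = TRUE)   # fill the gap with the mean of the rest
x

which(x > 102)                         # positions (and names) where the condition holds
```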
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in M[1:3, ] above) return all values.
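The last example returns a plain vector rather than a one-column matrix. When the matrix shape should be kept, the drop = FALSE argument of [ can be used. A brief sketch with the same matrix M:

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

M[, 2]                          # a plain vector: the matrix structure is dropped
M[, 2, drop = FALSE]            # still a matrix, with 5 rows and 1 column
dim(M[, 2, drop = FALSE])

M[M[, "z"] > 3, c("x", "z")]    # rows by logical condition, columns by name
```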
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame': 5 obs. of 1 variable:
 $ x: int 1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
Extraction/replacement summary

Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.
2. Calculate the mean of the Sepal.Length column in iris2.
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2.

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
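Both calls above use 2 as the second argument, which means "by column". A short sketch contrasting it with 1 ("by row"), and with the built-in shortcuts rowSums() and colMeans():

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, sum)    # second argument 1: one result per row
apply(M, 2, sum)    # second argument 2: one result per column

# for sums and means, base R also provides dedicated functions
rowSums(M)
colMeans(M)
```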
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists . . . )
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in function body
- Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
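A compact sketch of the same points, with function names (addOne, signLabel) invented for this illustration: the last evaluated expression is the return value, return() can exit early, and assignments inside the body do not touch variables outside it.

```r
addOne <- function(v) {
  v <- v + 1     # modifies a local copy only
  v              # value of the last expression is returned
}

signLabel <- function(x) {
  if (x < 0) return("negative")   # explicit early return
  "non-negative"                  # otherwise the last expression is returned
}

v <- 10
addOne(v)        # returns 11
v                # the global v is unchanged
signLabel(-2)
signLabel(3)
```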
Writing functions example
Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
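A minimal sketch of the debugging tools named on this slide, assuming a small hypothetical function squareRoot. Stepping through a function is interactive, so the example only sets and clears the debugging flag; the interactive steps are described in comments.

```r
# squareRoot is a hypothetical function used only for illustration
squareRoot <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}

debug(squareRoot)                  # flag the function for step-through debugging
flagged <- isdebugged(squareRoot)  # TRUE while the flag is set
undebug(squareRoot)                # clear the flag again

# In an interactive session, after an error such as
#   squareRoot("a")
# calling traceback() prints the sequence of calls that led to the error.
result <- squareRoot(16)
print(result)
```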
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Do something reasonable if x is not numeric

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
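A variation on the same idea, offered as a sketch rather than as the workshop's solution: the check signals an error with stop() instead of printing a message, and uses || (which evaluates a single TRUE/FALSE and short-circuits) instead of |. The name checkSign is hypothetical.

```r
# checkSign is a hypothetical name used only for this sketch
checkSign <- function(x) {
  # stop with an error unless x is a single number
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

print(checkSign(10))
print(checkSign(0))
print(checkSign(-3))
bad <- try(checkSign("a"), silent = TRUE)  # the error is caught by try()
print(class(bad))
```

Returning a value instead of calling cat() makes the result easy to reuse in other code, and an error stops the caller from continuing with bad input.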
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
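One caveat about the argument check in this prototype: class() can return more than one string for some objects, in which case comparing it to "data.frame" gives a condition of length greater than one, which if() does not accept in recent R versions. A sketch of a sturdier check using is.data.frame() follows; the name statsumChecked is hypothetical and the body is trimmed to the means only.

```r
# statsumChecked is a hypothetical variant used only for this sketch
statsumChecked <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  sapply(df[classes == "numeric"], mean)
}

res <- statsumChecked(iris)
print(res)

bad <- try(statsumChecked(1:10), silent = TRUE)  # error caught by try()
print(class(bad))
```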
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
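When an object has several classes, inherits() is a convenient way to test for one of them. A minimal sketch (inherits() and unclass() are base R functions not otherwise covered in this workshop):

```r
x <- 1:10
class(x) <- c("A", "B")

hasA   <- inherits(x, "A")    # TRUE: "A" is one of the classes
hasB   <- inherits(x, "B")    # TRUE: so is "B"
hasFoo <- inherits(x, "foo")  # FALSE: "foo" is not among them

plain <- unclass(x)           # remove the class attribute again
print(c(hasA, hasB, hasFoo))
print(class(plain))
```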
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
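A generic with only a matrix method fails for every other kind of object. One common remedy is a default method, sketched here; disp.default and its message are illustrative additions, not part of the original workshop code.

```r
# generic function: dispatches on the class of x
disp <- function(x, ...) UseMethod("disp")

# method for matrices: print rounded to two digits
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# hypothetical fallback used when no more specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

m <- matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2)
shown    <- disp(m)        # dispatches to disp.matrix
fallback <- disp("hello")  # dispatches to disp.default
```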
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotcha's

- There are an unfortunately large number of surprises in R programming
- Some of these "gotcha's" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
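One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch:

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                      # exact comparison fails
close <- isTRUE(all.equal(a, b))     # comparison with a small tolerance
far   <- isTRUE(all.equal(a, 0.2))   # clearly different values

print(c(exact, close, far))
print(all.equal(a, 0.2))             # a message, not FALSE
```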
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values
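The two solutions can be sketched on the same vector; the names m and nMissing are made up for the example.

```r
x <- c(1:10, NA, 12:20)

m <- mean(x, na.rm = TRUE)   # drop the missing value before averaging
nMissing <- sum(is.na(x))    # count the missing values

print(m)
print(nMissing)
print(x[!is.na(x)])          # keep only the non-missing elements
```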
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
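The practical consequence is that mean() averages only its first argument, so the values need to be combined with c() first. A minimal sketch:

```r
wrong <- mean(1, 2, 3, 4, 5)      # only the first argument, 1, is averaged
right <- mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
total <- sum(1, 2, 3, 4, 5)       # sum() accepts any number of arguments

print(c(wrong = wrong, right = right, total = total))
```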
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
- Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
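When the number of results is known in advance, the container can be created at its final size before the loop starts, which avoids growing it one element at a time. A sketch of that variation (vector("list", n) and seq_len() are base R functions not used elsewhere in this workshop):

```r
n <- 5
Result <- vector("list", n)     # a list with n empty slots

for (i in seq_len(n)) {         # seq_len(n) is 1, 2, ..., n
  Result[[i]] <- seq_len(i)     # store the sequence 1 to i
}

print(length(Result))
print(Result[[3]])
```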
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
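For comparison, the same three steps can be done without explicit loops. This sketch uses vapply(), a stricter relative of sapply() that requires the type of each result to be stated; the names numCols and colMeansIris are made up for the example.

```r
# steps 1 and 2: find and select the numeric columns
numCols <- vapply(iris, is.numeric, logical(1))

# step 3: mean of each numeric column
colMeansIris <- vapply(iris[numCols], mean, numeric(1))

print(numCols)
print(colMeansIris)
```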
Workshop overview and materials
Materials and setup
- Lab computer users:
  - USERNAME: dataclass
  - PASSWORD: dataclass
- Download materials from http://projects.iq.harvard.edu/rtc/r-prog
- Scroll to the bottom of the page and download the r-programming.zip file
- Move it to your desktop and extract
Data types
Vectors and data classes
Values can be combined into vectors using the c() function

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class which determines how functions treat them

> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"
Vector conversion and info
Vectors can be converted from one class to another

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and structure (str()) of vectors

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for

- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors
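A minimal sketch of both points, using made-up size labels: the factor stores integer codes, and the levels argument controls the order used for sorting.

```r
# the values and level order are invented for this illustration
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

codes  <- as.integer(sizes)          # the numbers stored internally
labels <- levels(sizes)              # the character labels
sorted <- as.character(sort(sizes))  # sorted by level order, not alphabetically

print(codes)
print(labels)
print(sorted)
```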
Lists and data.frames

- A data.frame is a list of vectors, each of the same length
- A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section
- c: combine elements
- as.numeric: convert an object (e.g., a character vector) to numeric
- data.frame: combine objects into a data.frame
- ls: list the objects in the workspace
- class: get the class of an object
- str: get the structure of an object
- length: get the number of elements in an object
- mean: calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:

==     equal to
!=     not equal to
>      greater than
<      less than
>=     greater than or equal to
<=     less than or equal to
%in%   is included in
&      and
|      or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
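Single and double brackets also work with names. The sketch below rebuilds a smaller version of the list L so that it runs on its own, and adds the $ shortcut, which base R provides for extracting a named element.

```r
L <- list(x = 1:5, y = 1:3)

single <- L["y"]     # single brackets: still a list, of length one
double <- L[["y"]]   # double brackets: the vector itself
dollar <- L$y        # $ is shorthand for [[ with a name

print(class(single))
print(double)
print(identical(double, dollar))
```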
Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
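If the matrix-style form is preferred but a data.frame is wanted back, the drop argument of [ controls this. A minimal sketch that rebuilds DF so it runs on its own:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

asVector <- DF[, 1]                # dimensions dropped: a plain vector
asFrame  <- DF[, 1, drop = FALSE]  # dimensions kept: a one-column data.frame

str(asVector)
str(asFrame)
```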
Extraction/replacement summary

Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
- [: extraction operator, used to extract/replace object elements
- names: get the names of an object, usually a vector, list, or data.frame
- print: print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically), and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically), and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 24
71
Applying functions to list elements
The apply function
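The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90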
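As a small sketch for contrast (the row-wise call is an addition, not part of the original slide), the same matrix can be summarised along either dimension by changing the second argument of apply():

```r
# Build the same 5 x 4 matrix used above
M <- matrix(1:20, ncol = 4)

# Second argument 2: apply the function to each column
col_means <- apply(M, 2, mean)

# Second argument 1: apply the function to each row
row_sums <- apply(M, 1, sum)

print(col_means)
print(row_sums)
```

The result always has one element per row or per column, depending on which dimension was selected.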
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y 
"integer"  "factor" 
> sapply(L, length) # get the length of each element in the L list
x y z 
5 3 2 
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y 
 TRUE FALSE 
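The following self-contained sketch recreates DF and L so the calls above can be run on their own. One assumption is made explicit: in R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is passed here to reproduce the "factor" class shown in the output.

```r
# Recreate the example objects; stringsAsFactors = TRUE is an assumption
# made so that column y is a factor, as in the output shown above
DF <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
L <- list(x = 1:5, y = 1:3, z = DF)

col_classes <- sapply(DF, class)   # named character vector, one entry per column
el_lengths  <- sapply(L, length)   # named integer vector, one entry per element

print(col_classes)
print(el_lengths)
```

Note that the length of a data.frame is its number of columns, which is why the z element of L has length 2.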
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y 
 TRUE FALSE 
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
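A minimal sketch of the vector-versus-data.frame distinction described above. The drop = FALSE argument is an addition not shown in the slides; it asks matrix-style indexing to keep the data.frame structure even when a single column is selected.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

is_num <- sapply(DF, is.numeric)       # logical vector, one entry per column

as_df  <- DF[is_num]                   # list-style indexing: always a data.frame
as_vec <- DF[, is_num]                 # matrix-style indexing: one column collapses to a vector
kept   <- DF[, is_num, drop = FALSE]   # drop = FALSE keeps the data.frame

str(as_df)
str(as_vec)
str(kept)
```

This matters inside functions, where the number of selected columns is often not known in advance.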
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames.
- Other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details).

Functions introduced in this section
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
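The other apply-style functions mentioned in the key points are not demonstrated in the slides. The sketch below is only a brief illustration of what each one does, using the built-in iris data; see the documentation for the full set of arguments.

```r
data(iris)

# lapply: like sapply, but always returns a list
classes <- lapply(iris, class)

# tapply: apply a function to groups of a vector defined by a factor
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

print(mean_by_species)
print(sums)
```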
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
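To make the points about named arguments concrete, here is a minimal sketch. The function rescale and its arguments lower and upper are hypothetical names chosen for illustration; they are not part of the workshop materials.

```r
# Hypothetical helper: rescale a numeric vector to the range [lower, upper]
rescale <- function(x, lower = 0, upper = 1) {
    rng <- range(x)                       # minimum and maximum of x
    lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

r1 <- rescale(c(2, 4, 6))               # uses the defaults lower = 0, upper = 1
r2 <- rescale(c(2, 4, 6), upper = 10)   # arguments can be matched by name

print(r1)
print(r2)
```

Arguments with default values can be omitted in the call, and any argument can be supplied by name in any order.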
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message(), or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+     print("setting x to 1") # print a text string
+     x <- 1 # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j 
101 102 103   1 105 106 107 108 109 110 
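The two kinds of return value and the local scope of assignment can be checked with a short sketch. The function names g and h are placeholders introduced here for illustration.

```r
x <- 100                      # a global x

g <- function() {
    x <- 1                    # assigns a local x; the global x is untouched
    x + 1                     # the value of the last expression is returned
}

h <- function() {
    return("early")           # return() exits the function immediately
    "never reached"
}

y <- g()
print(y)      # value returned by g
print(x)      # the global x is unchanged
print(h())    # value passed to return()
```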
Writing functions example
Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+     return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints (see debug() and browser())
- Use traceback() to see what went wrong after the fact
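The debugging tools are interactive, so they are hard to show in a transcript. The sketch below only sets and clears the debugging flag without entering the debugger; isdebugged() is used here to inspect the flag, and the function f is a placeholder.

```r
f <- function(x) {
    # browser()   # uncomment to pause here when f() is called interactively
    x^2
}

debug(f)                    # the next call to f() would start the step-through debugger
flagged <- isdebugged(f)    # TRUE while the debugging flag is set
undebug(f)                  # turn the flag off again
cleared <- !isdebugged(f)

print(c(flagged, cleared))

# After an error at the interactive prompt, traceback() prints the call stack
```

In an interactive session, calling f() while the flag is set lets you step through the body one expression at a time.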
Writing functions summary
Key points
- Writing new functions is easy!
- Most functions will have a return value, but functions can also print things, write things to file, etc.
- Functions can be stepped through to facilitate debugging.

Functions introduced in this section
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    counts <- sapply(df[classes == "factor"], table)
    return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+     if (x > 0) { # if x is greater than zero, then
+         cat(x, "is positive \n") # say so!
+     } else { # otherwise
+         cat(x, "is negative \n") # say x is negative
+     }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive 
> isPositive(-1)
-1 is negative 
> isPositive(0)
0 is negative 

Need to do something different if x equals zero!
Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+     if (x > 0) { # if x is greater than zero, then
+         cat(x, "is positive \n") # say so!
+     } else if (x == 0) { # otherwise if x is zero
+         cat(x, "is zero \n") # say so!
+     } else { # otherwise
+         cat(x, "is negative \n") # say x is negative
+     }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero 
> isPositive("a") # oops, that will not work!
a is positive 

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)
Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+     if(!is.numeric(x) | length(x) > 1) {
+         cat("x must be a numeric vector of length one! \n")
+     } else if (x > 0) { # if x is greater than zero, then
+         cat(x, "is positive \n") # say so!
+     } else if (x == 0) { # otherwise if x is zero
+         cat(x, "is zero \n") # say so!
+     } else { # otherwise
+         cat(x, "is negative \n") # say x is negative
+     }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one! 
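A possible variant of the argument check, offered as a sketch rather than as the workshop's own solution. It assumes two changes: the short-circuit operator || replaces the element-wise |, and the function returns a label and raises an error with stop() instead of printing with cat(). The name signLabel is hypothetical.

```r
# Hypothetical variant of isPositive() that returns a label instead of printing
signLabel <- function(x) {
    # || stops evaluating as soon as the result is known
    if (!is.numeric(x) || length(x) != 1) {
        stop("x must be a numeric vector of length one!")
    }
    if (x > 0) {
        "positive"
    } else if (x == 0) {
        "zero"
    } else {
        "negative"
    }
}

print(signLabel(10))
print(signLabel(0))
print(signLabel(-1))

# Bad input now raises an error, which can be caught with tryCatch()
msg <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
print(msg)
```

Raising an error rather than printing a message makes the failure visible to any code that calls the function.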
Control flow summary
Key points
- Code can be conditionally executed.
- Conditions can be nested.
- Conditional execution is often used for argument checking, among other things.

Functions [1] introduced in this section
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    counts <- sapply(df[classes == "factor"], table)
    return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    browser()
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    counts <- sapply(df[classes == "factor"], table)
    return(list(means, counts))
}

statsum(iris)
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system.
- Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
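When an object has several classes, inherits() is a convenient way to test for one of them. This short sketch is an addition to the slide content; it also shows that setting the class attribute leaves the underlying data unchanged.

```r
x <- 1:10
class(x) <- c("A", "B")

has_A   <- inherits(x, "A")      # TRUE: "A" is one of the classes of x
has_B   <- inherits(x, "B")      # TRUE: "B" is one of the classes of x
has_foo <- inherits(x, "foo")    # FALSE: "foo" is not among them

# unclass() removes the class attribute; the data are still integers
is_int <- is.integer(unclass(x))

print(c(has_A, has_B, has_foo, is_int))
```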
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct 
[5] mean.POSIXlt 
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"      
[5] "as.matrix.data.frame"     "by.data.frame"           
[7] "cbind.data.frame"         "[<-.data.frame"          
[9] "[.data.frame"            
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+     if(is.numeric(x)) {
+         cat("The average is", mean.default(x))
+         return(invisible(mean.default(x))) # use mean.default for numeric
+     } else
+         cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric 
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+     UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+     print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
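With only a matrix method defined, calling disp() on any other kind of object fails because no applicable method is found. A common remedy is a default method; the sketch below adds one. The disp.default method and its message are assumptions added for illustration.

```r
# Generic function: dispatches on the class of x
disp <- function(x, ...) {
    UseMethod("disp")
}

# Method for matrices: print rounded values
disp.matrix <- function(x, ...) {
    print(round(x, digits = 2))
    invisible(x)
}

# Fallback used when no class-specific method exists (an added assumption)
disp.default <- function(x, ...) {
    cat("No special display for class", class(x)[1], "\n")
    invisible(x)
}

disp(matrix(c(1.234, 5.678), ncol = 1))   # uses disp.matrix
disp("some text")                         # falls back to disp.default
```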
S3 classes summary
Key points
- There are several class systems in R, of which S3 is the oldest and simplest.
- Objects have class and functions have corresponding methods.
- The class of an object can be set by simple assignment.
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function.
- New methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class.

Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    sds <- sapply(df[classes == "numeric"], sd)
    counts <- sapply(df[classes == "factor"], table)
    return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    sds <- sapply(df[classes == "numeric"], sd)
    counts <- sapply(df[classes == "factor"], table)
    R <- list(cbind(means, sds), counts)
    class(R) <- c("statsum", class(R))
    return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x) {
    cat("Numeric variable descriptive statistics:\n")
    print(x[[1]], digits=2)
    cat("\nFactor variable counts:\n")
    print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotcha's

- There are an unfortunately large number of surprises in R programming.
- Some of these "gotcha's" are common problems in other languages; many are unique to R.
- We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
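One practical detail, added here as a sketch: when two values differ, all.equal() returns a character description of the difference rather than FALSE, so inside if() it is usually wrapped in isTRUE(). The explicit tolerance check at the end is an alternative shown for comparison; the value 1e-8 is an arbitrary choice.

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                    # exact comparison of two doubles
approx <- isTRUE(all.equal(a, b))   # comparison within a small tolerance
manual <- abs(a - b) < 1e-8         # the same idea written out by hand

print(c(exact, approx, manual))

# all.equal() does not return FALSE when the values differ
differs <- all.equal(1, 2)
print(class(differs))
```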
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
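The two solutions named above can be sketched as follows, using the same vector x as in the example.

```r
x <- c(1:10, NA, 12:20)

m_all  <- mean(x)                  # NA, because one value is missing
m_obs  <- mean(x, na.rm = TRUE)    # mean of the 19 observed values
n_miss <- sum(is.na(x))            # count the missing values
x_obs  <- x[!is.na(x)]             # keep only the non-missing values

print(m_all)
print(m_obs)
print(n_miss)
print(length(x_obs))
```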
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"    
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
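A short sketch of the coercion order (logical, then integer, then numeric, then character) and of one way to get the warning-like behaviour the slide asks for. The helper safe_gt is hypothetical; it simply refuses non-numeric input before comparing.

```r
# Combining values coerces them to the most general type present
cls1 <- class(c(TRUE, 2L))     # logical + integer
cls2 <- class(c(2L, 2.5))      # integer + double
cls3 <- class(c(2.5, "a"))     # double + character

print(c(cls1, cls2, cls3))

# Hypothetical helper: fail loudly instead of silently comparing strings
safe_gt <- function(a, b) {
    stopifnot(is.numeric(a), is.numeric(b))
    a > b
}

ok  <- safe_gt(2, 1)
err <- tryCatch(safe_gt(1, "a"), error = function(e) "error raised")
print(ok)
print(err)
```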
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...) 
NULL
> args(sum)
function (..., na.rm = FALSE) 
NULL

Ouch. That is not nice at all!
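The difference follows from the argument lists shown above: mean() averages only its first argument x, while sum() adds up everything passed through the dots. A minimal sketch of the safe pattern, which is to combine the values with c() first:

```r
wrong <- mean(1, 2, 3, 4, 5)      # only the first argument is treated as the data
right <- mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
total <- sum(1, 2, 3, 4, 5)       # sum() accepts any number of arguments

print(wrong)
print(right)
print(total)
```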
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
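The same conversion can be written a second way, by converting the levels once and then indexing them with the factor. This alternative is not in the slides; it is shown here as a sketch next to the form used above.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.numeric(x)                  # the internal integer codes
vals  <- as.numeric(as.character(x))    # convert the labels, as in the slide
vals2 <- as.numeric(levels(x))[x]       # convert the levels, then index by the factor

print(codes)
print(vals)
print(vals2)
```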
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form.
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+     cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25 
-4 squared is 16 
-3 squared is 9 
-2 squared is 4 
-1 squared is 1 
0 squared is 0 
1 squared is 1 
2 squared is 4 
3 squared is 9 
4 squared is 16 
5 squared is 25 
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+     roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+     cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6 
We rolled a 10 
We rolled a 9 
We rolled a 7 
We rolled a 10 
We rolled a 5 
We rolled a 9 
We rolled a 12 
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+     Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
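When the number of results is known in advance, the storage object can be created at its final size before the loop starts. The sketch below is a variation on the example above, not part of the original slides; it uses vector("list", n) to create the empty list and seq_len(n) for the loop index.

```r
n <- 5
Result <- vector("list", n)      # a list of n empty (NULL) elements

for (i in seq_len(n)) {          # for each i in 1, ..., n
    Result[[i]] <- 1:i           # store the sequence 1 to i in element i
}

el_lengths <- sapply(Result, length)
print(el_lengths)
```

Using seq_len(n) rather than 1:n also behaves sensibly when n is zero, because the loop body is then skipped.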
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##     cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared = ", (1:5)^2)
[1] "1 squared =  1"  "2 squared =  4"  "3 squared =  9" 
[4] "4 squared =  16" "5 squared =  25"

Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+     classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for(var in names(iris.num)) {
+     iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
Workshop overview and materials
Materials and setup
Lab computer users:
USERNAME dataclass
PASSWORD dataclass

- Download materials from http://projects.iq.harvard.edu/rtc/r-prog
- Scroll to the bottom of the page and download the r-programming.zip file
- Move it to your desktop and extract
Data types
Vectors and data classes
Values can be combined into vectors using the c() function

> num.var <- c(1, 2, 3, 4) # numeric vector
> char.var <- c("1", "2", "3", "4") # character vector
> log.var <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> char.var2 <- c(num.var, char.var) # numbers converted to character
>

Vectors have a class which determines how functions treat them

> class(num.var)
[1] "numeric"
> mean(num.var) # take the mean of a numeric vector
[1] 2.5
> class(char.var)
[1] "character"
> mean(char.var) # cannot average characters
[1] NA
> class(char.var2)
[1] "character"
Vector conversion and info
Vectors can be converted from one class to another

> class(char.var2)
[1] "character"
> num.var2 <- as.numeric(char.var2) # convert to numeric
> class(num.var2)
[1] "numeric"
> mean(as.numeric(char.var2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and str()ucture of vectors

> ls() # list objects in our workspace
[1] "char.var"  "char.var2" "log.var"   "num.var"   "num.var2" 
> length(char.var) # how many elements in char.var?
[1] 4
> str(num.var2) # what is the structure of num.var2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for:

- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.
Lists and data.frames

- A data.frame is a list of vectors, each of the same length.
- A list is a collection of objects, each of which can be almost anything.

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points
- Vector classes include numeric, logical, character, and factor.
- Vectors can be combined into lists or data.frames.
- A data.frame can almost always be thought of as a list of vectors of equal length.
- A list is a collection of objects, each of which can be of almost any type.

Functions introduced in this section
c           combine elements
as.numeric  convert an object (e.g., a character vector) to numeric
data.frame  combine objects into a data.frame
ls          list the objects in the workspace
class       get the class of an object
str         get the structure of an object
length      get the number of elements in an object
mean        calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test", as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test", as defined above

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j 
101 102 103   1 105 106 107 108 109 110 
> x[c("a", "f")] # extract the values of a and f from x
  a   f 
101 106 
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j 
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE 
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j 
107 108 109 110 

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h 
107 108 
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j 
101 102 103 107 108 109 110 
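The examples above only extract values. As a sketch of the replacement side (an addition to the slide content), the same logical vector can appear on the left of an assignment. The vector x is recreated here without the earlier change to its fourth element, so the sketch runs on its own.

```r
x <- 101:110
names(x) <- letters[1:10]

big <- x > 106                  # logical vector marking the elements to change
pos <- which(big)               # positions (and names) of the TRUE elements

x[big] <- 0L                    # replace the selected elements
rest <- x[!(names(x) %in% c("a", "b"))]   # ! negates a condition

print(pos)
print(x)
print(rest)
```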
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in M[1:3, ] above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing data.frames
A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':   5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
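The simplification from a one-column data.frame to a vector can be switched off with the drop argument of the [ operator. The sketch below is an addition to the slides; it assumes a small data.frame shaped like the DF used above.

```r
# Assumption: a small data.frame like the DF used on the slides
DF <- data.frame(x = 1:5, y = letters[1:5])

v  <- DF[, 1]                # matrix-style index: simplified to a vector
d1 <- DF[1]                  # list-style index: data.frame with one column
d2 <- DF[, 1, drop = FALSE]  # matrix-style index with simplification turned off

is.data.frame(v)   # FALSE: a plain integer vector
is.data.frame(d1)  # TRUE
is.data.frame(d2)  # TRUE
```

Using drop = FALSE can be helpful inside functions, where the number of selected columns is not known in advance.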
Extraction/replacement summary
Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[, "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
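Both examples above work column-wise. For comparison, here is a short sketch (an addition, using the same M) that applies mean over rows and over columns; rowMeans() is a base R shortcut for the row-wise case.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)  # second argument 1: one result per row
col_means <- apply(M, 2, mean)  # second argument 2: one result per column

print(row_means)  # 8.5  9.5 10.5 11.5 12.5
print(col_means)  # 3  8 13 18

# rowMeans() computes the same row-wise result
all.equal(row_means, rowMeans(M))
```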
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)
Functions introduced in this section
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
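The other apply-style functions mentioned above are not demonstrated in the slides. As a rough sketch of what each one does (the variable names are illustrative only), using the built-in iris data:

```r
data(iris)

# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to groups of values defined by a factor
mean_by_species <- tapply(iris[["Sepal.Length"]], iris[["Species"]], mean)

# mapply: apply a function over several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

print(mean_by_species)
print(sums)  # 5 7 9
```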
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
Function return value
The return value of a function can be
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from
- Calls to print(), message(), or cat() in the function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
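The point about local assignment can be isolated in a smaller sketch (added here for illustration; the string values are arbitrary):

```r
x <- "global"

f <- function() {
  x <- "local"  # creates a new x that exists only inside the function
  x             # the value of the last expression is returned
}

y <- f()
print(y)  # "local"
print(x)  # "global": the assignment inside f() did not touch this x
```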
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}
statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not numeric or not of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one
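Printing a message with cat() lets the calling code carry on as if nothing happened. An alternative is to signal an error with stop(). The sketch below is not from the workshop: signOf is a hypothetical helper that returns a label instead of printing, and it checks for a length of exactly one.

```r
# Hypothetical helper (not from the workshop): returns a label instead of printing
signOf <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signOf(10)
signOf(0)

# try() catches the error so the script can continue
res <- try(signOf("a"), silent = TRUE)
class(res)
```

Returning a value rather than printing makes the result easier to reuse, and stop() makes invalid input hard to overlook.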
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions (1) introduced in this section
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met
(1) Technically if and else are not functions, but this need not concern us at the moment
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}
statsum(iris)
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
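As written, disp() has a method only for matrices, so calling it on any other object raises an error. One common remedy is a default method, which S3 dispatch falls back on when no class-specific method is found. The disp.default below is an illustrative addition, not part of the workshop code.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback used when no class-specific method exists (illustrative addition)
disp.default <- function(x, ...) {
  cat("no special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 5.678), ncol = 2))  # dispatches to disp.matrix
disp("hello")                            # dispatches to disp.default
```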
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))}
statsum(iris)
2. Modify your function so that it returns a list of class "statsum"
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])}
statsum(iris)
Things that may surprise you
Gotcha's
- There are an unfortunately large number of surprises in R programming
- Some of these "gotcha's" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
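One detail worth knowing when using all.equal() in a condition: when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A small sketch (the variable names are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

a == b                   # FALSE: exact comparison of doubles
isTRUE(all.equal(a, b))  # TRUE: equal within a small tolerance

# Safe to use inside if(), even when the values differ
if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}
```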
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values
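Both solutions can be sketched with the same x as above (the names m, n_missing, and x_complete are introduced here for illustration):

```r
x <- c(1:10, NA, 12:20)

mean(x)                      # NA: the missing value propagates
m <- mean(x, na.rm = TRUE)   # drop the NA before averaging
n_missing <- sum(is.na(x))   # count the missing values
x_complete <- x[!is.na(x)]   # logical indexing removes them

print(m)
print(n_missing)  # 1
```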
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
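The difference comes from the argument lists: mean() averages only its first argument x, while sum() adds up everything passed through the dots. Wrapping the values in c() makes the two behave consistently, as this short sketch shows:

```r
values <- c(1, 2, 3, 4, 5)

mean(values)         # 3: all five values are in the single argument x
sum(values)          # 15
mean(1, 2, 3, 4, 5)  # 1: only the first argument is averaged
```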
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a ", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i ## assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
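When the number of results is known in advance, the container can be created at its final length before the loop starts, which avoids growing it one element at a time. The sketch below is an addition to the slides; it also shows the same result produced with lapply() instead of an explicit loop.

```r
n <- 5

# Pre-allocate a list of the final length, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

identical(Result, Result2)
```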
Word of caution: don't overuse loops!
Most operations in R are vectorized, which makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. use the results from step 1 to select the numeric columns
> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. use a loop to calculate the mean of each numeric column in the iris data
> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Data types
Vectors and data classes
Values can be combined into vectors using the c() function
> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>
Vectors have a class which determines how functions treat them
> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"
Vector conversion and info
Vectors can be converted from one class to another
> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA
In addition to class, you can examine the length() and str()ucture of vectors
> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for
- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors
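The slide gives no example, so here is a small sketch of both ideas: the integer codes behind the labels, and sorting in a chosen (non-alphabetical) order. The sizes vector and its levels are made up for illustration.

```r
# Made-up example data; the level order is chosen deliberately
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)      # the character labels, in the order we chose
as.integer(sizes)  # the underlying integer codes: 1 3 2 1
sort(sizes)        # sorted by level order, not alphabetically
table(sizes)       # counts per level
```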
Lists and data.frames
- A data.frame is a list of vectors, each of the same length
- A list is a collection of objects, each of which can be almost anything
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section
c           combine elements
as.numeric  convert an object (e.g., a character vector) to numeric
data.frame  combine objects into a data.frame
ls          list the objects in the workspace
class       get the class of an object
str         get the structure of an object
length      get the number of elements in an object
mean        calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "tests", as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Introduction To Programming In RLast updated November 20 2013 13
71

Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"
class(students)
class(test)

4. Create a data frame containing two columns, "students" and "tests", as defined above
testScores <- data.frame(students, tests = test)

5. Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
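
The slide mentions replacement as well as selection. A minimal sketch of replacement through a logical index follows; it uses a fresh vector named v (a name chosen here for illustration) so it does not depend on the state of x above.

```r
# A fresh named vector, so the example is self-contained.
v <- 101:110
names(v) <- letters[1:10]

# Replace every value greater than 106 with zero.
v[v > 106] <- 0L

# Replace the elements named "a" or "b" with a missing value.
v[names(v) %in% c("a", "b")] <- NA

v
```

The logical vector on the left-hand side picks the positions to overwrite; the value on the right is recycled to fill them.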

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values.

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
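
Lists can also be indexed by name, which the slide above does not show. The sketch below rebuilds a small list (a cut-down version of L, redefined here so the block runs on its own) and contrasts single brackets, double brackets, and the `$` shortcut.

```r
# A small list, redefined so this block is self-contained.
L <- list(x = 1:5, y = 1:3)

L["x"]    # single brackets: still a list, of length one
L[["x"]]  # double brackets: the integer vector itself
L$y       # $ is shorthand for L[["y"]]

# Replacement works through the same operators.
L[["y"]][2] <- 10L
L$y
```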

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
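
When matrix-style indexing is preferred but a data.frame result is still wanted, base R's `drop = FALSE` argument keeps the one-column data.frame. A short sketch, with DF rebuilt here so the block is self-contained (col_vec and col_df are illustrative names):

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

col_vec <- DF[, 1]                # matrix-style: simplified to a vector
col_df  <- DF[, 1, drop = FALSE]  # matrix-style, but kept as a data.frame

str(col_vec)
str(col_df)
```

This matters inside functions, where a column selection that sometimes returns a vector and sometimes a data.frame is a common source of bugs.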

Extraction/replacement summary

Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[, "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # average each column
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
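
To make the role of the second (MARGIN) argument concrete, the sketch below computes both row and column means for the same matrix and compares the row result with base R's rowMeans(). The names row_means and col_means are illustrative.

```r
M <- matrix(1:20, ncol = 4)  # 5 rows, 4 columns, filled column by column

row_means <- apply(M, 1, mean)  # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)  # MARGIN = 2: one result per column

row_means
col_means

# For means and sums, the dedicated helpers give the same answer.
all.equal(row_means, rowMeans(M))
```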

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions summary

Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
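
For orientation, here is one small call to each of the other apply-style functions named in the summary. This is only a sketch using the built-in iris data and toy inputs; the result names (lens, grp_means, reps) are illustrative.

```r
# lapply: like sapply, but always returns a list
lens <- lapply(list(a = 1:3, b = letters), length)

# tapply: apply a function within groups defined by a factor
grp_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function over several vectors in parallel,
# here rep(1, 3), rep(2, 2), rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)

lens
grp_means
reps
```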
Writing functions

Functions

- A function is a collection of commands that takes input(s) and returns output
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
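
As a sketch of "any number of named arguments", the hypothetical helper below (rescale is not part of the workshop materials) takes one required argument and two optional ones with default values.

```r
# Hypothetical helper: optionally center a numeric vector, then divide by a scale.
rescale <- function(x, center = TRUE, scale = 1) {
  if (center) {
    x <- x - mean(x)  # subtract the mean when centering is requested
  }
  x / scale           # the last evaluated expression is the return value
}

rescale(c(1, 2, 3))                             # defaults: centered, scale 1
rescale(c(1, 2, 3), center = FALSE, scale = 2)  # arguments supplied by name
```

Arguments with defaults can be omitted by the caller, and naming them in the call makes the order irrelevant.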

Function return value

The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.
Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

- Step through functions and set breakpoints with debug() and browser()
- Use traceback() to see what went wrong after the fact
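
The debugging tools are interactive, so the sketch below only shows how they are switched on and off; the function buggy is a made-up example. In an interactive session you would call the function while the debug flag is set and step through it at the Browse prompt.

```r
# Made-up function that fails on non-numeric input.
buggy <- function(x) {
  # browser()  # uncomment to pause here when running interactively
  log(x)
}

debug(buggy)       # flag the function: the next call would be stepped through
isdebugged(buggy)  # TRUE while the flag is set
undebug(buggy)     # remove the flag again

# Outside the debugger, an error can be captured for inspection.
res <- tryCatch(buggy("a"), error = function(e) conditionMessage(e))
res

# In an interactive session, calling traceback() right after an
# uncaught error prints the sequence of calls that led to it.
```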

Writing functions summary

Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow

- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle input that is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
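
The if/else version handles one number at a time. For a whole vector, base R's vectorized ifelse() is one alternative; the sketch below is a variation on the slide's idea, not the workshop's own code, and the function name signLabel is made up.

```r
# Made-up vectorized variant: label every element of a numeric vector.
signLabel <- function(x) {
  stopifnot(is.numeric(x))  # argument check: fail early on bad input
  ifelse(x > 0, "positive",
         ifelse(x == 0, "zero", "negative"))
}

signLabel(c(10, -1, 0))
```

Here the nested ifelse() plays the role of the else-if chain, and stopifnot() raises an error rather than printing a message, so calling code can detect the problem.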

Control flow summary

Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

(1) Technically, if and else are not functions, but this need not concern us at the moment.

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10) # not a data.frame, so this stops with an error
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system

R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
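
When an object carries several classes, comparing class(x) to a single string becomes awkward; base R's inherits() tests membership directly. A minimal sketch, with the object recreated here so the block is self-contained:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")    # TRUE
inherits(x, "B")    # TRUE
inherits(x, "foo")  # FALSE

# unclass() strips the class attribute and exposes the underlying data.
unclass(x)
```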

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
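
A generic with only one method fails for every other class. A common pattern is to add a default method as a fallback; the sketch below redefines the disp() generic so the block runs on its own, and the numeric and default methods are illustrative additions, not part of the workshop code.

```r
# Generic, redefined here so the block is self-contained.
disp <- function(x, ...) UseMethod("disp")

# Illustrative method for numeric vectors: print rounded values.
disp.numeric <- function(x, ...) {
  cat("numeric:", format(round(x, 2)), "\n")
  invisible(x)
}

# Fallback used when no class-specific method exists.
disp.default <- function(x, ...) {
  cat("object of class", class(x)[1], "\n")
  invisible(x)
}

disp(3.14159)  # dispatches to disp.numeric
disp("a")      # no disp.character method, so disp.default is used
```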

S3 classes summary

Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have a class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
  invisible(x)
}
statsum(iris)
Things that may surprise you

Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
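
One detail worth knowing when using all.equal() inside a condition: when the values differ it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). A short sketch (a and b are illustrative names):

```r
a <- 0.1
b <- 0.3 / 3

a == b                   # FALSE: exact comparison of floating point values
isTRUE(all.equal(a, b))  # TRUE: equal within a small tolerance

all.equal(1, 1.1)          # a character string describing the difference
isTRUE(all.equal(1, 1.1))  # FALSE, safe to use in if()

if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}
```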

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), max(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
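
The two solutions named above can be sketched as follows, using the same vector as the slide:

```r
x <- c(1:10, NA, 12:20)

mean(x)                # NA: the missing value propagates
mean(x, na.rm = TRUE)  # drop missing values before averaging

is.na(x)               # logical vector, TRUE where x is missing
sum(is.na(x))          # number of missing values
x[!is.na(x)]           # keep only the observed values
```

Note that x == NA is not a substitute for is.na(x): as shown on the slide, comparisons with NA return NA.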

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
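
The args() output explains the difference: mean() treats only its first argument, x, as data, while sum() adds up everything passed through its dots. Wrapping the values in c() makes both behave as expected, as this short sketch shows:

```r
mean(1, 2, 3, 4, 5)     # 1: only the first argument is averaged
mean(c(1, 2, 3, 4, 5))  # 3: the whole vector is averaged

sum(1, 2, 3, 4, 5)      # 15: every argument is added
sum(c(1, 2, 3, 4, 5))   # 15: same result with a single vector
```

Passing data as one vector is the safer habit, because it works the same way for both styles of function.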

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
Additional resources

Additional reading and resources

- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)

Looping

- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

> ## For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
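
A common refinement of this pattern, sketched here as a suggestion rather than as part of the workshop code, is to create the result list at its final length before the loop and to iterate with seq_len(), which behaves sensibly even when the length is zero.

```r
n <- 5

# Pre-allocate a list of length n instead of growing it inside the loop.
Result <- vector("list", n)

for (i in seq_len(n)) {         # seq_len(0) is empty, so the loop is skipped
  Result[[i]] <- seq_len(i)     # store the sequence 1 to i
}

Result[[3]]
length(Result)
```

By contrast, 1:n with n equal to zero yields c(1, 0), so a loop written as for (i in 1:n) would run twice on an empty input.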

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
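
For comparison, the same three steps can be written without explicit loops using sapply() and logical indexing from the earlier sections. This is a sketch of an equivalent approach, reusing the variable names from the prototype:

```r
# Step 1: class of each column, as a named character vector.
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns (list-style logical indexing).
irisnum <- iris[classes == "numeric"]

# Step 3: mean of each remaining column.
irismeans <- sapply(irisnum, mean)

irismeans
```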
Data types

Vectors and data classes

Values can be combined into vectors using the c() function.

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class, which determines how functions treat them.

> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"
Data types
Vector conversion and info
Vectors can be converted from one class to another.
> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA
In addition to class(), you can examine the length() and str()ucture of vectors.
> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for:
- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors.
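The factor slide has no example, so here is a minimal sketch of its points. The sizes vector and its level order are made up for illustration and are not part of the workshop materials. It shows the integer codes stored underneath, the character labels, and sorting by level order rather than alphabetically.

```r
# Hypothetical example data: sizes with a level order we choose ourselves
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)      # the character labels, in the order given above
as.integer(sizes)  # the integer codes stored underneath
sort(sizes)        # sorted by level order, not alphabetically
table(sizes)       # counts per level
```

Setting levels explicitly is what lets you control the presentation order in tables and plots.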
Lists and data.frames
A data.frame is a list of vectors, each of the same length. A list is a collection of objects, each of which can be almost anything.
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section
- c: combine elements
- as.numeric: convert an object (e.g., a character vector) to numeric
- data.frame: combine objects into a data.frame
- ls: list the objects in the workspace
- class: get the class of an object
- str: get the structure of an object
- length: get the number of elements in an object
- mean: calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2. Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3. Determine the class of "students" and "test"
class(students)
class(test)
4. Create a data frame containing two columns, "students" and "test" as defined above
testScores <- data.frame(students, test)
5. Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.
> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing data.frames
A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
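As a small extension of the [ , n] versus [n] distinction, the sketch below uses the drop argument of the data.frame extraction method, which the slides do not cover. Passing drop = FALSE keeps the matrix-style form from simplifying a single column to a vector. DF is rebuilt here so that the block is self-contained.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

is.data.frame(DF[1])                  # list-style: stays a data.frame
is.data.frame(DF[, 1])                # matrix-style: simplified to a vector
is.data.frame(DF[, 1, drop = FALSE])  # matrix-style, but keeps the data.frame
```

This matters mostly inside functions, where the number of selected columns is not known in advance.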
Extraction/replacement summary
Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section
- [: extraction operator, used to extract/replace object elements
- names: get the names of an object, usually a vector, list, or data.frame
- print: print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
(In R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so the class of y is reported as "character".)
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)
Functions introduced in this section
- matrix: create a matrix (vector with two dimensions)
- apply: apply a function to the rows or columns of a matrix
- sapply: apply a function to the elements of a list
- is.numeric: returns TRUE or FALSE, depending on the type of object
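The summary mentions lapply, tapply, and mapply without showing them. The following is only a brief sketch of what each does, using the built-in iris data; the variable names are illustrative.

```r
data(iris)

# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to one vector, split by the groups in another
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

col_classes$Species
mean_by_species
sums
```

sapply is usually the most convenient at the prompt; lapply is more predictable inside functions because its return type never changes.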
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
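To make the points about named arguments concrete, here is a minimal sketch. The rescale function and its lower and upper arguments are hypothetical, chosen only to show default values and how a named argument overrides one of them.

```r
# Hypothetical function: rescale a numeric vector to the range [lower, upper]
rescale <- function(x, lower = 0, upper = 1) {
  scaled <- (x - min(x)) / (max(x) - min(x))  # map x onto [0, 1]
  lower + scaled * (upper - lower)            # value of the last expression is returned
}

rescale(c(2, 4, 6))              # uses the defaults, 0 to 1
rescale(c(2, 4, 6), upper = 10)  # a named argument overrides one default
```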
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages
Assignment inside the body of a function takes place in a local environment. Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
Writing functions example
Goal: write a function that returns the square of its argument.
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
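The debugging tools are interactive, so they are hard to show on a slide. The sketch below is one possible illustration: inner_fn and outer_fn are hypothetical functions, and the interactive calls are left commented out so that the block also runs unattended.

```r
# Hypothetical functions, used only to have something to debug
inner_fn <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  sqrt(x)
}
outer_fn <- function(x) inner_fn(x) + 1

# try() captures the error so a script can keep going; at the prompt you
# would call outer_fn("a") directly and then run traceback()
result <- try(outer_fn("a"), silent = TRUE)
inherits(result, "try-error")

# Interactive-only tools, commented out so the sketch runs unattended:
# debug(outer_fn)    # step through outer_fn() the next time it is called
# outer_fn(4)
# undebug(outer_fn)  # stop stepping through outer_fn()
# traceback()        # after an uncaught error, print the call stack
```

Placing a call to browser() inside a function body has a similar effect to debug(), but pauses only at that line.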
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative.
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0.
> ## add condition to handle the case that x is zero
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
Test the new function:
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!).
Control flow examples
Do something reasonable if x is not numeric.
> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) {     # define function "isPositive"
+   if (!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {           # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions(1) introduced in this section
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met
(1) Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame.
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10) # stops with an error
statsum(iris)
2. Insert a break point with browser() and step through your function.
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system.
- Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class.
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) {                # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x)))   # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")           # otherwise say x not numeric
+ }
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
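When an object has several classes, a generic looks for a method for each class in order and falls back to the default method if none is found. The sketch below illustrates this with a hypothetical generic, describe, and made-up classes A and B; none of these names come from the workshop.

```r
# Hypothetical generic and classes, used only for illustration
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "no specific method"
describe.B <- function(x, ...) "method for class B"

x <- 1:10
class(x) <- c("A", "B")
describe(x)      # no describe.A exists, so describe.B is used
describe(1:10)   # plain integers fall through to describe.default
```

Defining a default method is optional, but without one the generic stops with an error for classes it does not know.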
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class
Functions introduced in this section
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables.
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class "statsum".
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class.
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming.
- Some of these "gotchas" are common problems in other languages; many are unique to R.
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
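One detail worth adding: all.equal() returns TRUE when the values match within a tolerance, but returns a character description of the difference when they do not, so it is usually wrapped in isTRUE() when used as a condition. A minimal sketch (the variable names a and b are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison
isTRUE(all.equal(a, b))    # comparison within a small tolerance
isTRUE(all.equal(1, 1.1))  # all.equal() alone would return a message here
```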
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
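The two solutions named above can be sketched as follows, reusing the same vector x from the slide:

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)  # drop the NA before averaging
is.na(x)               # TRUE where a value is missing
sum(is.na(x))          # how many values are missing
x[!is.na(x)]           # keep only the non-missing values
```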
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
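The difference comes from the argument lists: mean() averages only its first argument, x, and the remaining values are matched to other parameters, while sum() adds up everything passed through the dots. Passing the values as a single vector avoids the surprise, as this short sketch shows:

```r
mean(c(1, 2, 3, 4, 5))  # one vector argument: averages all five values
sum(c(1, 2, 3, 4, 5))   # same result as sum(1, 2, 3, 4, 5)
mean(1, 2, 3, 4, 5)     # only the first argument, 1, is averaged
```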
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form.
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5,5)) {                # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")   # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes.
> ## While-loop example: rolling dice
> set.seed(15)      # allows reproducible sample() results
> dice <- seq(1,6)  # set dice = [1 2 3 4 5 6]
> roll <- 0         # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1)  # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")           # print the result
+ }                                          # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> ## save calculations done in a loop
> Result <- list()      # create an object to store the results
> for (i in 1:5) {      # for each i in [1, 5]
+   Result[[i]] <- 1:i  # assign the sequence 1 to i to Result
+ }
> Result                # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns
> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data
> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
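In keeping with the earlier caution about overusing loops, the same column means can be computed without an explicit loop. This is a hedged sketch, not the workshop's own solution; the names is_num and col_means are assumptions introduced here.

```r
# Illustrative sketch: column means of the numeric iris columns, no loop
data(iris)                                 # built-in data set
is_num <- sapply(iris, is.numeric)         # logical vector, one entry per column
col_means <- sapply(iris[is_num], mean)    # named numeric vector of means
round(col_means, 2)                        # rounded for display
```

Here sapply() does the iteration, and the logical vector is_num does the column selection that step 2 of the exercise performs by name.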
Data types
Vector conversion and info
Vectors can be converted from one class to another:
> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA
In addition to class, you can examine the length() and str()ucture of vectors:
> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for:
- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors.
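The "numbers with labels" idea can be seen directly by building a small factor. This is an illustrative sketch; the vector sizes and its values are made up for the example.

```r
# Illustrative sketch: a factor stores integer codes plus character labels
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))  # explicit level order
levels(sizes)        # the labels, in the order given above
as.integer(sizes)    # the underlying integer codes
table(sizes)         # counts are reported in level order
```

Because the level order is set explicitly, tables and plots present the categories in that order rather than alphabetically.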
Lists and data.frames
- A data.frame is a list of vectors, each of the same length
- A list is a collection of objects, each of which can be almost anything
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points:
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section:
c: combine elements
as.numeric: convert an object (e.g., a character vector) to numeric
data.frame: combine objects into a data.frame
ls: list the objects in the workspace
class: get the class of an object
str: get the structure of an object
length: get the number of elements in an object
mean: calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2. Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3. Determine the class of "students" and "test"
class(students)
class(test)
4. Create a data frame containing two columns, "students" and "test" as defined above
testScores <- data.frame(students, test)
5. Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.
> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
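Logical vectors work for replacement as well as extraction. The sketch below uses a made-up named vector, scores, and an assumed cutoff of 60; neither comes from the slides.

```r
# Illustrative sketch: replace elements selected by a logical condition
scores <- c(a = 55, b = 72, c = 91, d = 38)   # hypothetical named vector
low <- scores < 60                            # TRUE where the score is below 60
scores[low] <- NA                             # replace only those elements
scores                                        # a and d are now NA
which(low)                                    # positions of the TRUE values
```

Storing the condition in its own variable (low) makes it easy to reuse the same selection for several operations.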
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (such as the column index in M[1:3, ] above) return all values.
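One detail worth knowing when indexing matrices: selecting a single row or column drops the result to a plain vector unless drop = FALSE is supplied. A minimal sketch, reusing the matrix M from the slide (v and m are names introduced here):

```r
# Illustrative sketch: keep matrix structure when selecting a single column
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))  # matrix from the slide
v <- M[, 2]                  # a plain vector: the dimensions are dropped
m <- M[, 2, drop = FALSE]    # still a matrix, with 5 rows and 1 column
dim(v)                       # NULL, because v has no dimensions
dim(m)                       # 5 1
```

Using drop = FALSE is a reasonable default inside functions, where the number of selected columns may not be known in advance.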
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
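List elements can also be pulled out by name. The following sketch rebuilds a small list like the one on the slide (without the data.frame element) so that it runs on its own.

```r
# Illustrative sketch: extract list elements by name
L <- list(x = 1:5, y = 1:3)   # a small list like the one on the slide
L[["x"]]                      # double brackets with a name: the vector itself
L$y                           # $ is shorthand for [["y"]]
class(L["x"])                 # single brackets still return a list
```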
Indexing data.frames
A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
Extraction/replacement summary
Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section:
[: extraction operator, used to extract/replace object elements
names: get the names of an object, usually a vector, list, or data.frame
print: print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (MARGIN = 2)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
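The second argument to apply() selects the dimension: 1 for rows, 2 for columns. As a small sketch on the same matrix (row_means and col_means are names introduced here):

```r
# Illustrative sketch: the MARGIN argument of apply()
M <- matrix(1:20, ncol = 4)      # 5 rows, 4 columns, filled column by column
row_means <- apply(M, 1, mean)   # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)   # MARGIN = 2: one result per column
length(row_means)                # 5, one value for each row
length(col_means)                # 4, one value for each column
```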
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
Functions introduced in this section:
matrix: create a matrix (vector with two dimensions)
apply: apply a function to the rows or columns of a matrix
sapply: apply a function to the elements of a list
is.numeric: returns TRUE or FALSE, depending on the type of object
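For orientation, here is a brief sketch of the other apply-style functions named in the key points. The inputs are made up for illustration, and the names lens, grp_means, and sums are assumptions.

```r
# Illustrative sketch: lapply, tapply, and mapply in one place
data(iris)                                                   # built-in data set
lens <- lapply(list(a = 1:3, b = letters), length)           # always returns a list
grp_means <- tapply(iris$Sepal.Length, iris$Species, mean)   # one mean per species
sums <- mapply(function(a, b) a + b, 1:3, 4:6)               # walks two vectors in parallel
lens$b                                                       # length of the letters vector
names(grp_means)                                             # the species labels
sums                                                         # element-wise sums
```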
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
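As a small sketch of a function with named arguments, consider the hypothetical helper below. The name rescale and its center argument are invented for this illustration and do not appear in the workshop materials.

```r
# Illustrative sketch: a function with a named argument that has a default
rescale <- function(x, center = TRUE) {   # hypothetical helper
  if (center) {
    x <- x - mean(x)                      # subtract the mean when requested
  }
  x / max(abs(x))                         # value of the last expression is returned
}
rescale(c(1, 2, 3))                       # uses the default center = TRUE
rescale(c(1, 2, 3), center = FALSE)       # arguments can be matched by name
```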
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message() or cat() in function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
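The local-environment behaviour can be checked in isolation. In this sketch the function name g and the string values are assumptions chosen for clarity.

```r
# Illustrative sketch: assignment inside a function leaves the caller's x alone
x <- "global"
g <- function() {        # hypothetical function name
  x <- "local"           # creates a new x in the function's own environment
  x                      # last expression: this value is returned
}
result <- g()
result                   # value returned by g()
x                        # the global x is unchanged
```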
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section:
function: defines a new function
return: used inside a function definition to set the return value
browser: sets a break point
debug: turns on the debugging flag of a function so you can step through it
undebug: turns off the debugging flag
traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
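A variation on the same argument check is to signal an error with stop() instead of printing a message, so that calling code cannot silently continue. This is a sketch under that assumption; signOf is a hypothetical name, and it returns a string rather than printing.

```r
# Illustrative sketch: argument checking that raises an error
signOf <- function(x) {                       # hypothetical function name
  if (!is.numeric(x) || length(x) != 1) {     # || stops at the first TRUE
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}
signOf(-3)
signOf(0)
```

Testing length(x) != 1 also rejects zero-length input, which length(x) > 1 would let through.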
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions [1] introduced in this section:
cat: concatenates and prints R objects
if: execute code only if condition is met
else: used with if; code to execute if condition is not met
[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
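Because an object can carry several classes, comparing class(x) to a single string yields one result per class. A hedged sketch of testing class membership with inherits(), which returns a single TRUE or FALSE:

```r
# Illustrative sketch: test class membership when there may be several classes
x <- 1:10
class(x) <- c("A", "B")
class(x) == "B"           # one comparison per class: FALSE TRUE
inherits(x, "B")          # TRUE if "B" is anywhere in the class vector
inherits(x, "foo")        # FALSE
is.data.frame(iris)       # convenience test for the data.frame class
```

A single logical value is what if() expects, so inherits() or is.data.frame() is a safer choice for argument checks than a direct comparison on class().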
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
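A generic with only one method fails for every other class. One way to handle that, sketched here as an assumption rather than as part of the workshop code, is to add a default method; disp.default and its message are invented for this example.

```r
# Illustrative sketch: a generic with a class-specific and a default method
disp <- function(x, ...) {
  UseMethod("disp")                     # dispatch on the class of x
}
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))           # matrices: print rounded values
}
disp.default <- function(x, ...) {      # fallback when no other method matches
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)                          # return x without printing it
}
disp(matrix(c(0.123, 0.456), ncol = 1)) # uses disp.matrix
disp("hello")                           # falls back to disp.default
```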
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: the body of a generic function
invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean, for the standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class "statsum"
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean, for the standard deviations
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotcha's
- There are an unfortunately large number of surprises in R programming
- Some of these "gotcha's" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
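When the comparison feeds an if() statement, note that all.equal() returns a description of the difference rather than FALSE when the values differ, so it is usually wrapped in isTRUE(). A minimal sketch (the tolerance 1e-8 in the last line is an assumed value):

```r
# Illustrative sketch: comparing floating point numbers safely
a <- 0.1
b <- 0.3 / 3
a == b                          # exact comparison: FALSE
isTRUE(all.equal(a, b))         # comparison within a small tolerance: TRUE
abs(a - b) < 1e-8               # explicit tolerance (1e-8 is an assumption)
```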
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
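The two suggested solutions look like this on the same vector; this is a short sketch, and the variable name x simply mirrors the slide.

```r
# Illustrative sketch: handling missing values explicitly
x <- c(1:10, NA, 12:20)
mean(x, na.rm = TRUE)      # drop the NA before averaging
sum(is.na(x))              # count the missing values
x[!is.na(x)]               # keep only the non-missing values
```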
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
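The difference comes from the argument lists: mean() averages only its first argument, x, while sum() adds up everything passed through the dots. Combining the values into one vector gives the expected result, as this small sketch shows (vals is a name introduced here).

```r
# Illustrative sketch: pass the values to mean() as a single vector
vals <- c(1, 2, 3, 4, 5)   # combine the numbers first
mean(vals)                 # 3: all five values are in x
sum(vals)                  # 15: same answer as sum(1, 2, 3, 4, 5)
```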
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
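An equivalent idiom converts the levels once and then indexes them by the factor's integer codes. It is shown here as an alternative sketch, not as the slide's recommendation.

```r
# Illustrative sketch: recover the numeric values behind a factor
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))
as.numeric(levels(x))          # the labels as numbers: 6 5
as.numeric(levels(x))[x]       # indexed by the codes 2 2 1 1: 5 5 6 6
```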
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5, 5)) {               # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")   # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15)       # allows reproducible sample() results
> dice <- seq(1, 6)  # set dice = [1 2 3 4 5 6]
> roll <- 0          # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")            # print the result
+ }                                           # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list()      # create an object to store the results
> for (i in 1:5) {      # for each i in [1, 5]
+   Result[[i]] <- 1:i  # assign the sequence 1 to i to Result
+ }
> Result                # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
Word of caution: don't overuse loops!
Most operations in R are vectorized – this makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c()                   # create vector x
> for(i in 1:5) x[i] <- i+i  # double a vector using a loop
> print(x)                   # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5  # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5    # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> # Earlier we said
> # for (num in seq(-5, 5)) {              # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n")  # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
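When a loop really is needed, it usually helps to preallocate the result rather than grow it from c(). The sketch below is an illustration only; the names n, squares_loop and squares_vec are made up for the example.

```r
n <- 5

# Loop version: preallocate the result, then fill it in by position
squares_loop <- numeric(n)
for (i in seq_len(n)) {   # seq_len(n) is also safe when n is 0, unlike 1:n
  squares_loop[i] <- i^2
}

# Vectorized version: no loop needed
squares_vec <- seq_len(n)^2
```

Both approaches give the same vector; the vectorized form is shorter and is typically the more idiomatic choice in R.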
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
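For comparison, the same three steps can be written without explicit loops using sapply, which the workshop covers in the section on applying functions to list elements. This is a sketch only; the names iris_num and iris_means are chosen for the example.

```r
# Step 1: class of each column
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris_num <- iris[classes == "numeric"]

# Step 3: mean of each numeric column
iris_means <- sapply(iris_num, mean)
```

The result is a named numeric vector with one mean per numeric column, matching what the loop version produces.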
Data types
Factor vectors
Factors are stored as numbers, but have character labels. Factors are useful for
- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors.
Lists and data.frames
- A data.frame is a list of vectors, each of the same length
- A list is a collection of objects, each of which can be almost anything
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points:
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section:
c: combine elements
as.numeric: convert an object (e.g., a character vector) to numeric
data.frame: combine objects into a data.frame
ls: list the objects in the workspace
class: get the class of an object
str: get the structure of an object
length: get the number of elements in an object
mean: calculate the mean of a vector
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2. Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3. Determine the class of "students" and "test"
class(students)
class(test)
4. Create a data frame containing two columns, "students" and "test" as defined above
testScores <- data.frame(students, test)
5. Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.
> # indexing vectors by position
> x <- 101:110  # create a vector of integers from 101 to 110
> x[c(4, 5)]    # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1     # change the 4th value to 1
> x             # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10]  # give x names
> print(x)                   # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")]  # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
> x > 106  # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106]  # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
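Logical vectors can also be counted, turned into positions, and used on the left-hand side of an assignment. The following sketch rebuilds a small named vector like the one above so it runs on its own; the name big is made up for the example.

```r
x <- 101:110
names(x) <- letters[1:10]

big <- x > 106          # logical vector, one value per element of x
n_big <- sum(big)       # TRUE counts as 1, so this is the number of matches
pos_big <- which(big)   # positions (with names) of the TRUE elements

x[big] <- 0L            # replace the selected elements in place
```

After the last line, only the elements that were greater than 106 have changed.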
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ]  # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3]  # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2]  # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index in M[1:3, ] above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)]  # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1]  # a list with one element
$x
[1] 1 2 3 4 5

> # double brackets select the content of a single selected element,
> # effectively taking it out of the list
> L[[1]]  # a vector
[1] 1 2 3 4 5
Indexing data.frames
A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.
> DF[c(3, 1, 2), c(1, 2)]  # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]]  # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [, n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1])  # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1])  # a vector
 int [1:5] 1 2 3 4 5
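If matrix-style indexing is preferred but a data.frame is wanted back, the drop argument of the extraction operator can be set to FALSE. The sketch below recreates a small DF so it is self-contained; the names v, d1 and d2 are made up for the example.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

v  <- DF[, 1]                # matrix-style: a plain vector
d1 <- DF[1]                  # list-style: a data.frame with one column
d2 <- DF[, 1, drop = FALSE]  # matrix-style, but keeps the data.frame structure
```

Using drop = FALSE is a common way to avoid surprises in code that expects a data.frame regardless of how many columns are selected.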
Extraction/replacement summary
Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section:
[: extraction operator, used to extract/replace object elements
names: get the names of an object, usually a vector, list, or data.frame
print: print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2
2. Calculate the mean of the Sepal.Length column in iris.2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2
data(iris)
iris.2 <- iris[c("Sepal.Length", "Species")]
str(iris.2)
2. Calculate the mean of the Sepal.Length column in iris.2
mean(iris.2[ , "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris.2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix.
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean)  # average each column (margin 2 = columns)
[1]  3  8 13 18
> apply(M, 2, sum)   # sum the columns
[1] 15 40 65 90
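The second argument of apply selects the margin: 1 for rows, 2 for columns. The sketch below shows both on the same matrix and compares them with the built-in rowMeans and colMeans; the names row_means and col_means are made up for the example.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)  # margin 1: one value per row
col_means <- apply(M, 2, mean)  # margin 2: one value per column

# For means and sums specifically, base R also has dedicated functions
same_rows <- isTRUE(all.equal(row_means, rowMeans(M)))
same_cols <- isTRUE(all.equal(col_means, colMeans(M)))
```

For simple means and sums, rowMeans, colMeans, rowSums and colSums are usually the more direct choice; apply is useful when the function is something else.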
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.
> sapply(DF, class)  # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length)  # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric)  # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)]  # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DF.which.num <- sapply(DF, is.numeric))  # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num]  # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
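A closely related base R function, vapply, does the same job as sapply but requires the expected type and length of each result to be stated. The sketch below uses a slightly larger example data.frame with a hypothetical extra column z; the names is_num and DF_num are made up for the example.

```r
DF <- data.frame(x = 1:5,
                 y = letters[1:5],
                 z = c(0.5, 1.5, 2.5, 3.5, 4.5))

# logical(1) declares that each call must return exactly one TRUE/FALSE
is_num <- vapply(DF, is.numeric, logical(1))

# Select the numeric columns, keeping a data.frame
DF_num <- DF[is_num]
```

Because the result type is declared up front, vapply fails early if a function returns something unexpected, which can make it a safer choice inside functions.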
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)
Functions introduced in this section:
matrix: create a matrix (vector with two dimensions)
apply: apply a function to the rows or columns of a matrix
sapply: apply a function to the elements of a list
is.numeric: returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists ...)
Function return value
The return value of a function can be:
- The last object stored in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() {            # define function f
+   print("setting x to 1")    # print a text string
+   x <- 1}                    # set x to 1
>
> y <- f()  # assign y the value returned by f
[1] "setting x to 1"
>
> y  # print y
[1] 1
> x  # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
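The local-assignment behavior described above can be checked with a short self-contained sketch. It starts from a fresh x rather than the named vector used earlier; the names f and y follow the slide, and the values are made up for the example.

```r
x <- 10            # x in the global environment

f <- function() {
  x <- 1           # creates a new, local x; the global x is untouched
  x                # the last evaluated expression is the return value
}

y <- f()           # y receives the local value
```

After running this, y is 1 while the global x is still 10.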
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) {  # define function named "square" with argument x
+   return(x*x)             # multiply the x argument by itself
+ }                         # end the function definition
>
> # check to see that the function works
> square(x = 2)  # square the value 2
[1] 4
> square(10)     # square the value 10
[1] 100
> square(1:5)    # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section:
function: defines a new function
return: used inside a function definition to set the return value
browser: sets a break point
debug: turns on the debugging flag of a function so you can step through it
undebug: turns off the debugging flag
traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
Test the new function
> isPositive(0)  # test the isPositive() function
0 is zero
> isPositive("a")  # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) {     # define function "isPositive"
+   if (!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {           # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
> isPositive("a")  # test the isPositive() function on character
x must be a numeric vector of length one!
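A variation on the argument check above is to signal an error with stop() and return a value instead of printing. The sketch below is one possible design, not the workshop's own code: signLabel is a hypothetical name, the check also rejects length-zero and missing input, and it uses the short-circuit operator || so later conditions are only evaluated when the earlier ones pass.

```r
signLabel <- function(x) {  # hypothetical helper, not from the slides
  # reject anything that is not a single, non-missing number
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(10)
signLabel(0)
signLabel(-1)
```

Returning a string rather than calling cat() makes the result easy to reuse, and stop() lets the caller decide how to handle bad input.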
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions (1) introduced in this section:
cat: concatenates and prints R objects
if: execute code only if condition is met
else: used with if; code to execute if condition is not met
(1) Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
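One caveat about the class(df) != "data.frame" check: an object can have more than one class, in which case the comparison returns more than one value, and recent versions of R (4.2.0 and later) treat an if() condition of length greater than one as an error. A sketch of a more robust check using is.data.frame() follows; checkDF and the class name "myclass" are hypothetical, introduced only for this example.

```r
checkDF <- function(df) {   # hypothetical helper, not from the slides
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# A data.frame that carries an extra class still passes the check
df2 <- iris
class(df2) <- c("myclass", "data.frame")

ok_plain <- checkDF(iris)
ok_multi <- checkDF(df2)
bad      <- try(checkDF(1:10), silent = TRUE)  # signals the error
```

inherits(df, "data.frame") would work equally well here; both look at the whole class vector rather than comparing it element by element.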
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) {                  # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x)))     # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")}            # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
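As written above, disp() has a method only for matrices, so calling it on any other object fails with a "no applicable method" error. A common companion is a default method, sketched below. The disp.default definition is an addition for illustration, not part of the original slides.

```r
# generic
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices: print rounded to two digits
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# fallback used when no class-specific method exists (illustrative addition)
disp.default <- function(x, ...) {
  print(x)
}

disp(matrix(c(1.234, 5.678), ncol = 1))  # dispatches to disp.matrix
disp("a")                                # dispatches to disp.default
```

Giving every method the same (x, ...) signature as the generic keeps the methods consistent with the generic they belong to.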
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: the body of a generic function
invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class "statsum"
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages, many are unique to R
- We will only cover a few – for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
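One detail worth knowing when using all.equal() inside if(): when the values differ, it returns a character description of the difference rather than FALSE, so it is normally wrapped in isTRUE(). A minimal sketch, with a and b as made-up names:

```r
a <- 0.1
b <- 0.3 / 3

exact_match <- (a == b)                  # exact comparison of the stored doubles
close_match <- isTRUE(all.equal(a, b))   # comparison up to a small tolerance

# all.equal() on clearly different values returns text, not FALSE
different <- isTRUE(all.equal(1, 1.1))
```

Here exact_match is FALSE, close_match is TRUE, and different is FALSE.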
Missing values
R does not exclude missing values by default – a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
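The two solutions named above look like this in practice. This is a minimal sketch using the same vector as the slide; the names m_all, m_obs, n_missing and x_obs are made up for the example.

```r
x <- c(1:10, NA, 12:20)

m_all <- mean(x)                # NA: the missing value propagates
m_obs <- mean(x, na.rm = TRUE)  # mean of the 19 observed values

n_missing <- sum(is.na(x))      # count the missing values
x_obs <- x[!is.na(x)]           # keep only the observed values
```

Note that x == NA does not work as a test for missingness, because the comparison itself returns NA; is.na() is the function to use.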
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
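The difference comes down to which argument the values are matched to: mean() treats only its first argument as the data, while sum() collects everything passed through its dots argument. The usual remedy is to combine the values into one vector first, as sketched below with made-up variable names.

```r
m_wrong <- mean(1, 2, 3, 4, 5)     # only the first value is used as x
m_right <- mean(c(1, 2, 3, 4, 5))  # combine with c() so all values are in x
s_all   <- sum(1, 2, 3, 4, 5)      # sum() adds up every argument
```

Passing a single vector works the same way for both functions, so it is a safe habit for any summary function.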
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help us make this workshop better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
Word of caution: don't overuse loops!
Most operations in R are vectorized, which makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1 Use a loop to get the class() of each column in the iris data set
2 Use the results from step 1 to select the numeric columns
3 Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1 Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2 Use the results from step 1 to select the numeric columns
> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3 Use a loop to calculate the mean of each numeric column in the iris data
> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[var] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
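In line with the earlier caution about overusing loops, the same exercise can be done without any explicit loop. The following is one possible sketch, not the workshop's prototype; vapply() and colMeans() are base R functions, and the variable names are made up here.

```r
data(iris)

# vapply() is like sapply() but checks that each result is one logical value
is_num <- vapply(iris, is.numeric, logical(1))

# mean of each numeric column, returned as a named numeric vector
col_means <- vapply(iris[is_num], mean, numeric(1))

# colMeans() gives the same answer for an all-numeric data.frame
same <- all.equal(col_means, colMeans(iris[is_num]))
```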
Data types
Lists and data.frames
A data.frame is a list of vectors, each of the same length.
A list is a collection of objects, each of which can be almost anything.
> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Data types summary
Key points:
  vector classes include numeric, logical, character, and factors
  vectors can be combined into lists or data.frames
  a data.frame can almost always be thought of as a list of vectors of equal length
  a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section:
  c           combine elements
  as.numeric  convert an object (e.g., a character vector) to numeric
  data.frame  combine objects into a data.frame
  ls          list the objects in the workspace
  class       get the class of an object
  str         get the structure of an object
  length      get the number of elements in an object
  mean        calculate the mean of a vector
Exercise 0
1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1 Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2 Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3 Determine the class of "students" and "test"
class(students)
class(test)
4 Create a data frame containing two columns, "students" and "test" as defined above
testScores <- data.frame(students, test)
5 Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.
> # indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index in the first extraction above) return all values.
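The last extraction above returns a plain vector rather than a one-column matrix, because [ drops dimensions of length one by default. A brief sketch of the drop = FALSE argument, which is part of base R's [ for matrices but is not covered on the slide:

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

# default: the single selected column is simplified to a vector
v <- M[M[, 1] %in% 2:4, 2]

# drop = FALSE keeps the result as a matrix with one column
m <- M[M[, 1] %in% 2:4, 2, drop = FALSE]
```

Keeping the matrix shape can be useful when later code expects two dimensions regardless of how many columns were selected.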
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> # double brackets select the content of a single selected element
> # effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing data.frames
A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
Extraction/replacement summary
Key points:
  elements of objects can be extracted or replaced using the [ operator
  objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  vectors and lists have only one dimension, and hence only one index is used
  matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section:
  [      extraction operator, used to extract/replace object elements
  names  get the names of an object, usually a vector, list, or data.frame
  print  print an object
Exercise 1
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2 Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
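Both calls on the slide use 2 as the second argument, so both work column by column. For comparison, here is a short sketch of the row-wise form; rowMeans() is a base R shortcut that is not mentioned on the slide.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)   # 1 = apply over rows: one value per row
col_means <- apply(M, 2, mean)   # 2 = apply over columns: one value per column

# base R also provides dedicated functions for these common cases
row_means_fast <- rowMeans(M)
```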
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
(This output is from an R version before 4.0.0. From 4.0.0 on, data.frame() keeps y as character unless stringsAsFactors = TRUE is set.)
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points:
  R has convenient methods for applying functions to matrices, lists, and data.frames
  other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)
Functions introduced in this section:
  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object
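The other apply-style functions named in the key points are not demonstrated in the workshop, so the following is only a brief orientation sketch. All three are base R functions; the objects and result names are made up for illustration.

```r
data(iris)
L <- list(x = 1:5, y = 1:3)

# lapply() always returns a list, one element per input element
lens <- lapply(L, length)

# tapply() applies a function to a vector within groups defined by a factor
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply() walks over several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```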
Writing functions
Functions
A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function.
Functions are defined using the function() function.
Functions can be defined with any number of named arguments.
Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
  The value of the last expression evaluated in the body of the function
  Objects explicitly returned with the return() function
Other function output can come from:
  Calls to print(), message() or cat() in the function body
  Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
Stepping through functions and setting breakpoints
Use traceback() to see what went wrong after the fact
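The slide lists the tools without showing them in use, so here is a rough sketch of how a debugging session might start. The function scale_by_max is a hypothetical example invented for this illustration; debugonce(), traceback(), and tryCatch() are base R.

```r
# hypothetical helper, used only to have something to debug
scale_by_max <- function(x) {
  stopifnot(is.numeric(x))   # fail early with a clear error
  x / max(x)
}

if (interactive()) {
  debugonce(scale_by_max)    # step through the next call only
  scale_by_max(c(1, 2, 4))
  # after an uncaught error at the prompt, traceback() prints the call stack
}

# outside an interactive session, the error can be captured and inspected
result <- tryCatch(scale_by_max("a"),
                   error = function(e) conditionMessage(e))
```

The interactive part is skipped when the script runs non-interactively, so the block is safe to source as a whole.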
Writing functions summary
Key points:
  writing new functions is easy!
  most functions will have a return value, but functions can also print things, write things to file, etc.
  functions can be stepped through to facilitate debugging
Functions introduced in this section:
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if() and else statements, which can be nested if necessary.
Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Add a condition to handle x = 0:
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function:
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Do something reasonable if x is not numeric:
> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
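A possible variation on the check above, offered as a sketch rather than as the workshop's version: it uses || so that the second test is skipped once the first one has decided the result, requires a length of exactly one (which also rejects empty input), and signals a real error with stop() instead of printing. Returning a string rather than calling cat() is a change made here to keep the example easy to check.

```r
isPositive <- function(x) {
  # || evaluates left to right and stops as soon as the answer is known
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

# capture the error message instead of halting
msg <- tryCatch(isPositive("a"), error = function(e) conditionMessage(e))
```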
Control flow summary
Key points:
  code can be conditionally executed
  conditions can be nested
  conditional execution is often used for argument checking, among other things
Functions (1) introduced in this section:
  cat   concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met
(1) Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2 Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
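One caveat about the class check in this prototype: class() can return more than one value, and in R 4.2.0 and later an if() condition of length greater than one is an error. A sketch of a more tolerant check using inherits() follows; check_df and the class name "myclass" are hypothetical names used only for this illustration.

```r
check_df <- function(df) {
  # inherits() is TRUE if "data.frame" appears anywhere in class(df)
  if (!inherits(df, "data.frame")) stop("df must be a data.frame!")
  invisible(TRUE)
}

# an object with two classes, one of which is data.frame
multi <- structure(data.frame(a = 1:3), class = c("myclass", "data.frame"))

ok_plain <- check_df(iris)
ok_multi <- check_df(multi)
bad <- tryCatch(check_df(1:10), error = function(e) conditionMessage(e))
```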
The S3 object class system
R has two major object systems:
  Relatively informal "S3" classes
  Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
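As defined above, disp() has a method only for matrices, so calling it on anything else stops with a "no applicable method" error. A common remedy is to add a default method as a fallback. The sketch below redefines the generic so it is self-contained; disp.default and its message are illustrative additions, not part of the workshop code.

```r
disp <- function(x, ...) UseMethod("disp")

# method for matrices, as in the example above
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# fallback used when no more specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

out_default <- capture.output(disp("hello"))
out_matrix  <- capture.output(disp(matrix(c(1.234, 5.678), ncol = 1)))
```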
S3 classes summary
Key points:
  there are several class systems in R, of which S3 is the oldest and simplest
  objects have a class and functions have corresponding methods
  the class of an object can be set by simple assignment
  S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class
Functions introduced in this section:
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  dispatches to the appropriate method; forms the body of a generic function
  invisible  returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class
Exercise 4 prototype
1 Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2 Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3 Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
There are an unfortunately large number of surprises in R programming.
Some of these "gotchas" are common problems in other languages; many are unique to R.
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes:
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
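One way to sidestep the inconsistency is to always collect the values into a single vector with c() before calling a summary function. A brief sketch, with illustrative variable names:

```r
values <- c(1, 2, 3, 4, 5)

mean_vec <- mean(values)         # all five values are averaged
sum_vec  <- sum(values)          # all five values are summed
mean_pos <- mean(1, 2, 3, 4, 5)  # only the first argument is treated as data;
                                 # the rest are matched to other parameters
                                 # of mean.default() such as trim and na.rm
```

Passing one vector works the same way for both functions, so it avoids having to remember which of them accept their data through `...`.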
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
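The conversion shown above can also be done through the factor's levels. The sketch below compares the two routes; the variable names are illustrative, and the levels-based form is an alternative idiom rather than something taken from the slides.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes      <- as.integer(x)                # internal level codes
via_char   <- as.numeric(as.character(x))  # original values, via character
via_levels <- as.numeric(levels(x))[x]     # convert each level once, then index
```

Both routes recover the original numbers; the second converts only the distinct levels, which can matter for long factors.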
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+ cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+ roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+ cat("We rolled a ", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:
> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+ Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ## cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+ classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. use the results from step 1 to select the numeric columns
> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. use a loop to calculate the mean of each numeric column in the iris data
> irismeans <- c()
> for(var in names(irisnum)) {
+ irismeans[var] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
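In keeping with the earlier caution about overusing loops, the same three steps can be sketched without explicit loops. This is one possible rewrite, not the workshop's own solution. The names iris_num and col_means are illustrative, and vapply() is used as a stricter relative of sapply() that declares the expected result type.

```r
data(iris)

classes   <- sapply(iris, class)                 # step 1: class of each column
iris_num  <- iris[classes == "numeric"]          # step 2: keep the numeric columns
col_means <- vapply(iris_num, mean, numeric(1))  # step 3: one mean per column
```

The result is a named numeric vector, just like the one built by the loop.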
Data types
Data types summary
Key points
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type
Functions introduced in this section
c           combine elements
as.numeric  convert an object (e.g., a character vector) to numeric
data.frame  combine objects into a data.frame
ls          list the objects in the workspace
class       get the class of an object
str         get the structure of an object
length      get the number of elements in an object
mean        calculate the mean of a vector
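The key points above can be illustrated with a short sketch. All object names here are made up for the example, and stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version's default.

```r
num <- c(1.5, 2, 3)                 # numeric vector
lgl <- c(TRUE, FALSE, TRUE)         # logical vector
chr <- c("a", "b", "c")             # character vector
fct <- factor(c("lo", "hi", "lo"))  # factor

# a list can hold objects of almost any type, including functions
L <- list(num = num, chr = chr, f = mean)

# a data.frame is built from vectors of equal length
DF <- data.frame(num, lgl, chr, fct, stringsAsFactors = FALSE)

column_classes <- sapply(DF, class)  # class of each column
```

Since is.list(DF) is TRUE, list tools such as sapply() work column by column on a data.frame.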
Exercise 0
1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
Exercise 0 prototype
1. Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2. Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3. Determine the class of "students" and "test"
class(students)
class(test)
4. Create a data frame containing two columns, "students" and "test" as defined above
testScores <- data.frame(students, test)
5. Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name:
> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors:
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
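The examples above only select elements, but the same logical indices can be used on the left-hand side of an assignment to replace them. A small sketch, assuming a fresh copy of x so that it runs on its own:

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 106] <- 0L                    # replace every element greater than 106
x[names(x) %in% c("a", "b")] <- NA  # replace the elements named a and b
```

Only the selected positions change; the remaining elements and all names are kept.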
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns:
> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
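The difference between single and double brackets can be checked directly. The list L below is an assumed stand-in, since the definition of the workshop's L is not shown on this slide; only its first two elements are taken from the printed output.

```r
# Assumed stand-in for the workshop's list L
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

sub_list  <- L[1]      # single brackets: a list containing the element x
element   <- L[[1]]    # double brackets: the integer vector itself
by_name   <- L[["y"]]  # double brackets also accept a name
by_dollar <- L$z       # $ extracts a named element in a similar way
```

A quick test such as is.list(sub_list) versus is.list(element) shows which form returned the container and which returned its content.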
Indexing data.frames
A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ ,1]) # a vector
 int [1:5] 1 2 3 4 5
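When matrix-style indexing is preferred but a data.frame is still wanted, the drop argument of the extraction operator can be set to FALSE. A short sketch with an assumed stand-in for DF, whose definition is not shown on this slide:

```r
# Assumed stand-in for the workshop's DF
DF <- data.frame(x = 1:5, y = letters[1:5])

as_vector <- DF[, 1]                # matrix-style, one column: a vector
as_df     <- DF[1]                  # list-style: a one-column data.frame
kept_df   <- DF[, 1, drop = FALSE]  # matrix-style, but stays a data.frame
```

This is mainly useful in functions where the number of selected columns is not known in advance.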
Extraction/replacement summary
Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
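Both calls above use the value 2 for the second argument (MARGIN), which selects columns. The sketch below contrasts it with MARGIN = 1, which works row by row. The variable names are illustrative.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)  # MARGIN = 1: one value per row (5 values)
col_means <- apply(M, 2, mean)  # MARGIN = 2: one value per column (4 values)
```

For these particular summaries base R also provides rowMeans() and colMeans(), which give the same results.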
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[ , 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
Functions introduced in this section
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
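As a rough orientation to the other apply-style functions mentioned above, the sketch below shows one typical call for each, using the built-in iris data. The variable names are illustrative, and the documentation of each function remains the reference for details.

```r
data(iris)

# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to a vector within groups defined by a factor
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```

Here mean_by_species holds one mean per species, and sums holds the element-wise sums of the two input vectors.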
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists...)
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message() or cat() in function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+ print("setting x to 1") # print a text string
+ x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
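The points above about return values and local assignment can be sketched in a few lines. The functions f and g below are made-up examples, not part of the workshop code.

```r
f <- function() {
  x <- 1  # assignment is local to the function
  x + 1   # the value of the last expression is returned
}

g <- function(v) {
  if (length(v) == 0) return(NA)  # explicit early return
  invisible(max(v))               # returned, but not auto-printed
}

x <- "global"
y <- f()  # y is 2; the global x is left untouched
```

Calling g(c(3, 7)) at the prompt prints nothing, but the value can still be assigned or used in an expression.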
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+ return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+ if(!is.numeric(x) || length(x) != 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
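The isPositive() example reports bad input with cat(). A variant that signals a real error with stop() and returns a value instead of printing might look like the sketch below. The function name signOf is hypothetical and is not part of the workshop materials.

```r
# Hypothetical variant of isPositive(): returns a label instead of printing
signOf <- function(x) {
  # argument check: stop with an error unless x is a single number
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}
```

Returning the label lets the caller decide what to do with it, and stop() prevents later code from running on invalid input.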
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions* introduced in this section
cat   Concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met
* Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class:
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) # use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:
> # create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
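A generic with only one method fails for objects of any other class. A common companion is a default method, which UseMethod() falls back on when no class-specific method is found. The sketch below extends the disp() idea under that assumption; disp.default and disp.numeric are illustrative additions, not part of the workshop code.

```r
# generic: dispatches on the class of x
disp <- function(x, ...) {
  UseMethod("disp")
}

# fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  print(x)
}

# method for numeric vectors: round before printing
disp.numeric <- function(x, ...) {
  print(round(x, digits = 2))
}
```

With these definitions, disp(3.14159) prints the rounded number, while disp("a") falls through to the default method and prints the string unchanged.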
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE

Missing values

R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
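
A minimal sketch of both solutions, reusing the vector from the slide; the result names (m, n_missing, x_complete) are chosen here only for illustration.

```r
x <- c(1:10, NA, 12:20)

# na.rm = TRUE drops missing values before computing
m <- mean(x, na.rm = TRUE)

# is.na() is the way to test for missing values; x == NA does not work
n_missing <- sum(is.na(x))
x_complete <- x[!is.na(x)]
```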

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
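
Since R gives no warning here, one option is to check types yourself before comparing. The helper below, safe_greater, is a hypothetical name invented for this sketch; it simply refuses non-numeric input.

```r
# A defensive helper (hypothetical) that refuses non-numeric input
safe_greater <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y))
  x > y
}

ok <- safe_greater(2, 1)

# With a character argument the check fails instead of silently coercing
failed <- inherits(try(safe_greater(1, "a"), silent = TRUE), "try-error")
```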

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
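
A short sketch of the difference, under the assumption that the intent was to average the five numbers: mean() treats only its first argument as data, so the values need to be combined with c() first.

```r
# mean() averages its first argument only, so wrap the values in c()
avg <- mean(c(1, 2, 3, 4, 5))

# sum() adds up everything passed through ...
total <- sum(1, 2, 3, 4, 5)

# Passing bare numbers to mean() silently uses only the first one as data
surprise <- mean(1, 2, 3, 4, 5)
```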

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
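
The same conversion can be sketched with the idiom described on the factor help page, which converts the levels once and then indexes them by the factor's internal codes. The result names below are illustrative only.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.numeric(x)                # internal integer codes: 2 2 1 1
values <- as.numeric(as.character(x))  # the labels as numbers: 5 5 6 6

# Equivalent idiom: convert the levels once, then index by the codes
values2 <- as.numeric(levels(x))[x]
```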

Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
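
When the number of results is known in advance, a common variation is to pre-allocate the list rather than grow it. The sketch below assumes that situation; the names n and Result2 are illustrative, and the lapply() line shows a loop-free way to get the same list.

```r
n <- 5

# Pre-allocate a list of the right length, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
```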

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
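
For comparison, the same three steps can be sketched without explicit loops using sapply(). This is offered as an alternative, not as the workshop's intended answer; loop.means is recomputed here only to check that both approaches agree.

```r
data(iris)

# Steps 1 and 2: class of each column, then keep the numeric ones
classes <- sapply(iris, class)
iris.num <- iris[classes == "numeric"]

# Step 3: mean of each numeric column
iris.means <- sapply(iris.num, mean)

# Loop version, for comparison
loop.means <- c()
for (var in names(iris.num)) loop.means[var] <- mean(iris.num[[var]])
```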
Data types

Exercise 0

1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]

Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)

Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
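
A few related idioms, sketched on the same x vector rebuilt here so the block runs on its own: which() turns a logical vector into positions, sum() counts the TRUE values, and a logical index can also appear on the left-hand side of an assignment. The names big, pos, n_big and y are illustrative.

```r
x <- 101:110
x[4] <- 1L
names(x) <- letters[1:10]

big <- x > 106      # logical vector, one element per element of x
pos <- which(big)   # positions (and names) of the TRUE elements
n_big <- sum(big)   # TRUE counts as 1, so this counts matches

# Logical indexing also works on the left-hand side of an assignment
y <- x
y[y < 100] <- NA
```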

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values.
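
One detail worth sketching: selecting a single column (or row) drops the result to a plain vector unless drop = FALSE is given. The example below rebuilds M so it runs on its own; the result names v, m1 and r are illustrative.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

v  <- M[, 2]                        # a plain vector: dimensions are dropped
m1 <- M[, 2, drop = FALSE]          # still a matrix with 5 rows and 1 column
r  <- M[M[, "z"] > 3, c("x", "z")]  # rows where z > 3, columns chosen by name
```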

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
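
A self-contained sketch of the same distinction. The list built here is a stand-in: its x and y elements match the output shown above, but the contents of z are made up for illustration.

```r
# A stand-in list; the values for z are an assumption for this sketch
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

sub  <- L["x"]    # single brackets: a list of length one
vec  <- L[["x"]]  # double brackets: the element itself
same <- L$x       # $ is shorthand for [[ with a name

L[["y"]] <- NULL  # assigning NULL removes an element from the list
```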

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5

Extraction/replacement summary

Key points:
elements of objects can be extracted or replaced using the [ operator
objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
vectors and lists have only one dimension, and hence only one index is used
matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.
2. Calculate the mean of the Sepal.Length column in iris2.
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2.

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
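
The second argument of apply() selects the dimension: 1 for rows, 2 for columns. A brief sketch of both, checked against the dedicated rowMeans() and colMeans() helpers; the result names are illustrative.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)  # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)  # MARGIN = 2: one result per column

# Dedicated helpers give the same answers for means
same_rows <- all.equal(row_means, rowMeans(M))
same_cols <- all.equal(col_means, colMeans(M))
```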

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
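
Two close relatives of sapply() are sketched below: vapply(), which checks that every result has the declared type and length, and lapply(), which always returns a list. The DF and L objects here are stand-ins built for the sketch, not the workshop's own objects.

```r
# Stand-in data; these definitions are assumptions made for this sketch
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

lens  <- sapply(L, length)                          # named integer vector
lens2 <- vapply(L, length, FUN.VALUE = integer(1))  # same, but type-checked
as_list <- lapply(DF, class)                        # always returns a list
```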

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions summary

Key points:
R has convenient methods for applying functions to matrices, lists, and data.frames
other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object

Writing functions

Functions

A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function.
Functions are defined using the function() function.
Functions can be defined with any number of named arguments.
Arguments can be of any type (e.g., vectors, data.frames, lists...).

Function return value

The return value of a function can be:
The value of the last expression evaluated in the body of the function
Objects explicitly returned with the return() function

Other function output can come from:
Calls to print(), message() or cat() in the function body
Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
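
The local-environment point can be sketched in a form that runs on its own. The functions f and g below are illustrative; g also shows invisible(), which returns a value without auto-printing it at the prompt.

```r
x <- "global"

f <- function() {
  x <- 1   # creates a local x; the global x is untouched
  x + 1    # the value of the last expression is returned
}

g <- function() {
  invisible(42)  # returned, but not auto-printed at the prompt
}

y <- f()
z <- g()
```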

Writing functions example

Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

Stepping through functions and setting breakpoints.
Use traceback() to see what went wrong after the fact.

Writing functions summary

Key points:
writing new functions is easy!
most functions will have a return value, but functions can also print things, write things to file, etc.
functions can be stepped through to facilitate debugging

Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)

Control flow

Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if() and else statements, which can be nested if necessary.
Especially useful for checking function arguments and performing different operations depending on function input.

Control flow examples

Goal: write a function that tells us if a number is positive or negative.

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!).

Control flow examples

Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
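
A variation on the same idea, sketched under the assumption that an invalid argument should halt execution rather than just print a message: stop() signals an error, and the function returns a label instead of printing. The name signOf is hypothetical and not part of the workshop code.

```r
# Hypothetical variant that returns a label and signals errors with stop()
signOf <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

labels <- c(signOf(10), signOf(0), signOf(-1))

# tryCatch() lets us capture the error message instead of halting
bad <- tryCatch(signOf("a"), error = function(e) conditionMessage(e))
```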

Control flow summary

Key points:
code can be conditionally executed
conditions can be nested
conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)

The S3 object class system

R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class.

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
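
The same naming scheme can be sketched for another existing generic, summary(). The class name "temperature" and the numbers are made up for this illustration; objects without that class still go to the default method.

```r
# "temperature" is a made-up class name used only for this sketch
temp <- structure(c(20.5, 22.1, 19.8), class = "temperature")

# Naming the function generic.class makes it the method for that class
summary.temperature <- function(object, ...) {
  vals <- unclass(object)
  c(min = min(vals), mean = mean(vals), max = max(vals))
}

s <- summary(temp)           # dispatches to summary.temperature
d <- summary(unclass(temp))  # no class attribute: falls back to the default
```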

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
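
A generic usually also gets a default method, so that objects without a class-specific method are still handled. The sketch below adds a hypothetical disp.default to the disp() generic; the message text is invented for illustration.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices: round before printing
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible(x)
}

# Fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

# capture.output() collects what each call prints
out_m <- capture.output(disp(matrix(c(0.123, 0.456), ncol = 2)))
out_d <- capture.output(disp("hello"))
```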

S3 classes summary

Key points:
there are several class systems in R, of which S3 is the oldest and simplest
objects have class and functions have corresponding methods
the class of an object can be set by simple assignment
S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the "statsum" class.

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))

3. Write a print method for the "statsum" class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
  invisible(x)
}
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
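The two solutions can be sketched as follows, reusing the x defined on this slide (the names n_missing, m and x_complete are illustrative):

```r
x <- c(1:10, NA, 12:20)

n_missing  <- sum(is.na(x))         # count the missing values
m          <- mean(x, na.rm = TRUE) # mean of the non-missing values only
x_complete <- x[!is.na(x)]          # drop missing values explicitly
```

Testing with is.na(x) rather than x == NA matters because any comparison with NA is itself NA.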
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
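A simple way to avoid this surprise is to combine the values into a single vector with c() before calling either function. In the sketch below (variable names are illustrative), mean() treats only its first argument as the data; the remaining values are matched to other arguments of the default method.

```r
vals <- c(1, 2, 3, 4, 5)

mean_ok   <- mean(vals)            # all five values are passed as x
sum_ok    <- sum(vals)             # sum() accepts a vector as well
mean_oops <- mean(1, 2, 3, 4, 5)   # only the first value is used as x
```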
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing:
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
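An equivalent conversion goes through the levels: convert the (few) levels to numbers once, then index them by the factor. This is a sketch using the same x as above; the names codes, values and values2 are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.integer(x)                 # internal level codes: 2 2 1 1
values  <- as.numeric(as.character(x))   # original numbers: 5 5 6 6
values2 <- as.numeric(levels(x))[x]      # same result, converting only the levels
```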
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
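When the number of results is known in advance, the storage object can be created at its final length instead of being grown inside the loop. The sketch below shows this with vector("list", n), plus an lapply() call that builds the same list without an explicit loop (n and Result2 are illustrative names):

```r
n <- 5
Result <- vector("list", n)        # preallocate a list of length n
for (i in seq_len(n)) {
  Result[[i]] <- 1:i               # fill element i with the sequence 1..i
}

# the same list built without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
```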
Word of caution: don't overuse loops!
Most operations in R are vectorized, which makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
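In keeping with the earlier caution about overusing loops, the same column means can be computed without a loop. This is a sketch rather than part of the exercise solution; the names num, means.sapply and means.col are illustrative.

```r
data(iris)

# loop version, as in the exercise
iris.means <- c()
for (var in names(iris)[sapply(iris, is.numeric)]) {
  iris.means[var] <- mean(iris[[var]])
}

# loop-free alternatives
num          <- sapply(iris, is.numeric)   # logical vector: which columns are numeric
means.sapply <- sapply(iris[num], mean)    # named numeric vector of column means
means.col    <- colMeans(iris[num])        # same values via colMeans()
```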
Data types
Exercise 0 prototype
1. Create a new vector called test containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)
2. Create a second vector called students containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
3. Determine the class of students and test
class(students)
class(test)
4. Create a data frame containing two columns, students and tests, as defined above
testScores <- data.frame(students, tests = test)
5. Convert test to character class, and confirm that you were successful
test <- as.character(test)
class(test)
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name:
> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors:
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
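The examples above only select elements; the same logical vectors can be used on the left-hand side of an assignment to replace them. A small sketch with a freshly defined x (so it does not depend on the earlier replacement of the fourth value; big and not_abc are illustrative names):

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 108] <- 0L                                   # replace every element greater than 108
big     <- which(x > 106)                          # positions of the elements still > 106
not_abc <- x[!(names(x) %in% c("a", "b", "c"))]    # negate a condition with !
```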
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing dataframes
A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
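If matrix-style indexing is preferred but a data.frame is still wanted back, the drop = FALSE argument of the [ operator prevents the result from being simplified to a vector. The DF below is a stand-in with the same shape as the workshop's DF, which is defined elsewhere in the slides:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])   # stand-in for the workshop's DF

v  <- DF[, 1]                 # a vector
d1 <- DF[1]                   # a data.frame with one column
d2 <- DF[, 1, drop = FALSE]   # matrix-style indexing that keeps the data.frame
```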
Extraction/replacement summary
Key points:
elements of objects can be extracted or replaced using the [ operator
objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
vectors and lists have only one dimension, and hence only one index is used
matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section:
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column (averaging across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
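The second argument of apply() selects the dimension to work over: 1 for rows, 2 for columns. The sketch below uses the same M and compares the results with the dedicated rowMeans() and colMeans() functions (the result names are illustrative):

```r
M <- matrix(1:20, ncol = 4)

row.means <- apply(M, 1, mean)   # MARGIN = 1: one value per row
col.means <- apply(M, 2, mean)   # MARGIN = 2: one value per column

# base R shortcuts for these two common cases
row.means2 <- rowMeans(M)
col.means2 <- colMeans(M)
```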
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
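sapply() simplifies its result when it can, so the type of what comes back depends on the input. Two related base functions make the result type explicit: lapply() always returns a list, and vapply() checks each result against a template. A sketch with an illustrative list (not the workshop's L, which is defined elsewhere in the slides):

```r
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))   # illustrative list

lens.s <- sapply(L, length)               # simplified to a named integer vector
lens.v <- vapply(L, length, integer(1))   # errors unless each result is a single integer
lens.l <- lapply(L, length)               # always a list
```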
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points:
R has convenient methods for applying functions to matrices, lists, and data.frames
other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function.
Functions are defined using the function() function.
Functions can be defined with any number of named arguments.
Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
The value of the last expression evaluated in the body of the function
Objects explicitly returned with the return() function
Other function output can come from:
Calls to print(), message() or cat() in the function body
Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
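The local-environment point can also be checked directly: a variable created inside a function is not visible once the function has returned. A minimal sketch, assuming no object named x.local already exists in your workspace (that name, and found, are hypothetical):

```r
f <- function() {
  x.local <- 1      # exists only while f() is running
  x.local + 1       # the value of the last expression is returned
}

y <- f()
found <- exists("x.local")   # FALSE, assuming no global x.local was defined earlier
```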
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
Stepping through functions and setting breakpoints
Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
writing new functions is easy!
most functions will have a return value, but functions can also print things, write things to file, etc.
functions can be stepped through to facilitate debugging
Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if() and else statements, which can be nested if necessary.
Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
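A variation on the argument check above: instead of printing a message with cat(), the function can signal an error with stop(), so callers cannot silently continue with bad input. The sketch also uses ||, which evaluates its conditions left to right and stops at the first TRUE. The function name isPositive2 and its returned strings are hypothetical, chosen only for this illustration.

```r
isPositive2 <- function(x) {   # hypothetical variant of isPositive()
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

# tryCatch() lets us capture the error message instead of halting
res <- tryCatch(isPositive2("a"), error = function(e) conditionMessage(e))
```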
Control flow summary
Key points:
code can be conditionally executed
conditions can be nested
conditional execution is often used for argument checking, among other things
Functions¹ introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met
¹ Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
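One caveat about the check used in the prototype: class() can return more than one value, in which case comparing it with != produces a vector, which recent versions of R refuse to accept as an if() condition. A more robust sketch uses inherits(). The helper check.df and the extra class "myframe" are made up for this illustration:

```r
check.df <- function(df) {   # hypothetical helper, not part of the workshop code
  if (!inherits(df, "data.frame")) stop("df must be a data.frame")
  invisible(TRUE)
}

iris.sub <- iris
class(iris.sub) <- c("myframe", "data.frame")   # made-up extra class

ok  <- check.df(iris.sub)                                       # passes: still a data.frame
bad <- tryCatch(check.df(1:10), error = function(e) "error")    # fails the check
```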
The S3 object class system
R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
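As written, disp() fails for any object that is not a matrix, because no method matches. Generics commonly get a default method as a fallback; the sketch below adds one. The body of disp.default and its message text are assumptions made for illustration, not part of the workshop code.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}
disp.default <- function(x, ...) {   # fallback for classes without their own method
  cat("no special display for class", class(x)[1], "\n")
  invisible(x)
}

# capture.output() collects what each call prints, one string per line
out.m <- capture.output(disp(matrix(c(1.234, 5.678), ncol = 2)))
out.d <- capture.output(disp("hello"))
```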
S3 classes summary
Key points:
there are several class systems in R, of which S3 is the oldest and simplest
objects have class and functions have corresponding methods
the class of an object can be set by simple assignment
S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
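The practical consequence is that mean() wants its data as a single vector, while sum() accepts the values either way. A short sketch (the variable names are illustrative):

```r
# mean() takes one vector x; extra positional arguments are not treated as data
m_wrong <- mean(1, 2, 3, 4, 5)      # only the first value, 1, is averaged
m_right <- mean(c(1, 2, 3, 4, 5))   # all five values are averaged

# sum() takes its data through ..., so both forms agree
s_dots <- sum(1, 2, 3, 4, 5)
s_vec  <- sum(c(1, 2, 3, 4, 5))

print(c(m_wrong = m_wrong, m_right = m_right, s_dots = s_dots, s_vec = s_vec))
```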
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters, which can be confusing:
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
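An equivalent way to recover the original numbers is to convert the levels once and then index them by the factor. This is a sketch of a common base R idiom, not something shown in the workshop; the names codes, vals1, and vals2 are illustrative:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.numeric(x)                 # internal integer codes, not the values
vals1 <- as.numeric(as.character(x))   # convert every element via its label
vals2 <- as.numeric(levels(x))[x]      # convert each level once, then index

print(codes)
print(vals1)
print(vals2)
```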
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help us make this workshop better!
- Please take a moment to fill out a very short feedback form.
- These workshops exist for you, so tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
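When the number of results is known in advance, the storage object can be created at its final size instead of being grown inside the loop. A minimal sketch of that variation, assuming the same 1-to-5 example (n is an illustrative name):

```r
n <- 5
Result <- vector("list", n)   # a list of n empty (NULL) slots

for (i in seq_len(n)) {       # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

print(lengths(Result))        # number of values stored in each slot
```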
Word of caution: don't overuse loops
Most operations in R are vectorized, which makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> # Earlier we said
> # for (num in seq(-5,5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them.
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
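For comparison, the same three steps can be written without explicit loops. This is a sketch of an alternative, not the workshop's prototype; the names iris_num and iris_means are illustrative:

```r
data(iris)

classes <- sapply(iris, class)          # step 1: class of each column
iris_num <- iris[classes == "numeric"]  # step 2: keep the numeric columns
iris_means <- sapply(iris_num, mean)    # step 3: mean of each numeric column

print(iris_means)
print(colMeans(iris_num))               # built-in shortcut for column means
```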
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name:
> # indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors:
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==     equal to
!=     not equal to
>      greater than
<      less than
>=     greater than or equal to
<=     less than or equal to
%in%   is included in
&      and
|      or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
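The examples above select elements; the same logical index on the left-hand side of an assignment replaces them. A minimal sketch, starting from a fresh copy of x rather than the modified one on the slide:

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 108] <- 0L                     # replace every value above 108 with 0
x[names(x) %in% c("a", "b")] <- NA   # replace by name using %in%

print(x)
```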
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that unspecified indices (as in the column index in the example above) return all values.
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> # double brackets select the content of a single selected element,
> # effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing dataframes
A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list:
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame': 5 obs. of 1 variable:
 $ x: int 1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
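If the matrix-style form is preferred but a data.frame is still wanted back, the drop = FALSE argument of the [ operator keeps the result two-dimensional. A sketch using a stand-in DF (the workshop's DF is defined in an earlier section; the one below is an assumption made so the example runs on its own):

```r
DF <- data.frame(x = 1:5, y = letters[1:5])   # stand-in for the workshop's DF

v  <- DF[, 1]                 # matrix-style index: a plain vector
d1 <- DF[1]                   # list-style index: a one-column data.frame
d2 <- DF[, 1, drop = FALSE]   # matrix-style index that keeps the data.frame

str(v)
str(d1)
str(d2)
```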
Extraction/replacement summary
Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section:
[       extraction operator, used to extract/replace object elements
names   get the names of an object, usually a vector, list, or data.frame
print   print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
# shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # average each column (the 2 selects columns)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
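The second argument to apply() chooses the dimension: 1 works across each row, 2 works down each column. A short sketch on the same matrix (row_means and col_means are illustrative names):

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)   # one value per row (5 values)
col_means <- apply(M, 2, mean)   # one value per column (4 values)

print(row_means)
print(col_means)

# For means and sums, base R also provides rowMeans(), colMeans(),
# rowSums() and colSums(), which give the same results
print(rowMeans(M))
```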
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
Functions introduced in this section:
matrix       create a matrix (vector with two dimensions)
apply        apply a function to the rows or columns of a matrix
sapply       apply a function to the elements of a list
is.numeric   returns TRUE or FALSE, depending on the type of object
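The other apply-style functions mentioned in the key points can be sketched briefly on the iris data. This is only a quick illustration; the variable names are illustrative and the documentation of each function covers the details:

```r
data(iris)
num_cols <- iris[sapply(iris, is.numeric)]

# lapply: like sapply, but always returns a list
ranges <- lapply(num_cols, range)

# tapply: apply a function to one vector within groups defined by another
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

str(ranges)
print(by_species)
print(sums)
```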
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
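A rough sketch of how these tools fit together. The function rescale() is a hypothetical example written for this illustration; the debugging calls are commented out because they only make sense in an interactive session:

```r
# Hypothetical function used only to illustrate the debugging tools
rescale <- function(v) {
  # browser()            # uncomment to pause here and inspect v interactively
  rng <- range(v)
  (v - rng[1]) / (rng[2] - rng[1])
}

rescaled <- rescale(c(2, 4, 6))
print(rescaled)

# Interactive use (not run here):
# debug(rescale)         # step through rescale() line by line on the next call
# rescale(c(2, 4, 6))
# undebug(rescale)       # turn stepping off again
# rescale("a")           # triggers an error ...
# traceback()            # ... then shows the calls that led to it
```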
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section:
function    defines a new function
return      used inside a function definition to set the return value
browser     sets a break point
debug       turns on the debugging flag of a function so you can step through it
undebug     turns off the debugging flag
traceback   shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0:
> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function:
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!).
Control flow examples
Do something reasonable if x is not numeric:
> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one
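The check above still lets through a zero-length vector or a missing value, for which the later comparisons fail. A slightly stricter variant might look like the sketch below; sign_label() is a hypothetical name, and it returns its message as a string rather than printing it so that the result is easy to inspect:

```r
# Hypothetical stricter variant of isPositive() that returns a label
sign_label <- function(x) {
  # || evaluates left to right and stops early, so is.na() is only
  # reached when x is already known to be numeric of length one
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    return("x must be a single non-missing number")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

print(sign_label(10))
print(sign_label(0))
print(sign_label("a"))
print(sign_label(numeric(0)))
```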
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions introduced in this section [1]:
cat    concatenates and prints R objects
if     execute code only if condition is met
else   used with if; code to execute if condition is not met
[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
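One caveat about the class(df) != "data.frame" test: class() can return more than one name, in which case the comparison yields a vector and, from R 4.2.0 on, if() stops with an error. A check based on inherits() avoids this. The sketch below uses a hypothetical helper, check_df(), to show the idea in isolation:

```r
# Hypothetical helper showing an argument check that also works for
# objects with more than one class
check_df <- function(df) {
  if (!inherits(df, "data.frame")) stop("df must be a data.frame")
  invisible(TRUE)
}

ok <- check_df(iris)   # passes silently

# Capture the error message instead of halting the script
msg <- tryCatch(check_df(1:10), error = function(e) conditionMessage(e))
print(msg)
```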
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class:
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
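With only a matrix method defined, calling disp() on any other kind of object stops with an error because no applicable method is found. A default method gives the generic a fallback. The sketch below adds one; disp.default is an addition made for illustration and is not part of the workshop code:

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices, as in the slide, with ... added to match the generic
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback used when no more specific method exists (illustrative addition)
disp.default <- function(x, ...) {
  print(x)
}

r <- disp(matrix(c(0.781, 0.214), ncol = 1))   # dispatches to disp.matrix
disp("hello")                                  # dispatches to disp.default
```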
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class, and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class
Functions introduced in this section:
plot        creates a graphical display, the type of which depends on the class of the object being plotted
methods     lists the methods defined for a function or class
UseMethod   the body of a generic function
invisible   returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])
}
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help us make this workshop better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code repeatedly for as long as a condition remains true.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number and its square
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
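When the number of iterations is known in advance, a common variation is to create the result object at its final length before the loop starts. The following is only a sketch of that idea, with n as an illustrative name:

```r
n <- 5
Result <- vector("list", n)   # a list with n empty (NULL) slots

for (i in seq_len(n)) {       # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

lengths(Result)               # number of values stored in each slot
```

The contents are the same as in the example above; the difference is that the list does not have to grow on every pass through the loop.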
Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
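In keeping with the earlier caution about overusing loops, the same three steps can also be written without any explicit loop. This is only a sketch of one alternative, and the object names (iris_num, iris_means) are illustrative:

```r
data(iris)

classes    <- sapply(iris, class)         # step 1: class of every column
iris_num   <- iris[classes == "numeric"]  # step 2: keep the numeric columns
iris_means <- sapply(iris_num, mean)      # step 3: mean of each numeric column
```

For step 3, colMeans(iris_num) should give the same values.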
Extracting and replacing object elements
Indexing by position or name
Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:

==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
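The examples above only select elements; replacement with a logical index works the same way. A small sketch, rebuilding the running vector x first (the name big is illustrative):

```r
x <- 101:110
names(x) <- letters[1:10]
x[4] <- 1L            # same state as in the running example

big <- x > 106        # logical vector, one TRUE/FALSE per element
x[big] <- 0L          # replace every element where the test is TRUE

which(x == 0)         # positions (and names) of the replaced elements
```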
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in M[1:3, ] above) return all values.
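The last extraction above returned a plain vector rather than a one-column matrix, because R drops dimensions of length one by default. As a brief sketch (the names v, m and r are illustrative), the drop argument of [ controls this:

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

v <- M[, 2]                 # dimensions dropped: a plain vector
m <- M[, 2, drop = FALSE]   # dimensions kept: a 5 x 1 matrix
r <- M[1, , drop = FALSE]   # a 1 x 3 matrix holding the first row

dim(v)                      # NULL
dim(m)
```

Keeping the dimensions can matter inside functions that expect a matrix regardless of how many rows or columns were selected.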
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
Extraction/replacement summary

Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (that is, across its rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
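Both calls above use 2 as the second argument, which means "work column by column". A short sketch contrasting the two margins (the result names are illustrative):

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)   # MARGIN = 1: one value per row (5 values)
col_means <- apply(M, 2, mean)   # MARGIN = 2: one value per column (4 values)
```

For these particular summaries, rowMeans(M) and colMeans(M) should return the same numbers.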
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
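For orientation, here is a brief sketch of the three other apply-style functions named above, using the built-in iris data. The examples are illustrative only, and the help pages remain the authoritative description:

```r
data(iris)

# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to one vector, split by the groups in another
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function over several vectors in parallel;
# here rep(1, 3), rep(2, 2) and rep(3, 1)
reps <- mapply(rep, 1:3, 3:1)
```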
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
Writing functions example
Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
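Stepping through code is interactive, so it cannot be shown fully on a page, but the pieces can be sketched. The example below reuses the square function from the previous slide and only sets and clears the debugging flag; it does not actually open the interactive browser:

```r
square <- function(x) {
  # browser()   # uncomment to pause here and inspect x interactively
  x * x
}

debug(square)        # flag the function: the next call would be stepped through
isdebugged(square)   # TRUE while the flag is set
undebug(square)      # clear the flag so calls run normally again
isdebugged(square)   # FALSE

# After an error at the console, calling traceback() prints the sequence
# of calls that led to it.
```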
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative.

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) || length(x) != 1) { # if x is not one number
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
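Printing a message with cat() lets the calling code carry on as if nothing happened. A common alternative for argument checking is to signal an error with stop(). The sketch below uses a hypothetical function name, checkSign, and returns a string instead of printing so that the result can be reused:

```r
checkSign <- function(x) {   # hypothetical name, not part of the workshop code
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)
checkSign(0)

# tryCatch() lets us capture the error message instead of halting
result <- tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
```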
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions introduced in this section
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

Note: technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10) # stops with the error message above
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser() # execution pauses here when the function is called
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
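Because an object can carry several classes, comparing class(x) to a single string can give more than one TRUE/FALSE value. A small sketch of two base functions that help here, inherits() and unclass() (the result names are illustrative):

```r
x <- 1:10
class(x) <- c("A", "B")

is_B   <- inherits(x, "B")     # TRUE: "B" is one of the classes of x
is_foo <- inherits(x, "foo")   # FALSE: "foo" is not

plain <- unclass(x)            # strip the class attribute again
class(plain)                   # back to the implicit class, "integer"
```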
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
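As defined above, disp() only knows what to do with matrices; calling it on anything else stops with an error because no applicable method is found. One way to handle that, sketched here as an assumption about what a fuller version might look like, is to add a default method, which UseMethod() falls back on when no class-specific method exists:

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices, as in the slide (with ... added to match the generic)
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible(x)
}

# fallback used when no class-specific method is found
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(0.123, 0.456), ncol = 1))   # dispatches to disp.matrix
disp("a")                                 # dispatches to disp.default
```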
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
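One detail worth knowing when all.equal() is used inside if(): when the values differ, it returns a character description of the difference rather than FALSE. A brief sketch of the usual idiom, wrapping the call in isTRUE() (variable names are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                       # exact comparison fails
approx <- isTRUE(all.equal(a, b))      # comparison within a small tolerance

far_apart <- isTRUE(all.equal(1, 1.1)) # FALSE, safe to use in if()
```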
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes.
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
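When the number of iterations is known in advance, the same fill-by-index pattern can start from a preallocated list. The following is a minimal sketch in base R; the names n and result are illustrative and do not come from the workshop materials.

```r
n <- 5
result <- vector("list", n)      # preallocate a list with n empty slots
for (i in seq_len(n)) {          # seq_len(n) is also safe when n is 0
  result[[i]] <- seq_len(i)      # store the sequence 1..i in slot i
}
lengths(result)                  # number of values stored in each slot
```

Preallocating avoids growing the object on every iteration, which can matter for long loops.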
Word of caution: don't overuse loops
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for (i in 1:5) x[i] <- i + i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set.
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns.
> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data.
> irismeans <- c()
> for (var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Extracting and replacing object elements
Logical indexing
Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110
Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or
> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
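The slide mentions replacement as well as selection. A short sketch of replacing elements through a logical index follows; the vector x is rebuilt here as an assumption (a named vector of 101 to 110), and the name big is illustrative.

```r
# Hypothetical data: a named vector shaped like the x used on these slides
x <- setNames(101:110, letters[1:10])
big <- x > 106                # logical vector marking the elements to change
x[big] <- 0L                  # replace only the elements where big is TRUE
x[names(x) %in% c("a", "b")]  # elements a and b are unchanged
sum(big)                      # TRUE counts as 1, so this counts the matches
```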
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.
> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1; columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4
Note that an unspecified index (such as the column index in M[1:3, ] above) returns all values along that dimension.
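The last example above returns a plain vector rather than a matrix. As a hedged illustration of how to keep the matrix structure, base R's drop argument can be used; the names v and m are illustrative.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))
v <- M[, 2]                  # a plain vector: the matrix structure is dropped
m <- M[, 2, drop = FALSE]    # still a matrix, with 5 rows and 1 column
dim(v)                       # NULL, because v has no dimensions
dim(m)
```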
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element,
> ## effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5
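List elements can also be reached by name. The sketch below rebuilds a list shaped like L as an assumption (the z element is a guess based on its length of 2 reported later in the slides).

```r
# Hypothetical list, shaped like the L used on these slides
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))
L[["y"]]          # content of element y, selected by name
L$y               # shorthand for L[["y"]]
L[["y"]][2]       # index into the extracted vector
L["y"]            # single brackets: still a list, of length one
```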
Indexing data frames
A data frame can be indexed in the same ways as a matrix, and also in the same ways as a list.
> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2; columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [n] when indexing data frames: the first form returns a vector, the second returns a data frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
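The matrix-style form can also be asked to keep the data frame structure. A small sketch, assuming a data frame shaped like DF (rebuilt here for illustration; the names a, b, and c1 are not from the slides):

```r
# Hypothetical data frame, shaped like the DF used on these slides
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))
a  <- DF[, 1]                 # integer vector
b  <- DF[, 1, drop = FALSE]   # data frame with one column
c1 <- DF[1]                   # also a data frame with one column
```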
Extraction/replacement summary
Key points
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data frames have two dimensions, and extraction methods for these objects use two indices
Functions introduced in this section
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data frame named iris2.
2. Calculate the mean of the Sepal.Length column in iris2.
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data frame named iris2.
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)
2. Calculate the mean of the Sepal.Length column in iris2.
mean(iris2[, "Sepal.Length"])
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.
> M <- matrix(1:20, ncol = 4)
> apply(M, 2, mean) # average each column
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
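Both examples above work on columns. To make the role of the second argument concrete, here is a brief sketch comparing row-wise and column-wise application; row_avg and col_avg are illustrative names.

```r
M <- matrix(1:20, ncol = 4)    # 5 rows, 4 columns, filled column by column
row_avg <- apply(M, 1, mean)   # MARGIN = 1: one result per row
col_avg <- apply(M, 2, mean)   # MARGIN = 2: one result per column
# rowMeans() and colMeans() are base R shortcuts for these two cases
all.equal(row_avg, rowMeans(M))
all.equal(col_avg, colMeans(M))
```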
The sapply function
It is often useful to apply a function to each element of a vector, list, or data frame; use the sapply function for this.
> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
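sapply simplifies its result when it can, which is convenient interactively. The sketch below contrasts it with two related base R functions; DF is rebuilt as an assumption and the result names are illustrative.

```r
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))  # hypothetical data
as_list <- lapply(DF, class)                   # always returns a list
as_vec  <- sapply(DF, class)                   # simplified to a character vector here
checked <- vapply(DF, is.numeric, logical(1))  # errors unless each result is one logical
```

vapply is often preferred inside functions because the shape of its result is declared up front.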
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data frame with one column.
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)
Functions introduced in this section
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data frames, lists, ...).
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message(), or cat() in the function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
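The rules above can be checked with a few lines. This is a small sketch with illustrative function names g and h, not part of the workshop code:

```r
x <- 10                      # a global x
g <- function() {
  x <- 1                     # local x; the global x is untouched
  x + 1                      # the value of the last expression is returned
}
h <- function() {
  return(invisible(5))       # returned, but not auto-printed at the prompt
}
y <- g()
z <- h()
```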
Writing functions: example
Goal: write a function that returns the square of its argument.
> square <- function(x) { # define function named "square" with argument x
+   return(x * x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points
- writing new functions is easy
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative.
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive", "\n") # say so
+   } else { # otherwise
+     cat(x, "is negative", "\n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero.
Control flow examples
Add a condition to handle x = 0:
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive", "\n") # say so
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero", "\n") # say so
+   } else { # otherwise
+     cat(x, "is negative", "\n") # say x is negative
+   }
+ } # end function definition
>
Test the new function:
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric and of length one (unless we agree with R that "a" is positive).
Control flow examples
Do something reasonable if x is not numeric:
> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one", "\n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive", "\n") # say so
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero", "\n") # say so
+   } else { # otherwise
+     cat(x, "is negative", "\n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one
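Printing a message leaves the caller unaware that anything went wrong. One alternative, sketched here under the assumption that an error is the desired outcome, is to signal the problem with stop(). The function name signLabel and the returned labels are hypothetical and not part of the workshop code.

```r
# Hypothetical variant of isPositive() that raises an error for bad input
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}
signLabel(-3)
# tryCatch() lets the caller recover from the error
msg <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
```

This sketch does not handle NA input, which would need its own check.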
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions [1] introduced in this section
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met
[1] Technically, if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data frame.
statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function.
statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
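Because an object can have several classes, class(df) may return more than one value, and comparing it with != then produces several TRUE/FALSE values that if() does not handle well (recent versions of R treat this as an error). A hedged sketch of a more robust check using base R follows; the helper name checkDF is hypothetical.

```r
# Hypothetical helper showing checks that do not depend on class(df) having length one
checkDF <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  invisible(TRUE)
}
ok  <- checkDF(iris)
bad <- tryCatch(checkDF(1:10), error = function(e) conditionMessage(e))
inherits(iris, "data.frame")   # TRUE even if other classes are also present
```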
The S3 object class system
The S3 object class system
R has two major object systems:
- Relatively informal S3 classes
- Stricter, more formal S4 classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class.
> # create a mean() method for objects of class "foo"
> mean.foo <- function(x) { # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric", "\n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
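As written, disp() only works for matrices; calling it on any other object fails because no applicable method exists. A common remedy, sketched here as an assumption rather than as part of the workshop code, is to add a default method. The names out_m and out_d are illustrative.

```r
disp <- function(x, ...) UseMethod("disp")                   # the generic
disp.matrix <- function(x, ...) print(round(x, digits = 2))  # method for matrices
disp.default <- function(x, ...) print(x)                    # fallback for other classes
out_m <- capture.output(disp(matrix(c(1.234, 5.678), ncol = 2)))
out_d <- capture.output(disp("hello"))
```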
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class
Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class statsum.
3. Write a print method for the statsum class.
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables.
statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum.
statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class.
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming.
- Some of these gotchas are common problems in other languages; many are unique to R.
- We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
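One detail worth knowing when using this solution inside if(): all.equal() returns a character description rather than FALSE when the values differ, so it is usually wrapped in isTRUE(). A minimal sketch, with illustrative variable names:

```r
a <- 0.1
b <- 0.3 / 3
a == b                      # exact comparison
isTRUE(all.equal(a, b))     # comparison within a small tolerance
# all.equal() returns a character description, not FALSE, when values differ
res <- all.equal(1, 1.1)
is.character(res)
```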
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
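The two solutions named above can be sketched as follows, using the same x as in the example:

```r
x <- c(1:10, NA, 12:20)
mean(x, na.rm = TRUE)     # drop the NA before averaging
sum(is.na(x))             # count the missing values
which(is.na(x))           # position of the missing value
x[!is.na(x)]              # keep only the observed values
```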
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
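The difference comes from how the arguments are matched: mean() averages only its first argument, while sum() adds up everything passed through the dots. Wrapping the values in c() gives both functions a single vector, as this short sketch shows:

```r
mean(c(1, 2, 3, 4, 5))    # the five values form the single argument x
sum(c(1, 2, 3, 4, 5))     # sum() also accepts one vector
mean(1, 2, 3, 4, 5)       # only the first value is averaged; the rest match other arguments
```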
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing.
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
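An equivalent conversion, shown here as a sketch with illustrative names codes and values, converts the levels once and then indexes them by the factor's internal codes:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))
codes  <- as.integer(x)                 # internal codes: positions in levels(x)
values <- as.numeric(levels(x))[x]      # convert the levels once, then index by code
```

This gives the same result as as.numeric(as.character(x)) and makes the role of the level codes explicit.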
Extracting and replacing object elements
Indexing matrices
Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that an unspecified index (such as the column index in M[1:3, ] above) returns all values.
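As a small supplement to the matrix examples, the sketch below rebuilds the same M and shows two further indexing forms: selecting a column by name, and using drop = FALSE to keep a one-column result as a matrix. The names by_name, kept and rows are illustrative and not part of the original workshop.

```r
# Rebuild the example matrix from the slide
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

by_name <- M[, "z"]               # column selected by name; result is a plain vector
kept    <- M[, "z", drop = FALSE] # drop = FALSE keeps the result a 5 x 1 matrix
rows    <- M[M[, "x"] > 3, ]      # logical row index: rows where x is greater than 3
```

Whether to drop dimensions is a design choice: code that expects a matrix downstream is usually safer with drop = FALSE.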
Indexing lists
Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
Indexing dataframes
A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1 and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
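The difference between the two indexing forms can be checked directly. The sketch below is illustrative only: the DF built here is an assumed stand-in for the workshop's DF (an integer column x and a factor column y), and the names as_vector, as_df and by_dollar are hypothetical.

```r
# Assumed stand-in for the workshop's DF
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

as_vector <- DF[, 1]               # matrix-style index: returns a vector
as_df     <- DF[, 1, drop = FALSE] # matrix-style index, but keeps the data.frame
by_dollar <- DF$x                  # list-style access by name: returns a vector
```

Passing drop = FALSE is one way to get a predictable return type when the number of selected columns is not known in advance.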
Extraction/replacement summary
Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
- [ : extraction operator, used to extract/replace object elements
- names: get the names of an object, usually a vector, list, or data.frame
- print: print an object
Exercise 1
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[, "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
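The slide only uses the column margin, so here is a minimal sketch contrasting both margins on the same matrix. The names row_means and col_means are illustrative; rowMeans() is a base R shortcut mentioned only for comparison.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean) # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean) # MARGIN = 2: one result per column

# For plain means, base R also offers rowMeans() and colMeans()
same_as_shortcut <- isTRUE(all.equal(row_means, rowMeans(M)))
```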
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
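The same column selection can be sketched with vapply(), a stricter relative of sapply() that requires the expected result type to be declared. The DF below is an assumed stand-in for the workshop's DF, and is_num and numeric_part are hypothetical names.

```r
# Assumed stand-in for the workshop's DF
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

# vapply() checks that every result is a single logical value
is_num <- vapply(DF, is.numeric, logical(1))

# List-style indexing with a logical vector keeps the data.frame structure
numeric_part <- DF[is_num]
```

Declaring the result type is optional for interactive work, but it can catch surprises when the function is reused inside other code.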
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
- matrix: create a matrix (vector with two dimensions)
- apply: apply a function to the rows or columns of a matrix
- sapply: apply a function to the elements of a list
- is.numeric: returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
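The local-environment point can be shown in a self-contained way. This is a minimal sketch with illustrative values (the global x is set to 100 here purely for the demonstration).

```r
x <- 100          # a global x, chosen arbitrarily for this sketch

f <- function() {
  x <- 1          # assigns a local x; the global x is untouched
  x               # the last evaluated expression is the return value
}

y <- f()          # y receives the function's return value
```

After running this, y holds the local value while the global x keeps its original value.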
Writing functions example
Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
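Because stepping through a function is interactive, it is hard to show on a slide. The sketch below only demonstrates the non-interactive parts: turning the debugging flag on and off, and capturing an error message with tryCatch() as a scripted alternative to calling traceback() by hand. The names flag_on, flag_off and result are illustrative.

```r
square <- function(x) x * x

debug(square)                  # the next call to square() would enter the browser
flag_on <- isdebugged(square)  # check the debugging flag
undebug(square)                # turn stepping off again
flag_off <- isdebugged(square)

# Capture the error instead of stopping; conditionMessage() extracts its text
result <- tryCatch(square("a"), error = function(e) conditionMessage(e))
```

In an interactive session, calling square("a") directly and then traceback() would show the call stack that led to the error.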
Writing functions summary
Key points:
- writing new functions is easy
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)                     # class of each column
  means <- sapply(df[classes == "numeric"], mean)  # means of numeric columns
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table) # level counts of factor columns
  return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero
Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric and of length one (unless we agree with R that "a" is positive!).
Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) { # if x is not numeric, or is too long
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
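A variation on the argument check above, offered as a sketch rather than as the workshop's method: the function below signals a real error with stop() instead of printing a message, and returns a label instead of printing it, which makes the result easier to test. The name signLabel is hypothetical.

```r
signLabel <- function(x) {
  # || evaluates left to right and stops early, which suits scalar conditions
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

# An error can be caught by the caller when needed
caught <- tryCatch(signLabel("a"), error = function(e) e)
```

Signalling an error means code that calls the function cannot silently continue with bad input.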
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  # is.data.frame() is used here because class() can return more than one value
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)  # error: df must be a data.frame!
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()  # execution pauses here so you can step through the rest
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
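When an object has several classes, comparing class(x) to a single string gives one result per class, which is awkward inside if(). A minimal sketch of the usual alternative, inherits(), follows; the names has_B, has_C and plain are illustrative.

```r
x <- 1:10
class(x) <- c("A", "B")

has_B <- inherits(x, "B")  # TRUE if "B" appears anywhere in the class vector
has_C <- inherits(x, "C")  # FALSE: x does not have class "C"
plain <- unclass(x)        # strip the class attribute, leaving the integer vector
```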
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
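The disp() generic above only has a matrix method, so calling it on anything else would fail with a "no applicable method" error. The sketch below adds a default method as one possible fallback; disp.default and the fixed example matrix are additions for illustration, not part of the original workshop.

```r
disp <- function(x, ...) UseMethod("disp")

# Method for matrices: print rounded values
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible(x)
}

# Fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  print(x)
  invisible(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 1.234), ncol = 2)
out_m <- capture.output(disp(m))        # dispatches to disp.matrix
out_v <- capture.output(disp("hello"))  # falls back to disp.default
```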
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class, and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class "statsum"
3 Write a print method for the statsum class
Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # standard deviations (sd, not mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))  # prepend the new class
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
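One detail worth knowing when using all.equal() inside if(): when the values differ, it returns a character description of the difference rather than FALSE. A common pattern, sketched here with illustrative names, is to wrap it in isTRUE().

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                      # exact comparison of doubles
near  <- isTRUE(all.equal(a, b))     # comparison within a small tolerance
far   <- isTRUE(all.equal(1, 1.1))   # clearly different values give FALSE
```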
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
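Both suggested solutions can be sketched on the same vector used above. The names n_missing and m are illustrative.

```r
x <- c(1:10, NA, 12:20)

n_missing <- sum(is.na(x))    # is.na() gives a logical vector; sum() counts the TRUEs
m <- mean(x, na.rm = TRUE)    # drop missing values before averaging
```

Counting the missing values first is a useful habit, since na.rm = TRUE hides how much data was dropped.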
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
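The difference comes from how the arguments are matched: mean() treats only its first argument as data, so the remaining numbers are matched to other arguments, while sum() adds up everything passed to it. A minimal sketch of the usual fix, combining the values with c() first (the names wrong and right are illustrative):

```r
wrong <- mean(1, 2, 3, 4, 5)      # only the first value, 1, is used as data
right <- mean(c(1, 2, 3, 4, 5))   # a single vector argument: the intended mean
```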
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
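The conversion can also be done by converting the levels once and then indexing them with the factor, which avoids converting every element. This sketch reuses the factor from the slide; the names codes, values and values2 are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.numeric(x)                # internal integer codes, not the labels
values  <- as.numeric(as.character(x))  # labels converted back to numbers
values2 <- as.numeric(levels(x))[x]     # convert the levels, then index by code
```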
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you: tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square.

> ## For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
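The example above grows the list one element at a time. A common variation, sketched here as an option rather than as the workshop's recommendation, is to create the list at its final length first; the same result can also be produced without a loop using lapply(). The names n and same are illustrative.

```r
n <- 5

Result <- vector("list", n)   # pre-allocate a list with n empty slots
for (i in seq_len(n)) {       # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)   # store the sequence 1..i in slot i
}

same <- lapply(seq_len(n), seq_len)  # loop-free equivalent
```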
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Extracting and replacing object elements
Indexing lists
Lists can be indexed in the same way as vectors with the following extension
gt Lists can be indexed with single brackets similar to vector indexinggt L[c(1 2)] the first two elements of L$x[1] 1 2 3 4 5
$y[1] 1 2 3
gt L[1] a list with one element$x[1] 1 2 3 4 5
gt double brackets select the content of a single selected elementgt effectively taking it out of the listgt L[[1]] a vector[1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 19
71
Extracting and replacing object elements
Indexing dataframes
A dataframe can be indexed in the same ways as a matrix and also the sameways as a list
gt DF[c(3 1 2) c(1 2)] rows 3 1 and 2 columns 1 and 2x y
3 3 c1 1 a2 2 bgt DF[[1]] column 1 as a vector[1] 1 2 3 4 5
There is a subtle but important difference between [ n] and [n] whenindexing dataframes the first form returns a vector the second returns adataframe with one columngt str(DF[1]) a dataframe with one columnrsquodataframersquo 5 obs of 1 variable$ x int 1 2 3 4 5
gt str(DF[ 1]) a vectorint [15] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 20
71
Extracting and replacing object elements
Extraction/replacement summary
Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2
2. Calculate the mean of the Sepal.Length column in iris.2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2

    data(iris)
    iris.2 <- iris[c("Sepal.Length", "Species")]
    str(iris.2)

2. Calculate the mean of the Sepal.Length column in iris.2

    mean(iris.2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

    mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
    ## shortcut:
    with(iris.2, {
      print(mean(Sepal.Length[Species == "setosa"]))
    })

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

    m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
    length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the margin: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # mean of each column (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
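To make the role of the margin argument concrete, here is a short sketch that applies mean over both margins of the same matrix. The variable names row.means and col.means are chosen for this illustration only.

```r
M <- matrix(1:20, ncol = 4)     # 5 rows, 4 columns, filled column by column

row.means <- apply(M, 1, mean)  # margin 1: one result per row
col.means <- apply(M, 2, mean)  # margin 2: one result per column

row.means  # 8.5  9.5 10.5 11.5 12.5
col.means  # 3  8 13 18

# for means and sums, base R also provides dedicated functions
all.equal(col.means, colMeans(M))  # TRUE
```

For the specific cases of means and sums, rowMeans, colMeans, rowSums and colSums give the same results and are usually faster than apply.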
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
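The same pattern carries over directly to the running iris example. The sketch below uses names (is.num, iris.numeric, col.means) that are assumptions made for this illustration.

```r
data(iris)

is.num <- sapply(iris, is.numeric)       # named logical vector, one entry per column
iris.numeric <- iris[is.num]             # list-style indexing keeps a data.frame
col.means <- sapply(iris.numeric, mean)  # mean of each numeric column

names(iris.numeric)  # the four measurement columns; Species is dropped
col.means
```

Because the logical vector is computed from the data, the same two lines work on any data.frame, whatever its mix of column types.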
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
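A self-contained sketch of the same two ideas, local assignment and the value of the last expression as return value, is shown below. The function name set.local and the starting value of x are assumptions made for this illustration.

```r
x <- 100                   # a global x

set.local <- function() {  # set.local is a hypothetical name
  x <- 1                   # creates a new x, local to this call
  x + 1                    # value of the last expression is returned
}

result <- set.local()
result  # 2
x       # still 100: the assignment inside the function did not touch it
```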
Writing functions example
Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

    statsum <- function(df) {
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      return(means)
    }

    statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

    statsum <- function(df) {
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(iris)
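One possible variation on this prototype, not part of the workshop solution, is to name the elements of the returned list so that callers can refer to them by name instead of position. The function name statsum.named and the element names means and counts are assumptions for this sketch.

```r
# statsum.named is a hypothetical variant of the prototype above
statsum.named <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means = means, counts = counts)  # named list elements
}

res <- statsum.named(iris)
res$means        # extract by name with $
res[["counts"]]  # or with double brackets
```

Named elements keep calling code readable and mean that adding a third element later does not break code that relied on positions.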
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples
Goal: write a function that tells us if a number is positive or negative.

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0:

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!). The comparison "a" > 0 succeeds because R converts 0 to the string "0" and compares the two as text.
Control flow examples
Do something reasonable if x is not numeric:

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
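As a variation on the slides' approach, the sketch below returns a value instead of printing, and signals a real error with stop() when the argument check fails. The function name describeSign is hypothetical, and the use of || (which stops evaluating as soon as the result is known) instead of | is a choice made for this sketch, not something the workshop prescribes.

```r
# describeSign is a hypothetical name used only for this sketch
describeSign <- function(x) {
  # argument check: signal an error rather than printing a message
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  # nested branching; the value of the branch taken is returned
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

describeSign(10)  # "positive"
describeSign(0)   # "zero"
describeSign(-1)  # "negative"
```

Returning a string rather than calling cat() lets other code reuse the result, and stop() halts execution so that bad input cannot pass silently.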
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions* introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

* Technically, if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(1:10)
    statsum(iris)

2. Insert a break point with browser() and step through your function

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      browser()
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(iris)
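The class(df) != "data.frame" test works for plain data.frames, but class() can return more than one element, in which case the comparison yields a vector rather than a single TRUE or FALSE, and recent versions of R treat a condition of length greater than one as an error. A hedged alternative is to use is.data.frame(), sketched below with a hypothetical helper named check.df.

```r
# check.df is a hypothetical helper, not part of the workshop solution
check.df <- function(df) {
  if (!is.data.frame(df)) {      # always a single TRUE or FALSE
    stop("df must be a data.frame!")
  }
  invisible(TRUE)                # return TRUE without printing it
}

check.df(iris)                   # passes silently

# an object with several classes, one of which is data.frame
multi <- iris
class(multi) <- c("myclass", "data.frame")
check.df(multi)                  # also passes
```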
The S3 object class system
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class.

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
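A generic with only a matrix method raises an error for every other class. One common remedy, sketched here as an assumption and not as part of the workshop code, is to add a default method, which UseMethod falls back to when no class-specific method is found. The sketch redefines disp and disp.matrix so that it runs on its own; disp.default and its message are illustrative.

```r
disp <- function(x, ...) {
  UseMethod("disp")                    # dispatch on the class of x
}

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))          # matrices: print rounded values
}

# disp.default is an assumed addition: the fallback for any other class
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(0.123, 0.456, 0.789, 0.987), ncol = 2))  # uses disp.matrix
disp("some text")                                      # uses disp.default
```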
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class, and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd)
      counts <- sapply(df[classes == "factor"], table)
      return(list(cbind(means, sds), counts))
    }

    statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd)
      counts <- sapply(df[classes == "factor"], table)
      R <- list(cbind(means, sds), counts)
      class(R) <- c("statsum", class(R))
      return(R)
    }

    str(statsum(iris))

3. Write a print method for the statsum class

    print.statsum <- function(x) {
      cat("Numeric variable descriptive statistics:\n")
      print(x[[1]], digits=2)
      cat("Factor variable counts:\n")
      print(x[[2]])
    }

    statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
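One detail worth knowing when using all.equal() inside if statements: when the values differ, it returns a character description of the difference, not FALSE. Wrapping the call in isTRUE() gives a clean TRUE or FALSE. The variable names a and b below are chosen for this sketch.

```r
a <- 0.1
b <- 0.3 / 3

isTRUE(all.equal(a, b))  # TRUE: equal within a small numerical tolerance

all.equal(1, 2)          # a character string describing the difference
isTRUE(all.equal(1, 2))  # FALSE: safe to use as an if condition

if (isTRUE(all.equal(a, b))) {
  message("a and b are equal up to numerical tolerance")
}
```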
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
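Both solutions can be sketched on the same vector x used above. The names n.missing and observed are assumptions for this illustration.

```r
x <- c(1:10, NA, 12:20)

mean(x)                         # NA: the missing value propagates
mean(x, na.rm = TRUE)           # mean of the 19 observed values

n.missing <- sum(is.na(x))      # count the missing values
observed  <- x[!is.na(x)]       # keep only the observed values

n.missing         # 1
length(observed)  # 19
```

Note that x == NA cannot be used for this test, because any comparison with NA is itself NA; is.na() is the reliable check.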
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
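Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all! mean() treats only its first argument as the data, while sum() adds up every argument it is given.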
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
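A side-by-side sketch can make the equivalence explicit: the loop and the vectorized expression below produce identical results. The variable names n, squares.loop, and squares.vec are chosen for this illustration.

```r
n <- 1:5

# loop version: fill a preallocated vector one element at a time
squares.loop <- numeric(length(n))
for (i in seq_along(n)) {
  squares.loop[i] <- n[i]^2
}

# vectorized version: ^ operates on the whole vector at once
squares.vec <- n^2

identical(squares.loop, squares.vec)  # TRUE
```

When a loop is genuinely needed, creating the result vector at its final length before the loop, as done here, avoids growing it on every iteration.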
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Extracting and replacing object elements
Indexing dataframes
A dataframe can be indexed in the same ways as a matrix and also the sameways as a list
gt DF[c(3 1 2) c(1 2)] rows 3 1 and 2 columns 1 and 2x y
3 3 c1 1 a2 2 bgt DF[[1]] column 1 as a vector[1] 1 2 3 4 5
There is a subtle but important difference between [ n] and [n] whenindexing dataframes the first form returns a vector the second returns adataframe with one columngt str(DF[1]) a dataframe with one columnrsquodataframersquo 5 obs of 1 variable$ x int 1 2 3 4 5
gt str(DF[ 1]) a vectorint [15] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 20
71
Extracting and replacing object elements
Extractionreplacement summary
Key points
elements of objects can be extracted or replaced using the [ operatorobjects can be indexed by position name or logical (TRUEFALSE)vectorsvectors and lists have only one dimension and hence only one index isusedmatricies and dataframes have two dimensions and extraction methodsfor these objects use two indices
Functions introduced in this section[ extraction operator used to extractreplace object elements
names get the names of an object usually a vector list or dataframeprint print an object
Introduction To Programming In RLast updated November 20 2013 21
71
Extracting and replacing object elements
Exercise 1
1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2
2 Calculate the mean of the SepalLength column in iris23 BONUS (optional) Calculate the mean of SepalLength but only for
the setosa species4 BONUS (optional) Calculate the number of sepal lengths that are more
than one standard deviation below the average sepal length
Introduction To Programming In RLast updated November 20 2013 22
71
Extracting and replacing object elements
Exercise 1 prototype
1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2
data(iris)iris2 lt- iris[c(SepalLength Species)]str(iris2)
2 Calculate the mean of the SepalLength column in iris2mean(iris2[ SepalLength])
3 [3] BONUS (optional) Calculate the mean of SepalLength but only forthe setosa speciesmean(iris2[iris2[[Species]] == setosa SepalLength]) shortcutwith(iris2
print(mean(SepalLength[Species == setosa])))
4 [4] BONUS (optional) Calculate the number of sepal lengths that aremore than one standard deviation below the average sepal lengthmminussd lt- mean(iris2[[SepalLength]]) - sd(iris2[[SepalLength]])length(iris2[iris2[[SepalLength]] lt mminussd SepalLength])
Introduction To Programming In RLast updated November 20 2013 23
71
Applying functions to list elements
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 24
71
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90
Introduction To Programming In RLast updated November 20 2013 25
71
Applying functions to list elements
The sapply function
It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this
gt sapply(DF class) get the class of each column in the DF dataframex y
integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric
x yTRUE FALSE
Introduction To Programming In RLast updated November 20 2013 26
71
Applying functions to list elements
Combining sapply and indexing
The sapply function can be used in combination with indexing to extractelements that meet certain criteria
Recall that we can index using logical vectors
gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y
TRUE FALSEgt DF[DFwhichnum] select the numeric columns
x1 12 23 34 45 5
Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column
Introduction To Programming In RLast updated November 20 2013 27
71
Applying functions to list elements
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
- matrix: create a matrix (vector with two dimensions)
- apply: apply a function to the rows or columns of a matrix
- sapply: apply a function to the elements of a list
- is.numeric: returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
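As an illustration of named arguments, here is a minimal sketch of a function with two optional arguments that have default values. The function rescale() and its arguments are hypothetical and not part of the workshop materials.

```r
# rescale() is a hypothetical helper: x is required, center and scale have defaults
rescale <- function(x, center = TRUE, scale = 1) {
  if (center) x <- x - mean(x)   # optionally subtract the mean
  x / scale                      # value of the last expression is returned
}

r_default  <- rescale(c(1, 2, 3))             # uses both defaults
r_named    <- rescale(c(1, 2, 3), scale = 2)  # override one argument by name
r_position <- rescale(c(1, 2, 3), FALSE, 10)  # arguments matched by position
```

Matching by name is usually easier to read than matching by position once a function has more than one or two arguments.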
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message(), or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { ## define function f
+ print("setting x to 1") ## print a text string
+ x <- 1} ## set x to 1
>
> y <- f() ## assign y the value returned by f
[1] "setting x to 1"
>
> y ## print y
[1] 1
> x ## x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
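The two ways of returning a value, and the fact that assignment inside a function is local, can be seen in a short sketch. The function names here are hypothetical and chosen only for illustration.

```r
x <- 100                          # an x in the global environment

set_local <- function() {
  x <- 1                          # creates a local x; the global x is untouched
  x * 2                           # value of the last expression is returned
}

early_exit <- function(v) {
  if (length(v) == 0) return(NA)  # explicit return() leaves the function early
  sum(v)
}

result <- set_local()
print(result)                     # value computed from the local x
print(x)                          # the global x is unchanged
```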
Writing functions example
Goal: write a function that returns the square of its argument

> square <- function (x) { ## define function named "square" with argument x
+ return(x*x) ## multiply the x argument by itself
+ } ## end the function definition
>
> ## check to see that the function works
> square(x = 2) ## square the value 2
[1] 4
> square(10) ## square the value 10
[1] 100
> square(1:5) ## square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
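Stepping through a function is interactive, so it is hard to show in a static listing. The sketch below only toggles the debugging flag and captures an error message; half() is a hypothetical function, and at an interactive prompt you would call traceback() right after the error instead of capturing it.

```r
# half() is a hypothetical function used only to illustrate the debugging tools
half <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x / 2
}

debug(half)                  # flag half(): the next call would be stepped through
flag_on <- isdebugged(half)
undebug(half)                # remove the flag again
flag_off <- isdebugged(half)

# Capture the error message rather than stopping the script
err <- tryCatch(half("a"), error = function(e) conditionMessage(e))
print(err)
```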
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if () and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { ## define function "isPositive"
+ if (x > 0) { ## if x is greater than zero, then
+ cat(x, "is positive \n") } ## say so!
+ else { ## otherwise
+ cat(x, "is negative \n")} ## say x is negative
+ } ## end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { ## define function "isPositive"
+ if (x > 0) { ## if x is greater than zero, then
+ cat(x, "is positive \n") } ## say so!
+ else if (x == 0) { ## otherwise if x is zero
+ cat(x, "is zero \n")} ## say so!
+ else { ## otherwise
+ cat(x, "is negative \n")} ## say x is negative
+ } ## end function definition
>

Test the new function:

> isPositive(0) ## test the isPositive() function
0 is zero
> isPositive("a") ## oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { ## define function "isPositive"
+ if(!is.numeric(x) || length(x) > 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { ## if x is greater than zero, then
+ cat(x, "is positive \n") } ## say so!
+ else if (x == 0) { ## otherwise if x is zero
+ cat(x, "is zero \n")} ## say so!
+ else { ## otherwise
+ cat(x, "is negative \n")} ## say x is negative
+ } ## end function definition
>
> isPositive("a") ## test the isPositive() function on character
x must be a numeric vector of length one!
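A variation on the same idea, sketched under a few assumptions: the function returns a label instead of printing it, signals a real error with stop() for bad input, and also rejects empty and missing input. The name sign_label() is hypothetical.

```r
# sign_label() is a hypothetical variant of isPositive() that returns a string
sign_label <- function(x) {
  # argument check first: exactly one non-missing number is required
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    stop("x must be a single non-missing number")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

labels <- c(sign_label(10), sign_label(0), sign_label(-1))
print(labels)
```

Returning a value rather than printing makes the result easier to reuse in later code, and stop() lets the caller detect the failure.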
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
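Comparing class(df) with a single string works for plain data.frames, but an object can carry more than one class, in which case the comparison yields more than one value. A hedged alternative is to test with is.data.frame() or inherits(); the helper check_df() and the class name "my_df" below are hypothetical.

```r
# check_df() is a hypothetical helper showing a more robust argument check
check_df <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# An object with two classes: still a data.frame, but class() has length 2
two_class <- structure(data.frame(a = 1:3), class = c("my_df", "data.frame"))

ok_plain <- check_df(iris)
ok_two   <- check_df(two_class)
bad      <- try(check_df(1:10), silent = TRUE)  # a vector is rejected
```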
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
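When an object has several classes, inherits() is a convenient way to ask whether a given class is among them, and unclass() removes the class attribute again. A minimal sketch using the same class names as above:

```r
x <- 1:10
class(x) <- c("A", "B")      # x now has two classes

has_B <- inherits(x, "B")    # is "B" one of the classes of x?
has_C <- inherits(x, "C")    # "C" was never assigned

y <- unclass(x)              # strip the class attribute
base_class <- class(y)       # falls back to the implicit class of the data
print(base_class)
```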
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> ## see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> ## which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> ## create a mean() method for objects of class "foo":
> mean.foo <- function(x) { ## mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) ## use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} ## otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> ## create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
>
> ## create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
>
> ## test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
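A generic usually also gets a default method, which is used when no class-specific method exists. The following sketch assumes a hypothetical generic called describe() with one default method and one method for data.frames; it is not part of the workshop code.

```r
# describe() is a hypothetical generic used only for illustration
describe <- function(x, ...) {
  UseMethod("describe")
}

# Fallback used when no method matches the class of x
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# Method chosen automatically for data.frames
describe.data.frame <- function(x, ...) {
  paste("a data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

d_frame  <- describe(iris)   # dispatches to describe.data.frame
d_vector <- describe(1:3)    # no integer method, so describe.default is used
print(d_frame)
print(d_vector)
```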
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class
Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
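One practical detail: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so inside an if() condition it is usually wrapped in isTRUE(). A minimal sketch (the tolerance 1e-8 in the last line is an arbitrary choice for illustration):

```r
a <- 0.1
b <- 0.3 / 3

exact_equal <- a == b                    # exact comparison of two doubles
near_equal  <- isTRUE(all.equal(a, b))   # comparison up to a small tolerance
within_tol  <- abs(a - b) < 1e-8         # the same idea written by hand

if (near_equal) {
  cat("a and b are equal up to numerical tolerance\n")
}
```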
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
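Both solutions can be sketched with the same vector as above; the variable names are only illustrative.

```r
x <- c(1:10, NA, 12:20)

n_missing  <- sum(is.na(x))           # count the missing values
mean_all   <- mean(x)                 # NA, because one value is unknown
mean_obs   <- mean(x, na.rm = TRUE)   # mean of the observed values only
sd_obs     <- sd(x, na.rm = TRUE)
x_complete <- x[!is.na(x)]            # drop missing values by logical indexing

print(c(n_missing = n_missing, mean_obs = mean_obs, sd_obs = sd_obs))
```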
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> ## combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> ## comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
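The difference comes from the argument lists: mean() averages only its first argument x, while sum() adds up everything passed through the dots. A short sketch of the safe habit of combining the values into one vector first:

```r
values <- c(1, 2, 3, 4, 5)

wrong <- mean(1, 2, 3, 4, 5)   # only the first argument is treated as the data
right <- mean(values)          # pass a single vector to get the intended mean

total_args   <- sum(1, 2, 3, 4, 5)  # sum() accepts many arguments
total_vector <- sum(values)         # and gives the same answer for one vector

print(c(wrong = wrong, right = right))
```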
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
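An equivalent conversion goes through the levels: convert the (few) levels to numbers once, then index them by the factor's integer codes. A minimal sketch using the same factor as above:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes      <- as.numeric(x)                 # internal integer codes, not the values
via_char   <- as.numeric(as.character(x))   # convert labels, then to numbers
via_levels <- as.numeric(levels(x))[x]      # convert levels once, index by code

print(codes)
print(via_levels)
```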
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> ## For-loop example
> for (num in seq(-5,5)) { ## for each number in [-5, 5]
+ cat(num, "squared is", num^2, "\n") ## print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) ## allows reproducible sample() results
> dice <- seq(1,6) ## set dice = [1 2 3 4 5 6]
> roll <- 0 ## set roll = 0
> while (roll < 12) {
+ roll <- sample(dice,1) + sample(dice,1) ## calculate sum of two rolls
+ cat("We rolled a", roll, "\n") ## print the result
+ } ## end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() ## create an object to store the results
> for (i in 1:5) { ## for each i in [1, 5]
+ Result[[i]] <- 1:i ## assign the sequence 1 to i to Result
+ }
> Result ## print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
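When the number of results is known in advance, a common variation is to create the container at its final length before the loop and to iterate with seq_len(). This is a sketch of that variation, not the workshop's own code:

```r
n <- 5
Result <- vector("list", n)     # preallocate a list with n empty slots

for (i in seq_len(n)) {         # seq_len(n) is 1, 2, ..., n (and empty if n is 0)
  Result[[i]] <- 1:i            # fill slot i with the sequence 1 to i
}

element_lengths <- lengths(Result)   # length of each list element
print(element_lengths)
```

Using seq_len(n) instead of 1:n avoids a surprise when n is zero, because 1:0 counts down through 1 and 0 rather than producing an empty sequence.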
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() ## create vector x
> for(i in 1:5) x[i] <- i+i ## double a vector using a loop
> print(x) ## print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 ## double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 ## shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { ## for each number in [-5, 5]
> ## cat(num, "squared is", num^2, "\n") ## print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+ classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+ iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
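For comparison, the same three steps can be written without explicit loops. The sketch below uses vapply(), a stricter relative of sapply() that requires the expected type and length of each result; treat it as one possible alternative rather than the intended solution.

```r
# Step 1: class of each column (each result must be a single character string)
classes <- vapply(iris, function(col) class(col)[1], character(1))

# Step 2: keep the numeric columns
iris.num <- iris[classes == "numeric"]

# Step 3: mean of each numeric column (each result must be a single number)
iris.means <- vapply(iris.num, mean, numeric(1))

print(round(iris.means, 3))
```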
Extracting and replacing object elements
Extraction/replacement summary

Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
- [: extraction operator, used to extract/replace object elements
- names: get the names of an object, usually a vector, list, or data.frame
- print: print an object
Exercise 1
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2
2 Calculate the mean of the Sepal.Length column in iris.2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2

data(iris)
iris.2 <- iris[c("Sepal.Length", "Species")]
str(iris.2)

2 Calculate the mean of the Sepal.Length column in iris.2

mean(iris.2[, "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris.2, print(mean(Sepal.Length[Species == "setosa"])))

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Introduction To Programming In RLast updated November 20 2013 23
71
Applying functions to list elements
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 24
71
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90
Introduction To Programming In RLast updated November 20 2013 25
71
Applying functions to list elements
The sapply function
It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this
gt sapply(DF class) get the class of each column in the DF dataframex y
integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric
x yTRUE FALSE
Introduction To Programming In RLast updated November 20 2013 26
71
Applying functions to list elements
Combining sapply and indexing
The sapply function can be used in combination with indexing to extractelements that meet certain criteria
Recall that we can index using logical vectors
gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y
TRUE FALSEgt DF[DFwhichnum] select the numeric columns
x1 12 23 34 45 5
Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column
Introduction To Programming In RLast updated November 20 2013 27
71
Applying functions to list elements
Applying functions summary
Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details
Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list
isnumeric returns TRUE or FALSE depending on the type of object
Introduction To Programming In RLast updated November 20 2013 28
71
Writing functions
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 29
71
Writing functions
Functions
A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )
Introduction To Programming In RLast updated November 20 2013 30
71
Writing functions
Function return value
The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function
Other function output can come fromCalls to print() message() or cat() in function bodyError messages
Assignment inside the body of a function takes place in a localenvironmentExample
gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1
a b c d e f g h i j101 102 103 1 105 106 107 108 109 110
Introduction To Programming In RLast updated November 20 2013 31
71
Writing functions
Writing functions example
Goal write a function that returns the square of itrsquos argument
gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25
Introduction To Programming In RLast updated November 20 2013 32
71
Writing functions
Debugging basics
Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact
Introduction To Programming In RLast updated November 20 2013 33
71
Writing functions
Writing functions summary
Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging
Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value
browser sets a break pointdebug turns on the debugging flag of a function so you can step
through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went
wrong)
Introduction To Programming In RLast updated November 20 2013 34
71
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)
Control flow examples
Do something reasonable if x is not numeric

> # check that x is a numeric vector of length one before testing its sign
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions[1] introduced in this section
cat: concatenates and prints R objects
if: execute code only if condition is met
else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }
    statsum(1:10) # stops with an error: df must be a data.frame!
    statsum(iris)

2. Insert a break point with browser() and step through your function

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      browser()
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }
    statsum(iris)
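The class(df) != "data.frame" test works for iris, but it compares the whole class vector, so it can misbehave for objects that carry more than one class. A minimal sketch of a more defensive check, assuming inherits() is an acceptable substitute; the name statsum_checked is made up for this illustration and is not part of the workshop materials:

```r
# Hypothetical variant of the exercise function (illustrative name).
statsum_checked <- function(df) {
  # inherits() returns a single TRUE/FALSE even when the object has
  # several classes, so it is safe to use inside if()
  if (!inherits(df, "data.frame")) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means, counts)
}

# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(statsum_checked(1:10),
                error = function(e) conditionMessage(e))
print(msg)

res <- statsum_checked(iris)
print(res[[1]])
```

tryCatch() is used here only so that the error message can be inspected without stopping the session; in interactive use you would simply see the error.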
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
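To see how a class vector with several entries is used, here is a small sketch. The generic describe() and its methods are hypothetical names invented for this illustration; the point is that UseMethod() tries each class in order and falls back to the default method when no class matches.

```r
# Hypothetical generic and methods, for illustration only
describe <- function(x, ...) UseMethod("describe")
describe.B <- function(x, ...) "handled by the B method"
describe.default <- function(x, ...) "handled by the default method"

x <- 1:10
class(x) <- c("A", "B")

# inherits() tests membership anywhere in the class vector
print(inherits(x, "B"))
print(inherits(x, "C"))

# There is no describe.A, so dispatch moves on to describe.B
print(describe(x))

# A plain integer vector has neither class, so the default is used
print(describe(1:10))
```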
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: the body of a generic function
invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
      counts <- sapply(df[classes == "factor"], table)
      return(list(cbind(means, sds), counts))
    }
    statsum(iris)

2. Modify your function so that it returns a list of class statsum

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
      counts <- sapply(df[classes == "factor"], table)
      R <- list(cbind(means, sds), counts)
      class(R) <- c("statsum", class(R))
      return(R)
    }
    str(statsum(iris))

3. Write a print method for the statsum class

    print.statsum <- function(x) {
      cat("Numeric variable descriptive statistics:\n")
      print(x[[1]], digits=2)
      cat("Factor variable counts:\n")
      print(x[[2]])
    }
    statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
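One detail worth knowing when using all.equal() inside a condition: when the values differ it returns a character description of the difference rather than FALSE. A short sketch, assuming the comparison from the slide; the variable names are illustrative:

```r
a <- 0.1
b <- 0.3 / 3

print(a == b)             # exact comparison fails
print(all.equal(a, b))    # comparison with a small tolerance succeeds

# On a mismatch all.equal() returns a character string, not FALSE,
# so wrap it in isTRUE() before using it in if()
close_enough <- isTRUE(all.equal(a, b))
not_close <- isTRUE(all.equal(1, 1.1))

if (close_enough) cat("a and b are equal up to tolerance\n")
print(not_close)
```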
Missing values
R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
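The two solutions named above can be sketched as follows, reusing the vector x from the slide; x_complete is an illustrative name:

```r
x <- c(1:10, NA, 12:20)

# Drop missing values inside the calculation
m <- mean(x, na.rm = TRUE)
print(m)

# Test for missing values with is.na(), never with == NA
n_missing <- sum(is.na(x))     # how many values are missing
where_missing <- which(is.na(x))  # positions of the missing values
print(n_missing)
print(where_missing)

# Keep only the observed values
x_complete <- x[!is.na(x)]
print(length(x_complete))
```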
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
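Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

mean() averages only its first argument, x (here the single value 1), while sum() adds up everything passed through the ... argument.

Ouch. That is not nice at all!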
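A short sketch of the usual way around this inconsistency: combine the values into one vector with c() before calling mean(). The variable names are illustrative.

```r
# Only the first argument is treated as data by mean()
wrong <- mean(1, 2, 3, 4, 5)
print(wrong)

# Wrapping the values in c() passes them as a single vector
right <- mean(c(1, 2, 3, 4, 5))
print(right)

# sum() accepts the values either way
print(sum(1, 2, 3, 4, 5))
print(sum(c(1, 2, 3, 4, 5)))
```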
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
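The conversion above can also be written by converting the levels once and then indexing them with the factor. This is a sketch of an equivalent alternative, not something taken from the slides; the variable names are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

# as.numeric() on a factor returns the internal integer codes
codes <- as.numeric(x)
print(codes)

# Two ways to recover the original numbers
vals1 <- as.numeric(as.character(x))   # convert every element
vals2 <- as.numeric(levels(x))[x]      # convert the levels, then index by code
print(vals1)
print(vals2)
```

The second form converts only the distinct levels, which can matter for long factors with few levels.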
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
- Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you. Tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
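A variation on the example above, sketched under the assumption that the number of results is known in advance: the list is allocated at its final length before the loop, and seq_len() is used so the loop body is skipped cleanly when n is zero. Result2 shows the same result built with lapply(); both names are illustrative.

```r
n <- 5

# Allocate a list of the final length up front
Result <- vector("list", n)
for (i in seq_len(n)) {      # seq_len(0) is empty, unlike 1:0
  Result[[i]] <- 1:i         # fill slot i with the sequence 1 to i
}

# The same list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

print(Result[[3]])
print(identical(Result, Result2))
```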
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5,5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
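In keeping with the earlier caution about overusing loops, the three steps of this exercise can also be sketched without any explicit loop. This is an alternative for comparison, not the exercise solution; the variable names are illustrative.

```r
# Steps 1 and 2: find the numeric columns with sapply()
num_cols <- sapply(iris, is.numeric)

# Step 3: apply mean() to each numeric column
iris_means <- sapply(iris[num_cols], mean)
print(iris_means)

# colMeans() gives the same result for an all-numeric data.frame
iris_means2 <- colMeans(iris[num_cols])
print(all.equal(iris_means, iris_means2))
```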
Extracting and replacing object elements
Exercise 1
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
Exercise 1 prototype
1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

    data(iris)
    iris2 <- iris[c("Sepal.Length", "Species")]
    str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

    mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

    mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
    # shortcut:
    with(iris2, {
      print(mean(Sepal.Length[Species == "setosa"]))
    })

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

    m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
    length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
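Both calls above use margin 2 (columns). A brief sketch of the other margin, using the same matrix M; rowMeans() and colMeans() are included only as a cross-check, and the variable names are illustrative.

```r
M <- matrix(1:20, ncol = 4)

# Second argument 1 applies the function to each row
row_means <- apply(M, 1, mean)
print(row_means)   # 8.5 9.5 10.5 11.5 12.5

# Second argument 2 applies the function to each column
col_means <- apply(M, 2, mean)
print(col_means)   # 3 8 13 18

# Built-in shortcuts for these two common cases
print(all.equal(row_means, rowMeans(M)))
print(all.equal(col_means, colMeans(M)))
```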
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
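The difference noted above can be checked directly. In this sketch DF is an assumed stand-in (an integer column x and a factor column y, matching the classes shown earlier), since its definition is not part of this section; drop = FALSE is the standard way to keep the data.frame structure with the two-index form.

```r
# Assumed stand-in for the DF object used in the slides
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

v  <- DF[, 1]                 # two-index form: simplifies to a vector
d1 <- DF[1]                   # list-style form: one-column data.frame
d2 <- DF[, 1, drop = FALSE]   # two-index form, simplification turned off

print(class(v))
print(class(d1))
print(class(d2))
```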
Applying functions summary
Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
matrix: create a matrix (vector with two dimensions)
apply: apply a function to the rows or columns of a matrix
sapply: apply a function to the elements of a list
is.numeric: returns TRUE or FALSE, depending on the type of object
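For orientation, here is a minimal sketch of the three other apply-style functions mentioned in the key points. The inputs are made up for illustration; see each function's help page for the full set of arguments.

```r
# lapply: like sapply, but always returns a list
lens <- lapply(list(a = 1:3, b = letters), length)
print(lens)

# tapply: apply a function to groups of a vector defined by a factor
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
print(by_species)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
print(sums)   # 5 7 9
```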
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists ...)
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in function body
- Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1! (it is still the vector defined earlier)
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
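The same point in a self-contained form that does not depend on objects defined earlier in the workshop. This is a sketch with illustrative names; here the global x is simply set to 100 first.

```r
x <- 100            # x in the global environment

f <- function() {
  x <- 1            # creates a new, local x; the global x is untouched
  x                 # value of the last expression is the return value
}

y <- f()
print(y)            # 1
print(x)            # still 100
```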
Writing functions example
Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section
function: defines a new function
return: used inside a function definition to set the return value
browser: sets a break point
debug: turns on the debugging flag of a function so you can step through it
undebug: turns off the debugging flag
traceback: shows the error stack (call after an error to see what went wrong)
Introduction To Programming In RLast updated November 20 2013 34
71
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
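With only disp.matrix defined, calling disp() on anything that is not a matrix stops with a "no applicable method" error. One common remedy, sketched here as an assumption rather than as part of the workshop code, is a disp.default fallback that is used when no class-specific method exists:

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices, as in the example above
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# hypothetical fallback: used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("disp: no special method, using print()\n")
  print(x)
  invisible(x)
}

m <- matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2)
disp(m)      # dispatches to disp.matrix
disp("abc")  # dispatches to disp.default
```

Giving every method the same (x, ...) signature as the generic keeps the methods interchangeable when extra arguments are passed along.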
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class "statsum"
3 Write a print method for the statsum class
Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # sd() here, not mean()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))}

statsum(iris)

2 Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # sd() here, not mean()
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])}

statsum(iris)
Things that may surprise you
Gotchas

There are an unfortunately large number of surprises in R programming.
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
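One detail worth knowing: all.equal() returns TRUE when two values agree within a tolerance, but it returns a character description (not FALSE) when they differ. Inside if() it is therefore usually wrapped in isTRUE(). A short sketch:

```r
a <- 0.1
b <- 0.3 / 3

a == b                    # exact comparison of floating point values
isTRUE(all.equal(a, b))   # comparison within the default tolerance

# when the values differ, all.equal() returns a message rather than FALSE
res <- all.equal(1, 1.1)
is.character(res)

# so wrap it in isTRUE() whenever a single logical value is required
if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}
```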
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
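Both remedies can be sketched on the same vector. The variable names below are illustrative only:

```r
x <- c(1:10, NA, 12:20)

mean(x)                       # NA, because one value is unknown
m <- mean(x, na.rm = TRUE)    # drop the NA before averaging

# NA == NA is itself NA, so use is.na() to find missing values
n_missing     <- sum(is.na(x))
which_missing <- which(is.na(x))

# keep only the observed values
x_obs <- x[!is.na(x)]
```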
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
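The argument lists explain the difference. mean() treats only its first argument, x, as data, and the remaining values are passed on as other arguments (such as trim and na.rm). sum() adds up everything supplied through "...". Collecting the values into one vector with c() removes the ambiguity, as this sketch shows:

```r
mean(1, 2, 3, 4, 5)     # only the first value is treated as data
sum(1, 2, 3, 4, 5)      # all values are summed

v <- c(1, 2, 3, 4, 5)   # collect the values into one vector first
mean(v)                 # now averages all five values
sum(v)
```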
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
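An equivalent idiom, mentioned in the ?factor help page, converts the levels once and then indexes them by the factor's integer codes. A brief sketch comparing the three conversions (the variable names are illustrative):

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.numeric(x)                 # internal integer codes
values <- as.numeric(as.character(x))   # original numbers

# same result: convert the (few) levels, then index them by the codes
values2 <- as.numeric(levels(x))[x]
```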
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
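Growing Result one element at a time works, but when the number of iterations is known in advance it is common to allocate the list first. The following sketch repeats the same loop with preallocation; vector("list", n) creates a list of n empty (NULL) slots, and the name n is introduced here for illustration:

```r
n <- 5
Result <- vector("list", n)   # preallocate a list with n empty slots
for (i in seq_len(n)) {       # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i          # store the sequence 1 to i in slot i
}
lengths(Result)               # number of elements stored in each slot
```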
Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1] 2 4 6 8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1] 2 4 6 8 10
> 1:5 + 5 # shorter vectors are recycled
[1] 6 7 8 9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Extracting and replacing object elements
Exercise 1 prototype
1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1] 3 8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
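The second argument of apply() (MARGIN) selects the direction: 1 gives one result per row, 2 gives one result per column. A short sketch on the same matrix, with the base helpers colMeans() and rowSums() shown for comparison (the result names are illustrative):

```r
M <- matrix(1:20, ncol = 4)      # 5 rows, 4 columns

col_means <- apply(M, 2, mean)   # MARGIN = 2: one result per column
row_sums  <- apply(M, 1, sum)    # MARGIN = 1: one result per row

# the dedicated helpers give the same answers
all.equal(col_means, colMeans(M))
all.equal(row_sums, rowSums(M))
```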
The sapply function
It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
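The objects DF and L come from an earlier part of the workshop. The sketch below recreates stand-ins with the same column and element names (their contents are assumptions made for illustration) and contrasts sapply(), which simplifies its result, with lapply(), which always returns a list:

```r
# hypothetical stand-ins for the DF and L objects used above
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))
L  <- list(x = 1:5, y = c("a", "b", "c"), z = c(TRUE, FALSE))

sapply(DF, class)    # named character vector, one entry per column
sapply(L, length)    # named integer vector, one entry per list element
lapply(L, length)    # same information, but returned as a list

# vapply() is like sapply() but checks the type and length of each result
vapply(L, length, FUN.VALUE = integer(1))
```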
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
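The difference between the two indexing forms can be checked directly. In this sketch DF is a hypothetical stand-in for the workshop's data.frame, and drop = FALSE is shown as a way to keep the data.frame structure with the two-index form:

```r
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))  # hypothetical stand-in

v  <- DF[, 1]                # vector: the data.frame structure is dropped
d1 <- DF[1]                  # data.frame with one column
d2 <- DF[, 1, drop = FALSE]  # drop = FALSE keeps the data.frame structure

class(v)
class(d1)
class(d2)
```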
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
- matrix: create a matrix (vector with two dimensions)
- apply: apply a function to the rows or columns of a matrix
- sapply: apply a function to the elements of a list
- is.numeric: returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists...)
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.
Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
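The same point can be reproduced in a self-contained way. In this sketch the global x is set to 10 purely for illustration, and g() is a hypothetical second function showing invisible():

```r
x <- 10            # x in the global environment

f <- function() {
  x <- 1           # creates a new x that is local to this call of f
  x                # the value of the last expression is returned
}

y <- f()
y                  # the value returned by f
x                  # the global x was not modified

g <- function() {
  x <- 2
  invisible(x)     # return the value without auto-printing it
}
z <- g()
```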
Writing functions example
Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1] 1 4 9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
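debug() and undebug() only toggle a flag on the function; isdebugged() (also in base R) reports the flag's current state. A small non-interactive sketch, reusing a square() function like the one defined earlier (flag_on and flag_off are illustrative names):

```r
square <- function(x) {
  return(x * x)
}

debug(square)                    # the next call to square() would open the browser
flag_on <- isdebugged(square)    # TRUE while the flag is set
undebug(square)                  # turn the flag off again
flag_off <- isdebugged(square)   # FALSE once the flag is cleared

square(4)                        # runs normally because the flag is off
```

Stepping through the function itself requires an interactive session, so the sketch stops short of calling square() while the flag is set.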
Exercise 2
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
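The version above only prints a message when the input is unusable. If callers need to react to bad input, signaling an error with stop() is a common alternative. The sketch below is an assumption about how that could look; the function name checkSign is hypothetical and it returns a string instead of printing:

```r
checkSign <- function(x) {
  # validate the argument before doing any work
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)
checkSign(0)

# tryCatch() lets the caller handle the error instead of halting
msg <- tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
msg
```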
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions introduced in this section (technically, if and else are not functions, but this need not concern us at the moment):
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met
Exercise 3
1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
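As a sketch, the conversion above can also be done by converting the levels once and then indexing by the factor. Both forms are shown here on the same factor; the variable names are assumptions.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.integer(x)                    # the internal integer codes
via_char <- as.numeric(as.character(x))   # convert the labels element by element
via_levels <- as.numeric(levels(x))[x]    # convert the levels once, then index by x
```

The second form does less work on long factors with few levels, because only the levels are converted.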
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC–Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback

Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you: tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop keeps running the code for as long as its condition is TRUE, that is, until the stopping condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number and its square
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
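A variation on the loop above, sketched under the assumption that the number of results is known in advance: the list is pre-allocated with vector("list", n) rather than grown one element at a time. The name Result2 and the lapply() alternative are additions for illustration.

```r
n <- 5
Result <- vector("list", n)    # pre-allocate a list with n empty slots
for (i in seq_len(n)) {        # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)    # store the sequence 1 to i
}

# the same result without an explicit loop
Result2 <- lapply(seq_len(n), seq_len)
```

Pre-allocation avoids repeatedly copying the list as it grows, which matters mostly for long loops.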
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
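As a further sketch of the same idea, the earlier for-loop over [-5, 5] can be replaced entirely by vectorized calls. Using sprintf() here is a choice made for this example; the workshop itself uses paste().

```r
nums <- seq(-5, 5)

# vectorized arithmetic replaces the loop over nums
squares <- nums^2

# sprintf() is vectorized too: it builds one string per element
msgs <- sprintf("%d squared is %d", nums, squares)
```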
Exercise 5

1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris.num[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of a matrix:

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (that is, average over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
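The second argument of apply() selects the dimension to work over. A minimal sketch contrasting the two margins on the same matrix, with rowMeans() and colMeans() as equivalent shortcuts; the variable names are illustrative.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)   # MARGIN = 1: one value per row
col_means <- apply(M, 2, mean)   # MARGIN = 2: one value per column

# dedicated helpers give the same results
same_rows <- all.equal(row_means, rowMeans(M))
same_cols <- all.equal(col_means, colMeans(M))
```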
The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
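The vector-versus-data.frame distinction noted above can be sketched as follows. DF is rebuilt here as a small stand-in with assumed contents, since the original object is defined elsewhere in the workshop.

```r
# a small stand-in for the DF object used above (assumed contents)
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

num_cols <- sapply(DF, is.numeric)          # TRUE for numeric columns

as_vector <- DF[, num_cols]                 # a single column is simplified to a vector
as_frame <- DF[, num_cols, drop = FALSE]    # drop = FALSE keeps the data.frame
as_frame2 <- DF[num_cols]                   # list-style indexing also keeps the data.frame
```

Using drop = FALSE (or list-style indexing) gives a predictable return type when the number of selected columns is not known in advance.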
Applying functions summary

Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
- matrix: create a matrix (vector with two dimensions)
- apply: apply a function to the rows or columns of a matrix
- sapply: apply a function to the elements of a list
- is.numeric: returns TRUE or FALSE, depending on the type of object
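For orientation, a brief sketch of the other apply-style functions mentioned in the key points, using the built-in iris data. These calls only illustrate the basic form of each function; the variable names are assumptions.

```r
# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to groups defined by a factor
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
reps <- mapply(rep, 1:3, 3:1)
```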
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value

The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
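The points above can be sketched with three small functions. All function names are hypothetical and chosen only for illustration.

```r
# the value of the last evaluated expression is returned implicitly
add_one <- function(x) {
  x + 1
}

# return() exits the function immediately with the given value
first_negative <- function(x) {
  for (value in x) {
    if (value < 0) return(value)
  }
  NA   # reached only if no negative value was found
}

# assignment inside a function does not change the global x
x <- 10
set_x <- function() {
  x <- 1
  x
}
result <- set_x()
```

After this code runs, result is 1 while the global x is still 10.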
Writing functions example

Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics

- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
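Stepping through a function is interactive, so it is hard to show in a transcript. As a rough sketch, the debugging flag itself can be set, inspected, and cleared without entering the debugger; isdebugged() is a base R function that is not otherwise covered in this workshop.

```r
square <- function(x) {
  x * x
}

debug(square)                     # flag square() so its next call is stepped through
flagged <- isdebugged(square)     # TRUE while the flag is set
undebug(square)                   # remove the flag again
unflagged <- isdebugged(square)   # FALSE once the flag is cleared
```

In an interactive session, calling square(2) while the flag is set opens the step-through browser; traceback() is called after an error to show the sequence of calls that led to it.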
Writing functions summary

Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples

Goal: write a function that tells us if a number is positive or negative.

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Add a condition to handle x = 0:

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!).
Do something reasonable if x is not numeric:

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
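A variant of the same branching logic, sketched under the assumption that a returned value is more useful than printed text and that invalid input should raise an error with stop(). signLabel() is a hypothetical name, not part of the workshop code.

```r
# hypothetical variant that returns a label instead of printing it
signLabel <- function(x) {
  # || evaluates the right-hand side only when the left-hand side is FALSE
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}
```

Because the function returns its result, the label can be stored or tested, for example label <- signLabel(-3).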
Control flow summary

Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
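When an object has several classes, testing membership with inherits() is usually more convenient than comparing the class vector directly. A minimal sketch; inherits() and unclass() are base R functions not otherwise covered in this workshop.

```r
x <- 1:10
class(x) <- c("A", "B")

is_A <- inherits(x, "A")       # TRUE: "A" is one of the classes
is_foo <- inherits(x, "foo")   # FALSE: "foo" is not among them

# unclass() strips the class attribute and recovers the plain vector
plain <- unclass(x)
```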
Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
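A generic with only a matrix method fails for every other kind of object. The sketch below adds a fallback, relying on the S3 convention that a method named with the suffix .default is used when no class-specific method exists. The message text is invented for illustration.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices: print rounded values
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No special display method; using print()\n")
  print(x)
}

out_matrix <- capture.output(disp(matrix(c(1.234, 5.678), ncol = 2)))
out_other <- capture.output(disp("hello"))
```

Here capture.output() is used only to collect what each call prints, so the dispatch can be inspected without reading the console.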
S3 classes summary

Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
  invisible(x) # print methods conventionally return their argument invisibly
}

statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Applying functions to list elements
The apply function
The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90
Introduction To Programming In RLast updated November 20 2013 25
71
Applying functions to list elements
The sapply function
It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this
gt sapply(DF class) get the class of each column in the DF dataframex y
integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric
x yTRUE FALSE
Introduction To Programming In RLast updated November 20 2013 26
71
Applying functions to list elements
Combining sapply and indexing
The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
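The difference between the two indexing forms can be checked directly. This is a small sketch using a stand-in DF (an assumption, since the workshop's DF is defined elsewhere); it also shows drop = FALSE, which keeps the data.frame structure with matrix-style indexing.

```r
# Hypothetical stand-in for the workshop's DF
DF <- data.frame(x = 1:5, y = factor(c("a", "b", "c", "d", "e")))
num_cols <- sapply(DF, is.numeric)

as_vector <- DF[, num_cols]                  # matrix-style: one column collapses to a vector
as_frame <- DF[num_cols]                     # list-style: always a data.frame
kept_frame <- DF[, num_cols, drop = FALSE]   # matrix-style, but structure is kept

print(class(as_vector))
print(class(as_frame))
print(class(kept_frame))
```

Inside functions, the list-style form or drop = FALSE is usually the safer choice, because the result type does not depend on how many columns happen to match.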
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
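As a brief illustration of the other apply-style functions named above, the following sketch runs lapply, tapply, and mapply on the built-in iris data. The variable names are illustrative, not taken from the workshop.

```r
num_cols <- iris[sapply(iris, is.numeric)]

# lapply: like sapply, but always returns a list
means_list <- lapply(num_cols, mean)

# tapply: apply a function to a vector within groups defined by a factor
by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)

print(means_list)
print(by_species)
print(pair_sums)
```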
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message(), or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
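The local-environment behavior can be verified in a few lines. This is a minimal sketch; the function name set_local and the starting value 100 are made up for the illustration.

```r
x <- 100                 # x in the calling (global) environment

set_local <- function() {
  x <- 1                 # creates a new x that is local to the function
  x                      # value of the last expression is returned
}

returned <- set_local()

print(returned)          # the local value
print(x)                 # the outer x is unchanged
```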
Writing functions example
Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
- writing new functions is easy
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
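Stepping through a function is interactive, but the debugging flag itself can be inspected non-interactively. The sketch below toggles the flag on a small function and checks it with isdebugged(); nothing is stepped through because the function is never called while flagged.

```r
square <- function(x) {
  x * x
}

debug(square)                     # the next call to square() would open the browser
flag_on <- isdebugged(square)

undebug(square)                   # back to normal execution
flag_off <- isdebugged(square)

print(c(flag_on, flag_off))
```

In an interactive session, calling square(2) between debug() and undebug() drops into the browser, where n runs the next line and Q quits.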
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative.

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Add a condition to handle x = 0:

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!).
Do something reasonable if x is not numeric:

> # add argument checking to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
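One possible variation on the argument check, sketched under the assumption that a failed check should stop execution rather than print a message: it uses stop() to signal an error and the short-circuit || operator, which evaluates the second test only when the first is FALSE. The name is_positive and the choice to return a string are illustrative, not part of the workshop code.

```r
is_positive <- function(x) {
  # Fail early with an error if x is not a single number
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

print(is_positive(10))
print(is_positive(0))
print(is_positive(-1))

# The error can be caught and inspected instead of halting a script
caught <- tryCatch(is_positive("a"), error = function(e) e)
print(conditionMessage(caught))
```

Returning a value instead of calling cat() lets the caller decide whether and how to display the result.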
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
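The class(df) != "data.frame" test works for plain data.frames, but class() can return more than one value, in which case the comparison yields a vector. A hedged alternative is is.data.frame() (or inherits()), which always returns a single TRUE or FALSE. The helper check_df and the class name "my_frame" below are hypothetical.

```r
check_df <- function(df) {
  # is.data.frame() is TRUE for data.frames and for classes built on them
  if (!is.data.frame(df)) stop("df must be a data.frame")
  invisible(TRUE)
}

# A data.frame carrying an extra, made-up class
sub <- iris
class(sub) <- c("my_frame", "data.frame")

comparison <- class(sub) != "data.frame"   # a vector of length two
print(comparison)

check_df(iris)
check_df(sub)
bad <- tryCatch(check_df(1:10), error = function(e) e)
print(conditionMessage(bad))
```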
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
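When an object has several classes, inherits() is a convenient way to ask whether a particular class is among them, and unclass() strips the class attribute again. A minimal sketch, with illustrative variable names:

```r
x <- 1:10
class(x) <- c("A", "B")

is_a <- inherits(x, "A")   # class "A" is present
is_b <- inherits(x, "B")   # class "B" is present
is_c <- inherits(x, "C")   # class "C" is not

plain <- unclass(x)        # back to a plain integer vector

print(c(is_a, is_b, is_c))
print(class(plain))
```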
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class.

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
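A generic with only one method fails for every other class, so it is common to add a default method as a fallback. The sketch below uses a hypothetical generic named describe (not part of the workshop) to show dispatch choosing between a matrix method and the default.

```r
# Hypothetical generic: UseMethod() picks a method based on class(x)
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method used when x is a matrix
describe.matrix <- function(x, ...) {
  paste(nrow(x), "x", ncol(x), "matrix")
}

# Fallback used when no more specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

m_desc <- describe(matrix(1:6, ncol = 2))
c_desc <- describe("hello")

print(m_desc)
print(c_desc)
```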
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean, for the standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean, for the standard deviations
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
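One detail worth knowing when using all.equal() inside if(): when the values differ, it returns a character description of the difference rather than FALSE. Wrapping it in isTRUE() gives a clean logical. A minimal sketch, with illustrative variable names:

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                        # exact comparison of doubles
approx <- isTRUE(all.equal(a, b))      # comparison within a small tolerance

# For values that really differ, all.equal() returns a message, not FALSE
report <- all.equal(1, 1.1)
differ <- isTRUE(report)

print(exact)
print(approx)
print(report)
print(differ)
```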
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
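The two solutions can be sketched on the same vector as above. The variable names are illustrative.

```r
x <- c(1:10, NA, 12:20)

mean_all <- mean(x)                  # NA, because one value is unknown
mean_obs <- mean(x, na.rm = TRUE)    # drop missing values before averaging

missing <- is.na(x)                  # TRUE where a value is missing
n_missing <- sum(missing)            # count of missing values
where <- which(missing)              # position(s) of missing values

print(mean_all)
print(mean_obs)
print(n_missing)
print(where)
```

Testing with x == NA does not work for this purpose, since every comparison with NA is itself NA.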
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
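The argument lists explain the difference: mean() takes its data in a single argument x, so only the first number is averaged, while sum() collects everything passed through the dots. A minimal sketch of the safe pattern, which is to combine the values into one vector first (variable names are illustrative):

```r
# Only the first value is treated as data; the rest are matched to other arguments
first_only <- mean(1, 2, 3, 4, 5)

# Combine the values with c() so that all of them are averaged
all_values <- mean(c(1, 2, 3, 4, 5))

# sum() accepts the values either way
total_args <- sum(1, 2, 3, 4, 5)
total_vec <- sum(c(1, 2, 3, 4, 5))

print(first_only)
print(all_values)
print(c(total_args, total_vec))
```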
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
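The behavior follows from how a factor is stored: integer codes that index into a vector of level labels. The sketch below shows the codes, the labels, and an equivalent conversion that converts the levels once and then indexes them by the factor. Variable names are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.numeric(x)                     # the internal integer codes
labels <- levels(x)                        # the level labels, as character

via_character <- as.numeric(as.character(x))
via_levels <- as.numeric(levels(x))[x]     # convert the levels, then index by code

print(codes)
print(labels)
print(via_character)
print(via_levels)
```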
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square.

> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
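Two common variations on this pattern, sketched here as options rather than as the workshop's method: allocating the list at its final length before the loop, and letting lapply build the list without an explicit loop. The variable names are illustrative.

```r
n <- 5

# Pre-allocate a list of length n, then fill it by index
result <- vector("list", n)
for (i in seq_len(n)) {
  result[[i]] <- 1:i
}

# The same list built with lapply: no storage object or index bookkeeping needed
same <- lapply(seq_len(n), function(i) 1:i)

print(result[[3]])
print(identical(result, same))
```

seq_len(n) is used instead of 1:n so that the loop body is skipped when n is zero.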
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for (i in 1:5) x[i] <- i + i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5, 5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
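As a further sketch of the vectorized style, the earlier for-loop that printed each number and its square can be reproduced without a loop by building all the messages at once and printing them with one cat() call. The variable names are illustrative.

```r
nums <- seq(-5, 5)

# paste() and ^ both work element by element on the whole vector
msgs <- paste(nums, "squared is", nums^2)

# Print one message per line
cat(msgs, sep = "\n")
```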
Exercise 5
1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> irismeans <- c()
> for (var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Applying functions to list elements
The sapply function
It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this
gt sapply(DF class) get the class of each column in the DF dataframex y
integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric
x yTRUE FALSE
Introduction To Programming In RLast updated November 20 2013 26
71
Applying functions to list elements
Combining sapply and indexing
The sapply function can be used in combination with indexing to extractelements that meet certain criteria
Recall that we can index using logical vectors
gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y
TRUE FALSEgt DF[DFwhichnum] select the numeric columns
x1 12 23 34 45 5
Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column
Introduction To Programming In RLast updated November 20 2013 27
71
Applying functions to list elements
Applying functions summary
Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details
Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list
isnumeric returns TRUE or FALSE depending on the type of object
Introduction To Programming In RLast updated November 20 2013 28
71
Writing functions
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 29
71
Writing functions
Functions
A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )
Introduction To Programming In RLast updated November 20 2013 30
71
Writing functions
Function return value
The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function
Other function output can come fromCalls to print() message() or cat() in function bodyError messages
Assignment inside the body of a function takes place in a localenvironmentExample
gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1
a b c d e f g h i j101 102 103 1 105 106 107 108 109 110
Introduction To Programming In RLast updated November 20 2013 31
71
Writing functions
Writing functions example
Goal write a function that returns the square of itrsquos argument
gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25
Introduction To Programming In RLast updated November 20 2013 32
71
Writing functions
Debugging basics
Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact
Introduction To Programming In RLast updated November 20 2013 33
71
Writing functions
Writing functions summary
Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging
Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value
browser sets a break pointdebug turns on the debugging flag of a function so you can step
through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went
wrong)
Introduction To Programming In RLast updated November 20 2013 34
71
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section:
cat: concatenates and prints R objects
if: execute code only if condition is met
else: used with if; code to execute if condition is not met

(1) Technically, if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument-checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype
1. Add argument-checking code to return an error if the argument to your function is not a data.frame.
statsum <- function(df) {
  # is.data.frame() also works for objects that have more than one class
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10) # stops with an error
statsum(iris)
2. Insert a break point with browser() and step through your function.
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  browser() # execution pauses here so you can step through the rest
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
R has two major object systems:
- Relatively informal S3 classes
- Stricter, more formal S4 classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
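Because an object can carry several classes, comparing class(x) with a single string is fragile. As a small illustration (not from the slides), base R's inherits() tests class membership and unclass() removes the class attribute again:

```r
x <- 1:10
class(x) <- c("A", "B")   # x now has two classes

inherits(x, "A")          # TRUE
inherits(x, "B")          # TRUE
inherits(x, "foo")        # FALSE

y <- unclass(x)           # drop the class attribute
class(y)                  # back to "integer"
```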
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class.
> # create a mean() method for objects of class "foo"
> mean.foo <- function(x) { # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n") # otherwise say x not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
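With only a matrix method defined, calling disp() on any other kind of object fails because no applicable method is found. One common remedy, sketched here as an assumption rather than as part of the original slides, is to add a disp.default fallback method:

```r
# generic, as defined in the slides
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices: round before printing
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# assumed addition: fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  print(x)
}

disp(matrix(c(1.234, 5.678), ncol = 2))  # dispatches to disp.matrix
disp("hello")                            # dispatches to disp.default
```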
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have a class, and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: the body of a generic function
invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the "statsum" class.
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables.
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class "statsum".
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R)) # prepend the new class
  return(R)
}
str(statsum(iris))
3. Write a print method for the "statsum" class.
print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming.
- Some of these "gotchas" are common problems in other languages; many are unique to R.
- We will only cover a few; for a more comprehensive discussion, please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
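One detail worth knowing when using this in a condition: all.equal() returns TRUE on success but a character description of the difference otherwise, so it is usually wrapped in isTRUE(). A minimal sketch; the explicit tolerance of 1e-9 is an arbitrary value chosen only for illustration:

```r
a <- 0.1
b <- 0.3 / 3

a == b                    # FALSE: the two values differ in their last bits
isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance
abs(a - b) < 1e-9         # TRUE: the same idea with an explicit tolerance

if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal for practical purposes\n")
}
```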
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
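The two solutions can be sketched on the same vector as follows (an illustrative example, not taken from the slides):

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # mean of the 19 non-missing values

is.na(x)                 # logical vector, TRUE only where x is missing
sum(is.na(x))            # 1: number of missing values
which(is.na(x))          # 11: position of the missing value
x[!is.na(x)]             # the vector with the missing value dropped
```

Note that x == NA cannot be used for this test, since any comparison with NA is itself NA.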
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to the most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect. I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
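A short sketch of the usual way around this inconsistency: combine the values into a single vector with c() before calling either function. The example is illustrative and not part of the original slides.

```r
mean(1, 2, 3, 4, 5)      # 1: only the first argument is treated as the data x
mean(c(1, 2, 3, 4, 5))   # 3: all five values are passed as one vector

sum(1, 2, 3, 4, 5)       # 15: sum() collects all of its ... arguments
sum(c(1, 2, 3, 4, 5))    # 15: same result with a single vector
```

Passing one vector works the same way for both functions, so it is the safer habit.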
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help us make this workshop better!
- Please take a moment to fill out a very short feedback form.
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number and its square
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes.
> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
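Two variations on this pattern, offered as a sketch rather than as part of the original slides: the result list can be created at its final length up front with vector("list", n), and the same list can be built without an explicit loop using lapply(). The names n and Result2 are introduced here only for illustration.

```r
n <- 5

# preallocate a list of length n, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# the same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

identical(Result, Result2)   # TRUE
```

seq_len(n) is used instead of 1:n because it yields an empty sequence when n is 0, so the loop body is simply skipped.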
Word of caution: don't overuse loops!
Most operations in R are vectorized, which makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for (i in 1:5) x[i] <- i + i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste() instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set.
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns.
> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data.
> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
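For comparison, the same three steps can be written without explicit loops using sapply(), which was introduced in an earlier section. This is a sketch for illustration and is not part of the original exercise solution:

```r
data(iris)

classes <- sapply(iris, class)         # step 1: class of each column
iris.num <- iris[classes == "numeric"] # step 2: keep the numeric columns
iris.means <- sapply(iris.num, mean)   # step 3: mean of each numeric column

iris.means
```

The result is a named numeric vector with one mean per numeric column, matching the values produced by the loop version.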
Applying functions to list elements
Combining sapply and indexing
The sapply() function can be used in combination with indexing to extract elements that meet certain criteria.
Recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector:
> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5
Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
Applying functions summary
Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
matrix: create a matrix (vector with two dimensions)
apply: apply a function to the rows or columns of a matrix
sapply: apply a function to the elements of a list
is.numeric: returns TRUE or FALSE, depending on the type of object
Writing functions
Functions
- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value
The return value of a function can be:
- The last object stored in the body of the function
- Objects explicitly returned with the return() function
Other function output can come from:
- Calls to print(), message() or cat() in the function body
- Error messages
Assignment inside the body of a function takes place in a local environment.
Example:
> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
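The point about local assignment can be checked in isolation with a small self-contained sketch. The value 100 and the function name g are arbitrary choices made here for illustration:

```r
x <- 100          # x in the global environment

g <- function() {
  x <- 1          # creates a separate, local x inside the function
  x               # the last evaluated value is returned
}

y <- g()
y                 # 1: the value returned by g
x                 # 100: the global x is unchanged
```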
Writing functions example
Goal: write a function that returns the square of its argument.
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
function: defines a new function
return: used inside a function definition to set the return value
browser: sets a break point
debug: turns on the debugging flag of a function so you can step through it
undebug: turns off the debugging flag
traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
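The two solutions named above can be sketched as follows, reusing the vector from the slide (the result names are illustrative):

```r
x <- c(1:10, NA, 12:20)

mean_all   <- mean(x)                 # NA: the missing value propagates
mean_clean <- mean(x, na.rm = TRUE)   # drop missing values before averaging

n_missing  <- sum(is.na(x))           # count missing values with is.na()
x_complete <- x[!is.na(x)]            # keep only the non-missing values

wrong_test <- x == NA                 # comparing with == yields NA everywhere
```
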
Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
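Since R gives no warning here, one defensive option is to check types explicitly before comparing. The sketch below uses a hypothetical helper, lessThan(), that is not part of the workshop materials:

```r
mixed <- c(1, "a")                   # the number is coerced to the string "1"

silent  <- 10 < "9"                  # 10 becomes "10"; strings are compared
numeric <- 10 < as.numeric("9")      # convert explicitly to compare numbers

# Hypothetical helper: refuse to compare anything that is not numeric
lessThan <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a < b
}

checked <- tryCatch(lessThan(10, "9"), error = function(e) "error raised")
```
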
Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
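The argument lists explain the difference: sum() collects all of its unnamed arguments through "...", while mean() treats only its first argument, x, as data. A short sketch of the safe pattern, which is to pass the data as a single vector:

```r
vals <- c(1, 2, 3, 4, 5)

mean_vec  <- mean(vals)            # averages all five values
sum_vec   <- sum(vals)

# Only the first argument is used as data; the others are matched
# to the remaining arguments of the mean method
mean_args <- mean(1, 2, 3, 4, 5)
sum_args  <- sum(1, 2, 3, 4, 5)    # sum() adds up everything passed via ...
```
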
Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
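An equivalent conversion, not shown on the slide, converts the levels once and then indexes them by the factor. A sketch comparing the three approaches:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.numeric(x)                  # internal integer codes, not the values
via_chr <- as.numeric(as.character(x))    # convert labels, element by element
via_lev <- as.numeric(levels(x))[x]       # convert the levels once, then index
```
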
Additional resources
Additional reading and resources

- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback

Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form.
- These workshops exist for you. Tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping

- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) {               # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")   # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15)                 # allows reproducible sample() results
> dice <- seq(1, 6)            # set dice = [1 2 3 4 5 6]
> roll <- 0                    # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1)   # calculate sum of two rolls
+   cat("We rolled a ", roll, "\n")             # print the result
+ }                                             # end the loop
We rolled a  6
We rolled a  10
We rolled a  9
We rolled a  7
We rolled a  10
We rolled a  5
We rolled a  9
We rolled a  12
Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list()        # create an object to store the results
> for (i in 1:5) {        # for each i in [1, 5]
+   Result[[i]] <- 1:i    # assign the sequence 1 to i to Result
+ }
> Result                  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
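When the number of results is known in advance, the container can be created at its final size instead of being grown inside the loop. The sketch below shows that variant, plus an lapply() equivalent as a loop-free alternative (Result2 and n are illustrative names):

```r
n <- 5

# Preallocate a list of length n, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), seq_len)
```
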
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                        # create vector x
> for (i in 1:5) x[i] <- i + i    # double a vector using a loop
> print(x)                        # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                       # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                         # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5, 5)) {               # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n")   # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
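To reproduce the one-message-per-line output of the earlier for-loop without a loop, the vector built by paste() can be handed to cat() with a newline separator. A small sketch, with an explicit loop kept only for comparison:

```r
nums <- seq(-5, 5)

# Build all messages at once, then print one per line
msgs <- paste(nums, "squared is", nums^2)
cat(msgs, sep = "\n")

# Loop version of the squaring step, for comparison only
loop_sq <- numeric(length(nums))
for (i in seq_along(nums)) {
  loop_sq[i] <- nums[i]^2
}
```
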
Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
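For comparison with the loop-based prototype, the same three steps can be written without loops using sapply() and colMeans(). This is a sketch of an alternative, not part of the exercise solution:

```r
# Step 1: which columns are numeric?
is_num <- sapply(iris, is.numeric)

# Step 2: keep only the numeric columns
iris_num <- iris[is_num]

# Step 3: column means, two equivalent ways
means_sapply <- sapply(iris_num, mean)
means_col    <- colMeans(iris_num)
```
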
Applying functions to list elements
Applying functions summary

Key points
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
matrix       create a matrix (vector with two dimensions)
apply        apply a function to the rows or columns of a matrix
sapply       apply a function to the elements of a list
is.numeric   returns TRUE or FALSE, depending on the type of object
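As a quick recap of the functions listed above, here is a minimal sketch on a small matrix and on the built-in iris data (the object names are illustrative):

```r
# A 2 x 3 matrix filled column by column
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)   # MARGIN = 1: apply over rows
col_sums <- apply(m, 2, sum)   # MARGIN = 2: apply over columns

# A data.frame is a list of columns, so sapply() visits each column
num_cols <- sapply(iris, is.numeric)
```
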
Writing functions
Functions

- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function.
- Functions are defined using the function() function.
- Functions can be defined with any number of named arguments.
- Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
Function return value

The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message(), or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() {            # define function f
+   print("setting x to 1")    # print a text string
+   x <- 1 }                   # set x to 1
>
> y <- f()                     # assign y the value returned by f
[1] "setting x to 1"
>
> y                            # print y
[1] 1
> x                            # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
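The local-assignment rule can be checked in a fresh session with a self-contained sketch. The global x here is set to 10 purely for illustration, and g() is a hypothetical second function showing a return value that is not auto-printed:

```r
x <- 10                 # a global x

f <- function() {
  x <- 1                # assigns a local x; the global x is untouched
  x                     # value of the last expression is returned
}

y <- f()

g <- function() invisible(42)   # returned, but not printed at the prompt
z <- g()
```
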
Writing functions example

Goal: write a function that returns the square of its argument

> square <- function(x) {    # define function named "square" with argument x
+   return(x*x)              # multiply the x argument by itself
+ }                          # end the function definition
>
> # check to see that the function works
> square(x = 2)              # square the value 2
[1] 4
> square(10)                 # square the value 10
[1] 100
> square(1:5)                # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics

- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
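The debugging tools are interactive, so the sketch below only toggles the debugging flag and checks it with isdebugged(). The function scale01() is a made-up example; in a live session, calling it while the flag is on would drop you into the step-through browser:

```r
# A made-up function to debug: rescale a vector to the range [0, 1]
scale01 <- function(v) {
  rng <- range(v)
  (v - rng[1]) / (rng[2] - rng[1])
}

debug(scale01)                     # next call would be stepped through
flag_on <- isdebugged(scale01)

undebug(scale01)                   # back to normal execution
flag_off <- isdebugged(scale01)

res <- scale01(c(2, 4, 6))

# Interactively: put browser() inside the body to set a break point,
# and call traceback() right after an error to see the call stack.
```
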
Writing functions summary

Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section
function    defines a new function
return      used inside a function definition to set the return value
browser     sets a break point
debug       turns on the debugging flag of a function so you can step through it
undebug     turns off the debugging flag
traceback   shows the error stack (call after an error to see what went wrong)
Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary.
- Especially useful for checking function arguments and performing different operations depending on function input.
Control flow examples

Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) {        # define function "isPositive"
+   if (x > 0) {                     # if x is greater than zero, then
+     cat(x, "is positive \n")       # say so!
+   } else {                         # otherwise
+     cat(x, "is negative \n")       # say x is negative
+   }
+ }                                  # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Control flow examples

Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) {        # define function "isPositive"
+   if (x > 0) {                     # if x is greater than zero, then
+     cat(x, "is positive \n")       # say so!
+   } else if (x == 0) {             # otherwise if x is zero
+     cat(x, "is zero \n")           # say so!
+   } else {                         # otherwise
+     cat(x, "is negative \n")       # say x is negative
+   }
+ }                                  # end function definition
>

Test the new function

> isPositive(0)                      # test the isPositive() function
0 is zero
> isPositive("a")                    # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples

Do something reasonable if x is not numeric

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) {        # define function "isPositive"
+   if (!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {              # if x is greater than zero, then
+     cat(x, "is positive \n")       # say so!
+   } else if (x == 0) {             # otherwise if x is zero
+     cat(x, "is zero \n")           # say so!
+   } else {                         # otherwise
+     cat(x, "is negative \n")       # say x is negative
+   }
+ }                                  # end function definition
>
> isPositive("a")                    # test the isPositive() function on character
x must be a numeric vector of length one!
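The version above reports bad input with cat() and then carries on. An alternative, sketched here with a hypothetical function signLabel() that returns a label instead of printing, is to signal a real error with stop(), which callers can trap with tryCatch():

```r
# Hypothetical variant: return a label, and stop() on invalid input
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

# Trap the error and keep its message
res <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
```
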
Control flow summary

Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section
cat     concatenates and prints R objects
if      execute code only if condition is met
else    used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.

Basic idea: functions have different methods for different types of objects.
Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
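When an object has several classes, inherits() is a convenient way to test for one of them, and unclass() strips the class attribute again. A small sketch using the same object as above:

```r
x <- 1:10
class(x) <- c("A", "B")

has_B <- inherits(x, "B")     # TRUE if "B" is among the classes of x
has_C <- inherits(x, "C")

y <- unclass(x)               # remove the class attribute
base_class <- class(y)        # falls back to the implicit class
```
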
Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) {                 # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x)))    # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")            # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
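As defined above, disp() fails for any object that is not a matrix, because no method matches. A common addition, sketched here as an assumption rather than part of the workshop code, is a default method that acts as the fallback:

```r
# Generic: dispatch on the class of x
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices: print rounded values
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback (an addition for illustration): used when no other method matches
disp.default <- function(x, ...) {
  cat("disp: no special method for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 2.5), ncol = 1))   # dispatches to disp.matrix
disp("a")                               # dispatches to disp.default
```
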
S3 classes summary

Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
plot        creates a graphical display, the type of which depends on the class of the object being plotted
methods     lists the methods defined for a function or class
UseMethod   the body of a generic function
invisible   returns an object but does not print it
Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.
Writing functions
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 29
71
Writing functions
Functions
A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )
Introduction To Programming In RLast updated November 20 2013 30
71
Writing functions
Function return value
The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function
Other function output can come fromCalls to print() message() or cat() in function bodyError messages
Assignment inside the body of a function takes place in a localenvironmentExample
gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1
a b c d e f g h i j101 102 103 1 105 106 107 108 109 110
Introduction To Programming In RLast updated November 20 2013 31
71
Writing functions
Writing functions example
Goal: write a function that returns the square of its argument
> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
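A small sketch of how the debugging flag behaves. The function center() is hypothetical; the interactive steps are left as comments because they only make sense at the R prompt.

```r
# Hypothetical function used only for illustration
center <- function(x) {
  m <- mean(x)
  x - m
}

debug(center)                    # flag center(): the next call would be stepped through
flag_on <- isdebugged(center)    # TRUE while the flag is set
undebug(center)                  # remove the flag again
flag_off <- isdebugged(center)   # FALSE

# At an interactive prompt one might also try:
#   debugonce(center); center(1:3)   # step through a single call
#   traceback()                      # after an error, show the call stack
```

Setting and clearing the flag does not run the function; stepping only starts when a flagged function is called.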
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
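One possible variation, not part of the original prototype: naming the list elements makes the result easier to use. The function name statsum_named and the element names means and counts are choices made for this sketch.

```r
# Variation on the prototype: return a named list
statsum_named <- function(df) {
  classes <- sapply(df, class)
  list(means  = sapply(df[classes == "numeric"], mean),
       counts = lapply(df[classes == "factor"], table))
}

res <- statsum_named(iris)
res$means["Sepal.Length"]   # extract one mean by name
res$counts$Species          # counts for the Species factor
```

With names, callers can write res$means instead of remembering that the means are in position one.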
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not numeric of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
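A hedged variation on the argument check above: instead of printing a message with cat(), the sketch below signals an error with stop() and returns the answer as a value. The function name signOf is invented here, and using || and length(x) != 1 is a design choice of this sketch rather than the workshop's code.

```r
# signOf() is a hypothetical name; it returns a string instead of printing
signOf <- function(x) {
  # || evaluates the second test only if the first one is FALSE
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

signOf(10)
signOf(0)
# tryCatch() lets us capture the error message instead of halting
msg <- tryCatch(signOf("a"), error = function(e) conditionMessage(e))
msg
```

Signalling an error stops the calling code from continuing with a bad input, which a printed message does not.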
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions [1] introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met
[1] Technically if and else are not functions, but this need not concern us at the moment
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
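A note of caution on the check class(df) != "data.frame": it works for iris, but an object can carry more than one class, in which case the comparison returns several values, and recent versions of R signal an error when if() receives a condition of length greater than one. The sketch below shows an alternative using is.data.frame(); the class name my_tbl and the helper check_df are made up for the illustration.

```r
# A data.frame carrying an extra class, as some packages produce
# (the class name "my_tbl" is invented for this example)
df <- iris
class(df) <- c("my_tbl", "data.frame")

n_results <- length(class(df) != "data.frame")   # 2: not a single TRUE/FALSE
is.data.frame(df)                                # TRUE
inherits(df, "data.frame")                       # TRUE

# check_df() is a hypothetical helper
check_df <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}
ok  <- check_df(df)
err <- tryCatch(check_df(1:10), error = function(e) conditionMessage(e))
```

Both is.data.frame() and inherits() return a single TRUE or FALSE regardless of how many classes the object has.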
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
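A short sketch of what setting the class does and does not change. The class names A and B are the arbitrary ones used above.

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")                  # TRUE
inherits(x, "B")                  # TRUE
inherits(x, "integer")            # FALSE: "integer" is no longer in class(x)
typeof(x)                         # "integer": the underlying storage is unchanged
plain <- unclass(x)               # remove the class attribute again
same <- identical(plain, 1:10)    # TRUE
```

Assigning a class only attaches a label; the data stored in the object stay the same.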
Function methods
- Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
- Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
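One refinement worth sketching: the mean() generic is defined as function (x, ...), so a method that also accepts ... can pass options such as na.rm through to mean.default(). The class name celsius and the data are made up for this example.

```r
# "celsius" is a made-up class name for this sketch
temps <- structure(c(21.5, NA, 19.0), class = "celsius")

mean.celsius <- function(x, ...) {
  # unclass() strips the class so mean.default() sees a plain numeric vector;
  # ... forwards extra arguments such as na.rm
  mean.default(unclass(x), ...)
}

m_all   <- mean(temps)                 # NA, because of the missing value
m_clean <- mean(temps, na.rm = TRUE)   # 20.25
```

A method written as function(x) without ... would stop with an unused-argument error if called as mean(temps, na.rm = TRUE).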
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
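As written, disp() has a method only for matrices, so calling it on anything else fails with a "no applicable method" error. A common remedy is a default method; the sketch below adds one. The message text is invented for this example.

```r
disp <- function(x, ...) UseMethod("disp")

disp.matrix <- function(x, ...) print(round(x, digits = 2))

# Fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

m <- matrix(c(0.123, 4.567, 8.91, 2.345), ncol = 2)
disp(m)         # dispatches to disp.matrix()
disp("hello")   # falls through to disp.default()
```

UseMethod() tries each class of the object in turn and uses the default method only when none of them has a method of its own.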
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
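A possible polish of the prototype, offered as a sketch rather than the official solution: the print method accepts ... like the print() generic and returns its argument invisibly, which is the usual convention for print methods. The element names stats and counts are choices made here.

```r
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  out <- list(stats  = cbind(means = sapply(df[classes == "numeric"], mean),
                             sds   = sapply(df[classes == "numeric"], sd)),
              counts = sapply(df[classes == "factor"], table))
  class(out) <- c("statsum", class(out))
  out
}

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x$stats, digits = 2)
  cat("Factor variable counts:\n")
  print(x$counts)
  invisible(x)   # return the object without printing it a second time
}

s <- statsum(iris)
s   # auto-printing dispatches to print.statsum()
```

Returning invisible(x) lets a call such as y <- print(s) capture the object while keeping the printed output unchanged.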
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these gotchas are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
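One detail to keep in mind when using all.equal() inside if(): when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A small sketch follows; the tolerance 1e-8 in the last line is an arbitrary choice.

```r
a <- 0.1
b <- 0.3 / 3

exact   <- a == b                       # FALSE: exact comparison fails
close   <- isTRUE(all.equal(a, b))      # TRUE: equal up to a small tolerance
descr   <- all.equal(1, 1.1)            # a character string, not FALSE
differ  <- isTRUE(all.equal(1, 1.1))    # FALSE: safe to use in if()
by_hand <- abs(a - b) < 1e-8            # explicit tolerance (1e-8 is arbitrary)
```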
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing
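The two solutions can be sketched with the same vector:

```r
x <- c(1:10, NA, 12:20)

m         <- mean(x, na.rm = TRUE)   # drop NA before averaging
n_missing <- sum(is.na(x))           # count the missing values
complete  <- x[!is.na(x)]            # keep only the observed values
```

Note that x == NA would not work for the test, because any comparison with NA is itself NA.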
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
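The comparison above is done on character strings, because the number is converted to "1" first. A brief sketch of how this can mislead, and of converting explicitly instead:

```r
as_strings <- "10" < "9"                           # TRUE: compared character by character
as_numbers <- as.numeric("10") < as.numeric("9")   # FALSE: compared as numbers

# Converting text that is not a number gives NA (with a warning,
# suppressed here so the example runs quietly)
not_a_number <- suppressWarnings(as.numeric("a"))
```

Converting explicitly with as.numeric() makes the intent visible, and the resulting NA is easier to notice than a silently wrong comparison.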
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
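The difference comes from the argument lists: mean() averages only its first argument x, while sum() adds up everything passed through "...". Wrapping the values in c() makes both functions see a single vector:

```r
m_vec  <- mean(c(1, 2, 3, 4, 5))   # 3: all five values are in x
s_vec  <- sum(c(1, 2, 3, 4, 5))    # 15
m_args <- mean(1, 2, 3, 4, 5)      # 1: only x = 1 is averaged
```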
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
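An equivalent way to recover the numbers, sketched below, converts the levels once and then indexes them by the factor's integer codes:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

lev   <- levels(x)                  # "6" "5"
codes <- as.integer(x)              # 2 2 1 1: positions in levels(x)
vals  <- as.numeric(levels(x))[x]   # 5 5 6 6
```

This gives the same result as as.numeric(as.character(x)) and converts each level only once, which can matter for long factors.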
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing
> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
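When the number of results is known in advance, a common refinement is to create the list at its final length before the loop instead of growing it one element at a time. A minimal sketch, with n chosen arbitrarily:

```r
n <- 5
Result <- vector("list", n)      # a list of n empty (NULL) slots

for (i in seq_len(n)) {          # seq_len(n) is 1, 2, ..., n
  Result[[i]] <- seq_len(i)      # fill slot i with the sequence 1 to i
}

sizes <- lengths(Result)         # number of elements in each slot
```

seq_len(n) is also a little safer than 1:n, because it produces an empty sequence when n is zero.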
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
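As one more illustration of replacing a loop with a vectorized call, the sketch below classifies the sign of each element of a vector twice: once with a loop and once with ifelse(). The input values are arbitrary.

```r
x <- c(-2, 0, 3)

# loop version: handle one element at a time
signs <- character(length(x))
for (i in seq_along(x)) {
  signs[i] <- if (x[i] > 0) "positive" else if (x[i] == 0) "zero" else "negative"
}

# vectorized version: ifelse() works on the whole vector at once
signs_vec <- ifelse(x > 0, "positive", ifelse(x == 0, "zero", "negative"))

same <- identical(signs, signs_vec)   # both approaches agree
```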
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
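For comparison, the same three steps can be written without explicit loops. This is a sketch using vapply(), which is like sapply() but requires stating the type of each result; the variable names num_cols and col_means are invented here.

```r
# steps 1 and 2: find the numeric columns (one TRUE/FALSE per column)
num_cols <- vapply(iris, is.numeric, logical(1))

# step 3: mean of each numeric column (one number per column)
col_means <- vapply(iris[num_cols], mean, numeric(1))
col_means
```

The loop version is good practice for the exercise; in everyday code the apply-style version is usually shorter.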
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Writing functions
Functions
A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )
Introduction To Programming In RLast updated November 20 2013 30
71
Writing functions
Function return value
The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function
Other function output can come fromCalls to print() message() or cat() in function bodyError messages
Assignment inside the body of a function takes place in a localenvironmentExample
gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1
a b c d e f g h i j101 102 103 1 105 106 107 108 109 110
Introduction To Programming In RLast updated November 20 2013 31
71
Writing functions
Writing functions example
Goal write a function that returns the square of itrsquos argument
gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25
Introduction To Programming In RLast updated November 20 2013 32
71
Writing functions
Debugging basics
Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact
Introduction To Programming In RLast updated November 20 2013 33
71
Writing functions
Writing functions summary
Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging
Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value
browser sets a break pointdebug turns on the debugging flag of a function so you can step
through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went
wrong)
Introduction To Programming In RLast updated November 20 2013 34
71
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
The S3 object class system
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
The S3 object class system
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotcha's

- There are an unfortunately large number of surprises in R programming
- Some of these gotcha's are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
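One detail to keep in mind when using all.equal() inside if(): when the values differ, it returns a character description of the difference rather than FALSE. A small sketch (the variable names a, b, and res are ours) that wraps the call in isTRUE():

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison fails
isTRUE(all.equal(a, b))    # comparison within a small tolerance succeeds

# when values really differ, all.equal() returns text, not FALSE
res <- all.equal(1, 1.1)
is.character(res)          # TRUE
isTRUE(res)                # FALSE, safe to use in if()
```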
Things that may surprise you
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
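A short sketch of both solutions, reusing the vector x from the slide:

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # mean of the 19 observed values
sum(is.na(x))            # number of missing values
x[!is.na(x)]             # the observed values only
```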
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
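The difference comes from how the arguments are matched: mean() averages only its first argument x, and the default method treats the next positional arguments as trim and na.rm, while sum() adds up everything passed through "...". A sketch of the usual workaround, which is to combine the values into one vector first:

```r
mean(1, 2, 3, 4, 5)       # only the first argument is averaged
mean(c(1, 2, 3, 4, 5))    # all five values are averaged
sum(1, 2, 3, 4, 5)        # sum() accepts the values separately ...
sum(c(1, 2, 3, 4, 5))     # ... or as a single vector
```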
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
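An equivalent idiom, suggested in R's own help page for factor, converts the levels once and then indexes them by the factor. A sketch using the same x:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                  # internal integer codes, not the values
as.numeric(as.character(x))    # the values, via character
as.numeric(levels(x))[x]       # the values, converting each level only once
```

The last form only converts the two levels rather than every element, which can matter for long factors with few levels.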
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Additional resources
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Loops (supplemental)
Looping: for-loop examples
For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Loops (supplemental)
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Loops (supplemental)
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
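When the number of results is known in advance, a common variation is to create the list at its final length before the loop rather than growing it one element at a time. A minimal sketch, assuming a hypothetical count n:

```r
n <- 5                           # hypothetical number of results
Result <- vector("list", n)      # preallocate a list of length n
for (i in seq_len(n)) {          # seq_len(0) is empty, so n = 0 is handled safely
  Result[[i]] <- 1:i             # fill in element i
}
str(Result)                      # compact display of the filled list
```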
Loops (supplemental)
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Loops (supplemental)
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Loops (supplemental)
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
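For comparison, the same three steps can be sketched without explicit loops using sapply(), as in the earlier statsum exercises. This is one possible loop-free version, not the workshop's official solution:

```r
classes <- sapply(iris, class)            # step 1: class of each column
iris.num <- iris[classes == "numeric"]    # step 2: keep the numeric columns
iris.means <- sapply(iris.num, mean)      # step 3: mean of each numeric column
iris.means
```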
Writing functions
Function return value
The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message(), or cat() in the function body
- Error messages

Assignment inside the body of a function takes place in a local environment
Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
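A self-contained sketch of the same point, using a global x of our own choosing (10) rather than the vector carried over from earlier slides:

```r
x <- 10                 # x in the global environment

f <- function() {
  x <- 1                # creates a new x that is local to this call
  x                     # the value of the last expression is returned
}

y <- f()
y                       # the returned value
x                       # the global x is unchanged
```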
Writing functions
Writing functions example
Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Writing functions
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
Writing functions
Writing functions summary
Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Writing functions
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable
Writing functions
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Control flow
Control flow examples
Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow
Control flow examples
Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one
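The version on the slide only prints a message for bad input, so code that calls it cannot tell that anything went wrong. A sketch of a stricter variant that signals an error with stop() instead; the use of ||, the length(x) != 1 test, and the invisible return value are our choices, not part of the original slides:

```r
isPositive <- function(x) {
  # validate first; || short-circuits, so length() is only checked for numeric input
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    cat(x, "is positive\n")
  } else if (x == 0) {
    cat(x, "is zero\n")
  } else {
    cat(x, "is negative\n")
  }
  invisible(x)            # return the input without printing it
}

isPositive(3)
result <- try(isPositive("a"), silent = TRUE)  # the error is caught, not printed
class(result)                                  # "try-error"
```

Note that a missing value (NA) would still pass the check and fail in the first comparison; handling that case is left out of this sketch.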
Control flow
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment
Control flow
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Control flow
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
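Because an object can carry several classes, comparing class(x) to a single string with == gives one result per class. A sketch using inherits(), which tests membership directly and always returns a single TRUE or FALSE:

```r
x <- 1:10
class(x) <- c("A", "B")

class(x) == "A"         # one result per class: TRUE FALSE
inherits(x, "A")        # TRUE
inherits(x, "B")        # TRUE
inherits(x, "foo")      # FALSE
unclass(x)              # strips the class attribute, leaving the integer vector
```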
The S3 object class system
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] aggregate.data.frame     anyDuplicated.data.frame
[3] as.data.frame.data.frame as.list.data.frame
[5] as.matrix.data.frame     by.data.frame
[7] cbind.data.frame         [<-.data.frame
[9] [.data.frame
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Introduction To Programming In RLast updated November 20 2013 67
71
Loops (supplimental)
Using loops to fill in lists
Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing
gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1
[[2]][1] 1 2
[[3]][1] 1 2 3
[[4]][1] 1 2 3 4
[[5]][1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 68
71
Loops (supplimental)
Word of caution donrsquot overuse loops
Most operations in R are vectorized ndash This makes loops unnecessary in manycases
Writing functions
Writing functions example
Goal: write a function that returns the square of its argument.
> square <- function(x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
Debugging basics
- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact
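The debugging tools named on this slide can be tried on a small function. The following is a minimal sketch rather than part of the original slides; half() is a hypothetical helper invented for illustration. It toggles the debugging flag without entering the interactive debugger, and catches the error so the script keeps running.

```r
# Hypothetical helper used only for illustration
half <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x / 2
}

# debug() flags the function so that the next call is stepped through
# line by line; isdebugged() reports the flag and undebug() clears it
debug(half)
flag_on <- isdebugged(half)
undebug(half)
flag_off <- isdebugged(half)

# At the interactive prompt, calling traceback() right after an error
# prints the call stack. Here the error is caught instead.
msg <- tryCatch(half("a"), error = function(e) conditionMessage(e))
```

In an interactive session you would call half("a") directly and then run traceback() to see which call raised the error.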
Writing functions summary
Key points:
- writing new functions is easy
- most functions will have a return value, but functions can also print things, write things to file, etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section:
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else.
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative.
> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0:
> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
Test the new function:
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)
Control flow examples
Do something reasonable if x is not numeric:
> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
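The same branching logic can also signal a real error and return a value instead of printing. The sketch below is one possible variant, not the workshop's own code: sign_label() is a hypothetical name, and the stricter check (exactly one non-missing number) is an assumption about what a caller would want.

```r
# Hypothetical variant of isPositive(): returns a label instead of printing,
# and signals an error for invalid input
sign_label <- function(x) {
  # length(x) != 1 also rejects zero-length input such as numeric(0);
  # || stops at the first TRUE, so is.na() only sees a single number
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    stop("x must be a single non-missing number")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

labels <- c(sign_label(10), sign_label(0), sign_label(-1))
```

Returning the label leaves it to the caller to decide whether to print it, and stop() lets calling code detect bad input with tryCatch().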
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions [1] introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met
[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
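One caveat about the check in this prototype: class() can return more than one class name, in which case class(df) != "data.frame" is a logical vector longer than one, which if() does not handle well (as far as I know, recent R versions treat this as an error). A sketch of a more robust check follows; check_df() and the "my_df" class are hypothetical names used only for illustration.

```r
# Hypothetical helper: a stricter argument check
check_df <- function(df) {
  # is.data.frame() returns a single TRUE/FALSE even for multi-class objects
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# An object with two classes, the first of which is not "data.frame"
multi <- iris
class(multi) <- c("my_df", "data.frame")

# The comparison used in the prototype yields two values for this object
n_conditions <- length(class(multi) != "data.frame")

ok_plain <- check_df(iris)
ok_multi <- check_df(multi)
```

inherits(df, "data.frame") would serve equally well as the test here.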
The S3 object class system
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
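When an object carries several classes, inherits() is a convenient way to test for one of them, and unclass() recovers the underlying data. This is a small sketch using base R functions that the slides themselves do not mention.

```r
x <- 1:10
class(x) <- c("A", "B")

# inherits() asks whether any of the object's classes matches
is_a <- inherits(x, "A")
is_b <- inherits(x, "B")
is_c <- inherits(x, "C")

# unclass() strips the class attribute and returns the underlying vector
y <- unclass(x)
```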
Function methods
- Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
- Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
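With only a matrix method defined, calling disp() on anything else stops with a "no applicable method" error. A common remedy is a default method. The sketch below restates the generic and the matrix method so that it runs on its own; disp.default() is an assumed addition, not part of the original slides.

```r
# Generic, as in the example above
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices: round before printing
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Assumed fallback (not in the original slides): used when no
# class-specific method exists
disp.default <- function(x, ...) {
  print(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 1.011), ncol = 2)
out_matrix <- capture.output(disp(m))         # dispatches to disp.matrix
out_default <- capture.output(disp("hello"))  # falls back to disp.default
```

The methods here accept ... so that their signatures match the generic, which is the usual convention for S3 methods.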
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class "statsum"
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
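One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). A short sketch:

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                    # FALSE: the two doubles differ in the last bits
close <- isTRUE(all.equal(a, b))   # TRUE: equal within the default tolerance

# all.equal() describes the difference instead of returning FALSE,
# so wrap it in isTRUE() before using it as a condition
report <- all.equal(1, 1.1)
far <- isTRUE(report)
```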
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
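Applied to the vector from this slide, the two solutions look like this (a minimal sketch):

```r
x <- c(1:10, NA, 12:20)

mean_all <- mean(x)                 # NA, because one value is unknown
mean_obs <- mean(x, na.rm = TRUE)   # mean of the 19 observed values

n_missing <- sum(is.na(x))          # count the missing values
x_obs <- x[!is.na(x)]               # keep only the observed values
```

Note that x == NA would not work as a test for missingness, since every comparison with NA is itself NA.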
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
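The "most general type" follows a fixed order: logical, then integer, then double, then character. The sketch below checks that order with class(), and shows that converting back from character is explicit and produces NA (with a warning, suppressed here) for values that cannot be parsed.

```r
# logical < integer < double < character
cls_int <- class(c(TRUE, 1L))   # "integer"
cls_dbl <- class(c(1L, 2.5))    # "numeric"
cls_chr <- class(c(1, "a"))     # "character"

# Converting back is explicit; unparseable values become NA
nums <- suppressWarnings(as.numeric(c("1", "2", "a")))
```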
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
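The practical consequence of the two signatures is that mean() only averages its first argument, so the values need to be combined with c() first, while sum() accepts them either way. A minimal sketch:

```r
vals <- c(1, 2, 3, 4, 5)

# mean() averages its first argument only; wrap the values in c()
m_vec <- mean(vals)

# sum() adds up everything passed through ...
s_dots <- sum(1, 2, 3, 4, 5)
s_vec <- sum(vals)
```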
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
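An equivalent way to recover the numbers is to convert the levels once and index them by the factor's codes. The sketch below compares the three conversions on the factor from this slide.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.numeric(x)                  # internal integer codes
values <- as.numeric(as.character(x))   # the labels, converted to numbers

# Alternative: convert each level once, then index by the codes
values2 <- as.numeric(levels(x))[x]
```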
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
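When the number of results is known in advance, the list can be created at its final length before the loop starts. The following sketch shows that variant alongside an lapply() call that builds the same list; the variable names n and Result2 are chosen here for illustration.

```r
n <- 5
Result <- vector("list", n)   # pre-allocated list of n NULL elements
for (i in seq_len(n)) {       # seq_len(0) is empty, so the loop is skipped when n is 0
  Result[[i]] <- 1:i
}

# lapply() builds the same list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
```

seq_len(n) is used instead of 1:n because 1:0 counts down (1, 0) rather than producing an empty sequence.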
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
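In keeping with the earlier word of caution about overusing loops, the three steps of this exercise can also be written without explicit loops. This is a sketch of one alternative using vapply(), not the workshop's prototype; the names iris_num and iris_means are chosen here for illustration.

```r
# Class of every column, as a named character vector
classes <- vapply(iris, function(col) class(col)[1], character(1))

# Keep the numeric columns; drop = FALSE guards the single-column case
iris_num <- iris[, classes == "numeric", drop = FALSE]

# Mean of each numeric column, as a named numeric vector
iris_means <- vapply(iris_num, mean, numeric(1))
```

vapply() is like sapply() but checks that every result has the declared type and length, which makes the shape of the output predictable.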
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Writing functions
Debugging basics
Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact
Introduction To Programming In RLast updated November 20 2013 33
71
Writing functions
Writing functions summary
Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging
Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value
browser sets a break pointdebug turns on the debugging flag of a function so you can step
through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went
wrong)
Introduction To Programming In RLast updated November 20 2013 34
71
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class
print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these gotchas are common problems in other languages; many are unique to R
- We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
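One practical wrinkle, sketched here as an illustration that is not part of the original slides: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE() inside a condition. The helper name nearlyEqual is hypothetical.

```r
# all.equal() returns TRUE or a character string describing the difference,
# so wrap it in isTRUE() when a single logical value is needed
nearlyEqual <- function(a, b) {   # hypothetical helper name
  isTRUE(all.equal(a, b))
}

nearlyEqual(.1, .3/3)   # TRUE: the difference is within the default tolerance
nearlyEqual(.1, .2)     # FALSE
```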
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
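The two solutions can be sketched as follows. This is a minimal illustration rather than code from the slides; the variable name x.complete is an assumption made for the example.

```r
x <- c(1:10, NA, 12:20)

# na.rm = TRUE drops missing values before the calculation
mean(x, na.rm = TRUE)
sd(x, na.rm = TRUE)

# is.na() tests each element; == cannot be used for this
sum(is.na(x))                # number of missing values
x.complete <- x[!is.na(x)]   # keep only the observed values
length(x.complete)
```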
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect. I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
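A hedged sketch of the usual way around this, not taken from the slides: combine the values into a single vector with c() first, so that both functions receive the data as one argument. The variable name v is an assumption for the example.

```r
v <- c(1, 2, 3, 4, 5)   # one vector, passed as the first argument

mean(v)   # 3
sum(v)    # 15

# without c(), mean() treats only the first value as data;
# the remaining values are matched to other arguments
mean(1, 2, 3, 4, 5)   # 1
```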
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
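An equivalent conversion, shown here as a sketch that is not part of the original slides, converts the levels once and then indexes them by the factor's integer codes. It gives the same result as as.numeric(as.character(x)).

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

# convert the two levels to numbers, then look them up by integer code
as.numeric(levels(x))[x]       # 5 5 6 6

# same result as the conversion via character
as.numeric(as.character(x))    # 5 5 6 6
```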
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you: tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
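Two common variations on this pattern are sketched below; neither appears in the original slides. The first creates the list at its final length before the loop instead of growing it, and the second builds the same list with lapply(). The names n and Result2 are assumptions made for the example.

```r
n <- 5

# variation 1: create the list at its final length, then fill it in
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# variation 2: let lapply() build the list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

identical(Result, Result2)   # both approaches give the same list
```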
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
Writing functions
Writing functions summary
Key points
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file etc.
- functions can be stepped through to facilitate debugging
Functions introduced in this section
- function: defines a new function
- return: used inside a function definition to set the return value
- browser: sets a break point
- debug: turns on the debugging flag of a function so you can step through it
- undebug: turns off the debugging flag
- traceback: shows the error stack (call after an error to see what went wrong)
Exercise 2
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
Exercise 2 prototype
1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}
statsum(iris)
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
Control flow
Control flow
- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input
Control flow examples
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative
Need to do something different if x equals zero!
Control flow examples
Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive
We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)
Control flow examples
Do something reasonable if x is not numeric
> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
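The version above prints a message for bad input and then carries on. As a hedged alternative that is not from the slides, the check can signal a real error with stop(), which the caller can catch with tryCatch(). The function name signOf is hypothetical, and it returns a label instead of printing.

```r
# hypothetical variant: signal an error instead of printing a message
signOf <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signOf(10)
signOf(0)

# the caller decides what to do with the error; here we keep its message
tryCatch(signOf("a"), error = function(e) conditionMessage(e))
```

Returning a value rather than calling cat() makes the result usable in later calculations, at the cost of no longer printing a sentence.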
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things
Functions[1] introduced in this section
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met
[1] Technically if and else are not functions, but this need not concern us at the moment
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)
statsum(iris)
2. Insert a break point with browser() and step through your function
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
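When an object has several classes, comparing class(x) to a single string gives one result per class. The sketch below, which is not part of the original slides, uses inherits() to ask whether an object has a given class, and unclass() to strip the class attribute again. The variable name y is an assumption for the example.

```r
x <- 1:10
class(x) <- c("A", "B")

class(x) == "B"      # FALSE TRUE: one comparison per class
inherits(x, "B")     # TRUE: a single answer
inherits(x, "foo")   # FALSE

y <- unclass(x)      # remove the class attribute
class(y)             # back to "integer"
```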
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
Writing functions
Exercise 2
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
Introduction To Programming In RLast updated November 20 2013 35
71
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Control flow examples
Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) {    # define function "isPositive"
+   if (x > 0) {                 # if x is greater than zero, then...
+     cat(x, "is positive \n")   # say so!
+   } else if (x == 0) {         # otherwise if x is zero
+     cat(x, "is zero \n")       # say so!
+   } else {                     # otherwise...
+     cat(x, "is negative \n")   # say x is negative
+   }
+ }                              # end function definition
>

Test the new function

> isPositive(0)    # test the isPositive() function
0 is zero
> isPositive("a")  # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)
Control flow examples
Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) {    # define function "isPositive"
+   if (!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {          # if x is greater than zero, then...
+     cat(x, "is positive \n")   # say so!
+   } else if (x == 0) {         # otherwise if x is zero
+     cat(x, "is zero \n")       # say so!
+   } else {                     # otherwise...
+     cat(x, "is negative \n")   # say x is negative
+   }
+ }                              # end function definition
>
> isPositive("a")  # test the isPositive() function on character
x must be a numeric vector of length one!
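The example above prints a message when the argument is invalid. A minimal sketch of the alternative, signalling an error with stop(), might look like the following; the function name checkSign() is hypothetical and not part of the workshop materials.

```r
# Hypothetical variant of isPositive(): signal an error with stop()
# instead of printing a message with cat(), and return the result
# as a character string.
checkSign <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)
checkSign(0)

# tryCatch() lets us inspect the error message without halting the script
tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
```

Raising an error stops the calling code from continuing with bad input, which is usually what argument checking is meant to achieve.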
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions(1) introduced in this section:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  # is.data.frame() also works for objects with more than one class
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(1:10)   # error: df must be a data.frame!
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  browser()     # execution pauses here when the function is called
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}
statsum(iris)
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
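A small sketch of how to query an object that carries several classes. It uses only base R functions (inherits(), is.integer(), unclass()) and shows that setting the class attribute does not change how the data are stored.

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")        # TRUE
inherits(x, "B")        # TRUE
inherits(x, "integer")  # FALSE: "integer" is no longer in the class vector
is.integer(x)           # TRUE: the underlying storage type is unchanged

# unclass() strips the class attribute and returns the plain vector
class(unclass(x))       # "integer"
```

Testing with inherits() rather than comparing class(x) to a string keeps the check valid when an object has more than one class.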
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) {           # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))   # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n")           # otherwise say x not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
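As written, disp() has a method only for matrices, so calling it on any other object fails with a "no applicable method" error. One common way to handle this is a default method. The sketch below assumes that is the desired behavior; disp.default() is a hypothetical addition, not part of the original slides.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Hypothetical fallback: used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class:", class(x)[1], "\n")
  print(x)
}

disp(matrix(c(1.234, 5.678, 9.1011, 12.1314), ncol = 2))  # uses disp.matrix
disp("hello")                                             # falls back to disp.default
```

UseMethod() tries each element of the class vector in order and falls back to the default method only if none of them has a matching method.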
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))   # prepend the new class
  return(R)
}
str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
  invisible(x)                         # print methods conventionally return x invisibly
}
statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
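One detail worth knowing when using all.equal() inside a condition: when the values differ it returns a character description of the difference rather than FALSE. A minimal sketch of the usual idiom, wrapping the call in isTRUE(), follows; the variable names a and b are illustrative only.

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # FALSE: the two doubles differ in the last bits
isTRUE(all.equal(a, b))    # TRUE: equal within a small tolerance

# wrap all.equal() in isTRUE() before using it as a condition
if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal within tolerance\n")
}

isTRUE(all.equal(1, 1.1))  # FALSE
```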
Missing values
R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
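A brief sketch of both solutions, applied to the same vector x used above.

```r
x <- c(1:10, NA, 12:20)

mean(x)                # NA
mean(x, na.rm = TRUE)  # mean of the 19 non-missing values
sum(is.na(x))          # 1: number of missing values
which(is.na(x))        # 11: position of the missing value

# x == NA never works as a test; it returns NA for every element
all(is.na(x == NA))    # TRUE
```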
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
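A short sketch of what happens in the comparison above, and one way to get the warning the slide asks for. The helper safeGreater() is hypothetical, shown only to illustrate an explicit type check.

```r
# In 1 > "a" the number is converted to the string "1",
# so the comparison is made between two character values
class(c(1, "a"))               # "character"
identical(1 > "a", "1" > "a")  # TRUE

# Converting explicitly makes the problem visible:
# as.numeric("a") yields NA and emits a warning (suppressed here)
res <- suppressWarnings(as.numeric("a"))
is.na(res)                     # TRUE

# Hypothetical guard for code that expects numeric input
safeGreater <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y))
  x > y
}
safeGreater(2, 1)              # TRUE
```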
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
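The difference comes from the argument lists shown above: mean() treats only its first argument as data, while sum() adds up everything passed through the dots. A minimal sketch of the safe pattern, combining the values into one vector first:

```r
# mean(x, ...) uses only its first argument as the data; the remaining
# values are matched to other arguments or absorbed by '...'
mean(1, 2, 3, 4, 5)     # 1
mean(c(1, 2, 3, 4, 5))  # 3

# sum(..., na.rm = FALSE) adds up everything passed through '...'
sum(1, 2, 3, 4, 5)      # 15
sum(c(1, 2, 3, 4, 5))   # 15
```

Passing a single vector works for both functions, so it is the safer habit.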
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
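An equivalent conversion, suggested in R's own help page for factor, converts the levels once and then indexes them by the factor's integer codes. A small sketch using the same x:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                # 2 2 1 1: the internal integer codes
as.numeric(as.character(x))  # 5 5 6 6
as.numeric(levels(x))[x]     # 5 5 6 6: convert the levels, then index by code
```

The last form converts only the distinct levels rather than every element, which can matter for long factors.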
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you. Tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again
- A for loop runs the code a fixed number of times, or on a fixed set of objects
- A while loop runs the code until a condition is met
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {               # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")  # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15)      # allows reproducible sample() results
> dice <- seq(1,6)  # set dice = [1 2 3 4 5 6]
> roll <- 0         # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1)  # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")           # print the result
+ }                 # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list()       # create an object to store the results
> for (i in 1:5) {       # for each i in [1, 5]
+   Result[[i]] <- 1:i   # assign the sequence 1 to i to Result
+ }
> Result                 # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
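Two common variations on the pattern above, sketched here as alternatives rather than as the workshop's own solution: creating the list at its final length before the loop, and using lapply() to build the same list without an explicit loop. The names n and Result2 are illustrative.

```r
n <- 5
Result <- vector("list", n)  # a list of n NULL elements, created up front
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
identical(Result, Result2)   # TRUE
```

Creating the list at its final size avoids growing it one element at a time, and seq_len(n) behaves sensibly when n is zero.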
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                   # create vector x
> for(i in 1:5) x[i] <- i+i  # double a vector using a loop
> print(x)                   # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                  # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                    # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {               # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")  # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
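For comparison, the same three steps can be written without explicit loops using sapply(), as in the earlier statsum() prototypes. This is a sketch of an alternative, not a replacement for the exercise, which is specifically about practicing loops.

```r
classes <- sapply(iris, class)            # step 1: class of each column
iris.num <- iris[, classes == "numeric"]  # step 2: keep the numeric columns
iris.means <- sapply(iris.num, mean)      # step 3: mean of each numeric column
iris.means

# colMeans() is another option for an all-numeric data frame
all.equal(iris.means, colMeans(iris.num))  # TRUE
```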
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Writing functions
Exercise 2 prototype
1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)
statsum(iris)
2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable
statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 36
71
Control flow
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 37
71
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Looping: for-loop examples
For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {  # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")  # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15)  # allows reproducible sample() results
> dice <- seq(1,6)  # set dice = [1 2 3 4 5 6]
> roll <- 0  # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1)  # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")  # print the result
+ }  # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
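The same simulation can also be written with repeat and break, which makes the stopping rule explicit and lets us count the rolls. This is only a sketch: the names n_rolls and the use of sample(dice, 2, replace = TRUE) are choices made for this illustration, not part of the workshop code.

```r
# Sketch: roll two dice until they sum to 12, counting the attempts
set.seed(15)                 # reproducible sample() results
dice <- 1:6
n_rolls <- 0                 # hypothetical counter, not in the original example
repeat {
  roll <- sum(sample(dice, 2, replace = TRUE))  # two independent dice
  n_rolls <- n_rolls + 1
  if (roll == 12) break      # leave the loop once we roll two sixes
}
cat("It took", n_rolls, "rolls to get two sixes\n")
```

The exact number of rolls depends on the seed and on the R version's sampling algorithm, so no specific count is claimed here.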
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list()  # create an object to store the results
> for (i in 1:5) {  # for each i in [1, 5]
+   Result[[i]] <- 1:i  # assign the sequence 1 to i to Result
+ }
> Result  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
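When the number of iterations is known in advance, it is usually a little tidier to create the result object at its final size first. A minimal sketch, assuming the same task as above (the variable n is introduced here for illustration):

```r
# Sketch: pre-allocate the list, then fill it by index
n <- 5
Result <- vector("list", n)      # a list with n empty (NULL) slots
for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)      # the sequence 1 to i
}
lengths(Result)                  # length of each element: 1 2 3 4 5
```

seq_len(n) is used instead of 1:n because 1:0 would count down (1, 0) rather than produce an empty loop.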
Word of caution: don't overuse loops!
Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()  # create vector x
> for(i in 1:5) x[i] <- i+i  # double a vector using a loop
> print(x)  # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5  # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5  # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5,5)) {  # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n")  # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9" 
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
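The list-filling loop shown earlier can also be replaced by the apply family covered in the section on applying functions to list elements. The following is a sketch rather than the workshop's own code; the names Result and squares are chosen for illustration.

```r
# Sketch: loop-free versions of two earlier loop examples
Result <- lapply(1:5, seq_len)   # list whose i-th element is 1:i

# vapply() also checks that every result is a single number
squares <- vapply(-5:5, function(num) num^2, numeric(1))
squares
```

vapply() is a stricter relative of sapply(): the third argument declares the expected shape of each result, so a surprise in the data fails early instead of silently changing the return type.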
Exercise 5
1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
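For comparison, here is a loop-free sketch of the same three steps. It is not part of the exercise (which asks for loops); it simply shows how the vectorized tools from earlier sections apply to the same task.

```r
# Sketch: the three exercise steps without explicit loops
classes <- sapply(iris, class)               # class of each column
iris.num <- iris[, classes == "numeric"]     # keep the numeric columns
iris.means <- colMeans(iris.num)             # mean of each numeric column
round(iris.means, 3)
```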
Control flow
Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if() and else statements, which can be nested if necessary.
Especially useful for checking function arguments and performing different operations depending on function input.
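One detail worth keeping in mind: if() decides on a single TRUE or FALSE. When the same decision has to be made for every element of a vector, the vectorized ifelse() function is the usual tool. A brief sketch, with the vector x and the labels chosen purely for illustration:

```r
# Sketch: element-wise branching with ifelse()
x <- c(-2, 0, 3)
signs <- ifelse(x > 0, "positive",
                ifelse(x == 0, "zero", "negative"))
signs   # "negative" "zero" "positive"
```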
Control flow examples
Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) {  # define function "isPositive"
+   if (x > 0) {  # if x is greater than zero, then
+     cat(x, "is positive \n")  # say so!
+   } else {  # otherwise
+     cat(x, "is negative \n")  # say x is negative
+   }
+ }  # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) {  # define function "isPositive"
+   if (x > 0) {  # if x is greater than zero, then
+     cat(x, "is positive \n")  # say so!
+   } else if (x == 0) {  # otherwise if x is zero
+     cat(x, "is zero \n")  # say so!
+   } else {  # otherwise
+     cat(x, "is negative \n")  # say x is negative
+   }
+ }  # end function definition
>

Test the new function

> isPositive(0)  # test the isPositive() function
0 is zero
> isPositive("a")  # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)
Do something reasonable if x is not numeric

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) {  # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {  # if x is greater than zero, then
+     cat(x, "is positive \n")  # say so!
+   } else if (x == 0) {  # otherwise if x is zero
+     cat(x, "is zero \n")  # say so!
+   } else {  # otherwise
+     cat(x, "is negative \n")  # say x is negative
+   }
+ }  # end function definition
>
> isPositive("a")  # test the isPositive() function on character
x must be a numeric vector of length one!
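A stricter variant is sketched below. It is not the workshop's function: the name checkSign, the use of stop() instead of cat(), and the extra checks for empty and missing input are all assumptions made for illustration.

```r
# Sketch: argument checking that signals an error instead of printing
checkSign <- function(x) {
  # `||` stops at the first TRUE, so is.na(x) only runs on a single number
  if (!is.numeric(x) || length(x) != 1 || is.na(x)) {
    stop("x must be a single non-missing number")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)
checkSign(0)
# tryCatch() lets us look at the error message without halting the script
tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
```

Returning a value rather than printing it makes the function easier to reuse, and stop() makes bad input impossible to overlook.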
Control flow summary
Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions introduced in this section [1]:
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3
1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
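Comparing class(df) to a string works for plain data frames, but class() can return more than one value, in which case the comparison no longer yields a single TRUE or FALSE. A sketch of a more defensive check using is.data.frame() follows; the function name statsumChecked and the named return list are illustrative choices, not the workshop prototype.

```r
# Sketch: the same summary function with a sturdier argument check
statsumChecked <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  # take only the first class of each column, in case a column has several
  classes <- sapply(df, function(col) class(col)[1])
  means <- sapply(df[classes == "numeric"], mean)
  counts <- lapply(df[classes == "factor"], table)
  list(means = means, counts = counts)
}

res <- statsumChecked(iris)
res$means
```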
The S3 object class system
R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
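Because an object can carry several classes, testing membership with inherits() is generally safer than comparing class(x) to a single string. A short sketch using the same object as above:

```r
# Sketch: testing class membership and removing the class attribute
x <- 1:10
class(x) <- c("A", "B")
inherits(x, "A")       # TRUE
inherits(x, "B")       # TRUE
inherits(x, "C")       # FALSE
class(unclass(x))      # "integer": the underlying data is unchanged
```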
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct 
[5] mean.POSIXlt 
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"      
[5] "as.matrix.data.frame"     "by.data.frame"           
[7] "cbind.data.frame"         "[<-.data.frame"          
[9] "[.data.frame"            
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) {  # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x)))  # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")  # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
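A common variation is sketched below. It differs from the slide in two ways that are assumptions of this sketch: the method accepts `...` so its signature matches the generic mean(x, ...), and it hands the calculation on with NextMethod() instead of calling mean.default() by name.

```r
# Sketch: a mean() method for class "foo" that delegates with NextMethod()
mean.foo <- function(x, ...) {
  if (!is.numeric(x)) stop("x is not numeric")
  avg <- NextMethod()              # falls through to mean.default()
  cat("The average is", avg, "\n")
  invisible(avg)                   # return the value without printing it
}

x <- structure(1:10, class = "foo")
m <- mean(x)                       # prints: The average is 5.5
```

Accepting `...` means arguments such as na.rm = TRUE are passed through to the default method unchanged.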
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
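As defined above, disp() fails for any object that is not a matrix, because no method matches. Adding a default method gives the generic a fallback. The following sketch assumes that falling back to print() is acceptable; disp.default and its message are illustrative additions.

```r
# Sketch: a generic with a class-specific method and a default fallback
disp <- function(x, ...) UseMethod("disp")

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))      # matrices are shown with two decimals
}

disp.default <- function(x, ...) {
  cat("No disp() method for class", class(x)[1], "- using print()\n")
  print(x)
}

disp(matrix(c(0.123, 0.456, 0.789, 1.011), ncol = 2))
disp("hello")
```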
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class
Exercise 4 prototype
1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas
There are an unfortunately large number of surprises in R programming.
Some of these "gotchas" are common problems in other languages; many are unique to R.
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
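One caveat when using all.equal() inside if(): when the values differ it returns a description of the difference rather than FALSE, so it is normally wrapped in isTRUE(). A minimal sketch (the variable names a and b are chosen for illustration):

```r
# Sketch: safe floating point comparison inside a condition
a <- 0.1
b <- 0.3 / 3
a == b                        # FALSE: exact comparison fails
isTRUE(all.equal(a, b))       # TRUE: equal within a small tolerance
isTRUE(all.equal(1, 1.1))     # FALSE: all.equal() alone would return a message

if (isTRUE(all.equal(a, b))) cat("a and b are equal, up to rounding error\n")
```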
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
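Both solutions can be sketched on the vector from the example above:

```r
# Sketch: working around missing values
x <- c(1:10, NA, 12:20)
mean(x, na.rm = TRUE)     # drop the NA before averaging
sum(is.na(x))             # how many values are missing: 1
which(is.na(x))           # where they are: position 11
mean(x[!is.na(x)])        # same result as na.rm = TRUE
```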
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"    
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
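One way to catch this kind of mistake is to check the type explicitly and convert on purpose. In the sketch below, suppressWarnings() is used only to keep the illustration quiet; in real code the warning from as.numeric() is a useful signal.

```r
# Sketch: inspecting and explicitly converting a coerced vector
x <- c(TRUE, FALSE, 1, 2, "a", "b")
typeof(x)                                  # "character"
nums <- suppressWarnings(as.numeric(x))    # values that are not numbers become NA
nums                                       # NA NA 1 2 NA NA
```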
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...) 
NULL
> args(sum)
function (..., na.rm = FALSE) 
NULL

Ouch. That is not nice at all!
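The difference comes down to which argument holds the data: mean() averages only its first argument x, while sum() adds up everything passed through `...`. Wrapping the values in c() makes both behave the same way, as this short sketch shows:

```r
# Sketch: pass the data as one vector to avoid the inconsistency
mean(1, 2, 3, 4, 5)        # 1: only the first argument is treated as data
mean(c(1, 2, 3, 4, 5))     # 3
sum(1, 2, 3, 4, 5)         # 15
sum(c(1, 2, 3, 4, 5))      # 15
```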
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
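An equivalent conversion works on the levels instead of on every element: convert the (few) level labels to numbers once, then index them by the factor's integer codes. A sketch using the same factor:

```r
# Sketch: recovering the numeric values stored in a factor
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))
as.integer(x)                 # internal codes: 2 2 1 1
levels(x)                     # labels: "6" "5"
as.numeric(levels(x))[x]      # original values: 5 5 6 6
```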
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Control flow
Control flow
Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input
Introduction To Programming In RLast updated November 20 2013 38
71
Control flow
Control flow examples
Goal write a function that tells us if a number is positive or negative
gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative
Need to do something different if x equals zero
Introduction To Programming In RLast updated November 20 2013 39
71
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
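A small sketch of how this comparison might be used inside a condition. all.equal() returns TRUE when the values agree within a tolerance, but returns a character description (not FALSE) when they differ, so it is normally wrapped in isTRUE(). The explicit tolerance in the last line is an arbitrary value chosen for illustration.

```r
a <- 0.1
b <- 0.3 / 3

a == b                      # exact comparison: FALSE with IEEE 754 doubles
isTRUE(all.equal(a, b))     # comparison within a small tolerance: TRUE

# on a mismatch all.equal() returns a description rather than FALSE,
# so isTRUE() is needed before using the result as a condition
isTRUE(all.equal(1, 1.1))   # FALSE

# a hand-rolled check with an explicit (illustrative) tolerance
abs(a - b) < 1e-8           # TRUE
```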
Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
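The two solutions named above can be sketched as follows, reusing the same vector x. The variable name x_complete is introduced only for this example.

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # mean of the 19 observed values

sum(is.na(x))           # 1: number of missing values
which(is.na(x))         # 11: position of the missing value

# is.na() is the right test; x == NA would yield NA for every element
x_complete <- x[!is.na(x)]
length(x_complete)      # 19
```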
Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect. I would like to at least get a warning!
Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
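A short sketch of the practical consequence: mean() takes its data in the single argument x, so values should be combined into one vector first. In mean.default(x, trim = 0, na.rm = FALSE, ...) the second and third positional arguments are matched to trim and na.rm, which is why the extra numbers are not averaged. The variable name values is just for this example.

```r
values <- c(1, 2, 3, 4, 5)

mean(values)          # 3: all five values are the data
sum(values)           # 15
sum(1, 2, 3, 4, 5)    # 15: sum() adds everything passed through ...

# only the first argument is treated as data here,
# so the result is the mean of the single value 1
mean(1, 2, 3, 4, 5)   # 1
```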
Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
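An alternative way to recover the original numbers, shown here as a sketch: convert the levels once and then index them by the factor, which uses the factor's integer codes as positions. It gives the same result as as.numeric(as.character(x)).

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                 # 2 2 1 1: the internal integer codes
as.numeric(as.character(x))   # 5 5 6 6: labels converted element by element

# convert the two levels once, then look them up by integer code
as.numeric(levels(x))[x]      # 5 5 6 6
```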
Additional resources

Additional reading and resources

- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback

Help Us Make This Workshop Better!

- Please take a moment to fill out a very short feedback form
- These workshops exist for you: tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)

Looping

- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
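When the number of results is known in advance, a common refinement is to create the list at its final length before the loop instead of growing it. The sketch below does that with vector("list", n) and also shows an lapply() call that builds the same list; n and Result2 are names chosen for this example.

```r
n <- 5
Result <- vector("list", n)   # a list with n empty (NULL) slots
for (i in seq_len(n)) {       # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}
lengths(Result)               # 1 2 3 4 5

# the same list built without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
identical(Result, Result2)    # TRUE
```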
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5, 5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
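For comparison with the loop-based prototype, the same three steps can be written without explicit loops. This is only a sketch of the vectorized style recommended earlier; num_cols and iris_means are names introduced for the example.

```r
# steps 1 and 2: find the numeric columns
num_cols <- sapply(iris, is.numeric)   # named logical vector, one entry per column

# step 3: mean of each numeric column
iris_means <- sapply(iris[num_cols], mean)
iris_means

# colMeans() computes the same values for an all-numeric data frame
colMeans(iris[num_cols])
```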
Control flow
Control flow examples

Goal: write a function that tells us if a number is positive or negative.

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n") } # say x is negative
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!
Add a condition to handle x = 0:

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n") } # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!).
Do something reasonable if x is not numeric:

> # add condition to handle the case that x is not numeric of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one \n") }
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise, if x is zero
+     cat(x, "is zero \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n") } # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one
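A variation on the argument check, offered as a sketch rather than as the workshop's version: instead of printing a message with cat(), the function below signals an error with stop() and returns its answer as a string, so that callers can act on the result. The function name checkSign is hypothetical, and the || operator is used so that the second test is only evaluated when the first is FALSE.

```r
# hypothetical variant of isPositive() that returns a value or signals an error
checkSign <- function(x) {
  # argument checking: reject anything that is not a single number
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)   # "positive"
checkSign(0)    # "zero"
checkSign(-1)   # "negative"

# an error can be caught by the caller, which a printed message cannot
result <- tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
result          # "x must be a numeric vector of length one"
```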
Control flow summary

Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section:
cat    concatenates and prints R objects
if     execute code only if condition is met
else   used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment.
Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame.
2. Insert a break point with browser() and step through your function.
Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system

- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system.
- Basic idea: functions have different methods for different types of objects.
Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
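Because class() can return several values, testing class membership with == can give a vector of results. A brief sketch of the usual alternative, inherits(), together with unclass() to strip the class attribute again (the variable y is introduced only for this example):

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")     # TRUE
inherits(x, "B")     # TRUE: any position in the class vector counts
inherits(x, "foo")   # FALSE

class(x) == "A"      # TRUE FALSE: one result per class, awkward inside if()

y <- unclass(x)      # remove the class attribute
class(y)             # "integer"
```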
Function methods

- Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
- Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
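With only a matrix method defined, calling disp() on any other kind of object fails with a "no applicable method" error. A sketch of one way to handle that is to add a default method, which UseMethod() falls back to when no class-specific method is found. The message text in disp.default is invented for this example, and the methods here accept ... to match the generic.

```r
# generic and matrix method, as in the example above
disp <- function(x, ...) UseMethod("disp")
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# fallback used when no method matches the class of x
disp.default <- function(x, ...) {
  cat("No specific disp() method for class", class(x)[1], "\n")
  print(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 1.011), ncol = 2)
disp(m)         # dispatches to disp.matrix(): values rounded to 2 digits
disp("hello")   # dispatches to disp.default()
```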
S3 classes summary

Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot        creates a graphical display, the type of which depends on the class of the object being plotted
methods     lists the methods defined for a function or class
UseMethod   the body of a generic function
invisible   returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Introduction To Programming In RLast updated November 20 2013 67
71
Loops (supplimental)
Using loops to fill in lists
Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing
gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1
[[2]][1] 1 2
[[3]][1] 1 2 3
[[4]][1] 1 2 3 4
[[5]][1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 68
71
Loops (supplimental)
Word of caution donrsquot overuse loops
Most operations in R are vectorized ndash This makes loops unnecessary in manycases
Use vector arithmatic instead of loops
gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10
Use paste instead of loops
gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25
Loops are handy but save them for when you really need themIntroduction To Programming In R
Last updated November 20 2013 69 71
Loops (supplimental)
Exercise 5
1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data
Introduction To Programming In RLast updated November 20 2013 70
71
Loops (supplimental)
Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set
gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species
numeric numeric numeric numeric factor
1 use the results from step 1 to select the numeric columns
gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)
SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02
1 use a loop to calculate the mean of each numeric column in the iris data
gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth
5843333 3057333 3758000 1199333
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Control flow
Control flow examples
Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt
Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive
We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)
Introduction To Programming In RLast updated November 20 2013 40
71
Control flow
Control flow examples
Do something reasonable if x is not numeric
gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one
Introduction To Programming In RLast updated November 20 2013 41
71
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key points
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
- plot: creates a graphical display, the type of which depends on the class of the object being plotted
- methods: lists the methods defined for a function or class
- UseMethod: the body of a generic function
- invisible: returns an object but does not print it
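When an object has more than one class, dispatch tries the classes in order and then falls back to the default method. The sketch below illustrates this with a made-up generic describe() and made-up classes "child" and "parent"; NextMethod() is the base R function that hands control to the next method in line.

```r
# hypothetical generic and methods, for illustration only
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "an object"
describe.parent <- function(x, ...) {
  rest <- NextMethod()           # here: describe.default
  paste("a parent, which is", rest)
}
describe.child <- function(x, ...) {
  rest <- NextMethod()           # here: describe.parent
  paste("a child, which is", rest)
}

obj <- structure(list(), class = c("child", "parent"))
describe(obj)  # child method first, then parent, then default
```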
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these gotchas are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
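One detail worth knowing when using all.equal() inside a condition: when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch (the variable names a and b are arbitrary):

```r
a <- 0.1
b <- 0.3 / 3

a == b                   # exact comparison is unreliable for doubles
isTRUE(all.equal(a, b))  # compares within a small numerical tolerance

# all.equal() returns a character string, not FALSE, when values differ,
# so wrap it in isTRUE() before using it in if()
if (isTRUE(all.equal(a, b))) {
  message("a and b are equal up to numerical tolerance")
}
```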
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
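Both solutions can be sketched on the same vector; the name x_complete is only an illustrative choice:

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)       # drop the NA before averaging
sum(is.na(x))               # count the missing values
which(is.na(x))             # positions of the missing values
x_complete <- x[!is.na(x)]  # keep only the observed values
```

Note that x == NA would not work as a test for missingness, since any comparison with NA is itself NA.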
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
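If you want an error instead of a silent conversion, you can check types yourself before comparing. The helper below is a hypothetical sketch (safe_greater is not a base R function):

```r
# hypothetical helper: compare only when both inputs are numeric
safe_greater <- function(a, b) {
  if (!is.numeric(a) || !is.numeric(b)) {
    stop("both arguments must be numeric")
  }
  a > b
}

safe_greater(2, 1)
res <- try(safe_greater(1, "a"), silent = TRUE)  # an error, not a silent FALSE
inherits(res, "try-error")
```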
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
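The practical consequence is that mean() takes its data from the first argument only; further positional values are matched to other arguments of mean.default() (trim and na.rm) or absorbed by the dots. A short sketch of the safer habit of always passing one vector:

```r
v <- c(1, 2, 3, 4, 5)

mean(v)              # the data go in the first argument, as one vector
sum(v)               # sum() also accepts a single vector
mean(1, 2, 3, 4, 5)  # only the 1 is treated as data here
```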
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
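An equivalent idiom, mentioned in the help page for factor, converts the levels once and then indexes them by the factor's integer codes. A brief sketch on the same factor:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                # internal level codes, not the values
as.numeric(as.character(x))  # original values, via the labels
as.numeric(levels(x))[x]     # same result, converting each level only once
```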
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form
- These workshops exist for you; tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i ## assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
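Growing a list one element at a time works, but when the number of iterations is known in advance it is common to pre-allocate the container. A minimal sketch of that variant (the variable n is introduced here only for illustration):

```r
n <- 5
Result <- vector("list", n)  # pre-allocate a list of length n
for (i in seq_len(n)) {      # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)  # store the sequence 1 to i
}

length(Result)
Result[[3]]
```

Using seq_len(n) rather than 1:n avoids the surprise that 1:0 counts down through 1 and 0 instead of producing an empty sequence.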
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared = ", (1:5)^2)
[1] "1 squared =  1"  "2 squared =  4"  "3 squared =  9"
[4] "4 squared =  16" "5 squared =  25"
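To reproduce the one-line-per-number output of the earlier for loop without writing a loop, the vectorized pieces can be combined as in this sketch (the names nums, squares and msg are arbitrary):

```r
nums <- -5:5
squares <- nums^2                          # vectorized: no loop needed
msg <- paste(nums, "squared is", squares)  # one string per element
cat(msg, sep = "\n")                       # print one message per line
```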
Loops are handy, but save them for when you really need them!
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
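For comparison, the same three steps can be written without explicit loops. This is only a sketch of one loop-free alternative, not part of the exercise solution; the names num_cols and iris_means are illustrative:

```r
num_cols <- sapply(iris, is.numeric)        # logical: which columns are numeric
iris_means <- sapply(iris[num_cols], mean)  # mean of each numeric column
iris_means

# colMeans() is a built-in shortcut for the same calculation
colMeans(iris[num_cols])
```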
Control flow
Control flow examples
Do something reasonable if x is not numeric

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative! \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
Control flow summary
Key points
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section
- cat: concatenates and prints R objects
- if: execute code only if condition is met
- else: used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
The S3 object class system
- R has two major object systems:
  - Relatively informal "S3" classes
  - Stricter, more formal "S4" classes
- We will cover only the S3 system, not the S4 system
- Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
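Because class() can return several values, testing membership with inherits() is usually more robust than comparing class(x) to a single string. A short sketch using base R functions only:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "B")  # is "B" among the classes of x?
inherits(x, "C")  # "C" is not
y <- unclass(x)   # strip the class attribute again
class(y)          # back to the implicit class of an integer vector
```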
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
Control flow
Control flow summary
Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings
Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met
else used with if code to execute if condition is not met
1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R
Last updated November 20 2013 42 71
Control flow
Exercise 3
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
2 Insert a break point with browser() and step through your function
Introduction To Programming In RLast updated November 20 2013 43
71
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
Things that may surprise you
Gotchas

There are an unfortunately large number of surprises in R programming.
Some of these gotchas are common problems in other languages; many are unique to R.
We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
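One detail worth knowing when using all.equal() in a condition: for values that differ it does not return FALSE but a character string describing the difference. A minimal sketch of the usual pattern, wrapping the call in isTRUE() (the variable names a and b are illustrative only):

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison of two doubles
isTRUE(all.equal(a, b))    # comparison within a small tolerance

# For unequal values all.equal() returns a description, not FALSE
res <- all.equal(1, 1.1)
is.character(res)

# so inside if() it should be wrapped in isTRUE()
verdict <- if (isTRUE(all.equal(1, 1.1))) "same" else "different"
verdict
```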
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
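As a short sketch of both solutions, using the same vector as above (the name x_complete is illustrative only):

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # mean of the 19 observed values

sum(is.na(x))           # how many values are missing
which(is.na(x))         # where the missing value sits

# NA == NA is NA, so filter with is.na() rather than ==
x_complete <- x[!is.na(x)]
length(x_complete)
```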
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
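One way to catch this kind of silent coercion is to check the type explicitly before computing. The sketch below is a suggestion rather than part of the workshop material. It shows that converting the character vector back to numeric produces NA for every element that is not a number.

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")
class(x)                  # everything was coerced to character

# Converting back: strings that are not numbers become NA
# (R normally warns here; the warning is suppressed to keep output short)
num <- suppressWarnings(as.numeric(x))
num

# An explicit check makes the problem visible instead of silent
ok <- is.numeric(x)
ok
```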
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
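The difference comes from the argument lists shown above: mean() takes the data as its single argument x, while sum() collects all of its unnamed arguments through `...`. A minimal sketch of the safe habit, which is to combine the values into one vector first:

```r
# mean() treats only the first argument as data;
# the remaining values are matched to other arguments
mean(1, 2, 3, 4, 5)

# Passing a single vector gives the intended result
mean(c(1, 2, 3, 4, 5))

# sum() adds up everything passed through `...`,
# so both forms agree
sum(1, 2, 3, 4, 5)
sum(c(1, 2, 3, 4, 5))
```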
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
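An equivalent conversion, sketched here as an alternative to the one above, converts the levels once and then indexes them by the factor. The variable names are illustrative only.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.integer(x)                   # internal level codes
via_char <- as.numeric(as.character(x))  # original values, via character
via_levels <- as.numeric(levels(x))[x]   # original values, converting each level once

codes
via_char
via_levels
```

Indexing by a factor uses its integer codes, which is why the third form recovers the original values.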
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
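When the number of results is known in advance, the container can be created at its final length instead of being grown inside the loop. The sketch below shows that variant next to an lapply() call that builds the same list. The names n and Result2 are illustrative, and pre-allocation is a suggestion rather than something the slides require.

```r
n <- 5

# Pre-allocate a list of length n, then fill it by position
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# lapply() builds the same list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

same <- identical(Result, Result2)
same
```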
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5,5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
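To make the equivalence concrete, the sketch below builds the same messages once with a loop and once with a single vectorized paste() call, then checks that the two results match. The variable names are illustrative only.

```r
nums <- -5:5

# Loop version: fill a pre-sized character vector one element at a time
msg_loop <- character(length(nums))
for (i in seq_along(nums)) {
  msg_loop[i] <- paste(nums[i], "squared is", nums[i]^2)
}

# Vectorized version: paste() works on whole vectors at once
msg_vec <- paste(nums, "squared is", nums^2)

same <- identical(msg_loop, msg_vec)
same
```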
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
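For comparison, the same three steps can be written without explicit loops. This is a sketch using sapply(), in line with the earlier caution about overusing loops. It reuses the variable names from the prototype but is not the workshop's own solution.

```r
data(iris)

# Step 1: class of each column
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris.num <- iris[, classes == "numeric"]

# Step 3: mean of each numeric column
iris.means <- sapply(iris.num, mean)
iris.means
```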
Control flow
Exercise 3
1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function
Exercise 3 prototype
1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
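The argument check can also be exercised without halting a script. The sketch below is a variation on the prototype, not the workshop's own code. It swaps in is.data.frame() for the class comparison, which also accepts data frames that carry additional classes, and wraps the failing call in tryCatch() to capture the error message.

```r
statsum <- function(df) {
  # is.data.frame() is used here in place of class(df) != "data.frame"
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means, counts)
}

# tryCatch() returns the error message instead of stopping execution
msg <- tryCatch(statsum(1:10), error = function(e) conditionMessage(e))
msg

# A valid data.frame passes the check
res <- statsum(iris)
length(res)
```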
The S3 object class system
R has two major object systems:
  Relatively informal "S3" classes
  Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
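With more than one class, the order of the class vector matters: method dispatch tries the classes from left to right before falling back to a default method. A small sketch, in which the generic describe() and its methods are hypothetical names invented for this illustration:

```r
x <- 1:10
class(x) <- c("A", "B")

# inherits() tests membership in the class vector
inherits(x, "A")
inherits(x, "B")
inherits(x, "foo")

# Hypothetical generic with a method for "B" and a default, but none for "A"
describe <- function(obj, ...) UseMethod("describe")
describe.B <- function(obj, ...) "method for B"
describe.default <- function(obj, ...) "default method"

describe(x)   # no describe.A exists, so dispatch moves on to describe.B
describe(1)   # a plain number falls through to describe.default
```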
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Introduction To Programming In RLast updated November 20 2013 67
71
Loops (supplimental)
Using loops to fill in lists
Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing
gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1
[[2]][1] 1 2
[[3]][1] 1 2 3
[[4]][1] 1 2 3 4
[[5]][1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 68
71
Loops (supplimental)
Word of caution donrsquot overuse loops
Most operations in R are vectorized ndash This makes loops unnecessary in manycases
Use vector arithmatic instead of loops
gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10
Use paste instead of loops
gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25
Loops are handy but save them for when you really need themIntroduction To Programming In R
Last updated November 20 2013 69 71
Loops (supplimental)
Exercise 5
1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data
Introduction To Programming In RLast updated November 20 2013 70
71
Loops (supplimental)
Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set
gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species
numeric numeric numeric numeric factor
1 use the results from step 1 to select the numeric columns
gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)
SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02
1 use a loop to calculate the mean of each numeric column in the iris data
gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth
5843333 3057333 3758000 1199333
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
Control flow
Exercise 3 prototype
1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe
statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(110)statsum(iris)
2 Insert a break point with browser() and step through your functionstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)browser()
classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 44
71
The S3 object class system
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 45
71
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
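One caveat when using all.equal() inside a condition: when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch; the helper name nearly_equal is hypothetical and not part of the workshop materials.

```r
# Hypothetical helper (not from the workshop): TRUE when two numbers agree
# up to all.equal()'s default tolerance, FALSE otherwise.
nearly_equal <- function(a, b) {
  isTRUE(all.equal(a, b))
}

nearly_equal(.1, .3/3)   # TRUE: the values differ only by rounding error
nearly_equal(.1, .2)     # FALSE: all.equal() returns a message here, not FALSE
```

Using isTRUE() keeps an if() statement from failing on the character value that all.equal() returns for unequal inputs.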
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
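The two solutions named above can be sketched as follows. The variable names (mean_all, mean_known, and so on) are illustrative only.

```r
x <- c(1:10, NA, 12:20)               # 20 elements, one of them missing

mean_all   <- mean(x)                 # NA: the missing value propagates
mean_known <- mean(x, na.rm = TRUE)   # mean of the 19 observed values
n_missing  <- sum(is.na(x))           # count missing values with is.na()
x_complete <- x[!is.na(x)]            # or drop the missing values explicitly
```

Note that x == NA would not work as a test for missingness, because every comparison with NA is itself NA.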
Automatic type conversion
Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
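One way to make such mistakes visible is to convert explicitly. A small sketch, assuming the same vector as in the example above: elements that cannot be parsed as numbers become NA, and R issues a warning when that happens.

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")   # everything is coerced to character

# Explicit conversion exposes the elements that are not numbers.
# suppressWarnings() is used here only to keep the example quiet;
# in real code the coercion warning is the useful signal.
nums <- suppressWarnings(as.numeric(x))
nums          # NA NA 1 2 NA NA
is.na(nums)   # flags the elements that did not convert
```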
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL
Ouch. That is not nice at all!
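The argument lists explain the difference: mean() takes its data from the single argument x, so in mean(1, 2, 3, 4, 5) only the 1 is treated as data and the remaining values are passed on as other arguments, whereas sum() adds up everything supplied through the dots. A short sketch of the safe pattern, which is to pass the data as one vector (the names m and s are illustrative):

```r
m <- mean(c(1, 2, 3, 4, 5))   # 3: all five values are the data
s <- sum(1, 2, 3, 4, 5)       # 15: sum() accepts the values separately
s == sum(c(1, 2, 3, 4, 5))    # TRUE: a single vector works for sum() too
```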
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
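An equivalent idiom, mentioned in R's own help page for factor, converts the levels once and then indexes them by the factor's integer codes. A brief sketch using the same factor (the name vals is illustrative):

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                      # 2 2 1 1: the internal integer codes
vals <- as.numeric(levels(x))[x]   # convert the levels, then index by code
vals                               # 5 5 6 6
```

Both approaches give the same numbers; this one converts each level only once, which can matter for long factors with few levels.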
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help us make this workshop better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code for as long as its condition is true, and stops once the condition becomes false.
If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square
> # For-loop example
> for (num in seq(-5,5)) {  # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")  # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes
> # While-loop example: rolling dice
> set.seed(15)  # allows reproducible sample() results
> dice <- seq(1,6)  # set dice = [1 2 3 4 5 6]
> roll <- 0  # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1)  # calculate sum of two rolls
+   cat("We rolled a ", roll, "\n")  # print the result
+ }  # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
The rolls shown are those from the original slide. The same seed can give a different sequence in other R versions, because R 3.6.0 changed the default algorithm used by sample().
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing
> # save calculations done in a loop
> Result <- list()  # create an object to store the results
> for (i in 1:5) {  # for each i in [1, 5]
+   Result[[i]] <- 1:i  # assign the sequence 1 to i to Result
+ }
> Result  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
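When the number of results is known in advance, a common variation is to create the list at its final length before the loop instead of growing it one element at a time. The sketch below assumes the same task as above; the names n and Result2 are illustrative.

```r
n <- 5
Result <- vector("list", n)    # preallocate a list of length n
for (i in seq_len(n)) {        # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)    # the sequence 1 to i
}

# The same result without an explicit loop:
Result2 <- lapply(seq_len(n), seq_len)
identical(Result, Result2)     # TRUE
```

For five elements the difference is negligible; preallocation matters mainly for long loops, where repeatedly growing an object can become slow.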
Word of caution: don't overuse loops!
Most operations in R are vectorized, which makes loops unnecessary in many cases
Use vector arithmetic instead of loops:
> x <- c()  # create vector x
> for(i in 1:5) x[i] <- i+i  # double a vector using a loop
> print(x)  # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5  # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5  # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5,5)) {  # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")  # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris.num[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
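For comparison, the same three steps can be written without explicit loops using sapply(), the function used in the Exercise 4 prototype. This is a sketch of one alternative, not the workshop's official solution.

```r
data(iris)

classes    <- sapply(iris, class)            # step 1: class of every column
iris.num   <- iris[, classes == "numeric"]   # step 2: keep the numeric columns
iris.means <- sapply(iris.num, mean)         # step 3: mean of each numeric column
iris.means
```

The loop and sapply() versions produce the same named vector of means; which one reads better is largely a matter of taste for a data set this small.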
The S3 object class system
R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects
Object class
The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
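Because an object can carry several classes, comparing class(x) to a single string returns one result per class. A short sketch of testing class membership with inherits() instead, using the same object as above:

```r
x <- 1:10
class(x) <- c("A", "B")

class(x) == "A"      # TRUE FALSE: one comparison per class
inherits(x, "A")     # TRUE
inherits(x, "B")     # TRUE
inherits(x, "foo")   # FALSE
y <- unclass(x)      # the underlying integer vector, class attribute removed
```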
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class
> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) {  # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))  # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")  # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
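A generic defined this way has no fallback, so calling disp() on anything other than a matrix stops with an error. The sketch below adds a default method; disp.default is an addition for illustration and does not appear in the original slides.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback used when no class-specific method exists
# (illustrative addition, not part of the original slide).
disp.default <- function(x, ...) {
  cat("No disp method for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 5.678), ncol = 2))   # dispatches to disp.matrix
disp("hello")                             # dispatches to disp.default
```

UseMethod() tries each entry of class(x) in turn and falls back to the method named default only when none of them has a matching method.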
S3 classes summary
Key points:
there are several class systems in R, of which S3 is the oldest and simplest
objects have class and functions have corresponding methods
the class of an object can be set by simple assignment
S3 generic functions written in R call UseMethod("x") in the body, where x is the name of the function
new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class
Functions introduced in this section:
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: forms the body of a generic function and dispatches to the method that matches the class of the argument
invisible: returns an object but does not print it
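The key points can be put together in one small example: set a class by assignment, then define a method by naming a function generic.class. The class name temperature and its constructor are hypothetical, chosen only for illustration.

```r
# Hypothetical constructor: returns a list carrying the class "temperature".
temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# A print method, named generic.class, for the existing generic print().
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees Celsius\n")
  invisible(x)   # return the object without printing it again
}

t1 <- temperature(21.5)
t1               # auto-printing dispatches to print.temperature
```

Typing the object's name at the prompt calls print(), which finds print.temperature through the class attribute; no change to print() itself is needed.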
The S3 object class system
The S3 object class system
R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects
Introduction To Programming In RLast updated November 20 2013 46
71
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo
Objects are not limited to a single class and can have many classes
gt class(x) lt- c(A B)gt class(x)[1] A B
Introduction To Programming In RLast updated November 20 2013 47
71
The S3 object class system
Function methods
Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic
gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe
Introduction To Programming In RLast updated November 20 2013 48
71
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15)                                   # allows reproducible sample() results
> dice <- seq(1, 6)                              # set dice = [1 2 3 4 5 6]
> roll <- 0                                      # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1)    # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")               # print the result
+ }                                              # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> # save calculations done in a loop
> Result <- list()          # create an object to store the results
> for (i in 1:5) {          # for each i in [1, 5]
+   Result[[i]] <- 1:i      # assign the sequence 1 to i to Result
+ }
> Result                    # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
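When the number of iterations is known in advance, it can help to create the result list at its full length before the loop starts, instead of growing it one element at a time. The following is only a minimal sketch of that idea; the variable n is introduced here for illustration and is not part of the workshop code.

```r
n <- 5                           # number of iterations (illustrative value)
Result <- vector("list", n)      # preallocate a list with n empty (NULL) slots
for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i             # fill slot i with the sequence 1 to i
}
length(Result)                   # number of stored results
Result[[3]]                      # the third stored sequence
```

Using seq_len(n) rather than 1:n avoids the surprise that 1:0 counts down (1, 0) instead of producing an empty sequence.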
Word of caution: don't overuse loops!
Most operations in R are vectorized, which makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:

> x <- c()                         # create vector x
> for (i in 1:5) x[i] <- i + i     # double a vector using a loop
> print(x)                         # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                        # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                          # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) {                # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")    # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
Exercise 5
1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
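For comparison, the same three steps can be written without explicit loops. This is a sketch of one possible alternative using sapply(), not the workshop's own solution; the names iris.num and iris.means simply mirror the prototype above.

```r
data(iris)                                   # built-in data set

classes <- sapply(iris, class)               # class of every column, as a named vector
iris.num <- iris[classes == "numeric"]       # keep only the numeric columns
iris.means <- sapply(iris.num, mean)         # mean of each numeric column

classes
iris.means
```

sapply() loops over the columns internally and collects the results into a named vector, so no empty container has to be created and filled by hand.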
The S3 object class system
Object class
The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
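Because an object can carry several classes, comparing class(x) to a single string gives one result per class. A small sketch of how inherits() can be used to test class membership instead (the class names "A", "B", and "C" are just placeholders):

```r
x <- 1:10
class(x) <- c("A", "B")   # x now has two classes

inherits(x, "A")          # TRUE: "A" is one of the classes of x
inherits(x, "B")          # TRUE
inherits(x, "C")          # FALSE: x has no class "C"

class(x) == "B"           # element-wise comparison, one result per class
unclass(x)                # the underlying integer vector with the class removed
```

inherits() always returns a single TRUE or FALSE here, which makes it the safer choice inside if() conditions.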
Function methods
Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
Creating new function methods
To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) {                      # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))         # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")                 # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
Creating generic functions
S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
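As written, disp() only knows what to do with matrices; calling it on anything else stops with an error because no applicable method is found. One common way to handle this is to add a default method, which UseMethod() falls back to when no class-specific method exists. The sketch below assumes the disp() generic from the slide; the disp.default body is an illustrative choice, not part of the workshop materials.

```r
# generic and matrix method, as in the example above
disp <- function(x, ...) UseMethod("disp")
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# fallback used when no class-specific method is found (illustrative)
disp.default <- function(x, ...) {
  cat("No disp method for class '", class(x)[1], "', using print()\n", sep = "")
  print(x)
  invisible(x)                       # return x without printing it a second time
}

disp(matrix(c(1.234, 5.678), ncol = 2))   # dispatches to disp.matrix
disp("hello")                             # dispatches to disp.default
```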
S3 classes summary
Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions written in R contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: dispatches to the appropriate method; it forms the body of a generic function
invisible: returns an object but does not print it
Exercise 4
1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class
Exercise 4 prototype
1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)        # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))                 # prepend the new class
  return(R)
}
str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages; many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
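One caveat when using all.equal() in a condition: when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). The sketch below illustrates this; the helper nearly_equal() and its tolerance are hypothetical, added only to show the underlying idea of comparing within a tolerance.

```r
a <- 0.1
b <- 0.3 / 3

a == b                        # FALSE: exact comparison of floating point values
isTRUE(all.equal(a, b))       # TRUE: equal within a small tolerance
all.equal(a, 0.2)             # a character string describing the difference
isTRUE(all.equal(a, 0.2))     # FALSE: safe to use inside if()

# hypothetical helper: compare two numbers within an explicit tolerance
nearly_equal <- function(x, y, tol = 1e-8) abs(x - y) < tol
nearly_equal(a, b)
```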
Missing values
R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
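A short sketch of both solutions applied to the vector from the example above:

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)      # drop the missing value before averaging
median(x, na.rm = TRUE)

is.na(x)                   # TRUE where a value is missing
sum(is.na(x))              # how many values are missing
which(is.na(x))            # position(s) of the missing values
x[!is.na(x)]               # the vector with missing values removed
```

Note that testing with x == NA does not work here, since every comparison with NA is itself NA; is.na() is the reliable test.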
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
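The difference comes from the argument lists shown above: sum() collects all of its unnamed arguments through ..., while mean() treats only its first argument x as data, so the remaining numbers are passed on as other arguments rather than being averaged. A minimal sketch of the usual way to avoid the problem, combining the values into one vector first:

```r
mean(1, 2, 3, 4, 5)        # only the first argument is treated as data
mean(c(1, 2, 3, 4, 5))     # combine the values with c() to average all of them
sum(1, 2, 3, 4, 5)         # sum() adds up everything passed through ...
sum(c(1, 2, 3, 4, 5))      # also works with a single vector
```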
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
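An equivalent idiom converts the levels once and then indexes them by the factor, which relies on the same integer codes that as.numeric(x) exposes. This is a sketch of that alternative, using the factor from the example above:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                  # the internal integer codes, not the values
as.numeric(as.character(x))    # convert the labels back to numbers
as.numeric(levels(x))[x]       # convert the levels once, then index by the codes
```

Both conversions give the original values; the second form only converts each distinct level once, which can matter for long factors with few levels.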
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
The S3 object class system
Creating new function methods
To create a new method for a function that is already generic all you have todo is name your function functionclass
gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric
Introduction To Programming In RLast updated November 20 2013 49
71
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
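The prototype tests class(df) directly, which only works when the object has exactly one class. Objects that extend data.frame carry several classes, and a length-two condition in if() is an error in R 4.2 and later. The sketch below shows one possible, more defensive variant. The name statsum2 and the use of inherits() and vapply() are choices made here, not the workshop's solution.

```r
# A defensive variant of the exercise prototype (hypothetical name: statsum2)
statsum2 <- function(df) {
  # inherits() works even when the object has several classes
  if (!inherits(df, "data.frame")) stop("df must be a data.frame")

  is_num <- vapply(df, is.numeric, logical(1))  # TRUE for numeric columns
  is_fac <- vapply(df, is.factor, logical(1))   # TRUE for factor columns

  means  <- vapply(df[is_num], mean, numeric(1))
  sds    <- vapply(df[is_num], sd, numeric(1))
  counts <- lapply(df[is_fac], table)

  structure(list(cbind(means, sds), counts),
            class = c("statsum", "list"))
}

res <- statsum2(iris)
class(res)
res[[1]]   # matrix of means and standard deviations
```

vapply() is used instead of sapply() because it checks that every result has the declared type and length.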
Things that may surprise you
Gotcha's

- There are an unfortunately large number of surprises in R programming
- Some of these "gotcha's" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
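One detail worth knowing when using all.equal() inside an if() statement: when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A short sketch follows; the variable names a and b are arbitrary.

```r
a <- 0.1
b <- 0.3 / 3

a == b                      # exact comparison: FALSE
isTRUE(all.equal(a, b))     # comparison with tolerance: TRUE

# When the values really differ, all.equal() returns a message, not FALSE
all.equal(a, 0.2)
isTRUE(all.equal(a, 0.2))   # safe to use in if(): FALSE

# An explicit tolerance is another option
abs(a - b) < 1e-8
```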
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
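The two solutions named above can be sketched as follows, using the same vector x as in the example.

```r
x <- c(1:10, NA, 12:20)

# Drop missing values inside the calculation
mean(x, na.rm = TRUE)
sd(x, na.rm = TRUE)

# Test for missing values with is.na(), never with ==
is.na(x)[11]      # TRUE: the 11th element is missing
sum(is.na(x))     # number of missing values
x[!is.na(x)]      # the non-missing values only
```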
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
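Since R does not warn about these conversions, one option is to check types explicitly before relying on them. The sketch below is only an illustration: safe_gt is a hypothetical helper name, not a base R function.

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")
typeof(x)                                 # everything became character

# Converting back shows which elements are not numbers (they become NA)
nums <- suppressWarnings(as.numeric(x))
x[is.na(nums)]

# Hypothetical helper: refuse to compare unless both sides are numeric
safe_gt <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a > b
}

safe_gt(2, 1)        # TRUE
# safe_gt(1, "a")    # would stop with an error instead of returning FALSE
```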
Optional argument inconsistencies
Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
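The difference comes from the argument lists: mean() treats only its first argument, x, as data, while sum() adds up everything passed through "...". Combining the values into a single vector with c() makes both functions behave as expected, as this short sketch shows.

```r
vals <- c(1, 2, 3, 4, 5)

mean(vals)            # 3: the whole vector is the data
sum(vals)             # 15

mean(1, 2, 3, 4, 5)   # 1: only the first argument is used as data
sum(1, 2, 3, 4, 5)    # 15: all arguments are summed
```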
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
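An equivalent idiom, suggested in R's help page for factor, converts the levels once and then indexes them by the factor's integer codes. A minimal sketch using the same factor:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                 # the internal codes: 2 2 1 1
as.numeric(as.character(x))   # the values: 5 5 6 6
as.numeric(levels(x))[x]      # same values, converting each level only once
```

For long factors with few levels the second form does less work, because only the levels are converted from character to numeric.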
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help us make this workshop better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
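A while loop runs until its condition becomes false, so it is often worth counting iterations and adding an upper limit in case that never happens. The sketch below is a variation on the dice example: the counter n_rolls and the cap max_rolls are additions made here for illustration, and the exact sequence of rolls depends on the R version, because the default sampling algorithm changed in R 3.6.0.

```r
set.seed(15)             # reproducible within one R version
dice <- 1:6
roll <- 0
n_rolls <- 0             # how many times we have rolled
max_rolls <- 1000        # safety cap (an assumption, not in the slides)

while (roll < 12 && n_rolls < max_rolls) {
  roll <- sum(sample(dice, 2, replace = TRUE))  # roll two dice at once
  n_rolls <- n_rolls + 1
}

cat("Stopped after", n_rolls, "rolls; last roll was", roll, "\n")
```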
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
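When the number of iterations is known in advance, the result object can be created at its final size before the loop starts, which avoids growing it one element at a time. This is a sketch of that variation, not the workshop's code; n is an arbitrary example value.

```r
n <- 5
Result <- vector("list", n)      # a list with n empty (NULL) slots

for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0
  Result[[i]] <- seq_len(i)      # store the sequence 1..i in slot i
}

lengths(Result)                  # number of elements stored in each slot
```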
Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
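To confirm that a loop and its vectorized replacement really compute the same thing, the two results can be compared directly. A small sketch using the squares from the earlier for-loop example; the variable names are arbitrary.

```r
nums <- seq(-5, 5)

# Loop version: fill a pre-sized numeric vector element by element
squares_loop <- numeric(length(nums))
for (i in seq_along(nums)) {
  squares_loop[i] <- nums[i]^2
}

# Vectorized version: one expression, no loop
squares_vec <- nums^2

identical(squares_loop, squares_vec)   # both approaches agree
paste(nums, "squared is", squares_vec) # one string per element
```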
Exercise 5
1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
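For comparison, the same three steps can be written without explicit loops using sapply(), which applies a function to every column of a data.frame. This is a sketch of an alternative, not part of the exercise solution. The names use underscores only to keep them distinct from the loop versions above.

```r
# Step 1: class of each column
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris_num <- iris[classes == "numeric"]

# Step 3: mean of each numeric column
iris_means <- sapply(iris_num, mean)

classes
iris_means
```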
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
The S3 object class system
Creating generic functions
S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function
gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))
[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066
Introduction To Programming In RLast updated November 20 2013 50
71
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Introduction To Programming In RLast updated November 20 2013 67
71
Loops (supplimental)
Using loops to fill in lists
Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing
gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1
[[2]][1] 1 2
[[3]][1] 1 2 3
[[4]][1] 1 2 3 4
[[5]][1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 68
71
Loops (supplimental)
Word of caution donrsquot overuse loops
Most operations in R are vectorized ndash This makes loops unnecessary in manycases
Use vector arithmatic instead of loops
gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10
Use paste instead of loops
gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25
Loops are handy but save them for when you really need themIntroduction To Programming In R
Last updated November 20 2013 69 71
Loops (supplimental)
Exercise 5
1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data
Introduction To Programming In RLast updated November 20 2013 70
71
Loops (supplimental)
Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set
gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species
numeric numeric numeric numeric factor
1 use the results from step 1 to select the numeric columns
gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)
SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02
1 use a loop to calculate the mean of each numeric column in the iris data
gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth
5843333 3057333 3758000 1199333
Introduction To Programming In RLast updated November 20 2013 71
71
- Workshop overview and materials
- Data types
- Extracting and replacing object elements
- Applying functions to list elements
- Writing functions
- Control flow
- The S3 object class system
- Things that may surprise you
- Additional resources
- Loops (supplimental)
-
The S3 object class system
S3 classes summary
Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class
Functions introduced in this sectionplot creates a graphical display the type of which depends on the
class of the object being plottedmethods lists the methods defined for a function or class
UseMethod the body of a generic functioninvisible returns an object but does not print it
Introduction To Programming In RLast updated November 20 2013 51
71
The S3 object class system
Exercise 4
1 Modify your function so that it also returns the standard deviations ofthe numeric variables
2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class
Introduction To Programming In RLast updated November 20 2013 52
71
The S3 object class system
Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of
the numeric variablesstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))
statsum(iris)
2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)
if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)
str(statsum(iris))
3 [3] Write a print method for the statsum classprintstatsum lt- function(x)
cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])
statsum(iris)
Introduction To Programming In RLast updated November 20 2013 53
71
Things that may surprise you
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 54
71
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Introduction To Programming In RLast updated November 20 2013 67
71
Loops (supplimental)
Using loops to fill in lists
Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing
gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1
[[2]][1] 1 2
[[3]][1] 1 2 3
[[4]][1] 1 2 3 4
[[5]][1] 1 2 3 4 5
Introduction To Programming In RLast updated November 20 2013 68
71
Loops (supplimental)
Word of caution donrsquot overuse loops
Most operations in R are vectorized ndash This makes loops unnecessary in manycases
Use vector arithmatic instead of loops
gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10
Use paste instead of loops
gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25
Loops are handy but save them for when you really need themIntroduction To Programming In R
Last updated November 20 2013 69 71
Loops (supplimental)
Exercise 5
1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data
Introduction To Programming In RLast updated November 20 2013 70
71
Loops (supplimental)
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set:
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 
2. Use the results from step 1 to select the numeric columns:
> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data:
> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
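In keeping with the earlier word of caution, the three steps of this exercise can also be written without explicit loops. The following is a minimal sketch, not the workshop's own solution; the object names iris_num and iris_means are hypothetical, and it assumes every column has a single class, which holds for the built-in iris data.

```r
# Loop-free sketch of Exercise 5 using the built-in iris data
classes <- sapply(iris, class)           # step 1: class of each column
iris_num <- iris[classes == "numeric"]   # step 2: keep the numeric columns
iris_means <- sapply(iris_num, mean)     # step 3: mean of each numeric column
print(iris_means)
```

sapply() returns a named vector here, so the result keeps the column names just as the loop version does.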
The S3 object class system
Exercise 4
1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class statsum.
3. Write a print method for the statsum class.
Exercise 4 prototype
1. Modify your function so that it also returns the standard deviations of the numeric variables:
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations (sd, not mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)
2. Modify your function so that it returns a list of class statsum:
statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations (sd, not mean)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R)) # prepend the new class
  return(R)
}
str(statsum(iris))
3. Write a print method for the statsum class:
print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
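To see why defining print.statsum is enough to change how the result is displayed, it may help to look at S3 dispatch on a smaller object. The sketch below uses a hypothetical "temperature" class that is not part of the workshop material; it only illustrates that print() looks for a method named after the object's class attribute.

```r
# Hypothetical "temperature" class, used only to illustrate S3 dispatch
temp <- structure(list(value = 21.5, unit = "C"), class = "temperature")

# print() dispatches to print.temperature() because of the class attribute
print.temperature <- function(x, ...) {
  cat("Temperature:", x$value, x$unit, "\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

print(temp)                                # uses print.temperature()
is_temp <- inherits(temp, "temperature")   # check class membership
out <- capture.output(print(temp))         # capture what the method printed
```

Typing temp at the prompt calls the same method, which is why statsum(iris) prints through print.statsum without an explicit print() call.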
Things that may surprise you
Gotchas
- There are an unfortunately large number of surprises in R programming.
- Some of these gotchas are common problems in other languages; many are unique to R.
- We will only cover a few. For a more comprehensive discussion, please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
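One detail worth knowing when using all.equal() inside if statements: when the values differ, it returns a character description of the difference rather than FALSE. A common pattern, sketched below with hypothetical variable names, is to wrap the call in isTRUE().

```r
x <- 0.1
y <- 0.3 / 3

exact <- x == y                      # exact comparison fails due to rounding error
close <- isTRUE(all.equal(x, y))     # comparison within a small tolerance

# all.equal() returns a character string, not FALSE, when values differ
msg <- all.equal(1, 1.1)
differ <- isTRUE(msg)                # isTRUE() turns that into a plain FALSE

print(c(exact = exact, close = close, differ = differ))
```

Using isTRUE() keeps the condition a single logical value, so it is safe to pass to if().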
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
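The two solutions can be sketched on the same vector. This is an illustrative example rather than part of the original slides; the names m_default, m_narm, n_missing, and x_complete are hypothetical.

```r
x <- c(1:10, NA, 12:20)

m_default <- mean(x)                # NA, because one value is missing
m_narm <- mean(x, na.rm = TRUE)     # drop missing values before averaging

n_missing <- sum(is.na(x))          # count missing values with is.na()
x_complete <- x[!is.na(x)]          # keep only the non-missing values

print(c(m_default, m_narm))
print(n_missing)
```

Note that x == NA does not work as a test for missingness, since any comparison with NA is itself NA; is.na() is the function to use.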
Automatic type conversion
Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes:
> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"    
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect... I would like to at least get a warning!
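One way to get the warning that the silent coercion does not give is to convert explicitly. The sketch below is an assumption about how one might check for this, not something from the original slides; as.numeric() turns strings that are not numbers into NA and warns about it, and the warning is caught here only so the example runs quietly.

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")
x_class <- class(x)                  # everything was coerced to character

# Explicit conversion exposes values that are not really numbers
warned <- FALSE
nums <- withCallingHandlers(
  as.numeric(x),
  warning = function(w) {
    warned <<- TRUE                  # record that R warned about coercion
    invokeRestart("muffleWarning")
  }
)

print(nums)      # non-numeric strings become NA
print(warned)
```

Checking anyNA(nums) after such a conversion is a simple way to detect values that were not what you expected.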
Optional argument inconsistencies
Functions you might expect to work similarly don't always:
> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15
Why are these different?
> args(mean)
function (x, ...) 
NULL
> args(sum)
function (..., na.rm = FALSE) 
NULL
Ouch. That is not nice at all!
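The difference comes from the argument lists: mean() treats only its first argument as data, while sum() adds up everything passed through the dots. A minimal sketch of the safe habit, with the hypothetical name vals, is to combine the values into one vector first.

```r
vals <- c(1, 2, 3, 4, 5)

m_vector <- mean(vals)             # mean of all five values
s_vector <- sum(vals)              # sum of all five values

m_separate <- mean(1, 2, 3, 4, 5)  # only the first argument, 1, is treated as x
s_separate <- sum(1, 2, 3, 4, 5)   # sum() accepts values through ...

print(c(m_vector, m_separate))
print(c(s_vector, s_separate))
```

Passing a single vector works the same way for both functions, so it avoids having to remember which ones accept values through the dots.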
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing:
> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
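The reason as.numeric(x) returns 2 2 1 1 is that a factor stores integer codes that index into its levels. The sketch below, using hypothetical names codes, via_char, and via_levels, shows the codes alongside two ways of recovering the original numbers.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.integer(x)                    # internal codes: positions in levels(x)
via_char <- as.numeric(as.character(x))   # convert labels, then to numbers
via_levels <- as.numeric(levels(x))[x]    # convert each level once, then index

print(codes)
print(via_char)
print(via_levels)
```

The levels-based form converts each distinct level only once, which can matter for long factors with few levels.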
Additional resources
Additional reading and resources
- S3 system overview: https://github.com/hadley/devtools/wiki/S3
- S4 system overview: https://github.com/hadley/devtools/wiki/S4
- R documentation: http://cran.r-project.org/manuals.html
- Collection of R tutorials: http://cran.r-project.org/other-docs.html
- R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
- Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
- State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
- Institute for Quantitative Social Science: http://iq.harvard.edu
- Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
- Please take a moment to fill out a very short feedback form.
- These workshops exist for you: tell us what you need!
- http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
- A loop is a collection of commands that are run over and over again.
- A for loop runs the code a fixed number of times, or on a fixed set of objects.
- A while loop runs the code until a condition is met.
- If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25 
-4 squared is 16 
-3 squared is 9 
-2 squared is 4 
-1 squared is 1 
0 squared is 0 
1 squared is 1 
2 squared is 4 
3 squared is 9 
4 squared is 16 
5 squared is 25 
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes.
> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6 
We rolled a 10 
We rolled a 9 
We rolled a 7 
We rolled a 10 
We rolled a 5 
We rolled a 9 
We rolled a 12 
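Because a while loop runs until its condition is met, it is often useful to count iterations and to guard against a loop that never ends. The sketch below is a variation on the dice example under stated assumptions: the counter n_rolls and the cap max_rolls are hypothetical additions, and the exact number of rolls depends on the R version and seed.

```r
set.seed(15)          # seed chosen only to make the run repeatable
dice <- 1:6
roll <- 0
n_rolls <- 0          # hypothetical counter for the number of attempts
max_rolls <- 10000    # hypothetical safety cap to avoid an endless loop

while (roll < 12 && n_rolls < max_rolls) {
  roll <- sample(dice, 1) + sample(dice, 1)   # sum of two dice
  n_rolls <- n_rolls + 1                      # count this attempt
}

cat("Stopped after", n_rolls, "rolls; last roll was", roll, "\n")
```

The cap is not needed for this simple simulation, but the same pattern helps when the stopping condition depends on data that might never satisfy it.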
Things that may surprise you
Gotcharsquos
There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf
Introduction To Programming In RLast updated November 20 2013 55
71
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact
gt 1 == 33[1] FALSE
Solution use allequal()
gt allequal(1 33)[1] TRUE
Introduction To Programming In RLast updated November 20 2013 56
71
Things that may surprise you
Missing values
R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown
gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA
NA is not equal to anything not even NA
gt NA == NA[1] NA
Solutions use narm = TRUE option when calculating and isna to test formissing
Introduction To Programming In RLast updated November 20 2013 57
71
Things that may surprise you
Automatic type conversion
Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes
gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b
gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE
Maybe this is what you expect I would like to at least get a warning
Introduction To Programming In RLast updated November 20 2013 58
71
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC–Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Feedback
Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback
Loops (supplemental)
Looping
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!
Looping: for-loop examples
For each value in a vector, print the number and its square:
> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
Looping: while-loop example
Goal: simulate rolling two dice until we roll two sixes.
> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
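A while loop keeps running until its condition fails, so it can be worth adding a cap on the number of iterations in case the condition is never met. The sketch below is a variation on the dice example, not the workshop's code; max_rolls and n_rolls are names introduced here for illustration.

```r
set.seed(15)       # reproducible sample() results
max_rolls <- 1000  # illustrative safety cap on the number of iterations
n_rolls <- 0       # how many times we have rolled so far
roll <- 0

# Stop when we roll two sixes, or when the cap is reached.
while (roll < 12 && n_rolls < max_rolls) {
  roll <- sum(sample(1:6, 2, replace = TRUE))  # roll two dice and add them
  n_rolls <- n_rolls + 1
}
```

After the loop, n_rolls records how many rolls were needed, and comparing it with max_rolls tells you whether the loop stopped because of the cap.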
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
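When the number of results is known in advance, a common variation is to create the list at its final length before the loop, or to let lapply() build the list for you. This is a sketch of those two alternatives rather than the workshop's own code; the object names are illustrative.

```r
n <- 5

# Option 1: pre-allocate a list of length n, then fill it by index.
result_loop <- vector("list", n)
for (i in seq_len(n)) {
  result_loop[[i]] <- seq_len(i)  # the sequence 1 to i
}

# Option 2: lapply() applies a function to each element and returns a list.
result_apply <- lapply(seq_len(n), seq_len)
```

Both objects hold the same five sequences as Result in the example above.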
Word of caution: don't overuse loops
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for (i in 1:5) x[i] <- i + i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> # Earlier we said
> # for (num in seq(-5, 5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
Exercise 5
1 Use a loop to get the class() of each column in the iris data set
2 Use the results from step 1 to select the numeric columns
3 Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1 Use a loop to get the class() of each column in the iris data set
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2 Use the results from step 1 to select the numeric columns
> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3 Use a loop to calculate the mean of each numeric column in the iris data
> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
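In keeping with the earlier caution about overusing loops, the same three steps can be sketched without explicit loops by using sapply(), which applies a function to every column of a data frame. This is an alternative shown for comparison, not the exercise's intended solution; the object names are illustrative.

```r
data(iris)

# Step 1: class of each column, as a named character vector.
classes <- sapply(iris, class)

# Step 2: keep only the columns whose class is "numeric".
iris_num <- iris[, classes == "numeric"]

# Step 3: mean of each numeric column.
iris_means <- sapply(iris_num, mean)
```

The loop versions are still useful for learning how indexing and assignment work; the sapply() calls simply express the same idea more compactly.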
Things that may surprise you
Floating point comparison
Floating point arithmetic is not exact:
> .1 == .3/3
[1] FALSE
Solution: use all.equal():
> all.equal(.1, .3/3)
[1] TRUE
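One detail worth knowing when using all.equal() inside a condition: when the values differ, it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch, with illustrative variable names:

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                  # exact comparison of the stored values
approx <- isTRUE(all.equal(a, b)) # comparison within a small tolerance

# For clearly different values, all.equal() returns a message, not FALSE,
# so isTRUE() is what turns the result into a plain TRUE/FALSE.
different <- isTRUE(all.equal(1, 1.1))
```

Using isTRUE(all.equal(...)) inside if() avoids an error when the two values are not nearly equal.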
Missing values
R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.
> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA
NA is not equal to anything, not even NA:
> NA == NA
[1] NA
Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
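The two solutions can be sketched as follows, using the same x as in the example. The variable names are only for illustration.

```r
x <- c(1:10, NA, 12:20)

m_default <- mean(x)                # NA: the missing value propagates
m_clean   <- mean(x, na.rm = TRUE)  # drop missing values before averaging

n_missing <- sum(is.na(x))    # how many values are missing
where     <- which(is.na(x))  # positions of the missing values

# x == NA does not work as a test; it returns NA for every element.
bad_test <- x == NA
```

Because comparisons with NA always give NA, is.na() is the reliable way to find missing values.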
Things that may surprise you
Optional argument inconsistencies
Functions you might expect to work similarly donrsquot always
gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15
Why are these different
gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL
Ouch That is not nice at all
Introduction To Programming In RLast updated November 20 2013 59
71
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers and sometimes as characters whichcan be confusing
gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1
gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6
Introduction To Programming In RLast updated November 20 2013 60
71
Additional resources
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 61
71
Additional resources
Additional reading and resources
S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)
httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)
httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)
httpwwwjstatso|orgv31i01paper
Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc
Introduction To Programming In RLast updated November 20 2013 62
71
Additional resources
Feedback
Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback
Introduction To Programming In RLast updated November 20 2013 63
71
Loops (supplimental)
Topic
1 Workshop overview and materials
2 Data types
3 Extracting and replacing object elements
4 Applying functions to list elements
5 Writing functions
6 Control flow
7 The S3 object class system
8 Things that may surprise you
9 Additional resources
10 Loops (supplimental)
Introduction To Programming In RLast updated November 20 2013 64
71
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list()        # create an object to store the results
> for (i in 1:5) {        # for each i in [1, 5]
+   Result[[i]] <- 1:i    # assign the sequence 1 to i to Result
+ }
> Result                  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
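When the number of results is known in advance, the storage object can be created at its final size before the loop starts. This is a sketch of that variation, with `lapply()` shown as a loop-free equivalent; `n` and `Result2` are names assumed for the illustration.

```r
n <- 5

# Pre-allocate a list with n empty slots, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- seq_len(i)   # the sequence 1 to i
}

# The same list built without an explicit loop
Result2 <- lapply(seq_len(n), seq_len)
```

Pre-allocating avoids growing the list one element at a time, which matters mostly for long loops.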
Word of caution: don't overuse loops
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:

> x <- c()                        # create vector x
> for (i in 1:5) x[i] <- i + i    # double a vector using a loop
> print(x)                        # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                       # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                         # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) {              # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")  # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
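Recycling of shorter vectors deserves a little care. The sketch below, which is an added illustration with made-up names `a` and `msg`, shows that recycling is silent when the longer length is a multiple of the shorter one, and that R issues a warning otherwise.

```r
# Length 6 and length 2: the shorter vector is reused three times, silently
a <- 1:6 + 1:2

# Length 5 and length 2: R still recycles, but signals a warning;
# tryCatch() captures the warning message instead of printing it
msg <- tryCatch(1:5 + 1:2, warning = function(w) conditionMessage(w))
```

Checking that vector lengths match, or that one of them has length 1, is a simple way to avoid relying on recycling by accident.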
Exercise 5
1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
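In keeping with the earlier caution about overusing loops, the three exercise steps can also be written without explicit loops. This is one possible sketch using base R's `vapply()`; it is an alternative offered for comparison, not the workshop's own solution, and `col_classes`, `num_cols`, and `col_means` are hypothetical names.

```r
# Step 1: class of each column (first class only, in case a column has several)
col_classes <- vapply(iris, function(col) class(col)[1], character(1))

# Step 2: keep the numeric columns; drop = FALSE keeps a data.frame
num_cols <- iris[, col_classes == "numeric", drop = FALSE]

# Step 3: mean of each numeric column, returned as a named numeric vector
col_means <- vapply(num_cols, mean, numeric(1))
```

`vapply()` checks that every result has the declared type and length, so a surprising column produces an error rather than a silently different result.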
Things that may surprise you
Trouble with Factors
Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
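The behavior above follows from how a factor is stored: as integer codes that point into a vector of level labels. The sketch below makes that explicit and shows an equivalent conversion that indexes the levels directly; `codes` and `vals` are names assumed for the illustration.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

# The underlying integer codes: position of each value in levels(x)
codes <- as.integer(x)

# Convert the level labels once, then look them up by code
vals <- as.numeric(levels(x))[x]
```

Both this form and `as.numeric(as.character(x))` recover the original numbers; the level-indexing form converts each distinct label only once.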
Additional resources
Additional reading and resources
S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc
Loops (supplimental)
Looping
A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop
Introduction To Programming In RLast updated November 20 2013 65
71
Loops (supplimental)
Looping for-loop examples
For each value in a vector print the number and its square
gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25
Introduction To Programming In RLast updated November 20 2013 66
71
Loops (supplimental)
Looping while-loop example
Goal simulate rolling two dice until we roll two sixes
gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12
Introduction To Programming In RLast updated November 20 2013 67
71
Loops (supplimental)
Using loops to fill in lists
Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.
> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
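When the number of results is known in advance, the container can be created at its final length before the loop starts, rather than grown one element at a time. The sketch below is a variation on the slide's example, not the workshop's own code; the variable n is introduced here for illustration.

```r
n <- 5                        # assumed number of iterations
Result <- vector("list", n)   # preallocate a list with n empty (NULL) slots

# seq_len(n) yields 1, 2, ..., n and yields nothing when n is 0,
# whereas 1:n would count down (1, 0) in that case
for (i in seq_len(n)) {
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

str(Result)                   # compact display of the filled list
```

For five elements the difference is negligible; preallocation mainly matters when the loop runs many times.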
Word of caution: don't overuse loops
Most operations in R are vectorized. This makes loops unnecessary in many cases.
Use vector arithmetic instead of loops:
> x <- c() # create vector x
> for (i in 1:5) x[i] <- i + i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"
Loops are handy, but save them for when you really need them!
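To see that a vectorized expression and an explicit loop give the same answer, the two can be compared directly. This is a small sketch; the names squares_loop, squares_vec and labels are made up for the illustration, and sprintf() is shown as one possible alternative to paste().

```r
x <- 1:5

# Loop version: fill a preallocated numeric vector one element at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the ^ operator works on the whole vector at once
squares_vec <- x^2

# Both approaches should produce the same values
same <- identical(squares_loop, squares_vec)
print(same)

# sprintf() is also vectorized over its arguments
labels <- sprintf("%d squared = %d", x, as.integer(x^2))
print(labels)
```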
Exercise 5
1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.
Exercise 5 prototype
1. Use a loop to get the class() of each column in the iris data set.
> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"
2. Use the results from step 1 to select the numeric columns.
> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3. Use a loop to calculate the mean of each numeric column in the iris data.
> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
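In line with the earlier caution about overusing loops, the three steps can also be written without explicit loops. The following is a sketch of one possible loop-free version using sapply(), not the workshop's official solution; it assumes every column has a single class, which holds for iris but not, for example, for ordered factors.

```r
data(iris)  # built-in data set used throughout the workshop

# Step 1: class of each column; sapply() returns a named character
# vector here because every iris column has exactly one class
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris.num <- iris[, classes == "numeric"]

# Step 3: mean of each numeric column, as a named numeric vector
iris.means <- sapply(iris.num, mean)
print(iris.means)

# colMeans() is another vectorized route to the same numbers
print(all.equal(iris.means, colMeans(iris.num)))
```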