Introduction to R Programming

Introduction To Programming In R. Last updated November 20, 2013. The Institute for Quantitative Social Science at Harvard University.

description

This hands-on R course will guide users through a variety of programming functions in the open-source statistical software program, R. Topics covered include indexing, loops, conditional branching, S3 classes, and debugging. Full workshop materials available from http://projects.iq.harvard.edu/rtc/r-prog

Outline

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplemental)

Workshop overview and materials

Workshop description

This is an intermediate/advanced R course
Appropriate for those with basic knowledge of R

Learning objectives:
- Index data objects by position, name, or logical condition
- Understand looping and branching
- Write your own simple functions
- Debug functions
- Understand and use the S3 object system

Running example

Throughout this workshop we will return to a running example that involves calculating descriptive statistics for every column of a data.frame. We will often use the built-in iris data set. You can load the iris data by evaluating data(iris) at the R prompt.

Our main example today consists of writing a statistical summary function that calculates the min, mean, median, max, sd, and n for all numeric columns in a data.frame, the correlations among these variables, and the counts and proportions for all categorical columns. Typically I will describe a topic and give some generic examples, then ask you to use the technique to start building the summary.

Materials and setup

Lab computer users:
- USERNAME dataclass
- PASSWORD dataclass

Download materials from http://projects.iq.harvard.edu/rtc/r-prog
- Scroll to the bottom of the page and download the r-programming.zip file
- Move it to your desktop and extract

Data types

Vectors and data classes

Values can be combined into vectors using the c() function

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class which determines how functions treat them
> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"

Vector conversion and info

Vectors can be converted from one class to another
> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and str()ucture of vectors

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4

Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for
- Modeling (automatically contrast coded)
- Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors
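A minimal sketch of how a factor is stored, using a made-up vector named sizes (the name and values are illustrative assumptions, not part of the workshop materials). It shows the labels, the underlying integer codes, and how an arbitrary level order carries through to tabulation and sorting:

```r
# hypothetical example data, not from the workshop materials
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)        # the character labels, in the order we specified
as.integer(sizes)    # the underlying integer codes
table(sizes)         # counts are reported in level order, not alphabetical order
sort(sizes)          # sorting also follows the level order
```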

Lists and data.frames

A data.frame is a list of vectors, each of the same length
A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
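The statement that a data.frame is a list of equal-length vectors can be checked directly. The sketch below rebuilds DF as in the example above. One assumption to note: stringsAsFactors = TRUE is passed explicitly, because later output in these slides shows the y column as a factor, and converting character columns to factors has not been the default since R 4.0.0.

```r
DF <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)

is.list(DF)          # a data.frame is a list under the hood
length(DF)           # number of list elements, i.e. columns
sapply(DF, length)   # every column has the same length
class(DF$y)          # "factor" because of stringsAsFactors = TRUE
```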

Data types summary

Key points:
- vector classes include numeric, logical, character, and factors
- vectors can be combined into lists or data.frames
- a data.frame can almost always be thought of as a list of vectors of equal length
- a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
c           combine elements
as.numeric  convert an object (e.g., a character vector) to numeric
data.frame  combine objects into a data.frame
ls          list the objects in the workspace
class       get the class of an object
str         get the structure of an object
length      get the number of elements in an object
mean        calculate the mean of a vector

Exercise 0

1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]

Exercise 0 prototype

1 Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2 Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of "students" and "test"

class(students)
class(test)

4 Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5 Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)

Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> # indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors
> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
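The examples above only select elements. Replacement works with the same logical indices. A minimal sketch, starting from a fresh copy of x (101 to 110) rather than the modified vector used above:

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 108] <- 0L                        # replace every element greater than 108
x[names(x) %in% c("a", "b")] <- NA      # replace by a condition on the names
which(is.na(x))                         # positions (and names) of the missing elements
```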

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of M[1:3, ] above) return all values

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
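If matrix-style indexing is preferred but a data.frame is still wanted, the drop argument of the [ operator can be set to FALSE. This goes slightly beyond what the slide covers; a minimal sketch, rebuilding DF as defined earlier:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

v  <- DF[, 1]                 # a plain integer vector
d1 <- DF[1]                   # a data.frame with one column
d2 <- DF[, 1, drop = FALSE]   # matrix-style indexing, but still a data.frame

is.data.frame(v)
is.data.frame(d1)
is.data.frame(d2)
```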

Extraction/replacement summary

Key points:
- elements of objects can be extracted or replaced using the [ operator
- objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
- vectors and lists have only one dimension, and hence only one index is used
- matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object

Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
    print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
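The second argument of apply() selects the dimension: 1 for rows, 2 for columns. A short sketch on the same matrix, with the row-wise case added for comparison (rowSums() and colMeans() are base R shortcuts for these common cases):

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, sum)    # MARGIN = 1: one result per row
apply(M, 2, sum)    # MARGIN = 2: one result per column

# for sums and means, the dedicated helpers give the same answers
rowSums(M)
colMeans(M)
```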

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions summary

Key points:
- R has convenient methods for applying functions to matrices, lists, and data.frames
- other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
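The other apply-style functions named in the key points are not demonstrated in this section. A brief, non-exhaustive sketch of each, using the built-in iris data (the anonymous function passed to mapply is made up for illustration):

```r
data(iris)

# lapply: like sapply, but always returns a list
lapply(iris[1:2], range)

# tapply: apply a function to groups of a vector defined by a factor
tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
mapply(function(a, b) a + b, 1:3, 4:6)
```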

Writing functions

Functions

- A function is a collection of commands that takes input(s) and returns output.
- If you have a specific analysis or transformation you want to do on different data, use a function
- Functions are defined using the function() function
- Functions can be defined with any number of named arguments
- Arguments can be of any type (e.g., vectors, data.frames, lists . . . )
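As a small illustration of named arguments, here is a sketch of a function with two arguments that have default values. The function rescale and its argument names are hypothetical, chosen only for this example:

```r
# hypothetical helper: rescale a numeric vector to a given range
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x)                       # smallest and largest value of x
  (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower) + lower
}

rescale(c(2, 4, 6))                 # uses the default range 0 to 1
rescale(c(2, 4, 6), upper = 10)     # arguments can be matched by name
```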

Function return value

The return value of a function can be:
- The value of the last expression evaluated in the body of the function
- Objects explicitly returned with the return() function

Other function output can come from:
- Calls to print(), message() or cat() in function body
- Error messages

Assignment inside the body of a function takes place in a local environment
Example:

> f <- function() { # define function f
+ print("setting x to 1") # print a text string
+ x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+ return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

- Stepping through functions and setting breakpoints
- Use traceback() to see what went wrong after the fact

Writing functions summary

Key points:
- writing new functions is easy!
- most functions will have a return value, but functions can also print things, write things to file etc.
- functions can be stepped through to facilitate debugging

Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)
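The debugging tools listed above are interactive, so they are hard to show in a script. The sketch below only toggles the debugging flag without calling the function, and catches an error with tryCatch() so the script keeps running. The function halve is a made-up example, and tryCatch() and isdebugged() are base R functions not otherwise covered in this workshop:

```r
# hypothetical function used only to demonstrate the debugging flag
halve <- function(x) {
  x / 2          # fails with an error if x is not numeric
}

debug(halve)         # the next call to halve() would open the step-through browser
isdebugged(halve)    # TRUE while the debugging flag is set
undebug(halve)       # turn the flag off again
isdebugged(halve)    # FALSE

# after an error at the prompt, traceback() prints the sequence of calls
# that led to it; here the error is caught so the script keeps running
msg <- tryCatch(halve("a"), error = function(e) conditionMessage(e))
msg
```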

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)                      # class of each column
  means <- sapply(df[classes == "numeric"], mean)   # mean of each numeric column
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)  # level counts of each factor column
  return(list(means, counts))
}

statsum(iris)

Control flow

- Basic idea: if some condition is true, do one thing. If false, do something else
- Carried out in R using if() and else statements, which can be nested if necessary
- Especially useful for checking function arguments and performing different operations depending on function input

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Add a condition to handle x = 0
> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function
> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+ if(!is.numeric(x) || length(x) != 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
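The versions above print their answer with cat(). A variant that returns the answer instead is easier to reuse in other code. The sketch below is one possible way to do that; the name signLabel is hypothetical, and stop() is used to signal an error for invalid input rather than printing a message:

```r
# hypothetical variant of isPositive() that returns its answer
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one!")   # signal an error
  } else if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(10)
signLabel(0)
sapply(c(-2, 0, 3), signLabel)   # apply it to several numbers, one at a time
```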

Control flow summary

Key points:
- code can be conditionally executed
- conditions can be nested
- conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  # is.data.frame() also works when an object has more than one class
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)   # stops with the error "df must be a data.frame!"
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()     # execution pauses here so you can step through the rest
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

The S3 object class system

R has two major object systems:
- Relatively informal "S3" classes
- Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:
> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
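When an object has several classes, comparing class(x) to a single string gives one result per class. The base R function inherits() (not covered in the slides) is a common way to ask whether an object has a given class. A minimal sketch:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")        # is "A" one of the classes of x?
inherits(x, "B")
inherits(x, "foo")
unclass(x)              # strip the class attribute to get the plain integers back
```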

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) # use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
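With only a matrix method defined, calling disp() on anything else fails because no applicable method is found. A common remedy is a default method. The sketch below repeats the generic and the matrix method so it runs on its own; disp.default and its message are additions for illustration, not part of the workshop code:

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices, as in the example above
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# fallback used when no class-specific method exists (added for illustration)
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 5.678), ncol = 2))   # dispatches to disp.matrix
disp("hello")                             # dispatches to disp.default
```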

S3 classes summary

Key points:
- there are several class systems in R, of which S3 is the oldest and simplest
- objects have class and functions have corresponding methods
- the class of an object can be set by simple assignment
- S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
- new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)      # standard deviations, computed with sd()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))               # prepend the new class
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
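The running example described at the start of the workshop also asks for the min, median, max, and n of each numeric column. One possible way to extend the numeric part is sketched below. The function name numSummary and its layout (one row per variable) are assumptions made for this illustration, not the workshop's official solution:

```r
# hypothetical helper, not the workshop's official solution
numSummary <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  num <- df[sapply(df, is.numeric)]          # keep only the numeric columns
  stats <- sapply(num, function(v) {
    c(min = min(v), mean = mean(v), median = median(v),
      max = max(v), sd = sd(v), n = length(v))
  })
  t(stats)                                   # one row per variable
}

numSummary(iris)
```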

Things that may surprise you

Gotchas

- There are an unfortunately large number of surprises in R programming
- Some of these "gotchas" are common problems in other languages, many are unique to R
- We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
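One detail worth knowing when using this in a condition: all.equal() returns a character description of the difference, not FALSE, when the values differ. Wrapping it in isTRUE() gives a plain TRUE or FALSE that is safe to use with if. A minimal sketch:

```r
a <- .1
b <- .3 / 3

a == b                      # exact comparison
isTRUE(all.equal(a, b))     # comparison within a small tolerance

if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal, up to rounding error\n")
}
```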

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
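A short sketch of both solutions, applied to the same vector as above:

```r
x <- c(1:10, NA, 12:20)

is.na(x)                  # TRUE only at the missing position
sum(is.na(x))             # number of missing values
mean(x, na.rm = TRUE)     # mean of the 19 observed values
x[!is.na(x)]              # drop the missing value by logical indexing
```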

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly donrsquot always

gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15

Why are these different

gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL

Ouch That is not nice at all
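The difference comes from the argument lists: mean() averages only its first argument x, and any further positional values are matched to other parameters of the default method (such as trim) instead of being averaged, while sum() collects everything passed through the dots. A small sketch of the safe pattern, with variable names chosen here for illustration:

```r
# mean() uses only its first argument as the data
m_separate <- mean(1, 2, 3, 4, 5)      # only the value 1 is averaged
m_combined <- mean(c(1, 2, 3, 4, 5))   # all five values in one vector

# sum() adds up everything it is given
s_separate <- sum(1, 2, 3, 4, 5)
s_combined <- sum(c(1, 2, 3, 4, 5))

print(c(m_separate = m_separate, m_combined = m_combined,
        s_separate = s_separate, s_combined = s_combined))
```

Combining the values with c() first behaves the same way for both functions, so it is the safer habit.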

Trouble with Factors

Factors sometimes behave as numbers and sometimes as characters, which can be confusing!

    > (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
    [1] 5 5 6 6
    Levels: 6 5
    >
    > str(x)
     Factor w/ 2 levels "6","5": 2 2 1 1
    >
    > as.character(x)
    [1] "5" "5" "6" "6"
    > # here is where people sometimes get lost...
    > as.numeric(x)
    [1] 2 2 1 1
    > # you probably want
    > as.numeric(as.character(x))
    [1] 5 5 6 6
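As a sketch of what is going on underneath: a factor stores integer codes that point into its vector of levels. Converting the levels once and indexing them by the factor gives the same answer as as.numeric(as.character(x)). The variable names below are introduced for illustration.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.integer(x)                 # internal codes, one per element
labels <- as.numeric(as.character(x))   # the labels converted to numbers
via_levels <- as.numeric(levels(x))[x]  # convert each level once, then index

codes
labels
via_levels
```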

Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop.

Looping: for-loop examples

For each value in a vector, print the number and its square:

    > # For-loop example
    > for (num in seq(-5, 5)) { # for each number in [-5, 5]
    +   cat(num, "squared is", num^2, "\n") # print the number
    + }
    -5 squared is 25
    -4 squared is 16
    -3 squared is 9
    -2 squared is 4
    -1 squared is 1
    0 squared is 0
    1 squared is 1
    2 squared is 4
    3 squared is 9
    4 squared is 16
    5 squared is 25
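A for loop can also run over positions rather than values, which is handy when you need the element's name or index. The sketch below uses seq_along() for this; the vectors values and empty are made up for illustration.

```r
# Loop over positions with seq_along(), which yields an empty sequence
# for an empty vector (1:length(v) would give c(1, 0) and run twice).
values <- c(a = 2, b = 4, c = 6)
for (i in seq_along(values)) {
  cat(names(values)[i], "squared is", values[i]^2, "\n")
}

empty <- numeric(0)
iterations <- 0
for (i in seq_along(empty)) iterations <- iterations + 1
iterations   # the loop body never ran
```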

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

    > # While-loop example: rolling dice
    > set.seed(15) # allows reproducible sample() results
    > dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
    > roll <- 0 # set roll = 0
    > while (roll < 12) {
    +   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
    +   cat("We rolled a", roll, "\n") # print the result
    + } # end the loop
    We rolled a 6
    We rolled a 10
    We rolled a 9
    We rolled a 7
    We rolled a 10
    We rolled a 5
    We rolled a 9
    We rolled a 12
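A while loop keeps running until its condition fails, so a condition that is never met means a loop that never ends. One common precaution, sketched here as a variation on the dice example, is to count iterations and stop at an upper limit. The names n_rolls and max_rolls and the limit of 1000 are assumptions added for illustration.

```r
set.seed(15)        # make the sample() results reproducible
dice <- 1:6
roll <- 0
n_rolls <- 0
max_rolls <- 1000   # hypothetical safety limit, not in the original example

# stop when we roll two sixes, or when the safety limit is reached
while (roll < 12 && n_rolls < max_rolls) {
  roll <- sample(dice, 1) + sample(dice, 1)
  n_rolls <- n_rolls + 1
}
cat("Stopped after", n_rolls, "rolls; last roll was", roll, "\n")
```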

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

    > # save calculations done in a loop
    > Result <- list() # create an object to store the results
    > for (i in 1:5) { # for each i in [1, 5]
    +   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
    + }
    > Result # print Result
    [[1]]
    [1] 1

    [[2]]
    [1] 1 2

    [[3]]
    [1] 1 2 3

    [[4]]
    [1] 1 2 3 4

    [[5]]
    [1] 1 2 3 4 5
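When the number of results is known in advance, the container can be created at its final size instead of being grown one element at a time. This is a sketch of that variation, assuming the same Result list as above; the variable n is introduced for illustration.

```r
n <- 5
Result <- vector("list", n)   # a list with n empty (NULL) slots
for (i in seq_len(n)) {
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}
lengths(Result)               # number of elements stored in each slot
```

For five elements the difference is negligible, but growing an object inside a long loop forces R to copy it repeatedly.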

Word of caution: don't overuse loops

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

    > x <- c() # create vector x
    > for(i in 1:5) x[i] <- i+i # double a vector using a loop
    > print(x) # print the result
    [1] 2 4 6 8 10
    >
    > 1:5 + 1:5 # double a vector without a loop
    [1] 2 4 6 8 10
    > 1:5 + 5 # shorter vectors are recycled
    [1] 6 7 8 9 10

Use paste instead of loops:

    > ## Earlier we said
    > ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
    > ##   cat(num, "squared is", num^2, "\n") # print the number
    > ## }
    > ## a better way:
    > paste(1:5, "squared =", (1:5)^2)
    [1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
    [4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
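To reproduce the exact output of the earlier for-loop example without a loop, the pasted strings can be written out one per line. A minimal sketch, with the variable names num and msg chosen for illustration:

```r
num <- seq(-5, 5)
msg <- paste(num, "squared is", num^2)   # one string per element, no loop
writeLines(msg)                          # print each string on its own line
```

Here paste() is applied to the whole vector at once, and writeLines() takes care of the line breaks that cat() needed "\n" for.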

Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

    > classes <- c()
    > for (name in names(iris)) {
    +   classes[name] <- class(iris[[name]])
    + }
    >
    > classes
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
       "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

    > iris.num <- iris[, names(classes)[classes == "numeric"]]
    > head(iris.num, 2)
      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1          5.1         3.5          1.4         0.2
    2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

    > iris.means <- c()
    > for(var in names(iris.num)) {
    +   iris.means[[var]] <- mean(iris[[var]])
    + }
    >
    > iris.means
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width
        5.843333     3.057333     3.758000     1.199333
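In keeping with the caution about overusing loops, the same three steps can be sketched without an explicit loop. This version uses vapply(), a base R relative of sapply() that also checks the type of each result; the variable names is_num and col_means are introduced for illustration, and this is one possible approach rather than the workshop's own solution.

```r
# Steps 1 and 2: find the numeric columns (one TRUE/FALSE per column)
is_num <- vapply(iris, is.numeric, logical(1))

# Step 3: mean of each numeric column (one number per column)
col_means <- vapply(iris[is_num], mean, numeric(1))
col_means
```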

Page 2: Introduction to R Programming

Outline

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 2

71

Workshop overview and materials

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 3

71

Workshop overview and materials

Workshop description

This is an intermediateadvanced R courseAppropriate for those with basic knowledge of RLearning objectives

Index data objects by position name or logical conditionUnderstand looping and branchingWrite your own simple functionsDebug functionsUnderstand and use the S3 object system

Introduction To Programming In RLast updated November 20 2013 4

71

Workshop overview and materials

Running example

Throughout this workshop we will return to a running example that involvescalculating descriptive statistics for every column of a dataframe We willoften use the built-in iris data set You can load the iris data by evaluatingdata(iris) at the R promptOur main example today consists of writing a statistical summary functionthat calculates the min mean median max sd and n for all numericcolumns in a dataframe the correlations among these variables and thecounts and proportions for all categorical columns Typically I will describe atopic and give some generic examples then ask you to use the technique tostart building the summary

Introduction To Programming In RLast updated November 20 2013 5

71

Workshop overview and materials

Materials and setup

Lab computer usersUSERNAME dataclassPASSWORD dataclass

Download materials fromhttpprojectsiqharvardedurtcr-progScroll to the bottom of the page and download ther-programmingzip fileMove it to your desktop and extract

Introduction To Programming In RLast updated November 20 2013 6

71

Data types

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 7

71

Data types

Vectors and data classes

Values can be combined into vectors using the c() function

gt numvar lt- c(1 2 3 4) numeric vectorgt charvar lt- c(1 2 3 4) character vectorgt logvar lt- c(TRUE TRUE FALSE TRUE) logical vectorgt charvar2 lt- c(numvar charvar) numbers coverted to charactergt

Vectors have a class which determines how functions treat themgt class(numvar)[1] numericgt mean(numvar) take the mean of a numeric vector[1] 25gt class(charvar)[1] charactergt mean(charvar) cannot average characters[1] NAgt class(charvar2)[1] character

Introduction To Programming In RLast updated November 20 2013 8

71

Data types

Vector conversion and info

Vectors can be converted from one class to anothergt class(charvar2)[1] charactergt numvar2 lt- asnumeric(charvar2) convert to numericgt class(numvar2)[1] numericgt mean(asnumeric(charvar2)) now we can calculate the mean[1] 25gt asnumeric(c(a b c)) cannot convert letters to numeric[1] NA NA NA

In addition to class you can examine the length()and str() ucture of vectors

gt ls() list objects in our workspace[1] charvar charvar2 logvar numvar numvar2gt length(charvar) how many elements in charvar[1] 4gt str(numvar2) what is the structure of numvar2num [18] 1 2 3 4 1 2 3 4

Introduction To Programming In RLast updated November 20 2013 9

71

Data types

Factor vectors

Factors are stored as numbers but have character labels Factors are usefulfor

Modeling (automatically contrast coded)Sortingpresenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors

Introduction To Programming In RLast updated November 20 2013 10

71

Data types

Lists and dataframes

A dataframe is a list of vectors each of the same lengthA list is a collection of objects each of which can be almost anything

gt DF lt- dataframe(x=15 y=letters[15])gt DF dataframe with two columns and 5 rows

x y1 1 a2 2 b3 3 c4 4 d5 5 egtgt DF lt- dataframe(x=110 y=17) illegal becase lengths differgt L lt- list(x=15 y=13 z = DF)gt L lists are much more flexible$x[1] 1 2 3 4 5

$y[1] 1 2 3

$zx y

1 1 a2 2 b3 3 c4 4 d5 5 e

Introduction To Programming In RLast updated November 20 2013 11

71

Data types

Data types summary

Key pointsvector classes include numeric logical character and factorsvectors can be combined into lists or dataframesa dataframe can almost always be thought of as a list of vectors ofequal lengtha list is a collection of objects each of which can by of almost any type

Functions introduced in this sectionc combine elements

asnumeric convert an object (eg a character verctor) to numericdataframe combine oject into a dataframe

ls list the objects in the workspaceclass get the class of an objectstr get the structure of an object

length get the number of elements in an objectmean calculate the mean of a vector

Introduction To Programming In RLast updated November 20 2013 12

71

Data types

Exercise 0

1 Create a new vector called test containing five numbers of your choice[ c() lt- ]

2 Create a second vector called students containing five common namesof your choice [ c() lt- ]

3 Determine the class of students and test [ class() or str() ]4 Create a data frame containing two columns students and tests as

defined above [ dataframe ]5 Convert test to character class and confirm that you were successful [

asnumeric() lt- str() ]

Introduction To Programming In RLast updated November 20 2013 13

71

Data types

Exercise 0 prototype

1 Create a new vector called test containing five numbers of your choicetest lt- c(1 2 3 4 5)

2 a Create a second vector called students containing five commonnames of your choice

students lt- c(Mary Joan Steve Alex Suzy)

3 Determine the class of students and testclass(students)class(test)

4 Create a data frame containing two columns students and tests asdefined above

testScores lt- dataframe(students tests)

5 Convert test to character class and confirm that you were successfultest lt- ascharacter(test)class(test)

Introduction To Programming In RLast updated November 20 2013 14

71

Extracting and replacing object elements

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 15

71

Extracting and replacing object elements

Indexing by position or name

Parts of vectors matricies dataframes and lists can be extracted or replacedbased on position or name

gt indexing vectors by positiongt x lt- 101110 Creat a vector of integers from 101 to 110gt x[c(4 5)] extract the fourth and fifth values of x[1] 104 105gt x[4] lt- 1 change the 4th value to 1gt x print x[1] 101 102 103 1 105 106 107 108 109 110

gtgt indexing vectors by namegt names(x) lt- letters[110] give x namesgt print(x) print x

a b c d e f g h i j101 102 103 1 105 106 107 108 109 110gt x[c(a f)] extract the values of a and f from x

a f101 106

Introduction To Programming In RLast updated November 20 2013 16

71

Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUEFALSE)vectorsgt x gt 106 shows which elements of x are gt 106

a b c d e f g h i jFALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUEgt x[x gt 106] selects elements of x where x gt 106

g h i j107 108 109 110

Additional operators useful for logical indexing== equal to= not equal togt greater thanlt less than

gt= greater than or equal tolt= less than or equal to

in is included inamp and| or

gt x[x gt 106 amp x lt= 108]g h

107 108gt x[x gt 106 | names(x) in c(a b c)]

a b c g h i j101 102 103 107 108 109 110

Introduction To Programming In RLast updated November 20 2013 17

71

Extracting and replacing object elements

Indexing matrices

Extraction on matrices operate in two dimensions first dimension refers torows second dimension refers to columns

gt indexing matriciesgt create a matrixgt (M lt- cbind(x = 15 y = -1-5 z = c(6 3 4 2 8)))

x y z[1] 1 -1 6[2] 2 -2 3[3] 3 -3 4[4] 4 -4 2[5] 5 -5 8gt M[13 ] extract rows 1 through 3 all columns

x y z[1] 1 -1 6[2] 2 -2 3[3] 3 -3 4gt M[c(5 3 1) 23] rows 5 3 and 1 columns 2 and 3

y z[1] -5 8[2] -3 4[3] -1 6gt M[M[ 1] in 42 2] second column where first column lt=4 amp gt= 2[1] -2 -3 -4

Note that unspecified indexrsquos (as in the column index in the example above )return all values

Introduction To Programming In RLast updated November 20 2013 18

71

Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors with the following extension

gt Lists can be indexed with single brackets similar to vector indexinggt L[c(1 2)] the first two elements of L$x[1] 1 2 3 4 5

$y[1] 1 2 3

gt L[1] a list with one element$x[1] 1 2 3 4 5

gt double brackets select the content of a single selected elementgt effectively taking it out of the listgt L[[1]] a vector[1] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 19

71

Extracting and replacing object elements

Indexing dataframes

A dataframe can be indexed in the same ways as a matrix and also the sameways as a list

gt DF[c(3 1 2) c(1 2)] rows 3 1 and 2 columns 1 and 2x y

3 3 c1 1 a2 2 bgt DF[[1]] column 1 as a vector[1] 1 2 3 4 5

There is a subtle but important difference between [ n] and [n] whenindexing dataframes the first form returns a vector the second returns adataframe with one columngt str(DF[1]) a dataframe with one columnrsquodataframersquo 5 obs of 1 variable$ x int 1 2 3 4 5

gt str(DF[ 1]) a vectorint [15] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 20

71

Extracting and replacing object elements

Extractionreplacement summary

Key points

elements of objects can be extracted or replaced using the [ operatorobjects can be indexed by position name or logical (TRUEFALSE)vectorsvectors and lists have only one dimension and hence only one index isusedmatricies and dataframes have two dimensions and extraction methodsfor these objects use two indices

Functions introduced in this section[ extraction operator used to extractreplace object elements

names get the names of an object usually a vector list or dataframeprint print an object

Introduction To Programming In RLast updated November 20 2013 21

71

Extracting and replacing object elements

Exercise 1

1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2

2 Calculate the mean of the SepalLength column in iris23 BONUS (optional) Calculate the mean of SepalLength but only for

the setosa species4 BONUS (optional) Calculate the number of sepal lengths that are more

than one standard deviation below the average sepal length

Introduction To Programming In RLast updated November 20 2013 22

71

Extracting and replacing object elements

Exercise 1 prototype

1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2

data(iris)iris2 lt- iris[c(SepalLength Species)]str(iris2)

2 Calculate the mean of the SepalLength column in iris2mean(iris2[ SepalLength])

3 [3] BONUS (optional) Calculate the mean of SepalLength but only forthe setosa speciesmean(iris2[iris2[[Species]] == setosa SepalLength]) shortcutwith(iris2

print(mean(SepalLength[Species == setosa])))

4 [4] BONUS (optional) Calculate the number of sepal lengths that aremore than one standard deviation below the average sepal lengthmminussd lt- mean(iris2[[SepalLength]]) - sd(iris2[[SepalLength]])length(iris2[iris2[[SepalLength]] lt mminussd SepalLength])

Introduction To Programming In RLast updated November 20 2013 23

71

Applying functions to list elements

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 24

71

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90

Introduction To Programming In RLast updated November 20 2013 25

71

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this

gt sapply(DF class) get the class of each column in the DF dataframex y

integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric

x yTRUE FALSE

Introduction To Programming In RLast updated November 20 2013 26

71

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extractelements that meet certain criteria

Recall that we can index using logical vectors

gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5

gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y

TRUE FALSEgt DF[DFwhichnum] select the numeric columns

x1 12 23 34 45 5

Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column

Introduction To Programming In RLast updated November 20 2013 27

71

Applying functions to list elements

Applying functions summary

Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details

Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list

isnumeric returns TRUE or FALSE depending on the type of object

Introduction To Programming In RLast updated November 20 2013 28

71

Writing functions

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 29

71

Writing functions

Functions

A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )

Introduction To Programming In RLast updated November 20 2013 30

71

Writing functions

Function return value

The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function

Other function output can come fromCalls to print() message() or cat() in function bodyError messages

Assignment inside the body of a function takes place in a localenvironmentExample

gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1

a b c d e f g h i j101 102 103 1 105 106 107 108 109 110

Introduction To Programming In RLast updated November 20 2013 31

71

Writing functions

Writing functions example

Goal write a function that returns the square of itrsquos argument

gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25

Introduction To Programming In RLast updated November 20 2013 32

71

Writing functions

Debugging basics

Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact

Introduction To Programming In RLast updated November 20 2013 33

71

Writing functions

Writing functions summary

Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging

Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value

browser sets a break pointdebug turns on the debugging flag of a function so you can step

through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went

wrong)

Introduction To Programming In RLast updated November 20 2013 34

71

Writing functions

Exercise 2

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

Introduction To Programming In RLast updated November 20 2013 35

71

Writing functions

Exercise 2 prototype

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)

statsum(iris)

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 36

71

Control flow

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 37

71

Control flow

Control flow

Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input

Introduction To Programming In RLast updated November 20 2013 38

71

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
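The same naming scheme works for any existing generic. The sketch below defines format() and print() methods for a made-up class; the class name "temperature", its constructor, and the field degrees_c are hypothetical and exist only for this illustration:

```r
# constructor for a hypothetical "temperature" class (illustration only)
temperature <- function(degrees_c) {
  structure(list(degrees_c = degrees_c), class = "temperature")
}

# format() method: builds the text representation
format.temperature <- function(x, ...) {
  paste0(x$degrees_c, " degrees C")
}

# print() method: shows the formatted text and returns x invisibly,
# which is the usual convention for print methods
print.temperature <- function(x, ...) {
  cat(format(x), "\n")
  invisible(x)
}

t1 <- temperature(21.5)
print(t1)
```

Accepting ... in each method keeps its signature compatible with the generic, which also takes ... as an argument.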


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
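A generic usually also needs a fallback for classes that have no method of their own. The following is a minimal sketch of that pattern; the generic name describe() and its methods are invented for this example and are not part of the workshop materials:

```r
# a hypothetical generic, named describe() purely for illustration
describe <- function(x, ...) {
  UseMethod("describe")
}

# fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# method for data.frames
describe.data.frame <- function(x, ...) {
  paste("a data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

d1 <- describe(iris)   # dispatches to describe.data.frame
d2 <- describe(1:3)    # no method for integers, so describe.default is used
print(d1)
print(d2)
```

Without a default method, calling the generic on an unsupported class stops with a "no applicable method" error.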


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have a class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
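One caveat worth sketching: when the values differ, all.equal() returns a character description rather than FALSE, so inside if() it is usually wrapped in isTRUE(). The variable names below are illustrative only:

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                      # exact comparison of two doubles
close <- isTRUE(all.equal(a, b))     # comparison within a small tolerance
far   <- all.equal(1, 1.1)           # a character message, not FALSE
safe  <- isTRUE(all.equal(1, 1.1))   # a plain logical, usable in if()

print(c(exact = exact, close = close, safe = safe))
print(far)
```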


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
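A short sketch of both solutions applied to the vector from the slide (the result names are illustrative):

```r
x <- c(1:10, NA, 12:20)

n_missing  <- sum(is.na(x))           # count the missing values
mean_all   <- mean(x)                 # NA, because one value is unknown
mean_known <- mean(x, na.rm = TRUE)   # mean of the 19 observed values

print(n_missing)
print(mean_all)
print(mean_known)
```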


Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
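The difference comes from argument matching: mean() averages only its first argument x, while sum() adds up everything passed through "...". A minimal sketch of the fix is to combine the values with c() first (result names are illustrative):

```r
wrong <- mean(1, 2, 3, 4, 5)      # only x = 1 is averaged; the rest go to other arguments
right <- mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
total <- sum(1, 2, 3, 4, 5)       # sum() adds up all of its arguments

print(c(wrong = wrong, right = right, total = total))
```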


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
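An equivalent idiom, sketched here as an alternative rather than as the workshop's own recommendation, converts the levels once and then indexes them by the factor's integer codes:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.integer(x)              # the internal integer codes
values <- as.numeric(levels(x))[x]   # convert the levels, then index by code

print(codes)
print(values)
```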


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
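When the number of results is known in advance, a common variation is to pre-allocate the container instead of growing it inside the loop. A minimal sketch, assuming the same five sequences as above (the variable n is introduced only for this example):

```r
n <- 5
Result <- vector("list", n)     # pre-allocate a list with n empty slots
for (i in seq_len(n)) {         # seq_len(n) also behaves sensibly when n is 0
  Result[[i]] <- seq_len(i)     # store the sequence 1 to i in slot i
}

print(length(Result))
print(Result[[3]])
```

Pre-allocating avoids repeatedly copying the object as it grows, which starts to matter for long loops.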


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5,5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!


Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
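For comparison, and in keeping with the earlier caution about overusing loops, the same three steps can be sketched without any explicit loop using sapply(). The underscore variable names are chosen here only to avoid clashing with the loop version:

```r
classes    <- sapply(iris, class)        # step 1: class of each column
iris_num   <- iris[classes == "numeric"] # step 2: keep the numeric columns
iris_means <- sapply(iris_num, mean)     # step 3: mean of each numeric column

print(classes)
print(iris_means)
```

For an all-numeric data.frame, colMeans(iris_num) gives the same means in a single call.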

Page 3: Introduction to R Programming


Workshop overview and materials

Workshop description

  • This is an intermediate/advanced R course
  • Appropriate for those with basic knowledge of R
  • Learning objectives:
      • Index data objects by position, name, or logical condition
      • Understand looping and branching
      • Write your own simple functions
      • Debug functions
      • Understand and use the S3 object system


Running example

Throughout this workshop we will return to a running example that involves calculating descriptive statistics for every column of a data.frame. We will often use the built-in iris data set. You can load the iris data by evaluating data(iris) at the R prompt.

Our main example today consists of writing a statistical summary function that calculates the min, mean, median, max, sd, and n for all numeric columns in a data.frame, the correlations among these variables, and the counts and proportions for all categorical columns. Typically I will describe a topic and give some generic examples, then ask you to use the technique to start building the summary.


Materials and setup

  • Lab computer users:
      • USERNAME: dataclass
      • PASSWORD: dataclass
  • Download materials from http://projects.iq.harvard.edu/rtc/r-prog
  • Scroll to the bottom of the page and download the r-programming.zip file
  • Move it to your desktop and extract


Data types

Vectors and data classes

Values can be combined into vectors using the c() function:

> num.var <- c(1, 2, 3, 4) # numeric vector
> char.var <- c("1", "2", "3", "4") # character vector
> log.var <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> char.var2 <- c(num.var, char.var) # numbers converted to character
>

Vectors have a class which determines how functions treat them:

> class(num.var)
[1] "numeric"
> mean(num.var) # take the mean of a numeric vector
[1] 2.5
> class(char.var)
[1] "character"
> mean(char.var) # cannot average characters
[1] NA
> class(char.var2)
[1] "character"


Vector conversion and info

Vectors can be converted from one class to another:

> class(char.var2)
[1] "character"
> num.var2 <- as.numeric(char.var2) # convert to numeric
> class(num.var2)
[1] "numeric"
> mean(as.numeric(char.var2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and structure (str()) of vectors:

> ls() # list objects in our workspace
[1] "char.var"  "char.var2" "log.var"   "num.var"   "num.var2"
> length(char.var) # how many elements in char.var?
[1] 4
> str(num.var2) # what is the structure of num.var2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for:
  • Modeling (automatically contrast coded)
  • Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.


Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e


Data types summary

Key points:
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
  • c: combine elements
  • as.numeric: convert an object (e.g., a character vector) to numeric
  • data.frame: combine objects into a data.frame
  • ls: list the objects in the workspace
  • class: get the class of an object
  • str: get the structure of an object
  • length: get the number of elements in an object
  • mean: calculate the mean of a vector


Exercise 0

1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "tests", as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "tests", as defined above

testScores <- data.frame(students, tests = test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name:

> # indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors:

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
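Logical indexing works for replacement as well as extraction. The sketch below starts from a fresh copy of x (so it does not depend on the earlier change to the fourth element) and uses illustrative result names:

```r
x <- 101:110
names(x) <- letters[1:10]

big <- which(x > 106)                     # positions of the elements that are TRUE
x[x > 106] <- 0                           # replace every element greater than 106
small <- x[!(names(x) %in% c("a", "b"))]  # negate a condition to drop elements

print(big)
print(x)
print(small)
```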


Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in M[1:3, ] above) return all values.


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> # double brackets select the content of a single selected element
> # effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
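If matrix-style indexing is preferred but a data.frame result is needed, the drop argument of the [ operator controls this. A minimal sketch, re-creating DF so the block runs on its own (result names are illustrative):

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

as_vector <- DF[, 1]                 # matrix-style: a single column is simplified to a vector
as_df     <- DF[, 1, drop = FALSE]   # drop = FALSE keeps the data.frame structure
by_list   <- DF[1]                   # list-style: always a data.frame

str(as_vector)
str(as_df)
```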


Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  • [: extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object


Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
# shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
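To make the role of the second argument concrete, here is a small sketch that applies functions over both dimensions of the same matrix (result names are illustrative):

```r
M <- matrix(1:20, ncol = 4)

col_means <- apply(M, 2, mean)   # margin 2: one result per column
row_sums  <- apply(M, 1, sum)    # margin 1: one result per row

print(col_means)
print(row_sums)
```

For sums and means specifically, the built-in rowSums(), colSums(), rowMeans(), and colMeans() give the same results.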


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.


Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object
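As a brief, non-exhaustive sketch of the other apply-style functions mentioned in the key points (plus vapply, which is not covered in the workshop but is closely related to sapply), each call below uses only base R; the result names are illustrative:

```r
L <- list(x = 1:5, y = 1:3)

as_list  <- lapply(L, length)                  # like sapply, but always returns a list
checked  <- vapply(L, length, integer(1))      # like sapply, with a declared result type
by_group <- tapply(iris$Sepal.Length, iris$Species, mean)  # apply within groups
sums     <- mapply(function(a, b) a + b, 1:3, 4:6)         # apply over two inputs in parallel

print(checked)
print(by_group)
print(sums)
```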


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists...)
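A minimal sketch of a function with two named arguments, one of which has a default value; the function name power() and its arguments are made up for this illustration:

```r
# hypothetical example: raise x to the power p (p defaults to 2)
power <- function(x, p = 2) {
  x^p
}

squared <- power(3)          # uses the default p = 2
cubed   <- power(2, p = 3)   # overrides the default by name
vec     <- power(1:3)        # arguments can be vectors

print(c(squared, cubed))
print(vec)
```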


Function return value

  • The return value of a function can be:
      • The value of the last expression evaluated in the body of the function
      • Objects explicitly returned with the return() function
  • Other function output can come from:
      • Calls to print(), message() or cat() in the function body
      • Error messages
  • Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
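The local-environment point can be checked in a self-contained way. In this sketch the strings "global" and "local" are placeholders chosen for the illustration:

```r
x <- "global"

f <- function() {
  x <- "local"   # creates a new x inside the function's own environment
  x              # the last evaluated expression is the return value
}

returned <- f()

print(returned)  # the value from inside the function
print(x)         # the global x is unchanged
```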


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
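Stepping through a function is an interactive activity, so it cannot be shown in full here. As a rough, non-interactive sketch, the code below only turns the debugging flag on and off and checks it with isdebugged(); actually calling square() while the flag is on would open the browser:

```r
square <- function(x) {
  x * x
}

debug(square)                     # flag square(): the next call would be stepped through
flag_on <- isdebugged(square)     # check the flag without calling the function
undebug(square)                   # remove the flag again
flag_off <- isdebugged(square)

print(c(flag_on = flag_on, flag_off = flag_off))
```

Placing a call to browser() inside a function body is the other common way to set a breakpoint at a specific line.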


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)


Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
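The argument check above prints a message and carries on. A minimal sketch of a stricter variant follows, assuming we would rather signal an error with stop() and return a label instead of printing it. The name checkSign is hypothetical and is not part of the workshop materials.

```r
# Hypothetical variant of isPositive(): checkSign() raises an error for bad
# input and returns a character label instead of printing it.
checkSign <- function(x) {
  # || short-circuits, so length(x) is only inspected for numeric input
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)
checkSign(0)
# tryCatch() captures the error message instead of halting the script
tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
```

Returning a value rather than calling cat() makes the result easy to reuse in other code; which style is better depends on how the function will be used.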


Control flow summary

Key points:
code can be conditionally executed
conditions can be nested
conditional execution is often used for argument checking, among other things

Functions(1) introduced in this section:
cat: concatenates and prints R objects
if: execute code only if condition is met
else: used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame

2. Insert a break point with browser() and step through your function


Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
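The check class(df) != "data.frame" compares every element of the class vector, so it misbehaves for objects that carry more than one class. The sketch below is one possible alternative, not the workshop's official solution: it uses is.data.frame(), which always returns a single TRUE or FALSE. The names statsum2 and "myframe" are made up for this illustration.

```r
# statsum2 is a hypothetical name; the body follows the prototype above,
# except for the argument check and the named list that is returned.
statsum2 <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means = means, counts = counts)
}

# an object whose class vector has two elements still passes the check
iris_sub <- iris
class(iris_sub) <- c("myframe", "data.frame")  # "myframe" is a made-up class
res <- statsum2(iris_sub)
res$means

# a plain vector is rejected with an error
tryCatch(statsum2(1:10), error = function(e) conditionMessage(e))
```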


The S3 object class system

R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
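When an object has several classes, dispatch tries them in the order they appear in the class vector. The sketch below illustrates this with two print methods, print.A and print.B, which are hypothetical and defined only for this example. It also uses inherits() to test class membership.

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "B")    # is "B" among the classes of x?
inherits(x, "foo")  # "foo" is not

# hypothetical print methods, defined only for this illustration
print.A <- function(x, ...) cat("printing as class A\n")
print.B <- function(x, ...) cat("printing as class B\n")

print(x)            # dispatches on the first class, "A"
class(x) <- c("B", "A")
print(x)            # now "B" comes first, so print.B is used
```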


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
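With only a matrix method defined, calling disp() on anything else fails with a "no applicable method" error. A common remedy is a default method, which UseMethod() falls back to when no class-specific method is found. The sketch below assumes a simple fallback that announces itself and then calls print(); the message wording is an illustration only.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# assumed fallback: used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("no disp method for class", class(x)[1], "- using print()\n")
  print(x)
}

disp(matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2))  # uses disp.matrix
disp(letters[1:3])                                     # uses disp.default
```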


S3 classes summary

Key points:
there are several class systems in R, of which S3 is the oldest and simplest
objects have class and functions have corresponding methods
the class of an object can be set by simple assignment
S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot: creates a graphical display, the type of which depends on the class of the object being plotted
methods: lists the methods defined for a function or class
UseMethod: the body of a generic function
invisible: returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables

2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotcha's

There are an unfortunately large number of surprises in R programming
Some of these "gotcha's" are common problems in other languages, many are unique to R
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
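One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description of the difference rather than FALSE. A minimal sketch of the usual idiom, wrapping the call in isTRUE():

```r
a <- 0.1
b <- 0.3 / 3

a == b                   # exact comparison
isTRUE(all.equal(a, b))  # comparison within a small tolerance

# all.equal() returns text describing the difference when values differ,
# so wrap it in isTRUE() before using it as a condition
if (isTRUE(all.equal(1, 1.1))) {
  cat("equal\n")
} else {
  cat("not equal\n")
}
```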


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
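The two solutions named above can be sketched as follows, reusing the same vector x:

```r
x <- c(1:10, NA, 12:20)

mean(x)                # NA, because one value is unknown
mean(x, na.rm = TRUE)  # drop missing values before averaging
sd(x, na.rm = TRUE)

is.na(x)               # TRUE only at the position of the NA
sum(is.na(x))          # number of missing values
x[!is.na(x)]           # keep only the observed values
```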


Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
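The practical fix is to combine the values into a single vector before calling mean(). In the sketch below, the first call treats only its first argument as data; the other values are passed along as further arguments rather than being averaged.

```r
mean(1, 2, 3, 4, 5)     # only the first argument, 1, is treated as data
mean(c(1, 2, 3, 4, 5))  # combine the values into one vector first
sum(1, 2, 3, 4, 5)      # sum() adds up all of its arguments
sum(c(1, 2, 3, 4, 5))   # and also accepts a single vector
```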


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6


Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

A loop is a collection of commands that are run over and over again
A for loop runs the code a fixed number of times, or on a fixed set of objects
A while loop runs the code until a condition is met
If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
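When the number of results is known in advance, the storage object can be created at its final size instead of being grown one element at a time. The following is a sketch of that variation, assuming five results as in the example above:

```r
n <- 5
Result <- vector("list", n)  # a list with n empty (NULL) slots
for (i in seq_len(n)) {      # seq_len(n) is 1, 2, ..., n
  Result[[i]] <- 1:i         # fill slot i with the sequence 1 to i
}
length(Result)
Result[[3]]
```

seq_len(n) is used instead of 1:n so that the loop body is skipped entirely when n is zero.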


Word of caution: don't overuse loops!

Most operations in R are vectorized; this makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
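In keeping with the earlier caution about overusing loops, the same three steps can be sketched without any loop by using sapply(). The object names below (iris_classes, iris_num, iris_means) are chosen for this illustration.

```r
iris_classes <- sapply(iris, class)          # step 1: class of each column
iris_num <- iris[iris_classes == "numeric"]  # step 2: keep numeric columns
iris_means <- sapply(iris_num, mean)         # step 3: mean of each column
iris_means
```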


Workshop overview and materials

Workshop description

This is an intermediate/advanced R course
Appropriate for those with basic knowledge of R
Learning objectives:

Index data objects by position, name or logical condition
Understand looping and branching
Write your own simple functions
Debug functions
Understand and use the S3 object system


Running example

Throughout this workshop we will return to a running example that involves calculating descriptive statistics for every column of a data.frame. We will often use the built-in iris data set. You can load the iris data by evaluating data(iris) at the R prompt.

Our main example today consists of writing a statistical summary function that calculates the min, mean, median, max, sd, and n for all numeric columns in a data.frame, the correlations among these variables, and the counts and proportions for all categorical columns. Typically I will describe a topic and give some generic examples, then ask you to use the technique to start building the summary.


Materials and setup

Lab computer users:
USERNAME dataclass
PASSWORD dataclass

Download materials from http://projects.iq.harvard.edu/rtc/r-prog
Scroll to the bottom of the page and download the r-programming.zip file
Move it to your desktop and extract


Data types

Vectors and data classes

Values can be combined into vectors using the c() function

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class which determines how functions treat them

> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"


Vector conversion and info

Vectors can be converted from one class to another

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and str() (structure) of vectors

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for:

Modeling (automatically contrast coded)
Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors
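A short sketch of what "stored as numbers with character labels" means in practice. The vector sizes and its values are made up for this illustration.

```r
# a factor whose levels are given in a meaningful, non-alphabetical order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))
sizes
as.integer(sizes)  # the underlying integer codes
levels(sizes)      # the character labels
sort(sizes)        # sorted by level order, not alphabetically
table(sizes)       # counts, reported in level order
```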


Lists and data.frames

A data.frame is a list of vectors, each of the same length
A list is a collection of objects each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e


Data types summary

Key points:
vector classes include numeric, logical, character, and factors
vectors can be combined into lists or data.frames
a data.frame can almost always be thought of as a list of vectors of equal length
a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
c: combine elements
as.numeric: convert an object (e.g., a character vector) to numeric
data.frame: combine objects into a data.frame
ls: list the objects in the workspace
class: get the class of an object
str: get the structure of an object
length: get the number of elements in an object
mean: calculate the mean of a vector


Exercise 0

1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]

2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]

3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110


Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (such as the empty column index in M[1:3, ] above) return all values.


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':   5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
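If matrix-style indexing is preferred but a data.frame is still wanted back, the drop argument of the extraction operator can be set to FALSE. A minimal sketch, recreating DF so the example is self-contained:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

str(DF[, 1])                # matrix-style: a plain vector
str(DF[, 1, drop = FALSE])  # matrix-style, but kept as a data.frame
str(DF[1])                  # list-style: a data.frame with one column
```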


Extraction/replacement summary

Key points:

elements of objects can be extracted or replaced using the [ operator
objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
vectors and lists have only one dimension, and hence only one index is used
matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[: extraction operator, used to extract/replace object elements
names: get the names of an object, usually a vector, list, or data.frame
print: print an object


Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across its rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
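The second argument to apply() selects the dimension to work over: 1 for rows, 2 for columns. The sketch below shows both on the same matrix, along with the base R shortcuts rowMeans() and colMeans().

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, mean)  # one mean per row (5 values)
apply(M, 2, mean)  # one mean per column (4 values)

rowMeans(M)        # same as apply(M, 1, mean)
colMeans(M)        # same as apply(M, 2, mean)
```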


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.


Applying functions summary

Key points:
R has convenient methods for applying functions to matrices, lists, and data.frames
other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
matrix: create a matrix (vector with two dimensions)
apply: apply a function to the rows or columns of a matrix
sapply: apply a function to the elements of a list
is.numeric: returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output.
  • If you have a specific analysis or transformation you want to do on different data, use a function.
  • Functions are defined using the function() function.
  • Functions can be defined with any number of named arguments.
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
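As a rough illustration of named arguments, the sketch below defines a hypothetical helper called rescale (not part of the workshop materials) whose lower and upper arguments have default values and can be overridden by name.

```r
# 'rescale' is a hypothetical helper; 'lower' and 'upper' have default values
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x)  # smallest and largest value of x
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

scaled_default <- rescale(c(2, 4, 6))              # uses the defaults: 0 to 1
scaled_custom  <- rescale(c(2, 4, 6), upper = 10)  # arguments can be matched by name

print(scaled_default)
print(scaled_custom)
```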


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message(), or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() {         # define function f
+   print("setting x to 1") # print a text string
+   x <- 1                  # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
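A self-contained version of the same idea, assuming a fresh session. The function name setLocalX is hypothetical; the point is only that the assignment inside the function does not touch the global x.

```r
x <- 100  # x in the global environment

setLocalX <- function() {
  x <- 1  # creates a new, local x; the global x is untouched
  x       # the value of the last expression is returned
}

y <- setLocalX()

print(y)  # value returned by the function
print(x)  # global x is unchanged
```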


Writing functions example

Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x)            # multiply the x argument by itself
+ }                        # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10)    # square the value 10
[1] 100
> square(1:5)   # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)
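The debugging flag can be inspected without entering the interactive browser. This is a small sketch with a throwaway function named half (an assumption for illustration); isdebugged() is the base R function that reports the flag.

```r
# 'half' is a throwaway function used only to show the debugging flag
half <- function(x) x / 2

debug(half)                  # the next call to half() would be stepped through
flag_on <- isdebugged(half)

undebug(half)                # back to normal execution
flag_off <- isdebugged(half)

result <- half(10)           # runs normally because the flag is off
print(c(flag_on, flag_off))
```

Stepping itself is interactive, so it is best tried at the R prompt: call the function while the flag is on, press Enter (or n) to advance, and Q to quit.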


Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)                     # class of each column
  means <- sapply(df[classes == "numeric"], mean)  # mean of each numeric column
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)                     # class of each column
  means <- sapply(df[classes == "numeric"], mean)  # mean of each numeric column
  counts <- sapply(df[classes == "factor"], table) # level counts of each factor
  return(list(means, counts))
}

statsum(iris)


Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.


Control flow examples

Goal: write a function that tells us if a number is positive or negative.

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) {  # define function "isPositive"
+   if (x > 0) {               # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else {                   # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ }                            # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) {  # define function "isPositive"
+   if (x > 0) {               # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) {       # otherwise, if x is zero
+     cat(x, "is zero \n")     # say so!
+   } else {                   # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ }                            # end function definition
>

Test the new function:

> isPositive(0)   # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!).


Control flow examples

Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) {  # define function "isPositive"
+   if (!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {        # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) {       # otherwise, if x is zero
+     cat(x, "is zero \n")     # say so!
+   } else {                   # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ }                            # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
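When a function's result will be used by other code, it is often more convenient to return a value and signal bad input with stop() rather than printing with cat(). The sketch below is one possible variant; signLabel is a hypothetical name, not a function from the workshop.

```r
# 'signLabel' is a hypothetical variant that returns a string instead of printing
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

labels <- c(signLabel(10), signLabel(0), signLabel(-1))

# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))

print(labels)
print(msg)
```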


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  cat   concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

2. Insert a break point with browser() and step through your function.


Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!") # argument check
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10) # error: df must be a data.frame!
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  browser() # execution pauses here so you can step through the rest
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

  • R has two major object systems:
      - Relatively informal "S3" classes
      - Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
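When an object has several classes, inherits() is a convenient way to test for one of them, and unclass() removes the class attribute again. A brief sketch, using the same made-up class names "A" and "B":

```r
x <- 1:10
class(x) <- c("A", "B")   # x now has two classes, consulted in this order

has_A <- inherits(x, "A") # is "A" among the classes of x?
has_C <- inherits(x, "C") # "C" was never assigned

plain <- unclass(x)       # strip the class attribute again

print(c(has_A, has_C))
print(class(plain))
```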


Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) {                 # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))         # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")                 # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
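To see dispatch at work on user-defined classes, here is a small sketch with a hypothetical generic named area and two made-up classes, "circle" and "rectangle". None of these names come from the workshop; they only illustrate the generic/method/default pattern.

```r
# 'area' is a hypothetical generic; "circle" and "rectangle" are made-up classes
area <- function(shape, ...) {
  UseMethod("area")  # dispatch on the class of 'shape'
}

area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h
area.default   <- function(shape, ...) stop("don't know how to compute the area")

# structure() attaches the class attribute to a plain list
shape_c <- structure(list(r = 1), class = "circle")
shape_r <- structure(list(w = 2, h = 3), class = "rectangle")

areas <- c(area(shape_c), area(shape_r))
print(areas)
```

The default method is what runs when no class-specific method matches, so it is a natural place for an informative error.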


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  dispatches to the method for the object's class; forms the body of a generic function
  invisible  returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.

2. Modify your function so that it returns a list of class "statsum".

3. Write a print method for the statsum class.


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # standard deviations
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))           # prepend the new class
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
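One way to sanity-check a summary function like this is to compare a single entry against a direct calculation. The sketch below is a variant of the exercise function, not the workshop's own solution: the list element names (numeric, counts) and the use of is.numeric()/is.factor() are choices made for this illustration.

```r
# Variant of the exercise function with named list elements (names are illustrative)
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  is_num <- sapply(df, is.numeric)
  is_fac <- sapply(df, is.factor)
  out <- list(
    numeric = cbind(means = sapply(df[is_num], mean),
                    sds   = sapply(df[is_num], sd)),
    counts  = lapply(df[is_fac], table)
  )
  class(out) <- c("statsum", class(out))
  out
}

res <- statsum(iris)

# Pull one value out of the summary and compare it with a direct calculation
sepal_sd <- res$numeric["Sepal.Length", "sds"]
print(all.equal(unname(sepal_sd), sd(iris$Sepal.Length)))
```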


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
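One detail worth knowing: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so inside if() it is usually wrapped in isTRUE(). A minimal sketch (variable names are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

exact_equal <- (a == b)                 # exact comparison of the stored doubles
near_equal  <- isTRUE(all.equal(a, b))  # comparison within a small tolerance

# all.equal() gives a description, not FALSE, when the values differ,
# so isTRUE() turns the result into a plain TRUE/FALSE
differs <- isTRUE(all.equal(1, 1.1))

print(c(exact_equal, near_equal, differs))
```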


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), max(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
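Both remedies can be sketched on the same vector; the variable names below are illustrative only.

```r
x <- c(1:10, NA, 12:20)

n_missing  <- sum(is.na(x))          # count the missing values
mean_clean <- mean(x, na.rm = TRUE)  # drop NAs before averaging
x_complete <- x[!is.na(x)]           # or remove them explicitly

print(n_missing)
print(mean_clean)
```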


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
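The argument lists explain the difference: mean() averages only its first argument x, while sum() adds up everything passed through the dots. Combining the values with c() first gives the expected mean. A short sketch (variable names are illustrative):

```r
wrong_mean <- mean(1, 2, 3, 4, 5)     # only the first argument, 1, is treated as the data
right_mean <- mean(c(1, 2, 3, 4, 5))  # combine the values into one vector first
total      <- sum(1, 2, 3, 4, 5)      # sum() adds everything passed via ...

print(c(wrong_mean, right_mean, total))
```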


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
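An alternative idiom converts the levels once and then indexes them by the factor. The sketch below compares it with the approach shown in the slide; the variable names are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.integer(x)                # internal level codes
values  <- as.numeric(as.character(x))  # the numbers the labels stand for
values2 <- as.numeric(levels(x))[x]     # same result, converting each level only once

print(codes)
print(values)
```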


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code for as long as a condition remains true.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) {              # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> ## While-loop example: rolling dice
> set.seed(15)     # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0        # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")          # print the result
+ }                                         # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
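Because the dice example depends on the random number generator, the exact sequence of rolls can vary between R versions. A deterministic sketch of the same while-loop pattern, with made-up variable names, counts how many doublings it takes to exceed 1000:

```r
value <- 1
n_doublings <- 0

while (value <= 1000) {  # the condition is checked before every pass
  value <- value * 2
  n_doublings <- n_doublings + 1
}

cat("Reached", value, "after", n_doublings, "doublings\n")
```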


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list()     # create an object to store the results
> for (i in 1:5) {     # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result               # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
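When the number of results is known in advance, the container can be created at its final length before the loop starts. This is a sketch of that variation, not the workshop's own code; vector("list", n) and seq_len() are base R functions.

```r
n <- 5
Result <- vector("list", n)  # list of length n, every element NULL

for (i in seq_len(n)) {      # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i
}

lengths_of_result <- sapply(Result, length)
print(lengths_of_result)
```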


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                  # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x)                  # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5   # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {              # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.


Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333


  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplemental)
Page 5: Introduction to R Programming

Workshop overview and materials

Running example

Throughout this workshop we will return to a running example that involves calculating descriptive statistics for every column of a data.frame. We will often use the built-in iris data set. You can load the iris data by evaluating data(iris) at the R prompt.

Our main example today consists of writing a statistical summary function that calculates the min, mean, median, max, sd, and n for all numeric columns in a data.frame, the correlations among these variables, and the counts and proportions for all categorical columns. Typically I will describe a topic and give some generic examples, then ask you to use the technique to start building the summary.


Materials and setup

Lab computer users:
  USERNAME  dataclass
  PASSWORD  dataclass

Download materials from http://projects.iq.harvard.edu/rtc/r-prog
  • Scroll to the bottom of the page and download the r-programming.zip file
  • Move it to your desktop and extract


Data types

Vectors and data classes

Values can be combined into vectors using the c() function:

> numvar <- c(1, 2, 3, 4)              # numeric vector
> charvar <- c("1", "2", "3", "4")     # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar)       # numbers converted to character
>

Vectors have a class which determines how functions treat them:

> class(numvar)
[1] "numeric"
> mean(numvar)  # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"


Vector conversion and info

Vectors can be converted from one class to another:

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2))      # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c"))    # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and structure (str()) of vectors:

> ls()            # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2)    # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for:
  • Modeling (automatically contrast coded)
  • Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.
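A small sketch of both properties, using made-up data: the levels argument fixes an arbitrary order, the labels are characters, and the stored values are integer codes.

```r
# 'sizes' is illustrative data; 'levels' fixes the order used for sorting and display
sizes <- factor(c("medium", "small", "large", "small"),
                levels = c("small", "medium", "large"))

level_names <- levels(sizes)         # character labels
codes  <- as.integer(sizes)          # numbers stored underneath
sorted <- as.character(sort(sizes))  # sorted by level order, not alphabetically
counts <- table(sizes)               # counts reported in level order

print(codes)
print(sorted)
```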


Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e


Data types summary

Key points:
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
  c           combine elements
  as.numeric  convert an object (e.g., a character vector) to numeric
  data.frame  combine objects into a data.frame
  ls          list the objects in the workspace
  class       get the class of an object
  str         get the structure of an object
  length      get the number of elements in an object
  mean        calculate the mean of a vector


Exercise 0

1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]

2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]

3. Determine the class of "students" and "test" [ class() or str() ]

4. Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]

5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)]   # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1    # change the 4th value to 1
> x            # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x)                  # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")]            # extract the values of a and f from x
  a   f
101 106


Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
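Logical indexing works for replacement as well as extraction. The following self-contained sketch starts from a fresh copy of x (without the earlier change to the fourth element); the variable names big and selected are illustrative.

```r
x <- 101:110
names(x) <- letters[1:10]

big      <- x[x > 106 & x <= 108]          # both conditions must hold
selected <- x[names(x) %in% c("a", "b")]   # match on names
x[x < 103] <- 0L                           # replace by logical condition

print(big)
print(x)
```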


Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values.
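Matrix columns can also be selected by name, and drop = FALSE keeps a single column in matrix form. A short sketch on the same matrix; the result names are illustrative.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

by_name  <- M[, "z"]                       # a column by name, all rows
one_col  <- M[, "z", drop = FALSE]         # drop = FALSE keeps the matrix shape
filtered <- M[M[, "z"] > 3, c("x", "y")]   # rows where z > 3, columns x and y

print(dim(one_col))
print(filtered)
```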


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':   5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
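If a one-column data.frame is wanted from matrix-style indexing, the drop argument of the [ operator can be set to FALSE. A short sketch, assuming the DF object defined earlier in the workshop (col_vec and col_df are illustrative names):

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

col_vec <- DF[, 1]                # drops to a plain vector
col_df  <- DF[, 1, drop = FALSE]  # stays a data.frame with one column

is.data.frame(col_vec)  # FALSE
is.data.frame(col_df)   # TRUE
```

This matters mostly inside functions, where code that expects a data.frame can fail if a single selected column silently becomes a vector.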


Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[ extraction operator, used to extract/replace object elements
names get the names of an object, usually a vector, list, or data.frame
print print an object


Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[, "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, print(mean(Sepal.Length[Species == "setosa"])))

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements


The apply function

The apply function is used to apply a function to the rows or columns of a matrix:
> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
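Both calls above use a second argument of 2, which selects columns. As a small sketch of the contrast, the second argument (MARGIN) can be 1 for rows or 2 for columns; row_means and col_means are names chosen for this illustration:

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)  # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)  # MARGIN = 2: one result per column

row_means  # 8.5 9.5 10.5 11.5 12.5
col_means  # 3 8 13 18

# rowMeans() and colMeans() give the same answers for this common case
all.equal(row_means, rowMeans(M))
all.equal(col_means, colMeans(M))
```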


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.


Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
matrix create a matrix (vector with two dimensions)
apply apply a function to the rows or columns of a matrix
sapply apply a function to the elements of a list
is.numeric returns TRUE or FALSE, depending on the type of object
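The other apply-style functions mentioned above are not demonstrated in the slides. A brief sketch of each on the built-in iris data; the variable names are illustrative only:

```r
# lapply: like sapply, but always returns a list
col_classes <- lapply(iris, class)

# tapply: apply a function to groups defined by a factor
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function over several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

str(col_classes)
mean_by_species
sums  # 5 7 9
```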


Writing functions


Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
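To make the points about named arguments concrete, here is a minimal sketch. The function rescale() and its arguments center and scale_by are made up for this illustration and are not part of the workshop materials:

```r
# rescale() is a hypothetical example with two optional named arguments
rescale <- function(x, center = TRUE, scale_by = 1) {
    if (center) {
        x <- x - mean(x)   # subtract the mean when center is TRUE
    }
    x / scale_by           # value of the last expression is returned
}

rescale(c(1, 2, 3))                  # -1 0 1
rescale(c(1, 2, 3), center = FALSE)  # 1 2 3
rescale(c(2, 4, 6), scale_by = 2)    # -1 0 1
```

Arguments with defaults can be omitted, and naming them in the call makes the order irrelevant.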


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message(), or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+     print("setting x to 1") # print a text string
+     x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+     return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
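The debugging tools are interactive, so they are hard to show on a slide. A minimal sketch of how they might be used; the function names inner_fun and outer_fun are hypothetical, and the debugger calls are commented out so that the code also runs non-interactively:

```r
inner_fun <- function(x) {
    if (!is.numeric(x)) stop("x must be numeric")
    sqrt(x)
}
outer_fun <- function(x) inner_fun(x) + 1

# Capture the error message instead of halting, so the script keeps running
result <- tryCatch(outer_fun("a"), error = function(e) conditionMessage(e))
print(result)

# In an interactive session you could instead try:
# outer_fun("a")      # triggers the error
# traceback()         # shows the call stack leading to the error
# debug(inner_fun)    # step through inner_fun the next time it is called
# undebug(inner_fun)  # turn stepping off again
```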


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
function defines a new function
return used inside a function definition to set the return value
browser sets a break point
debug turns on the debugging flag of a function so you can step through it
undebug turns off the debugging flag
traceback shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    counts <- sapply(df[classes == "factor"], table)
    return(list(means, counts))
}

statsum(iris)


Control flow


  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+     if (x > 0) { # if x is greater than zero, then
+         cat(x, "is positive \n") } # say so!
+     else { # otherwise
+         cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+     if (x > 0) { # if x is greater than zero, then
+         cat(x, "is positive \n") } # say so!
+     else if (x == 0) { # otherwise if x is zero
+         cat(x, "is zero \n")} # say so!
+     else { # otherwise
+         cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+     if(!is.numeric(x) | length(x) > 1) {
+         cat("x must be a numeric vector of length one! \n")}
+     else if (x > 0) { # if x is greater than zero, then
+         cat(x, "is positive \n") } # say so!
+     else if (x == 0) { # otherwise if x is zero
+         cat(x, "is zero \n")} # say so!
+     else { # otherwise
+         cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions[1] introduced in this section:
cat concatenates and prints R objects
if execute code only if condition is met
else used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment


Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    counts <- sapply(df[classes == "factor"], table)
    return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    browser()
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    counts <- sapply(df[classes == "factor"], table)
    return(list(means, counts))
}

statsum(iris)
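A note on the class check: class(df) can return more than one value (for example, for objects that extend data.frame), in which case comparing it with != inside if() is fragile. A hedged alternative is to test with is.data.frame() or inherits(). The function name check_df and the class name my_df below are hypothetical.

```r
check_df <- function(df) {
    if (!is.data.frame(df)) stop("df must be a data.frame!")
    invisible(TRUE)
}

# An object with an extra class in front of "data.frame"
df2 <- iris
class(df2) <- c("my_df", "data.frame")

length(class(df2))           # 2, so class(df2) != "data.frame" is not a single TRUE/FALSE
inherits(df2, "data.frame")  # TRUE
check_df(df2)                # passes silently
```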


The S3 object class system


  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
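When an object has several classes, inherits() is a convenient way to ask whether a given class is among them. A short sketch using the same x as above:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")        # TRUE
inherits(x, "B")        # TRUE
inherits(x, "integer")  # FALSE: only the assigned classes are reported
unclass(x)              # strips the class attribute, giving back the plain integers
```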


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+     if(is.numeric(x)) {
+         cat("The average is", mean.default(x))
+         return(invisible(mean.default(x))) # use mean.default for numeric
+     } else
+         cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+     UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+     print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
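With only a matrix method defined, calling disp() on anything else fails because no applicable method is found. A sketch of adding a fallback: disp.default is the conventional name for a default method, and having it simply call print() is an assumption made for illustration.

```r
disp <- function(x, ...) {
    UseMethod("disp")
}

disp.matrix <- function(x, ...) {
    print(round(x, digits = 2))
}

# Fallback used when no class-specific method exists
disp.default <- function(x, ...) {
    print(x)
}

disp(matrix(c(1.234, 5.678), ncol = 2))  # dispatches to disp.matrix
disp("hello")                            # dispatches to disp.default
```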


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot creates a graphical display, the type of which depends on the class of the object being plotted
methods lists the methods defined for a function or class
UseMethod the body of a generic function
invisible returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    sds <- sapply(df[classes == "numeric"], sd)
    counts <- sapply(df[classes == "factor"], table)
    return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
    if(class(df) != "data.frame") stop("df must be a data.frame!")
    classes <- sapply(df, class)
    means <- sapply(df[classes == "numeric"], mean)
    sds <- sapply(df[classes == "numeric"], sd)
    counts <- sapply(df[classes == "factor"], table)
    R <- list(cbind(means, sds), counts)
    class(R) <- c("statsum", class(R))
    return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
    cat("Numeric variable descriptive statistics:\n")
    print(x[[1]], digits=2)
    cat("Factor variable counts:\n")
    print(x[[2]])
}

statsum(iris)


Things that may surprise you


Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
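One caveat worth knowing: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE() when used inside if(). A small sketch (a and b are illustrative names):

```r
a <- .1
b <- .3 / 3

a == b                     # FALSE: exact comparison
isTRUE(all.equal(a, b))    # TRUE: equal within a small tolerance
isTRUE(all.equal(a, 0.2))  # FALSE, whereas all.equal(a, 0.2) alone returns a message

if (isTRUE(all.equal(a, b))) {
    cat("a and b are equal up to numerical tolerance\n")
}
```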


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values
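The two solutions named above can be sketched as follows, using the same x as in the example:

```r
x <- c(1:10, NA, 12:20)

mean(x)                # NA
mean(x, na.rm = TRUE)  # mean of the 19 non-missing values

is.na(x)[11]           # TRUE: the 11th element is missing
sum(is.na(x))          # number of missing values
x[!is.na(x)]           # keep only the non-missing values
```

Testing with x == NA does not work, because any comparison with NA is itself NA.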


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
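The difference follows from the argument lists: mean() averages only its first argument x and does not treat the remaining values as data, while sum() adds up everything passed through its ... argument. Combining the values with c() first makes both behave as expected (vals is an illustrative name):

```r
vals <- c(1, 2, 3, 4, 5)

mean(vals)           # 3
sum(vals)            # 15
mean(1, 2, 3, 4, 5)  # 1: only the first argument is treated as data
```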


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6


Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)


Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop keeps running the code for as long as its condition is true, i.e., until a stopping condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+     cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+     roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+     cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+     Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
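When the number of results is known in advance, a common variation is to create the list at its final size before the loop instead of growing it. A sketch of that approach, with n introduced here as an illustrative variable:

```r
n <- 5
Result <- vector("list", n)  # a list of n empty (NULL) slots
for (i in seq_len(n)) {      # seq_len(n) is safe even when n is 0
    Result[[i]] <- 1:i       # fill in slot i
}
lengths(Result)              # 1 2 3 4 5
```

The result is the same as in the slide; the difference is only that the container is allocated once.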


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##     cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!


Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+     classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+     irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 6: Introduction to R Programming

Workshop overview and materials

Materials and setup

Lab computer users:
USERNAME dataclass
PASSWORD dataclass

Download materials from http://projects.iq.harvard.edu/rtc/r-prog
  • Scroll to the bottom of the page and download the r-programming.zip file
  • Move it to your desktop and extract


Data types


Vectors and data classes

Values can be combined into vectors using the c() function

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class which determines how functions treat them

> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"


Vector conversion and info

Vectors can be converted from one class to another

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and str()ucture of vectors

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for:
  • Modeling (automatically contrast coded)
  • Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors
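The slide has no example, so here is a small sketch of the "stored as numbers, labelled with characters" idea and of sorting in a chosen order. The vector sizes and its values are invented for this illustration:

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)      # "small" "medium" "large"
as.integer(sizes)  # 1 3 2 1: the stored codes
sort(sizes)        # sorted by level order, not alphabetically
table(sizes)       # counts per level, in level order
```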


Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e


Data types summary

Key points:
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
c combine elements
as.numeric convert an object (e.g., a character vector) to numeric
data.frame combine objects into a data.frame
ls list the objects in the workspace
class get the class of an object
str get the structure of an object
length get the number of elements in an object
mean calculate the mean of a vector


Exercise 0

1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Data types

Exercise 0 prototype

1 Create a new vector called "test" containing five numbers of your choice
test <- c(1, 2, 3, 4, 5)

2 Create a second vector called "students" containing five common names of your choice
students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of "students" and "test"
class(students)
class(test)

4 Create a data frame containing two columns, "students" and "test", as defined above
testScores <- data.frame(students, test)

5 Convert "test" to character class, and confirm that you were successful
test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==     equal to
!=     not equal to
>      greater than
<      less than
>=     greater than or equal to
<=     less than or equal to
%in%   is included in
&      and
|      or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
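The examples above only select elements; the same logical indices also work on the left-hand side of an assignment. A small sketch of replacement by condition, using an invented vector `temps`:

```r
# Replace elements chosen by a logical condition (values invented for illustration).
temps <- c(mon = 18, tue = 25, wed = 31, thu = 12)
temps[temps > 24] <- 24            # cap every value above 24
temps

# Conditions can be combined, and %in% matches against a set of names.
temps[temps < 15 | names(temps) %in% c("mon")] <- NA
is.na(temps)                       # TRUE for mon and thu only
```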


Extracting and replacing object elements

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.


Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5


Extracting and replacing object elements

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
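The difference between the two forms can be checked with is.data.frame(). The sketch below also uses the drop = FALSE argument of the [ operator, which the slide does not cover, to keep the data.frame structure under matrix-style indexing; the object name `d` is made up for this example.

```r
d <- data.frame(x = 1:5, y = letters[1:5])

is.data.frame(d[1])                   # TRUE: list-style indexing keeps a data.frame
is.data.frame(d[, 1])                 # FALSE: matrix-style indexing drops to a vector
is.data.frame(d[, 1, drop = FALSE])   # TRUE: drop = FALSE keeps the data.frame
identical(d[, 1], d[[1]])             # TRUE: both give the column as a vector
```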


Extracting and replacing object elements

Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[       extraction operator, used to extract/replace object elements
names   get the names of an object, usually a vector, list, or data.frame
print   print an object


Extracting and replacing object elements

Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

2 Calculate the mean of the Sepal.Length column in iris2

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Extracting and replacing object elements

Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2
mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
    print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length
mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
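The second argument of apply() selects the dimension: 1 for rows, 2 for columns. A short sketch contrasting the two on the same 5-by-4 matrix (the names `m`, `row_means` and `col_means` are chosen here for illustration):

```r
m <- matrix(1:20, ncol = 4)      # 5 rows, 4 columns, filled column by column

row_means <- apply(m, 1, mean)   # MARGIN = 1: one result per row
col_means <- apply(m, 2, mean)   # MARGIN = 2: one result per column

row_means   # 8.5 9.5 10.5 11.5 12.5
col_means   # 3 8 13 18

# rowMeans() and colMeans() give the same numbers for this case.
all.equal(row_means, rowMeans(m))
all.equal(col_means, colMeans(m))
```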


Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.


Applying functions to list elements

Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
matrix       create a matrix (vector with two dimensions)
apply        apply a function to the rows or columns of a matrix
sapply       apply a function to the elements of a list
is.numeric   returns TRUE or FALSE, depending on the type of object
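As a brief illustration of the other apply-style functions named in the key points, the sketch below shows one call to each. The example inputs are chosen here and are not from the workshop; see each function's help page for the full argument list.

```r
# lapply(): like sapply(), but always returns a list
lens <- lapply(list(a = 1:3, b = letters), length)
lens$b                                   # 26

# tapply(): apply a function to groups defined by a factor
sepal_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)
round(sepal_by_species, 3)               # setosa 5.006, versicolor 5.936, virginica 6.588

# mapply(): apply a function over several vectors in parallel
mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9
```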


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists ...)


Writing functions

Function return value

  • The return value of a function can be:
      • The value of the last expression evaluated in the body of the function
      • Objects explicitly returned with the return() function
  • Other function output can come from:
      • Calls to print(), message() or cat() in the function body
      • Error messages
  • Assignment inside the body of a function takes place in a local environment
  • Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
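The point about local assignment can be checked with a second, smaller sketch. The names `counter` and `bump` are invented for this illustration:

```r
counter <- 10                     # a variable in the global environment

bump <- function() {
  counter <- counter + 1          # creates a local variable; the global is untouched
  counter                         # value of the last expression is returned
}

result <- bump()
result                            # 11
counter                           # still 10
```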


Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
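Stepping through a function is interactive, so it cannot be shown fully in a script. The sketch below only demonstrates how the debugging flag is switched on and off; the function `half` is a made-up example, and the traceback() call is left as a comment because it is only meaningful at the console after an error.

```r
half <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x / 2
}

debug(half)         # the next call to half() would open the step-through browser
isdebugged(half)    # TRUE
undebug(half)       # switch the flag off again
isdebugged(half)    # FALSE

# After an error at the console, traceback() prints the chain of calls that led to it:
# half("a")
# traceback()
```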


Writing functions

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
function    defines a new function
return      used inside a function definition to set the return value
browser     sets a break point
debug       turns on the debugging flag of a function so you can step through it
undebug     turns off the debugging flag
traceback   shows the error stack (call after an error to see what went wrong)


Writing functions

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Writing functions

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!


Control flow

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions¹ introduced in this section:
cat    concatenates and prints R objects
if     execute code only if condition is met
else   used with if; code to execute if condition is not met

¹ Technically if and else are not functions, but this need not concern us at the moment.

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame

2 Insert a break point with browser() and step through your function


Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
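Because class() can return more than one class name, a comparison such as class(df) != "data.frame" may yield several values. One hedged alternative, which is not part of the original prototype, is to test with inherits(). The function name `checkedMeans` is invented for this sketch, and tryCatch() is used only to capture the error message for display.

```r
checkedMeans <- function(df) {
  # inherits() is TRUE when "data.frame" appears anywhere in class(df)
  if (!inherits(df, "data.frame")) stop("df must be a data.frame!")
  numeric_cols <- sapply(df, is.numeric)
  sapply(df[numeric_cols], mean)
}

checkedMeans(iris)

# stop() raises an error; tryCatch() lets us capture its message
msg <- tryCatch(checkedMeans(1:10), error = function(e) conditionMessage(e))
msg   # "df must be a data.frame!"
```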


The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"


The S3 object class system

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66


The S3 object class system

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class, and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot        creates a graphical display, the type of which depends on the class of the object being plotted
methods     lists the methods defined for a function or class
UseMethod   the body of a generic function
invisible   returns an object but does not print it
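The key points can be put together in one small sketch: a constructor that sets the class, a print method that follows the function.class naming scheme, and a new generic with its own method. The class name "temperature" and the generic `toF` are invented for this illustration and do not appear in the workshop materials.

```r
# constructor: a list tagged with the (invented) class "temperature"
temperature <- function(celsius) {
  obj <- list(celsius = celsius)
  class(obj) <- "temperature"
  obj
}

# print method: function name, dot, class name
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees C\n")
  invisible(x)
}

# a new generic and its method for the same class
toF <- function(x, ...) UseMethod("toF")
toF.temperature <- function(x, ...) x$celsius * 9 / 5 + 32

t1 <- temperature(100)
print(t1)   # Temperature: 100 degrees C
toF(t1)     # 212
```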


The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class


The S3 object class system

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
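One detail worth adding: when the values differ, all.equal() returns a character description rather than FALSE, so it is usually wrapped in isTRUE() before being used as a condition. A short sketch with values chosen here for illustration:

```r
a <- 0.1 + 0.2
a == 0.3                          # FALSE: the two values differ in the last bits
isTRUE(all.equal(a, 0.3))         # TRUE: equal within a small tolerance

# all.equal() returns a character description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it in if().
if (isTRUE(all.equal(a, 0.3))) cat("close enough\n")
isTRUE(all.equal(1, 1.1))         # FALSE
```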


Things that may surprise you

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
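The two solutions named above can be sketched on the same vector:

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # mean of the 19 observed values
sum(is.na(x))            # 1: number of missing values
x[!is.na(x)]             # the observed values only

# x == NA never works as a test; use is.na()
which(is.na(x))          # 11
```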


Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
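Since mean() takes its data in the single argument x, the usual way around this surprise is to combine the values with c() first. A short sketch:

```r
mean(1, 2, 3, 4, 5)      # 1: only the first argument is treated as the data
mean(c(1, 2, 3, 4, 5))   # 3: the whole vector is averaged
sum(1, 2, 3, 4, 5)       # 15
sum(c(1, 2, 3, 4, 5))    # 15
```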


Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
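The same distinction between integer codes and character labels can be seen on another factor. The sketch below uses an invented factor `f` and also shows an equivalent conversion that goes through levels(), which converts each label only once.

```r
f <- factor(c("10", "20", "10", "30"))

levels(f)                    # "10" "20" "30": the character labels
as.integer(f)                # 1 2 1 3: the underlying integer codes
as.numeric(as.character(f))  # 10 20 10 30: the labels converted to numbers
as.numeric(levels(f))[f]     # 10 20 10 30: same result, converting each level once
```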


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Additional resources

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
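When the number of results is known in advance, the holding object can be created at its final size before the loop starts. The sketch below, with names chosen here for illustration, pre-allocates the list with vector("list", n) and then shows that lapply() builds the same list without an explicit loop.

```r
n <- 5
res <- vector("list", n)        # pre-allocate a list with n empty slots
for (i in seq_len(n)) {
  res[[i]] <- 1:i               # fill slot i with the sequence 1..i
}

# lapply() builds the same list without an explicit loop
res2 <- lapply(seq_len(n), function(i) 1:i)
identical(res, res2)            # TRUE
```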


Loops (supplemental)

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Loops (supplemental)

Exercise 5

1 Use a loop to get the class() of each column in the iris data set

2 Use the results from step 1 to select the numeric columns

3 Use a loop to calculate the mean of each numeric column in the iris data


Loops (supplemental)

Exercise 5 prototype

1 Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 Use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 Use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[var] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Introduction To Programming In RLast updated November 20 2013 71

71

Page 7: Introduction to R Programming

Data types


Vectors and data classes

Values can be combined into vectors using the c() function

> numvar <- c(1, 2, 3, 4) # numeric vector
> charvar <- c("1", "2", "3", "4") # character vector
> logvar <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> charvar2 <- c(numvar, charvar) # numbers converted to character
>

Vectors have a class, which determines how functions treat them

> class(numvar)
[1] "numeric"
> mean(numvar) # take the mean of a numeric vector
[1] 2.5
> class(charvar)
[1] "character"
> mean(charvar) # cannot average characters
[1] NA
> class(charvar2)
[1] "character"


Vector conversion and info

Vectors can be converted from one class to another

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and structure (str()) of vectors

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for

  • Modeling (automatically contrast coded)
  • Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors
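The "stored as numbers, labelled with characters" idea can be seen directly. The following is a minimal sketch; the sizes vector and its level order are made up for illustration and are not part of the workshop data.

```r
# Hypothetical example data: a factor with three levels in a chosen order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)        # the character labels, in the order we chose
as.integer(sizes)    # the underlying integer codes
as.character(sizes)  # back to a plain character vector
sort(sizes)          # sorts by level order, not alphabetically
```

Because sorting follows the level order, choosing the levels is how values get presented in an arbitrary order.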


Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e


Data types summary

Key points
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section
  c            combine elements
  as.numeric   convert an object (e.g., a character vector) to numeric
  data.frame   combine objects into a data.frame
  ls           list the objects in the workspace
  class        get the class of an object
  str          get the structure of an object
  length       get the number of elements in an object
  mean         calculate the mean of a vector


Exercise 0

1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Exercise 0 prototype

1 Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2 Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of "students" and "test"

class(students)
class(test)

4 Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5 Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements


Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==     equal to
  !=     not equal to
  >      greater than
  <      less than
  >=     greater than or equal to
  <=     less than or equal to
  %in%   is included in
  &      and
  |      or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
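The examples above only select elements; the same logical indices also work on the left-hand side of an assignment to replace them. A minimal sketch, rebuilding x as it stands at this point in the slides (the replacement values 0 and NA are arbitrary choices for illustration):

```r
# Rebuild the named vector from the slides (x[4] was set to 1 earlier)
x <- 101:110
x[4] <- 1
names(x) <- letters[1:10]

x[x > 106] <- 0                      # replace every element greater than 106
x[names(x) %in% c("a", "b")] <- NA   # replace elements selected by name
x                                    # print the modified vector
```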


Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of M[1:3, ] above) return all values


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
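The single/double bracket distinction applies to names as well as positions. The sketch below rebuilds L as defined earlier; the $ shorthand is not used on the slides and is shown here only as the standard base R equivalent of double brackets with a name.

```r
# Rebuild the list L from the slides
DF <- data.frame(x = 1:5, y = letters[1:5])
L <- list(x = 1:5, y = 1:3, z = DF)

L[["y"]]          # double brackets with a name: the vector itself
L$y               # $ is shorthand for L[["y"]]
L["y"]            # single brackets: still a list, with one element
L[["z"]][["x"]]   # indexing can be chained to reach nested elements
```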


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column

> str(DF[1]) # a data.frame with one column
'data.frame':   5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5


Extraction/replacement summary

Key points
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
  [       extraction operator, used to extract/replace object elements
  names   get the names of an object, usually a vector, list, or data.frame
  print   print an object


Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically), and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically), and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
    print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements


The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## column means (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
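Both calls on the slide use 2 as the second argument, which selects columns. As a small sketch of the other direction, the same matrix can be summarised by row with 1 (the names rowmeans and colmeans are chosen here for illustration):

```r
M <- matrix(1:20, ncol = 4)     # 5 rows, 4 columns, as on the slide

rowmeans <- apply(M, 1, mean)   # second argument 1: one result per row
colmeans <- apply(M, 2, mean)   # second argument 2: one result per column

rowmeans
colmeans
```

For these two particular summaries base R also provides rowMeans() and colMeans(), which give the same values.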


The sapply function

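It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE

(In R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so y is reported as "character" unless DF is created with stringsAsFactors = TRUE.)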


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column


Applying functions summary

Key points
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
  matrix       create a matrix (vector with two dimensions)
  apply        apply a function to the rows or columns of a matrix
  sapply       apply a function to the elements of a list
  is.numeric   returns TRUE or FALSE, depending on the type of object
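The other apply-style functions named in the key points are not demonstrated in the workshop. As a rough sketch of what each one does (the small inputs and the variable names lens, speciesmeans, and sums are made up for illustration):

```r
# lapply: like sapply, but always returns a list
lens <- lapply(list(a = 1:3, b = 1:5), length)

# tapply: apply a function to groups of values defined by a factor
speciesmeans <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

lens
speciesmeans
sums
```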


Writing functions


Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
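The same point can be checked in a fresh session without relying on the x left over from the indexing examples. A minimal sketch, with setLocal as a hypothetical function name:

```r
x <- 100                   # x in the global environment

setLocal <- function() {   # hypothetical function name
  x <- 1                   # creates a new, local x
  x                        # the value of the last expression is returned
}

y <- setLocal()
y   # the value returned by the function
x   # unchanged: the assignment inside the function did not touch it
```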


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact


Writing functions summary

Key points
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section
  function    defines a new function
  return      used inside a function definition to set the return value
  browser     sets a break point
  debug       turns on the debugging flag of a function so you can step through it
  undebug     turns off the debugging flag
  traceback   shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow


  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
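The isPositive() examples print their result with cat(). When the result is needed later in a program, the same branching can return a value instead. The following is a sketch of that variation, not part of the workshop code; signLabel is a hypothetical name, and it signals an error with stop() rather than printing a message.

```r
# Hypothetical helper: returns a label instead of printing it
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(10)
signLabel(0)
signLabel(-1)
```

Because if/else is an expression in R, the value of whichever branch runs becomes the return value of the function.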


Control flow summary

Key points
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section
  cat    concatenates and prints R objects
  if     execute code only if condition is met
  else   used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10) # error: df must be a data.frame!
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser() # execution pauses here; type n to step, c to continue, Q to quit
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system


  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
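With only a matrix method defined, calling disp() on any other kind of object fails because no applicable method is found. One common way to handle this, sketched below as an addition to the slide code, is a method named disp.default, which S3 dispatch falls back to when no class-specific method exists. The message text is an arbitrary choice.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices, as on the slide
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# fallback used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("No disp method for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 5.678), ncol = 2))  # dispatches to disp.matrix
disp("hello")                            # dispatches to disp.default
```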


S3 classes summary

Key points
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
  plot        creates a graphical display, the type of which depends on the class of the object being plotted
  methods     lists the methods defined for a function or class
  UseMethod   the body of a generic function
  invisible   returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you


Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
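One detail matters when all.equal() is used inside if(): when the values differ it returns a character description of the difference rather than FALSE. A short sketch of the usual pattern, wrapping the call in isTRUE() (the variable names a and b are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison fails
isTRUE(all.equal(a, b))    # equal within a small tolerance

# all.equal() returns a character description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it as a condition
if (isTRUE(all.equal(a, 0.2))) "equal" else "different"
```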


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
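Applied to the vector from this slide, the two solutions look roughly like this (a sketch using only base R functions):

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)   # drop the NA before averaging
sum(is.na(x))           # count the missing values
which(is.na(x))         # position of the missing value
x[!is.na(x)]            # keep only the non-missing values
```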


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

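Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

mean() averages only its first argument x (here the single value 1) and passes the remaining values on through ..., while sum() adds up everything passed in ...

Ouch. That is not nice at all!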


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
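An alternative to as.numeric(as.character(x)) is to convert the levels once and then index them by the factor's integer codes. This is offered as a sketch of an equivalent idiom, not as the workshop's recommendation:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(as.character(x))   # approach from the slide
as.numeric(levels(x))[x]      # convert the levels once, then index by the codes
```

Both give the original numbers back; the second converts only the distinct levels rather than every element.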


Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)


Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
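
A minimal sketch of the same idea, assuming the number of iterations is known in advance: the list can be preallocated with vector("list", n) instead of being grown inside the loop, and lapply() builds the same result without an explicit loop. The names n, Result.loop and Result.apply are illustrative and not part of the workshop materials.

```r
n <- 5                                   # number of iterations (assumed known up front)
Result.loop <- vector("list", n)         # preallocate a list with n empty slots
for (i in seq_len(n)) {
  Result.loop[[i]] <- seq_len(i)         # store the sequence 1..i in slot i
}

# The same result without an explicit loop
Result.apply <- lapply(seq_len(n), seq_len)

same <- identical(Result.loop, Result.apply)
print(same)
```

Preallocating matters little for five elements, but it avoids repeated copying when the result is large.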


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!


Exercise 5

1 Use a loop to get the class() of each column in the iris data set
2 Use the results from step 1 to select the numeric columns
3 Use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 8: Introduction to R Programming

Data types

Vectors and data classes

Values can be combined into vectors using the c() function

> num.var <- c(1, 2, 3, 4) # numeric vector
> char.var <- c("1", "2", "3", "4") # character vector
> log.var <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
> char.var2 <- c(num.var, char.var) # numbers converted to character
>

Vectors have a class which determines how functions treat them
> class(num.var)
[1] "numeric"
> mean(num.var) # take the mean of a numeric vector
[1] 2.5
> class(char.var)
[1] "character"
> mean(char.var) # cannot average characters
[1] NA
> class(char.var2)
[1] "character"


Vector conversion and info

Vectors can be converted from one class to another
> class(char.var2)
[1] "character"
> num.var2 <- as.numeric(char.var2) # convert to numeric
> class(num.var2)
[1] "numeric"
> mean(as.numeric(char.var2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and structure (str()) of vectors

> ls() # list objects in our workspace
[1] "char.var"  "char.var2" "log.var"   "num.var"   "num.var2"
> length(char.var) # how many elements in char.var?
[1] 4
> str(num.var2) # what is the structure of num.var2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for:

Modeling (automatically contrast coded)
Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.
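
As a small illustration of "stored as numbers, but with character labels", the sketch below builds a factor with an explicitly chosen level order. The vector name sizes and its values are made up for this example.

```r
# A factor whose levels are given in a deliberate (non-alphabetical) order
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

labels <- levels(sizes)        # the character labels
codes  <- as.integer(sizes)    # the underlying integer codes
sorted <- sort(sizes)          # sorting follows the level order, not the alphabet

print(labels)
print(codes)
print(sorted)
```

Because sorting follows the level order, "medium" sorts before "large" here even though it comes later alphabetically.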


Lists and data.frames

A data.frame is a list of vectors, each of the same length.
A list is a collection of objects, each of which can be almost anything.

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e


Data types summary

Key points:
vector classes include numeric, logical, character, and factors
vectors can be combined into lists or data.frames
a data.frame can almost always be thought of as a list of vectors of equal length
a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
c           combine elements
as.numeric  convert an object (e.g., a character vector) to numeric
data.frame  combine objects into a data.frame
ls          list the objects in the workspace
class       get the class of an object
str         get the structure of an object
length      get the number of elements in an object
mean        calculate the mean of a vector


Exercise 0

1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test", as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Exercise 0 prototype

1 Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2 Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of "students" and "test"

class(students)
class(test)

4 Create a data frame containing two columns, "students" and "test", as defined above

testScores <- data.frame(students, test)

5 Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
==    equal to
!=    not equal to
>     greater than
<     less than
>=    greater than or equal to
<=    less than or equal to
%in%  is included in
&     and
|     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
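
The examples above only extract values; the sketch below shows replacement through a logical index, which works the same way. It rebuilds a fresh copy of x so that it runs on its own; the names big and n.small are illustrative.

```r
x <- 101:110
names(x) <- letters[1:10]

big <- x > 106            # logical vector marking the elements to change
x[big] <- 0L              # replace every element where big is TRUE
n.small <- sum(!big)      # TRUE counts as 1, so this counts the untouched elements
big.pos <- which(big)     # positions (and names) of the TRUE elements

print(x)
print(n.small)
print(big.pos)
```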


Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the first extraction above) return all values.


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
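
A short sketch that makes the single- versus double-bracket distinction checkable, and adds name-based access with [[ and $. It defines its own small list so it can be run independently of the session above; the variable names are illustrative.

```r
L <- list(x = 1:5, y = 1:3)

single <- L[1]        # still a list (of length one)
double <- L[[1]]      # the element itself: an integer vector

single.is.list <- is.list(single)
double.is.list <- is.list(double)

# Elements can also be reached by name; these two forms are equivalent
by.name.same <- identical(L[["x"]], L$x)

# Replacement works through the same operators
L[["y"]] <- 10:12

print(c(single.is.list, double.is.list, by.name.same))
```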


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame': 5 obs. of 1 variable:
 $ x: int 1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
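
When the matrix-style form is preferred but a data.frame is still wanted, the drop argument of [ can be used. This is a minimal sketch with its own copy of DF; col.vec and col.df are illustrative names.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

col.vec <- DF[, 1]                 # matrix-style index: result is simplified to a vector
col.df  <- DF[, 1, drop = FALSE]   # drop = FALSE keeps the data.frame structure

print(class(col.vec))
print(class(col.df))
print(dim(col.df))
```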


Extraction/replacement summary

Key points:
elements of objects can be extracted or replaced using the [ operator
objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
vectors and lists have only one dimension, and hence only one index is used
matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
[      extraction operator, used to extract/replace object elements
names  get the names of an object, usually a vector, list, or data.frame
print  print an object


Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2
2 Calculate the mean of the Sepal.Length column in iris.2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2

data(iris)
iris.2 <- iris[c("Sepal.Length", "Species")]
str(iris.2)

2 Calculate the mean of the Sepal.Length column in iris.2

mean(iris.2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris.2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## mean of each column (second argument 2 = columns)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
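
The second argument to apply() selects the dimension: 1 for rows, 2 for columns. The sketch below contrasts the two on the same matrix and compares the row result with the built-in rowMeans(); row.means and col.means are illustrative names.

```r
M <- matrix(1:20, ncol = 4)

row.means <- apply(M, 1, mean)   # MARGIN = 1: one value per row
col.means <- apply(M, 2, mean)   # MARGIN = 2: one value per column

print(row.means)
print(col.means)

# rowMeans() is a dedicated shortcut for the row case
print(all.equal(row.means, rowMeans(M)))
```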


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.


Applying functions summary

Key points:
R has convenient methods for applying functions to matrices, lists, and data.frames
other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
matrix      create a matrix (vector with two dimensions)
apply       apply a function to the rows or columns of a matrix
sapply      apply a function to the elements of a list
is.numeric  returns TRUE or FALSE, depending on the type of object
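
For orientation only, here is a brief sketch of the three other apply-style functions named above, using small made-up inputs and the built-in iris data. It is not a substitute for their help pages.

```r
# lapply: like sapply, but always returns a list
sums.list <- lapply(list(a = 1:3, b = 4:6), sum)

# tapply: apply a function to groups of a vector defined by a factor
by.species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
pair.sums <- mapply(function(x, y) x + y, 1:3, 4:6)

print(sums.list)
print(by.species)
print(pair.sums)
```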


Writing functions

Functions

A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function.
Functions are defined using the function() function.
Functions can be defined with any number of named arguments.
Arguments can be of any type (e.g., vectors, data.frames, lists...).


Function return value

The return value of a function can be:
The value of the last expression evaluated in the body of the function
Objects explicitly returned with the return() function

Other function output can come from:
Calls to print(), message() or cat() in the function body
Error messages

Assignment inside the body of a function takes place in a local environment.
Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

Stepping through functions and setting breakpoints
Use traceback() to see what went wrong after the fact
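
The debugging tools are interactive, so the sketch below only sets up a failing call and captures its error message; the interactive steps are shown as comments. The functions doubler and wrapper are hypothetical helpers written for this illustration.

```r
# Two small functions, one calling the other
doubler <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  x * 2
}
wrapper <- function(x) doubler(x) + 1

# Capture the error message non-interactively
msg <- tryCatch(wrapper("a"), error = function(e) conditionMessage(e))
print(msg)

# At an interactive prompt one would instead try (not run here):
#   wrapper("a")      # triggers the error
#   traceback()       # shows the chain of calls that led to the error
#   debug(doubler)    # flag doubler so the next call can be stepped through
#   wrapper(1)
#   undebug(doubler)  # turn the flag off again
```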


Writing functions summary

Key points:
writing new functions is easy!
most functions will have a return value, but functions can also print things, write things to file, etc.
functions can be stepped through to facilitate debugging

Functions introduced in this section:
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(iris)


Control flow

Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if() and else statements, which can be nested if necessary.
Especially useful for checking function arguments and performing different operations depending on function input.


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) > 1) { # if x is not a single number
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
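
One possible variation, offered as a sketch and not as the workshop's own solution: signal a real error with stop() instead of printing a message, and return the label as a value so callers can use it. The function name signLabel is hypothetical.

```r
signLabel <- function(x) {
  # Argument check: raise an error that callers can catch
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  # Nested branching; the value of the chosen branch is returned
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

print(signLabel(10))
print(signLabel(0))
print(signLabel(-1))

# The bad-input case now fails loudly; try() lets the script continue
bad <- try(signLabel("a"), silent = TRUE)
print(inherits(bad, "try-error"))
```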


Control flow summary

Key points:
code can be conditionally executed
conditions can be nested
conditional execution is often used for argument checking, among other things

Functions[1] introduced in this section:
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment


Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  # is.data.frame() also works for objects with more than one class
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(1:10) # error: df must be a data.frame!
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser() # execution pauses here so the function can be stepped through
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(iris)


The S3 object class system

R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
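
A sketch of how dispatch behaves when an object has several classes or none of the classes has a method. The generic describe and its methods are hypothetical names invented for this illustration; the fallback relies on R's standard .default method convention.

```r
# A new generic with a fallback (default) method and one class-specific method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.B <- function(x, ...) "an object of class B"

# Classes are tried in order: there is no describe.A, so describe.B is used
obj <- structure(1:3, class = c("A", "B"))
res.obj <- describe(obj)

# A plain integer vector has no matching method, so the default is used
res.plain <- describe(1:3)

print(res.obj)
print(res.plain)
```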


S3 classes summary

Key points:
there are several class systems in R, of which S3 is the oldest and simplest
objects have class and functions have corresponding methods
the class of an object can be set by simple assignment
S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R)) # prepend the new class
  return(R)}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])}

statsum(iris)


Things that may surprise you

Gotchas

There is an unfortunately large number of surprises in R programming.
Some of these "gotchas" are common problems in other languages; many are unique to R.
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
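
One detail worth a sketch: when two values differ, all.equal() returns a character description of the difference instead of FALSE, so inside if() it is usually wrapped in isTRUE(). The variable names below are illustrative.

```r
a <- 0.1
b <- 0.3 / 3

exact   <- a == b                         # exact comparison of floating point values
approx  <- isTRUE(all.equal(a, b))        # comparison up to a small tolerance
differs <- isTRUE(all.equal(1, 1.1))      # clearly different values

print(c(exact, approx, differs))

# Safe to use in a condition because isTRUE() always yields TRUE or FALSE
if (isTRUE(all.equal(a, b))) cat("a and b are equal up to tolerance\n")
```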


Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values
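
The two solutions can be sketched on the same vector as above; the names m, n.missing, where.missing and x.complete are illustrative.

```r
x <- c(1:10, NA, 12:20)

m <- mean(x, na.rm = TRUE)       # drop missing values before averaging
n.missing <- sum(is.na(x))       # how many values are missing
where.missing <- which(is.na(x)) # position(s) of the missing values
x.complete <- x[!is.na(x)]       # keep only the non-missing values

print(m)
print(n.missing)
print(where.missing)
print(length(x.complete))
```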


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

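Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

mean() averages only its first argument, x, while sum() adds up everything passed through the dots.

Ouch. That is not nice at all!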
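A short sketch of the usual way around this inconsistency: combine the values into a single vector with c() first, so that both functions see the same data. The names vals, m_vec, s_vec and m_pos are illustrative.

```r
vals <- c(1, 2, 3, 4, 5)

m_vec <- mean(vals)            # all five values are averaged
s_vec <- sum(vals)             # all five values are summed
m_pos <- mean(1, 2, 3, 4, 5)   # only the first argument is treated as data

print(c(m_vec = m_vec, s_vec = s_vec, m_pos = m_pos))
```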


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
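As a supplementary sketch (not part of the original slide), the same conversion can also be written by converting the levels once and then indexing them with the factor, which uses the underlying integer codes. The names codes, via_chr and via_lev are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.numeric(x)                 # the internal integer codes
via_chr <- as.numeric(as.character(x))   # convert labels, then to numeric
via_lev <- as.numeric(levels(x))[x]      # convert levels once, then index by the factor

print(codes)
print(via_chr)
print(via_lev)
```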


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!

  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you: tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
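A related point, sketched here as a supplement: when looping over the positions of a vector, seq_along() is generally safer than 1:length(), because the latter counts down from 1 to 0 when the vector is empty. The counters n_colon and n_along are illustrative names.

```r
x <- c()   # an empty vector

n_colon <- 0
for (i in 1:length(x)) n_colon <- n_colon + 1    # 1:0 is c(1, 0), so the body runs twice

n_along <- 0
for (i in seq_along(x)) n_along <- n_along + 1   # seq_along(x) is empty, so the body never runs

print(c(n_colon = n_colon, n_along = n_along))
```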


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
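When the number of iterations is known in advance, a common variation (offered here as a sketch, not as the workshop's own code) is to create the list at its final length before the loop instead of growing it one element at a time. The name n is introduced for illustration.

```r
n <- 5
Result <- vector("list", n)   # a list of n empty (NULL) slots

for (i in seq_len(n)) {
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

print(sapply(Result, length)) # length of each stored element
```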


Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes == "numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
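In line with the earlier word of caution about loops, the three steps of this exercise can also be sketched without explicit loops by using sapply(). This is an alternative for comparison, not the exercise's intended answer; numcols is an illustrative name.

```r
classes   <- sapply(iris, class)                    # step 1: class of each column
numcols   <- names(classes)[classes == "numeric"]   # step 2: names of the numeric columns
irismeans <- sapply(iris[numcols], mean)            # step 3: mean of each numeric column

print(round(irismeans, 2))
```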

Page 9: Introduction to R Programming

Data types

Vector conversion and info

Vectors can be converted from one class to another:

> class(charvar2)
[1] "character"
> numvar2 <- as.numeric(charvar2) # convert to numeric
> class(numvar2)
[1] "numeric"
> mean(as.numeric(charvar2)) # now we can calculate the mean
[1] 2.5
> as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
[1] NA NA NA

In addition to class, you can examine the length() and str() (structure) of vectors:

> ls() # list objects in our workspace
[1] "charvar"  "charvar2" "logvar"   "numvar"   "numvar2"
> length(charvar) # how many elements in charvar?
[1] 4
> str(numvar2) # what is the structure of numvar2?
 num [1:8] 1 2 3 4 1 2 3 4


Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for:

  • Modeling (automatically contrast coded)
  • Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.


Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
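The statement that a data.frame is a list of equal-length vectors can be checked directly. The sketch below also wraps the illegal call in tryCatch() so that the error message is captured instead of stopping the script; msg is an illustrative name.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

is.list(DF)   # a data.frame is a list under the hood
length(DF)    # its length is the number of columns
nrow(DF)      # each column has the same number of elements

# columns of different lengths (10 and 7) are rejected with an error
msg <- tryCatch(data.frame(x = 1:10, y = 1:7),
                error = function(e) conditionMessage(e))
print(msg)
```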


Data types summary

Key points:
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
  • c: combine elements
  • as.numeric: convert an object (e.g., a character vector) to numeric
  • data.frame: combine objects into a data.frame
  • ls: list the objects in the workspace
  • class: get the class of an object
  • str: get the structure of an object
  • length: get the number of elements in an object
  • mean: calculate the mean of a vector


Exercise 0

1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test", as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]


Exercise 0 prototype

1 Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2 Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of "students" and "test"

class(students)
class(test)

4 Create a data frame containing two columns, "students" and "test", as defined above

testScores <- data.frame(students, test)

5 Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:

  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
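The slide says elements can be selected or replaced by logical vectors but only shows selection. A brief sketch of replacement, together with which() for recovering positions, might look like this (big is an illustrative name, and x is rebuilt here so the block is self-contained):

```r
x <- 101:110
names(x) <- letters[1:10]

big <- which(x > 106)   # positions (and names) where the condition is TRUE
print(big)

x[x > 108] <- 0         # replace every element that meets the condition
print(x)
```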


Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values.
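One further detail, added here as a hedged supplement: extracting a single row or column drops the matrix dimensions by default, which can surprise code that expects a matrix. The drop = FALSE argument of the [ operator keeps them. The names row_vec and row_mat are illustrative.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

row_vec <- M[1, ]                 # dimensions dropped: a named vector
row_mat <- M[1, , drop = FALSE]   # dimensions kept: a 1 x 3 matrix

print(dim(row_vec))
print(dim(row_mat))
```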


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
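The same single versus double bracket distinction applies when indexing by name, and the $ operator offers a shorthand for double brackets with a name. A small sketch, using a simplified version of L with illustrative variable names:

```r
L <- list(x = 1:5, y = 1:3)

a   <- L$x        # shorthand for L[["x"]]: the vector itself
b   <- L[["y"]]   # double brackets by name: the vector itself
sub <- L["x"]     # single brackets by name: still a list, with one element

L$w <- "new"      # elements can be added (or replaced) by name
print(names(L))
```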


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':   5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5


Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  • [: extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object


Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (over its rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
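Since both calls on the slide work on columns, a sketch contrasting the two margins may help. It also compares the results with the base R shortcuts rowMeans() and colMeans(); row_means and col_means are illustrative names.

```r
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)   # margin 1: one result per row
col_means <- apply(M, 2, mean)   # margin 2: one result per column

print(row_means)
print(col_means)

# the dedicated helpers give the same answers
all.equal(row_means, rowMeans(M))
all.equal(col_means, colMeans(M))
```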


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
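A note for readers running this on a recent R: the output above, where column y is a factor, reflects the behaviour of data.frame() at the time of the workshop. Since R 4.0.0 character columns are no longer converted to factors by default, so reproducing the slide's result requires asking for it explicitly, as in this sketch:

```r
# stringsAsFactors = TRUE restores the conversion shown on the slide
DF <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)

cls <- sapply(DF, class)   # class of each column
print(cls)
```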


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.


Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists...)


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
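The local-assignment point can be reduced to a minimal, self-contained sketch. Here the global x is an arbitrary string chosen for illustration, so it is easy to see that calling f() leaves it untouched.

```r
x <- "global value"

f <- function() {
  x <- 1    # creates a new x that exists only inside this call
  x * 2     # value of the last expression becomes the return value
}

y <- f()
print(y)    # the value returned by f
print(x)    # the global x is unchanged
```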


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
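Stepping through a function is interactive and hard to show on a page, but the debugging flag itself can be sketched non-interactively. The example below only sets and clears the flag with debug() and undebug() and checks it with isdebugged(); it never calls the flagged function, so no browser prompt opens. The function square is redefined here so the block is self-contained.

```r
square <- function(x) x * x

debug(square)                    # the next call to square() would be stepped through
flag_on <- isdebugged(square)

undebug(square)                  # turn stepping off again
flag_off <- isdebugged(square)

print(c(flag_on = flag_on, flag_off = flag_off))
```

In an interactive session, calling square(2) while the flag is on opens the browser, where n steps to the next line and Q quits.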


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
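A variation on the same idea, offered as a sketch rather than as the workshop's version: signal a real error with stop() when the argument is invalid, and return the label instead of printing it, so that callers can use the result. The function name signLabel is hypothetical.

```r
signLabel <- function(x) {
  # reject anything that is not a single number
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one!")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

print(signLabel(10))
print(signLabel(0))
print(signLabel(-1))

# an invalid argument now raises an error, captured here for display
err <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
print(err)
```

The check uses ||, which evaluates the second condition only if the first is FALSE and always yields a single TRUE or FALSE.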


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
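The class(df) != "data.frame" test works for plain data.frames, but class() can return more than one value, and since R 4.2.0 an if() condition of length greater than one is an error. A more robust check, sketched here with a hypothetical helper named checkDF, is is.data.frame():

```r
checkDF <- function(df) {
  # is.data.frame() is TRUE for any object that inherits from data.frame
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# an object with an extra class in front of "data.frame" (class name is made up)
df2 <- iris
class(df2) <- c("myclass", class(df2))

checkDF(iris)
checkDF(df2)

# a non-data.frame argument raises the error, captured here for display
err <- tryCatch(checkDF(1:10), error = function(e) conditionMessage(e))
print(err)
```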


The S3 object class system

The S3 object class system

  • R has two major object systems:
      Relatively informal "S3" classes
      Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
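Because an object can carry several classes, testing for one of them is usually done with inherits() rather than by comparing class(x) to a string. A brief sketch, with unclass() shown for removing the class attribute again (variable names other than x are illustrative):

```r
x <- 1:10
class(x) <- c("A", "B")

is_B   <- inherits(x, "B")     # TRUE if "B" appears anywhere in the class vector
is_foo <- inherits(x, "foo")   # FALSE: x was never given class "foo"
base   <- class(unclass(x))    # class after stripping the class attribute

print(c(is_B = is_B, is_foo = is_foo))
print(base)
```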


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
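As written, disp() has a method only for matrices, so calling it on anything else fails with a "no applicable method" error. A common remedy, sketched here as a supplement, is a default method. In this version each method also returns a short label invisibly, purely so the dispatch can be checked; that return value is an addition for illustration and not part of the slide.

```r
disp <- function(x, ...) UseMethod("disp")

# method used when x is a matrix
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible("matrix method")
}

# fallback used when no more specific method exists
disp.default <- function(x, ...) {
  print(x)
  invisible("default method")
}

used_matrix  <- disp(matrix(c(1, 2, 3, 4) / 3, ncol = 2))
used_default <- disp(1:3)
```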


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 [3] Write a print method for the statsum classprintstatsum lt- function(x)

cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 53

71


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
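One detail worth sketching: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so inside a condition it is usually wrapped in isTRUE(). The variable names below are only for illustration.

```r
a <- 0.1
b <- 0.3 / 3

a == b                    # exact comparison of two doubles
isTRUE(all.equal(a, b))   # comparison within a small numerical tolerance

# all.equal() does not return FALSE on a mismatch; it describes the difference
all.equal(1, 2)

if (isTRUE(all.equal(a, b))) {
  message("a and b are equal up to numerical tolerance")
}
```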

Things that may surprise you

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
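The two solutions named above might look like this in practice, reusing the same example vector:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA, because one value is unknown
mean(x, na.rm = TRUE)   # drop missing values before averaging
is.na(x)                # logical vector marking the missing entries
sum(is.na(x))           # number of missing values
x[!is.na(x)]            # only the non-missing values
```

Note that `x == NA` cannot be used for this test: it returns NA for every element.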

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
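The args() output explains the difference: mean() treats only its first argument, x, as data, while sum() collects every unnamed argument through `...`. A short sketch of the usual way around it, which is to combine the values into one vector first:

```r
mean(c(1, 2, 3, 4, 5))   # 3: all five values are passed as the single vector x
sum(c(1, 2, 3, 4, 5))    # 15: same answer as sum(1, 2, 3, 4, 5)
mean(1, 2, 3, 4, 5)      # 1: only the first argument is treated as data
```

In the last call the remaining values are matched to other arguments of the default mean method (such as trim and na.rm) rather than being averaged.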

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
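As a small addition to the example above, the same conversion can be written by converting the levels once and then indexing them with the factor. This is a sketch of an equivalent idiom, not something taken from the workshop slides.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                 # internal codes: 2 2 1 1
as.numeric(as.character(x))   # original values: 5 5 6 6
as.numeric(levels(x))[x]      # same values; each level is converted only once
```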


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC–Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Additional resources

Feedback

Help Us Make This Workshop Better!

  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you: tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
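When the number of results is known in advance, a common variation is to create the list at its final length before the loop instead of growing it one element at a time. The sketch below assumes five iterations, as in the example above; the variable `n` is introduced only for illustration.

```r
n <- 5
Result <- vector("list", n)   # a list with n empty (NULL) slots
for (i in seq_len(n)) {       # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}
lengths(Result)               # number of elements stored in each slot
```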

Loops (supplemental)

Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Loops (supplemental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Loops (supplemental)

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333


Data types

Factor vectors

Factors are stored as numbers, but have character labels. Factors are useful for

  • Modeling (automatically contrast coded)
  • Sorting/presenting values in arbitrary order

Most of the time we can treat factors as though they were character vectors.

Data types

Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x=1:5, y=letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
> L <- list(x=1:5, y=1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

Data types

Data types summary

Key points:
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
  c            combine elements
  as.numeric   convert an object (e.g., a character vector) to numeric
  data.frame   combine objects into a data.frame
  ls           list the objects in the workspace
  class        get the class of an object
  str          get the structure of an object
  length       get the number of elements in an object
  mean         calculate the mean of a vector

Data types

Exercise 0

1 Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2 Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3 Determine the class of "students" and "test" [ class() or str() ]
4 Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
5 Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]

Data types

Exercise 0 prototype

1 Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2 Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of "students" and "test"

class(students)
class(test)

4 Create a data frame containing two columns, "students" and "test" as defined above

testScores <- data.frame(students, test)

5 Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==     equal to
  !=     not equal to
  >      greater than
  <      less than
  >=     greater than or equal to
  <=     less than or equal to
  %in%   is included in
  &      and
  |      or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
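The slide mentions that logical vectors can also be used to replace elements, which the examples above do not show. A minimal sketch, starting from a fresh copy of the named vector:

```r
x <- 101:110
names(x) <- letters[1:10]

which(x > 106)                    # positions of the elements where the condition is TRUE
x[x > 106] <- 0L                  # replace every element greater than 106 with 0
x[!(names(x) %in% c("a", "b"))]   # negate a condition with ! to exclude elements
```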

Extracting and replacing object elements

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.

Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5

Extracting and replacing object elements

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
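If matrix-style indexing is preferred but a one-column data.frame is still wanted, the `drop` argument of `[` controls the simplification. A brief sketch using the same DF as above:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

str(DF[, 1])                 # vector: a single column is simplified by default
str(DF[, 1, drop = FALSE])   # data.frame: drop = FALSE keeps both dimensions
str(DF[1])                   # data.frame: list-style indexing never simplifies
```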

Extracting and replacing object elements

Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  [       extraction operator, used to extract/replace object elements
  names   get the names of an object, usually a vector, list, or data.frame
  print   print an object

Extracting and replacing object elements

Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Extracting and replacing object elements

Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) # average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
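Both examples above use margin 2 (columns). For comparison, here is a short sketch of the row-wise case, together with the built-in shortcuts for sums:

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, sum)   # margin 1: one result per row
apply(M, 2, sum)   # margin 2: one result per column
rowSums(M)         # built-in equivalents of the two calls above
colSums(M)
```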

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> # (y is a factor under R < 4.0.0, where stringsAsFactors defaulted to TRUE)
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.

Applying functions to list elements

Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  matrix       create a matrix (vector with two dimensions)
  apply        apply a function to the rows or columns of a matrix
  sapply       apply a function to the elements of a list
  is.numeric   returns TRUE or FALSE, depending on the type of object
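For orientation, here is a brief sketch of the other apply-style functions named in the key points. The list L below is a smaller stand-in for the workshop's L, and vapply is an extra that the slides do not mention.

```r
L <- list(x = 1:5, y = 1:3)

lapply(L, length)               # always returns a list
sapply(L, length)               # simplifies to a named vector when it can
vapply(L, length, integer(1))   # like sapply, but the result type is checked

tapply(iris$Sepal.Length, iris$Species, mean)   # mean of one vector within groups
mapply(rep, 1:3, 3:1)           # calls rep(1, 3), rep(2, 2), rep(3, 1)
```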


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)

Writing functions

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
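The point about local assignment can be isolated in a few lines. This sketch uses a global `x` holding a string (chosen only for the illustration) so that it is obvious the function does not overwrite it:

```r
x <- "global value"

f <- function() {
  x <- 1   # creates a local x; the global x is untouched
  x        # the value of the last expression is returned
}

y <- f()
y          # the value returned by f
x          # still the global value
```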

Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
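The debugging tools are interactive, so they are hard to show on a slide. The sketch below is one way they might be tried out; the function `scale01` and the option name `demo.browse` are made up for this illustration, and the break point only triggers in an interactive session with that option set.

```r
# Hypothetical helper used only for illustration
scale01 <- function(v) {
  # optional break point: only active interactively and when requested
  if (interactive() && isTRUE(getOption("demo.browse"))) browser()
  (v - min(v)) / (max(v) - min(v))
}

scale01(c(2, 4, 6))    # rescales the values to the range [0, 1]

# In an interactive session one could then try:
#   scale01("a")       # fails inside the function
#   traceback()        # shows the sequence of calls that led to the error
#   debug(scale01)     # step through the function on its next call
#   undebug(scale01)   # turn stepping off again
```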

Writing functions

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function    defines a new function
  return      used inside a function definition to set the return value
  browser     sets a break point
  debug       turns on the debugging flag of a function so you can step through it
  undebug     turns off the debugging flag
  traceback   shows the error stack (call after an error to see what went wrong)

Writing functions

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Writing functions

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow

Control flow examples

Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow

Control flow examples

Do something reasonable if x is not numeric:

> ## add argument checking: x must be numeric and of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
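Printing a message lets the calling code carry on as if nothing happened. An alternative, sketched below under the hypothetical name `checkedIsPositive`, is to signal an error with stop(), which the caller can catch with tryCatch() if it wants to continue.

```r
# Hypothetical variant: signal an error instead of printing a message
checkedIsPositive <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  x > 0   # return TRUE or FALSE rather than printing
}

checkedIsPositive(10)

# Catch the error so that a script can keep running
result <- tryCatch(checkedIsPositive("a"),
                   error = function(e) conditionMessage(e))
result
```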

Control flow

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  cat    concatenates and prints R objects
  if     execute code only if condition is met
  else   used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.

Basic idea: functions have different methods for different types of objects.

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"

The S3 object class system

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
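The dispatch mechanism can be sketched with a generic that also has a default method. The generic describe() and its methods are hypothetical names made up for this illustration; they are not part of base R or of the workshop materials.

```r
# hypothetical generic: UseMethod() picks a method based on class(x)
describe <- function(x, ...) UseMethod("describe")

describe.default <- function(x, ...) "something else"          # fallback method
describe.numeric <- function(x, ...) "a numeric vector"
describe.foo     <- function(x, ...) "an object of class foo"

describe(c(1.5, 2))                       # dispatches to describe.numeric
describe("text")                          # no character method, so the default is used
describe(structure(1:3, class = "foo"))   # the explicit class attribute decides
```

Providing a default method is a design choice: without one, calling the generic on an unsupported class stops with an error.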

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions written in R contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # standard deviations
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)

Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
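A short sketch of how the comparison is typically used inside a condition. The variable names a and b are illustrative; the point is that all.equal() returns a character description rather than FALSE when the values differ, so it is usually wrapped in isTRUE().

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(a, b))      # TRUE: equal within the default tolerance

# when values differ, all.equal() returns a message, not FALSE
isTRUE(all.equal(1, 1.1))    # FALSE, and safe to use in if()

if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}
```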

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
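The two solutions can be sketched as follows, reusing the vector x from the slide. This is only an illustration of na.rm and is.na(), not an exhaustive treatment of missing data.

```r
x <- c(1:10, NA, 12:20)

mean(x)                    # NA: the missing value propagates
mean(x, na.rm = TRUE)      # mean of the 19 observed values

is.na(x)                   # logical vector, TRUE only where x is missing
sum(is.na(x))              # number of missing values
which(is.na(x))            # position of the missing value
x[!is.na(x)]               # the observed values only
```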

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
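A small sketch of the way around this inconsistency: pass the data as a single vector built with c(), which works the same way for both functions.

```r
mean(1, 2, 3, 4, 5)        # 1: only the first argument is treated as the data
mean(c(1, 2, 3, 4, 5))     # 3: the values are passed as one vector
sum(1, 2, 3, 4, 5)         # 15: sum() takes its data through ...
sum(c(1, 2, 3, 4, 5))      # 15: a single vector works here too
```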

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
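The three conversions can be compared side by side. The last line is an alternative idiom that converts each level once and then indexes by the factor codes; it gives the same result as the nested conversion on the slide.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                    # 2 2 1 1: the internal integer codes
as.numeric(as.character(x))      # 5 5 6 6: the labels converted to numbers
as.numeric(levels(x))[x]         # 5 5 6 6: convert the levels, then index by code
```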

Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
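As a hedged aside, the same list can be built in two other common ways: allocating the list up front with vector("list", n), or skipping the explicit loop with lapply(). The name Result2 is made up for this comparison.

```r
# loop version, with the list allocated to its final length up front
Result <- vector("list", 5)
for (i in 1:5) {
  Result[[i]] <- 1:i
}

# the same list built without an explicit loop
Result2 <- lapply(1:5, function(i) 1:i)

identical(Result, Result2)   # TRUE: both approaches give the same list
```

Allocating up front avoids growing the object on every iteration, which matters mostly for long loops.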

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Data types

Lists and data.frames

  • A data.frame is a list of vectors, each of the same length
  • A list is a collection of objects, each of which can be almost anything

> DF <- data.frame(x = 1:5, y = letters[1:5])
> DF # data.frame with two columns and 5 rows
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x = 1:10, y = 1:7) # illegal because lengths differ
> L <- list(x = 1:5, y = 1:3, z = DF)
> L # lists are much more flexible!
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

$z
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

Data types summary

Key points:
  • vector classes include numeric, logical, character, and factors
  • vectors can be combined into lists or data.frames
  • a data.frame can almost always be thought of as a list of vectors of equal length
  • a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section:
  c           combine elements
  as.numeric  convert an object (e.g., a character vector) to numeric
  data.frame  combine objects into a data.frame
  ls          list the objects in the workspace
  class       get the class of an object
  str         get the structure of an object
  length      get the number of elements in an object
  mean        calculate the mean of a vector

Exercise 0

1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "tests" as defined above [ data.frame ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]

Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "tests" as defined above

testScores <- data.frame(students, tests = test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)

Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
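Logical vectors work for replacement as well as extraction. The sketch below starts from a fresh copy of x (without the earlier change to the fourth element) and uses which() to turn a logical vector into positions.

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 106] <- 0          # replace every element greater than 106
x                        # elements g through j are now 0

which(x == 0)            # positions (with names) of the replaced elements
sum(x > 100)             # TRUE counts as 1, so this counts matching elements
```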

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values.

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
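The difference can be checked directly with class(). The sketch also shows the drop = FALSE argument, which keeps a one-column data.frame when using matrix-style indexing; DF is recreated here so the example is self-contained.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])

class(DF[1])                   # "data.frame": list-style indexing keeps the container
class(DF[, 1])                 # "integer": matrix-style indexing drops to a vector
class(DF[, 1, drop = FALSE])   # "data.frame": drop = FALSE prevents the simplification
class(DF[["x"]])               # "integer": [[ extracts the column itself
```

Using drop = FALSE is helpful in functions where the number of selected columns is not known in advance.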

Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  [      extraction operator, used to extract/replace object elements
  names  get the names of an object, usually a vector, list, or data.frame
  print  print an object

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically), and save the result to a new data.frame named iris.2
2. Calculate the mean of the Sepal.Length column in iris.2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically), and save the result to a new data.frame named iris.2

data(iris)
iris.2 <- iris[c("Sepal.Length", "Species")]
str(iris.2)

2. Calculate the mean of the Sepal.Length column in iris.2

mean(iris.2[, "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris.2,
     print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix.

> M <- matrix(1:20, ncol = 4)
> apply(M, 2, mean) # average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) # sum the columns
[1] 15 40 65 90
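The second argument of apply() selects the dimension: 1 for rows, 2 for columns. A brief sketch on the same matrix, together with the dedicated helpers rowSums() and colMeans() that cover the most common cases:

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, sum)      # MARGIN = 1: one result per row
apply(M, 2, sum)      # MARGIN = 2: one result per column

# dedicated helpers give the same answers for sums and means
rowSums(M)
colMeans(M)
```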

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
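The same pattern extends to data with several numeric columns. The sketch below uses a made-up data.frame with an extra column z, and vapply() in place of sapply(); vapply() works the same way but checks that each result has the declared type and length.

```r
# illustrative data: two numeric columns and one character column
DF <- data.frame(x = 1:5, y = letters[1:5], z = c(0.5, 1, 1.5, 2, 2.5))

is.num <- vapply(DF, is.numeric, logical(1))   # one TRUE/FALSE per column
names(DF)[is.num]                               # names of the numeric columns
colMeans(DF[is.num])                            # means of the numeric columns only
```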

Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object

Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output.
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)

Function return value

  • The return value of a function can be:
      • The value of the last expression evaluated in the body of the function
      • Objects explicitly returned with the return() function
  • Other function output can come from:
      • Calls to print(), message() or cat() in the function body
      • Error messages
  • Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
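A stripped-down sketch of the same point: assigning to x inside a function creates a local x and leaves the global one untouched. The values "global" and "local" are placeholders chosen for readability.

```r
x <- "global"

f <- function() {
  x <- "local"     # creates a new x inside the function only
  x                # the value of the last expression is returned
}

f()                # "local"
x                  # still "global"
```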

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
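A variation on the same idea, sketched under the assumption that the caller wants a value back rather than printed text. The function name signLabel() is hypothetical; it signals invalid input with stop(), and tryCatch() is used here only to capture the error message for display.

```r
# hypothetical variant of isPositive() that returns a label instead of printing
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")   # argument check
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(10)    # "positive"
signLabel(0)     # "zero"
signLabel(-1)    # "negative"

# capture the error raised for invalid input instead of halting the script
res <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
res
```

Raising an error with stop() lets calling code decide how to react, whereas cat() only informs a person reading the console.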

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions(1) introduced in this section:
  cat   concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

The S3 object class system

R has two major object systems:
• Relatively informal "S3" classes
• Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.

Basic idea: functions have different methods for different types of objects.

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
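To make the multi-class case concrete, here is a minimal sketch (the variable names are chosen for illustration only) that uses inherits() to test class membership and unclass() to strip the class attribute again. The order of the class vector matters: method dispatch tries the first class before the second.

```r
x <- 1:10
class(x) <- c("A", "B")   # x now has two classes; "A" is tried first in dispatch

inherits(x, "A")          # TRUE
inherits(x, "B")          # TRUE
inherits(x, "integer")    # FALSE: the class attribute replaced the implicit class

y <- unclass(x)           # remove the class attribute
class(y)                  # "integer" again
```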

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo"
> mean.foo <- function(x) {               # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x)))  # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")          # otherwise say x is not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
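A generic with only one method fails for every other class. One common remedy, sketched here as an assumption rather than as part of the original workshop code, is to add a default method; disp.default and its message are hypothetical names introduced for this example.

```r
# generic, as on the slide
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices: round before printing
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# hypothetical fallback (not in the original slides): used when no
# class-specific method exists
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  print(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 1.234), ncol = 2)
disp(m)        # dispatches to disp.matrix
disp("hello")  # dispatches to disp.default
```

Without disp.default, the call disp("hello") would stop with an error because no applicable method exists for character objects.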

S3 classes summary

Key points
• there are several class systems in R, of which S3 is the oldest and simplest
• objects have a class, and functions have corresponding methods
• the class of an object can be set by simple assignment
• S3 generic functions written in R contain UseMethod("x") in the body, where x is the name of the function
• new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section
• plot: creates a graphical display, the type of which depends on the class of the object being plotted
• methods: lists the methods defined for a function or class
• UseMethod: called in the body of a generic function to dispatch to the appropriate method
• invisible: returns an object but does not print it

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # standard deviations use sd(), not mean()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))   # prepend the new class
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

• There are an unfortunately large number of surprises in R programming.
• Some of these "gotchas" are common problems in other languages; many are unique to R.
• We will only cover a few. For a more comprehensive discussion, please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
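One detail worth knowing when all.equal() is used inside a condition: it returns TRUE when the values match within tolerance, but a character description (not FALSE) when they differ. The sketch below, with illustrative variable names, wraps it in isTRUE() for that reason.

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # FALSE on typical IEEE 754 hardware
isTRUE(all.equal(a, b))    # TRUE: equal within a small tolerance

# all.equal() returns a description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it as a condition
if (isTRUE(all.equal(a, 0.2))) {
  cat("equal\n")
} else {
  cat("not equal\n")
}
```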

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
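The two remedies named above can be sketched as follows, reusing the same vector; this is an illustration, not code from the original slides.

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # mean of the 19 observed values

is.na(x)                # logical vector, TRUE only at position 11
sum(is.na(x))           # number of missing values: 1
x[!is.na(x)]            # the vector with the missing value dropped
```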

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
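The argument lists explain the difference: mean() averages only its first argument, x, and any further values are matched to other arguments, whereas sum() adds everything passed through the dots. A minimal sketch of the safe pattern, which is to combine the values into one vector first:

```r
v <- c(1, 2, 3, 4, 5)

mean(v)               # 3: the whole vector is the single argument x
sum(v)                # 15
sum(1, 2, 3, 4, 5)    # also 15, because sum() adds everything passed via ...

mean(1, 2, 3, 4, 5)   # 1: only the first value is treated as x
```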

Trouble with factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
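As a side-by-side sketch of the conversions above, the block below also shows an alternative idiom that converts the levels once and then indexes them by the factor codes. It gives the same result as as.numeric(as.character(x)); which one reads better is a matter of taste.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                 # 2 2 1 1: the internal level codes
as.numeric(as.character(x))   # 5 5 6 6: the values shown when printing
as.numeric(levels(x))[x]      # 5 5 6 6: same result, converting each level once
```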


Additional resources

Additional reading and resources

• S3 system overview: https://github.com/hadley/devtools/wiki/S3
• S4 system overview: https://github.com/hadley/devtools/wiki/S4
• R documentation: http://cran.r-project.org/manuals.html
• Collection of R tutorials: http://cran.r-project.org/other-docs.html
• R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
• Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
• State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

• Institute for Quantitative Social Science: http://iq.harvard.edu
• Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
• Please take a moment to fill out a very short feedback form.
• These workshops exist for you; tell us what you need!
• http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

• A loop is a collection of commands that are run over and over again.
• A for loop runs the code a fixed number of times, or on a fixed set of objects.
• A while loop runs the code until a condition is met.
• If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) {             # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number and its square
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15)                # allows reproducible sample() results
> dice <- seq(1, 6)           # set dice = [1 2 3 4 5 6]
> roll <- 0                   # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")            # print the result
+ }                                           # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list()        # create an object to store the results
> for (i in 1:5) {        # for each i in [1, 5]
+   Result[[i]] <- 1:i    # assign the sequence 1 to i to Result
+ }
> Result                  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
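When the number of results is known in advance, a common refinement is to pre-allocate the list instead of growing it one element at a time. The sketch below (names such as Result2 are illustrative) also shows lapply(), which builds the same list without an explicit loop.

```r
n <- 5

# pre-allocate a list of length n instead of growing it
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# the same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

identical(Result, Result2)   # TRUE
```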

Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                       # create vector x
> for (i in 1:5) x[i] <- i + i   # double a vector using a loop
> print(x)                       # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                      # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                        # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) {             # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 12: Introduction to R Programming

Data types

Data types summary

Key points
• vector classes include numeric, logical, character, and factors
• vectors can be combined into lists or data.frames
• a data.frame can almost always be thought of as a list of vectors of equal length
• a list is a collection of objects, each of which can be of almost any type

Functions introduced in this section
• c: combine elements
• as.numeric: convert an object (e.g., a character vector) to numeric
• data.frame: combine objects into a data.frame
• ls: list the objects in the workspace
• class: get the class of an object
• str: get the structure of an object
• length: get the number of elements in an object
• mean: calculate the mean of a vector

Exercise 0

1. Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
2. Create a second vector called "students" containing five common names of your choice [ c(), <- ]
3. Determine the class of "students" and "test" [ class() or str() ]
4. Create a data frame containing two columns, "students" and "test", as defined above [ data.frame() ]
5. Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]

Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice.

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice.

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test".

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "test", as defined above.

testScores <- data.frame(students, test)

5. Convert "test" to character class, and confirm that you were successful.

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> ## indexing vectors by position
> x <- 101:110   # create a vector of integers from 101 to 110
> x[c(4, 5)]     # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1      # change the 4th value to 1
> x              # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x)                  # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106    # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
• ==    equal to
• !=    not equal to
• >     greater than
• <     less than
• >=    greater than or equal to
• <=    less than or equal to
• %in%  is included in
• &     and
• |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
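Logical vectors work for replacement as well as extraction. The following sketch rebuilds the slide's x so that it runs on its own, then uses which() to turn a logical vector into positions and a logical index to overwrite values; replacing with NA is just one illustrative choice.

```r
x <- c(101:103, 1L, 105:110)
names(x) <- letters[1:10]

which(x > 106)               # positions of the TRUE values: g, h, i, j (7 to 10)

x[x < 100] <- NA             # replace every element below 100 with NA
x[c("c", "d", "e")]          # d is now NA

sum(x > 106, na.rm = TRUE)   # count of elements above 106: 4
```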

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second to columns.

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in M[1:3, ] above) return all values.

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element,
> ## effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2; columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [, n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5
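The difference between the two forms can be checked directly with class(). The data.frame below is only assumed to resemble the DF used on the slides; drop = FALSE is the standard way to keep matrix-style indexing from collapsing a single column to a vector.

```r
# a small data.frame assumed to resemble the DF used on the slides
DF <- data.frame(x = 1:5, y = letters[1:5])

class(DF[, 1])                 # "integer": a bare vector
class(DF[1])                   # "data.frame": one column
class(DF[, 1, drop = FALSE])   # "data.frame": matrix-style indexing, dimension kept
class(DF[["x"]])               # "integer": list-style extraction by name
```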

Extraction/replacement summary

Key points
• elements of objects can be extracted or replaced using the [ operator
• objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
• vectors and lists have only one dimension, and hence only one index is used
• matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
• [: extraction operator, used to extract/replace object elements
• names: get the names of an object, usually a vector, list, or data.frame
• print: print an object

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.
2. Calculate the mean of the Sepal.Length column in iris2.
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2.

mean(iris2[, "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol = 4)
> apply(M, 2, mean) # mean of each column (averaging over the rows)
[1]  3  8 13 18
> apply(M, 2, sum)  # sum the columns
[1] 15 40 65 90
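Since both slide examples use a margin of 2, here is a brief sketch contrasting the two margins on the same matrix. For plain sums and means, the base helpers rowMeans(), colMeans(), rowSums(), and colSums() give the same answers.

```r
M <- matrix(1:20, ncol = 4)   # 5 rows, 4 columns

apply(M, 1, mean)   # MARGIN = 1: one mean per row    -> 8.5 9.5 10.5 11.5 12.5
apply(M, 2, mean)   # MARGIN = 2: one mean per column -> 3 8 13 18

# for sums and means, the dedicated helpers give the same answers
all.equal(apply(M, 1, mean), rowMeans(M))   # TRUE
all.equal(apply(M, 2, sum),  colSums(M))    # TRUE
```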

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
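sapply() simplifies its result when it can, which is convenient interactively but can surprise inside functions. A hedged sketch of the stricter alternative, vapply(), follows; the data.frame is a stand-in for the slides' DF.

```r
DF <- data.frame(x = 1:5, y = letters[1:5])   # stand-in for the slides' DF

sapply(DF, is.numeric)               # named logical vector: x TRUE, y FALSE
vapply(DF, is.numeric, logical(1))   # same result, but the return type is checked

# sapply() quietly returns a list when results differ in length
sapply(list(a = 1:3, b = 1:2), rev)
```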

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.

Applying functions summary

Key points
• R has convenient methods for applying functions to matrices, lists, and data.frames
• other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section
• matrix: create a matrix (vector with two dimensions)
• apply: apply a function to the rows or columns of a matrix
• sapply: apply a function to the elements of a list
• is.numeric: returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

• A function is a collection of commands that takes input(s) and returns output.
• If you have a specific analysis or transformation you want to do on different data, use a function.
• Functions are defined using the function() function.
• Functions can be defined with any number of named arguments.
• Arguments can be of any type (e.g., vectors, data.frames, lists, ...).

Function return value

The return value of a function can be:
• The value of the last expression evaluated in the body of the function
• Objects explicitly returned with the return() function

Other function output can come from:
• Calls to print(), message(), or cat() in the function body
• Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() {          # define function f
+   print("setting x to 1")  # print a text string
+   x <- 1                   # set x to 1
+ }
>
> y <- f()                   # assign y the value returned by f
[1] "setting x to 1"
>
> y                          # print y
[1] 1
> x                          # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
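The same points can be shown in a self-contained sketch. The functions f and g below are illustrative, not the workshop's own; g uses invisible() so that the value is returned without being auto-printed at the console.

```r
x <- 10                    # x in the calling (here: global) environment

f <- function() {
  x <- 1                   # creates a local x; the outer x is untouched
  x + 1                    # value of the last expression is returned
}

g <- function() {
  invisible(99)            # returned, but not auto-printed at the console
}

y <- f()
y                          # 2
x                          # still 10
z <- g()
z                          # 99
```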

Writing functions example

Goal: write a function that returns the square of its argument.

> square <- function(x) { # define function named "square" with argument x
+   return(x*x)           # multiply the x argument by itself
+ }                       # end the function definition
>
> # check to see that the function works
> square(x = 2)  # square the value 2
[1] 4
> square(10)     # square the value 10
[1] 100
> square(1:5)    # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

• Stepping through functions and setting breakpoints
• Use traceback() to see what went wrong after the fact

Writing functions summary

Key points
• writing new functions is easy!
• most functions will have a return value, but functions can also print things, write things to file, etc.
• functions can be stepped through to facilitate debugging

Functions introduced in this section
• function: defines a new function
• return: used inside a function definition to set the return value
• browser: sets a break point
• debug: turns on the debugging flag of a function so you can step through it
• undebug: turns off the debugging flag
• traceback: shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)                      # class of each column
  means <- sapply(df[classes == "numeric"], mean)   # means of the numeric columns
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)  # level counts of each factor
  return(list(means, counts))
}

statsum(iris)


Control flow

• Basic idea: if some condition is true, do one thing. If false, do something else.
• Carried out in R using if and else statements, which can be nested if necessary.
• Especially useful for checking function arguments and performing different operations depending on function input.

Control flow examples

Goal: write a function that tells us if a number is positive or negative.

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) {      # define function "isPositive"
+   if (x > 0) {                   # if x is greater than zero, then
+     cat(x, "is positive \n")     # say so!
+   } else {                       # otherwise
+     cat(x, "is negative \n")     # say x is negative
+   }
+ }                                # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0:

> ## add condition to handle the case that x is zero
> isPositive <- function(x) {      # define function "isPositive"
+   if (x > 0) {                   # if x is greater than zero, then
+     cat(x, "is positive \n")     # say so!
+   } else if (x == 0) {           # otherwise, if x is zero
+     cat(x, "is zero \n")         # say so!
+   } else {                       # otherwise
+     cat(x, "is negative \n")     # say x is negative
+   }
+ }                                # end function definition
>

Test the new function:

> isPositive(0)   # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!).

Control flow examples

Do something reasonable if x is not numeric:

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) {      # define function "isPositive"
+   if (!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {            # if x is greater than zero, then
+     cat(x, "is positive \n")     # say so!
+   } else if (x == 0) {           # otherwise, if x is zero
+     cat(x, "is zero \n")         # say so!
+   } else {                       # otherwise
+     cat(x, "is negative \n")     # say x is negative
+   }
+ }                                # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
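Printing a message is fine for interactive use, but calling code cannot react to it. As a sketch of one alternative (checkSign is a hypothetical name, not part of the workshop materials), the function below signals an error with stop() for bad input and returns a string instead of printing.

```r
# hypothetical variant of isPositive(): signal an error instead of printing
checkSign <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

checkSign(10)    # "positive"
checkSign(0)     # "zero"
checkSign(-1)    # "negative"

# tryCatch() lets the caller decide what to do with the error
tryCatch(checkSign("a"), error = function(e) conditionMessage(e))
```

Returning a value rather than calling cat() also makes the function easy to test and to combine with sapply().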

Control flow summary

Key points
• code can be conditionally executed
• conditions can be nested
• conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section
• cat: concatenates and prints R objects
• if: executes code only if a condition is met
• else: used with if; the code to execute if the condition is not met

[1] Technically, if and else are not functions, but this need not concern us at the moment.

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
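The prototype compares class(df) to a single string. Because an object can carry more than one class, that comparison may produce several TRUE/FALSE values. A possible alternative, sketched here with a made-up class name and helper, is inherits(), which always returns a single value.

```r
# isDataFrame and the class "myframe" are hypothetical, for illustration only.
isDataFrame <- function(obj) inherits(obj, "data.frame")

df2 <- iris
class(df2) <- c("myframe", "data.frame")

class(df2) != "data.frame"   # one value per class: not a safe if() condition
isDataFrame(df2)             # a single TRUE
isDataFrame(1:10)            # a single FALSE
```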


The S3 object class system

The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects


The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"


The S3 object class system

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66


The S3 object class system

S3 classes summary

Key points
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it
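The key points can be pulled together in one small sketch: a constructor that sets the class, a generic that calls UseMethod(), a method named generic.class, and a default method as a fallback. The class "temperature" and the generic describe() are invented for this illustration.

```r
# Hypothetical class and generic; neither comes from the workshop materials.
temperature <- function(degrees) {
  structure(list(degrees = degrees), class = "temperature")
}

describe <- function(x, ...) UseMethod("describe")   # the generic

describe.temperature <- function(x, ...) {           # method for our class
  paste(x$degrees, "degrees")
}

describe.default <- function(x, ...) {               # used when no method matches
  paste("an object of class", class(x)[1])
}

describe(temperature(21))
describe(1:3)
```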


The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class


The S3 object class system

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
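One detail worth knowing when all.equal() is used inside a condition: when the values differ it returns a character description rather than FALSE. A minimal sketch of the usual workaround, wrapping the call in isTRUE(), follows; the variable names a and b are arbitrary.

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # exact comparison of two floating point values
isTRUE(all.equal(a, b))      # comparison within a small numeric tolerance

# all.equal() does not return FALSE when values differ, so isTRUE() is needed
# to get a clean TRUE/FALSE for use in if()
isTRUE(all.equal(1, 1.1))
```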


Things that may surprise you

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
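The two solutions named above can be sketched with the same vector x used on this slide; nothing here goes beyond base R.

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA, because one value is unknown
mean(x, na.rm = TRUE)    # mean of the 19 observed values
which(is.na(x))          # position of the missing value
x[!is.na(x)]             # keep only the observed values
```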


Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect. I would like to at least get a warning


Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all
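The difference comes from the argument lists: mean() takes its data in the single argument x, while sum() adds up everything passed through its ... argument. A small sketch of the safer habit, passing the data as one vector to both, is shown here.

```r
mean(1, 2, 3, 4, 5)       # only the first argument, 1, is treated as data
mean(c(1, 2, 3, 4, 5))    # the data passed as one vector
sum(1, 2, 3, 4, 5)        # sum() adds up all of its ... arguments
sum(c(1, 2, 3, 4, 5))     # same total when given a single vector
```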


Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
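An equivalent way to recover the numbers, sketched below, converts the levels once and then indexes them by the factor. It gives the same result as as.numeric(as.character(x)) for the factor used on this slide.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                  # the internal integer codes, not the values
as.numeric(as.character(x))    # the values, via the labels
as.numeric(levels(x))[x]       # the values, converting each level only once
```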


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Additional resources

Feedback

Help Us Make This Workshop Better
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you: tell us what you need
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop


Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
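When the number of results is known in advance, the storage object can be created at its final size before the loop starts. The following is a sketch of that variation, not the workshop's own code; n is an illustrative variable.

```r
n <- 5
Result <- vector("list", n)      # a list with n empty (NULL) slots
for (i in seq_len(n)) {          # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i             # fill slot i with the sequence 1 to i
}
lengths(Result)                  # number of values stored in each element
```

Preallocating avoids growing the list one element at a time, which can matter for long loops.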


Loops (supplemental)

Word of caution: don't overuse loops

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them
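To make the comparison concrete, the sketch below computes the same squares with a loop and with vector arithmetic and checks that the two results agree. The variable names are illustrative.

```r
values <- seq(-5, 5)

# loop version: fill a preallocated numeric vector one element at a time
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# vectorized version: ^ operates on the whole vector at once
squares_vec <- values^2

identical(squares_loop, squares_vec)   # both approaches give the same values
```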

Loops (supplemental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set

2 use the results from step 1 to select the numeric columns

3 use a loop to calculate the mean of each numeric column in the iris data


Loops (supplemental)

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333


Data types

Exercise 0

1 Create a new vector called test containing five numbers of your choice [ c(), <- ]

2 Create a second vector called students containing five common names of your choice [ c(), <- ]

3 Determine the class of students and test [ class() or str() ]

4 Create a data frame containing two columns, students and test as defined above [ data.frame ]

5 Convert test to character class, and confirm that you were successful [ as.character(), <-, str() ]


Data types

Exercise 0 prototype

1 Create a new vector called test containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2 Create a second vector called students containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3 Determine the class of students and test

class(students)
class(test)

4 Create a data frame containing two columns, students and test as defined above

testScores <- data.frame(students, test)

5 Convert test to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
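The slide shows selection only; replacement through a logical index works the same way, with an assignment on the right. The sketch below rebuilds a named vector like x and replaces elements by condition. The replacement values 0 and NA are arbitrary choices for illustration.

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 106] <- 0L                      # replace every value above 106 with zero
x[names(x) %in% c("a", "b")] <- NA    # replace elements chosen by name
x
```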


Extracting and replacing object elements

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the M[1:3, ] example above) return all values


Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5
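The difference between single and double brackets is easiest to see by checking what each returns. In this sketch the list L is an assumed stand-in: its x and y elements match the output shown above, while the z element is invented.

```r
# Assumed contents for L; only x and y are visible in the slide output.
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

class(L[1])       # single brackets return a list
class(L[[1]])     # double brackets return the element itself
L[["x"]]          # double brackets also accept a name
L$x               # $ is a shorthand for the same extraction
```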


Extracting and replacing object elements

Indexing dataframes

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
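If the matrix-style form is preferred but a data.frame is wanted back, the drop argument of the [ operator can be set to FALSE. The sketch below uses an assumed stand-in for DF, since its definition is not shown in this part of the slides.

```r
# Assumed stand-in for the DF object used on the slides.
DF <- data.frame(x = 1:5, y = letters[1:5])

is.data.frame(DF[1])                   # list-style: stays a data.frame
is.data.frame(DF[, 1])                 # matrix-style: dropped to a vector
is.data.frame(DF[, 1, drop = FALSE])   # matrix-style, keeping the data.frame
```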


Extracting and replacing object elements

Extraction/replacement summary

Key points
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
  [      extraction operator, used to extract/replace object elements
  names  get the names of an object, usually a vector, list, or data.frame
  print  print an object


Extracting and replacing object elements

Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

2 Calculate the mean of the Sepal.Length column in iris2

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Extracting and replacing object elements

Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, print(mean(Sepal.Length[Species == "setosa"])))

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (second argument 2 = columns)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
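The second argument of apply() selects the dimension to work over: 1 for rows, 2 for columns. A short sketch using the same matrix shows both, alongside the base R shortcuts rowMeans() and colMeans().

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, mean)   # MARGIN = 1: one mean per row (5 values)
apply(M, 2, mean)   # MARGIN = 2: one mean per column (4 values)

rowMeans(M)         # same as apply(M, 1, mean)
colMeans(M)         # same as apply(M, 2, mean)
```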


Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
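A related base R function, vapply(), works like sapply() but requires the expected type and length of each result to be stated, which can catch surprises early. The sketch below uses an assumed stand-in for DF, matching the column classes shown above.

```r
# Assumed stand-in for DF: an integer column and a factor column.
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

# each call must return exactly one logical value
vapply(DF, is.numeric, logical(1))

# each call must return exactly one character string
vapply(DF, function(col) class(col)[1], character(1))
```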


Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column


Applying functions to list elements

Applying functions summary

Key points
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists ...)


Writing functions

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function
Other function output can come from:
  • Calls to print(), message() or cat() in function body
  • Error messages
Assignment inside the body of a function takes place in a local environment
Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global environment is not 1
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
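The same point about local assignment can be shown in a self-contained form. In this sketch the function name setLocal and the string values are made up for illustration.

```r
x <- "global value"

setLocal <- function() {      # hypothetical function name
  x <- "local value"          # creates a new x that exists only inside the call
  x                           # the value of the last expression is returned
}

result <- setLocal()
result                        # the value computed inside the function
x                             # the global x is unchanged
```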


Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
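Stepping through a function is interactive, but the debugging flag itself can be inspected from ordinary code. The sketch below, which assumes a simple square() function like the one defined earlier, shows the flag being set and cleared with base R's debug(), isdebugged(), and undebug().

```r
square <- function(x) x * x   # a small function to flag for debugging

debug(square)          # the next call to square() would step line by line
isdebugged(square)     # TRUE while the flag is set
undebug(square)        # remove the flag so square() runs normally again
isdebugged(square)     # FALSE
square(3)              # runs without entering the debugger
```

In an interactive session, calling square() while the flag is set opens the browser prompt, where n steps to the next line and Q quits.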


Writing functions

Writing functions summary

Key points
  • writing new functions is easy
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)


Writing functions

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable


Writing functions

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero


Control flow

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric and of length one (unless we agree with R that "a" is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame

2. Insert a break point with browser() and step through your function


Control flow

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects


The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
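Because an object can carry several classes, comparing class(x) to a single string gives one result per class. A minimal sketch of testing class membership with inherits() instead; the class names "A", "B" and "C" are just the placeholders used above.

```r
x <- 1:10
class(x) <- c("A", "B")

# One comparison per class: a logical vector, not a single TRUE/FALSE
class(x) == "B"

# inherits() checks whether the class appears anywhere in the class vector
inherits(x, "B")
inherits(x, "C")

# The same test works for data frames; is.data.frame() is a shortcut
inherits(iris, "data.frame")
is.data.frame(iris)
```

A single TRUE/FALSE is what if() expects, so inherits() tends to be the safer choice for argument checks.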


The S3 object class system

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
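A generic with only a matrix method fails for every other kind of object. One possible extension, sketched here as an assumption rather than as part of the original slides, is a `disp.default` fallback that is used when no class-specific method exists.

```r
# Generic: dispatches on the class of x
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices, as on the slide (with ... added to match the generic)
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Assumption: a fallback method, not part of the original slides
disp.default <- function(x, ...) {
  cat("No special display for class:", class(x)[1], "\n")
  invisible(x)  # return x without printing it
}

disp(matrix(c(1.234, 5.678), ncol = 2))  # uses disp.matrix
disp("hello")                            # falls back to disp.default
```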


The S3 object class system

S3 classes summary

Key points
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
  plot        creates a graphical display, the type of which depends on the class of the object being plotted
  methods     lists the methods defined for a function or class
  UseMethod   the body of a generic function
  invisible   returns an object but does not print it


The S3 object class system

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables

2. Modify your function so that it returns a list of class statsum

3. Write a print method for the statsum class


The S3 object class system

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
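One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description rather than FALSE. A small sketch of the usual pattern, wrapping the call in isTRUE():

```r
a <- 0.1
b <- 0.3 / 3

a == b                    # exact comparison of two doubles
isTRUE(all.equal(a, b))   # comparison up to a small numerical tolerance

# When values differ, all.equal() describes the difference as text,
# so wrap it in isTRUE() before using the result as a condition
all.equal(1, 1.1)
isTRUE(all.equal(1, 1.1))
```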


Things that may surprise you

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
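The two remedies named above can be sketched on the same vector:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # drop missing values before averaging

is.na(x)                # TRUE where a value is missing
sum(is.na(x))           # count the missing values
x[!is.na(x)]            # keep only the non-missing values
```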


Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
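A way to avoid this surprise is to combine the values into a single vector with c() before calling either function. A short sketch:

```r
# mean() treats only its first argument as the data;
# the remaining values are matched to other arguments
mean(1, 2, 3, 4, 5)

# Combining the values first gives the intended average
mean(c(1, 2, 3, 4, 5))

# sum() adds everything passed through ..., and also accepts one vector
sum(1, 2, 3, 4, 5)
sum(c(1, 2, 3, 4, 5))
```

Passing one vector works the same way for both functions, so it is the easier habit to keep.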


Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
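An alternative to as.numeric(as.character(x)) is to convert the levels once and index them by the factor. The sketch below shows both the internal codes and this levels-based conversion:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)   # internal codes: position of each value in levels(x)
levels(x)       # the labels, stored as character strings

# Convert the (few) levels to numbers, then index by the factor's codes
as.numeric(levels(x))[x]
```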


Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc


Additional resources

Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
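When the number of results is known in advance, the list can be created at its final length before the loop starts. A minimal sketch of that variation, alongside an lapply() call that builds the same list without an explicit loop (`Result2` is just an illustrative name):

```r
n <- 5

# Pre-allocate a list with n empty slots, then fill it by position
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}
lengths(Result)   # number of elements stored in each slot

# lapply() builds the same list in one call
Result2 <- lapply(seq_len(n), function(i) 1:i)
identical(Result, Result2)
```

seq_len(n) is used instead of 1:n because it yields an empty sequence when n is 0, so the loop body is simply skipped.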


Loops (supplemental)

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Loops (supplemental)

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Loops (supplemental)

Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
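For comparison, the same three steps can be written without explicit loops using sapply(). This is a sketch of an alternative, not the workshop's prototype; the object names mirror the ones used above.

```r
# Step 1: class of each column (one value per column)
classes <- sapply(iris, class)

# Step 2: keep the numeric columns, using the logical vector as an index
iris.num <- iris[classes == "numeric"]

# Step 3: mean of each numeric column
iris.means <- sapply(iris.num, mean)
iris.means

# colMeans() gives the same numbers for an all-numeric data frame
colMeans(iris.num)
```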

Page 14: Introduction to R Programming

Data types

Exercise 0 prototype

1. Create a new vector called "test" containing five numbers of your choice

test <- c(1, 2, 3, 4, 5)

2. Create a second vector called "students" containing five common names of your choice

students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")

3. Determine the class of "students" and "test"

class(students)
class(test)

4. Create a data frame containing two columns, "students" and "tests", as defined above

testScores <- data.frame(students, tests = test)

5. Convert "test" to character class, and confirm that you were successful

test <- as.character(test)
class(test)


Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106


Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==     equal to
  !=     not equal to
  >      greater than
  <      less than
  >=     greater than or equal to
  <=     less than or equal to
  %in%   is included in
  &      and
  |      or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
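Logical vectors can also be counted, converted to positions, and used on the left-hand side of an assignment. A brief sketch, rebuilding a vector like the one above; `big` is an illustrative name.

```r
x <- 101:110
names(x) <- letters[1:10]

big <- x > 106   # logical vector: TRUE where the condition holds
which(big)       # positions (with names) of the TRUE elements
sum(big)         # TRUE counts as 1, so this is the number of matches

x[big] <- 0      # replace only the selected elements
x
```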


Extracting and replacing object elements

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in the first extraction example above) return all values.
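Selecting a single row or column of a matrix returns a plain vector by default. The sketch below, using the same matrix M, shows the drop = FALSE argument that keeps the result as a matrix.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

M[, 2]                 # a single column is simplified to a vector
dim(M[, 2])            # NULL: the result has no dimensions

M[, 2, drop = FALSE]   # drop = FALSE keeps a 5 x 1 matrix
dim(M[, 2, drop = FALSE])

M[2, ]                 # a single row becomes a vector named by column
```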


Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
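The difference between single and double brackets can be checked with class(). In this sketch the list `L` is rebuilt with assumed contents (the x and y elements match the output above; the z element is a guess), and elements are also extracted by name.

```r
# Assumed contents: x and y match the slide output, z is hypothetical
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

class(L[1])      # single brackets return a (shorter) list
class(L[[1]])    # double brackets return the element itself

L[["y"]]         # extract one element by name
L$y              # $ is a shortcut for the same extraction
L[c("x", "z")]   # several elements by name: still a list
```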


Extracting and replacing object elements

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ ,1]) # a vector
 int [1:5] 1 2 3 4 5


Extracting and replacing object elements

Extraction/replacement summary

Key points

  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section
  [       extraction operator, used to extract/replace object elements
  names   get the names of an object, usually a vector, list, or data.frame
  print   print an object


Extracting and replacing object elements

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

2. Calculate the mean of the Sepal.Length column in iris2

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Extracting and replacing object elements

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

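The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90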
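To make the role of the second argument concrete, the sketch below applies mean() over rows and over columns of the same matrix, and compares the results with the built-in rowMeans() and colMeans() shortcuts.

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, mean)   # MARGIN = 1: one value per row (5 values)
apply(M, 2, mean)   # MARGIN = 2: one value per column (4 values)

# Built-in shortcuts for these two common cases
rowMeans(M)
colMeans(M)
```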


Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.


Applying functions to list elements

Applying functions summary

Key points
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section
  matrix       create a matrix (vector with two dimensions)
  apply        apply a function to the rows or columns of a matrix
  sapply       apply a function to the elements of a list
  is.numeric   returns TRUE or FALSE, depending on the type of object
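As a rough orientation to the other apply-style functions mentioned in the key points, here is a small sketch using the built-in iris data. The variable names are illustrative; vapply() is not named on the slide but belongs to the same family.

```r
# lapply(): like sapply(), but always returns a list
col_ranges <- lapply(iris[1:2], range)

# vapply(): like sapply(), but the type and length of each result is declared
col_means <- vapply(iris[1:4], mean, numeric(1))

# tapply(): apply a function to groups defined by a factor
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply(): apply a function to several vectors in parallel
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)

col_ranges
col_means
species_means
pair_sums
```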


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output.
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists ...)


Writing functions

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 } # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
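The local-environment point can be seen in isolation with a smaller sketch. The values 100 and 1 are arbitrary; the function assigns to its own x and leaves the global x alone.

```r
x <- 100          # x in the global environment

f <- function() {
  x <- 1          # creates a local x; the global x is untouched
  x * 2           # value of the last expression is the return value
}

y <- f()
y                 # the value returned by f
x                 # still the global value
```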


Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
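The debugging tools are interactive, so the sketch below only shows how the debugging flag is switched on and off; the interactive steps are described in comments rather than run. `square` is redefined here so the example is self-contained.

```r
square <- function(x) {
  x * x
}

debug(square)        # set the debugging flag on square()
isdebugged(square)   # TRUE: the next call would open the step-through browser
undebug(square)      # remove the flag again
isdebugged(square)   # FALSE

square(4)            # runs normally now that the flag is off

# In an interactive session you could instead:
#   - put browser() inside the function body to pause at that line
#   - call traceback() right after an error to see the chain of calls
```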


Writing functions

Writing functions summary

Key points
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section
  function    defines a new function
  return      used inside a function definition to set the return value
  browser     sets a break point
  debug       turns on the debugging flag of a function so you can step through it
  undebug     turns off the debugging flag
  traceback   shows the error stack (call after an error to see what went wrong)


Writing functions

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Writing functions

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow

Control flow examples

Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow

Control flow examples

Do something reasonable if x is not numeric

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!


Control flow

Control flow summary

Key points
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions[1] introduced in this section
  cat    concatenates and prints R objects
  if     executes code only if a condition is met
  else   used with if; code to execute if the condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.

Control flow

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame

2. Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)   # stops with an error: df must be a data.frame
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  browser()     # execution pauses here so you can step through the rest
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.

Basic idea: functions have different methods for different types of objects.

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
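
Because an object can carry several classes, comparing class(x) to a single string is fragile. A short sketch using base R's inherits() and unclass(), which are not covered on the slide and are shown here only as an illustration:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")     # is "A" among the classes of x?
inherits(x, "B")
inherits(x, "foo")

# unclass() strips the class attribute and returns the underlying vector
y <- unclass(x)
class(y)
```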

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo"
> mean.foo <- function(x, ...) {             # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))     # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n")             # otherwise say x is not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
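
With only a matrix method defined, calling disp() on anything else fails because no applicable method is found. A sketch of one way to handle that, assuming we add a disp.default fallback (this method is an addition for illustration, not part of the original slides):

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices: print rounded values
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible(x)
}

# fallback used when no class-specific method exists (illustrative addition)
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2))
disp("some text")
```

UseMethod() tries each class of the object in turn and falls back to the .default method when none of them has a matching method.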

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have a class, and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
  invisible(x)
}

statsum(iris)

Things that may surprise you

Gotchas

There are an unfortunately large number of surprises in R programming.
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
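
One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). A small sketch (the explicit tolerance 1e-8 is an arbitrary value chosen for illustration):

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison fails
isTRUE(all.equal(a, b))    # safe to use inside if()

# all.equal() describes the difference instead of returning FALSE
all.equal(1, 1.1)
isTRUE(all.equal(1, 1.1))

# an explicit tolerance makes the intent visible
abs(a - b) < 1e-8
```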

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
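
The two solutions mentioned above can be sketched as follows, reusing the same vector x:

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA, because one value is unknown
mean(x, na.rm = TRUE)    # mean of the 19 observed values
sum(is.na(x))            # number of missing values
x[!is.na(x)]             # drop the missing values

# x == NA is always NA, so use is.na() to look for missing values
any(is.na(x))
```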

Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to the most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
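
The difference comes from how the arguments are matched: mean() treats only its first argument as data (in mean.default the next positions are trim and na.rm), while sum() adds up everything passed through `...`. A short sketch of the safe habit, which is to combine the values with c() first:

```r
mean(1, 2, 3, 4, 5)       # only the first argument, 1, is treated as data
mean(c(1, 2, 3, 4, 5))    # combine the values into one vector first
sum(1, 2, 3, 4, 5)        # sum() accepts the values either way
sum(c(1, 2, 3, 4, 5))
```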

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6

Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5, 5)) {               # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")   # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15)        # allows reproducible sample() results
> dice <- seq(1, 6)   # set dice = [1 2 3 4 5 6]
> roll <- 0           # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1)   # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")              # print the result
+ }                   # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list()        # create an object to store the results
> for (i in 1:5) {        # for each i in [1, 5]
+   Result[[i]] <- 1:i    # assign the sequence 1 to i to Result
+ }
> Result                  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
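
Growing a list one element at a time works, but when the number of results is known in advance it is common to create the container at its final size first. A sketch of that variant, with an lapply() equivalent for comparison (Result2 and n are illustrative names, not from the slides):

```r
n <- 5
Result <- vector("list", n)     # list of length n, filled with NULL
for (i in seq_len(n)) {         # seq_len(0) is empty, unlike 1:0
  Result[[i]] <- seq_len(i)
}
Result[[3]]

# the same result without an explicit loop
Result2 <- lapply(seq_len(n), seq_len)
identical(Result, Result2)
```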

Word of caution: don't overuse loops!

Most operations in R are vectorized; this makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                      # create vector x
> for (i in 1:5) x[i] <- i+i    # double a vector using a loop
> print(x)                      # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                     # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                       # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) {               # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")   # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9" 
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
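
As a further illustration of the same idea, sprintf() is also vectorized, so the whole for-loop example can be reproduced without a loop. This is a sketch using base R only; the variable names are illustrative:

```r
nums <- -5:5
squares <- nums^2     # arithmetic works on the whole vector at once

# sprintf() formats every element; %d expects integers
msg <- sprintf("%d squared is %d", nums, as.integer(squares))
writeLines(msg)       # one line per element
```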

Exercise 5

1 Use a loop to get the class() of each column in the iris data set

2 Use the results from step 1 to select the numeric columns

3 Use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

2 Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[var] <- mean(iris.num[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 

Page 15: Introduction to R Programming

Extracting and replacing object elements


Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.

> # indexing vectors by position
> x <- 101:110    # create a vector of integers from 101 to 110
> x[c(4, 5)]      # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1       # change the 4th value to 1
> x               # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> # indexing vectors by name
> names(x) <- letters[1:10]   # give x names
> print(x)                    # print x
  a   b   c   d   e   f   g   h   i   j 
101 102 103   1 105 106 107 108 109 110 
> x[c("a", "f")]              # extract the values of a and f from x
  a   f 
101 106 

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106      # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j 
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE 
> x[x > 106]   # selects elements of x where x > 106
  g   h   i   j 
107 108 109 110 

Additional operators useful for logical indexing:
  • ==    equal to
  • !=    not equal to
  • >     greater than
  • <     less than
  • >=    greater than or equal to
  • <=    less than or equal to
  • %in%  is included in
  • &     and
  • |     or

> x[x > 106 & x <= 108]
  g   h 
107 108 
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j 
101 102 103 107 108 109 110 

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns.

> # indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ]                 # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3]       # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2]    # second column where first column is <= 4 and >= 2
[1] -2 -3 -4

Note that unspecified indices (such as the column index in M[1:3, ] above) return all values.

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)]   # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1]         # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element,
> ## effectively taking it out of the list
> L[[1]]       # a vector
[1] 1 2 3 4 5
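
The same distinction applies when indexing a list by name. In the sketch below, L is rebuilt so the example is self-contained; its contents are an assumption, since the definition of L is not shown in this part of the workshop:

```r
# assumed contents for L, for illustration only
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

class(L["x"])       # single brackets: still a list
class(L[["x"]])     # double brackets: the element itself
L$x                 # $ is shorthand for [["x"]]

L[["y"]][2] <- 10L  # replace one value inside an element
L$y
```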

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)]   # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]]                   # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1])     # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1])  # a vector
 int [1:5] 1 2 3 4 5
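
When matrix-style indexing should keep returning a data.frame even for a single column, base R's drop = FALSE argument can be used. A brief sketch; the definition of DF here is assumed from the output shown on the slides:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])   # assumed definition of DF

is.data.frame(DF[1])                   # list-style: one-column data.frame
is.data.frame(DF[, 1])                 # matrix-style: dropped to a vector
is.data.frame(DF[, 1, drop = FALSE])   # drop = FALSE keeps the data.frame
```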

Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  • [: extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object

Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

2 Calculate the mean of the Sepal.Length column in iris2

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix.

> M <- matrix(1:20, ncol = 4)
> apply(M, 2, mean)   # average each column
[1]  3  8 13 18
> apply(M, 2, sum)    # sum the columns
[1] 15 40 65 90
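
Both calls above use a second argument of 2, which means "by column". A short sketch of the row-wise counterpart, together with the dedicated base R helpers rowMeans() and colMeans():

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, mean)   # MARGIN = 1: one value per row
apply(M, 2, mean)   # MARGIN = 2: one value per column

# rowMeans() and colMeans() give the same answers
all.equal(apply(M, 1, mean), rowMeans(M))
all.equal(apply(M, 2, mean), colMeans(M))
```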

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class)        # get the class of each column in the DF data.frame
        x         y 
"integer"  "factor" 
> sapply(L, length)        # get the length of each element in the L list
x y z 
5 3 2 
> sapply(DF, is.numeric)   # check each column of DF to see if it is numeric
    x     y 
 TRUE FALSE 
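
sapply() simplifies its result whenever it can, which is convenient interactively but makes the return type depend on the data. A sketch comparing it with two related base R functions; the definition of DF is assumed from the output shown on the slides:

```r
DF <- data.frame(x = 1:5, y = letters[1:5])   # assumed definition of DF

sapply(DF, is.numeric)                # simplifies to a named logical vector
vapply(DF, is.numeric, logical(1))    # same, but errors unless each result is one logical
lapply(DF, class)                     # never simplifies: always returns a list
```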

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)]   # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric))   # check which columns of DF are numeric
    x     y 
 TRUE FALSE 
> DF[DF.which.num]                           # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object

Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists...)

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message(), or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() {            # define function f
+   print("setting x to 1")    # print a text string
+   x <- 1                     # set x to 1
+ }
>
> y <- f()                     # assign y the value returned by f
[1] "setting x to 1"
>
> y                            # print y
[1] 1
> x                            # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j 
101 102 103   1 105 106 107 108 109 110 
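
The two ideas above, local assignment and the value of the last expression, can be seen side by side in a smaller sketch. The names f, g, x, y, and z are illustrative, and invisible() is used only to show a return value that is not auto-printed:

```r
x <- 100           # global x

f <- function() {
  x <- 1           # creates a local x; the global x is untouched
  x + 1            # the value of the last expression is returned
}

g <- function() {
  invisible(42)    # returned, but not auto-printed at the console
}

y <- f()
c(x = x, y = y)    # global x is still 100, y is 2
z <- g()
z
```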

Writing functions example

Goal: write a function that returns the square of its argument.

> square <- function(x) {   # define function named "square" with argument x
+   return(x*x)             # multiply the x argument by itself
+ }                         # end the function definition
>
> # check to see that the function works
> square(x = 2)             # square the value 2
[1] 4
> square(10)                # square the value 10
[1] 100
> square(1:5)               # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

Control flow

  • Basic idea: if some condition is true, do one thing; if false, do something else
  • Carried out in R using if and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input
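
if and else branch on a single TRUE or FALSE. For element-wise branching over a whole vector, base R also provides the vectorized ifelse(); a small sketch (signs is an illustrative name, and ifelse() is not covered in the workshop slides):

```r
x <- c(-2, 0, 3)

# ifelse() evaluates the condition for every element and picks a value for each
signs <- ifelse(x > 0, "positive", ifelse(x == 0, "zero", "negative"))
signs
```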

Control flow examples

Goal: write a function that tells us if a number is positive or negative.

> # use branching to return a different result depending on the sign of the input
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then...
+     cat(x, "is positive \n")    # say so
+   } else {                      # otherwise...
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
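The prototype tests class(df) != "data.frame". That comparison yields one value per class, so it behaves awkwardly for objects that carry more than one class. A hedged sketch of the same check written with inherits(), which always returns a single TRUE or FALSE; this is an alternative offered here, not the workshop's own solution.

```r
statsum <- function(df) {
  # inherits() gives one TRUE/FALSE even when df has several classes
  if (!inherits(df, "data.frame")) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means, counts)
}

res <- statsum(iris)
res[[1]]
# the error is caught here so the script keeps running
tryCatch(statsum(1:10), error = function(e) conditionMessage(e))
```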


The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal S3 classes
      • Stricter, more formal S4 classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
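When an object has several classes, testing class(x) == "A" returns one result per class. A small sketch, using only base R, of how inherits() and unclass() can be used to query and strip the class vector set on the "Object class" slide:

```r
x <- 1:10
class(x) <- c("A", "B")

# inherits() asks whether any element of the class vector matches
inherits(x, "A")
inherits(x, "B")
inherits(x, "foo")

# unclass() removes the class attribute, exposing the underlying integer vector
class(unclass(x))
```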

The S3 object class system

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class foo
> mean.foo <- function(x) { # mean method for foo class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class matrix
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
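The disp() generic on the "Creating generic functions" slide only has a matrix method, so calling it on anything else fails with a "no applicable method" error. A minimal sketch of adding a fallback: disp.default is an addition made here for illustration, and the message it prints is arbitrary.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback used when no class-specific method exists (illustrative addition)
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(0.123, 0.456, 0.789, 1.234), ncol = 2))  # dispatches to disp.matrix
disp("hello")                                          # falls back to disp.default
```

Giving every method the same (x, ...) signature as the generic keeps the methods compatible with extra arguments passed through the generic.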

The S3 object class system

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class

The S3 object class system

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these gotchas are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
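One detail worth adding to the floating point slide: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE() before being used as a condition. A short sketch using the same numbers:

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison of the stored binary values
isTRUE(all.equal(a, b))    # comparison within a small numerical tolerance

# all.equal() describes the mismatch instead of returning FALSE,
# so isTRUE() turns the result into a usable logical value
isTRUE(all.equal(1, 1.1))
```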

Things that may surprise you

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
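The two solutions named on the missing values slide can be sketched as follows, reusing the vector from that slide. Only base R functions are used.

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # drop missing values before averaging

is.na(x)[10:12]         # is.na() flags missing entries element by element
sum(is.na(x))           # how many values are missing
which(is.na(x))         # where they are

mean(x[!is.na(x)])      # same result as na.rm = TRUE, via logical indexing
```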

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect, but I would like to at least get a warning

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all.
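The difference on the optional-argument slide comes from the signatures: mean() treats only its first argument as data, while sum() takes all of its unnamed arguments. A brief sketch of the usual way around it, which is to combine the values with c() first:

```r
mean(1, 2, 3, 4, 5)      # only the first argument (1) is treated as the data
mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
sum(1, 2, 3, 4, 5)       # sum() adds up all of its unnamed arguments
sum(c(1, 2, 3, 4, 5))    # and works the same way on a single vector
```

Passing a single vector works for both functions, so it is a reasonable habit regardless of which summary function is being called.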

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
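A factor stores integer codes that point into its levels, which is why as.numeric(x) returns 2 2 1 1 on the factor slide. The sketch below shows the codes next to two ways of recovering the original numbers; the second form, indexing the converted levels by the factor, is a common idiom added here and is not from the slides.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

levels(x)                      # the labels, in the order given by levels =
as.integer(x)                  # the internal codes: positions within levels(x)

as.numeric(as.character(x))    # convert via the labels (as on the slide)
as.numeric(levels(x))[x]       # same values: convert each level once, then index
```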


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC–Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Additional resources

Feedback

Help Us Make This Workshop Better
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop

Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
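The loop on the "Using loops to fill in lists" slide starts from an empty list and grows it one element at a time. A hedged variation, assuming the number of results is known in advance, allocates the list up front with vector("list", n) and iterates with seq_len(), which also behaves sensibly when n is zero. The variable n is introduced here for illustration.

```r
n <- 5
Result <- vector("list", n)    # a list of n empty (NULL) slots

for (i in seq_len(n)) {        # seq_len(0) is empty, so the loop is skipped for n = 0
  Result[[i]] <- 1:i           # fill slot i with the sequence 1 to i
}

length(Result)
Result[[3]]
```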

Loops (supplemental)

Word of caution: don't overuse loops

Most operations in R are vectorized. This makes loops unnecessary in many cases

  • Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

  • Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them

Loops (supplemental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Loops (supplemental)

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
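In keeping with the earlier caution about overusing loops, the three steps of Exercise 5 can also be written without any explicit loop. This is a sketch of one loop-free alternative using sapply() and colMeans(), not the workshop's prototype; the object names mirror the ones used in the loop version.

```r
classes <- sapply(iris, class)            # step 1: class of every column
iris.num <- iris[classes == "numeric"]    # step 2: keep the numeric columns
iris.means <- colMeans(iris.num)          # step 3: mean of each numeric column

classes
names(iris.num)
iris.means
```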

Page 16: Introduction to R Programming

Extracting and replacing object elements

Indexing by position or name

Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name

> ## indexing vectors by position
> x <- 101:110 # Create a vector of integers from 101 to 110
> x[c(4, 5)] # extract the fourth and fifth values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # print x
 [1] 101 102 103   1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- letters[1:10] # give x names
> print(x) # print x
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
> x[c("a", "f")] # extract the values of a and f from x
  a   f
101 106

Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
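The logical indexing slide shows selection; the same logical vectors can also be used on the left-hand side of an assignment to replace elements. A minimal sketch, starting from a fresh copy of the vector so that the example is self-contained:

```r
x <- 101:110
names(x) <- letters[1:10]

x[x > 106] <- 0L          # replace every element greater than 106 with zero
x

sum(x > 0)                # count the elements that still satisfy a condition
names(x)[x == 0]          # names of the elements that were replaced
```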

Extracting and replacing object elements

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <=4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of the first extraction above, M[1:3, ]) return all values
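The last matrix example returns a plain vector because selecting a single column drops the matrix structure by default. A short sketch of the drop = FALSE argument, which keeps the result a matrix; this detail is an addition to the slide and uses the same matrix M.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

M[, 2]                      # a plain vector: the dimensions are dropped
dim(M[, 2])                 # NULL, because a vector has no dimensions

M[, 2, drop = FALSE]        # still a matrix with one column
dim(M[, 2, drop = FALSE])   # 5 rows, 1 column
```

Keeping the dimensions matters when later code calls functions such as apply() or nrow() on the result.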

Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list
> L[[1]] # a vector
[1] 1 2 3 4 5

Extracting and replacing object elements

Indexing dataframes

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5

Extracting and replacing object elements

Extractionreplacement summary

Key points

  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  • [: extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object

Extracting and replacing object elements

Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2

2 Calculate the mean of the Sepal.Length column in iris.2

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Extracting and replacing object elements

Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2

data(iris)
iris.2 <- iris[c("Sepal.Length", "Species")]
str(iris.2)

2 Calculate the mean of the Sepal.Length column in iris.2

mean(iris.2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris.2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column (i.e., across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
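Both apply() calls on the slide use 2 as the second argument (MARGIN), which means one result per column. A brief sketch contrasting the two margins on the same matrix; rowMeans() and colMeans() are base R shortcuts for the mean case.

```r
M <- matrix(1:20, ncol = 4)

apply(M, 1, mean)   # MARGIN = 1: one result per row
apply(M, 2, mean)   # MARGIN = 2: one result per column

rowMeans(M)         # shortcut equivalent to apply(M, 1, mean)
colMeans(M)         # shortcut equivalent to apply(M, 2, mean)
```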

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
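sapply() simplifies its result to a vector when it can. A small sketch comparing it with lapply(), which always returns a list, and vapply(), which additionally checks the type of each result. The list L below is a stand-in defined here so the example runs on its own; it only mimics the element lengths shown on the slide.

```r
# Stand-in for the workshop's list L (contents are illustrative)
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

lapply(L, length)                # always returns a list
sapply(L, length)                # simplified to a named integer vector
vapply(L, length, integer(1))    # like sapply(), but each result must be one integer
```

vapply() is a reasonable choice inside functions, because it fails early if an element produces a result of an unexpected type or length.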

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column

Applying functions to list elements

Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)

Writing functions

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in function body
  • Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
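The point about local assignment can be isolated in a self-contained sketch: an x assigned inside a function does not touch an x that already exists outside it. The starting value 10 is arbitrary and chosen here for illustration.

```r
x <- 10            # x in the calling environment

f <- function() {
  x <- 1           # creates a separate, local x inside the function
  x                # the last evaluated expression is the return value
}

y <- f()
y                  # the value returned by f
x                  # unchanged by the assignment inside f
```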

Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named square with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact

Writing functions

Writing functions summary

Key points:
  • writing new functions is easy
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)

Writing functions

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

Writing functions

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function isPositive
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero

Control flow

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function isPositive
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive)

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
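When an object has several classes, method lookup tries them in order. The following is a minimal sketch, assuming a made-up print method for class "B"; the method name print.B and its message are illustrative and not part of the workshop materials.

```r
# Hypothetical object with two classes; "A" is tried before "B".
x <- 1:10
class(x) <- c("A", "B")

is_A   <- inherits(x, "A")        # TRUE
is_B   <- inherits(x, "B")        # TRUE
is_int <- inherits(x, "integer")  # FALSE: the class attribute replaced the implicit class

# Illustrative method defined only for class "B". It is still used,
# because there is no print.A to take precedence.
print.B <- function(x, ...) {
  cat("An object of class B with", length(x), "elements\n")
  invisible(x)
}

out <- capture.output(print(x))
```

Because inherits() checks every entry of the class vector, it is usually a safer test than comparing class(x) to a single string.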


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66


S3 classes summary

Key points:
  there are several class systems in R, of which S3 is the oldest and simplest
  objects have class and functions have corresponding methods
  the class of an object can be set by simple assignment
  S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

There are an unfortunately large number of surprises in R programming
Some of these "gotchas" are common problems in other languages, many are unique to R
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
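One detail worth knowing when all.equal() is used inside if(): when the values differ it returns a character description rather than FALSE. A minimal sketch of the usual pattern, wrapping the call in isTRUE(); the variable names are illustrative.

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                  # FALSE: the two doubles differ in the last bits
approx <- isTRUE(all.equal(a, b)) # TRUE: equal within the default tolerance

# When values really differ, all.equal() returns a message, not FALSE,
# so isTRUE() is needed to get a clean logical for if().
msg <- all.equal(1, 1.1)

if (isTRUE(all.equal(a, b))) {
  message("a and b are equal within tolerance")
}
```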


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
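The two solutions can be sketched as follows, reusing the vector x from the slide; x_complete is an illustrative name.

```r
x <- c(1:10, NA, 12:20)

m_all   <- mean(x)                # NA: the missing value propagates
m_known <- mean(x, na.rm = TRUE)  # mean of the 19 observed values

n_missing   <- sum(is.na(x))      # how many values are missing
pos_missing <- which(is.na(x))    # where they are

# Drop the missing values explicitly by logical indexing
x_complete <- x[!is.na(x)]
```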


Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Here the number 1 is converted to the string "1" and the two values are compared as character strings.

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

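Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

mean() averages only its first argument x (here the single value 1), while sum() adds up everything passed through the ... argument.

Ouch. That is not nice at all!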
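One way to avoid this surprise is to always pass the values as a single vector. A small sketch; the variable names are illustrative.

```r
vals <- c(1, 2, 3, 4, 5)

m_vec <- mean(vals)            # 3: the mean of all five values
s_vec <- sum(vals)             # 15

# Passing separate arguments: only the first one is treated as x by mean()
m_sep <- mean(1, 2, 3, 4, 5)   # 1
s_sep <- sum(1, 2, 3, 4, 5)    # 15: sum() collects everything in ...
```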


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
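An equivalent conversion goes through the factor levels instead of the individual elements. A minimal sketch using the same factor; the result names are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.numeric(x)                # 2 2 1 1: the internal integer codes
via_chr <- as.numeric(as.character(x))  # 5 5 6 6
via_lev <- as.numeric(levels(x))[x]     # 5 5 6 6: convert the levels once, then index by code
```

The second form converts each level only once, which can matter for long factors with few levels.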


Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop!


Looping for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
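When the number of results is known in advance, the list can be created at its final length before the loop, and the same result can also be produced without an explicit loop. A minimal sketch; Result2 and n are illustrative names, and lapply() is used here as one possible alternative rather than the approach taken in the slides.

```r
n <- 5

# Pre-allocate a list of length n, then fill it by position
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# The same list built with lapply(), which collects the results for us
Result2 <- lapply(seq_len(n), function(i) 1:i)

same <- identical(Result, Result2)
```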


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1 use a loop to get the class() of each column in the iris data set

2 use the results from step 1 to select the numeric columns

3 use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 17: Introduction to R Programming

Extracting and replacing object elements

Logical indexing

Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.

> x > 106 # shows which elements of x are > 106
    a     b     c     d     e     f     g     h     i     j
FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
> x[x > 106] # selects elements of x where x > 106
  g   h   i   j
107 108 109 110

Additional operators useful for logical indexing:
  ==    equal to
  !=    not equal to
  >     greater than
  <     less than
  >=    greater than or equal to
  <=    less than or equal to
  %in%  is included in
  &     and
  |     or

> x[x > 106 & x <= 108]
  g   h
107 108
> x[x > 106 | names(x) %in% c("a", "b", "c")]
  a   b   c   g   h   i   j
101 102 103 107 108 109 110
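The examples above select elements; the same logical vectors can also be used on the left-hand side of an assignment to replace elements. A minimal sketch, assuming x is the named vector 101 to 110 with names a through j used on this slide (it is rebuilt here so the example is self-contained).

```r
# Assumed reconstruction of the slide's x: values 101..110 named a..j
x <- 101:110
names(x) <- letters[1:10]

# Replace every element greater than 106 with zero
x[x > 106] <- 0L

# Replace elements selected by name with NA
x[names(x) %in% c("a", "b")] <- NA

n_zero <- sum(x == 0, na.rm = TRUE)
n_na   <- sum(is.na(x))
```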


Indexing matrices

Extraction on matrices operates in two dimensions: first dimension refers to rows, second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <=4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index of M[1:3, ] above) return all values.
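The last extraction above returned a plain vector because selecting a single column drops the matrix dimensions. If a one-column matrix is wanted instead, the drop argument of the [ operator can be set to FALSE. A short sketch using the same matrix; v and m are illustrative names.

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

v <- M[, 2]                # a vector: the dimensions are dropped
m <- M[, 2, drop = FALSE]  # still a matrix, with 5 rows and 1 column

dim_v <- dim(v)            # NULL
dim_m <- dim(m)            # 5 1
```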


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
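Named list elements can be extracted by name as well as by position. The sketch below assumes a list shaped like the L used in the slides; the x and y elements match the printed output, while the contents of z are made up for illustration.

```r
# Assumed reconstruction of L; the z element is hypothetical
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

by_name_dbl <- L[["y"]]  # the content of element y (a vector)
by_dollar   <- L$y       # shorthand for the same extraction
by_name_sgl <- L["y"]    # a list containing only element y
```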


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ ,1]) # a vector
 int [1:5] 1 2 3 4 5


Extraction/replacement summary

Key points:
  elements of objects can be extracted or replaced using the [ operator
  objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  vectors and lists have only one dimension, and hence only one index is used
  matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  [      extraction operator, used to extract/replace object elements
  names  get the names of an object, usually a vector, list, or data.frame
  print  print an object


Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

2 Calculate the mean of the Sepal.Length column in iris2

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

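The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average across the rows, giving one mean per column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90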
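Both calls above work on columns (second argument 2). For completeness, here is a short sketch of the row-wise case on the same matrix, together with the built-in shortcuts rowSums() and colMeans(); the result names are illustrative.

```r
M <- matrix(1:20, ncol = 4)

row_sums  <- apply(M, 1, sum)   # one sum per row: 34 38 42 46 50
col_means <- apply(M, 2, mean)  # one mean per column: 3 8 13 18

# Dedicated functions give the same answers for these common cases
row_sums2  <- rowSums(M)
col_means2 <- colMeans(M)
```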


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.


Applying functions summary

Key points:
  R has convenient methods for applying functions to matrices, lists, and data.frames
  other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object
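The other apply-style functions mentioned in the key points can be sketched briefly. These are minimal illustrations using the built-in iris data and made-up inputs, not examples from the workshop itself.

```r
# lapply: like sapply, but always returns a list
lens <- lapply(list(x = 1:5, y = 1:3), length)

# tapply: apply a function to groups of a vector defined by a factor
grp_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```

Here grp_means holds one mean sepal length per species, and sums holds the element-wise sums of the two input vectors.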


Writing functions

Functions

A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function
Functions are defined using the function() function
Functions can be defined with any number of named arguments
Arguments can be of any type (e.g., vectors, data.frames, lists ...)


Function return value

The return value of a function can be:
  The value of the last expression evaluated in the body of the function
  Objects explicitly returned with the return() function
Other function output can come from:
  Calls to print(), message() or cat() in function body
  Error messages
Assignment inside the body of a function takes place in a local environment
Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

Stepping through functions and setting breakpoints
Use traceback() to see what went wrong after the fact


Writing functions summary

Key points:
  writing new functions is easy!
  most functions will have a return value, but functions can also print things, write things to file etc.
  functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)
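The debugging flag can be switched on and off as sketched below. The example only toggles and inspects the flag with isdebugged(); actually stepping through the function requires calling it from an interactive session while the flag is on. The square function is the one defined earlier, repeated so the sketch is self-contained.

```r
square <- function(x) {
  return(x * x)
}

debug(square)                  # the next call to square() would open the browser
flag_on <- isdebugged(square)  # TRUE

undebug(square)                # back to normal execution
flag_off <- isdebugged(square) # FALSE

result <- square(4)            # runs without stopping
```

After an error at the top level of an interactive session, calling traceback() prints the sequence of calls that led to it.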


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Basic idea: if some condition is true, do one thing. If false, do something else
Carried out in R using if() and else statements, which can be nested if necessary
Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
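The version above prints a message and carries on. When a bad argument should halt execution instead, the check can call stop(). The following is a sketch of that variant, not the workshop's own solution; the function name describeSign and its return strings are hypothetical.

```r
# Hypothetical variant that returns a label and raises an error on bad input
describeSign <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

ok  <- describeSign(10)
# tryCatch() lets us capture the error message instead of halting the script
err <- tryCatch(describeSign("a"), error = function(e) conditionMessage(e))
```

Using || rather than | here means the length check is skipped once the type check has already failed.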


Control flow summary

Key points:
  code can be conditionally executed
  conditions can be nested
  conditional execution is often used for argument checking, among other things

Functions¹ introduced in this section:
  cat   Concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

¹ Technically if and else are not functions, but this need not concern us at the moment

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
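The generic-plus-method pattern above can be extended with a default method that catches every class without a method of its own. A minimal sketch, assuming a hypothetical describe() generic that is not part of the workshop materials:

```r
# Hypothetical generic: dispatches on the class of x
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method used for data.frames
describe.data.frame <- function(x, ...) {
  paste("data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1], "with length", length(x))
}

describe(iris)   # dispatches to describe.data.frame
describe(1:3)    # dispatches to describe.default
```

Without describe.default, calling describe() on an object that has no matching method would stop with an error.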


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these gotchas are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
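One detail worth knowing when using the solution above inside if(): all.equal() returns a character description rather than FALSE when the values differ. A minimal sketch (the variable names a and b are illustrative only):

```r
a <- 0.1
b <- 0.3 / 3

a == b                    # exact comparison of two doubles
isTRUE(all.equal(a, b))   # comparison within a small numeric tolerance

# all.equal() does not return FALSE on a mismatch, so wrap it in isTRUE()
# before using the result as a condition
if (isTRUE(all.equal(a, 0.2))) {
  message("a and 0.2 are (nearly) equal")
} else {
  message("a and 0.2 differ")
}
```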


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
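The two solutions named above can be sketched as follows, reusing the same vector x:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # drop missing values before averaging
sum(is.na(x))           # count the missing values
x[!is.na(x)]            # keep only the observed values

# Test for missingness with is.na(), never with ==
is.na(NA)
```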


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
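The difference shown by args() suggests a simple habit: combine the values into one vector with c() before passing them to a summary function. A short sketch:

```r
# mean() averages its first argument only; the remaining values are
# matched to other parameters or passed on through ...
mean(1, 2, 3, 4, 5)      # the mean of the single value 1

# Combine the values into one vector first
mean(c(1, 2, 3, 4, 5))   # the mean of all five values

# sum() collects all of its ... arguments, so both forms agree
sum(1, 2, 3, 4, 5)
sum(c(1, 2, 3, 4, 5))
```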


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
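The conversion above can also be written by converting the levels once and then indexing them with the factor. A small sketch using the same factor x:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                 # internal level codes, not the original values
as.numeric(as.character(x))   # original values, via the character labels
as.numeric(levels(x))[x]      # same result, converting each level only once
```

The last form indexes the numeric levels by the integer codes, so it does less work when a factor has few levels but many elements.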


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5, 5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1, 6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
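When the number of iterations is known in advance, the storage object can be created at its final size before the loop starts. This is a common variation on the pattern above, not something the workshop prescribes; the name n is illustrative:

```r
n <- 5
Result <- vector("list", n)   # list of n empty (NULL) slots, created up front

for (i in seq_len(n)) {       # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

lengths(Result)               # length of each stored element
```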


Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
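In keeping with the earlier caution about overusing loops, the same three steps can be sketched without any explicit loop. This is an alternative for comparison, not the exercise's intended answer:

```r
classes <- sapply(iris, class)           # step 1: class of each column
iris.num <- iris[classes == "numeric"]   # step 2: keep the numeric columns
iris.means <- sapply(iris.num, mean)     # step 3: mean of each numeric column

iris.means
colMeans(iris.num)                       # built-in shortcut for column means
```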

Page 18: Introduction to R Programming

Extracting and replacing object elements

Indexing matrices

Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second dimension refers to columns

> ## indexing matrices
> # create a matrix
> (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
[4,] 4 -4 2
[5,] 5 -5 8
> M[1:3, ] # extract rows 1 through 3, all columns
     x  y z
[1,] 1 -1 6
[2,] 2 -2 3
[3,] 3 -3 4
> M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
      y z
[1,] -5 8
[2,] -3 4
[3,] -1 6
> M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
[1] -2 -3 -4

Note that unspecified indices (as in the column index in M[1:3, ] above) return all values
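A related detail of two-dimensional indexing: selecting a single column simplifies the result to a vector unless drop = FALSE is given. A short sketch using the same matrix M:

```r
M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8))

M[, 2]                  # a single column is simplified to a vector
M[, 2, drop = FALSE]    # drop = FALSE keeps the result as a 5 x 1 matrix
M[M[, "z"] > 3, ]       # logical row index: rows where z exceeds 3
```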


Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
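The distinction between single and double brackets can be checked directly by looking at what each form returns. A self-contained sketch, using a hypothetical list that stands in for L (the z element here is made up):

```r
# Hypothetical list standing in for L
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

class(L["x"])     # single bracket: still a list (of length one)
class(L[["x"]])   # double bracket: the element itself, an integer vector
L$x               # $ is shorthand for [[ with a name

L[["x"]][2]       # index into the extracted element
```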


Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[, 1]) # a vector
 int [1:5] 1 2 3 4 5


Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  • [ : extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object


Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2
2. Calculate the mean of the Sepal.Length column in iris.2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris.2

data(iris)
iris.2 <- iris[c("Sepal.Length", "Species")]
str(iris.2)

2. Calculate the mean of the Sepal.Length column in iris.2

mean(iris.2[, "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris.2[iris.2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris.2,
     print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

m.minus.sd <- mean(iris.2[["Sepal.Length"]]) - sd(iris.2[["Sepal.Length"]])
length(iris.2[iris.2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol = 4)
> apply(M, 2, mean) ## average each column (that is, across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
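The second argument of apply() selects the dimension to work over: 1 for rows, 2 for columns. A brief sketch contrasting the two on the same matrix:

```r
M <- matrix(1:20, ncol = 4)   # 5 rows, 4 columns

apply(M, 1, mean)   # MARGIN = 1: one result per row
apply(M, 2, mean)   # MARGIN = 2: one result per column

# rowMeans() and colMeans() give the same answers
rowMeans(M)
colMeans(M)
```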


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
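The same technique works on any data.frame. A self-contained sketch using the built-in iris data (the object names is.num, iris.numeric and iris.other are illustrative):

```r
is.num <- sapply(iris, is.numeric)   # named logical vector, one entry per column

iris.numeric <- iris[is.num]         # list-style indexing keeps a data.frame
iris.other <- iris[!is.num]          # negate to select the remaining columns

names(iris.numeric)
names(iris.other)
```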


Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists...)


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in function body
  • Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
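The local-environment rule above can be seen in isolation with a global x that is defined right before the call. A minimal sketch (the value 10 is arbitrary):

```r
x <- 10        # x in the global environment

f <- function() {
  x <- 1       # creates a local x; the global x is untouched
  x + 1        # the value of the last expression is returned
}

y <- f()
y              # value returned by f
x              # the global x keeps its original value
```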


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
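The debugging tools above are interactive, so they are shown here only as comments. A rough sketch, assuming a hypothetical scale01() function that fails on character input:

```r
# Hypothetical function with a weakness: it fails for character input
scale01 <- function(v) {
  (v - min(v)) / (max(v) - min(v))
}

# try() captures the error so that a script can continue past it
res <- try(scale01(c("a", "b")), silent = TRUE)
inherits(res, "try-error")

# In an interactive session, after an uncaught error you could run:
# traceback()          # show the call stack that led to the error
# debug(scale01)       # step through scale01() on its next call
# undebug(scale01)     # turn stepping off again
```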


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)


Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
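The function above prints a message for bad input and then carries on. An alternative is to raise an error with stop() and return a value instead of printing. A sketch under that assumption, with a hypothetical signLabel() function:

```r
# Hypothetical variant: returns a label, raises an error on bad input
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(10)
signLabel(0)
signLabel(-1)
```

Returning the label rather than printing it lets the caller store or test the result, and stop() prevents later code from running on invalid input.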


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions introduced in this section (technically, if and else are not functions, but this need not concern us at the moment):
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function


Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)  # stops with an error: not a data.frame
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function

gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))

[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066

Introduction To Programming In RLast updated November 20 2013 50

71

The S3 object class system

S3 classes summary

Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class

Functions introduced in this sectionplot creates a graphical display the type of which depends on the

class of the object being plottedmethods lists the methods defined for a function or class

UseMethod the body of a generic functioninvisible returns an object but does not print it

Introduction To Programming In RLast updated November 20 2013 51

71

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations ofthe numeric variables

2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class

Introduction To Programming In RLast updated November 20 2013 52

71

The S3 object class system

Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of

the numeric variablesstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))

statsum(iris)

2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)

str(statsum(iris))

3 [3] Write a print method for the statsum classprintstatsum lt- function(x)

cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 53

71


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these gotchas are common problems in other languages; many are unique to R
  • We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
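
One caveat when using all.equal() inside a condition, shown as a short sketch: on a mismatch it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). The variable names below are only for illustration.

```r
x <- 0.1
y <- 0.3 / 3

x == y                      # exact comparison; FALSE here
isTRUE(all.equal(x, y))     # comparison within a small tolerance; TRUE

# all.equal() returns a character description, not FALSE, on mismatch,
# so wrap it in isTRUE() before using it in if().
isTRUE(all.equal(1, 1.1))   # FALSE

# An explicit tolerance can also be written by hand.
abs(x - y) < 1e-8           # TRUE
```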

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
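
A brief sketch of the two solutions named above, reusing the same vector:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # 10.47368, the mean of the 19 observed values

is.na(x)                # logical vector, TRUE only at position 11
sum(is.na(x))           # 1, the number of missing values
x[!is.na(x)]            # the observed values only
```

Testing with x == NA does not work for the reason shown on the slide; is.na() is the reliable test.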

Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
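
The order of coercion can be checked step by step. The sketch below is illustrative only; it shows that logical values give way to integer, then double, then character.

```r
class(c(TRUE, 2L))              # "integer"
class(c(TRUE, 2L, 3.5))         # "numeric"
class(c(TRUE, 2L, 3.5, "a"))    # "character"

# Converting explicitly makes failures visible: values that cannot be
# converted become NA, and R issues a warning.
converted <- suppressWarnings(as.numeric(c("1", "2", "a")))
is.na(converted)                # FALSE FALSE TRUE
```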

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
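
The argument lists explain the difference: mean() treats only its first argument, x, as data, while sum() adds up everything passed through the dots. A short sketch of the consequence and the usual fix, which is to combine the values into one vector first:

```r
mean(1, 2, 3, 4, 5)      # 1: only x = 1 is treated as data
mean(c(1, 2, 3, 4, 5))   # 3: pass the values as a single vector
sum(1, 2, 3, 4, 5)       # 15: every argument is added
sum(c(1, 2, 3, 4, 5))    # 15: a single vector works here too
```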

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
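
An alternative to as.numeric(as.character(x)) is to convert the levels once and then index them by the factor, sketched here with the same x. Both forms give the same result; this one simply makes the integer codes behind the factor explicit.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)               # internal codes: 2 2 1 1
levels(x)                   # "6" "5"
as.numeric(levels(x))[x]    # original values: 5 5 6 6
```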


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

  • Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code as long as a condition holds, stopping once it becomes false
  • If you're typing the same commands over and over again, you might want to use a loop

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
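
When the number of results is known in advance, the container can be created at its final length before the loop starts. The sketch below is a variation on the slide's example, not the workshop's own code; the names n and result are illustrative.

```r
n <- 5
result <- vector("list", n)     # preallocate a list of length n
for (i in seq_len(n)) {         # seq_len(n) is safe even when n is 0
  result[[i]] <- seq_len(i)     # store the sequence 1..i
}
lengths(result)                 # 1 2 3 4 5
```

Preallocating avoids growing the object on every iteration, which matters mainly for long loops.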

Word of caution: don't overuse loops!

Most operations in R are vectorized; this makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
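
To make the comparison concrete, here is a small sketch that computes the same result both ways and confirms they agree. The variable names are illustrative only.

```r
x <- 1:5

# Loop version: square each element one at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: arithmetic applies to the whole vector at once.
squares_vec <- x^2

identical(squares_loop, squares_vec)   # TRUE
```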

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[[var]] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 19: Introduction to R Programming

Extracting and replacing object elements

Indexing lists

Lists can be indexed in the same way as vectors, with the following extension:

> # Lists can be indexed with single brackets, similar to vector indexing
> L[c(1, 2)] # the first two elements of L
$x
[1] 1 2 3 4 5

$y
[1] 1 2 3

> L[1] # a list with one element
$x
[1] 1 2 3 4 5

> ## double brackets select the content of a single selected element
> ## effectively taking it out of the list.
> L[[1]] # a vector
[1] 1 2 3 4 5
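
The difference between the two bracket forms can be checked directly. In this sketch the list L is rebuilt with assumed contents: the names x, y and z and the lengths 5, 3 and 2 follow the slides, but the values of z are made up for the example.

```r
# Assumed stand-in for the workshop's list L (contents of z are hypothetical).
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

class(L["x"])       # "list": single brackets keep the list wrapper
class(L[["x"]])     # "integer": double brackets extract the element
L$x                 # shorthand for L[["x"]]

L[["y"]] <- 10:12   # replacement uses the same indexing forms
L$y                 # 10 11 12
```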

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also the same ways as a list:

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
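
If the matrix-style form is needed but a one-column data.frame should be kept, the drop argument of the [ operator controls this. The sketch below uses a small stand-in for DF; its exact contents are assumed, matching only the column names and values visible on the slides.

```r
# Assumed stand-in for the workshop's DF.
DF <- data.frame(x = 1:5, y = letters[1:5])

is.data.frame(DF[1])                   # TRUE: list-style indexing
is.data.frame(DF[, 1])                 # FALSE: matrix-style indexing drops to a vector
is.data.frame(DF[, 1, drop = FALSE])   # TRUE: drop = FALSE keeps the data.frame
```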

Extraction/replacement summary

Key points

  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section

  • [: extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2, {
  print(mean(Sepal.Length[Species == "setosa"]))
})

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix:

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
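
The second argument of apply() selects the dimension: 1 for rows, 2 for columns. A short sketch with the same matrix shows both, along with the built-in shortcuts for sums.

```r
M <- matrix(1:20, ncol = 4)   # 5 rows, 4 columns, filled column by column

apply(M, 1, sum)   # MARGIN = 1, one result per row: 34 38 42 46 50
apply(M, 2, sum)   # MARGIN = 2, one result per column: 15 40 65 90

# rowSums() and colSums() are shortcuts for these two cases
all(rowSums(M) == apply(M, 1, sum))   # TRUE
```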

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this:

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
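
For comparison, here is how sapply relates to two of its siblings. The list below is an assumed stand-in for L (the contents of z are hypothetical); lapply and vapply are base R functions.

```r
# Assumed stand-in for the workshop's list L.
L <- list(x = 1:5, y = 1:3, z = c("a", "b"))

lapply(L, length)               # always returns a list
sapply(L, length)               # simplifies to a named vector when it can
vapply(L, length, integer(1))   # like sapply, but checks each result's type and length
```

vapply is a reasonable choice inside functions, where a silently changing return type would be hard to debug.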

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions summary

Key points

  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section

  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)

Function return value

The return value of a function can be:

  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:

  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
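
The same point in a self-contained form, as a small sketch that does not depend on objects created earlier in the workshop:

```r
x <- 10        # x in the global environment

f <- function() {
  x <- 1       # creates a local x; the global x is untouched
  x * 2        # the value of the last expression is returned
}

y <- f()
y              # 2
x              # still 10
```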

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact

Writing functions summary

Key points

  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section

  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise...
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
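
A common variation is to signal an error with stop() instead of printing a message, so that calling code cannot silently continue with bad input. The sketch below is a hypothetical rewrite, not the workshop's function: signOf() is an invented name, and it returns a string rather than printing.

```r
# Hypothetical variant of isPositive() that returns a string and
# signals an error for invalid input instead of printing a message.
signOf <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signOf(10)    # "positive"
signOf(0)     # "zero"
signOf(-1)    # "negative"

# tryCatch() lets the caller handle the error instead of halting
msg <- tryCatch(signOf("a"), error = function(e) conditionMessage(e))
msg           # "x must be a numeric vector of length one"
```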

Control flow summary

Key points

  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions[1] introduced in this section

  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
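
When an object has several classes, inherits() is a convenient way to test for one of them, and unclass() removes the class attribute again. Both are base R functions; the sketch reuses the classes "A" and "B" from the slide.

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")   # TRUE
inherits(x, "B")   # TRUE
inherits(x, "C")   # FALSE

unclass(x)         # the class attribute is removed: a plain integer vector again
```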

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"      "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame"  "as.list.data.frame"
[5] "as.matrix.data.frame"      "by.data.frame"
[7] "cbind.data.frame"          "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66




Loops (supplimental)

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 64

71

Loops (supplimental)

Looping

A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop

Introduction To Programming In RLast updated November 20 2013 65

71

Loops (supplimental)

Looping for-loop examples

For each value in a vector print the number and its square

gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25

Introduction To Programming In RLast updated November 20 2013 66

71

Loops (supplimental)

Looping while-loop example

Goal simulate rolling two dice until we roll two sixes

gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12

Introduction To Programming In RLast updated November 20 2013 67

71

Loops (supplimental)

Using loops to fill in lists

Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing

gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1

[[2]][1] 1 2

[[3]][1] 1 2 3

[[4]][1] 1 2 3 4

[[5]][1] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 68

71

Loops (supplimental)

Word of caution donrsquot overuse loops

Most operations in R are vectorized ndash This makes loops unnecessary in manycases

Use vector arithmatic instead of loops

gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10

Use paste instead of loops

gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25

Loops are handy but save them for when you really need themIntroduction To Programming In R

Last updated November 20 2013 69 71

Loops (supplimental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data

Introduction To Programming In RLast updated November 20 2013 70

71

Loops (supplimental)

Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set

gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species

numeric numeric numeric numeric factor

1 use the results from step 1 to select the numeric columns

gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)

SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02

1 use a loop to calculate the mean of each numeric column in the iris data

gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth

5843333 3057333 3758000 1199333

Introduction To Programming In RLast updated November 20 2013 71

71

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 20: Introduction to R Programming

Extracting and replacing object elements

Indexing data.frames

A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list.

> DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
  x y
3 3 c
1 1 a
2 2 b
> DF[[1]] # column 1 as a vector
[1] 1 2 3 4 5

There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.

> str(DF[1]) # a data.frame with one column
'data.frame':	5 obs. of  1 variable:
 $ x: int  1 2 3 4 5
> str(DF[ , 1]) # a vector
 int [1:5] 1 2 3 4 5
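The difference is easy to check directly. A minimal sketch, assuming a small stand-in data frame named DF_demo (a hypothetical object, not necessarily identical to the DF used in the slides):

```r
# DF_demo is a hypothetical stand-in for the workshop's DF object
DF_demo <- data.frame(x = 1:5, y = letters[1:5])

col_vec  <- DF_demo[, 1]                # matrix-style indexing: drops to a vector
col_df   <- DF_demo[1]                  # list-style indexing: stays a data.frame
col_kept <- DF_demo[, 1, drop = FALSE]  # matrix-style, but keeps the data.frame

is.data.frame(col_vec)    # FALSE
is.data.frame(col_df)     # TRUE
is.data.frame(col_kept)   # TRUE
```

Passing drop = FALSE is one way to get matrix-style indexing that reliably returns a data.frame, which can matter when the number of selected columns is not known in advance.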


Extraction/replacement summary

Key points:
  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section:
  • [: extraction operator, used to extract/replace object elements
  • names: get the names of an object, usually a vector, list, or data.frame
  • print: print an object
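The key points above can be sketched in a few lines. The objects scores and m are hypothetical and exist only for this illustration:

```r
# scores is a hypothetical named vector used only for illustration
scores <- c(a = 10, b = 20, c = 30)

scores[2]             # index by position
scores["b"]           # index by name
scores[scores > 15]   # index by logical condition

scores["b"] <- 25     # replacement uses the same [ operator

# two-dimensional objects take two indices: [rows, columns]
m <- matrix(1:6, nrow = 2)
m[2, 3]               # row 2, column 3
```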


Exercise 1

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2 Calculate the mean of the Sepal.Length column in iris2
3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length


Exercise 1 prototype

1 Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2 Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3 BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))

4 BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements


The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the margin: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average down each column (across the rows)
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
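Both slide examples use margin 2. A short sketch contrasting the two margins on the same matrix (the variable names row_means and col_means are chosen for this example):

```r
M <- matrix(1:20, ncol = 4)      # 5 rows, 4 columns, filled column by column

row_means <- apply(M, 1, mean)   # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)   # MARGIN = 2: one result per column

length(row_means)   # 5
length(col_means)   # 4
```

For these particular summaries base R also provides rowMeans() and colMeans(), which give the same values.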


The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
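A self-contained version of the same idea, assuming a small list named L_demo (hypothetical; the L object from the slides is defined elsewhere in the workshop):

```r
# L_demo is a hypothetical list, not the L object from the workshop
L_demo <- list(x = 1:5, y = c("a", "b", "c"), z = c(TRUE, FALSE))

lens  <- sapply(L_demo, length)   # named integer vector of element lengths
types <- sapply(L_demo, class)    # named character vector of element classes
```

sapply simplifies its result to a vector when every call returns a single value, and keeps the element names.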


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]: the first form returns a vector, the second a data.frame with one column.
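The same pattern works on any data.frame. A brief sketch using the built-in iris data (the names num_cols and iris_numeric are introduced here for illustration):

```r
num_cols <- sapply(iris, is.numeric)   # logical vector, one entry per column
iris_numeric <- iris[num_cols]         # list-style indexing keeps a data.frame

names(iris_numeric)                    # the four measurement columns
```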


Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object
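For orientation, a rough sketch of the three other apply-style functions named above. The inputs are made up for this example; the functions themselves are part of base R:

```r
# lapply: like sapply, but always returns a list
sums_list <- lapply(list(a = 1:3, b = 4:6), sum)

# tapply: apply a function to groups defined by a factor
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function over several vectors in parallel
pair_sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```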


Writing functions


Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
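As a small illustration of named arguments with defaults, here is a sketch of a helper called rescale(); the function and its argument names are invented for this example:

```r
# rescale() is a hypothetical helper written for this example
rescale <- function(x, lower = 0, upper = 1) {
  # map the values of x linearly onto the interval [lower, upper]
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

rescale(c(2, 4, 6))                          # uses the default arguments
rescale(c(2, 4, 6), lower = -1, upper = 1)   # overrides the defaults by name
```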


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+ print("setting x to 1") # print a text string
+ x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
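The local-environment rule can be verified without relying on objects from earlier slides. A minimal sketch with hypothetical names x_outer and g:

```r
x_outer <- "unchanged"     # hypothetical variable in the calling environment

g <- function() {
  x_outer <- "changed"     # creates a new local variable; the outer one is untouched
  invisible(x_outer)       # return the local value without printing it
}

result <- g()
x_outer                    # still "unchanged"
```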


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+ return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> ## check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
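A non-interactive sketch of how these tools fit together. The function safe_ratio() is hypothetical; debug(), isdebugged(), undebug(), and tryCatch() are base R. Stepping itself only happens in an interactive session, so the flagged function is not called here:

```r
# safe_ratio() is a hypothetical function used only to illustrate the tools
safe_ratio <- function(a, b) {
  if (b == 0) stop("b must not be zero")
  a / b
}

debug(safe_ratio)        # flag the function: the next call would be stepped through
isdebugged(safe_ratio)   # TRUE
undebug(safe_ratio)      # remove the flag again

# At the console, calling traceback() after an uncaught error prints the call stack.
# Here the error is caught instead, so that a script keeps running.
msg <- tryCatch(safe_ratio(1, 0), error = function(e) conditionMessage(e))
```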


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(iris)


Control flow


Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+ if(!is.numeric(x) | length(x) > 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
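A possible variation on the final version, sketched under a few assumptions: the function is renamed checkSign() (a hypothetical name), it returns a string instead of printing, it signals an error with stop() rather than printing a message, and it uses the scalar operator || for the argument check:

```r
# checkSign() is a hypothetical variant of isPositive() that returns a string
checkSign <- function(x) {
  # || evaluates left to right and stops early; invalid input raises an
  # error here, before x is ever compared with zero
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

checkSign(10)
checkSign(0)
checkSign(-1)
```

Raising an error makes the problem visible to calling code, whereas a printed message can go unnoticed when the function is used inside a larger script.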


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions introduced in this section (technically, if and else are not functions, but this need not concern us at the moment):
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
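One caveat about the class check in the prototype: class() can return more than one string, and in R 4.2.0 and later an if() condition of length greater than one is an error. A common alternative is inherits(). A minimal sketch, where check_df() and the class name "my_table" are made up for illustration:

```r
# check_df() is a hypothetical helper; inherits() is part of base R
check_df <- function(df) {
  if (!inherits(df, "data.frame")) stop("df must be a data.frame")
  invisible(TRUE)
}

# an object with two classes still passes the check
df2 <- iris
class(df2) <- c("my_table", "data.frame")   # "my_table" is a made-up class name
check_df(df2)
```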


The S3 object class system


The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.

Basic idea: functions have different methods for different types of objects.


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) # use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
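A second sketch of the same mechanism with deterministic output. The generic area() and the classes "circle" and "square" are hypothetical; the example also adds a default method, which is what gets called when no class-specific method exists:

```r
# area() and the classes "circle" and "square" are hypothetical examples
area <- function(shape, ...) UseMethod("area")

area.circle  <- function(shape, ...) pi * shape$r^2
area.square  <- function(shape, ...) shape$side^2
area.default <- function(shape, ...) stop("no area method for this class")

circ <- structure(list(r = 1), class = "circle")
sq   <- structure(list(side = 2), class = "square")

area(circ)   # dispatches to area.circle
area(sq)     # dispatches to area.square
```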


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: called in the body of a generic function to dispatch to the appropriate method
  • invisible: returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class "statsum"
3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))}

statsum(iris)

2 Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you


Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
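Another instance of the same effect, sketched with the variable names a and b chosen for this example. When the values differ, all.equal() returns a character description rather than FALSE, so inside a condition it is usually wrapped in isTRUE():

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                    # FALSE: the two doubles differ in their last bits
isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance
```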


Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
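The two solutions, applied to the vector from the slide:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA
mean(x, na.rm = TRUE)   # mean of the 19 observed values
sum(is.na(x))           # number of missing values
x_obs <- x[!is.na(x)]   # drop the missing values by logical indexing
```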


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
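The order of generality for the basic types can be seen by combining pairs of values. A small sketch (the string comparison at the end is an added illustration of why coercion in comparisons can mislead):

```r
class(c(TRUE, 1L))    # "integer":   logical is promoted to integer
class(c(1L, 2.5))     # "numeric":   integer is promoted to double
class(c(2.5, "a"))    # "character": everything becomes a string

# once both sides are strings, they are compared as text
"10" < "9"            # TRUE as text, although 10 > 9 as numbers
```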


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
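The practical consequence: mean() summarizes only its first argument x, and further positional arguments are passed on to the method (for mean.default they are matched to trim and na.rm). Combining the values with c() first avoids the surprise:

```r
mean(1, 2, 3, 4, 5)      # only the first argument, 1, is summarized
mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
sum(1, 2, 3, 4, 5)       # sum() adds up all of its arguments
```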


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
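An equivalent conversion goes through the levels instead of through each element, which avoids converting repeated values more than once. A brief sketch using the same factor, stored here under the name f:

```r
f <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(f)                 # internal integer codes: 2 2 1 1
as.numeric(as.character(f))   # the original values:    5 5 6 6
as.numeric(levels(f))[f]      # same result, converting each level only once
```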


Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!

  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you: tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)


Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+ cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+ roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+ cat("We rolled a ", roll, "\n") # print the result
+ } # end the loop
We rolled a  6
We rolled a  10
We rolled a  9
We rolled a  7
We rolled a  10
We rolled a  5
We rolled a  9
We rolled a  12
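A variation that counts the rolls and guards against running forever. This is a sketch, not part of the original slides: the counter n_rolls and the cap max_rolls are hypothetical additions, and both dice are drawn in a single sample() call:

```r
set.seed(15)          # fix the random number stream
dice <- 1:6
n_rolls <- 0          # hypothetical counter variable
max_rolls <- 10000    # safety cap so the loop always terminates
roll <- 0

while (roll < 12 && n_rolls < max_rolls) {
  roll <- sum(sample(dice, 2, replace = TRUE))   # roll two dice at once
  n_rolls <- n_rolls + 1
}

n_rolls               # number of rolls it took
```

A cap of this kind is a common precaution in while loops whose stopping condition depends on random or external input.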


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+ Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
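When the number of results is known in advance, the container can be created at its final size instead of being grown inside the loop. A minimal sketch of that variation (the variable n is introduced here):

```r
n <- 5
Result <- vector("list", n)   # a list of length n, every element NULL
for (i in seq_len(n)) {       # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i
}
lengths(Result)               # length of each stored element
```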


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ## cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared = ", (1:5)^2)
[1] "1 squared =  1"  "2 squared =  4"  "3 squared =  9"
[4] "4 squared =  16" "5 squared =  25"

Loops are handy, but save them for when you really need them!

Last updated November 20 2013 69 71

Loops (supplimental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data

Introduction To Programming In RLast updated November 20 2013 70

71

Loops (supplimental)

Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set

gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species

numeric numeric numeric numeric factor

1 use the results from step 1 to select the numeric columns

gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)

SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02

1 use a loop to calculate the mean of each numeric column in the iris data

gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth

5843333 3057333 3758000 1199333

Introduction To Programming In RLast updated November 20 2013 71

71

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 21: Introduction to R Programming

Extracting and replacing object elements

Extraction/replacement summary

Key points

  • elements of objects can be extracted or replaced using the [ operator
  • objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
  • vectors and lists have only one dimension, and hence only one index is used
  • matrices and data.frames have two dimensions, and extraction methods for these objects use two indices

Functions introduced in this section

  [      extraction operator, used to extract/replace object elements
  names  get the names of an object, usually a vector, list, or data.frame
  print  print an object

Exercise 1

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2
2. Calculate the mean of the Sepal.Length column in iris2
3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species
4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

Exercise 1 prototype

1. Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2

data(iris)
iris2 <- iris[c("Sepal.Length", "Species")]
str(iris2)

2. Calculate the mean of the Sepal.Length column in iris2

mean(iris2[ , "Sepal.Length"])

3. BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species

mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
## shortcut:
with(iris2,
     print(mean(Sepal.Length[Species == "setosa"])))

4. BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length

mminussd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
length(iris2[iris2[["Sepal.Length"]] < mminussd, "Sepal.Length"])


Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
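To make the row/column distinction concrete, here is a minimal sketch that applies mean() over both margins of the same 5 x 4 matrix. The names row_means and col_means are hypothetical and used only for illustration.

```r
# Same shape as the M matrix used in the slide: 5 rows, 4 columns
M <- matrix(1:20, ncol = 4)

row_means <- apply(M, 1, mean)  # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)  # MARGIN = 2: one result per column

print(row_means)
print(col_means)

# For plain sums and means, base R also has dedicated helpers
same_as_helper <- isTRUE(all.equal(col_means, colMeans(M)))
print(same_as_helper)
```

The result has one element per row when MARGIN is 1 and one element per column when MARGIN is 2, so its length is a quick check that the intended margin was used.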

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y 
"integer"  "factor" 
> sapply(L, length) # get the length of each element in the L list
x y z 
5 3 2 
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y 
 TRUE FALSE 

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DFwhichnum <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y 
 TRUE FALSE 
> DF[DFwhichnum] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
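The pattern above can be sketched end to end. The DF built here is a hypothetical stand-in with an integer column x and a factor column y; the workshop's own DF was defined on an earlier slide and may differ.

```r
# Hypothetical stand-in for the DF object used in the slides
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

num_cols <- sapply(DF, is.numeric)  # named logical vector: x TRUE, y FALSE
numeric_part <- DF[num_cols]        # list-style indexing keeps a data.frame

first_as_vector <- DF[, 1]               # matrix-style indexing drops to a vector
first_as_df <- DF[1]                     # list-style indexing keeps a data.frame
first_kept <- DF[, 1, drop = FALSE]      # drop = FALSE also keeps a data.frame

print(numeric_part)
```

Using DF[num_cols] rather than DF[, num_cols] is a deliberate choice in this sketch: the result stays a data.frame even when only one column is numeric.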

Applying functions summary

Key points

  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section

  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)

Function return value

The return value of a function can be:

  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:

  • Calls to print(), message(), or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
> 
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
> 
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j 
101 102 103   1 105 106 107 108 109 110 
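The two points about return values and local assignment can be checked with a short sketch. The function add_one and the variable names are hypothetical, chosen only for illustration.

```r
x <- 100  # a global x, defined before the function is called

add_one <- function(value) {
  x <- value + 1  # this x is local to the function call
  x               # the value of the last expression is returned
}

y <- add_one(1)

print(y)  # the returned value
print(x)  # the global x is unchanged by the assignment inside add_one
```

Because the assignment happens in the function's own environment, the global x keeps its value; only the returned result reaches the caller.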

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
> 
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
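As a rough sketch of the stepping workflow, the code below flags a function for debugging and then clears the flag again. The function simple_mean is hypothetical. The sketch deliberately does not call the function while it is flagged, because that would open the interactive browser.

```r
# Hypothetical function we might want to step through
simple_mean <- function(v) {
  total <- sum(v)
  total / length(v)
}

debug(simple_mean)                  # the next call would open the browser
flag_on <- isdebugged(simple_mean)  # TRUE while the flag is set

# In an interactive session, calling simple_mean(1:10) now would pause at the
# first line; typing n steps to the next line, c continues, and Q quits.

undebug(simple_mean)                # turn stepping off again
flag_off <- isdebugged(simple_mean)

print(c(flag_on, flag_off))
```

After an error in an interactive session, calling traceback() prints the sequence of calls that led to it.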

Writing functions summary

Key points

  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section

  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
> 
> ## test the isPositive() function
> isPositive(10)
10 is positive 
> isPositive(-1)
-1 is negative 
> isPositive(0)
0 is negative 

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
> 

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero 
> isPositive("a") # oops, that will not work!
a is positive 

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle input that is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
> 
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one 
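A variation on the argument check above is to signal a real error with stop() instead of printing a message, so that calling code can react to it. This is only a sketch under that assumption; describe_sign is a hypothetical name, and it returns a string rather than printing.

```r
describe_sign <- function(x) {
  # || stops evaluating as soon as the answer is known
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

print(describe_sign(10))
print(describe_sign(0))

# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(describe_sign("a"),
                error = function(e) conditionMessage(e))
print(msg)
```

Raising an error makes bad input impossible to overlook, whereas a message printed with cat() lets the program continue as if nothing had happened.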

Control flow summary

Key points

  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section

  cat   concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10) # stops with an error: not a data.frame
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser() # execution pauses here when the function is called
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct 
[5] mean.POSIXlt 
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"      
[5] "as.matrix.data.frame"     "by.data.frame"           
[7] "cbind.data.frame"         "[<-.data.frame"          
[9] "[.data.frame"            

Creating new function methods

To create a new method for a function that is already generic all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
> 
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
> 
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric 

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
> 
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
> 
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
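Putting the pieces together, the following sketch defines a new generic with two class methods and a default method, and shows what happens when an object has more than one class. The generic area() and the classes "circle", "square", and "unit_tile" are hypothetical names invented for this illustration.

```r
# A hypothetical generic: dispatches on the class of its first argument
area <- function(shape, ...) {
  UseMethod("area")
}

# Methods follow the generic.class naming scheme
area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

# The default method is used when no class-specific method is found
area.default <- function(shape, ...) {
  stop("no area() method for class ", class(shape)[1])
}

circ <- structure(list(r = 1), class = "circle")
# With several classes, dispatch tries them in order: there is no
# area.unit_tile, so area.square is used
sq <- structure(list(side = 2), class = c("unit_tile", "square"))

print(area(circ))
print(area(sq))
```

Including a default method is a design choice in this sketch: it turns a missing method into a clear error message instead of R's generic "no applicable method" failure.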

S3 classes summary

Key points

  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section

  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class statsum
3. Write a print method for the statsum class

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
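One detail worth a sketch: all.equal() returns TRUE when the values are close, but a character description of the difference when they are not, so inside if() it is usually wrapped in isTRUE(). The variable names below are hypothetical.

```r
a <- 0.1
b <- 0.3 / 3

exact_equal <- (a == b)                # exact comparison of the two doubles
near_equal <- isTRUE(all.equal(a, b))  # comparison up to a small tolerance

# When values differ, all.equal() returns a description, not FALSE,
# so isTRUE() is needed to get a clean logical
clearly_different <- isTRUE(all.equal(1, 1.1))

if (near_equal) {
  cat("a and b are equal up to numerical tolerance\n")
}
```

Wrapping the call in isTRUE() keeps the condition a single TRUE or FALSE in every case.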

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use na.rm = TRUE option when calculating, and is.na to test for missing
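The two suggested solutions can be sketched on the same vector as above. The names mean_all, mean_obs, n_missing, and observed are hypothetical.

```r
x <- c(1:10, NA, 12:20)

mean_all <- mean(x)                # NA: the missing value propagates
mean_obs <- mean(x, na.rm = TRUE)  # drop missing values before averaging

n_missing <- sum(is.na(x))         # count the missing values
observed <- x[!is.na(x)]           # keep only the non-missing values

print(mean_obs)
print(n_missing)
```

Testing with x == NA would not work here, since any comparison with NA is itself NA; is.na() is the reliable test.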

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"    
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
> 
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...) 
NULL
> args(sum)
function (..., na.rm = FALSE) 
NULL

Ouch. That is not nice at all!

Trouble with factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
> 
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
> 
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
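A short sketch of why this happens: a factor stores integer codes that index into its levels, so converting it directly to a number returns the codes. The names codes, values, and values2 are hypothetical.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.integer(x)                 # positions in levels(x), not the values
values <- as.numeric(as.character(x))  # go through the labels to get the values

# An equivalent alternative: convert the levels once, then index by the factor,
# which uses its integer codes
values2 <- as.numeric(levels(x))[x]

print(codes)
print(values)
```

Both conversions recover the original numbers; the second converts each level only once, which can matter for long factors with few levels.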


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

  • Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you: tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25 
-4 squared is 16 
-3 squared is 9 
-2 squared is 4 
-1 squared is 1 
0 squared is 0 
1 squared is 1 
2 squared is 4 
3 squared is 9 
4 squared is 16 
5 squared is 25 

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6 
We rolled a 10 
We rolled a 9 
We rolled a 7 
We rolled a 10 
We rolled a 5 
We rolled a 9 
We rolled a 12 

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
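A common variation on the pattern above is to create the result object at its final length before the loop starts. This is a sketch of that variation rather than the workshop's own code; n and lengths_seen are hypothetical names.

```r
n <- 5
Result <- vector("list", n)   # a list of n empty (NULL) slots

# seq_len(n) is safe even when n is 0, where 1:n would count 1, 0
for (i in seq_len(n)) {
  Result[[i]] <- seq_len(i)   # store the sequence 1 to i in slot i
}

lengths_seen <- lengths(Result)  # length of each list element
print(lengths_seen)
```

Preallocating avoids growing the list one element at a time, which tends to matter only for long loops but costs nothing to adopt.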

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
> 
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9" 
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
> 
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

2. use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[var] <- mean(iris[[var]])
+ }
> 
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 
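For comparison with the loop-based prototype, the same three steps can be sketched with the apply-style functions from the earlier section. This is an alternative offered for illustration, not the workshop's answer; irismeans2 is a hypothetical name.

```r
data(iris)

# Step 1: class of each column, without a loop
classes <- sapply(iris, class)

# Steps 2 and 3: select the numeric columns and average each one
irismeans <- sapply(iris[classes == "numeric"], mean)

# vapply() is a stricter relative of sapply(): the third argument declares
# that every result must be a single number
irismeans2 <- vapply(iris[sapply(iris, is.numeric)], mean, numeric(1))

print(round(irismeans, 3))
```

Both versions return a named numeric vector with one mean per numeric column, matching the output of the loop above.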

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 22: Introduction to R Programming

Extracting and replacing object elements

Exercise 1

1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2

2 Calculate the mean of the SepalLength column in iris23 BONUS (optional) Calculate the mean of SepalLength but only for

the setosa species4 BONUS (optional) Calculate the number of sepal lengths that are more

than one standard deviation below the average sepal length

Introduction To Programming In RLast updated November 20 2013 22

71

Extracting and replacing object elements

Exercise 1 prototype

1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2

data(iris)iris2 lt- iris[c(SepalLength Species)]str(iris2)

2 Calculate the mean of the SepalLength column in iris2mean(iris2[ SepalLength])

3 [3] BONUS (optional) Calculate the mean of SepalLength but only forthe setosa speciesmean(iris2[iris2[[Species]] == setosa SepalLength]) shortcutwith(iris2

print(mean(SepalLength[Species == setosa])))

4 [4] BONUS (optional) Calculate the number of sepal lengths that aremore than one standard deviation below the average sepal lengthmminussd lt- mean(iris2[[SepalLength]]) - sd(iris2[[SepalLength]])length(iris2[iris2[[SepalLength]] lt mminussd SepalLength])

Introduction To Programming In RLast updated November 20 2013 23

71

Applying functions to list elements

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 24

71

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90

Introduction To Programming In RLast updated November 20 2013 25

71

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class) ## get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) ## get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) ## check each column of DF to see if it is numeric
    x     y
 TRUE FALSE


Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] ## select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DF.which.num <- sapply(DF, is.numeric)) ## check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] ## select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
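The vector versus data.frame distinction can be checked directly. The sketch below uses a hypothetical stand-in for the DF object (its real definition is on an earlier slide) and also shows drop = FALSE, which keeps the data.frame structure with matrix-style indexing:

```r
# Hypothetical stand-in for the DF object used on the slides
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

v  <- DF[, 1]                 # matrix-style indexing drops to a plain vector
d1 <- DF[1]                   # list-style indexing keeps a one-column data.frame
d2 <- DF[, 1, drop = FALSE]   # drop = FALSE keeps the data.frame as well

num_cols <- DF[sapply(DF, is.numeric)]  # keep only the numeric columns
```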


Applying functions summary

Key points
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section
  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object
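For orientation, here is a brief sketch of the other apply-style functions named above. The variable names are illustrative, and the built-in iris data is used only as convenient example input:

```r
# lapply: like sapply, but always returns a list
lens <- lapply(list(a = 1:3, b = 1:5), length)

# tapply: apply a function to groups of a vector defined by a factor
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```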


Writing functions


Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
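As a small illustration of named arguments with default values, consider the sketch below. The function rescale() and its argument names are made up for this example and are not part of the workshop materials:

```r
# rescale() and its arguments are illustrative names only
rescale <- function(x, center = TRUE, scale.by = 1) {
  if (center) x <- x - mean(x)   # optionally subtract the mean
  x / scale.by                   # value of the last expression is returned
}

r1 <- rescale(1:5)                                   # defaults for center and scale.by
r2 <- rescale(1:5, scale.by = 2)                     # override one argument by name
r3 <- rescale(c(2, 4), center = FALSE, scale.by = 2)
```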


Function return value

The return value of a function can be
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from
  • Calls to print(), message() or cat() in function body
  • Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() { ## define function f
+ print("setting x to 1") ## print a text string
+ x <- 1} ## set x to 1
>
> y <- f() ## assign y the value returned by f
[1] "setting x to 1"
>
> y ## print y
[1] 1
> x ## x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
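The same point in a self-contained form: assignment inside a function creates a local variable and leaves a global variable of the same name untouched. The name set_x is illustrative only:

```r
x <- 100                 # x in the global environment

set_x <- function() {    # set_x is an illustrative name
  x <- 1                 # creates a new, local x; the global x is untouched
  x                      # value of the last expression is the return value
}

y <- set_x()             # y receives the local value; global x is still 100
```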


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { ## define function named "square" with argument x
+ return(x*x) ## multiply the x argument by itself
+ } ## end the function definition
>
> ## check to see that the function works
> square(x = 2) ## square the value 2
[1] 4
> square(10) ## square the value 10
[1] 100
> square(1:5) ## square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
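Stepping through a function is interactive, so it cannot be shown fully in static text. The sketch below only demonstrates how the debugging flag is switched on and off with debug(), undebug() and isdebugged(); the function is never called while flagged, so no browser session is opened:

```r
square <- function(x) x * x

debug(square)        # flag the function: the next call would open the step-through browser
isdebugged(square)   # TRUE while the flag is set
undebug(square)      # remove the flag again
isdebugged(square)   # FALSE

# In an interactive session, after an error you would typically call:
# traceback()        # prints the sequence of calls that led to the last error
```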


Writing functions summary

Key points
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)


Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(iris)


Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input
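A minimal sketch of chained if / else if / else branching follows. The function describe_size() and its thresholds are invented for illustration:

```r
# describe_size() is an illustrative name, not part of the workshop materials
describe_size <- function(n) {
  if (n > 100) {
    "large"
  } else if (n > 10) {   # keeping `else` on the closing-brace line is the safe convention
    "medium"
  } else {
    "small"
  }
}

sizes <- c(describe_size(500), describe_size(50), describe_size(5))
```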


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { ## define function "isPositive"
+ if (x > 0) { ## if x is greater than zero, then
+ cat(x, "is positive \n") } ## say so!
+ else { ## otherwise
+ cat(x, "is negative \n")} ## say x is negative
+ } ## end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { ## define function "isPositive"
+ if (x > 0) { ## if x is greater than zero, then
+ cat(x, "is positive \n") } ## say so!
+ else if (x == 0) { ## otherwise if x is zero
+ cat(x, "is zero \n")} ## say so!
+ else { ## otherwise
+ cat(x, "is negative \n")} ## say x is negative
+ } ## end function definition
>

Test the new function

> isPositive(0) ## test the isPositive() function
0 is zero
> isPositive("a") ## oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { ## define function "isPositive"
+ if(!is.numeric(x) | length(x) > 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { ## if x is greater than zero, then
+ cat(x, "is positive \n") } ## say so!
+ else if (x == 0) { ## otherwise if x is zero
+ cat(x, "is zero \n")} ## say so!
+ else { ## otherwise
+ cat(x, "is negative \n")} ## say x is negative
+ } ## end function definition
>
> isPositive("a") ## test the isPositive() function on character
x must be a numeric vector of length one!
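One possible variation on the argument check, sketched under the assumption that an error is preferred over a printed message: it uses stop() to signal the problem and the short-circuit operator `||`, which is intended for single TRUE/FALSE conditions. The name check_sign() is illustrative and not part of the workshop code:

```r
# check_sign() is an illustrative variant of isPositive(); it returns a string
# instead of printing, and signals an error for invalid input
check_sign <- function(x) {
  # `||` short-circuits and is intended for single TRUE/FALSE conditions
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

signs <- c(check_sign(3), check_sign(0), check_sign(-2))
```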


Control flow summary

Key points
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions[1] introduced in this section
  cat   concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame

2. Insert a break point with browser() and step through your function


Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
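A hedged alternative for the argument check: objects can carry more than one class, in which case class(df) != "data.frame" yields several TRUE/FALSE values. The sketch below swaps in is.data.frame(), which returns a single value; naming the list elements is an addition for readability and not part of the original prototype:

```r
statsum <- function(df) {
  # is.data.frame() returns a single TRUE/FALSE even for objects
  # that carry several classes
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means  <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means = means, counts = counts)   # named elements (an illustrative choice)
}

res <- statsum(iris)
```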


The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
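When an object has several classes, inherits() is a convenient way to test for one of them, and unclass() removes the class attribute again. A short sketch using the same class names as above:

```r
x <- 1:10
class(x) <- c("A", "B")

is_A   <- inherits(x, "A")     # TRUE: "A" is one of the classes
is_foo <- inherits(x, "foo")   # FALSE: "foo" is not among them

y <- unclass(x)                # strip the class attribute to get the plain integer vector back
```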


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> ## see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> ## which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic all you have to do is name your function function.class

> ## create a mean() method for objects of class "foo"
> mean.foo <- function(x) { ## mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) ## use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} ## otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> ## create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
>
> ## create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
>
> ## test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
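A generic usually also needs a fallback for objects that have no class-specific method. The sketch below adds a disp.default() method; this is an assumption about how one might extend the example, and the methods here return values instead of printing so that the results are easy to inspect:

```r
disp <- function(x, ...) UseMethod("disp")   # generic: dispatches on class(x)

disp.matrix <- function(x, ...) {
  round(x, digits = 2)                       # return rounded values rather than printing
}

disp.default <- function(x, ...) {           # fallback when no class-specific method exists
  x
}

m <- disp(matrix(c(0.123, 0.456, 0.789, 1.011), ncol = 2))
v <- disp("plain text")
```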


S3 classes summary

Key points
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables

2. Modify your function so that it returns a list of class statsum

3. Write a print method for the statsum class


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))}

statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])}

statsum(iris)


Things that may surprise you


Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
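One practical detail when using all.equal() inside if(): when the values differ it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch (variable names are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                     # exact comparison is sensitive to rounding error
close <- isTRUE(all.equal(a, b))    # comparison within a small numeric tolerance

# all.equal() returns a character description (not FALSE) when values differ,
# so wrap it in isTRUE() before using it in if()
differs <- !isTRUE(all.equal(1, 1.1))
```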


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
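The two remedies named above, sketched on the same vector (the variable names m, n_missing and x_complete are illustrative):

```r
x <- c(1:10, NA, 12:20)

m <- mean(x, na.rm = TRUE)   # drop missing values before averaging

n_missing  <- sum(is.na(x))  # is.na() marks missing positions; sum() counts them
x_complete <- x[!is.na(x)]   # keep only the observed values
```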


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> ## combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> ## comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
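The practical consequence is to pass the data to mean() as a single vector. A short sketch:

```r
vals <- c(1, 2, 3, 4, 5)

avg   <- mean(vals)   # the whole vector is passed as the single argument x
total <- sum(vals)    # sum() accepts a vector as well

# mean(1, 2, 3, 4, 5) averages only x = 1; the remaining values are matched to
# other arguments of the method rather than being treated as data
```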


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> ## here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> ## you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
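An equivalent conversion, sketched here as an alternative, converts the levels once and then indexes them by the factor's integer codes:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.integer(x)              # internal level codes: positions in levels(x)
values <- as.numeric(levels(x))[x]   # convert the levels once, then index by code
```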


Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)


Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping for-loop examples

For each value in a vector, print the number and its square

> ## For-loop example
> for (num in seq(-5,5)) { ## for each number in [-5, 5]
+ cat(num, "squared is", num^2, "\n") ## print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) ## allows reproducible sample() results
> dice <- seq(1,6) ## set dice = [1 2 3 4 5 6]
> roll <- 0 ## set roll = 0
> while (roll < 12) {
+ roll <- sample(dice,1) + sample(dice,1) ## calculate sum of two rolls
+ cat("We rolled a", roll, "\n") ## print the result
+ } ## end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list() ## create an object to store the results
> for (i in 1:5) { ## for each i in [1, 5]
+ Result[[i]] <- 1:i ## assign the sequence 1 to i to Result
+ }
> Result ## print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
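When the number of results is known in advance, the container can be pre-allocated rather than grown inside the loop. A sketch of that variation, alongside an lapply() version that builds the same list without an explicit loop (Result2 is an illustrative name):

```r
n <- 5
Result <- vector("list", n)      # pre-allocate a list with n empty slots
for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i             # fill slot i with the sequence 1, ..., i
}

Result2 <- lapply(seq_len(n), function(i) 1:i)  # same result without an explicit loop
```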


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() ## create vector x
> for(i in 1:5) x[i] <- i+i ## double a vector using a loop
> print(x) ## print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 ## double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 ## shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { ## for each number in [-5, 5]
> ## cat(num, "squared is", num^2, "\n") ## print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+ classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+ iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 23: Introduction to R Programming

Extracting and replacing object elements

Exercise 1 prototype

1 Select just the SepalLength and Species columns from the iris data set(built-in will be available in your workspace automatically) and save theresult to a new dataframe named iris2

data(iris)iris2 lt- iris[c(SepalLength Species)]str(iris2)

2 Calculate the mean of the SepalLength column in iris2mean(iris2[ SepalLength])

3 [3] BONUS (optional) Calculate the mean of SepalLength but only forthe setosa speciesmean(iris2[iris2[[Species]] == setosa SepalLength]) shortcutwith(iris2

print(mean(SepalLength[Species == setosa])))

4 [4] BONUS (optional) Calculate the number of sepal lengths that aremore than one standard deviation below the average sepal lengthmminussd lt- mean(iris2[[SepalLength]]) - sd(iris2[[SepalLength]])length(iris2[iris2[[SepalLength]] lt mminussd SepalLength])

Introduction To Programming In RLast updated November 20 2013 23

71

Applying functions to list elements

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 24

71

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of amatrixgt M lt- matrix(120 ncol=4)gt apply(M 2 mean) average across the rows[1] 3 8 13 18gt apply(M 2 sum) sum the columns[1] 15 40 65 90

Introduction To Programming In RLast updated November 20 2013 25

71

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector list ordataframe use the sapply function for this

gt sapply(DF class) get the class of each column in the DF dataframex y

integer factorgt sapply(L length) get the length of each element in the L listx y z5 3 2gt sapply(DF isnumeric) check each column of DF to see if it is numeric

x yTRUE FALSE

Introduction To Programming In RLast updated November 20 2013 26

71

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extractelements that meet certain criteria

Recall that we can index using logical vectors

gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5

gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y

TRUE FALSEgt DF[DFwhichnum] select the numeric columns

x1 12 23 34 45 5

Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column

Introduction To Programming In RLast updated November 20 2013 27

71

Applying functions to list elements

Applying functions summary

Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details

Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list

isnumeric returns TRUE or FALSE depending on the type of object

Introduction To Programming In RLast updated November 20 2013 28

71

Writing functions

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 29

71

Writing functions

Functions

A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )

Introduction To Programming In RLast updated November 20 2013 30

71

Writing functions

Function return value

The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function

Other function output can come fromCalls to print() message() or cat() in function bodyError messages

Assignment inside the body of a function takes place in a localenvironmentExample

gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1

a b c d e f g h i j101 102 103 1 105 106 107 108 109 110

Introduction To Programming In RLast updated November 20 2013 31

71

Writing functions

Writing functions example

Goal write a function that returns the square of itrsquos argument

gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25

Introduction To Programming In RLast updated November 20 2013 32

71

Writing functions

Debugging basics

Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact

Introduction To Programming In RLast updated November 20 2013 33

71

Writing functions

Writing functions summary

Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging

Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value

browser sets a break pointdebug turns on the debugging flag of a function so you can step

through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went

wrong)

Introduction To Programming In RLast updated November 20 2013 34

71

Writing functions

Exercise 2

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

Introduction To Programming In RLast updated November 20 2013 35

71

Writing functions

Exercise 2 prototype

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)

statsum(iris)

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Control flow

Basic idea: if some condition is true, do one thing. If false, do something else.
Carried out in R using if () and else statements, which can be nested if necessary
Especially useful for checking function arguments and performing different operations depending on function input
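As a quick illustration of the idea, here is a minimal sketch of a plain and a nested if / else. The variable temperature and the cutoffs are made-up values for illustration only:

```r
# 'temperature' and the cutoffs are made-up values for illustration
temperature <- 25

if (temperature > 20) {
  label <- "warm"
} else {
  label <- "cool"
}

# conditions can be nested inside either branch
if (temperature > 20) {
  if (temperature > 30) {
    detail <- "hot"
  } else {
    detail <- "pleasant"
  }
} else {
  detail <- "chilly"
}
```

Note that else has to follow the closing brace of the if block on the same line when the code is typed at the top level of the console.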

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>

Test the new function

> isPositive(0)    # test the isPositive() function
0 is zero
> isPositive("a")  # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) {                             # define function "isPositive"
+   if (!is.numeric(x) || length(x) != 1) {               # if x is not one number
+     cat("x must be a numeric vector of length one! \n") # say so!
+   } else if (x > 0) {                                   # if x is greater than zero, then
+     cat(x, "is positive \n")                            # say so!
+   } else if (x == 0) {                                  # otherwise if x is zero
+     cat(x, "is zero \n")                                # say so!
+   } else {                                              # otherwise
+     cat(x, "is negative \n")                            # say x is negative
+   }
+ }                                                       # end function definition
>
> isPositive("a")  # test the isPositive() function on character
x must be a numeric vector of length one!
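Printing a message with cat() lets the function carry on silently; an alternative is to signal an error with stop(), which the exercise prototypes in this section also use. A sketch under that assumption: signLabel is a hypothetical name, and it returns a string instead of printing one.

```r
# hypothetical variant of isPositive() that returns a label
signLabel <- function(x) {
  # reject anything that is not a single number
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one!")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

labels <- c(signLabel(10), signLabel(0), signLabel(-1))

# tryCatch() captures the error message instead of halting the script
msg <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
```

Returning a value rather than printing makes the function easier to reuse and to test, at the cost of the caller having to print the result.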

Control flow summary

Key points:
  code can be conditionally executed
  conditions can be nested
  conditional execution is often used for argument checking, among other things

Functions* introduced in this section:
  cat: concatenates and prints R objects
  if: execute code only if condition is met
  else: used with if; code to execute if condition is not met

* Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame

2 Insert a break point with browser() and step through your function

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  # is.data.frame() gives a single TRUE/FALSE even for objects with several classes
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

The S3 object class system

R has two major object systems:
  Relatively informal "S3" classes
  Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
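Because an object can carry several classes, a comparison such as class(x) == "A" returns one value per class. A minimal sketch of a single-answer check using inherits(), a base R function that the slide does not cover:

```r
x <- 1:10
class(x) <- c("A", "B")

per_class <- class(x) == "A"    # one comparison per class: TRUE FALSE
is_a <- inherits(x, "A")        # single TRUE/FALSE answer
is_b <- inherits(x, "B")
is_c <- inherits(x, "C")        # "C" was never assigned, so FALSE

plain <- unclass(x)             # remove the class attribute again
```

After unclass(), class(plain) goes back to the implicit class "integer".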

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) {             # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))     # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n")             # otherwise say x not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
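The function.class naming rule works for any existing generic. As a sketch, here is a print() method for a made-up class "celsius"; the class name and the constructor celsius() are assumptions for illustration, not part of the workshop code:

```r
# constructor for a hypothetical "celsius" class
celsius <- function(x) {
  class(x) <- "celsius"
  x
}

# print method: the generic print() dispatches here for "celsius" objects
print.celsius <- function(x, ...) {
  writeLines(paste0(unclass(x), " degrees C"))
  invisible(x)   # print methods conventionally return their argument invisibly
}

temps <- celsius(c(21.5, 30))
shown <- capture.output(print(temps))   # collect the printed lines as strings
```

Typing temps at the console would call print.celsius() automatically, since autoprinting goes through print().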

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
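A generic with only one method fails for every other class, so new generics are usually given a default method as a fallback. A minimal sketch, assuming a hypothetical generic named describe (not a base R function):

```r
# 'describe' is a hypothetical generic, not a base R function
describe <- function(x, ...) {
  UseMethod("describe")
}

# fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

# method for data.frames
describe.data.frame <- function(x, ...) {
  paste("data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

d1 <- describe(iris)    # dispatches to describe.data.frame
d2 <- describe(1:3)     # no integer method, so describe.default is used
```

Each method repeats the generic's arguments, including the dots, so that extra arguments can be passed through without errors.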

S3 classes summary

Key points:
  there are several class systems in R, of which S3 is the oldest and simplest
  objects have class and functions have corresponding methods
  the class of an object can be set by simple assignment
  S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  plot: creates a graphical display, the type of which depends on the class of the object being plotted
  methods: lists the methods defined for a function or class
  UseMethod: called in the body of a generic function to dispatch to the method for the object's class
  invisible: returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean, for the standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
  invisible(x)
}

statsum(iris)

Things that may surprise you

Gotchas

There are an unfortunately large number of surprises in R programming
Some of these "gotchas" are common problems in other languages, many are unique to R
We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
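One caveat when using all.equal() inside if (): it returns a character description rather than FALSE when the values differ, so it is usually wrapped in isTRUE(). A minimal sketch:

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                    # FALSE: the two doubles differ in the last bits
approx <- isTRUE(all.equal(a, b))   # TRUE: equal within the default tolerance

# all.equal() returns a description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it as a condition
far <- isTRUE(all.equal(1, 1.1))    # FALSE
```

The tolerance can be changed through the tolerance argument of all.equal() if the default is too loose or too strict for the data at hand.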

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
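The two solutions can be sketched with the same vector used above; the variable names m, n_missing and observed are chosen here for illustration:

```r
x <- c(1:10, NA, 12:20)

m_default <- mean(x)             # NA, because one value is unknown
m <- mean(x, na.rm = TRUE)       # mean of the 19 observed values
n_missing <- sum(is.na(x))       # count the missing values
observed <- x[!is.na(x)]         # keep only the non-missing values
```

Testing with x == NA would not work here, since every comparison with NA is itself NA.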

Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
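The practical fix is to combine the values into a single vector with c() before calling mean(). A short sketch of the difference:

```r
# mean() averages its first argument only; the remaining values are
# treated as other arguments, not as data
wrong <- mean(1, 2, 3, 4, 5)       # 1: only the first value is averaged
right <- mean(c(1, 2, 3, 4, 5))    # 3: the values are combined into one vector

# sum() takes its values through "...", so both spellings agree
s1 <- sum(1, 2, 3, 4, 5)
s2 <- sum(c(1, 2, 3, 4, 5))
```

Passing a single vector works for both functions, so it is the safer habit.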

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
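An equivalent conversion, shown here as a sketch, converts the levels once and then indexes them by the factor's internal codes; this is the form suggested in the R help page for factor:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.numeric(x)                  # internal level codes: 2 2 1 1
values <- as.numeric(as.character(x))    # the labels as numbers: 5 5 6 6

# equivalent: convert the (few) levels once, then index by the codes
values2 <- as.numeric(levels(x))[x]
```

Both forms give the same result; the second does less work when a long factor has only a few levels.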

Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form
These workshops exist for you; tell us what you need!
http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

A loop is a collection of commands that are run over and over again
A for loop runs the code a fixed number of times, or on a fixed set of objects
A while loop runs the code until a condition is met
If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {                # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")   # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15)      # allows reproducible sample() results
> dice <- seq(1,6)  # set dice = [1 2 3 4 5 6]
> roll <- 0         # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1)  # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")           # print the result
+ }                                          # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> ## save calculations done in a loop
> Result <- list()        # create an object to store the results
> for (i in 1:5) {        # for each i in [1, 5]
+   Result[[i]] <- 1:i    # assign the sequence 1 to i to Result
+ }
> Result                  # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
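When the number of results is known in advance, the list can be created at its final length instead of being grown inside the loop. A minimal sketch of that variation, using vector() and seq_len() from base R:

```r
n <- 5
Result <- vector("list", n)       # list of n empty (NULL) slots

for (i in seq_len(n)) {           # seq_len(n) is 1, 2, ..., n
  Result[[i]] <- 1:i              # fill slot i with the sequence 1 to i
}

lengths_filled <- sapply(Result, length)   # length of each stored element
```

seq_len(n) is also safer than 1:n as a loop range, because it yields an empty sequence when n is zero.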

Word of caution: don't overuse loops!

Most operations in R are vectorized; this makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c()                    # create vector x
> for(i in 1:5) x[i] <- i+i   # double a vector using a loop
> print(x)                    # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                   # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                     # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {                # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")   # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
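To make the comparison concrete, here is a sketch that computes the same squares with a loop and with a single vectorized expression; sprintf() is used as another vectorized alternative to paste(), and the variable names are chosen for illustration:

```r
x <- 1:5

# loop version: fill the result one element at a time
squares_loop <- numeric(0)
for (i in x) {
  squares_loop[i] <- i^2
}

# vectorized version: one expression, no loop
squares_vec <- x^2

# sprintf() is vectorized too, like paste()
labels <- sprintf("%d squared = %d", x, as.integer(x^2))
```

Both versions produce the same numbers; the vectorized form is shorter and typically faster on long vectors.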

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 24: Introduction to R Programming

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean)   ## average of each column (margin 2)
[1]  3  8 13 18
> apply(M, 2, sum)    ## sum the columns
[1] 15 40 65 90
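The second argument of apply() selects the margin: 1 for rows, 2 for columns. A short sketch contrasting the two on the same matrix; rowMeans() and colMeans() are base R shortcuts that the slide does not mention:

```r
M <- matrix(1:20, ncol = 4)     # 5 rows, 4 columns, filled column by column

row_means <- apply(M, 1, mean)  # MARGIN = 1: one result per row
col_means <- apply(M, 2, mean)  # MARGIN = 2: one result per column

# rowMeans() and colMeans() are base R shortcuts for these two calls
same_rows <- all.equal(row_means, rowMeans(M))
same_cols <- all.equal(col_means, colMeans(M))
```

With 5 rows and 4 columns, row_means has five elements and col_means has four, which is an easy way to check which margin was used.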

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this

> sapply(DF, class)       # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length)       # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric)  # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
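The objects DF and L come from earlier slides. For a self-contained run, the sketch below uses stand-in definitions; their contents are assumptions chosen only to match the classes and lengths shown in the output above:

```r
# assumed stand-ins for the DF and L objects created on earlier slides
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))
L  <- list(x = 1:5, y = c("a", "b", "c"), z = c(TRUE, FALSE))

col_classes <- sapply(DF, class)       # named character vector
el_lengths  <- sapply(L, length)       # named integer vector
is_num      <- sapply(DF, is.numeric)  # named logical vector
```

In each case sapply() keeps the element names, which is what makes the results usable for indexing later on.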

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria

> ## recall that we can index using logical vectors:
> DF[, c(TRUE, FALSE)]  # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

> (DF.which.num <- sapply(DF, is.numeric))  # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num]                          # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.
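The difference between the two forms can be checked directly. In this sketch DF is a stand-in with assumed contents, and drop = FALSE is the standard indexing argument that keeps the data.frame structure with the comma form:

```r
# stand-in for the DF object used on the slides (assumed contents)
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

v  <- DF[, 1]                  # vector: the data.frame structure is dropped
d1 <- DF[1]                    # one-column data.frame
d2 <- DF[, 1, drop = FALSE]    # comma form that keeps the data.frame
```

Using drop = FALSE is helpful inside functions, where the number of selected columns may not be known in advance.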

Applying functions summary

Key points:
  R has convenient methods for applying functions to matrices, lists, and data.frames
  other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  matrix: create a matrix (vector with two dimensions)
  apply: apply a function to the rows or columns of a matrix
  sapply: apply a function to the elements of a list
  is.numeric: returns TRUE or FALSE, depending on the type of object
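For orientation, here is a brief sketch of the three other apply-style functions named above, using the built-in iris data and small made-up inputs:

```r
# lapply(): like sapply(), but always returns a list
L <- list(a = 1:3, b = 4:6)
sums_list <- lapply(L, sum)

# tapply(): apply a function to groups defined by a factor
mean_by_species <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply(): apply a function to several vectors in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)
```

lapply() is the most predictable of the family because its result is always a list, whatever the function returns.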

Writing functions

Functions

A function is a collection of commands that takes input(s) and returns output.
If you have a specific analysis or transformation you want to do on different data, use a function
Functions are defined using the function() function
Functions can be defined with any number of named arguments
Arguments can be of any type (e.g., vectors, data.frames, lists ...)

Function return value

The return value of a function can be:
  The value of the last expression evaluated in the body of the function
  Objects explicitly returned with the return() function

Other function output can come from:
  Calls to print(), message() or cat() in function body
  Error messages

Assignment inside the body of a function takes place in a local environment

Example:

> f <- function() {            # define function f
+   print("setting x to 1")    # print a text string
+   x <- 1                     # set x to 1
+ }
>
> y <- f()                     # assign y the value returned by f
[1] "setting x to 1"
>
> y                            # print y
[1] 1
> x                            # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
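The same point in a self-contained form: a sketch in which the global x is given a made-up value of 100, so that the effect of the local assignment is easy to see:

```r
x <- 100                 # x in the global environment (made-up value)

f <- function() {
  x <- 1                 # creates a separate, local x
  x + 1                  # the value of the last expression is returned
}

y <- f()                 # y is 2
x_after <- x             # the global x is untouched
```

The local x disappears when f() returns, which is why functions normally communicate results through their return value.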

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) {  # define function named "square" with argument x
+   return(x*x)             # multiply the x argument by itself
+ }                         # end the function definition
>
> # check to see that the function works
> square(x = 2)             # square the value 2
[1] 4
> square(10)                # square the value 10
[1] 100
> square(1:5)               # square integers 1 through 5
[1]  1  4  9 16 25

Introduction To Programming In RLast updated November 20 2013 32

71

Writing functions

Debugging basics

Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact

Introduction To Programming In RLast updated November 20 2013 33

71

Writing functions

Writing functions summary

Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging

Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value

browser sets a break pointdebug turns on the debugging flag of a function so you can step

through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went

wrong)

Introduction To Programming In RLast updated November 20 2013 34

71

Writing functions

Exercise 2

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

Introduction To Programming In RLast updated November 20 2013 35

71

Writing functions

Exercise 2 prototype

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)

statsum(iris)

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 36

71

Control flow

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 37

71

Control flow

Control flow

Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input

Introduction To Programming In RLast updated November 20 2013 38

71

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class, that is, the name of the generic, a dot, and the name of the class.

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
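
A generic usually also gets a default method, which is used when no class-specific method matches. A minimal sketch along the lines of disp(); the generic is named show_rounded() here (a hypothetical name) so that it does not overwrite the definitions above:

```r
show_rounded <- function(x, ...) UseMethod("show_rounded")

# fallback for any object without a more specific method
show_rounded.default <- function(x, ...) {
  print(x)
  invisible(x)
}

# numeric vectors are rounded before printing
show_rounded.numeric <- function(x, digits = 2, ...) {
  rounded <- round(x, digits = digits)
  print(rounded)
  invisible(rounded)
}

out_num <- show_rounded(c(1.234, 5.678))  # uses show_rounded.numeric
out_chr <- show_rounded("abc")            # falls back to show_rounded.default
```

Without the default method, calling show_rounded("abc") would stop with an error saying that no applicable method exists for a character object.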

The S3 object class system

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions written in R contain UseMethod("x") in the body, where x is the name of the function (internal generics such as length() dispatch in compiled code instead)
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  called in the body of a generic function to dispatch to the matching method
  invisible  returns an object but does not print it

The S3 object class system

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class

The S3 object class system

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, so sd() rather than mean()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R)) # prepend the new class, keep "list"
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
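
One detail worth knowing: all.equal() returns TRUE when the values match within a tolerance, but a character description of the difference otherwise, so it is usually wrapped in isTRUE() when used as a condition. A short sketch (variable names are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                    # FALSE: the two doubles differ in their last bits
approx <- isTRUE(all.equal(a, b))   # TRUE: equal within the default tolerance

# all.equal() does not return FALSE for unequal values; it returns a message
msg     <- all.equal(1, 1.1)
is_flag <- is.logical(msg)          # FALSE, msg is a character string
safe    <- isTRUE(all.equal(1, 1.1))  # FALSE, safe to use inside if()
```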

Things that may surprise you

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
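
The two solutions can be sketched as follows, reusing the vector from the example (the result names are illustrative):

```r
x <- c(1:10, NA, 12:20)

n_missing  <- sum(is.na(x))         # count the missing values
m          <- mean(x, na.rm = TRUE) # mean of the observed values only
s          <- sd(x, na.rm = TRUE)   # same idea for the standard deviation
x_complete <- x[!is.na(x)]          # or drop the missing values explicitly
```

Note that is.na(x) is the right test; x == NA always yields NA and therefore cannot be used to find missing values.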

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
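
The difference comes from the argument lists: mean() averages only its first argument x, whereas sum() adds up everything passed through its ... argument. Combining the values into one vector with c() makes both behave as expected. A short sketch (variable names are illustrative):

```r
vals <- c(1, 2, 3, 4, 5)

# only x = 1 is averaged; the remaining values are matched to other
# arguments of mean.default (trim, na.rm) or swallowed by ...
m_wrong <- mean(1, 2, 3, 4, 5)

m_right <- mean(vals)            # average of all five values
s_all   <- sum(1, 2, 3, 4, 5)    # sum() adds all of its arguments
s_vec   <- sum(vals)             # same result from a single vector
```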

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
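
A factor stores integer codes plus a vector of level labels, which is why as.numeric() returns the codes rather than the labels. A short sketch of the conversion options; as.numeric(levels(x))[x] is the alternative described on R's own ?factor help page:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.integer(x)                 # internal codes: positions in levels(x)
labels  <- levels(x)                     # the level labels, as character strings
values  <- as.numeric(as.character(x))   # convert labels of each element to numbers
values2 <- as.numeric(levels(x))[x]      # convert the levels once, then index by code
```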


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Additional resources

Feedback

  • Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code for as long as its condition remains true, that is, until a stopping condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number and its square
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
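
When the number of iterations is known in advance, the result object can be created at its final size before the loop starts, which avoids growing it one element at a time. A minimal sketch of that variation (n and Result2 are illustrative names, not from the workshop):

```r
n <- 5
Result <- vector("list", n)        # preallocate a list with n empty slots
for (i in seq_len(n)) {            # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i               # fill slot i with the sequence 1 to i
}

# the same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
```

seq_len(n) is used instead of 1:n because 1:0 would produce the sequence 1, 0 and run the loop body twice.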

Loops (supplemental)

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number and its square
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Loops (supplemental)

Exercise 5

1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data

Loops (supplemental)

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Page 25: Introduction to R Programming

Applying functions to list elements

The apply function

The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the dimension: 1 for rows, 2 for columns.

> M <- matrix(1:20, ncol=4)
> apply(M, 2, mean) ## average each column
[1]  3  8 13 18
> apply(M, 2, sum) ## sum each column
[1] 15 40 65 90
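
To make the role of the second argument concrete, the sketch below applies mean() along both dimensions of the same matrix and compares the results with the dedicated helpers colMeans() and rowMeans() (the result names are illustrative):

```r
M <- matrix(1:20, ncol = 4)        # 5 rows, 4 columns, filled column by column

col_means <- apply(M, 2, mean)     # MARGIN = 2: one value per column
row_means <- apply(M, 1, mean)     # MARGIN = 1: one value per row

# the dedicated helpers give the same numbers
same_cols <- isTRUE(all.equal(col_means, colMeans(M)))
same_rows <- isTRUE(all.equal(row_means, rowMeans(M)))
```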

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y
"integer"  "factor"
> sapply(L, length) # get the length of each element in the L list
x y z
5 3 2
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y
 TRUE FALSE
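
sapply() simplifies its result when it can, which is convenient at the console but means the return type depends on the data. A short sketch comparing it with lapply() and vapply(); the DF defined here is a small stand-in, assumed to resemble the DF built in an earlier section of the workshop:

```r
# stand-in data.frame (an assumption, not the workshop's own object)
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))

as_list <- lapply(DF, class)                   # always returns a list
as_vec  <- sapply(DF, class)                   # simplified to a named character vector
checked <- vapply(DF, is.numeric, logical(1))  # errors unless each result is one logical
```

vapply() is often preferred inside functions because the expected type and length of each result are stated explicitly.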

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y
 TRUE FALSE
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions to list elements

Applying functions summary

Key points:
  • R has convenient functions for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see the documentation of these functions for details)

Functions introduced in this section:
  matrix      create a matrix (vector with two dimensions)
  apply       apply a function to the rows or columns of a matrix
  sapply      apply a function to the elements of a list
  is.numeric  returns TRUE or FALSE, depending on the type of object


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output
  • If you have a specific analysis or transformation you want to do on different data, use a function
  • Functions are defined using the function() function
  • Functions can be defined with any number of named arguments
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...)
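
As an illustration of named arguments, the sketch below defines a function with one required argument and two optional arguments that have default values. The function rescale() is a hypothetical helper written for this example, not part of the workshop materials:

```r
# hypothetical helper: rescale a numeric vector to the range [lower, upper]
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x, na.rm = TRUE)                     # smallest and largest value
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

r1 <- rescale(c(2, 4, 6))               # uses the defaults lower = 0, upper = 1
r2 <- rescale(c(2, 4, 6), upper = 10)   # a named argument overrides a default
```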

Writing functions

Function return value

  • The return value of a function can be:
      • The value of the last expression evaluated in the body of the function
      • Objects explicitly returned with the return() function
  • Other function output can come from:
      • Calls to print(), message() or cat() in the function body
      • Error messages
  • Assignment inside the body of a function takes place in a local environment
  • Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110

Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
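
The debugging tools are interactive, so the sketch below only shows how they are switched on and off; scale_by() is a hypothetical function used for illustration. At the console, calling a function flagged with debug() steps through it line by line, and browser() pauses execution wherever it is placed.

```r
# hypothetical function used only to illustrate the debugging tools
scale_by <- function(x, k) {
  # browser()          # uncomment to pause here and inspect x and k interactively
  x * k
}

debug(scale_by)                    # the next call would be stepped through
flag_on <- isdebugged(scale_by)    # check the debugging flag
undebug(scale_by)                  # switch stepping off again
flag_off <- isdebugged(scale_by)

# After an error at the console, traceback() prints the calls that led to it.
# In a script, the error message can be captured instead:
err <- tryCatch(scale_by("a", 2), error = function(e) conditionMessage(e))
```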

Writing functions

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)

Writing functions

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Writing functions

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow

Control flow examples

Add a condition to handle x = 0:

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function:

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow

Control flow examples

Do something reasonable if x is not numeric:

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
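
Printing a message with cat() lets the calling code carry on as if nothing went wrong. For argument checking it is often preferable to signal an error with stop(). A minimal sketch, using a hypothetical variant named isPositive2() that returns a string instead of printing:

```r
# hypothetical variant of isPositive() that signals an error on bad input
isPositive2 <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one!")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

ok  <- isPositive2(10)
# tryCatch() captures the error so that the message can be inspected
bad <- tryCatch(isPositive2("a"), error = function(e) conditionMessage(e))
```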

Control flow

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions (1) introduced in this section:
  cat   concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

(1) Strictly speaking, if and else are language keywords rather than ordinary functions, but this need not concern us at the moment.

Control flow

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function

Control flow

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Loops (supplimental)

Using loops to fill in lists

Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing

gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1

[[2]][1] 1 2

[[3]][1] 1 2 3

[[4]][1] 1 2 3 4

[[5]][1] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 68

71

Loops (supplimental)

Word of caution donrsquot overuse loops

Most operations in R are vectorized ndash This makes loops unnecessary in manycases

Use vector arithmatic instead of loops

gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10

Use paste instead of loops

gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25

Loops are handy but save them for when you really need themIntroduction To Programming In R

Last updated November 20 2013 69 71

Loops (supplimental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data

Introduction To Programming In RLast updated November 20 2013 70

71

Loops (supplimental)

Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set

gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species

numeric numeric numeric numeric factor

1 use the results from step 1 to select the numeric columns

gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)

SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02

1 use a loop to calculate the mean of each numeric column in the iris data

gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth

5843333 3057333 3758000 1199333

Introduction To Programming In RLast updated November 20 2013 71

71

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 26: Introduction to R Programming

Applying functions to list elements

The sapply function

It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.

> sapply(DF, class) # get the class of each column in the DF data.frame
        x         y 
"integer"  "factor" 
> sapply(L, length) # get the length of each element in the L list
x y z 
5 3 2 
> sapply(DF, is.numeric) # check each column of DF to see if it is numeric
    x     y 
 TRUE FALSE 
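The DF and L objects used on this slide are created earlier in the workshop. As a minimal self-contained sketch, the block below builds stand-in objects (the names DF and L come from the slides; the values are assumptions chosen to match the output shown) and stores the sapply() results.

```r
# Stand-in objects: names from the slides, values assumed for illustration
DF <- data.frame(x = 1:5, y = factor(letters[1:5]))
L <- list(x = 1:5, y = c("a", "b", "c"), z = c(TRUE, FALSE))

col_classes <- sapply(DF, class)        # named character vector, one entry per column
elem_lengths <- sapply(L, length)       # named integer vector, one entry per element
is_num <- sapply(DF, is.numeric)        # named logical vector, one entry per column

print(col_classes)
print(elem_lengths)
print(is_num)
```

Because sapply() simplifies its result to a vector when it can, each result keeps the names of the columns or list elements.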

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extract elements that meet certain criteria.

Recall that we can index using logical vectors:

> DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
[1] 1 2 3 4 5

sapply() can be used to generate the logical vector:

> (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
    x     y 
 TRUE FALSE 
> DF[DF.which.num] # select the numeric columns
  x
1 1
2 2
3 3
4 4
5 5

Note the difference between DF[, 1] and DF[1]. The first form returns a vector, the second a data.frame with one column.

Applying functions to list elements

Applying functions summary

Key points:
  • R has convenient methods for applying functions to matrices, lists, and data.frames
  • other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)

Functions introduced in this section:
  • matrix: create a matrix (vector with two dimensions)
  • apply: apply a function to the rows or columns of a matrix
  • sapply: apply a function to the elements of a list
  • is.numeric: returns TRUE or FALSE, depending on the type of object
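The functions named in this summary can be compared side by side. The following is a small sketch, not taken from the workshop materials; the matrix m and the other inputs are made up for illustration, and iris is the built-in data set.

```r
m <- matrix(1:6, nrow = 2)                  # 2 x 3 matrix, filled column by column
row_sums <- apply(m, 1, sum)                # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)                # MARGIN = 2: one result per column

squares <- lapply(1:3, function(i) i^2)     # lapply always returns a list

# tapply: apply a function to groups of values defined by a factor
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply: step through several vectors in parallel
pasted <- mapply(function(a, b) paste(a, b),
                 c("x", "y"), c("1", "2"), USE.NAMES = FALSE)

print(row_sums)
print(col_sums)
print(group_means)
print(pasted)
```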


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output.
  • If you have a specific analysis or transformation you want to do on different data, use a function.
  • Functions are defined using the function() function.
  • Functions can be defined with any number of named arguments.
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...).

Writing functions

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in function body
  • Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+ print("setting x to 1") # print a text string
+ x <- 1} # set x to 1
> 
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
> 
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j 
101 102 103   1 105 106 107 108 109 110 
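The scoping rule on this slide can be checked in isolation. This is a minimal sketch with a hypothetical function add_one() and a made-up global value; it is not part of the workshop code.

```r
x <- 100                          # x in the global environment

add_one <- function(value) {      # hypothetical function, for illustration only
  x <- value + 1                  # creates a local x; the global x is untouched
  x                               # the value of the last expression is returned
}

result <- add_one(1)
print(result)                     # the function's return value
print(x)                          # the global x, unchanged by the call
```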

Writing functions

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+ return(x*x) # multiply the x argument by itself
+ } # end the function definition
> 
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25

Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
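The debugging tools are interactive, so they are hard to show in a transcript. As a rough sketch, the block below defines a hypothetical function scale_to_unit() (not from the workshop) and lists, as comments, the calls one might type at the console to inspect it.

```r
# Hypothetical function used only to have something to debug
scale_to_unit <- function(x) {
  rng <- range(x)                         # smallest and largest values of x
  (x - rng[1]) / (rng[2] - rng[1])        # rescale so the values run from 0 to 1
}

# In an interactive session, any of these could be used:
# debug(scale_to_unit)     # step through every later call, line by line
# undebug(scale_to_unit)   # turn stepping off again
# traceback()              # after an error, show the chain of calls that led to it
# Placing browser() inside the body pauses execution at that line.

scaled <- scale_to_unit(c(2, 4, 6))
print(scaled)
```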

Writing functions

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)

Writing functions

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Writing functions

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

    statsum <- function(df) {
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      return(means)
    }

    statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

    statsum <- function(df) {
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(iris)


Control flow

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if () and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.

Control flow

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
> 
> # test the isPositive() function
> isPositive(10)
10 is positive 
> isPositive(-1)
-1 is negative 
> isPositive(0)
0 is negative 

Need to do something different if x equals zero!

Control flow

Control flow examples

Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+ if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
> 

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero 
> isPositive("a") # oops, that will not work!
a is positive 

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow

Control flow examples

Do something reasonable if x is not numeric

> # add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+ if(!is.numeric(x) | length(x) > 1) {
+ cat("x must be a numeric vector of length one! \n")}
+ else if (x > 0) { # if x is greater than zero, then
+ cat(x, "is positive \n") } # say so!
+ else if (x == 0) { # otherwise if x is zero
+ cat(x, "is zero \n")} # say so!
+ else { # otherwise
+ cat(x, "is negative \n")} # say x is negative
+ } # end function definition
> 
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one! 
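The slides report a bad argument with cat(). A common alternative, sketched here with a hypothetical function signLabel() that is not part of the workshop, is to raise an error with stop() and to return a value instead of printing one. It also uses || rather than |, an assumption about style rather than something the slides prescribe.

```r
# Hypothetical variant of isPositive() that returns a label and raises errors
signLabel <- function(x) {
  # || evaluates the right-hand side only when the left-hand side is FALSE
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

labels <- c(signLabel(10), signLabel(0), signLabel(-1))
print(labels)

# tryCatch() lets a script continue after the error and inspect its message
msg <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
print(msg)
```

Returning the label makes the result usable in later code, while stop() halts callers that pass the wrong kind of input.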

Control flow

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions(1) introduced in this section:
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment.

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame

2 Insert a break point with browser() and step through your function

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(1:10)
    statsum(iris)

2 Insert a break point with browser() and step through your function

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      browser()
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(iris)
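One caveat about the check in this prototype: class(df) != "data.frame" produces one comparison per class, so an object with several classes gives a condition of length greater than one, which recent versions of R reject inside if (). A hedged sketch of an alternative check follows; checkDataFrame() and the class name "myframe" are hypothetical and used only for illustration.

```r
# Hypothetical checker, not part of the workshop's statsum() prototype
checkDataFrame <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# An object can carry extra classes in front of "data.frame"
df2 <- iris
class(df2) <- c("myframe", "data.frame")

n_cmp <- length(class(df2) != "data.frame")   # 2 comparisons, not a single TRUE/FALSE
ok <- checkDataFrame(df2)                     # passes: df2 is still a data.frame
also_ok <- inherits(df2, "data.frame")        # inherits() tests for one class among many
failed <- inherits(try(checkDataFrame(1:10), silent = TRUE), "try-error")

print(c(n_cmp = n_cmp, ok = ok, also_ok = also_ok, failed = failed))
```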


The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"

The S3 object class system

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct 
[5] mean.POSIXlt 
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"      
[5] "as.matrix.data.frame"     "by.data.frame"           
[7] "cbind.data.frame"         "[<-.data.frame"          
[9] "[.data.frame"            

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+ if(is.numeric(x)) {
+ cat("The average is", mean.default(x))
+ return(invisible(mean.default(x))) # use mean.default for numeric
+ } else
+ cat("x is not numeric \n")} # otherwise say x not numeric
> 
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
> 
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric 

The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+ UseMethod("disp")
+ }
> 
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+ print(round(x, digits=2))
+ }
> 
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66

The S3 object class system

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it
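The key points above can be pulled together in one short example. This is a sketch under stated assumptions: the class "temperature", its constructor, and the generic toFahrenheit() are all hypothetical names invented for illustration.

```r
# Hypothetical class "temperature", used only for illustration
temperature <- function(celsius) {
  obj <- list(celsius = celsius)
  class(obj) <- "temperature"            # class set by simple assignment
  obj
}

# Method for the existing generic print(): named generic.class
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees C\n")
  invisible(x)                           # return the object without printing it again
}

# A new generic function and a method for it
toFahrenheit <- function(x, ...) UseMethod("toFahrenheit")
toFahrenheit.temperature <- function(x, ...) x$celsius * 9 / 5 + 32

t1 <- temperature(100)
print(t1)                                # dispatches to print.temperature
f <- toFahrenheit(t1)                    # dispatches to toFahrenheit.temperature
print(f)
```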

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class "statsum"

3 Write a print method for the statsum class

The S3 object class system

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean, for the standard deviations
      counts <- sapply(df[classes == "factor"], table)
      return(list(cbind(means, sds), counts))
    }

    statsum(iris)

2 Modify your function so that it returns a list of class "statsum"

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame!")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean, for the standard deviations
      counts <- sapply(df[classes == "factor"], table)
      R <- list(cbind(means, sds), counts)
      class(R) <- c("statsum", class(R))
      return(R)
    }

    str(statsum(iris))

3 Write a print method for the statsum class

    print.statsum <- function(x) {
      cat("Numeric variable descriptive statistics:\n")
      print(x[[1]], digits=2)
      cat("\nFactor variable counts:\n")
      print(x[[2]])
    }

    statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
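When the comparison feeds an if () statement, one detail matters: all.equal() returns a character description rather than FALSE when the values differ. The sketch below wraps it in isTRUE(); the variable names and the hand-written tolerance of 1e-8 are assumptions chosen for illustration.

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                       # exact comparison of two doubles
near <- isTRUE(all.equal(a, b))       # comparison within a small tolerance
by_hand <- abs(a - b) < 1e-8          # the same idea written out, with an assumed tolerance

# all.equal() describes the difference instead of returning FALSE,
# so isTRUE() is needed to get a plain TRUE/FALSE
differ <- isTRUE(all.equal(1, 1.1))

print(c(exact = exact, near = near, by_hand = by_hand, differ = differ))
```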

Things that may surprise you

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
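Both solutions can be tried on the same vector. This is a minimal sketch reusing the x from the slide; the result names are made up for illustration.

```r
x <- c(1:10, NA, 12:20)

n_missing <- sum(is.na(x))             # count the missing values
mean_all <- mean(x)                    # NA: the missing value propagates
mean_obs <- mean(x, na.rm = TRUE)      # drop NA before averaging
same <- mean(x[!is.na(x)])             # equivalent, using is.na() for indexing

print(c(n_missing = n_missing, mean_all = mean_all, mean_obs = mean_obs, same = same))
```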

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"    
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
> 
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...) 
NULL
> args(sum)
function (..., na.rm = FALSE) 
NULL

Ouch. That is not nice at all!
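The difference comes from the argument lists: mean() averages only its first argument x, while sum() adds up everything passed through the dots. A short sketch of the usual fix, which is to combine the values into one vector with c() first (the variable names are made up for illustration):

```r
first_only <- mean(1, 2, 3, 4, 5)       # only the first argument, 1, is treated as data
all_values <- mean(c(1, 2, 3, 4, 5))    # one vector argument: the intended average
total <- sum(1, 2, 3, 4, 5)             # sum() adds every argument

print(c(first_only = first_only, all_values = all_values, total = total))
```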

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
> 
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
> 
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
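The conversion can also be done through the factor's levels. The sketch below reuses the x from the slide and compares the two routes; the result names are assumptions for illustration.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.integer(x)                      # internal integer codes, not the data
values <- as.numeric(as.character(x))       # the original numbers, as on the slide
values2 <- as.numeric(levels(x))[x]         # same result, converting each level only once

print(codes)
print(values)
print(values2)
```

The second route indexes the converted levels by the factor itself, so each distinct level is parsed once rather than once per element.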


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc

Additional resources

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!

Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+ cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25 
-4 squared is 16 
-3 squared is 9 
-2 squared is 4 
-1 squared is 1 
0 squared is 0 
1 squared is 1 
2 squared is 4 
3 squared is 9 
4 squared is 16 
5 squared is 25 

Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+ roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+ cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6 
We rolled a 10 
We rolled a 9 
We rolled a 7 
We rolled a 10 
We rolled a 5 
We rolled a 9 
We rolled a 12 

Loops (supplemental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+ Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
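The slide grows Result one element at a time. When the number of iterations is known in advance, a common variant is to create the container at its final size first. This is a sketch of that idea, not the workshop's own code; n and squares are names introduced for illustration.

```r
n <- 5
Result <- vector("list", n)           # list with n empty (NULL) slots
for (i in seq_len(n)) {               # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i                  # fill slot i with the sequence 1 to i
}

squares <- numeric(n)                 # numeric vector of n zeros
for (i in seq_len(n)) {
  squares[i] <- i^2
}

print(Result[[3]])
print(squares)
```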

Loops (supplemental)

Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
> 
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ## cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared = ", (1:5)^2)
[1] "1 squared =  1"  "2 squared =  4"  "3 squared =  9" 
[4] "4 squared =  16" "5 squared =  25"

Loops are handy, but save them for when you really need them!

Loops (supplemental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set

2 use the results from step 1 to select the numeric columns

3 use a loop to calculate the mean of each numeric column in the iris data

Loops (supplemental)

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+ classes[name] <- class(iris[[name]])
+ }
> 
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
   "numeric"    "numeric"    "numeric"    "numeric"     "factor" 

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+ iris.means[[var]] <- mean(iris[[var]])
+ }
> 
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
    5.843333     3.057333     3.758000     1.199333 

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 27: Introduction to R Programming

Applying functions to list elements

Combining sapply and indexing

The sapply function can be used in combination with indexing to extractelements that meet certain criteria

Recall that we can index using logical vectors

gt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5

gt recall that we can index using logical vectorsgt DF[ c(TRUE FALSE)] select the first column of DF but not the second[1] 1 2 3 4 5

sapply() can be used to generate the logical vector

gt (DFwhichnum lt- sapply(DF isnumeric)) check which columns of DF are numericx y

TRUE FALSEgt DF[DFwhichnum] select the numeric columns

x1 12 23 34 45 5

Note the difference between DF[ 1] and DF[1] The first form returns avector the second a dataframe with one column

Introduction To Programming In RLast updated November 20 2013 27

71

Applying functions to list elements

Applying functions summary

Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details

Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list

isnumeric returns TRUE or FALSE depending on the type of object

Introduction To Programming In RLast updated November 20 2013 28

71

Writing functions

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 29

71

Writing functions

Functions

A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )

Introduction To Programming In RLast updated November 20 2013 30

71

Writing functions

Function return value

The return value of a function can be
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from
  • Calls to print(), message(), or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1 # set x to 1
+ }
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
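The same behaviour can be reproduced in a self-contained sketch. The names counter and bump() are hypothetical and used only for this illustration.

```r
counter <- 10              # a variable in the global environment

bump <- function() {
  counter <- counter + 1   # reads the global value, assigns to a new local variable
  counter                  # the local value is returned
}

returned <- bump()         # 11
counter                    # still 10: the global variable was not modified
```

The function can read the global variable, but the assignment inside the body creates a local variable that disappears when the function returns.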


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Step through functions with debug() and set breakpoints with browser()
  • Use traceback() to see what went wrong after the fact
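A minimal sketch of these tools follows. The function scale_by() is hypothetical. Stepping itself is interactive, so the sketch only sets and clears the debugging flag and catches an error so that a script can continue.

```r
# Hypothetical helper used only for illustration.
scale_by <- function(x, k) {
  x * k
}

debug(scale_by)                   # set the flag: the next call would be stepped through
flagged <- isdebugged(scale_by)   # TRUE while the flag is set
undebug(scale_by)                 # clear the flag again
cleared <- !isdebugged(scale_by)  # TRUE once the flag is cleared

# At the console, an uncaught error can be inspected afterwards with traceback().
# Here the error is caught instead, and its message is stored.
result <- tryCatch(scale_by("a", 2), error = function(e) conditionMessage(e))
```

In an interactive session, calling scale_by() while the flag is set opens the browser, where n runs the next line and Q quits.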


Writing functions summary

Key points
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section
function   defines a new function
return     used inside a function definition to set the return value
browser    sets a break point
debug      turns on the debugging flag of a function so you can step through it
undebug    turns off the debugging flag
traceback  shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)                     # class of each column
  means <- sapply(df[classes == "numeric"], mean)  # mean of each numeric column
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)                      # class of each column
  means <- sapply(df[classes == "numeric"], mean)   # mean of each numeric column
  counts <- sapply(df[classes == "factor"], table)  # level counts of each factor
  return(list(means, counts))
}

statsum(iris)


Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise...
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
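A variant of the same idea, sketched under a few assumptions: the function name signLabel() is hypothetical, it returns a label instead of printing, and it signals an error with stop() rather than printing a message. It also uses || (which evaluates left to right and stops early) and tests length(x) != 1 so that empty vectors are rejected too.

```r
# Hypothetical variant of isPositive() that raises an error on bad input.
signLabel <- function(x) {
  # || stops at the first TRUE, so the length test only runs for numeric x
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(10)  # "positive"
signLabel(0)   # "zero"

# tryCatch() turns the error into a value so that a script can continue
msg <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
```

Raising an error stops the calling code immediately, which is usually preferable to printing a message and carrying on with an unusable value.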


Control flow summary

Key points
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions¹ introduced in this section
cat   concatenates and prints R objects
if    execute code only if condition is met
else  used with if; code to execute if condition is not met

¹ Technically if and else are not functions, but this need not concern us at the moment


Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame

2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
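Because an object can have several classes, comparing class(x) to a single name with == can return more than one value. The sketch below uses the base R functions inherits() and is.data.frame(), which return a single TRUE or FALSE; the object x repeats the example above.

```r
x <- 1:10
class(x) <- c("A", "B")

has_B <- inherits(x, "B")              # TRUE: "B" is one of the classes of x
has_C <- inherits(x, "C")              # FALSE
n_results <- length(class(x) == "B")   # 2: == gives one result per class

is.data.frame(iris)                    # TRUE
inherits(iris, "data.frame")           # TRUE
```

A single logical value is what if() expects, so inherits() is the safer choice when checking the class of a function argument.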


Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
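A generic usually also gets a default method, which is used when no class-specific method exists. The sketch below assumes a hypothetical generic named describe(); it is not part of the workshop code.

```r
# Hypothetical generic, written only for this sketch.
describe <- function(x, ...) {
  UseMethod("describe")
}

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# Method for data.frames.
describe.data.frame <- function(x, ...) {
  paste("a data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(iris)     # dispatches to describe.data.frame
describe(letters)  # no method for character, so describe.default is used
```

Without a default method, calling the generic on an object of an unsupported class stops with an error saying that no applicable method was found.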


S3 classes summary

Key points
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions written in R contain UseMethod("x") in the body, where x is the name of the function (internal generics implemented in C dispatch without it)
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
plot       creates a graphical display, the type of which depends on the class of the object being plotted
methods    lists the methods defined for a function or class
UseMethod  the body of a generic function
invisible  returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)  # standard deviations, computed with sd()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))  # prepend the new class
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
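One detail worth a sketch: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so inside if() it is usually wrapped in isTRUE(). The variable names below are hypothetical.

```r
x <- 0.1
y <- 0.3 / 3

exact <- x == y                      # FALSE: the values differ in the last bits
close <- isTRUE(all.equal(x, y))     # TRUE: equal within the default tolerance
far   <- isTRUE(all.equal(1, 1.1))   # FALSE: all.equal() returns a message here

if (close) {
  cat("x and y are equal up to rounding error\n")
}
```

Wrapping the call in isTRUE() guarantees a single TRUE or FALSE, which is what if() needs.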


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
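Both solutions can be sketched with the same vector as above; the result names are hypothetical.

```r
x <- c(1:10, NA, 12:20)

mean_all  <- mean(x)                 # NA: the missing value propagates
mean_obs  <- mean(x, na.rm = TRUE)   # mean of the 19 observed values
n_missing <- sum(is.na(x))           # 1: number of missing values
observed  <- x[!is.na(x)]            # the vector with missing values dropped
```

Testing with is.na(x) works where x == NA does not, because any comparison with NA is itself NA.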


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
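What happens in the comparison can be sketched step by step: the number is converted to text and the two strings are compared. Converting explicitly with as.numeric() is one way to surface bad input; the names parsed and bad are hypothetical.

```r
as.character(1)   # "1": what the number becomes before the comparison
1 > "a"           # FALSE: the strings "1" and "a" are compared
10 < "9"          # TRUE in the usual locales: "10" sorts before "9" as text

# Explicit conversion marks values that are not numbers as NA
# (with a warning, suppressed here so the sketch runs quietly).
parsed <- suppressWarnings(as.numeric(c("1", "2", "a")))
bad <- is.na(parsed)   # FALSE FALSE TRUE
```

Checking for NA after an explicit conversion gives the warning sign that the automatic conversion does not.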


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
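A short sketch of the usual way around this: collect the values into one vector with c() so that both functions receive the same data. The name values is hypothetical.

```r
values <- c(1, 2, 3, 4, 5)

mean(values)          # 3
sum(values)           # 15
mean(1, 2, 3, 4, 5)   # 1: only the first argument is treated as the data
```

Passing a single vector works the same way for mean(), sum(), and most other summary functions.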


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

  • Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code for as long as a condition holds, that is, until the condition becomes false
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
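When the number of results is known in advance, the container can be created at its final size before the loop starts. This is a minimal sketch of that variation, not the workshop's own code; n is a hypothetical name.

```r
n <- 5
Result <- vector("list", n)   # a list of n empty (NULL) elements, created up front

for (i in seq_len(n)) {       # seq_len(n) is 1, 2, ..., n
  Result[[i]] <- 1:i          # fill in element i
}

sapply(Result, length)        # 1 2 3 4 5
```

Creating the list at full size avoids growing it on every iteration, which matters mostly for long loops.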


Word of caution: don't overuse loops!

Most operations in R are vectorized; this makes loops unnecessary in many cases

  • Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

  • Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!


Exercise 5

1 use a loop to get the class() of each column in the iris data set

2 use the results from step 1 to select the numeric columns

3 use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> irisnum <- iris[, names(classes)[classes=="numeric"]]
> head(irisnum, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> irismeans <- c()
> for(var in names(irisnum)) {
+   irismeans[var] <- mean(iris[[var]])
+ }
>
> irismeans
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333

Introduction To Programming In RLast updated November 20 2013 71

71

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 28: Introduction to R Programming

Applying functions to list elements

Applying functions summary

Key pointsR has convenient methods for applying functions to matricies lists anddataframesother apply-style functions exist eg lapply tapply and mapply (seedocumentation of these functions for details

Functions introduced in this sectionmatrix create a matrix (vector with two dimensions)apply apply a function to the rows or columns of a matrixsapply apply a function to the elements of a list

isnumeric returns TRUE or FALSE depending on the type of object

Introduction To Programming In RLast updated November 20 2013 28

71

Writing functions

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 29

71

Writing functions

Functions

A function is a collection of commands that takes input(s) and returnsoutputIf you have a specific analysis or transformation you want to do ondifferent data use a functionFunctions are defined using the function() functionFunctions can be defined with any number of named argumentsArguments can be of any type (eg vectors dataframes lists )

Introduction To Programming In RLast updated November 20 2013 30

71

Writing functions

Function return value

The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function

Other function output can come fromCalls to print() message() or cat() in function bodyError messages

Assignment inside the body of a function takes place in a localenvironmentExample

gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1

a b c d e f g h i j101 102 103 1 105 106 107 108 109 110

Introduction To Programming In RLast updated November 20 2013 31

71

Writing functions

Writing functions example

Goal write a function that returns the square of itrsquos argument

gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25

Introduction To Programming In RLast updated November 20 2013 32

71

Writing functions

Debugging basics

Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact

Introduction To Programming In RLast updated November 20 2013 33

71

Writing functions

Writing functions summary

Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging

Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value

browser sets a break pointdebug turns on the debugging flag of a function so you can step

through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went

wrong)

Introduction To Programming In RLast updated November 20 2013 34

71

Writing functions

Exercise 2

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

Introduction To Programming In RLast updated November 20 2013 35

71

Writing functions

Exercise 2 prototype

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)

statsum(iris)

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 36

71

Control flow

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 37

71

Control flow

Control flow

Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input

Introduction To Programming In RLast updated November 20 2013 38

71

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
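The same naming scheme works for any existing generic. As a minimal sketch, here is a print() method for a made-up class "celsius" (the class name and the temps object are assumptions for illustration, not part of the workshop materials). Including ... in the signature keeps the method compatible with the generic's arguments.

```r
# print method for the hypothetical "celsius" class
print.celsius <- function(x, ...) {
  cat(format(unclass(x)), "degrees C\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

temps <- structure(c(21.5, 19), class = "celsius")
print(temps)     # dispatches to print.celsius
temps            # auto-printing at the console dispatches the same way
```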


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
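A generic with only a matrix method raises an error for every other kind of object. One common remedy, sketched here as an assumption about how one might extend the example, is a default method that UseMethod() falls back on when no class-specific method exists.

```r
disp <- function(x, ...) UseMethod("disp")

disp.matrix <- function(x, ...) print(round(x, digits = 2))

# fallback used when no method matches the class of x
disp.default <- function(x, ...) {
  cat("No special display for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(1.234, 2.345, 3.456, 4.567), ncol = 2))  # uses disp.matrix
disp("hello")                                          # uses disp.default
```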


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions written in R contain a call to UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
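For reference, here is one possible variant of the whole exercise in a single block. It is a sketch rather than the official solution: the named list elements (stats, counts), the use of is.data.frame(), and the ... argument in the print method are all choices made here for illustration.

```r
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  num <- df[classes == "numeric"]
  R <- list(stats  = cbind(means = sapply(num, mean), sds = sapply(num, sd)),
            counts = lapply(df[classes == "factor"], table))
  class(R) <- c("statsum", class(R))   # prepend the new class
  R
}

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x$stats, digits = 2)
  cat("Factor variable counts:\n")
  print(x$counts)
  invisible(x)
}

s <- statsum(iris)
print(s)   # typing s at the console dispatches to print.statsum as well
```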


Things that may surprise you

Gotcha's

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotcha's" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
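One detail worth knowing: when the values differ, all.equal() returns a character description of the difference rather than FALSE, so inside if() it is usually wrapped in isTRUE(). A small sketch (the variable names a and b are illustrative):

```r
a <- 0.1
b <- 0.3 / 3

a == b                      # exact comparison fails
isTRUE(all.equal(a, b))     # tolerant comparison, safe to use in if()

# an explicit tolerance is another option
abs(a - b) < 1e-8
```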


Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
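The two solutions can be sketched with the same vector; only base R functions are used.

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)   # drop the NA before averaging
sum(is.na(x))           # count the missing values
which(is.na(x))         # position of the missing value

x_complete <- x[!is.na(x)]   # keep only the observed values
length(x_complete)
```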


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
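The difference disappears when the values are combined into a single vector first, because mean() only averages its first argument while sum() adds up everything passed through .... A minimal sketch:

```r
vals <- c(1, 2, 3, 4, 5)

mean(vals)   # all five values are in the x argument
sum(vals)    # same result as sum(1, 2, 3, 4, 5)
```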


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
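An equivalent idiom, mentioned in the help page for factor, converts the levels once and then indexes them by the factor's integer codes. It is shown here as an alternative sketch, not as a replacement for the approach above.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)              # the internal codes, not the values
as.numeric(levels(x))[x]   # convert the levels, then index by code
```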


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop keeps running the code as long as its condition is true, that is, until a stopping condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
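A while loop often needs a counter so that you know how many iterations it took. The sketch below is a variation on the dice example, using a single sample() call with replace = TRUE and a counter named n_rolls (both choices made here for illustration). The exact sequence of rolls depends on the R version's random number settings, so no output is shown.

```r
set.seed(15)
dice <- 1:6
roll <- 0
n_rolls <- 0

while (roll < 12) {
  roll <- sum(sample(dice, 2, replace = TRUE))  # roll two dice at once
  n_rolls <- n_rolls + 1                        # count this attempt
}

cat("It took", n_rolls, "rolls to get two sixes\n")
```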


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
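When the number of results is known in advance, the list can be created at its final length before the loop, and the same result can be obtained without a loop via lapply(). This is a sketch of both options; the names n and Result2 are illustrative.

```r
n <- 5

# pre-allocate a list of length n, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- seq_len(i)
}

# loop-free equivalent
Result2 <- lapply(seq_len(n), seq_len)

identical(Result, Result2)
```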


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
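sprintf() is another vectorized option when the message needs a fixed format. The following is a sketch of the earlier for-loop rewritten without a loop; the names nums and msgs are illustrative.

```r
nums <- seq(-5, 5)

# one formatted string per element, no loop required
msgs <- sprintf("%d squared is %d", nums, as.integer(nums^2))

msgs[1]
cat(msgs, sep = "\n")   # print one message per line
```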


Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
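In keeping with the earlier caution about overusing loops, the same three steps can be sketched without any explicit loop by using sapply(). The object names mirror the prototype above.

```r
classes <- sapply(iris, class)              # step 1: class of each column
iris.num <- iris[, classes == "numeric"]    # step 2: keep numeric columns
iris.means <- sapply(iris.num, mean)        # step 3: mean of each column

iris.means
all.equal(iris.means, colMeans(iris.num))   # colMeans() gives the same answer
```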

Page 29: Introduction to R Programming

Writing functions


Functions

  • A function is a collection of commands that takes input(s) and returns output.
  • If you have a specific analysis or transformation you want to do on different data, use a function.
  • Functions are defined using the function() function.
  • Functions can be defined with any number of named arguments.
  • Arguments can be of any type (e.g., vectors, data.frames, lists, ...).
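As a small illustration of named arguments with default values, here is a hypothetical rescale() function (the name and its arguments are made up for this sketch) that maps a numeric vector onto a chosen range.

```r
# rescale x linearly so that its minimum maps to `lower` and maximum to `upper`
rescale <- function(x, lower = 0, upper = 1) {
  (x - min(x)) / (max(x) - min(x)) * (upper - lower) + lower
}

rescale(c(2, 4, 6))               # defaults: range 0 to 1
rescale(c(2, 4, 6), upper = 10)   # override one argument by name
```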


Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment.

Example:

> f <- function() { # define function f
+   print("setting x to 1") # print a text string
+   x <- 1} # set x to 1
>
> y <- f() # assign y the value returned by f
[1] "setting x to 1"
>
> y # print y
[1] 1
> x # x in the global is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
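The local-environment rule can be checked in a self-contained way. In this sketch the global x is set to 100 (an arbitrary value chosen for illustration) before the function is called.

```r
x <- 100          # x in the global environment

f <- function() {
  x <- 1          # creates a separate, local x
  x               # the last expression is the return value
}

y <- f()
y                 # value returned by f
x                 # the global x is unchanged
```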


Writing functions example

Goal: write a function that returns the square of its argument

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact
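The debugging tools are interactive, so they are hard to show in a script. The sketch below defines a hypothetical half_log() function (the name is an assumption for illustration), captures its error non-interactively with tryCatch(), and lists in comments the calls one would type at the console.

```r
# hypothetical function used only to produce an error
half_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  log(x) / 2
}

# non-interactive: capture the error message instead of stopping
result <- tryCatch(half_log("a"), error = function(e) conditionMessage(e))
result

# interactive session, typed at the console:
#   half_log("a")      # triggers the error
#   traceback()        # shows the sequence of calls that led to it
#   debug(half_log)    # step through the function on its next call
#   half_log(10)
#   undebug(half_log)  # turn stepping off again
```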


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)


Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
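Naming the list elements makes the result easier to work with than positional access. This is a sketch of one possible variant; the function name statsum2 and the element names means and counts are choices made here, and lapply() is used for the counts so that the result is always a list.

```r
statsum2 <- function(df) {
  classes <- sapply(df, class)
  list(means  = sapply(df[classes == "numeric"], mean),
       counts = lapply(df[classes == "factor"], table))
}

res <- statsum2(iris)
res$means            # access by name instead of res[[1]]
res$counts$Species   # counts for the Species factor
```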


Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if() and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.
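The general shape of a chained if / else if / else statement is sketched below with a hypothetical describe_size() function (the name and the cut-offs are arbitrary choices for illustration). Note that else must follow the closing brace of the preceding block on the same line.

```r
describe_size <- function(n) {
  if (n > 100) {
    "large"
  } else if (n > 10) {
    "medium"
  } else {
    "small"
  }
}

describe_size(5)
describe_size(50)
describe_size(500)
```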


Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!


Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
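Printing a message lets the caller carry on as if nothing went wrong. A stricter alternative, sketched here with a hypothetical signOf() function (the name and the returned strings are assumptions for illustration), is to signal an error with stop() and return the result as a value rather than printing it.

```r
signOf <- function(x) {
  # validate the argument first; stop() aborts with an error
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

signOf(10)
signOf(0)
signOf(-1)

# try() lets a script continue after the error for demonstration purposes
bad <- try(signOf("a"), silent = TRUE)
class(bad)
```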


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions¹ introduced in this section:
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

¹ Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
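The check class(df) != "data.frame" works for plain data frames, but class() can return more than one value for objects that extend data.frame, which makes the comparison unreliable inside if(). A possible variant, offered as a sketch rather than as the workshop's solution, uses is.data.frame() instead.

```r
statsum <- function(df) {
  # is.data.frame() is TRUE for data.frames and for classes that extend them
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  list(means, counts)
}

# capture the error message so the script keeps running
err <- tryCatch(statsum(1:10), error = function(e) conditionMessage(e))
err

statsum(iris)
```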

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function

gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))

[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066

Introduction To Programming In RLast updated November 20 2013 50

71

The S3 object class system

S3 classes summary

Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class

Functions introduced in this sectionplot creates a graphical display the type of which depends on the

class of the object being plottedmethods lists the methods defined for a function or class

UseMethod the body of a generic functioninvisible returns an object but does not print it

Introduction To Programming In RLast updated November 20 2013 51

71

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations ofthe numeric variables

2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class

Introduction To Programming In RLast updated November 20 2013 52

71

The S3 object class system

Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of

the numeric variablesstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))

statsum(iris)

2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)

str(statsum(iris))

3 [3] Write a print method for the statsum classprintstatsum lt- function(x)

cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 53

71

Things that may surprise you

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 54

71

Things that may surprise you

Gotcharsquos

There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf

Introduction To Programming In RLast updated November 20 2013 55

71

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact

gt 1 == 33[1] FALSE

Solution use allequal()

gt allequal(1 33)[1] TRUE

Introduction To Programming In RLast updated November 20 2013 56

71

Things that may surprise you

Missing values

R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown

gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA

NA is not equal to anything not even NA

gt NA == NA[1] NA

Solutions use narm = TRUE option when calculating and isna to test formissing

Introduction To Programming In RLast updated November 20 2013 57

71

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes

gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b

gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE

Maybe this is what you expect I would like to at least get a warning

Introduction To Programming In RLast updated November 20 2013 58

71

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly donrsquot always

gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15

Why are these different

gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL

Ouch That is not nice at all

Introduction To Programming In RLast updated November 20 2013 59

71

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers and sometimes as characters whichcan be confusing

gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1

gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6

Introduction To Programming In RLast updated November 20 2013 60

71

Additional resources

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 61

71

Additional resources

Additional reading and resources

S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)

httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)

httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)

httpwwwjstatso|orgv31i01paper

Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc

Introduction To Programming In RLast updated November 20 2013 62

71

Additional resources

Feedback

Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback

Introduction To Programming In RLast updated November 20 2013 63

71

Loops (supplimental)

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 64

71

Loops (supplimental)

Looping

A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop

Introduction To Programming In RLast updated November 20 2013 65

71

Looping: for-loop examples

For each value in a vector, print the number and its square:

> ## For-loop example
> for (num in seq(-5, 5)) {               # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")   # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15)        # allows reproducible sample() results
> dice <- seq(1, 6)   # set dice = [1 2 3 4 5 6]
> roll <- 0           # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice, 1) + sample(dice, 1)   # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")              # print the result
+ }                   # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list()       # create an object to store the results
> for (i in 1:5) {       # for each i in [1, 5]
+   Result[[i]] <- 1:i   # assign the sequence 1 to i to Result
+ }
> Result                 # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
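Growing a list one element at a time works, but when the number of results is known in advance it is usually tidier to create the container at its final size first. The lines below are a minimal sketch of that idea; the names `n` and `result` are illustrative and do not come from the workshop materials.

```r
# Preallocate a list of known length, then fill it by index.
n <- 5
result <- vector("list", n)     # a list holding n NULL elements
for (i in seq_len(n)) {         # seq_len(n) is 1, 2, ..., n
  result[[i]] <- seq_len(i)     # store the sequence 1..i in slot i
}
lengths(result)                 # number of values stored in each slot
```

`seq_len(n)` is used instead of `1:n` because it yields an empty sequence when `n` is 0, so the loop body is simply skipped.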

Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                       # create vector x
> for (i in 1:5) x[i] <- i + i   # double a vector using a loop
> print(x)                       # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                      # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                        # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) {               # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n")   # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes == "numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for (var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
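In keeping with the earlier caution about overusing loops, the same three steps can also be written without an explicit loop. This is only a sketch of one alternative, not the workshop's own solution; `num_cols` and `iris_means` are names made up for the illustration.

```r
# Loop-free version of Exercise 5, using the built-in iris data.
num_cols <- vapply(iris, is.numeric, logical(1))         # TRUE for numeric columns
iris_means <- vapply(iris[num_cols], mean, numeric(1))   # mean of each numeric column
round(iris_means, 3)
```

`vapply()` behaves like `sapply()` but requires the expected type and length of each result to be stated, which turns some silent surprises into errors.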


Writing functions

Functions

  • A function is a collection of commands that takes input(s) and returns output.
  • If you have a specific analysis or transformation you want to do on different data, use a function.
  • Functions are defined using the function() function.
  • Functions can be defined with any number of named arguments.
  • Arguments can be of any type (e.g., vectors, data.frames, lists ...).

Function return value

The return value of a function can be:
  • The value of the last expression evaluated in the body of the function
  • Objects explicitly returned with the return() function

Other function output can come from:
  • Calls to print(), message() or cat() in the function body
  • Error messages

Assignment inside the body of a function takes place in a local environment. Example:

> f <- function() {            # define function f
+   print("setting x to 1")    # print a text string
+   x <- 1                     # set x to 1
+ }
>
> y <- f()                     # assign y the value returned by f
[1] "setting x to 1"
>
> y                            # print y
[1] 1
> x                            # x in the global environment is not 1!
  a   b   c   d   e   f   g   h   i   j
101 102 103   1 105 106 107 108 109 110
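The local-environment rule can be checked in a fresh session with a few self-contained lines. This is a minimal sketch; the values 10 and 1 are arbitrary and the names are illustrative.

```r
# Assignment inside a function does not touch a variable of the same name outside it.
x <- 10                 # x in the calling (global) environment
f <- function() {
  x <- 1                # creates a separate, local x
  x + 1                 # value of the last expression is the return value
}
y <- f()
c(outer_x = x, returned = y)
```

After the call, the outer `x` is still 10 and `y` holds the value of the last expression in the body.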

Writing functions example

Goal: write a function that returns the square of its argument

> square <- function(x) {   # define function named "square" with argument x
+   return(x*x)             # multiply the x argument by itself
+ }                         # end the function definition
>
> ## check to see that the function works
> square(x = 2)             # square the value 2
[1] 4
> square(10)                # square the value 10
[1] 100
> square(1:5)               # square integers 1 through 5
[1]  1  4  9 16 25

Debugging basics

  • Step through functions with debug() and set breakpoints with browser()
  • Use traceback() to see what went wrong after the fact
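The debugging tools are interactive, so they are hard to show on a slide. The sketch below, which uses two hypothetical functions `inner()` and `outer()`, only demonstrates the parts that can run unattended: setting and clearing the debugging flag, and capturing an error message. In an interactive session you would instead call the flagged function to step through it, or call traceback() right after an error.

```r
# Hypothetical functions used only for this illustration.
inner <- function(v) {
  if (!is.numeric(v)) stop("v must be numeric")
  sqrt(v)
}
outer <- function(v) inner(v)

debug(inner)                   # flag inner(): the next call would open the stepper
flagged <- isdebugged(inner)   # TRUE while the flag is set
undebug(inner)                 # clear the flag again

# Capture the error message instead of stopping the script.
msg <- tryCatch(outer("a"), error = function(e) conditionMessage(e))
msg
```

Calling `browser()` inside a function body has a similar effect to `debug()`, but pauses at the exact line where it is placed.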

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function    defines a new function
  return      used inside a function definition to set the return value
  browser     sets a break point
  debug       turns on the debugging flag of a function so you can step through it
  undebug     turns off the debugging flag
  traceback   shows the error stack (call after an error to see what went wrong)

Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if () and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) {     # define function "isPositive"
+   if (x > 0) {                  # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>

Test the new function

> isPositive(0)     # test the isPositive() function
0 is zero
> isPositive("a")   # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) {     # define function "isPositive"
+   if (!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) {           # if x is greater than zero, then
+     cat(x, "is positive \n")    # say so!
+   } else if (x == 0) {          # otherwise if x is zero
+     cat(x, "is zero \n")        # say so!
+   } else {                      # otherwise
+     cat(x, "is negative \n")    # say x is negative
+   }
+ }                               # end function definition
>
> isPositive("a")   # test the isPositive() function on character
x must be a numeric vector of length one!
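Printing a message lets the caller carry on as if nothing had happened. A common alternative is to signal an error with stop() and to return the result rather than print it. The function below is a sketch of that variant under a made-up name, `signOf()`; it is not part of the workshop code. It also uses `||`, which evaluates the second test only when the first one does not already decide the outcome.

```r
# Hypothetical variant of isPositive() that returns a label and fails loudly.
signOf <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signOf(10)
signOf(0)
signOf(-1)
```

Calling `signOf("a")` stops with the error message instead of printing a misleading answer.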

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  cat    concatenates and prints R objects
  if     execute code only if condition is met
  else   used with if; code to execute if condition is not met

[1] Technically, if and else are not functions, but this need not concern us at the moment.

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

2. Insert a break point with browser() and step through your function.

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
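One caveat about the check used in the prototype: class() can return more than one string (the S3 section shows objects with several classes), and comparing that vector with `!=` then produces a condition of length greater than one, which recent versions of R reject inside if (). A sketch of a sturdier check, using the base function is.data.frame(); the helper name `checkDf()` and the class name "myframe" are made up for the illustration.

```r
# Hypothetical helper: fail unless the argument is (or inherits from) a data.frame.
checkDf <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

d <- iris
class(d) <- c("myframe", "data.frame")   # a data.frame carrying an extra class
checkDf(iris)                            # passes silently
checkDf(d)                               # still passes: d inherits from data.frame
```

`inherits(df, "data.frame")` performs the same test and works for any class name.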

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> ## see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> ## which functions have methods for data.frames?
> methods(class = "data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> ## create a mean() method for objects of class "foo"
> mean.foo <- function(x) {                  # mean method for "foo" class
+   if (is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x)))     # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")             # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> ## create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> ## create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits = 2))
+ }
>
> ## test it out
> disp(matrix(runif(10), ncol = 2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
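With only a matrix method defined, calling disp() on anything else stops with a "no applicable method" error. A default method gives the generic a fallback. The sketch below repeats the generic so that it runs on its own; the `disp.default()` method is an addition for illustration and is not part of the workshop code.

```r
# Generic plus two methods; disp.default is the fallback for all other classes.
disp <- function(x, ...) {
  UseMethod("disp")
}
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))   # matrices are printed rounded to 2 digits
}
disp.default <- function(x, ...) {
  print(x)                      # everything else is printed unchanged
}

m <- matrix(c(1.234, 5.678), ncol = 1)
out_matrix <- disp(m)           # dispatches to disp.matrix
out_other <- disp("hello")      # dispatches to disp.default
```

Giving every method the same `...` argument as the generic keeps the signatures consistent, which is the usual convention for S3 methods.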

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class

Functions introduced in this section:
  plot        creates a graphical display, the type of which depends on the class of the object being plotted
  methods     lists the methods defined for a function or class
  UseMethod   the body of a generic function
  invisible   returns an object but does not print it

Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class "statsum".
3. Write a print method for the statsum class.

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if (class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)

Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
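One detail worth knowing before using all.equal() inside an if () statement: when the values differ, it returns a character description of the difference rather than FALSE. Wrapping it in isTRUE() gives a plain TRUE or FALSE. A minimal sketch, with illustrative variable names:

```r
# Comparing floating point numbers with a tolerance.
a <- 0.1
b <- 0.3 / 3
exact <- a == b                      # exact comparison: FALSE
close <- isTRUE(all.equal(a, b))     # comparison within tolerance: TRUE
report <- all.equal(1, 1.1)          # a character string, not FALSE

if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}
```

The default tolerance of all.equal() is about 1.5e-8 and can be changed through its `tolerance` argument.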

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
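The two suggested solutions look like this in practice. The sketch reuses the vector from the slide; the result names are illustrative.

```r
# Working around missing values with na.rm and is.na().
x <- c(1:10, NA, 12:20)
m <- mean(x, na.rm = TRUE)      # mean of the 19 non-missing values
n_missing <- sum(is.na(x))      # how many values are missing
where <- which(is.na(x))        # position(s) of the missing values
x_complete <- x[!is.na(x)]      # the vector with missing values dropped
c(mean = m, n_missing = n_missing, position = where)
```

Note that `x == NA` cannot be used for this test, because any comparison with NA is itself NA.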

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> ## combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> ## comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
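The difference comes from the argument lists shown above: mean() averages only its first argument, x, while sum() adds up everything passed through `...`. Wrapping the values in c() makes both functions see the same data. A short sketch, with illustrative result names:

```r
# mean() uses only its first argument; sum() uses all of them.
vals <- c(1, 2, 3, 4, 5)
m_vector <- mean(vals)             # 3: the mean of all five values
m_separate <- mean(1, 2, 3, 4, 5)  # 1: only the first argument is averaged
s_vector <- sum(vals)              # 15
s_separate <- sum(1, 2, 3, 4, 5)   # 15: sum() accepts separate values
c(m_vector, m_separate, s_vector, s_separate)
```

Passing a single vector is the safer habit for both functions.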

Trouble with factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
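An equivalent way to recover the original numbers is to convert the levels once and then index them with the factor itself. This is a sketch of that idiom using the factor from the slide; `via_levels` and `via_character` are illustrative names.

```r
# Convert the (few) levels to numbers, then index by the factor's integer codes.
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))
via_levels <- as.numeric(levels(x))[x]
via_character <- as.numeric(as.character(x))
identical(via_levels, via_character)
```

Both approaches give the same values; the levels-based form converts each distinct level only once, which can matter for long factors.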



Writing functions

Function return value

The return value of a function can beThe last object stored in the body of the functionObjects explicitly returned with the return() function

Other function output can come fromCalls to print() message() or cat() in function bodyError messages

Assignment inside the body of a function takes place in a localenvironmentExample

gt f lt- function() define function f+ print(setting x to 1) print a text string+ x lt- 1 set x to 1gtgt y lt- f() assign y the value returned by f[1] setting x to 1gtgt y print y[1] 1gt x x in the global is not 1

a b c d e f g h i j101 102 103 1 105 106 107 108 109 110

Introduction To Programming In RLast updated November 20 2013 31

71

Writing functions

Writing functions example

Goal write a function that returns the square of itrsquos argument

gt square lt- function (x) define function named square with argument x+ return(xx) multiple the x argument by itself+ end the function definitiongtgt check to see that the function worksgt square(x = 2) square the value 2[1] 4gt square(10) square the value 10[1] 100gt square(15) square integers 1 through 5[1] 1 4 9 16 25

Introduction To Programming In RLast updated November 20 2013 32

71

Writing functions

Debugging basics

Stepping througth functions and setting breakpointsUse traceback() to see what went wrong after the fact

Introduction To Programming In RLast updated November 20 2013 33

71

Writing functions

Writing functions summary

Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging

Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value

browser sets a break pointdebug turns on the debugging flag of a function so you can step

through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went

wrong)

Introduction To Programming In RLast updated November 20 2013 34

71

Writing functions

Exercise 2

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

Introduction To Programming In RLast updated November 20 2013 35

71

Writing functions

Exercise 2 prototype

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)

statsum(iris)

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 36

71

Control flow

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 37

71

Control flow

Control flow

Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input

Introduction To Programming In RLast updated November 20 2013 38

71

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  • cat: concatenates and prints R objects
  • if: executes code only if a condition is met
  • else: used with if; the code to execute if the condition is not met

[1] Technically, if and else are not functions, but this need not concern us at the moment.


Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

2. Insert a break point with browser() and step through your function.


Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame.

statsum <- function(df) {
  # is.data.frame() also works for objects that carry more than one class
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10) # stops with an error
statsum(iris)

2. Insert a break point with browser() and step through your function.

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  browser() # execution pauses here so you can step through the rest
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
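To see an argument check in action without stopping a script, the error can be caught with tryCatch(). This is only a sketch: checkDf() is a hypothetical helper that contains just the check, not the full statsum() function.

```r
# checkDf() is a hypothetical helper containing only the argument check
checkDf <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# tryCatch() runs the expression and hands any error to the handler,
# which here simply returns the error message as a string
result <- tryCatch(
  checkDf(1:10),
  error = function(e) conditionMessage(e)
)
result          # "df must be a data.frame!"

checkDf(iris)   # passes silently
```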


The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes

We will cover only the S3 system, not the S4 system.

Basic idea: functions have different methods for different types of objects.


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
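As a small sketch of how multiple classes can be queried, inherits() tests whether an object carries a given class, and unclass() removes the class attribute again. The class names "A" and "B" are the arbitrary ones used above.

```r
x <- 1:10
class(x) <- c("A", "B")     # x now has two classes, in this order

inherits(x, "A")            # TRUE
inherits(x, "B")            # TRUE
inherits(x, "integer")      # FALSE: the implicit class was replaced

# unclass() strips the class attribute and returns the underlying data
y <- unclass(x)
class(y)                    # "integer"
```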


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n") # otherwise say x not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
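The same naming scheme works for any existing generic. As a sketch, the block below defines a print() method for a made-up class called "money"; the class name and the formatting are illustrative assumptions, not part of the workshop.

```r
# "money" is a hypothetical class used only for illustration
print.money <- function(x, ...) {
  # format the underlying numbers with two decimals and a dollar sign
  cat(paste0("$", formatC(unclass(x), format = "f", digits = 2)), "\n")
  invisible(x)              # print methods conventionally return x invisibly
}

price <- c(3.5, 12)
class(price) <- "money"
price                       # auto-printing dispatches to print.money()
```

Because print() is generic, typing the object name at the console is enough to trigger the new method.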


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
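A generic usually also gets a default method that runs when no class-specific method exists. The following sketch uses a hypothetical generic area() and two made-up classes, "circle" and "square", to show how dispatch chooses between methods.

```r
# area() is a hypothetical generic; "circle" and "square" are made-up classes
area <- function(shape, ...) {
  UseMethod("area")
}

area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

# the default method runs when no class-specific method exists
area.default <- function(shape, ...) {
  stop("don't know how to compute the area of this object")
}

circ <- structure(list(r = 1), class = "circle")
sq   <- structure(list(side = 2), class = "square")

area(circ)   # pi
area(sq)     # 4
```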


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have a class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.

2. Modify your function so that it returns a list of class "statsum".

3. Write a print method for the statsum class.


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R)) # put the new class first
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming.
  • Some of these "gotchas" are common problems in other languages; many are unique to R.
  • We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
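One detail worth a short sketch: when two values differ, all.equal() returns a character description rather than FALSE, so it is usually wrapped in isTRUE() before being used as a condition. The variable names a and b are illustrative.

```r
a <- 0.1
b <- 0.3 / 3

a == b                      # FALSE: the two values differ in the last bits
isTRUE(all.equal(a, b))     # TRUE: equal within a small tolerance

# all.equal() returns a character description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it in if()
if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to floating point error\n")
}

abs(a - b) < 1e-8           # TRUE: manual check with an explicit tolerance
```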


Missing values

R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
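Both solutions can be sketched on the same vector used above:

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA
mean(x, na.rm = TRUE)    # mean of the 19 non-missing values
sum(is.na(x))            # 1: number of missing values
which(is.na(x))          # 11: position of the missing value

# x == NA is always NA, so it cannot be used to find missing values
any(x == NA, na.rm = TRUE)   # FALSE
```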


Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
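One way to guard against silent coercion, sketched here with made-up values, is to test types with the is.*() functions and to convert explicitly with the as.*() functions, which at least warn when a value cannot be converted.

```r
# is.*() functions test the type; as.*() functions convert explicitly
x <- c("1", "2", "a")
is.numeric(x)                              # FALSE
nums <- suppressWarnings(as.numeric(x))    # "a" cannot be converted
nums                                       # 1 2 NA
is.na(nums)                                # FALSE FALSE TRUE

# comparing a number with a string compares them as strings
10 > "9"                                   # FALSE, because "10" sorts before "9"
```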


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
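The difference comes from the argument lists: mean() treats only its first argument as data, while sum() adds up everything passed through the dots. A short sketch of the safe pattern is to always pass the numbers as a single vector:

```r
# mean() averages its first argument, so pass the numbers as one vector
mean(c(1, 2, 3, 4, 5))   # 3
sum(c(1, 2, 3, 4, 5))    # 15
sum(1, 2, 3, 4, 5)       # 15: sum() adds up everything passed through ...

# in mean(1, 2, 3, 4, 5) only the 1 is treated as data
mean(1, 2, 3, 4, 5)      # 1
```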


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
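An equivalent conversion, sketched below, converts the levels once and then indexes them by the factor's integer codes. It gives the same result as as.numeric(as.character(x)).

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                    # 2 2 1 1: the internal integer codes
as.numeric(as.character(x))      # 5 5 6 6: the original values

# equivalent: convert the levels once, then index by the codes
as.numeric(levels(x))[x]         # 5 5 6 6
```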


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

  • Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form.
  • These workshops exist for you. Tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop keeps running the code as long as its condition is true, that is, until the condition is no longer met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
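When a loop needs the position of each element as well as its value, seq_along() is a common choice for the index. The sketch below uses a made-up vector, fruits, and also shows why 1:length(x) is risky for empty vectors.

```r
fruits <- c("apple", "banana", "cherry")   # illustrative data

# seq_along() gives 1, 2, ..., length(fruits) and is safe for empty vectors
for (i in seq_along(fruits)) {
  cat(i, ":", fruits[i], "has", nchar(fruits[i]), "letters\n")
}

# 1:length(x) misbehaves when x is empty: it counts down from 1 to 0
empty <- character(0)
length(1:length(empty))       # 2
length(seq_along(empty))      # 0
```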


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
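A while loop runs forever if its condition never becomes false, so it is often worth adding a second stopping rule. The following sketch repeats the dice simulation with a counter and an upper limit; the limit of 1000 and the variable names rolls and total are arbitrary choices for illustration.

```r
set.seed(15)                 # makes the simulated rolls reproducible
rolls <- 0                   # number of attempts so far
total <- 0                   # sum of the most recent pair of dice

# keep rolling until we get double sixes, but never more than 1000 times
while (total < 12 && rolls < 1000) {
  total <- sum(sample(1:6, 2, replace = TRUE))
  rolls <- rolls + 1
}
cat("Stopped after", rolls, "rolls with a total of", total, "\n")
```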


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
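Two common variations on this pattern are sketched below: creating the list at its final length before the loop starts, and letting lapply() build the list without an explicit loop. The name Result2 is just a label for comparison.

```r
n <- 5

# pre-allocating the list avoids growing it on every iteration
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# lapply() builds the same list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

identical(Result, Result2)   # TRUE
```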


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
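A few more vectorized operations, sketched on a small made-up vector, show how much can be done without writing a loop:

```r
x <- 1:5

x * 2                 # 2 4 6 8 10: arithmetic applies to every element
x[x %% 2 == 0]        # 2 4: logical conditions are vectorized too
cumsum(x)             # 1 3 6 10 15: running total without a loop

# sprintf() is vectorized as well, so it can build all labels in one call
labels <- sprintf("%d squared = %d", x, x * x)
labels
```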


Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.


Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
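For comparison, the same three steps can be sketched without explicit loops, using sapply() from the earlier section on applying functions to list elements:

```r
# sapply() applies a function to each column and simplifies the result
classes <- sapply(iris, class)
iris.num <- iris[, classes == "numeric"]

iris.means <- sapply(iris.num, mean)
iris.means

# colMeans() is a further shortcut for all-numeric data frames
colMeans(iris.num)
```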

Page 32: Introduction to R Programming

Writing functions

Writing functions example

Goal: write a function that returns the square of its argument.

> square <- function (x) { # define function named "square" with argument x
+   return(x*x) # multiply the x argument by itself
+ } # end the function definition
>
> # check to see that the function works
> square(x = 2) # square the value 2
[1] 4
> square(10) # square the value 10
[1] 100
> square(1:5) # square integers 1 through 5
[1]  1  4  9 16 25
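Functions can also have arguments with default values. As a sketch, power() below is a hypothetical generalization of square() whose second argument defaults to 2; it also relies on the fact that a function returns the value of its last expression when return() is not called.

```r
# power() is a hypothetical generalization of square()
power <- function(x, exponent = 2) {
  x^exponent        # the value of the last expression is returned
}

power(3)                  # 9: uses the default exponent
power(3, exponent = 3)    # 27
power(1:4)                # 1 4 9 16
```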


Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact


Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file, etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  • function: defines a new function
  • return: used inside a function definition to set the return value
  • browser: sets a break point
  • debug: turns on the debugging flag of a function so you can step through it
  • undebug: turns off the debugging flag
  • traceback: shows the error stack (call after an error to see what went wrong)


Exercise 2

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.


Exercise 2 prototype

1. Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2. Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)


Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if() and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.
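A minimal sketch of the basic idea follows. if() and else choose between branches based on a single TRUE or FALSE value, while ifelse() is the vectorized counterpart that gives one answer per element. The variable names and values are illustrative only.

```r
temperature <- 25            # illustrative value, not from the workshop data

# if() expects a single TRUE/FALSE condition
if (temperature > 30) {
  msg <- "hot"
} else if (temperature > 15) {
  msg <- "mild"
} else {
  msg <- "cold"
}
msg                          # "mild"

# ifelse() is the vectorized counterpart: one answer per element
temps <- c(10, 20, 35)
ifelse(temps > 30, "hot", "not hot")   # "not hot" "not hot" "hot"
```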


Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function

gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))

[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066

Introduction To Programming In RLast updated November 20 2013 50

71

The S3 object class system

S3 classes summary

Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class

Functions introduced in this sectionplot creates a graphical display the type of which depends on the

class of the object being plottedmethods lists the methods defined for a function or class

UseMethod the body of a generic functioninvisible returns an object but does not print it

Introduction To Programming In RLast updated November 20 2013 51

71

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations ofthe numeric variables

2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class

Introduction To Programming In RLast updated November 20 2013 52

71

The S3 object class system

Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of

the numeric variablesstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))

statsum(iris)

2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)

str(statsum(iris))

3 [3] Write a print method for the statsum classprintstatsum lt- function(x)

cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 53

71

Things that may surprise you

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 54

71

Things that may surprise you

Gotcharsquos

There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf

Introduction To Programming In RLast updated November 20 2013 55

71

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact

gt 1 == 33[1] FALSE

Solution use allequal()

gt allequal(1 33)[1] TRUE

Introduction To Programming In RLast updated November 20 2013 56

71

Things that may surprise you

Missing values

R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown

gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA

NA is not equal to anything not even NA

gt NA == NA[1] NA

Solutions use narm = TRUE option when calculating and isna to test formissing

Introduction To Programming In RLast updated November 20 2013 57

71

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes

gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b

gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE

Maybe this is what you expect I would like to at least get a warning

Introduction To Programming In RLast updated November 20 2013 58

71

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly donrsquot always

gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15

Why are these different

gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL

Ouch That is not nice at all

Introduction To Programming In RLast updated November 20 2013 59

71

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers and sometimes as characters whichcan be confusing

gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1

gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6

Introduction To Programming In RLast updated November 20 2013 60

71

Additional resources

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 61

71

Additional resources

Additional reading and resources

S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)

httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)

httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)

httpwwwjstatso|orgv31i01paper

Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc

Feedback

Help Us Make This Workshop Better!

  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you, so tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
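When the number of iterations is known in advance, the result object can be created at its final size before the loop starts. The sketch below is a variation on the slide's example, not the workshop's own code; the variable `n` is an assumption introduced for illustration.

```r
n <- 5

# Pre-allocate a list of length n instead of growing it from list().
Result <- vector("list", n)

for (i in seq_len(n)) {   # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i      # store the sequence 1 to i in slot i
}

lengths(Result)           # number of elements stored in each slot
```

Pre-allocating avoids repeatedly resizing the list, which tends to matter only when the loop runs many times.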

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1] 2 4 6 8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1] 2 4 6 8 10
> 1:5 + 5 # shorter vectors are recycled
[1] 6 7 8 9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
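As a further sketch of the same idea, the earlier for-loop that printed each number and its square can be written without a loop by building all the strings first and printing them in one call. The names `nums` and `msgs` are illustrative.

```r
nums <- seq(-5, 5)

# Vectorized: one call builds all eleven strings at once.
msgs <- paste(nums, "squared is", nums^2)

# cat() can then print them one per line without an explicit loop.
cat(msgs, sep = "\n")
```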

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
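For comparison, the same three steps can be sketched without explicit loops using sapply(), in line with the earlier caution about overusing loops. This is an alternative to the exercise prototype, not a replacement for it, since the exercise asks for loops.

```r
# Class of each column, computed without an explicit loop.
classes <- sapply(iris, class)

# Keep only the numeric columns.
iris.num <- iris[, classes == "numeric"]

# Mean of each numeric column; colMeans() would work here as well.
iris.means <- sapply(iris.num, mean)
iris.means
```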

Page 33: Introduction to R Programming

Writing functions

Debugging basics

  • Stepping through functions and setting breakpoints
  • Use traceback() to see what went wrong after the fact

Writing functions summary

Key points:
  • writing new functions is easy!
  • most functions will have a return value, but functions can also print things, write things to file etc.
  • functions can be stepped through to facilitate debugging

Functions introduced in this section:
  function   defines a new function
  return     used inside a function definition to set the return value
  browser    sets a break point
  debug      turns on the debugging flag of a function so you can step through it
  undebug    turns off the debugging flag
  traceback  shows the error stack (call after an error to see what went wrong)
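The debugging flag mentioned above can be inspected without actually entering the interactive debugger. The sketch below uses a hypothetical function named `half`; isdebugged() is a base R function that reports whether the flag is set.

```r
# A hypothetical function to debug; the name `half` is illustrative only.
half <- function(x) {
  x / 2
}

debug(half)        # turn on the debugging flag
isdebugged(half)   # TRUE: the next call to half() would open the browser

undebug(half)      # turn the flag off again
isdebugged(half)   # FALSE: half() runs normally

half(10)
```

In an interactive session, calling half() while the flag is on lets you step through the body line by line.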

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
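One possible refinement of the prototype, offered as a sketch rather than as the workshop's solution, is to name the elements of the returned list so callers do not have to remember their positions. The element names `means` and `counts` and the variable `res` are assumptions made for illustration.

```r
statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  # lapply() always returns a list, one table per factor column
  counts <- lapply(df[classes == "factor"], table)
  # naming the elements lets callers write res$means instead of res[[1]]
  return(list(means = means, counts = counts))
}

res <- statsum(iris)
res$means
res$counts$Species
```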

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if() and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not numeric
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
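A variation on this check, sketched here under the assumption that an invalid argument should halt execution rather than print a message, uses stop() to signal an error and the short-circuit operator || inside if(). The length test is also tightened to `!= 1` so that zero-length input is rejected; that change is an assumption beyond what the slide shows.

```r
isPositive <- function(x) {
  # || evaluates left to right and stops as soon as the result is known
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    cat(x, "is positive\n")
  } else if (x == 0) {
    cat(x, "is zero\n")
  } else {
    cat(x, "is negative\n")
  }
}

isPositive(3)
# try() lets the script continue after the error is signalled
result <- try(isPositive("a"), silent = TRUE)
class(result)
```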

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  cat   Concatenates and prints R objects
  if    execute code only if condition is met
  else  used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame

2 Insert a break point with browser() and step through your function

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
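Because an object can carry more than one class, comparing class(df) to a single string can yield several results, which if() does not handle well (recent versions of R signal an error for a condition of length greater than one). The sketch below shows two base R alternatives, is.data.frame() and inherits(). The class name "extra" and the helper `check_df` are hypothetical, introduced only for illustration.

```r
# An object can carry several classes, so class(df) may have length > 1.
# The class "extra" below is hypothetical, for illustration only.
df <- iris
class(df) <- c("extra", "data.frame")

class(df) != "data.frame"        # two results, one per class
is.data.frame(df)                # a single TRUE
inherits(df, "data.frame")       # also a single TRUE

check_df <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}
check_df(df)
```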

The S3 object class system

  • R has two major object systems:
      - Relatively informal "S3" classes
      - Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
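When an object has several classes, membership is usually tested with inherits() rather than by comparing class(x) to a string. A minimal sketch using the same classes as above:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")              # TRUE
inherits(x, "B")              # TRUE
inherits(x, "C")              # FALSE

# unclass() strips the class attribute and returns the underlying data.
class(unclass(x))             # "integer"
```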

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
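A generic with only one method fails for every other class. A common pattern, sketched below as an extension of the slide's example rather than part of it, is to add a `default` method that UseMethod() falls back on. The message text in `disp.default` is an assumption.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices, as in the slide.
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback used when no class-specific method exists.
disp.default <- function(x, ...) {
  cat("No disp method for class", class(x)[1], "\n")
  invisible(x)
}

disp(matrix(c(0.123, 0.456, 0.789, 1), ncol = 2))
disp("some text")
```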

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
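Print methods conventionally accept `...` (to match the signature of the print generic) and return their argument invisibly so the object can still be assigned or piped onward. The sketch below illustrates that convention with a hypothetical constructor `make_statsum`; it is a simplified stand-in for the exercise solution, not the workshop's own code.

```r
# Hypothetical constructor for a small object of class "statsum".
make_statsum <- function(df) {
  num <- sapply(df, is.numeric)
  R <- list(means = sapply(df[num], mean), sds = sapply(df[num], sd))
  class(R) <- c("statsum", class(R))
  R
}

# Print methods conventionally accept `...` and return x invisibly.
print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(cbind(means = x$means, sds = x$sds), digits = 2)
  invisible(x)
}

s <- make_statsum(iris)
print(s)     # dispatches to print.statsum()
```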

Things that may surprise you

Gotcha's

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotcha's" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
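One detail worth knowing when using all.equal() inside if(): when the values differ, it returns a character description of the difference rather than FALSE. A minimal sketch of the usual guard, wrapping the call in isTRUE(); the variable names `a` and `b` are illustrative.

```r
a <- 0.1
b <- 0.3 / 3

a == b                      # FALSE: the two doubles differ in the last bits

# all.equal() compares within a small tolerance, but when the values
# differ it returns a character description rather than FALSE, so wrap it
# in isTRUE() before using it in if().
isTRUE(all.equal(a, b))     # TRUE
isTRUE(all.equal(a, 0.2))   # FALSE

if (isTRUE(all.equal(a, b))) cat("a and b are equal within tolerance\n")
```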

Missing values

R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
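Both solutions can be sketched on the same vector used above:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # mean of the 19 observed values

is.na(x)[11]            # TRUE: is.na() is the way to test for NA
sum(is.na(x))           # number of missing values
x[!is.na(x)]            # keep only the observed values
```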

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect. I would like to at least get a warning!
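The same coercion rule can produce results that look plainly wrong when a number is accidentally stored as text. The sketch below is an added illustration, not from the workshop materials; it shows the silent string comparison and two ways to guard against it.

```r
# In a comparison between a number and a string, the number is converted
# to a string and the two are then compared as text.
10 < 9        # FALSE: numeric comparison
10 < "9"      # TRUE: compared as the strings "10" and "9"

# Converting explicitly, or checking the type first, avoids the surprise.
as.numeric("9") > 1          # TRUE
is.character(c(1, "a"))      # TRUE: c() coerced the number to a string
```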

Page 34: Introduction to R Programming

Writing functions

Writing functions summary

Key pointswriting new functions is easymost functions will have a return value but functions can also printthings write things to file etcfunctions can be stepped through to facilitate debugging

Functions introduced in this sectionfunction defines a new functionreturn used inside a function definition to set the return value

browser sets a break pointdebug turns on the debugging flag of a function so you can step

through itundebug turns off the debugging flagtraceback shows the error stack (call after an error to see what went

wrong)

Introduction To Programming In RLast updated November 20 2013 34

71

Writing functions

Exercise 2

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

Introduction To Programming In RLast updated November 20 2013 35

71

Writing functions

Exercise 2 prototype

1 Write a function that takes a dataframe as an argument and returns themean of each numeric column in the data frame Test your functionusing the iris data

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)return(means)

statsum(iris)

2 Modify your function so that it returns a list the first element if which isthe means of the numeric variables the second of which is the counts ofthe levels of each categorical variable

statsum lt- function(df) classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 36

71

Control flow

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 37

71

Control flow

Control flow

Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input

Introduction To Programming In RLast updated November 20 2013 38

71

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(1:10)
    statsum(iris)

2 Insert a break point with browser() and step through your function

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame")
      browser()
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(iris)
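Comparing class(df) to a single string works for plain data frames, but class() can return several names, in which case the comparison produces more than one TRUE/FALSE value. A hedged sketch of a more robust check using is.data.frame(); the helper name checkDataFrame and the extra class name my_table are made up for this example.

```r
# Hypothetical helper: stop unless df is (or inherits from) a data.frame
checkDataFrame <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  invisible(TRUE)
}

# A data frame carrying an extra (made-up) class name
d <- iris
class(d) <- c("my_table", "data.frame")

class(d) != "data.frame"   # two values: TRUE FALSE
is.data.frame(d)           # a single TRUE
ok <- checkDataFrame(d)

# Non-data-frame input raises the error; tryCatch() captures its message
res <- tryCatch(checkDataFrame(1:10), error = function(e) conditionMessage(e))
```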

The S3 object class system


  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
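When an object has several classes, inherits() is a convenient way to ask whether a given class is among them, and unclass() removes the class attribute again. A small sketch using the same x as above:

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "A")         # TRUE
inherits(x, "B")         # TRUE
inherits(x, "integer")   # FALSE: the class attribute now says only A and B

# unclass() strips the class attribute and gives back the plain integer vector
y <- unclass(x)
class(y)                 # "integer"
```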

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"
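To see dispatch in action, it can help to call a generic on an object whose class has its own method. A brief sketch using dates (the two dates are arbitrary): mean() picks mean.Date, and getS3method() retrieves that method directly.

```r
d <- as.Date(c("2013-11-20", "2013-11-22"))
class(d)                       # "Date"

# mean() dispatches to mean.Date, so the result is still a Date
m <- mean(d)
format(m)                      # "2013-11-21"

# getS3method() retrieves a specific method without calling it
f <- getS3method("mean", "Date")
is.function(f)
```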

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
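The same naming scheme works for any existing generic, not only mean(). As a hedged illustration, here is a print() method for a made-up class "celsius"; the class name and the output format are assumptions for this sketch, not part of the workshop materials.

```r
# print method for the hypothetical class "celsius"
print.celsius <- function(x, ...) {
  cat(format(unclass(x), nsmall = 1), "degrees C\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

temps <- structure(c(21.5, 19), class = "celsius")
print(temps)     # dispatches to print.celsius

# capture.output() collects what was printed, which is handy for checking it
out <- capture.output(print(temps))
```

The method accepts ... because the print() generic does; this keeps it compatible with callers that pass extra arguments.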

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
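As written, disp() fails for anything that is not a matrix, because UseMethod() finds no applicable method. A minimal sketch of one way to handle that, assuming a fallback is wanted: add a disp.default method. The fallback's message format is invented for illustration.

```r
disp <- function(x, ...) {
  UseMethod("disp")
}

# method for matrices: print rounded values
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible(x)
}

# fallback used when no class-specific method exists (illustrative format)
disp.default <- function(x, ...) {
  cat("<", class(x)[1], " of length ", length(x), ">\n", sep = "")
  invisible(x)
}

m <- matrix(c(1.234, 5.678, 9.1011, 12.1314), ncol = 2)
shown <- disp(m)                           # uses disp.matrix
fallback <- capture.output(disp(letters))  # uses disp.default
```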

S3 classes summary

Key points
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section
  plot       creates a graphical display, the type of which depends on the class of the object being plotted
  methods    lists the methods defined for a function or class
  UseMethod  the body of a generic function
  invisible  returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
      counts <- sapply(df[classes == "factor"], table)
      return(list(cbind(means, sds), counts))
    }

    statsum(iris)

2 Modify your function so that it returns a list of class statsum

    statsum <- function(df) {
      if(class(df) != "data.frame") stop("df must be a data.frame")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
      counts <- sapply(df[classes == "factor"], table)
      R <- list(cbind(means, sds), counts)
      class(R) <- c("statsum", class(R))
      return(R)
    }

    str(statsum(iris))

3 Write a print method for the statsum class

    print.statsum <- function(x) {
      cat("Numeric variable descriptive statistics:\n")
      print(x[[1]], digits=2)
      cat("\nFactor variable counts:\n")
      print(x[[2]])
    }

    statsum(iris)
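One fragile point in the prototypes is that print.statsum() relies on element positions (x[[1]], x[[2]]). A hedged sketch of a variant that names the list elements; the element names numeric and counts, and the use of is.numeric() and is.factor() to pick columns, are choices made for this example only.

```r
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  is_num <- sapply(df, is.numeric)
  is_fac <- sapply(df, is.factor)
  out <- list(
    numeric = cbind(means = sapply(df[is_num], mean),
                    sds   = sapply(df[is_num], sd)),
    counts  = lapply(df[is_fac], table)
  )
  class(out) <- c("statsum", class(out))
  out
}

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x$numeric, digits = 2)
  cat("\nFactor variable counts:\n")
  print(x$counts)
  invisible(x)
}

res <- statsum(iris)
res   # auto-printing dispatches to print.statsum
```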

Things that may surprise you


Gotcha's

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotcha's" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
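One detail worth spelling out: when the values differ, all.equal() returns a character description rather than FALSE, so inside if() it is usually wrapped in isTRUE(). A short sketch:

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # FALSE: the two doubles differ in the last bits
isTRUE(all.equal(a, b))      # TRUE: equal within a small tolerance

# all.equal() returns a description, not FALSE, when values differ
res <- all.equal(1, 1.1)
is.character(res)            # TRUE, so wrap it in isTRUE() inside if()

if (isTRUE(all.equal(a, b))) {
  verdict <- "equal within tolerance"
} else {
  verdict <- "different"
}
```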

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
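The two solutions named above can be sketched on the same vector x:

```r
x <- c(1:10, NA, 12:20)

mean(x)                   # NA
mean(x, na.rm = TRUE)     # mean of the 19 observed values
is.na(x)                  # logical vector, TRUE at the missing position
which(is.na(x))           # 11
sum(is.na(x))             # number of missing values: 1

# NA == NA is NA, so test for missingness with is.na(), not ==
is.na(NA == NA)           # TRUE
```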

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
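A hedged sketch of how the coercion hierarchy plays out, and why a number compared with a string can give a surprising answer (the particular values are arbitrary):

```r
# Mixing types in c() coerces to the most general type
class(c(TRUE, 2L))       # "integer"
class(c(2L, 2.5))        # "numeric"
class(c(2.5, "a"))       # "character"

# In a comparison, the number is converted to a string first
10 < 9                   # FALSE: numeric comparison
10 < "9"                 # TRUE: compares the strings "10" and "9"

# Converting explicitly makes the intent clear; non-numbers become NA
vals <- suppressWarnings(as.numeric(c("9", "a")))
vals
```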

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
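The practical consequence is that mean() takes its data as a single vector in x, while sum() adds up everything passed through the dots. A short sketch of the safe pattern:

```r
mean(1, 2, 3, 4, 5)        # 1: only the first argument is treated as data
mean(c(1, 2, 3, 4, 5))     # 3: pass the values as one vector
sum(1, 2, 3, 4, 5)         # 15
sum(c(1, 2, 3, 4, 5))      # 15
```

Wrapping the values in c() works for both functions, so it is a reasonable habit when in doubt.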

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
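An equivalent idiom converts the levels once and then indexes them by the factor's integer codes. A small sketch with the same factor:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.numeric(x)                  # 2 2 1 1: the internal level codes
values <- as.numeric(as.character(x))    # 5 5 6 6: the labels as numbers

# convert the two levels to numbers, then pick them out by code
values2 <- as.numeric(levels(x))[x]
```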

Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)


Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
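When a loop needs the position of each element rather than the element itself, seq_along() is a safe way to generate the indices. A minimal sketch (the vector values are arbitrary):

```r
x <- c(3, 1, 4)
squares <- numeric(length(x))

for (i in seq_along(x)) {     # i takes the values 1, 2, 3
  squares[i] <- x[i]^2
}
squares

# seq_along() also behaves sensibly for empty input, unlike 1:length(x)
empty <- numeric(0)
seq_along(empty)              # integer(0): a loop over it never runs
1:length(empty)               # 1 0: a loop over it would run twice
```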

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
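When the number of results is known in advance, the container can be created at its final size before the loop starts. A hedged sketch of that variant, alongside an lapply() call that builds the same list (the name Result2 is just for comparison):

```r
n <- 5
Result <- vector("list", n)     # a list of n empty (NULL) slots
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# lapply() builds the same list without an explicit loop
Result2 <- lapply(seq_len(n), seq_len)
identical(Result, Result2)
```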

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1] 2 4 6 8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1] 2 4 6 8 10
> 1:5 + 5 # shorter vectors are recycled
[1] 6 7 8 9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
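sprintf() is vectorized in the same way as paste() and gives more control over formatting. A small sketch that reproduces the earlier for-loop output without a loop:

```r
nums <- -5:5
msgs <- sprintf("%d squared is %d", nums, as.integer(nums^2))
msgs[1]                       # "-5 squared is 25"
cat(msgs, sep = "\n")         # one call prints every line
```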

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
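For comparison, the three steps can also be written without explicit loops. A sketch using sapply() and colMeans(); the object names iris_num and iris_means are chosen here to avoid clashing with the prototype's names.

```r
classes <- sapply(iris, class)            # step 1: class of each column
iris_num <- iris[classes == "numeric"]    # step 2: keep the numeric columns
iris_means <- colMeans(iris_num)          # step 3: mean of each column
iris_means
```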

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplemental)
Page 35: Introduction to R Programming

Writing functions

Exercise 2

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data
2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data

    statsum <- function(df) {
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      return(means)
    }

    statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable

    statsum <- function(df) {
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      counts <- sapply(df[classes == "factor"], table)
      return(list(means, counts))
    }

    statsum(iris)
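One thing to keep in mind with the classes == "numeric" test is that integer columns have class "integer", so they are skipped. A hedged sketch of the difference, using a small data frame made up for this example:

```r
# toy data (made up for this sketch): one integer, one double, one factor column
df <- data.frame(a = 1:3,
                 b = c(1.5, 2.5, 3.5),
                 g = factor(c("x", "y", "x")))

classes <- sapply(df, class)
classes                                      # a is "integer", not "numeric"

means_by_class <- sapply(df[classes == "numeric"], mean)    # only b
means_by_type  <- sapply(df[sapply(df, is.numeric)], mean)  # a and b
```

For the iris data both approaches select the same four columns, so the prototype's result is unaffected.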

Control flow


  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input
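if() decides on a single TRUE or FALSE value; for element-by-element decisions over a whole vector, base R also provides ifelse(). A brief sketch contrasting the two (the labels are arbitrary):

```r
x <- c(-1, 0, 2)

# if() wants a single TRUE/FALSE, so test one element at a time
first <- if (x[1] > 0) "positive" else "not positive"

# ifelse() applies the test to every element of a vector
all_labels <- ifelse(x > 0, "positive", "not positive")
all_labels
```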

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> # use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> # test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that "a" is positive!). The comparison "a" > 0 converts 0 to the character string "0" and compares the two values as text.

Control flow examples

Do something reasonable if x is not numeric

> # add a condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one \n")
+   } else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") # say so
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function

gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))

[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066

Introduction To Programming In RLast updated November 20 2013 50

71

The S3 object class system

S3 classes summary

Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class

Functions introduced in this sectionplot creates a graphical display the type of which depends on the

class of the object being plottedmethods lists the methods defined for a function or class

UseMethod the body of a generic functioninvisible returns an object but does not print it

Introduction To Programming In RLast updated November 20 2013 51

71

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations ofthe numeric variables

2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class

Introduction To Programming In RLast updated November 20 2013 52

71

The S3 object class system

Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of

the numeric variablesstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))

statsum(iris)

2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)

str(statsum(iris))

3 [3] Write a print method for the statsum classprintstatsum lt- function(x)

cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 53

71

Things that may surprise you

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 54

71

Things that may surprise you

Gotcharsquos

There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf

Introduction To Programming In RLast updated November 20 2013 55

71

Things that may surprise you

Floating point comparison

Floating point arithmetic is not exact

gt 1 == 33[1] FALSE

Solution use allequal()

gt allequal(1 33)[1] TRUE

Introduction To Programming In RLast updated November 20 2013 56

71

Things that may surprise you

Missing values

R does not exclude missing values by default ndash a single missing value in avector means that many thing are unknown

gt x lt- c(110 NA 1220)gt c(mean(x) sd(x) median(x) min(x) sd(x))[1] NA NA NA NA NA

NA is not equal to anything not even NA

gt NA == NA[1] NA

Solutions use narm = TRUE option when calculating and isna to test formissing

Introduction To Programming In RLast updated November 20 2013 57

71

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot which is often useful but makes iteasy to miss mistakes

gt combining values coereces them to the most general typegt (x lt- c(TRUE FALSE 1 2 a b))[1] TRUE FALSE 1 2 a bgt str(x)chr [16] TRUE FALSE 1 2 a b

gtgt comparisons convert arguments to most general typegt 1 gt a[1] FALSE

Maybe this is what you expect I would like to at least get a warning

Introduction To Programming In RLast updated November 20 2013 58

71

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly donrsquot always

gt mean(1 2 3 4 5)5[1] 5gt sum(1 2 3 4 5)[1] 15

Why are these different

gt args(mean)function (x )NULLgt args(sum)function ( narm = FALSE)NULL

Ouch That is not nice at all

Introduction To Programming In RLast updated November 20 2013 59

71

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers and sometimes as characters whichcan be confusing

gt (x lt- factor(c(5 5 6 6) levels = c(6 5)))[1] 5 5 6 6Levels 6 5gtgt str(x)Factor w 2 levels 65 2 2 1 1

gtgt ascharacter(x)[1] 5 5 6 6gt here is where people sometimes get lostgt asnumeric(x)[1] 2 2 1 1gt you probably wantgt asnumeric(ascharacter(x))[1] 5 5 6 6

Introduction To Programming In RLast updated November 20 2013 60

71

Additional resources

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 61

71

Additional resources

Additional reading and resources

S3 system overviewhttpsgithubcomhadleydevtoolswikiS3S4 system overviewhttpsgithubcomhadleydevtoolswikiS4R documentation httpcranr-projectorgmanualshtmlCollection of R tutorialshttpcranr-projectorgother-docshtmlR for Programmers (by Norman Matloff UCndashDavis)

httpheathercsucdavisedu~matloffRRProgpdfCalling C and Fortran from R (by Charles Geyer UMinn)

httpwwwstatumnedu~charliercState of the Art in Parallel Computing with R (Schmidberger et al)

httpwwwjstatso|orgv31i01paper

Institute for Quantitative Social Science httpiqharvardeduResearch technology consultinghttpprojectsiqharvardedurtc

Introduction To Programming In RLast updated November 20 2013 62

71

Additional resources

Feedback

Help Us Make This Workshop BetterPlease take a moment to fill out a very short feedback formThese workshops exist for you ndash tell us what you needhttptinyurlcomRprogrammingFeedback

Introduction To Programming In RLast updated November 20 2013 63

71

Loops (supplimental)

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 64

71

Loops (supplimental)

Looping

A loop is a collection of commands that are run over and over againA for loop runs the code a fixed number of times or on a fixed set ofobjectsA while loop runs the code until a condition is metIf yoursquore typing the same commands over and over again you mightwant to use a loop

Introduction To Programming In RLast updated November 20 2013 65

71

Loops (supplimental)

Looping for-loop examples

For each value in a vector print the number and its square

gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25

Introduction To Programming In RLast updated November 20 2013 66

71

Loops (supplimental)

Looping while-loop example

Goal simulate rolling two dice until we roll two sixes

gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12

Introduction To Programming In RLast updated November 20 2013 67

71

Loops (supplimental)

Using loops to fill in lists

Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing

gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1

[[2]][1] 1 2

[[3]][1] 1 2 3

[[4]][1] 1 2 3 4

[[5]][1] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 68

71

Loops (supplimental)

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5, 5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
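To make the comparison concrete, the sketch below builds the same messages twice, once with a loop and once with a single vectorized paste() call, and checks that the results match. The variable names are made up for illustration.

```r
nums <- seq(-5, 5)

# Loop version: grow a character vector one element at a time
msg_loop <- character(0)
for (num in nums) {
  msg_loop <- c(msg_loop, paste(num, "squared is", num^2))
}

# Vectorized version: paste() and ^ both work element-wise
msg_vec <- paste(nums, "squared is", nums^2)

identical(msg_loop, msg_vec)  # TRUE
head(msg_vec, 2)
```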

Exercise 5

1 Use a loop to get the class() of each column in the iris data set
2 Use the results from step 1 to select the numeric columns
3 Use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
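In keeping with the earlier caution about overusing loops, the same column means can be computed without an explicit loop. The sketch below is one possible approach, not the workshop's solution: it repeats the loop version for comparison and then uses sapply() and colMeans(). The names `loop_means`, `is_num`, and `apply_means` are made up.

```r
# Loop version, following the prototype
classes <- c()
for (name in names(iris)) {
  classes[name] <- class(iris[[name]])
}
loop_means <- c()
for (var in names(classes)[classes == "numeric"]) {
  loop_means[var] <- mean(iris[[var]])
}

# Apply-style version
is_num <- sapply(iris, is.numeric)   # logical flag per column
apply_means <- colMeans(iris[is_num])

all.equal(loop_means, apply_means)   # TRUE
```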


Writing functions

Exercise 2 prototype

1 Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  return(means)
}

statsum(iris)

2 Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.

statsum <- function(df) {
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

Control flow


  • Basic idea: if some condition is true, do one thing. If false, do something else.
  • Carried out in R using if() and else statements, which can be nested if necessary.
  • Especially useful for checking function arguments and performing different operations depending on function input.
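As a minimal sketch of the basic idea, the function below branches on a single value with if / else if / else. Because if() expects a single TRUE or FALSE, the vectorized ifelse() is shown for the case where the test should be applied to every element of a vector. The function name `describe_temp` and the thresholds are made up for illustration.

```r
# if/else works on a single TRUE/FALSE condition
describe_temp <- function(temp) {      # hypothetical helper
  if (temp > 30) {
    "hot"
  } else if (temp > 15) {
    "mild"
  } else {
    "cold"
  }
}
describe_temp(22)

# For a whole vector, ifelse() applies the test element by element
temps <- c(35, 22, 5)
ifelse(temps > 30, "hot", ifelse(temps > 15, "mild", "cold"))
```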

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a single number
> isPositive <- function(x) { # define function "isPositive"
+   if (!is.numeric(x) || length(x) != 1) {
+     cat("x must be a numeric vector of length one! \n")
+   } else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") # say so!
+   } else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n") # say so!
+   } else { # otherwise
+     cat(x, "is negative \n") # say x is negative
+   }
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
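The example above prints a message when the input is bad. A possible alternative, sketched here as an assumption rather than as the workshop's approach, is to raise an error with stop() so that calling code cannot silently continue, and to return a value instead of printing. The name `signLabel` is hypothetical.

```r
# Variant that returns a label and raises an error on bad input
signLabel <- function(x) {             # hypothetical name
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}

signLabel(-3)

# tryCatch() lets the caller handle the error instead of halting
result <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
result
```

Returning the label rather than printing it makes the function easier to reuse, since the caller decides what to do with the result.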

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions(1) introduced in this section:
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment.

Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame
2 Insert a break point with browser() and step through your function

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
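One caveat about the check class(df) != "data.frame": an object can carry several classes, in which case the comparison yields one result per class rather than a single TRUE or FALSE, and recent versions of R raise an error when if() receives such a condition. The sketch below shows is.data.frame() and inherits() as alternatives. The helper `checkDataFrame` and the class name "my_table" are made up for illustration.

```r
checkDataFrame <- function(df) {       # hypothetical helper
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# An object with several classes, one of which is "data.frame"
multi <- iris
class(multi) <- c("my_table", "data.frame")   # "my_table" is made up

length(class(multi) != "data.frame")   # 2: not a single TRUE/FALSE
is.data.frame(multi)                   # TRUE
inherits(multi, "data.frame")          # TRUE
```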

The S3 object class system


  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
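A short sketch, added for illustration, of how to test for class membership and how to strip a class again. inherits() checks whether a class name is among an object's classes, and unclass() removes the class attribute while leaving the underlying data untouched.

```r
x <- 1:10
class(x) <- c("A", "B")

inherits(x, "B")        # TRUE: "B" is one of the classes
inherits(x, "integer")  # FALSE: the class attribute replaced the implicit class
typeof(x)               # "integer": the underlying storage is unchanged

y <- unclass(x)         # drop the class attribute
class(y)                # "integer"
```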

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else {
+     cat("x is not numeric \n") # otherwise say x not numeric
+   }
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
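The same naming scheme works for any existing generic. As a sketch under made-up names, the code below defines a constructor for a hypothetical "money" class and a print() method for it. Neither the class nor the constructor is part of the workshop materials.

```r
# Constructor for a made-up "money" class
money <- function(amount) {
  structure(amount, class = "money")
}

# print() is already generic, so naming the function print.money is enough
print.money <- function(x, ...) {
  cat(sprintf("$%.2f", unclass(x)), "\n")
  invisible(x)                          # print methods conventionally return x invisibly
}

wallet <- money(c(1.5, 20))
print(wallet)                           # dispatches to print.money
out <- capture.output(print(wallet))
```

Accepting `...` in the method keeps its signature compatible with the print() generic, which also takes `...`.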

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
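A generic with only one method fails for every other class. A common pattern, sketched here with a hypothetical generic called `describe`, is to add a default method that UseMethod() falls back on when no class-specific method exists.

```r
# Generic: hands off to a method chosen by the class of x
describe <- function(x, ...) {
  UseMethod("describe")
}

# Method for data frames
describe.data.frame <- function(x, ...) {
  paste("data frame with", nrow(x), "rows and", ncol(x), "columns")
}

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1], "and length", length(x))
}

describe(iris)
describe(letters)
```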

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class "statsum"
3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)
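The three steps can also be combined into one self-contained script. The version below is a sketch with a few choices that differ from the prototype and are assumptions on my part: it uses is.data.frame(), is.numeric() and is.factor() for the checks, names the list elements `numeric` and `counts`, and has the print method return its argument invisibly.

```r
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  is_num <- sapply(df, is.numeric)
  is_fac <- sapply(df, is.factor)
  out <- list(
    numeric = cbind(means = sapply(df[is_num], mean),
                    sds   = sapply(df[is_num], sd)),
    counts  = lapply(df[is_fac], table)
  )
  class(out) <- c("statsum", class(out))
  out
}

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x$numeric, digits = 2)
  cat("Factor variable counts:\n")
  print(x$counts)
  invisible(x)
}

res <- statsum(iris)
res   # auto-printing dispatches to print.statsum
```

Naming the list elements lets the print method refer to `x$numeric` and `x$counts` rather than to positions.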

Things that may surprise you


Gotcha's

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotcha's" are common problems in other languages, many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
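One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). The sketch below illustrates this; the tolerance 1e-8 in the last line is an arbitrary choice for the example.

```r
a <- .1
b <- .3 / 3

a == b                      # FALSE: the two doubles differ in the last bits
isTRUE(all.equal(a, b))     # TRUE: equal within a small tolerance

# all.equal() returns a description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it in if()
all.equal(1, 1.1)
if (isTRUE(all.equal(1, 1.1))) "same" else "different"

# An explicit tolerance is another option (1e-8 is an arbitrary choice)
abs(a - b) < 1e-8
```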

Missing values

R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
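The two solutions can be sketched on the same vector as follows. This is an added illustration using only base R functions.

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA
mean(x, na.rm = TRUE)    # mean of the 19 observed values
sum(is.na(x))            # number of missing values
which(is.na(x))          # position of the missing value

# x == NA is NA everywhere, so it cannot be used to find missing values
any(x == NA, na.rm = TRUE)   # FALSE
```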

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
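The coercion order (logical, then integer, then double, then character) can be seen step by step in the sketch below, which is an added illustration. The last line shows that explicit conversion with as.numeric() marks values it cannot convert as NA; the warning it would normally give is suppressed here only to keep the example quiet.

```r
class(c(TRUE, 2L))       # "integer":   logical -> integer
class(c(2L, 3.5))        # "numeric":   integer -> double
class(c(3.5, "a"))       # "character": double  -> character

sum(c(TRUE, FALSE, TRUE))  # 2: logicals count as 1 and 0 in arithmetic

# Comparing a number with a string compares strings
10 < "9"                 # TRUE, because "10" sorts before "9"

# Explicit conversion makes a failed conversion visible as NA
suppressWarnings(as.numeric(c("1", "2", "a")))
```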

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
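The argument lists explain the difference: mean() takes its data from the single argument x, while sum() takes its data from `...`. A short sketch of the safe habit, which is to pass the values as one vector:

```r
mean(1, 2, 3, 4, 5)        # 1: only the first argument is treated as data
mean(c(1, 2, 3, 4, 5))     # 3: wrap the values in a vector
sum(1, 2, 3, 4, 5)         # 15: sum() adds up everything passed through ...
sum(c(1, 2, 3, 4, 5))      # 15: a vector works too
```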

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
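A factor stores integer codes plus a character vector of levels, which is why as.numeric() returns the codes. The sketch below, added for illustration, shows both pieces and a second conversion idiom that converts each level once and then indexes by code.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                    # 2 2 1 1: the internal codes
levels(x)                        # "6" "5": the labels, stored as character
as.numeric(as.character(x))      # 5 5 6 6
as.numeric(levels(x))[x]         # 5 5 6 6: convert each level once, then index by code
```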

Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!

  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Page 37: Introduction to R Programming

Control flow

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 37

71

Control flow

Control flow

Basic idea if some condition is true do one thing If false dosomething elseCarried out in R using if() and else() statements which can benested if necessaryEspecially useful for checking function arguments and performingdifferent operations depending on function input

Introduction To Programming In RLast updated November 20 2013 38

71

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function

gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))

[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066

Introduction To Programming In RLast updated November 20 2013 50

71

The S3 object class system

S3 classes summary

Key pointsthere are several class systems in R of which S3 is the oldest andsimplestobjects have class and functions have corresponding methodsthe class of an object can be set by simple assignmentS3 generic functions all contain UseMethod(x) in the body where xis the name of the functionnew methods for existing generic functions can be written simply bydefining a new function with a special naming scheme the name of thefunction followed by dot followed by the name of the class

Functions introduced in this sectionplot creates a graphical display the type of which depends on the

class of the object being plottedmethods lists the methods defined for a function or class

UseMethod the body of a generic functioninvisible returns an object but does not print it

Introduction To Programming In RLast updated November 20 2013 51

71

The S3 object class system

Exercise 4

1 Modify your function so that it also returns the standard deviations ofthe numeric variables

2 Modify your function so that it returns a list of class statsum3 Write a print method for the statsum class

Introduction To Programming In RLast updated November 20 2013 52

71

The S3 object class system

Exercise 4 prototype1 Modify your function so that it also returns the standard deviations of

the numeric variablesstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(cbind(means sds) counts))

statsum(iris)

2 Modify your function so that it returns a list of class statsumstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)sds lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)R lt- list(cbind(means sds) counts)class(R) lt- c(statsum class(R))return(R)

str(statsum(iris))

3 [3] Write a print method for the statsum classprintstatsum lt- function(x)

cat(Numeric variable descriptive statisticsn)print(x[[1]] digits=2)cat(Factor variable countsn)print(x[[2]])

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 53

71

Things that may surprise you

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 54

71

Things that may surprise you

Gotcharsquos

There are an unfortunately large number of surprises in R programmingSome of these gotcharsquos are common problems in other languagesmany are unique to RWe will only cover a few ndash for a more comprehensive discussion pleasesee httpwwwburns-statcompagesTutorR_infernopdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
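One detail worth knowing when using all.equal() inside if(): when the values differ it returns a character description rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch (the variable names are illustrative only):

```r
# all.equal() compares within a small numerical tolerance
a <- 0.1
b <- 0.3 / 3

exact_equal <- a == b                    # exact comparison of the stored values
near_equal  <- isTRUE(all.equal(a, b))   # tolerant comparison, always TRUE or FALSE

# When the values really differ, all.equal() returns a description of the
# difference, not FALSE, which is why isTRUE() is needed inside if()
diff_report <- all.equal(1, 1.1)

if (near_equal) cat("a and b are equal up to numerical tolerance\n")
```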

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
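The two solutions can be sketched on the same vector as follows; the variable names are illustrative only:

```r
x <- c(1:10, NA, 12:20)

# Drop missing values inside the calculation
mean_x <- mean(x, na.rm = TRUE)
sd_x   <- sd(x, na.rm = TRUE)

# Test for missing values with is.na(), never with == NA
n_missing     <- sum(is.na(x))     # how many values are missing
which_missing <- which(is.na(x))   # positions of the missing values
x_complete    <- x[!is.na(x)]      # copy of x without the missing values
```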

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
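One way to get the missing warning is to check types yourself before combining or comparing. The sketch below is only an illustration: strict_greater() is a hypothetical helper, not part of R, and it is deliberately strict (it would also refuse to compare an integer with a double).

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")    # silently becomes a character vector

# as.numeric() yields NA (with a warning) for strings that are not numbers
nums <- suppressWarnings(as.numeric(x))
bad  <- x[is.na(nums)]                 # the elements that could not be converted

# hypothetical helper: refuse to compare values whose classes differ
strict_greater <- function(a, b) {
  if (!identical(class(a), class(b))) {
    stop("cannot compare ", class(a)[1], " with ", class(b)[1])
  }
  a > b
}
```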

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
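The difference comes from how the arguments are matched: mean() averages only its first argument x, while sum() adds up everything passed through the dots. A short sketch of the safe pattern, which is to collect the values into one vector first:

```r
values <- c(1, 2, 3, 4, 5)

# mean() takes a single vector as its first argument; further unnamed
# arguments are matched to other parameters instead of being averaged
m <- mean(values)

# sum() adds up everything passed through ...
s1 <- sum(1, 2, 3, 4, 5)
s2 <- sum(values)
```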

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
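A factor stores integer codes plus a vector of level labels, which is why as.numeric() returns the codes. The sketch below compares the conversions on the same factor; the third form, converting the levels once and indexing by code, is a commonly used alternative rather than something from the slides.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.numeric(x)                 # internal integer codes: 2 2 1 1
via_chr <- as.numeric(as.character(x))   # labels converted to numbers: 5 5 6 6
via_lev <- as.numeric(levels(x))[x]      # convert each level once, then index by code
```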

Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!

  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
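When the number of results is known in advance, the container can be created at its final size instead of being grown one element at a time. A minimal sketch of that variation, with illustrative names (result, result2), plus the equivalent lapply() call:

```r
n <- 5
result <- vector("list", n)   # a list of n empty (NULL) slots

for (i in seq_len(n)) {       # seq_len(n) is empty when n is 0, so the loop is skipped
  result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

# the same list built without an explicit loop
result2 <- lapply(seq_len(n), function(i) 1:i)
```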

Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!

Exercise 5

1. Use a loop to get the class() of each column in the iris data set
2. Use the results from step 1 to select the numeric columns
3. Use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
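In keeping with the caution about overusing loops, the same three steps can be sketched without explicit loops. This is one possible alternative, not the workshop's solution; the names iris_classes, iris_num and iris_means are illustrative.

```r
# step 1: class of each column (first class only, one string per column)
iris_classes <- vapply(iris, function(col) class(col)[1], character(1))

# step 2: keep the numeric columns
iris_num <- iris[, iris_classes == "numeric"]

# step 3: mean of each numeric column
iris_means <- colMeans(iris_num)
```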

Page 38: Introduction to R Programming

Control flow

  • Basic idea: if some condition is true, do one thing. If false, do something else
  • Carried out in R using if() and else statements, which can be nested if necessary
  • Especially useful for checking function arguments and performing different operations depending on function input

Control flow examples

Goal: write a function that tells us if a number is positive or negative

> ## use branching to return different result depending on the sign of the input
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> ## test the isPositive() function
> isPositive(10)
10 is positive
> isPositive(-1)
-1 is negative
> isPositive(0)
0 is negative

Need to do something different if x equals zero!

Control flow examples

Add a condition to handle x = 0

> ## add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)

Control flow examples

Do something reasonable if x is not numeric

> ## add condition to handle the case that x is not a numeric vector of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) || length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
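A common variation is to signal an error with stop() rather than printing a message, and to return a value so the result can be reused. The sketch below is a hypothetical variant (signOf() is not part of the workshop code) using the same branching structure:

```r
# hypothetical variant of isPositive() that returns a label instead of printing
signOf <- function(x) {
  # || stops evaluating as soon as the first condition is TRUE
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) {
    "positive"
  } else if (x == 0) {
    "zero"
  } else {
    "negative"
  }
}
```

Because the function returns its result, it can be used inside other expressions, for example paste("x is", signOf(3)).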

Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions [1] introduced in this section:
  cat    concatenates and prints R objects
  if     execute code only if condition is met
  else   used with if; code to execute if condition is not met

[1] Technically if and else are not functions, but this need not concern us at the moment

Exercise 3

1. Add argument checking code to return an error if the argument to your function is not a data.frame
2. Insert a break point with browser() and step through your function

Exercise 3 prototype

1. Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2. Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame!")
  browser() # pause here and step through the rest of the function
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)
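The class check can also be written with inherits(), which gives a single TRUE or FALSE even when an object carries several classes. The helper below is a hypothetical illustration (check_df() is not part of the workshop code), together with tryCatch() to capture the error message instead of halting:

```r
# hypothetical helper illustrating the argument check
check_df <- function(df) {
  # inherits() returns one TRUE/FALSE even for objects with several classes
  if (!inherits(df, "data.frame")) stop("df must be a data.frame")
  invisible(TRUE)
}

ok  <- check_df(iris)                                       # passes silently
err <- tryCatch(check_df(1:10), error = conditionMessage)   # captures the error message
```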

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
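Because class(x) can hold several names, testing class membership is usually done with inherits() rather than ==. A small sketch with illustrative variable names:

```r
x <- 1:10
class(x) <- c("A", "B")

is_a <- inherits(x, "A")   # "A" is one of the classes
is_b <- inherits(x, "B")   # so is "B"
is_c <- inherits(x, "C")   # "C" is not

cmp   <- class(x) == "B"   # direct comparison gives one result per class name
plain <- unclass(x)        # remove the class attribute again
```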

Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted. Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
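The same naming scheme works for any existing generic. As a further sketch, here is a print() method for a made-up class; the class name "celsius" and the object temps are assumptions for illustration only.

```r
# hypothetical class "celsius": a numeric vector with a class attribute
temps <- structure(c(21.5, 19, 23.25), class = "celsius")

# method name = generic name + "." + class name
print.celsius <- function(x, ...) {
  cat(paste(format(unclass(x)), "degrees C"), sep = "\n")
  invisible(x)   # print methods conventionally return their argument invisibly
}

print(temps)     # dispatches to print.celsius
```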

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
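A generic usually also gets a default method, which is used when no method matches the class of the argument. A minimal sketch with a hypothetical generic named describe() (not part of the workshop code):

```r
# hypothetical generic: dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")

# fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

# method for data.frames
describe.data.frame <- function(x, ...) {
  paste("a data.frame with", nrow(x), "rows and", ncol(x), "columns")
}

d1 <- describe(iris)   # uses describe.data.frame
d2 <- describe(1:3)    # falls back to describe.default
```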

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  plot        creates a graphical display, the type of which depends on the class of the object being plotted
  methods     lists the methods defined for a function or class
  UseMethod   the body of a generic function
  invisible   returns an object but does not print it

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 39: Introduction to R Programming

Control flow

Control flow examples

Goal write a function that tells us if a number is positive or negative

gt use branching to return different result depending on the sign of the inputgt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt test the isPositive() functiongt isPositive(10)10 is positivegt isPositive(-1)-1 is negativegt isPositive(0)0 is negative

Need to do something different if x equals zero

Introduction To Programming In RLast updated November 20 2013 39

71

Control flow

Control flow examples

Add a condition to handle x = 0gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongt

Test the new functiongt isPositive(0) test the isPositive() function0 is zerogt isPositive(a) oops that will not worka is positive

We fixed the problem when x = 0 but now we need to make sure x isnumeric of length one (unless we agree with R that a is positive)

Introduction To Programming In RLast updated November 20 2013 40

71

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo":
> mean.foo <- function(x) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x))
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")} # otherwise say x not numeric
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
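The same naming scheme works for any existing generic. As a minimal sketch, here is a print() method for a made-up class "money"; the class name and the constructor new_money() are assumptions for illustration, not part of the workshop materials.

```r
# Hypothetical class "money"; the constructor name new_money() is an assumption
new_money <- function(amount) {
  stopifnot(is.numeric(amount))
  structure(amount, class = "money")
}

# print() is already generic, so print.money() becomes its method for "money"
print.money <- function(x, ...) {
  cat(sprintf("$%.2f\n", unclass(x)), sep = "")
  invisible(x)                    # print methods conventionally return x invisibly
}

wallet <- new_money(c(3.5, 12))
printed <- capture.output(print(wallet))   # the lines print.money() writes
```

Including ... in the method's arguments keeps its signature compatible with the print(x, ...) generic.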


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
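A generic usually also gets a default method, which runs when no class-specific method matches. The sketch below assumes a made-up generic area() and made-up classes "circle" and "square"; only UseMethod(), structure(), and stop() are base R.

```r
# Hypothetical generic area(); class names "circle" and "square" are made up
area <- function(shape, ...) {
  UseMethod("area")               # dispatch on class(shape)
}

area.circle <- function(shape, ...) pi * shape$r^2
area.square <- function(shape, ...) shape$side^2

# The default method runs when no class-specific method matches
area.default <- function(shape, ...) {
  stop("don't know how to compute the area of class ", class(shape)[1])
}

circ <- structure(list(r = 1), class = "circle")
sq   <- structure(list(side = 2), class = "square")
areas <- c(area(circ), area(sq))
```

Calling area() on an object without a matching method, such as area(1:3), falls through to area.default() and signals an error.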


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it


Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables

2 Modify your function so that it returns a list of class statsum

3 Write a print method for the statsum class


Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you


Gotcha's

There is an unfortunately large number of surprises in R programming.
Some of these "gotcha's" are common problems in other languages; many are unique to R.
We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
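One detail worth sketching: when two values differ, all.equal() returns a character description of the difference rather than FALSE, so inside if() it is usually wrapped in isTRUE(). The variable names below are illustrative.

```r
a <- 0.1
b <- 0.3 / 3

exact  <- a == b                     # FALSE: the two doubles differ in the last bits
approx <- isTRUE(all.equal(a, b))    # TRUE: equal within the default tolerance

# all.equal() returns a character description, not FALSE, when values differ,
# so wrap it in isTRUE() before using it as a condition
far <- isTRUE(all.equal(1, 1.1))     # FALSE
```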


Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
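A short sketch of both solutions applied to the vector from the slide (the result names are illustrative):

```r
x <- c(1:10, NA, 12:20)

m_all   <- mean(x)                 # NA: the missing value propagates
m_clean <- mean(x, na.rm = TRUE)   # mean of the 19 observed values
n_miss  <- sum(is.na(x))           # count the missing values
where   <- which(is.na(x))         # positions of the missing values
```

Note that x == NA would not find the missing value, since any comparison with NA is itself NA.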


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
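The difference comes from argument matching: mean() treats only its first argument as the data, while sum() collects all of its values through "...". A minimal sketch of the safe pattern (result names are illustrative):

```r
wrong <- mean(1, 2, 3, 4, 5)      # only the first argument is x; the rest go to other arguments
right <- mean(c(1, 2, 3, 4, 5))   # combine the values into one vector first
total <- sum(1, 2, 3, 4, 5)       # sum() takes its values through '...'
```

Passing a single vector works the same way for both functions, so it is the habit least likely to surprise.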


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
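As a sketch of what is going on: as.numeric() on a factor returns the internal integer codes, not the labels. An equivalent idiom converts the levels once and then indexes them by the factor (the result names are illustrative):

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes  <- as.numeric(x)                  # internal codes: 2 2 1 1
values <- as.numeric(as.character(x))    # labels converted back: 5 5 6 6

# Equivalent idiom: convert each level once, then index by the factor
values2 <- as.numeric(levels(x))[x]
```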


Additional resources


Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)


Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
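When the number of results is known in advance, a common variation is to pre-allocate the list rather than grow it. The sketch below is one way to do that, with the same result built by lapply() for comparison; the names n and Result2 are illustrative.

```r
n <- 5
Result <- vector("list", n)        # pre-allocate a list with n empty slots
for (i in seq_len(n)) {            # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i               # fill slot i with the sequence 1 to i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
```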


Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5,5)) { # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n") # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!


Exercise 5

1 use a loop to get the class() of each column in the iris data set

2 use the results from step 1 to select the numeric columns

3 use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
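For comparison, the three steps can also be written without explicit loops using sapply(), which the statsum() exercises already rely on. This is a sketch of one equivalent, not the workshop's prototype; the underscore-style names are illustrative.

```r
classes    <- sapply(iris, class)             # step 1: class of each column
iris_num   <- iris[classes == "numeric"]      # step 2: keep the numeric columns
iris_means <- sapply(iris_num, mean)          # step 3: mean of each numeric column
```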

Page 40: Introduction to R Programming

Control flow

Control flow examples

Add a condition to handle x = 0

> # add condition to handle the case that x is zero
> isPositive <- function(x) { # define function "isPositive"
+   if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative! \n")} # say x is negative
+ } # end function definition
>

Test the new function

> isPositive(0) # test the isPositive() function
0 is zero
> isPositive("a") # oops, that will not work!
a is positive

We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!)


Do something reasonable if x is not numeric

> # add condition to handle the case that x is not numeric or not of length one
> isPositive <- function(x) { # define function "isPositive"
+   if(!is.numeric(x) | length(x) > 1) {
+     cat("x must be a numeric vector of length one! \n")}
+   else if (x > 0) { # if x is greater than zero, then...
+     cat(x, "is positive \n") } # say so!
+   else if (x == 0) { # otherwise if x is zero
+     cat(x, "is zero \n")} # say so!
+   else { # otherwise
+     cat(x, "is negative! \n")} # say x is negative
+ } # end function definition
>
> isPositive("a") # test the isPositive() function on character
x must be a numeric vector of length one!
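A variation on the same idea, sketched under the assumption that we want the function to signal an error with stop() and return a value instead of printing. The name signLabel() is made up for illustration.

```r
# Hypothetical variant, here called signLabel(), that errors on bad input
signLabel <- function(x) {
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a numeric vector of length one")
  }
  if (x > 0) "positive" else if (x == 0) "zero" else "negative"
}

labels <- c(signLabel(3), signLabel(0), signLabel(-2))

# tryCatch() lets us capture the error message instead of halting
bad <- tryCatch(signLabel("a"), error = function(e) conditionMessage(e))
```

Returning a value makes the function easier to reuse in other code, and stop() ensures that bad input cannot pass unnoticed.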


Control flow summary

Key points:
  • code can be conditionally executed
  • conditions can be nested
  • conditional execution is often used for argument checking, among other things

Functions(1) introduced in this section:
  • cat: concatenates and prints R objects
  • if: execute code only if condition is met
  • else: used with if; code to execute if condition is not met

(1) Technically if and else are not functions, but this need not concern us at the moment


Exercise 3

1 Add argument checking code to return an error if the argument to your function is not a data.frame

2 Insert a break point with browser() and step through your function


Exercise 3 prototype

1 Add argument checking code to return an error if the argument to your function is not a data.frame

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(1:10)
statsum(iris)

2 Insert a break point with browser() and step through your function

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  browser()
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  counts <- sapply(df[classes == "factor"], table)
  return(list(means, counts))
}

statsum(iris)

Page 41: Introduction to R Programming

Control flow

Control flow examples

Do something reasonable if x is not numeric

gt add condition to handle the case that x is zerogt isPositive lt- function(x) define function isPositive+ if(isnumeric(x) | length(x) gt 1) + cat(x must be a numeric vector of length one n)+ else if (x gt 0) if x is greater than zero then+ cat(x is positive n) say so+ else if (x == 0) otherwise if x is zero+ cat(x is zero n) say so+ else otherwise+ cat(x is negative n) say x is negative+ end function definitiongtgt isPositive(a) test the isPositive() function on characterx must be a numeric vector of length one

Introduction To Programming In RLast updated November 20 2013 41

71

Control flow

Control flow summary

Key pointscode can be conditionally executedconditions can be nestedconditional execution is often used for argument checking among otherthings

Functions1 introduced in this sectioncat Concatenates and prints R objectsif execute code only if condition is met

else used with if code to execute if condition is not met

1Technically if and else are not functions but this need not concern us at the momentIntroduction To Programming In R

Last updated November 20 2013 42 71

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)


The S3 object class system

The S3 object class system

R has two major object systems:
  • Relatively informal "S3" classes
  • Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system.
Basic idea: functions have different methods for different types of objects.


Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
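
Setting the class attribute changes how an object is labelled, not how it is stored. The short sketch below (variable names are illustrative) shows how inherits() and unclass() interact with a multi-class object:

```r
x <- 1:10
class(x) <- c("A", "B")

has_b   <- inherits(x, "B")        # TRUE: inherits() checks every element of class(x)
has_int <- inherits(x, "integer")  # FALSE: the explicit class attribute replaces the implicit class
still_int <- is.integer(x)         # TRUE: the underlying storage type is unchanged

y <- unclass(x)                    # drop the class attribute
y_class <- class(y)                # back to the implicit class, "integer"
```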


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

> # create a mean() method for objects of class "foo"
> mean.foo <- function(x, ...) { # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x))) # use mean.default for numeric
+   } else
+     cat("x is not numeric\n") # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
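
With only a matrix method defined, calling disp() on any other kind of object stops with a "no applicable method" error. One common remedy is a default method. The sketch below repeats the generic so it runs on its own; disp.default is an assumption added for illustration and is not part of the workshop code:

```r
disp <- function(x, ...) UseMethod("disp")
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# Hypothetical fallback, used when no class-specific method exists
disp.default <- function(x, ...) {
  cat("no disp method for class", class(x)[1], "\n")
  invisible(x)
}

res <- disp(letters[1:3])            # dispatches to disp.default
m   <- disp(matrix(0.123, 1, 1))     # dispatches to disp.matrix
```

Because print() returns its argument invisibly, m holds the rounded matrix, and res holds the untouched character vector.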


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
plot        creates a graphical display, the type of which depends on the class of the object being plotted
methods     lists the methods defined for a function or class
UseMethod   the body of a generic function
invisible   returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.
2. Modify your function so that it returns a list of class statsum.
3. Write a print method for the statsum class.


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # standard deviations, not means
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class statsum.

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
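
One caveat when using all.equal() inside if(): when the values differ it returns a character description of the difference rather than FALSE, so it is usually wrapped in isTRUE(). A minimal sketch, with illustrative variable names:

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                     # FALSE: exact comparison of doubles
close <- isTRUE(all.equal(a, b))    # TRUE: equal within a small tolerance

far    <- all.equal(1, 1.1)         # a character string describing the difference
far_ok <- isTRUE(far)               # FALSE, safe to use in if()
```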


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
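
The two solutions can be sketched as follows, reusing the vector from the slide (the other variable names are illustrative):

```r
x <- c(1:10, NA, 12:20)

m         <- mean(x, na.rm = TRUE)   # drop the NA before averaging
n_missing <- sum(is.na(x))           # count missing values
complete  <- x[!is.na(x)]            # keep only the observed values
```

Note that x == NA would not work for the test, since every comparison with NA is itself NA.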


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
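
Since R gives no warning here, one defensive option is to check types explicitly before comparing. The sketch below assumes a hypothetical helper named safe_gt; string comparison order can depend on the locale, although digits sort consistently in common ones:

```r
mixed <- c(1, "a")
cls   <- class(mixed)     # "character": the number was silently converted
cmp   <- 10 < "9"         # compared as the strings "10" and "9"

# safe_gt is a hypothetical helper: it fails loudly on non-numeric input
safe_gt <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a > b
}
res <- tryCatch(safe_gt(1, "a"), error = function(e) "error")
```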


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
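
The difference comes from the argument lists: mean() averages only its first argument x, while sum() adds up everything passed through the dots. Wrapping the values in c() gives the intended mean, as this small sketch shows:

```r
wrong <- mean(1, 2, 3, 4, 5)      # only 1 is x; the other values are matched to other arguments
right <- mean(c(1, 2, 3, 4, 5))   # one vector argument: the mean of all five values
total <- sum(1, 2, 3, 4, 5)       # sum() accepts the values directly
```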


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
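
An equivalent conversion, suggested in R's own help page for factor, converts the levels once and then indexes them by the factor's integer codes. A short sketch comparing the three results:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes   <- as.numeric(x)                  # 2 2 1 1: the internal integer codes
via_chr <- as.numeric(as.character(x))    # 5 5 6 6
via_lev <- as.numeric(levels(x))[x]       # 5 5 6 6: convert levels, then index by code
```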


Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code for as long as its condition is true, that is, until a stopping condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
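
When the number of iterations is known in advance, a common refinement is to create the list at its final size rather than growing it one element at a time. A minimal sketch of the same loop, assuming n = 5 as in the slide:

```r
n <- 5
Result <- vector("list", n)     # pre-allocate a list with n empty slots
for (i in seq_len(n)) {         # seq_len(n) is also safe when n is 0
  Result[[i]] <- seq_len(i)     # the sequence 1 to i
}
lens <- lengths(Result)         # number of elements stored in each slot
```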


Word of caution: don't overuse loops!

Most operations in R are vectorized; this makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!


Exercise 5

1. Use a loop to get the class() of each column in the iris data set.
2. Use the results from step 1 to select the numeric columns.
3. Use a loop to calculate the mean of each numeric column in the iris data.


Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[var] <- mean(iris.num[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
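
In keeping with the earlier caution about overusing loops, the same three steps can be written without any explicit loop. This is only a sketch of one alternative, using sapply() and colMeans() from base R; the variable names are illustrative:

```r
classes    <- sapply(iris, class)           # class of every column
iris_num   <- iris[, classes == "numeric"]  # keep the numeric columns
iris_means <- colMeans(iris_num)            # mean of each remaining column
```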

Page 43: Introduction to R Programming

Control flow

Exercise 3

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

2 Insert a break point with browser() and step through your function

Introduction To Programming In RLast updated November 20 2013 43

71

Control flow

Exercise 3 prototype

1 Add argument checking code to return an error if the argument to yourfunction is not a dataframe

statsum lt- function(df) if(class(df) = dataframe) stop(df must be a dataframe)classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(110)statsum(iris)

2 Insert a break point with browser() and step through your functionstatsum lt- function(df)

if(class(df) = dataframe) stop(df must be a dataframe)browser()

classes lt- sapply(df class)means lt- sapply(df[classes == numeric] mean)counts lt- sapply(df[classes == factor] table)return(list(means counts))

statsum(iris)

Introduction To Programming In RLast updated November 20 2013 44

71

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
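When an object has several classes, the order of the class vector matters. The sketch below uses a hypothetical generic, describe(), with hypothetical methods for classes "A" and "B" to show that inherits() tests membership and that dispatch picks the first class with a matching method.

```r
x <- 1:3
class(x) <- c("A", "B")

# inherits() checks whether a class appears anywhere in the class vector
is_A <- inherits(x, "A")
is_B <- inherits(x, "B")
is_C <- inherits(x, "C")

# hypothetical generic and methods, for illustration only
describe <- function(obj, ...) UseMethod("describe")
describe.A <- function(obj, ...) "method for A"
describe.B <- function(obj, ...) "method for B"

# "A" comes first in class(x), so describe.A is chosen
which_method <- describe(x)

print(c(is_A, is_B, is_C))
print(which_method)
```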


Function methods

Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.

Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"


Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class:

> # create a mean() method for objects of class "foo"
> mean.foo <- function(x, ...) {          # mean method for "foo" class
+   if(is.numeric(x)) {
+     cat("The average is", mean.default(x), "\n")
+     return(invisible(mean.default(x)))  # use mean.default for numeric
+   } else
+     cat("x is not numeric \n")          # otherwise say x not numeric
+ }
>
> x <- 1:10
> mean(x)
[1] 5.5
> class(x) <- "foo"
> mean(x)
The average is 5.5
>
> x <- as.character(x)
> class(x) <- "foo"
> mean(x)
x is not numeric
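The same naming rule works for any existing generic. As a further sketch, here is a format() method for a hypothetical class "celsius"; the class name and the " C" suffix are assumptions made for illustration.

```r
# Hypothetical class "celsius": a numeric vector with a class attribute
temp <- c(21.5, 19)
class(temp) <- "celsius"

# format() is already generic, so a function named format.celsius becomes its method
format.celsius <- function(x, ...) {
  # unclass() drops the class so the default numeric formatting is used
  paste0(format(unclass(x), nsmall = 1), " C")
}

out <- format(temp)
print(out)
```

Calling unclass() inside the method avoids dispatching back to format.celsius() on the same object.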


Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function:

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class "matrix"
> disp.matrix <- function(x, ...) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
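A generic with only a matrix method fails for every other kind of object. One common remedy, sketched here as an assumption rather than as part of the workshop code, is to add a default method that UseMethod() falls back on when no class-specific method exists.

```r
disp <- function(x, ...) UseMethod("disp")

# method for matrices, as in the slide
disp.matrix <- function(x, ...) print(round(x, digits = 2))

# fallback used when no method matches the object's class
disp.default <- function(x, ...) {
  cat("no special display for class", class(x)[1], "\n")
  invisible(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 1.011), ncol = 2)
res_m <- disp(m)         # dispatches to disp.matrix
res_c <- disp("hello")   # falls back to disp.default
```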


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have a class, and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions written in R contain UseMethod("x") in the body, where x is the name of the function (internal generics such as length() and sum() dispatch in compiled code instead)
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function; dispatches to the method that matches the object's class
  • invisible: returns an object but does not print it


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables.

2. Modify your function so that it returns a list of class "statsum".

3. Write a print method for the "statsum" class.


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables.

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd(), not mean()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum".

statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))   # put the new class first
  return(R)
}

str(statsum(iris))

3. Write a print method for the "statsum" class.

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
  invisible(x)   # print methods conventionally return their argument invisibly
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming.
  • Some of these "gotchas" are common problems in other languages; many are unique to R.
  • We will only cover a few. For a more comprehensive discussion, please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
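One detail worth knowing when all.equal() is used inside if(): it returns a character description rather than FALSE when the values differ. A minimal sketch, with variable names chosen for illustration, wraps it in isTRUE():

```r
a <- 0.1
b <- 0.3 / 3

exact <- a == b                    # FALSE: the two doubles differ in the last bits
close <- isTRUE(all.equal(a, b))   # TRUE: equal within the default tolerance

# when values differ, all.equal() returns a message, not FALSE,
# so isTRUE() is needed to get a clean logical for if()
far <- isTRUE(all.equal(1, 1.1))

print(c(exact, close, far))
```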


Missing values

R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
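Both solutions can be sketched on the same vector; the variable names here are illustrative only.

```r
x <- c(1:10, NA, 12:20)

m_all <- mean(x)                 # NA, because one value is unknown
m_obs <- mean(x, na.rm = TRUE)   # mean of the 19 observed values

n_missing <- sum(is.na(x))       # how many values are missing
where <- which(is.na(x))         # position of the missing value

print(c(m_all, m_obs))
print(c(n_missing, where))
```

Testing with x == NA would not work here, since every comparison with NA is itself NA.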


Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
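One way to surface such mistakes, offered here as a sketch rather than a recommendation from the workshop, is to convert explicitly. as.numeric() warns when a string cannot be converted and puts NA in its place, which makes the offending values easy to find. The warning is suppressed below only so the example runs quietly.

```r
v <- c("1", "2", "a")

# "a" cannot be converted; R normally warns "NAs introduced by coercion"
num <- suppressWarnings(as.numeric(v))

# the values that failed to convert
bad <- v[is.na(num)]

# logical values are silently treated as 0 and 1 in arithmetic
n_true <- sum(c(TRUE, TRUE, FALSE))

print(num)
print(bad)
print(n_true)
```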


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
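The difference comes from the argument lists: mean() takes its data in the single argument x, while sum() collects everything in "...". A short sketch of the safe habit, which is to combine the values with c() first:

```r
wrong <- mean(1, 2, 3, 4, 5)       # only the first argument, 1, is treated as data
right <- mean(c(1, 2, 3, 4, 5))    # the whole vector is the data

total_a <- sum(1, 2, 3, 4, 5)      # sum() accepts separate values
total_b <- sum(c(1, 2, 3, 4, 5))   # and also a single vector

print(c(wrong, right, total_a, total_b))
```

Passing a vector works for both functions, so it is the form that never surprises.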


Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing:

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
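An equivalent conversion, described in R's own help page for factor, converts the levels once and then indexes them by the factor. The sketch below compares the three results; the variable names are illustrative.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

codes <- as.integer(x)                    # internal level codes: 2 2 1 1
via_char <- as.numeric(as.character(x))   # the labels as numbers: 5 5 6 6
via_levels <- as.numeric(levels(x))[x]    # convert the 2 levels, then index by x

print(codes)
print(via_char)
print(via_levels)
```

The levels-based form converts only the distinct levels, which can matter for long factors with few levels.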


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help us make this workshop better!
  • Please take a moment to fill out a very short feedback form.
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) {               # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n")  # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
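When a for loop runs over the positions of a vector rather than its values, seq_along() is a safer index than 1:length(). The sketch below, with made-up variable names, shows why: for an empty vector, 1:length() counts down from 1 to 0 instead of producing nothing.

```r
values <- c(10, 20, 30)
out <- character(0)
for (i in seq_along(values)) {   # i takes the values 1, 2, 3
  out[i] <- paste(i, values[i])
}

empty <- numeric(0)
count <- 0
for (i in seq_along(empty)) count <- count + 1   # body never runs

bad <- 1:length(empty)   # 1:0, i.e. the two values 1 and 0

print(out)
print(count)
print(bad)
```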


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> # While-loop example: rolling dice
> set.seed(15)         # allows reproducible sample() results
> dice <- seq(1,6)     # set dice = [1 2 3 4 5 6]
> roll <- 0            # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1)  # calculate sum of two rolls
+   cat("We rolled a", roll, "\n")           # print the result
+ }                                          # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> # save calculations done in a loop
> Result <- list()         # create an object to store the results
> for (i in 1:5) {         # for each i in [1, 5]
+   Result[[i]] <- 1:i     # assign the sequence 1 to i to Result
+ }
> Result                   # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
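Growing a list one element at a time works, but when the number of results is known in advance the container can be created at full size first. This is a sketch of that variant, assuming the same five iterations; n and lens are illustrative names.

```r
n <- 5
Result <- vector("list", n)   # a list of n empty (NULL) slots, created once

for (i in seq_len(n)) {       # seq_len(n) is 1, 2, ..., n
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}

lens <- lengths(Result)       # length of each element
print(lens)
```

For five elements the difference is negligible; it becomes noticeable when a loop runs many thousands of times.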


Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c()                      # create vector x
> for(i in 1:5) x[i] <- i+i     # double a vector using a loop
> print(x)                      # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5                     # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5                       # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> # Earlier we said
> # for (num in seq(-5,5)) {               # for each number in [-5, 5]
> #   cat(num, "squared is", num^2, "\n")  # print the number
> # }
> # a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
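As one more vectorized alternative, the earlier for-loop output can be produced with sprintf(), which formats whole vectors at once. This is a sketch; nums and lines are illustrative names, and as.integer() is used so that the %d format receives integers.

```r
nums <- -5:5

# sprintf() is vectorized: one formatted string per element of nums
lines <- sprintf("%d squared is %d", nums, as.integer(nums^2))

writeLines(lines)   # print each string on its own line
```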


Exercise 5

1. Use a loop to get the class() of each column in the iris data set.

2. Use the results from step 1 to select the numeric columns.

3. Use a loop to calculate the mean of each numeric column in the iris data.


Exercise 5 prototype

1. Use a loop to get the class() of each column in the iris data set.

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. Use the results from step 1 to select the numeric columns.

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. Use a loop to calculate the mean of each numeric column in the iris data.

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
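For comparison, the three steps can be written without explicit loops. This is a sketch of the vectorized equivalent using sapply(), not part of the exercise solution; it reuses the names from the prototype.

```r
# step 1: class of each column, as a named character vector
classes <- sapply(iris, class)

# step 2: keep only the numeric columns
iris.num <- iris[, classes == "numeric"]

# step 3: mean of each numeric column
iris.means <- sapply(iris.num, mean)

print(iris.means)
```

colMeans(iris.num) gives the same result for step 3.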

Page 45: Introduction to R Programming

The S3 object class system

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplimental)

Introduction To Programming In RLast updated November 20 2013 45

71

The S3 object class system

The S3 object class system

R has two major object systemsRelatively informal S3 classesStricter more formal S4 classesWe will cover only the S3 system not the S4 systemBasic idea functions have different methods for different types of objects

Introduction To Programming In RLast updated November 20 2013 46

71

The S3 object class system

Object class

The class of an object can be retrieved and modified using the class()functiongt x lt- 110gt class(x)[1] integergt class(x) lt- foogt class(x)[1] foo

Objects are not limited to a single class and can have many classes

gt class(x) lt- c(A B)gt class(x)[1] A B

Introduction To Programming In RLast updated November 20 2013 47

71

The S3 object class system

Function methods

Functions can have many methods allowing us to have (eg) one plot()function that does different things depending on what is being plotted()Methods can only be defined for generic functions plot print summarymean and several others are already generic

gt see what methods have been defined for the mean functiongt methods(mean)[1] meanDate meandefault meandifftime meanPOSIXct[5] meanPOSIXltgt which functions have methods for dataframesgt methods(class=dataframe)[19][1] aggregatedataframe anyDuplicateddataframe[3] asdataframedataframe aslistdataframe[5] asmatrixdataframe bydataframe[7] cbinddataframe [lt-dataframe[9] [dataframe

Introduction To Programming In RLast updated November 20 2013 48

71

The S3 object class system

Creating new function methods

To create a new method for a function that is already generic all you have todo is name your function functionclass

gt create a mean() method for objects of class foogt meanfoo lt- function(x) mean method for foo class+ if(isnumeric(x)) + cat(The average is meandefault(x))+ return(invisible(meandefault(x))) use meandefault for numeric+ else+ cat(x is not numeric n) otherwise say x not numericgtgt x lt- 110gt mean(x)[1] 55gt class(x) lt- foogt mean(x)The average is 55gtgt x lt- ascharacter(x)gt class(x) lt- foogt mean(x)x is not numeric

Introduction To Programming In RLast updated November 20 2013 49

71

The S3 object class system

Creating generic functions

S3 generics are most often used for print summary and plot methods butsometimes you may want to create a new generic function

gt create a generic disp() functiongt disp lt- function(x ) + UseMethod(disp)+ gtgt create a disp method for class matrixgt dispmatrix lt- function(x) + print(round(x digits=2))+ gtgt test it outgt disp(matrix(runif(10) ncol=2))

[1] [2][1] 078 021[2] 085 045[3] 031 034[4] 047 080[5] 051 066

Introduction To Programming In RLast updated November 20 2013 50

71

The S3 object class system

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have a class, and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class "statsum"
3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  # is.data.frame() also accepts data frames that carry extra classes
  if(!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd() here, not mean()
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2 Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd() here, not mean()
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))   # put the new class first
  return(R)
}

str(statsum(iris))

3 Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
}

statsum(iris)

Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will cover only a few; for a more comprehensive discussion, please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
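One detail worth knowing when using all.equal() in a condition: when the values differ it returns a character description rather than FALSE. A minimal sketch of the usual pattern, wrapping the call in isTRUE(); the variable names a and b are only for illustration.

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # FALSE: tiny representation difference
isTRUE(all.equal(a, b))      # TRUE: equal within a small tolerance

# all.equal() returns a description, not FALSE, when values differ
all.equal(1, 1.1)
isTRUE(all.equal(1, 1.1))    # FALSE, so this form is safe inside if()
```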

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
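The two solutions can be sketched on the same vector as follows; x_complete is just an illustrative name.

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)        # drop the NA before averaging
sum(is.na(x))                # how many values are missing
which(is.na(x))              # where the missing values are
x_complete <- x[!is.na(x)]   # keep only the observed values
length(x_complete)
```

Testing with x == NA would not work here, since any comparison with NA is itself NA; is.na() is the reliable test.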

Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes:

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
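To make the comparison rule concrete: when one side is a character string, the other side is converted to a string too, and the comparison is done on text. A small sketch with illustration values:

```r
1 > "a"                              # compared as "1" > "a": FALSE
"10" < "9"                           # TRUE: text comparison, "1" sorts before "9"
as.numeric("10") < as.numeric("9")   # FALSE: convert first for a numeric comparison

# Logical values become 0/1 when combined with numbers
c(TRUE, FALSE, 2)
sum(c(TRUE, TRUE, FALSE))            # counts the TRUE values
```

Converting explicitly with as.numeric() before comparing is one way to avoid relying on the automatic rule.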

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
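The difference comes from the argument lists: mean() treats only its first argument, x, as data, while sum() adds up everything passed through "...". A sketch of the safe habit, which is to wrap the values in a single vector with c():

```r
mean(1, 2, 3, 4, 5)        # only the first argument, 1, is treated as data
mean(c(1, 2, 3, 4, 5))     # all five values are averaged
sum(1, 2, 3, 4, 5)         # every argument is summed
sum(c(1, 2, 3, 4, 5))      # same result with a single vector
```

Passing one vector works the same way for both functions, so it avoids having to remember which signature each one uses.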

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
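A factor stores integer codes that point into its vector of levels, which is why as.numeric() returns 2 2 1 1 here. The sketch below shows the two conversions side by side, plus an equivalent form that converts the levels once and then indexes them by the codes.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                 # the internal level codes
as.numeric(as.character(x))   # the original values
as.numeric(levels(x))[x]      # same values, converting only the levels
```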

Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback

Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!
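A minimal sketch of the two loop types described above; the numbers and variable names are arbitrary illustration values.

```r
# for: run the body once per element of a fixed set
total <- 0
for (value in c(2, 4, 6)) {
  total <- total + value
}
total

# while: repeat the body as long as the condition is TRUE
n <- 1
while (n < 100) {
  n <- n * 2
}
n
```

The for loop always runs exactly three times here, while the number of passes through the while loop depends on when n first reaches 100 or more.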

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a ", roll, "\n") # print the result
+ } # end the loop
We rolled a  6
We rolled a  10
We rolled a  9
We rolled a  7
We rolled a  10
We rolled a  5
We rolled a  9
We rolled a  12
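The same simulation can also be written with repeat and break, which checks the stopping condition after each roll instead of before it. This is only a sketch: the counter n_rolls is added for illustration, and the particular rolls depend on the random number generator of your R version, so no output is shown.

```r
set.seed(15)                 # reproducible within one R version
dice <- 1:6
n_rolls <- 0                 # illustrative counter, not in the original

repeat {
  roll <- sum(sample(dice, 2, replace = TRUE))  # two independent dice
  n_rolls <- n_rolls + 1
  if (roll == 12) break      # stop once we roll two sixes
}

cat("Rolled two sixes after", n_rolls, "rolls\n")
```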

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
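When the number of results is known in advance, a common variation (shown here as a sketch, not as the workshop's own code) is to create the list at its final length before the loop, so R does not have to grow it on every pass. The name n is illustrative.

```r
n <- 5
Result <- vector("list", n)   # a list with n empty slots
for (i in seq_len(n)) {       # seq_len(n) is empty when n is 0, unlike 1:n
  Result[[i]] <- 1:i          # fill slot i with the sequence 1 to i
}
sapply(Result, length)        # length of each stored sequence
```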

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
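As a further sketch of the same point, here is a loop and its vectorized equivalent side by side, followed by logical indexing, which replaces a loop with an if() inside it. The variable names are illustrative.

```r
x <- 1:5

# Loop version: square each element one at a time
squares_loop <- numeric(length(x))
for (i in seq_along(x)) squares_loop[i] <- x[i]^2

# Vectorized version: the arithmetic applies to every element at once
squares_vec <- x^2
identical(squares_loop, squares_vec)

# Logical indexing instead of looping with if()
x[x %% 2 == 0]               # the even values
```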

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
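For comparison, the three steps can also be written without explicit loops using sapply(), in line with the earlier advice not to overuse loops. This is a sketch of one alternative, not the workshop's prototype; the object names use underscores to keep them distinct from the ones above.

```r
# Step 1: class of each column, without a loop
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris_num <- iris[, classes == "numeric"]

# Step 3: mean of each numeric column, without a loop
iris_means <- sapply(iris_num, mean)
iris_means
```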

Page 46: Introduction to R Programming

The S3 object class system

The S3 object class system

  • R has two major object systems:
      • Relatively informal "S3" classes
      • Stricter, more formal "S4" classes
  • We will cover only the S3 system, not the S4 system
  • Basic idea: functions have different methods for different types of objects

Object class

The class of an object can be retrieved and modified using the class() function:

> x <- 1:10
> class(x)
[1] "integer"
> class(x) <- "foo"
> class(x)
[1] "foo"

Objects are not limited to a single class, and can have many classes:

> class(x) <- c("A", "B")
> class(x)
[1] "A" "B"
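When an object has several classes, R looks for a method for each class in order, from first to last. The sketch below assumes a hypothetical print method for class "B" only; because there is no print.A, printing an object of class c("A", "B") falls through to print.B.

```r
x <- 1:10
class(x) <- c("A", "B")

# Hypothetical method, defined for class "B" only
print.B <- function(x, ...) cat("print method for class B\n")

inherits(x, "B")                  # TRUE: "B" is one of the classes of x
out <- capture.output(print(x))   # no print.A exists, so print.B is used
out
```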

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date     mean.default  mean.difftime mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for data.frames?
> methods(class="data.frame")[1:9]
[1] "aggregate.data.frame"     "anyDuplicated.data.frame"
[3] "as.data.frame.data.frame" "as.list.data.frame"
[5] "as.matrix.data.frame"     "by.data.frame"
[7] "cbind.data.frame"         "[<-.data.frame"
[9] "[.data.frame"

Page 47: Introduction to R Programming


Introduction To Programming In RLast updated November 20 2013 65

71

Loops (supplimental)

Looping for-loop examples

For each value in a vector print the number and its square

gt For-loop examplegt for (num in seq(-55)) for each number in [-5 5]+ cat(num squared is num^2 n) print the number+ -5 squared is 25-4 squared is 16-3 squared is 9-2 squared is 4-1 squared is 10 squared is 01 squared is 12 squared is 43 squared is 94 squared is 165 squared is 25

Introduction To Programming In RLast updated November 20 2013 66

71

Loops (supplimental)

Looping while-loop example

Goal simulate rolling two dice until we roll two sixes

gt While-loop example rolling dicegt setseed(15) allows repoducible sample() resultsgt dice lt- seq(16) set dice = [1 2 3 4 5 6]gt roll lt- 0 set roll = 0gt while (roll lt 12) + roll lt- sample(dice1) + sample(dice1) calculate sum of two rolls+ cat(We rolled a roll n) print the result+ end the loopWe rolled a 6We rolled a 10We rolled a 9We rolled a 7We rolled a 10We rolled a 5We rolled a 9We rolled a 12

Introduction To Programming In RLast updated November 20 2013 67

71

Loops (supplimental)

Using loops to fill in lists

Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing

gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1

[[2]][1] 1 2

[[3]][1] 1 2 3

[[4]][1] 1 2 3 4

[[5]][1] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 68

71

Loops (supplimental)

Word of caution donrsquot overuse loops

Most operations in R are vectorized ndash This makes loops unnecessary in manycases

Use vector arithmatic instead of loops

gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10

Use paste instead of loops

gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25

Loops are handy but save them for when you really need themIntroduction To Programming In R

Last updated November 20 2013 69 71

Loops (supplimental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data

Introduction To Programming In RLast updated November 20 2013 70

71

Loops (supplimental)

Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set

gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species

numeric numeric numeric numeric factor

1 use the results from step 1 to select the numeric columns

gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)

SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02

1 use a loop to calculate the mean of each numeric column in the iris data

gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth

5843333 3057333 3758000 1199333

Introduction To Programming In RLast updated November 20 2013 71

71

Page 48: Introduction to R Programming

The S3 object class system

Function methods

  • Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
  • Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.

    > # see what methods have been defined for the mean function
    > methods(mean)
    [1] mean.Date     mean.default  mean.difftime mean.POSIXct
    [5] mean.POSIXlt
    > # which functions have methods for data.frames?
    > methods(class="data.frame")[1:9]
    [1] "aggregate.data.frame"     "anyDuplicated.data.frame"
    [3] "as.data.frame.data.frame" "as.list.data.frame"
    [5] "as.matrix.data.frame"     "by.data.frame"
    [7] "cbind.data.frame"         "[<-.data.frame"
    [9] "[.data.frame"

Creating new function methods

To create a new method for a function that is already generic, all you have to do is name your function function.class

    > # create a mean() method for objects of class "foo":
    > mean.foo <- function(x) { # mean method for "foo" class
    +   if(is.numeric(x)) {
    +     cat("The average is", mean.default(x))
    +     return(invisible(mean.default(x))) # use mean.default for numeric
    +   } else
    +     cat("x is not numeric \n")} # otherwise say x not numeric
    >
    > x <- 1:10
    > mean(x)
    [1] 5.5
    > class(x) <- "foo"
    > mean(x)
    The average is 5.5
    >
    > x <- as.character(x)
    > class(x) <- "foo"
    > mean(x)
    x is not numeric

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function

    > # create a generic disp() function
    > disp <- function(x, ...) {
    +   UseMethod("disp")
    + }
    >
    > # create a disp method for class "matrix"
    > disp.matrix <- function(x) {
    +   print(round(x, digits=2))
    + }
    >
    > # test it out
    > disp(matrix(runif(10), ncol=2))
         [,1] [,2]
    [1,] 0.78 0.21
    [2,] 0.85 0.45
    [3,] 0.31 0.34
    [4,] 0.47 0.80
    [5,] 0.51 0.66
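A generic with only one method fails for every other class. The sketch below adds a fallback method, disp.default, which is not in the workshop slides and is only one possible design; the test matrix and messages are illustrative.

```r
# Generic: dispatches on the class of x
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices, as in the example above
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
  invisible(x)
}

# Fallback used when no class-specific method exists (an addition, not from the slides)
disp.default <- function(x, ...) {
  cat("No disp() method for class", class(x)[1], "- using print()\n")
  print(x)
  invisible(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 1.234), ncol = 2)
disp(m)              # dispatches to disp.matrix
disp(letters[1:3])   # dispatches to disp.default
```

Returning the argument invisibly lets the result be reused without printing it a second time.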

S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by dot followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it
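The key points above can be tied together in one short sketch. The class name "celsius" and the sample values are hypothetical, chosen only for illustration; print() and summary() are existing generics.

```r
# Hypothetical class "celsius": a numeric vector tagged with a class attribute
temps <- c(21.5, 19.0, 23.25)
class(temps) <- "celsius"            # class set by simple assignment

# New method for an existing generic, named <generic>.<class>
print.celsius <- function(x, ...) {
  cat(paste(format(unclass(x)), "C"), sep = "\n")
  invisible(x)                       # return x without printing it again
}

# summary() is also generic, so it can get a method the same way
summary.celsius <- function(object, ...) {
  c(min = min(unclass(object)), max = max(unclass(object)))
}

print(temps)                  # dispatches to print.celsius
summary(temps)                # dispatches to summary.celsius
inherits(temps, "celsius")    # TRUE
```

unclass() strips the class attribute so the base methods operate on the plain numeric vector.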

Exercise 4

1 Modify your function so that it also returns the standard deviations of the numeric variables
2 Modify your function so that it returns a list of class statsum
3 Write a print method for the statsum class

Exercise 4 prototype

1 Modify your function so that it also returns the standard deviations of the numeric variables

    statsum <- function(df) {
      if(!is.data.frame(df)) stop("df must be a data.frame")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd) # sd(), not mean()
      counts <- sapply(df[classes == "factor"], table)
      return(list(cbind(means, sds), counts))
    }

    statsum(iris)

2 Modify your function so that it returns a list of class statsum

    statsum <- function(df) {
      if(!is.data.frame(df)) stop("df must be a data.frame")
      classes <- sapply(df, class)
      means <- sapply(df[classes == "numeric"], mean)
      sds <- sapply(df[classes == "numeric"], sd) # sd(), not mean()
      counts <- sapply(df[classes == "factor"], table)
      R <- list(cbind(means, sds), counts)
      class(R) <- c("statsum", class(R))
      return(R)
    }

    str(statsum(iris))

3 Write a print method for the statsum class

    print.statsum <- function(x, ...) {
      cat("Numeric variable descriptive statistics\n")
      print(x[[1]], digits=2)
      cat("Factor variable counts\n")
      print(x[[2]])
    }

    statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few. For a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

    > .1 == .3/3
    [1] FALSE

Solution: use all.equal():

    > all.equal(.1, .3/3)
    [1] TRUE
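One practical detail: when two values differ, all.equal() returns a character description rather than FALSE, so it is usually wrapped in isTRUE() inside conditions. A minimal sketch follows; the helper near() and its tolerance are hypothetical, shown only to make the idea of a tolerance explicit.

```r
a <- 0.1
b <- 0.3 / 3

a == b                       # FALSE: exact comparison of doubles
isTRUE(all.equal(a, b))      # TRUE: equal within the default tolerance

# all.equal() does not return FALSE on a mismatch, so wrap it in isTRUE()
isTRUE(all.equal(1, 1.1))    # FALSE

# A hand-rolled comparison with an explicit, hypothetical tolerance
near <- function(x, y, tol = 1e-8) abs(x - y) < tol
near(a, b)                   # TRUE
```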

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown:

    > x <- c(1:10, NA, 12:20)
    > c(mean(x), sd(x), median(x), min(x), sd(x))
    [1] NA NA NA NA NA

NA is not equal to anything, not even NA

    > NA == NA
    [1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing
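Both solutions can be tried on the same vector. This is a small sketch using only base R functions.

```r
x <- c(1:10, NA, 12:20)

mean(x)                  # NA: the missing value propagates
mean(x, na.rm = TRUE)    # mean of the 19 observed values
sum(is.na(x))            # 1: number of missing values
which(is.na(x))          # 11: position of the missing value

x == NA                  # all NA; == cannot be used to test for missingness
x[!is.na(x)]             # keep only the observed values
```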

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

    > # combining values coerces them to the most general type
    > (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
    [1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
    > str(x)
     chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
    >
    > # comparisons convert arguments to most general type
    > 1 > "a"
    [1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
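The comparison above is FALSE because the number is converted to text and the two strings are then compared as text. The following sketch shows the same rule on digits, where the result is easier to predict, and one way to make conversions explicit.

```r
# Numbers are converted to strings before comparison with a string
1 > "a"          # FALSE: compares "1" with "a" as text
10 < 9           # FALSE: numeric comparison
"10" < "9"       # TRUE: text comparison, "1" sorts before "9"
10 < "9"         # TRUE: 10 becomes "10", then text comparison

# Converting explicitly makes a failed conversion visible as NA
suppressWarnings(as.numeric("a"))    # NA (a warning is raised if not suppressed)
as.numeric("10") < as.numeric("9")   # FALSE
```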

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

    > mean(1, 2, 3, 4, 5)*5
    [1] 5
    > sum(1, 2, 3, 4, 5)
    [1] 15

Why are these different?

    > args(mean)
    function (x, ...)
    NULL
    > args(sum)
    function (..., na.rm = FALSE)
    NULL

Ouch. That is not nice at all!
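The difference comes from the argument lists: mean() takes its data in the single argument x, while sum() collects all of its arguments through the dots. A short sketch of the safe habit, passing one vector:

```r
mean(1, 2, 3, 4, 5)        # 1: only the first argument is treated as the data
mean(c(1, 2, 3, 4, 5))     # 3: pass the values as a single vector
sum(1, 2, 3, 4, 5)         # 15: sum() accepts values through ...
sum(c(1, 2, 3, 4, 5))      # 15: a single vector works too

names(formals(mean))       # "x" "..."
```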

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

    > (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
    [1] 5 5 6 6
    Levels: 6 5
    >
    > str(x)
     Factor w/ 2 levels "6","5": 2 2 1 1
    >
    > as.character(x)
    [1] "5" "5" "6" "6"
    > # here is where people sometimes get lost...
    > as.numeric(x)
    [1] 2 2 1 1
    > # you probably want
    > as.numeric(as.character(x))
    [1] 5 5 6 6
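A factor stores integer codes plus a vector of text labels, which is why as.numeric() returns the codes. The sketch below shows both pieces, along with an alternative conversion that goes through levels(); it is offered as an option, not as the workshop's recommendation.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)                  # 2 2 1 1: internal level codes
levels(x)                      # "6" "5": the labels, stored as text

as.numeric(as.character(x))    # 5 5 6 6
as.numeric(levels(x))[x]       # 5 5 6 6: converts each level once, then indexes
```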


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you: tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

    > # For-loop example
    > for (num in seq(-5,5)) { # for each number in [-5, 5]
    +   cat(num, "squared is", num^2, "\n") # print the number
    + }
    -5 squared is 25
    -4 squared is 16
    -3 squared is 9
    -2 squared is 4
    -1 squared is 1
    0 squared is 0
    1 squared is 1
    2 squared is 4
    3 squared is 9
    4 squared is 16
    5 squared is 25

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

    > ## While-loop example: rolling dice
    > set.seed(15) # allows reproducible sample() results
    > dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
    > roll <- 0 # set roll = 0
    > while (roll < 12) {
    +   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
    +   cat("We rolled a", roll, "\n") # print the result
    + } # end the loop
    We rolled a 6
    We rolled a 10
    We rolled a 9
    We rolled a 7
    We rolled a 10
    We rolled a 5
    We rolled a 9
    We rolled a 12
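The same simulation can be written with repeat and break, which places the stopping test after the roll instead of before it. This is a sketch only; rolls_until() and its arguments are hypothetical names, and the count returned depends on the seed and on the R version's random number generator.

```r
# Hypothetical helper: count rolls of two dice until the sum reaches `target`
rolls_until <- function(target = 12, sides = 6) {
  n <- 0
  repeat {
    n <- n + 1
    roll <- sum(sample(seq_len(sides), 2, replace = TRUE))
    if (roll >= target) break    # stop once the target is reached
  }
  n
}

set.seed(15)
rolls_until()   # number of rolls needed to get two sixes
```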

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

    > ## save calculations done in a loop
    > Result <- list() # create an object to store the results
    > for (i in 1:5) { # for each i in [1, 5]
    +   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
    + }
    > Result # print Result
    [[1]]
    [1] 1

    [[2]]
    [1] 1 2

    [[3]]
    [1] 1 2 3

    [[4]]
    [1] 1 2 3 4

    [[5]]
    [1] 1 2 3 4 5
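When the number of results is known in advance, the container can be created at its final length before the loop starts, which avoids growing it on every iteration. A minimal sketch of that variant, with an lapply() equivalent for comparison:

```r
n <- 5
Result <- vector("list", n)      # preallocate a list of length n
for (i in seq_len(n)) {          # seq_len(n) is safe even when n is 0
  Result[[i]] <- 1:i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
identical(Result, Result2)       # TRUE
```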

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

    > x <- c() # create vector x
    > for(i in 1:5) x[i] <- i+i # double a vector using a loop
    > print(x) # print the result
    [1]  2  4  6  8 10
    >
    > 1:5 + 1:5 # double a vector without a loop
    [1]  2  4  6  8 10
    > 1:5 + 5 # shorter vectors are recycled
    [1]  6  7  8  9 10

Use paste instead of loops:

    > ## Earlier we said
    > ## for (num in seq(-5,5)) { # for each number in [-5, 5]
    > ##   cat(num, "squared is", num^2, "\n") # print the number
    > ## }
    > ## a better way:
    > paste(1:5, "squared =", (1:5)^2)
    [1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
    [4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
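To make the comparison concrete, the sketch below computes the same squares with a loop and with a single vectorized expression, then formats them with sprintf(), which is vectorized like paste(). The variable names are illustrative.

```r
x <- 1:5

# Loop version: fill a preallocated vector element by element
squares_loop <- numeric(length(x))
for (i in seq_along(x)) squares_loop[i] <- x[i]^2

# Vectorized version: one expression operates on the whole vector
squares_vec <- x^2

all.equal(squares_loop, squares_vec)    # TRUE

# sprintf() is vectorized as well and gives control over the formatting
sprintf("%d squared = %d", x, as.integer(x^2))
```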

Exercise 5

1 use a loop to get the class() of each column in the iris data set
2 use the results from step 1 to select the numeric columns
3 use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1 use a loop to get the class() of each column in the iris data set

    > classes <- c()
    > for (name in names(iris)) {
    +   classes[name] <- class(iris[[name]])
    + }
    >
    > classes
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
       "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2 use the results from step 1 to select the numeric columns

    > iris.num <- iris[, names(classes)[classes=="numeric"]]
    > head(iris.num, 2)
      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1          5.1         3.5          1.4         0.2
    2          4.9         3.0          1.4         0.2

3 use a loop to calculate the mean of each numeric column in the iris data

    > iris.means <- c()
    > for(var in names(iris.num)) {
    +   iris.means[var] <- mean(iris[[var]])
    + }
    >
    > iris.means
    Sepal.Length  Sepal.Width Petal.Length  Petal.Width
        5.843333     3.057333     3.758000     1.199333
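For comparison, the three steps can also be done without explicit loops. This is a sketch of an alternative to the loop-based prototype, not a replacement for the exercise.

```r
classes <- sapply(iris, class)                 # class of every column
iris.num <- iris[, classes == "numeric"]       # keep the numeric columns
iris.means <- sapply(iris.num, mean)           # mean of each numeric column
iris.means

# colMeans() gives the same values for an all-numeric data frame
all.equal(iris.means, colMeans(iris.num))      # TRUE
```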

Page 50: Introduction to R Programming

The S3 object class system

Creating generic functions

S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.

> # create a generic disp() function
> disp <- function(x, ...) {
+   UseMethod("disp")
+ }
>
> # create a disp method for class matrix
> disp.matrix <- function(x) {
+   print(round(x, digits=2))
+ }
>
> # test it out
> disp(matrix(runif(10), ncol=2))
     [,1] [,2]
[1,] 0.78 0.21
[2,] 0.85 0.45
[3,] 0.31 0.34
[4,] 0.47 0.80
[5,] 0.51 0.66
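To illustrate how dispatch falls back when no class-specific method exists, here is a minimal sketch that gives the generic a default method. The `disp.default` method and the sample inputs are illustrative assumptions, not part of the workshop code.

```r
# Generic: dispatches on the class of x
disp <- function(x, ...) {
  UseMethod("disp")
}

# Method for matrices: print rounded to 2 digits
disp.matrix <- function(x, ...) {
  print(round(x, digits = 2))
}

# Fallback for any class without its own method (illustrative assumption)
disp.default <- function(x, ...) {
  cat("No special display for class:", class(x)[1], "\n")
  invisible(x)
}

m <- matrix(c(0.123, 0.456, 0.789, 0.987), ncol = 2)
disp(m)        # dispatches to disp.matrix
disp("hello")  # dispatches to disp.default
```

Accepting `...` in every method keeps the method signatures compatible with the generic.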


S3 classes summary

Key points:
  • there are several class systems in R, of which S3 is the oldest and simplest
  • objects have class, and functions have corresponding methods
  • the class of an object can be set by simple assignment
  • S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
  • new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function, followed by a dot, followed by the name of the class

Functions introduced in this section:
  • plot: creates a graphical display, the type of which depends on the class of the object being plotted
  • methods: lists the methods defined for a function or class
  • UseMethod: the body of a generic function
  • invisible: returns an object but does not print it
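The key points above can be sketched in a few lines. The class name `person` and its fields are made up for illustration and do not come from the workshop materials.

```r
# Create a plain list and set its class by simple assignment
p <- list(name = "Ada", age = 36)
class(p) <- "person"   # "person" is a made-up class name

# Method naming scheme: generic name, dot, class name
print.person <- function(x, ...) {
  cat("Person:", x$name, "aged", x$age, "\n")
  invisible(x)   # return the object without printing it again
}

print(p)                # dispatches to print.person
inherits(p, "person")   # TRUE
```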


Exercise 4

1. Modify your function so that it also returns the standard deviations of the numeric variables
2. Modify your function so that it returns a list of class "statsum"
3. Write a print method for the statsum class


Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  # stop early if the input is not a data.frame
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)   # sd, not mean
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}

statsum(iris)

2. Modify your function so that it returns a list of class "statsum"

statsum <- function(df) {
  if(!is.data.frame(df)) stop("df must be a data.frame!")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))   # prepend the new class
  return(R)
}

str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}

statsum(iris)


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion, please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf


Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
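One practical detail, sketched below: all.equal() returns a character description rather than FALSE when the values differ, so it is usually wrapped in isTRUE() before being used as a condition. The variable names are illustrative.

```r
a <- 0.1
b <- 0.3 / 3

a == b                    # FALSE with standard double-precision arithmetic
isTRUE(all.equal(a, b))   # TRUE: equal within a small tolerance

# all.equal() describes the difference instead of returning FALSE,
# so wrap it in isTRUE() when it controls an if statement
if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal within tolerance\n")
}
```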


Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na to test for missing values.
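Both solutions can be sketched on the same vector used above:

```r
x <- c(1:10, NA, 12:20)

mean(x)                 # NA: the missing value propagates
mean(x, na.rm = TRUE)   # mean of the 19 observed values
sum(is.na(x))           # number of missing values: 1
x[!is.na(x)]            # keep only the observed values
which(x == NA)          # integer(0): comparing with == never finds NA
```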


Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
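Since R gives no warning here, one option is to check types explicitly before comparing. The sketch below uses a hypothetical helper, `safe_gt`, which is not part of R or of the workshop materials.

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")
class(x)                                  # "character"

# Converting back is lossy: anything that is not a number becomes NA
nums <- suppressWarnings(as.numeric(x))
nums                                      # NA NA 1 2 NA NA

# Hypothetical helper: fail loudly on mixed types instead of coercing
safe_gt <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a > b
}
safe_gt(2, 1)                             # TRUE
```

Calling `safe_gt(1, "a")` stops with an error rather than silently returning FALSE.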


Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
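A short sketch of the usual workaround: pass the data to mean() as a single vector. With separate arguments, only the first value is treated as the data, and the remaining values are matched to other arguments of the default method.

```r
vals <- c(1, 2, 3, 4, 5)

mean(1, 2, 3, 4, 5)   # 1: only the first argument is treated as the data
mean(vals)            # 3: the whole vector is passed as x
sum(1, 2, 3, 4, 5)    # 15: sum() adds up everything passed through ...
sum(vals)             # 15
```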


Trouble with Factors

Factors sometimes behave as numbers and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
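An equivalent conversion, shown here as a sketch, converts the levels once and then indexes them by the factor's integer codes:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                 # 2 2 1 1: the internal integer codes
as.numeric(as.character(x))   # 5 5 6 6: the labels converted to numbers
as.numeric(levels(x))[x]      # 5 5 6 6: same result, converting each level once
```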


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc


Feedback

Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again.
  • A for loop runs the code a fixed number of times, or on a fixed set of objects.
  • A while loop runs the code until a condition is met.
  • If you're typing the same commands over and over again, you might want to use a loop!


Looping: for-loop examples

For each value in a vector, print the number and its square:

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25


Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12


Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
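When the number of results is known in advance, the list can be created at its final length before the loop runs. The sketch below shows that variant next to an lapply() version that builds the same list; the names `n` and `Result2` are illustrative.

```r
n <- 5

# Pre-allocate the list, then fill it by position
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# The same result without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

identical(Result, Result2)   # TRUE
```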


Word of caution: don't overuse loops!

Most operations in R are vectorized, which makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
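As a further sketch of the same idea, sprintf() is also vectorized and can reproduce the earlier for-loop output in one call. The variable names are illustrative.

```r
nums <- -5:5

nums^2   # squares of every element at once, no loop needed

# sprintf() formats element by element, returning one string per value
msgs <- sprintf("%d squared is %d", nums, as.integer(nums^2))
cat(msgs, sep = "\n")
```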


Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
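For comparison, the same three steps can be written without explicit loops using sapply(). This is an alternative sketch, not the exercise's intended loop-based answer.

```r
classes <- sapply(iris, class)              # class of each column
iris.num <- iris[, classes == "numeric"]    # keep the numeric columns
iris.means <- sapply(iris.num, mean)        # mean of each numeric column
iris.means
```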

Page 53: Introduction to R Programming

The S3 object class system

Exercise 4 prototype

1. Modify your function so that it also returns the standard deviations of the numeric variables

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd) # sd, not mean, for the standard deviations
  counts <- sapply(df[classes == "factor"], table)
  return(list(cbind(means, sds), counts))
}
statsum(iris)

2. Modify your function so that it returns a list of class statsum

statsum <- function(df) {
  if(class(df) != "data.frame") stop("df must be a data.frame")
  classes <- sapply(df, class)
  means <- sapply(df[classes == "numeric"], mean)
  sds <- sapply(df[classes == "numeric"], sd)
  counts <- sapply(df[classes == "factor"], table)
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R)) # prepend the new class
  return(R)
}
str(statsum(iris))

3. Write a print method for the statsum class

print.statsum <- function(x) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits=2)
  cat("Factor variable counts:\n")
  print(x[[2]])
}
statsum(iris)
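The dispatch mechanism used by the print method in Exercise 4 can be seen in isolation with a smaller object. The sketch below is only an illustration: the constructor `new_summary` and the class name "mysummary" are made up for this example and are not part of the workshop materials. It also follows the common convention of accepting `...` and returning the object invisibly from a print method.

```r
# Hypothetical constructor (illustrative name): summarises a numeric vector
new_summary <- function(v) {
  stopifnot(is.numeric(v))
  out <- list(mean = mean(v), sd = sd(v), n = length(v))
  class(out) <- c("mysummary", class(out))  # prepend the S3 class
  out
}

# print() dispatches here for any object whose class includes "mysummary"
print.mysummary <- function(x, ...) {
  cat("n =", x$n, " mean =", format(x$mean, digits = 3),
      " sd =", format(x$sd, digits = 3), "\n")
  invisible(x)  # print methods conventionally return their argument invisibly
}

s <- new_summary(c(1, 2, 3, 4, 5))
print(s)                   # uses print.mysummary
inherits(s, "mysummary")   # TRUE
```

Typing `s` at the console triggers the same method, because auto-printing calls `print()` on the value.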


Things that may surprise you

Gotchas

  • There are an unfortunately large number of surprises in R programming
  • Some of these "gotchas" are common problems in other languages; many are unique to R
  • We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

Floating point comparison

Floating point arithmetic is not exact:

> .1 == .3/3
[1] FALSE

Solution: use all.equal():

> all.equal(.1, .3/3)
[1] TRUE
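One detail worth spelling out: `all.equal()` returns `TRUE` or a character description of the difference, so inside `if()` it is usually wrapped in `isTRUE()`. The sketch below shows that pattern plus a hand-written, element-wise tolerance check; the helper name `near` and its default tolerance are assumptions made for this example.

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # FALSE: exact comparison of doubles
isTRUE(all.equal(a, b))    # TRUE: comparison within a tolerance

# Hypothetical element-wise helper (illustrative name and tolerance)
near <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}
near(c(0.1, 0.5), c(0.3 / 3, 0.6))   # TRUE FALSE

if (isTRUE(all.equal(a, b))) {
  cat("a and b are equal up to numerical tolerance\n")
}
```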

Missing values

R does not exclude missing values by default; a single missing value in a vector means that many things are unknown:

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values
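A short sketch of the two solutions named above, using the same vector `x`; the name `x_complete` is introduced only for this illustration.

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)   # drop the NA before averaging
sum(is.na(x))           # how many values are missing: 1
which(is.na(x))         # where they are: position 11

# Keep only the observed values (illustrative variable name)
x_complete <- x[!is.na(x)]
length(x_complete)      # 19
```

Note that `x == NA` would not work as a test here: it returns `NA` for every element, which is why `is.na()` exists.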

Automatic type conversion

Automatic type conversion happens a lot, which is often useful, but makes it easy to miss mistakes

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect... I would like to at least get a warning!
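Since R gives no warning for comparisons like `1 > "a"`, one defensive option is to check types explicitly before comparing. The following is a minimal sketch; the helper `safe_gt` is a hypothetical name invented for this example, not a base R function.

```r
x <- c(TRUE, FALSE, 1, 2, "a", "b")
class(x)                  # "character": everything became a string

c(TRUE, FALSE, 1, 2)      # logical + numeric -> numeric: 1 0 1 2

# Hypothetical helper: refuse to compare anything that is not numeric
safe_gt <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a > b
}

safe_gt(1, 0)             # TRUE

# With a character argument the check fails instead of silently coercing
res <- tryCatch(safe_gt(1, "a"), error = function(e) "error")
res                       # "error"

# Explicit conversion at least produces a warning and an NA
suppressWarnings(as.numeric("a"))   # NA
```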

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?!?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
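The difference comes from the argument lists shown above: `mean()` averages only its first argument `x`, while `sum()` adds up everything passed through `...`. A small sketch of the safe habit, which is to wrap the values in a single vector with `c()`:

```r
mean(1, 2, 3, 4, 5)        # 1: only the first argument is treated as data
mean(c(1, 2, 3, 4, 5))     # 3: the whole vector is averaged

sum(1, 2, 3, 4, 5)         # 15: all arguments are summed
sum(c(1, 2, 3, 4, 5))      # 15: same result with a single vector
```

Passing one vector works identically for both functions, so it avoids having to remember which ones accept `...` as data.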

Trouble with Factors

Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
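To see why `as.numeric(x)` returns 2 2 1 1, it helps to look at the two pieces a factor is made of: integer codes and a vector of level labels. The sketch below also shows the alternative conversion idiom mentioned in R's help page for `factor`, which converts the levels once and then indexes them by the codes.

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.integer(x)               # internal codes: 2 2 1 1
levels(x)                   # labels the codes point to: "6" "5"

# Convert the labels once, then index by the codes
as.numeric(levels(x))[x]    # 5 5 6 6

# Same result as the two-step conversion
identical(as.numeric(levels(x))[x], as.numeric(as.character(x)))   # TRUE
```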


Additional resources

Additional reading and resources

  • S3 system overview: https://github.com/hadley/devtools/wiki/S3
  • S4 system overview: https://github.com/hadley/devtools/wiki/S4
  • R documentation: http://cran.r-project.org/manuals.html
  • Collection of R tutorials: http://cran.r-project.org/other-docs.html
  • R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
  • Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
  • State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
  • Institute for Quantitative Social Science: http://iq.harvard.edu
  • Research technology consulting: http://projects.iq.harvard.edu/rtc

Feedback

  • Help Us Make This Workshop Better!
  • Please take a moment to fill out a very short feedback form
  • These workshops exist for you; tell us what you need!
  • http://tinyurl.com/RprogrammingFeedback


Loops (supplemental)

Looping

  • A loop is a collection of commands that are run over and over again
  • A for loop runs the code a fixed number of times, or on a fixed set of objects
  • A while loop runs the code until a condition is met
  • If you're typing the same commands over and over again, you might want to use a loop!

Looping: for-loop examples

For each value in a vector, print the number and its square

> # For-loop example
> for (num in seq(-5,5)) {# for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
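The loop above iterates directly over values. When the position is needed as well (for example to look up a name), a common idiom is to loop over `seq_along()`. The sketch below uses a made-up named vector `v`; it also shows why `seq_along(v)` is generally preferred to `1:length(v)` when a vector might be empty.

```r
v <- c(a = 2, b = 4, c = 6)   # illustrative data

# Loop over positions so both the name and the value are available
for (i in seq_along(v)) {
  cat(names(v)[i], "squared is", v[i]^2, "\n")
}

# With an empty vector, seq_along() gives zero iterations...
empty <- numeric(0)
length(seq_along(empty))      # 0

# ...whereas 1:length(empty) is 1:0, i.e. c(1, 0): two unwanted iterations
length(1:length(empty))       # 2
```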

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes

> # While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
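A while loop runs for an unknown number of iterations, so it is often useful to count them and to add a safety cap. The variation below is a sketch of that idea; the counter `n_rolls` and the cap `max_rolls` are names and values chosen for this example. The exact number of rolls depends on the random number generator of the R version in use, so no output is shown.

```r
set.seed(15)          # reproducible within a given R version
dice <- 1:6
roll <- 0
n_rolls <- 0          # how many times we have rolled
max_rolls <- 10000    # assumed safety cap so the loop cannot run forever

while (roll < 12 && n_rolls < max_rolls) {
  roll <- sample(dice, 1) + sample(dice, 1)  # sum of two dice
  n_rolls <- n_rolls + 1
}

cat("Stopped after", n_rolls, "rolls; last roll was", roll, "\n")
```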

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing

> # save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) {# for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
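When the number of results is known in advance, the container can be created at its final length instead of being grown one element at a time. The sketch below shows that variation, and the equivalent `lapply()` call for comparison; the names `n` and `Result2` are introduced only for this example.

```r
n <- 5

# Pre-allocate a list of length n, then fill it by index
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- seq_len(i)   # the sequence 1..i
}

# The same computation without an explicit loop
Result2 <- lapply(seq_len(n), seq_len)

identical(Result, Result2)    # TRUE
length(Result[[5]])           # 5
```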

Word of caution: don't overuse loops!

Most operations in R are vectorized. This makes loops unnecessary in many cases

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) {# for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them!
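A quick way to convince yourself that a vectorized expression really replaces a loop is to compute the result both ways and compare. The sketch below does that for the doubling example, and shows `sprintf()` as another vectorized way to build the strings; the variable names are chosen for this illustration.

```r
x <- 1:5

# Loop version: fill a pre-allocated vector element by element
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorized version: one expression, no loop
doubled_vec <- x * 2

all.equal(doubled_loop, doubled_vec)   # TRUE

# sprintf() is vectorized too: one formatted string per element
sprintf("%d squared = %d", x, x^2)
```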

Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data

Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
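For comparison with the loop-based prototype, the same three steps can be written with `sapply()`, in line with the earlier advice to reserve loops for cases that need them. This is only an alternative sketch, not the workshop's official solution; it reuses the names `classes`, `iris.num`, and `iris.means` from the prototype.

```r
# 1. class of each column, without an explicit loop
classes <- sapply(iris, class)

# 2. keep only the numeric columns
iris.num <- iris[, classes == "numeric"]

# 3. mean of each numeric column
iris.means <- sapply(iris.num, mean)

round(iris.means, 2)
```

For a data frame that contains only numeric columns, `colMeans(iris.num)` gives the same values in a single call.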

Page 56: Introduction to R Programming


Loops (supplimental)

Using loops to fill in lists

Often you will want to store the results from a loop You can create an objectto hold the results generated in the loop and fill in the values using indexing

gt save calculations done in a loopgt Result lt- list() create an object to store the resultsgt for (i in 15) for each i in [1 5]+ Result[[i]] lt- 1i assign the sequence 1 to i to Result+ gt Result print Result[[1]][1] 1

[[2]][1] 1 2

[[3]][1] 1 2 3

[[4]][1] 1 2 3 4

[[5]][1] 1 2 3 4 5

Introduction To Programming In RLast updated November 20 2013 68

71

Loops (supplimental)

Word of caution donrsquot overuse loops

Most operations in R are vectorized ndash This makes loops unnecessary in manycases

Use vector arithmatic instead of loops

gt x lt- c() create vector xgt for(i in 15) x[i] lt- i+i double a vector using a loopgt print(x) print the result[1] 2 4 6 8 10gtgt 15 + 15 double a vector without a loop[1] 2 4 6 8 10gt 15 + 5 shorter vectors are recycled[1] 6 7 8 9 10

Use paste instead of loops

gt Earlier we saidgt for (num in seq(-55)) for each number in [-5 5]gt cat(num squared is num^2 n) print the numbergt gt a better waygt paste(15 squared = (15)^2)[1] 1 squared = 1 2 squared = 4 3 squared = 9[4] 4 squared = 16 5 squared = 25

Loops are handy but save them for when you really need themIntroduction To Programming In R

Last updated November 20 2013 69 71

Loops (supplimental)

Exercise 5

1 use a loop to get the class() of each column in the iris data set2 use the results from step 1 to select the numeric columns3 use a loop to calculate the mean of each numeric column in the iris data

Introduction To Programming In RLast updated November 20 2013 70

71

Loops (supplimental)

Exercise 5 prototype1 use a loop to get the class() of each column in the iris data set

gt classes lt- c()gt for (name in names(iris)) + classes[name] lt- class(iris[[name]])+ gtgt classesSepalLength SepalWidth PetalLength PetalWidth Species

numeric numeric numeric numeric factor

1 use the results from step 1 to select the numeric columns

gt irisnum lt- iris[ names(classes)[classes==numeric]]gt head(irisnum 2)

SepalLength SepalWidth PetalLength PetalWidth1 51 35 14 022 49 30 14 02

1 use a loop to calculate the mean of each numeric column in the iris data

gt irismeans lt- c()gt for(var in names(irisnum)) + irismeans[[var]] lt- mean(iris[[var]])+ gtgt irismeansSepalLength SepalWidth PetalLength PetalWidth

5843333 3057333 3758000 1199333

Introduction To Programming In RLast updated November 20 2013 71

71

  • Workshop overview and materials
  • Data types
  • Extracting and replacing object elements
  • Applying functions to list elements
  • Writing functions
  • Control flow
  • The S3 object class system
  • Things that may surprise you
  • Additional resources
  • Loops (supplimental)
Page 57: Introduction to R Programming

Things that may surprise you

Missing values

R does not exclude missing values by default: a single missing value in a vector means that many things are unknown.

> x <- c(1:10, NA, 12:20)
> c(mean(x), sd(x), median(x), min(x), sd(x))
[1] NA NA NA NA NA

NA is not equal to anything, not even NA:

> NA == NA
[1] NA

Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
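A minimal sketch of the two remedies named on this slide, using the same vector x. The names mean_no_na, n_missing and x_complete are made up for illustration.

```r
# Same vector as on the slide: one missing value among 20 positions.
x <- c(1:10, NA, 12:20)

# Without na.rm the result is NA, because the missing value could be anything.
mean_with_na <- mean(x)

# na.rm = TRUE drops missing values before the calculation.
mean_no_na <- mean(x, na.rm = TRUE)

# x == NA is NA everywhere, so it cannot be used to find missing values.
eq_test <- x == NA

# is.na() is the reliable test for missingness.
n_missing  <- sum(is.na(x))   # count of missing values
x_complete <- x[!is.na(x)]    # keep only the observed values
```

Passing na.rm = TRUE is a per-call decision, so it is worth checking n_missing first to see how much data is being dropped.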

Page 58: Introduction to R Programming

Things that may surprise you

Automatic type conversion

Automatic type conversion happens a lot, which is often useful but makes it easy to miss mistakes.

> # combining values coerces them to the most general type
> (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
[1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
> str(x)
 chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to the most general type
> 1 > "a"
[1] FALSE

Maybe this is what you expect. I would like to at least get a warning!
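A short sketch of how the coercion order plays out and one way to guard against it. The helper safe_gt() is hypothetical and not part of the workshop materials; it only illustrates checking types before comparing.

```r
# c() promotes to the most general type present:
# logical < integer < double (numeric) < character
cls_int <- class(c(TRUE, 1L))   # "integer"
cls_num <- class(c(1L, 2.5))    # "numeric"
cls_chr <- class(c(1, "a"))     # "character"

# Converting back makes the damage visible: non-numeric strings become NA.
mixed <- c(TRUE, FALSE, 1, 2, "a", "b")
nums  <- suppressWarnings(as.numeric(mixed))

# Hypothetical helper: refuse to compare unless both sides are numeric.
safe_gt <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a > b
}

ok  <- safe_gt(2, 1)                        # TRUE
bad <- try(safe_gt(1, "a"), silent = TRUE)  # error instead of a silent FALSE
```

The guard turns the silent FALSE from 1 > "a" into an explicit error, which is roughly the warning the slide wishes for.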

Page 59: Introduction to R Programming

Things that may surprise you

Optional argument inconsistencies

Functions you might expect to work similarly don't always:

> mean(1, 2, 3, 4, 5)*5
[1] 5
> sum(1, 2, 3, 4, 5)
[1] 15

Why are these different?

> args(mean)
function (x, ...)
NULL
> args(sum)
function (..., na.rm = FALSE)
NULL

Ouch. That is not nice at all!
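The difference comes from the argument lists: mean() averages only its first argument x, while sum() adds up everything passed through `...`. A minimal sketch of the safe pattern, wrapping the values in c() first; the variable names are illustrative only.

```r
vals <- c(1, 2, 3, 4, 5)

# Only x = 1 is averaged; the remaining values are matched to other arguments.
mean_wrong <- mean(1, 2, 3, 4, 5)

# Passing a single vector gives the intended result.
mean_right <- mean(vals)

# sum() takes its values through `...`, so both call styles agree.
total_dots <- sum(1, 2, 3, 4, 5)
total_vec  <- sum(vals)

# Inspect the argument names programmatically.
mean_args <- names(formals(mean))       # "x" "..."
sum_args  <- names(formals(args(sum)))  # "..." "na.rm"
```

When in doubt about how a function treats multiple values, checking args() before calling it is cheap.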

Page 60: Introduction to R Programming

Things that may surprise you

Trouble with Factors

Factors sometimes behave as numbers and sometimes as characters, which can be confusing.

> (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
[1] 5 5 6 6
Levels: 6 5
>
> str(x)
 Factor w/ 2 levels "6","5": 2 2 1 1
>
> as.character(x)
[1] "5" "5" "6" "6"
> # here is where people sometimes get lost...
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric(as.character(x))
[1] 5 5 6 6
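A sketch of the conversions on this slide, plus the alternative given in R's own help page for factor(): indexing the numeric levels by the factor. The variable names are illustrative only.

```r
# Same factor as on the slide: the level order (6, 5) differs from numeric order.
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

# as.numeric() returns the internal level codes, not the labels.
codes <- as.numeric(x)

# Going through character recovers the original values.
via_char <- as.numeric(as.character(x))

# Equivalent: convert the levels once, then index them by the factor codes.
via_levels <- as.numeric(levels(x))[x]
```

Both conversions give the same values. The levels-based form converts each distinct level only once, which can matter for long factors.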

Page 61: Introduction to R Programming

Additional resources

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplemental)

Page 62: Introduction to R Programming

Additional resources

Additional reading and resources

S3 system overview: https://github.com/hadley/devtools/wiki/S3
S4 system overview: https://github.com/hadley/devtools/wiki/S4
R documentation: http://cran.r-project.org/manuals.html
Collection of R tutorials: http://cran.r-project.org/other-docs.html
R for Programmers (by Norman Matloff, UC-Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper

Institute for Quantitative Social Science: http://iq.harvard.edu
Research technology consulting: http://projects.iq.harvard.edu/rtc

Page 63: Introduction to R Programming

Additional resources

Feedback

Help Us Make This Workshop Better
Please take a moment to fill out a very short feedback form.
These workshops exist for you: tell us what you need!
http://tinyurl.com/RprogrammingFeedback

Page 64: Introduction to R Programming

Loops (supplemental)

Topic

1 Workshop overview and materials

2 Data types

3 Extracting and replacing object elements

4 Applying functions to list elements

5 Writing functions

6 Control flow

7 The S3 object class system

8 Things that may surprise you

9 Additional resources

10 Loops (supplemental)

Page 65: Introduction to R Programming

Loops (supplemental)

Looping

A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of objects.
A while loop runs the code until a condition is met.
If you're typing the same commands over and over again, you might want to use a loop.
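As a minimal sketch of the difference between the two loop types, the following contrasts a for loop with a known number of iterations against a while loop that stops when a condition fails. The variable names `total` and `n` are illustrative and do not come from the workshop materials.

```r
# for loop: the number of iterations is known in advance
total <- 0
for (i in 1:4) {
  total <- total + i   # adds 1, 2, 3, 4 in turn
}

# while loop: repeat until the condition becomes FALSE
n <- 1
while (n < 100) {
  n <- n * 2           # double n until it is at least 100
}

cat("total =", total, " n =", n, "\n")
```

The for loop is the natural choice when the set of values is fixed up front. The while loop fits cases where only the stopping condition is known.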

Page 66: Introduction to R Programming

Loops (supplemental)

Looping: for-loop examples

For each value in a vector, print the number and its square.

> ## For-loop example
> for (num in seq(-5,5)) { # for each number in [-5, 5]
+   cat(num, "squared is", num^2, "\n") # print the number
+ }
-5 squared is 25
-4 squared is 16
-3 squared is 9
-2 squared is 4
-1 squared is 1
0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
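A for loop can also run over a fixed set of objects rather than a numeric sequence. The sketch below assumes a hypothetical character vector `fruits` (not part of the workshop data) and shows two common patterns: looping over the elements themselves, and looping over their positions with `seq_along()`.

```r
# Hypothetical vector, used only for illustration
fruits <- c("apple", "banana", "cherry")

# Loop over the elements themselves
for (fruit in fruits) {
  cat(fruit, "has", nchar(fruit), "characters\n")
}

# Loop over positions; seq_along() also behaves sensibly for empty vectors
lengths_found <- integer(length(fruits))
for (i in seq_along(fruits)) {
  lengths_found[i] <- nchar(fruits[i])
}
```

Looping over positions is usually the more convenient form when the results need to be stored by index.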

Page 67: Introduction to R Programming

Loops (supplemental)

Looping: while-loop example

Goal: simulate rolling two dice until we roll two sixes.

> ## While-loop example: rolling dice
> set.seed(15) # allows reproducible sample() results
> dice <- seq(1,6) # set dice = [1 2 3 4 5 6]
> roll <- 0 # set roll = 0
> while (roll < 12) {
+   roll <- sample(dice,1) + sample(dice,1) # calculate sum of two rolls
+   cat("We rolled a", roll, "\n") # print the result
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
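A while loop runs for an unknown number of iterations, so it is often useful to count them and to guard against a loop that never ends. The sketch below is a variation on the dice example. The counter `n_rolls` and the cap `max_rolls` are assumptions added for illustration and are not part of the original slide. The exact rolls produced for a given seed may differ between R versions.

```r
set.seed(15)            # makes the run repeatable on a given R version
dice <- 1:6
roll <- 0
n_rolls <- 0            # hypothetical counter of how many times we rolled
max_rolls <- 10000      # assumed safety cap so the loop always terminates

while (roll < 12 && n_rolls < max_rolls) {
  roll <- sample(dice, 1) + sample(dice, 1)  # sum of two dice
  n_rolls <- n_rolls + 1
}

cat("Stopped after", n_rolls, "rolls; last roll was", roll, "\n")
```

The second condition in `while()` is a common defensive pattern: if the stopping condition is never met, the loop still ends once the cap is reached.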


Loops (supplimental)

Using loops to fill in lists

Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing.

> ## save calculations done in a loop
> Result <- list() # create an object to store the results
> for (i in 1:5) { # for each i in [1, 5]
+   Result[[i]] <- 1:i # assign the sequence 1 to i to Result
+ }
> Result # print Result
[[1]]
[1] 1

[[2]]
[1] 1 2

[[3]]
[1] 1 2 3

[[4]]
[1] 1 2 3 4

[[5]]
[1] 1 2 3 4 5
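When the number of results is known in advance, the storage object can be created at its final length before the loop starts. The sketch below shows that variant next to an lapply() call that builds the same list; the names n and Result2 are illustrative assumptions.

```r
n <- 5

# Preallocate a list of length n (every element starts as NULL),
# then fill it in by position.
Result <- vector("list", n)
for (i in seq_len(n)) {
  Result[[i]] <- 1:i
}

# lapply() builds the same list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)

# Both approaches should produce the same object
print(identical(Result, Result2))
```

seq_len(n) is used instead of 1:n so that the loop body is skipped when n is 0, whereas 1:0 would iterate over 1 and 0.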


Word of caution: don't overuse loops

Most operations in R are vectorized. This makes loops unnecessary in many cases.

Use vector arithmetic instead of loops:

> x <- c() # create vector x
> for(i in 1:5) x[i] <- i+i # double a vector using a loop
> print(x) # print the result
[1]  2  4  6  8 10
>
> 1:5 + 1:5 # double a vector without a loop
[1]  2  4  6  8 10
> 1:5 + 5 # shorter vectors are recycled
[1]  6  7  8  9 10

Use paste instead of loops:

> ## Earlier we said
> ## for (num in seq(-5,5)) { # for each number in [-5, 5]
> ##   cat(num, "squared is", num^2, "\n") # print the number
> ## }
> ## a better way:
> paste(1:5, "squared =", (1:5)^2)
[1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
[4] "4 squared = 16" "5 squared = 25"

Loops are handy, but save them for when you really need them.
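To reproduce the one-message-per-line output of the earlier cat() loop without a loop, the vectorized strings can be passed to cat() with a newline separator. This is a minimal sketch; the variable names nums, msgs and msgs2 are illustrative assumptions, and sprintf() is shown only as one alternative to paste().

```r
nums <- seq(-5, 5)

# paste() is vectorized: it returns one string per element of nums
msgs <- paste(nums, "squared is", nums^2)

# Print one message per line, as the loop with cat() did
cat(msgs, sep = "\n")

# sprintf() is also vectorized and gives explicit control over formatting
msgs2 <- sprintf("%d squared is %d", nums, as.integer(nums^2))
```

Both calls recycle the fixed text across all eleven numbers, so no index variable is needed.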


Exercise 5

1. use a loop to get the class() of each column in the iris data set
2. use the results from step 1 to select the numeric columns
3. use a loop to calculate the mean of each numeric column in the iris data


Exercise 5 prototype

1. use a loop to get the class() of each column in the iris data set

> classes <- c()
> for (name in names(iris)) {
+   classes[name] <- class(iris[[name]])
+ }
>
> classes
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

2. use the results from step 1 to select the numeric columns

> iris.num <- iris[, names(classes)[classes=="numeric"]]
> head(iris.num, 2)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2

3. use a loop to calculate the mean of each numeric column in the iris data

> iris.means <- c()
> for(var in names(iris.num)) {
+   iris.means[[var]] <- mean(iris[[var]])
+ }
>
> iris.means
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    5.843333     3.057333     3.758000     1.199333
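The exercise asks for loops, but in line with the earlier caution about overusing them, the same three steps can be written without one. The sketch below is an alternative, not the workshop's prototype; it assumes every column has exactly one class, which holds for iris but not, for example, for ordered factors.

```r
# Step 1: class of each column. sapply() returns a named character vector
# here because each iris column has a single class (assumption noted above).
classes <- sapply(iris, class)

# Step 2: keep only the numeric columns
iris.num <- iris[, classes == "numeric"]

# Step 3: column means in one call; sapply(iris.num, mean) would also work
iris.means <- colMeans(iris.num)
print(iris.means)
```

colMeans() returns a named numeric vector, so the result has the same shape as the one built by the loop.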
