
Introduction to R

Adrienn Szabó

DMS Group, MTA SZTAKI

Aug 30, 2014

1 What is R?
   What is R for?
   Who is R for?

2 Basics
   Data Structures
   Control Structures

3 ExtRa stuff
   R packages
   Unit testing in R

What is R?

R is a dialect of the S language.

... but seriously ...

R is a ...
• Programming language (free, open source)
• Computing environment (like Matlab)
• Community (quite an active one)
• Ecosystem (rapid conversion from data-science knowledge to productivity)

(S was developed at Bell Labs in the 1970s as an internal statistical analysis environment.)

Some say that it's “Not Really A Programming Language” ...

... but rather R “is an interactive environment for doing statistics”*

* http://readwrite.com/2013/11/25/python-displacing-r-as-the-programming-language-for-data-science

What for?

• statistical computing (and statistical tests)
• dataset exploration
• analysis (time series analysis, classification, clustering, etc.)
• linear and nonlinear modelling
• visualization
• recently the favourite tool of data scientists

R is not only for programmers!

A couple of titles from the latest useR! conference (more than 150 talks):

• Teaching R to high school students (and teachers)
• Visualizing Lack of Fit in Complex Regression Mode
• A real time, responsive Quantitative trading analysis Mobile App using R
• eegR: an R package to analyze electrophysiological (EEG) signals (MTA TTK)

More titles – BD

Because everyone has to deal with Big Data nowadays ...

• PivotalR: A Package for Machine Learning on Big Data
• Massive Predictive Modeling (Oracle)
• Domino: A Platform-as-a-Service for Industrialized Data Analysis
• Plyrmr: a data manipulation DSL for big data

More titles – ML

Machine Learning is cool ...

• 10 R packages to win Kaggle competitions
• The Arborist: a Scalable Decision Tree Implementation
• Representing Model Ensembles in PMML
• Distributed Matrix Exponentiation in R

More titles – Networks & Twitter & maps

Who doesn't want to study Twitter or social data?

• Simulating Influenza Transmission with Real Network Data
• Spatial Tweetstistics with R: Geographical Distribution of English Loan Words in Spanish Tweets
• Opportunities through the use of Open-Street-Map data in social sciences

More titles – RR

These folks seem to care about Reproducible Research as well ...

• R and Reproducibility: a Proposal
• rctrack: An R Package that Automatically Collects and Archives Details for Reproducible Computing
• Fostering the next generation of open science with R
• Teaching data analysis in R through the lens of reproducibility (poster)

Why?

Features of R

• Free software
• Runs on almost any standard computing platform/OS
• Active development, about yearly releases + bugfixes
• Sophisticated graphics capabilities
• Useful for interactive work, but contains a powerful programming language for developing new tools (user -> programmer)

R vs. Python

• Python is more general-purpose, and it is easier to write programs in it

• R has more ready-to-use "stats + data analytics" libraries

Getting help

Code: built-in man pages

> ?library
library                 package:base                 R Documentation

Loading and Listing of Packages

Description:

     ‘library’ and ‘require’ load add-on packages.

Usage:

     library(package, help, pos = 2, lib.loc = ...
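Besides `?topic`, a few other built-in functions help when the exact name of a function is not known. The following is only a small sketch using functions that ship with R (base and utils); the search pattern is an arbitrary illustration.

```r
# Find objects whose name matches a regular expression.
hits <- apropos("^library")
print(hits)

# Show the argument list of a function without opening its help page.
print(args(library))

# These open documentation interactively, so they are left commented out:
# help("library")          # same as ?library
# help.search("package")   # same as ??package, a full-text search
# example(mean)            # runs the examples from a help page
```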

Data structures

• Vector
• Matrix
• Array
• List
• Data frame

Numbers, assignment

To assign a single number, use the '<-' operator.
Note: the '=' operator works the same way in almost all cases, but its usage is not advised.

Code: numbers

> a <- 4
> b <- 5
> a + b
[1] 9
> a - b
[1] -1
> a ^ b
[1] 1024
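One case where '=' and '<-' behave differently is inside a function call. The sketch below illustrates this with `mean()`; the variable names `avg1`, `avg2` and `vals` are made up for the example.

```r
# Inside a call, '=' only names an argument; no variable is created.
avg1 <- mean(x = c(1, 2, 3))

# '<-' inside a call assigns in the calling environment and then
# passes the assigned value on as the argument.
avg2 <- mean(vals <- c(4, 5, 6))

print(avg1)
print(avg2)
print(vals)   # vals now exists as an ordinary variable
```

This side effect is one reason why mixing the two operators can be confusing, and why a consistent style is usually recommended.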

Vector

Vectors (similar to Lists in Java) can be created with the c() function (short name for “concatenate”).

Vectors can hold any kind of thing, but the items of one vector have to be of the same type.

Code: vector examples

> v1 <- c(1, 2, 3, 4, 5, 6)
> v2 <- c(0.8, 0.1)
> v1 + v2
[1] 1.8 2.1 3.8 4.1 5.8 6.1

Magic! (The shorter vector is recycled until it matches the length of the longer one.)
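The "magic" in `v1 + v2` is R's recycling rule. As a minimal sketch, the same result can be obtained by repeating the shorter vector explicitly with `rep()`; the helper name `v2_recycled` is only illustrative.

```r
v1 <- c(1, 2, 3, 4, 5, 6)
v2 <- c(0.8, 0.1)

# Spell out what recycling does: repeat v2 until it is as long as v1.
v2_recycled <- rep(v2, length.out = length(v1))
print(v2_recycled)

# The explicit sum matches the implicit one.
print(all.equal(v1 + v2, v1 + v2_recycled))

# If the longer length is not a multiple of the shorter one, R still
# recycles but emits a warning (suppressed here to keep the output clean).
res <- suppressWarnings(c(1, 2, 3) + c(10, 20))
print(res)
```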

You can concatenate items and vectors as you please.

Code: vector examples 2

> v1 <- c(1, 2, 3)
> v2 <- c(0.8, 0.1)
> c(22, v1, -3.9, v2)
[1] 22.0  1.0  2.0  3.0 -3.9  0.8  0.1
> c(v1, "Sponge", "Bob")
[1] "1"      "2"      "3"      "Sponge" "Bob"

Warning: the elements of the resulting vector can be converted to the more general type!
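The conversion to "the more general type" follows a fixed order in R. The snippet below is a small sketch of that order using `class()`; the sample values are arbitrary.

```r
# Coercion goes toward the most general type present:
# logical -> integer -> double (numeric) -> character
print(class(c(TRUE, 2L)))      # logical + integer
print(class(c(2L, 3.5)))       # integer + double
print(class(c(3.5, "Bob")))    # double + character

# Converting back is explicit; values that cannot be parsed become NA
# (with a warning, suppressed here).
nums <- suppressWarnings(as.numeric(c("1", "2", "Sponge")))
print(nums)
```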

You can select ranges of a vector to get a shorter one.

Indexing begins with 1!

Negative indices: leave those elements out!

Code: vector subsetting

> v1 <- c(1, 2, 3, 4, 5, 6)
> v1[2]
[1] 2
> v1[2:4]
[1] 2 3 4
> v1[c(-2,-5)]
[1] 1 3 4 6
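Positions are not the only way to subset a vector. As a hedged sketch of two other common forms, a logical vector or a set of names can be used as the index; the vector `named` and its element names are invented for this example.

```r
v1 <- c(1, 2, 3, 4, 5, 6)

# A logical vector of the same length keeps the TRUE positions.
big <- v1[v1 > 3]
print(big)

# which() turns the logical vector into positions.
pos <- which(v1 %% 2 == 0)
print(pos)

# Elements can also be selected by name once names are set.
named <- c(first = 10, second = 20, third = 30)
print(named[c("first", "third")])
```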

(Nice random image 1)

Matrix

A matrix is a vector represented and accessible in two dimensions. It has a fixed element type and a fixed number of rows and columns.

Code: matrix examples

> matrix(1:6, byrow=TRUE, nrow=2)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
> matrix(c(1,2,13,9,8,17,3,4,5), ncol = 3)
     [,1] [,2] [,3]
[1,]    1    9    3
[2,]    2    8    4
[3,]   13   17    5
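The statement that a matrix is a vector seen in two dimensions can be made concrete: giving a plain vector a `dim` attribute is enough. This is a minimal sketch; the names `v` and `m` are arbitrary.

```r
v <- 1:6

# Setting the dim attribute turns the vector into a 2 x 3 matrix,
# filled column by column (R's default order).
m <- v
dim(m) <- c(2, 3)
print(m)

print(dim(m))    # number of rows and columns
print(t(m))      # transpose: 3 x 2

# Same values as matrix() without byrow = TRUE.
print(all(m == matrix(1:6, nrow = 2)))
```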

You can give names to columns and/or rows.

Code: matrix naming

> matrix(c(1:9),nrow=3,byrow=TRUE,
+ dimnames=list(c("r1","r2","r3"),c("a","b","c")))
   a b c
r1 1 2 3
r2 4 5 6
r3 7 8 9

Subsetting works here as well ...

Code: matrix subsetting

> m1 <- matrix(c(1:9),nrow=3,byrow=TRUE,
+ dimnames=list(c("r1","r2","r3"),c("a","b","c")))
> m1[2,]
a b c
4 5 6
> m1[,-2]
   a c
r1 1 3
r2 4 6
r3 7 9

Matrix operations are quite similar to vector operations. For example, a comparison returns a logical matrix of equal size.

Code: matrix example

> m1 > 5
       a     b     c
r1 FALSE FALSE FALSE
r2 FALSE FALSE  TRUE
r3  TRUE  TRUE  TRUE

> m1[m1>5]
[1] 7 8 6 9
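Because operators act element by element, `*` on two matrices is not the matrix product. The sketch below contrasts the two, rebuilding a matrix with the same values as `m1` (without the dimnames) so that it runs on its own.

```r
m1 <- matrix(1:9, nrow = 3, byrow = TRUE)

# Arithmetic operators work element by element, as with vectors.
elementwise <- m1 * m1
print(elementwise)

# The matrix product uses the %*% operator instead.
product <- m1 %*% m1
print(product)

# A logical matrix can be summarised directly: TRUE counts as 1.
print(sum(m1 > 5))
```

Note that `m1[m1 > 5]` returns its values in column order, which is why 6 appears after 8 in the output above.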

(Nice random image 2)

Array

An array extends a matrix to more dimensions.
It is a vector that is represented and accessible in a given number of dimensions.

Let's arrange the 20 integers from 0 to 19 in three dimensions: 2 x 5 x 2

Code: array example

> a1 <- array(0:19, dim = c(2, 5, 2))

Code: array example

> array(0:19, dim = c(2, 5, 2))
, , 1

     [,1] [,2] [,3] [,4] [,5]
[1,]    0    2    4    6    8
[2,]    1    3    5    7    9

, , 2

     [,1] [,2] [,3] [,4] [,5]
[1,]   10   12   14   16   18
[2,]   11   13   15   17   19

Subsetting works similarly to matrices, but we can specify the selected indices for each dimension.

Code: array example

> a1[-1, 1:4, ]
     [,1] [,2]
[1,]    1   11
[2,]    3   13
[3,]    5   15
[4,]    7   17

> a1[-1, 1:4, 2]
[1] 11 13 15 17
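The results above have fewer dimensions than `a1` because R drops dimensions of length 1 when subsetting. A small sketch of how to keep them, using the standard `drop = FALSE` argument of `[`:

```r
a1 <- array(0:19, dim = c(2, 5, 2))

# By default, dimensions of length 1 are dropped from the result.
dropped <- a1[-1, 1:4, ]
print(dim(dropped))

# drop = FALSE keeps all three dimensions.
kept <- a1[-1, 1:4, , drop = FALSE]
print(dim(kept))
```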

(Nice random image 3)

List

A list is a generic vector that is allowed to include different types of objects, even other lists.

Code: list example

> list(1, c(TRUE,FALSE), c("a","b","c"))
[[1]]
[1] 1

[[2]]
[1]  TRUE FALSE

[[3]]
[1] "a" "b" "c"

We can assign names to each entry by using named arguments.

Code: list example

> myl <- list(x=1,y=c(TRUE,FALSE),z=c("a","b","c"))
> myl
$x
[1] 1

$y
[1]  TRUE FALSE

$z
[1] "a" "b" "c"

To access the members of a list by name, use the dollar sign:

Code: list example

> myl$x
[1] 1

> myl $ z
[1] "a" "b" "c"

> myl$alma # no member named "alma"
NULL

To access the N-th member of a list, use double square brackets:

Code: list example

> myl[[1]]
[1] 1

> myl [[ 3 ]]
[1] "a" "b" "c"

> myl[[4]]
Error in myl[[4]] : subscript out of bounds

Even names can be used inside double brackets:

Code: list example

> myl[["x"]]
[1] 1

> elemname <- "z"
> myl[[elemname]]
[1] "a" "b" "c"

Subsetting a list

Use single-square-bracket notation to extract multiple members from a list and construct a new list:

Code: subsetting a list

> myl["x"]
$x
[1] 1

> myl[c("x","y")]
$x
[1] 1

$y
[1]  TRUE FALSE

Code: more examples of subsetting a list

> myl[1]
$x
[1] 1

> myl[c(TRUE, FALSE, TRUE)]
$x
[1] 1

$z
[1] "a" "b" "c"

Setting values of a list

Code: setting and adding list members

> myl$x <- 0.6 # overwrite element
> myl$m <- 4 # add a new named element
> myl$y <- NULL # delete by name
> myl
$x
[1] 0.6

$z
[1] "a" "b" "c"

$m
[1] 4

Code: setting and adding list members

> myl[[2]] <- NULL # delete by index ("z")
> myl[[1]] <- 0.8 # overwrite element
> myl[[5]] <- 5 # add a new element
# what do we get?

Code: setting and adding list members

> myl
$x
[1] 0.8

$m
[1] 4

[[3]]
NULL

[[4]]
NULL

[[5]]
[1] 5
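The NULL entries appear because assigning to index 5 of a two-element list extends the list and pads the gap. The following sketch replays the steps on a fresh copy of the list and shows one possible way to append without gaps; `myl2` is a name introduced only for this illustration.

```r
myl <- list(x = 0.6, z = c("a", "b", "c"), m = 4)
myl[[2]] <- NULL   # remove "z"
myl[[1]] <- 0.8
myl[[5]] <- 5      # index 5 is beyond the end of the list

print(length(myl))  # the list was extended up to index 5
print(names(myl))   # the padded and new elements have empty names

# One way to add an element without creating gaps:
myl2 <- list(x = 0.8, m = 4)
myl2[[length(myl2) + 1]] <- 5
print(length(myl2))
```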

Other list functions

Code: List functions

> is.list(myl[1]) # [ ] -> sublist
[1] TRUE
> is.list(myl[[1]]) # [[ ]] -> element
[1] FALSE
> l2 <- as.list(c(a=1,b=2,c=3)) # vector to list
> unlist(l2) # list to vector
a b c
1 2 3
> unlist(list(a=1, b=2, c="hello"))
      a       b       c
    "1"     "2" "hello"

(Nice random image 4)

Factor

The term factor refers to a statistical data type used to store categorical variables.
Use the function factor() to get factors from a vector of values.

Code: using factors

> sData <- c("Male", "Female", "Female", "X", "Male")
> sFactors <- factor(sData)
> sFactors
[1] Male   Female Female X      Male
Levels: Female Male X

A factor is stored internally as an integer vector with values 1, 2, ..., k, where k is the number of levels.
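The internal codes of a factor can be inspected directly. This short sketch reuses the `sData` example and only calls base R functions.

```r
sData <- c("Male", "Female", "Female", "X", "Male")
sFactors <- factor(sData)

print(levels(sFactors))       # the levels, sorted alphabetically by default
print(as.integer(sFactors))   # the integer code behind each value
print(table(sFactors))        # counts per level
```

Each code is the position of the value's level in `levels(sFactors)`.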

Data frame

Data frames are similar to tables in a relational database.
A data frame generalises a matrix and a list: different columns may have different modes, but all elements of a column must have the same mode (all numeric, all factor, or all character).

Typically you'll load a csv file into a data frame object.
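As a minimal sketch of loading a csv file, the example below first writes a tiny file to a temporary location so that it is self-contained. The file contents and the column names `name` and `score` are invented for illustration.

```r
# Write a tiny CSV to a temporary file so the example runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "anna,4.5", "bela,3.0", "cili,5.0"), csv_path)

# read.csv() returns a data frame; the first line is used as the header.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)
print(scores)
str(scores)

unlink(csv_path)  # clean up the temporary file
```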

A data frame has colnames() and rownames(); the length() of a data frame is the same as ncol(); nrow() gives the number of rows.

Code: data frame example

> df <- data.frame(x = 1:3, y = c("a", "b", "c"))
> str(df)
'data.frame': 3 obs. of 2 variables:
 $ x: int 1 2 3
 $ y: Factor w/ 3 levels "a","b","c": 1 2 3

Note: data.frame() by default turns strings into factors (in R versions before 4.0.0; later versions default to stringsAsFactors = FALSE). Use stringsAsFactors = FALSE to keep strings as character vectors.
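The size and naming functions mentioned above can be tried on the same small data frame. In this sketch `stringsAsFactors = FALSE` is passed explicitly so the result does not depend on the R version.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

print(colnames(df))
print(rownames(df))
print(length(df))   # number of columns: a data frame is a list of columns
print(ncol(df))
print(nrow(df))

# Columns are accessed like list members.
print(df$y)
print(df[["x"]])
```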

Code: data frame example

> myy <- c("a", "b", "c", "n")
> df <- data.frame(x = 4:7, y = myy,
+ stringsAsFactors = FALSE)

> df
  x y
1 4 a
2 5 b
3 6 c
4 7 n
> str(df)
'data.frame': 4 obs. of 2 variables:
 $ x: int 4 5 6 7
 $ y: chr "a" "b" "c" "n"

(Nice random image 5)

Control structures

if, else   test a condition
for        execute a loop a fixed number of times
while      execute a loop while a condition is true
repeat     execute an infinite loop
break      break the execution of a loop
next       skip an iteration of a loop
return     exit a function
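The slides show examples of if and for; the remaining keywords can be sketched as follows. The loop bounds and variable names are arbitrary choices for the illustration.

```r
# repeat runs forever unless break is called.
evens <- c()
i <- 0
repeat {
  i <- i + 1
  if (i > 10) break        # leave the loop entirely
  if (i %% 2 == 1) next    # skip odd numbers
  evens <- c(evens, i)
}
print(evens)

# while checks its condition before every iteration.
total <- 0
k <- 1
while (k <= 10) {
  total <- total + k
  k <- k + 1
}
print(total)
```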

Condition

This is a valid if/else structure.

Code: if and else

if (x > 3) {
  y <- 10
} else {
  y <- 0
}

So is this one.

Code: alternative if and else

y <- if (x > 3) { 10 } else { 0 }
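Both forms above expect `x` to be a single value. For a whole vector, base R offers the vectorised `ifelse()`; the sketch below uses made-up input values.

```r
x <- c(1, 5, 3, 7)

# if() expects a single TRUE/FALSE; ifelse() handles a whole vector.
y <- ifelse(x > 3, 10, 0)
print(y)

# For a single value, both forms agree.
single <- 5
y1 <- if (single > 3) 10 else 0
print(y1 == ifelse(single > 3, 10, 0))
```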

Loop

We can iterate on lists or vectors.

Code: for loops

x <- c("a", "b", "c", "d")
for (letter in x) {
  print(letter)
}

# another example
for (i in 1:3) print(x[i])
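When looping over indices, `seq_along()` is a common alternative to writing the range by hand, and many simple loops can be replaced by a vectorised call. This is a sketch of both ideas, not the only way to write them.

```r
x <- c("a", "b", "c", "d")

# seq_along(x) is safer than 1:length(x): for an empty vector it
# yields an empty sequence instead of c(1, 0).
for (i in seq_along(x)) {
  print(paste(i, x[i]))
}

# Many loops can be replaced by a vectorised call or an apply function.
upper <- toupper(x)
lengths_of <- sapply(x, nchar, USE.NAMES = FALSE)
print(upper)
print(lengths_of)
```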

Functions

Functions are objects in their own right!

Three components of a function:
• arguments
• body
• environment

Functions can return only a single object. (But they can return a list containing any number of objects.)

Call-by-value: modifying a function argument does not change the original value (some exceptions exist).
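The points above can be illustrated with two small functions. The names `summarise_vec` and `bump_first` are hypothetical and exist only for this sketch.

```r
# Returning several values at once by wrapping them in a list.
summarise_vec <- function(v) {
  list(mean = mean(v), range = range(v))
}
res <- summarise_vec(c(2, 4, 9))
print(res$mean)
print(res$range)

# Call-by-value: the function changes only its own copy.
bump_first <- function(v) {
  v[1] <- v[1] + 100
  v
}
original <- c(1, 2, 3)
changed <- bump_first(original)
print(original)   # unchanged
print(changed)

# The three components of a function.
print(formals(bump_first))       # arguments
print(body(bump_first))          # body
print(environment(bump_first))   # environment
```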

Code: defining functions

isSumAboveTen <- function(x, y) {
  if (x + y > 10) {
    return(TRUE)
  } else {
    return(FALSE)
  }
}

# Shorter: the value of the last evaluated expression is returned
isSumAbove10 <- function(x, y) {
  if (x + y > 10) TRUE else FALSE
}
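Since the comparison itself already evaluates to TRUE or FALSE, the function can be shortened further. The variant below is a sketch; the name `isSumAboveTenShort` is not from the slides.

```r
# The comparison already yields TRUE or FALSE, so it can be returned directly.
isSumAboveTenShort <- function(x, y) {
  x + y > 10
}

print(isSumAboveTenShort(6, 5))
print(isSumAboveTenShort(1, 2))

# Unlike the if-based versions, this one also works element-wise on vectors.
print(isSumAboveTenShort(c(1, 8), c(2, 8)))
```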

Scripts

And now we can write nice little scripts to do whatever we want!

How to get the examples

$ git clone [email protected]/tutorial
$ ls tutorial/R
example1 example2 example3 example4

Twitter example

Packages in R

When you install R, you get a base system with the basic functionality needed to use R.
It includes some basic packages by default (utils, stats, datasets, graphics, methods, tools, parallel, etc.).

For more specific purposes you either:
• find an existing add-on package that helps on CRAN, or maybe on Bioconductor,
• or write your own package (just for yourself, or you can easily publish it as well)
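In practice, working with packages comes down to a handful of calls. The sketch below uses only packages that ship with R; the name "somepackage" is a placeholder, not a real package.

```r
# Installing from CRAN needs network access, so it is left commented out.
# "somepackage" is a placeholder name.
# install.packages("somepackage")

# library() attaches an installed package; stats ships with R.
library(stats)
print("package:stats" %in% search())

# requireNamespace() checks availability without attaching, and
# pkg::fun calls a function from a package directly.
if (requireNamespace("tools", quietly = TRUE)) {
  print(tools::file_ext("report.csv"))
}
```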

CRAN

CRAN (The Comprehensive R Archive Network) is a network of ftp and web servers around the world that store identical and up-to-date versions of code and documentation for R.

(Please use the CRAN mirror nearest to you to minimize network load.)

More about packages later ...

Unit testing in R

There are some options:

RUnit      is the oldest one
svUnit     has a GUI
testthat   is actively developed and smarter (but isn't compatible with either of the previous 2)
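As a rough sketch of what a testthat test looks like, the example below checks a compact version of the isSumAboveTen function. It assumes the testthat package may or may not be installed, so the tests are guarded and skipped otherwise.

```r
isSumAboveTen <- function(x, y) x + y > 10

# testthat is an add-on package, so only run the tests if it is installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  library(testthat)

  test_that("isSumAboveTen compares the sum against 10", {
    expect_true(isSumAboveTen(6, 5))
    expect_false(isSumAboveTen(1, 2))
    expect_false(isSumAboveTen(5, 5))   # exactly 10 is not above 10
  })
} else {
  message("testthat is not installed; skipping the example tests")
}
```

In a real project such tests usually live in their own files, separate from the code under test.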

Sources

1 From the Coursera course "R programming" by Roger D. Peng
2 http://renkun.me/learnR/
3 http://adv-r.had.co.nz/
4 http://www.edureka.in/blog/why-learn-r/
5 http://user2014.stat.ucla.edu/files/Abstracts.pdf
6 http://www.slideshare.net/DataRobot/final-10-r-xc-36610234
7 https://www.datacamp.com/courses/introduction-to-r
8 http://www.stat.berkeley.edu/~statcur/Workshop2/Presentations/functions.pdf

Sources of images

1 http://www.opsrules.com/supply-chain-optimization-blog/bid/349734/Combining-Machine-Learning-and-Optimization-in-Supply-Chain-Analytics
2 http://exploredata.wordpress.com/2012/08/20/importing-a-google-spreadsheet-into-r/
3 http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-study-data-structures-1/
4 http://1.bp.blogspot.com/_FsLa1cMTCWU/TPgxWY_QNZI/AAAAAAAAAjc/ORVJjtoDBvg/s1600/program_language_density_plot.png
5 http://www.gettyimages.com/detail/photo/castor-oil-stem-light-micrograph-of-a-high-res-stock-photography/123790451
6 http://illuminarti.weebly.com/patrick-star.html
7 http://vis.cs.ucdavis.edu/papers/social_networks.pdf