Slides from R crash course by Ilmo van der Löwe

CAMBRIDGE PROSOCIALITY AND WELL-BEING LABORATORY

description

Slides for an introductory R class at the University of Cambridge

Transcript of Slides from R crash course by Ilmo van der Löwe

Page 1: Slides from R crash course by Ilmo van der Löwe

CAMBRIDGE PROSOCIALITY AND WELL-BEING LABORATORY

Page 2: Slides from R crash course by Ilmo van der Löwe

CRASH COURSE

Page 3: Slides from R crash course by Ilmo van der Löwe
Page 4: Slides from R crash course by Ilmo van der Löwe

DATA SCIENTIST
The Sexiest Job of the 21st Century

Page 5: Slides from R crash course by Ilmo van der Löwe

Statistics

Domain expertise

Hacking

Page 6: Slides from R crash course by Ilmo van der Löwe

BIG DATA

ish

Page 7: Slides from R crash course by Ilmo van der Löwe

SOCIAL NETWORK DATA

Page 8: Slides from R crash course by Ilmo van der Löwe

DIGITAL TRACE DATA

Page 9: Slides from R crash course by Ilmo van der Löwe

GLOBAL SURVEY DATA

Page 10: Slides from R crash course by Ilmo van der Löwe

GENETIC DATA

Page 11: Slides from R crash course by Ilmo van der Löwe

SPSS ain’t gonna cut it.

Page 12: Slides from R crash course by Ilmo van der Löwe
Page 13: Slides from R crash course by Ilmo van der Löwe
Page 14: Slides from R crash course by Ilmo van der Löwe
Page 15: Slides from R crash course by Ilmo van der Löwe

Windows Mac Linux

Page 16: Slides from R crash course by Ilmo van der Löwe

Built by scientists for scientists.

Page 17: Slides from R crash course by Ilmo van der Löwe

“We have named our language R – in part to acknowledge the influence of S and in part to celebrate our own efforts.”

Ross Ihaka
PROFESSOR OF STATISTICS

University of Auckland

Robert Gentleman
SENIOR DIRECTOR OF BIOINFORMATICS

Genentech

Page 18: Slides from R crash course by Ilmo van der Löwe

R is the most powerful statistics language

in the world.

Page 19: Slides from R crash course by Ilmo van der Löwe

• Open source

- Free as in speech and beer

• Cross-platform

- Runs on Windows, Mac, and Linux

• Versatile and extensible

- Over 4,000 user-contributed packages

• General-purpose programming language

- You can make it do things automagically

Page 20: Slides from R crash course by Ilmo van der Löwe

http://r-project.org

Page 21: Slides from R crash course by Ilmo van der Löwe

RStudio.org

Page 22: Slides from R crash course by Ilmo van der Löwe

Why use R?

Page 23: Slides from R crash course by Ilmo van der Löwe

R is used by the best.

Page 24: Slides from R crash course by Ilmo van der Löwe
Page 25: Slides from R crash course by Ilmo van der Löwe

"...a way to organize the brainpower of the world’s most talented data scientists..."

Hal Varian
CHIEF ECONOMIST

Page 26: Slides from R crash course by Ilmo van der Löwe

software on

Page 27: Slides from R crash course by Ilmo van der Löwe

50% of winners use R

Page 28: Slides from R crash course by Ilmo van der Löwe

• Everything in one system

- base: linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering etc.

- packages from multilevel modeling to medical image analysis

• Custom functionality

- Programming ➞ Automation
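As a rough illustration of "everything in one system", the sketch below fits a linear model and runs a classical test using only base R. The `cars` and `sleep` data sets ship with R; choosing them here is an assumption made for the example, not something the slides use.

```r
# Linear model: stopping distance as a function of speed (built-in `cars` data).
fit <- lm(dist ~ speed, data = cars)
print(coef(fit))

# Classical test: two-sample t-test on the built-in `sleep` data.
tt <- t.test(extra ~ group, data = sleep)
print(tt$p.value)
```

No extra packages are loaded; both functions come from the `stats` package that is attached by default.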

Page 29: Slides from R crash course by Ilmo van der Löwe

4,403 available packages

Page 30: Slides from R crash course by Ilmo van der Löwe

• Automate away “click-click-click” tasks

- More efficient work

• Share analyses and data with ease

- Better collaboration

• Make results reproducible

- Better science

Page 31: Slides from R crash course by Ilmo van der Löwe

How do I use R?

Page 32: Slides from R crash course by Ilmo van der Löwe

You use R by typing commands, not with a mouse.

Page 34: Slides from R crash course by Ilmo van der Löwe

R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> Prompt

Page 35: Slides from R crash course by Ilmo van der Löwe

How do you know what to type to R?

Page 36: Slides from R crash course by Ilmo van der Löwe

For beginners:

Page 37: Slides from R crash course by Ilmo van der Löwe

For the statistically minded:

Page 38: Slides from R crash course by Ilmo van der Löwe

For programmers:

Page 39: Slides from R crash course by Ilmo van der Löwe

The very basics

Page 40: Slides from R crash course by Ilmo van der Löwe

Put “this” in “here”

“this” ➞ HERE

Page 42: Slides from R crash course by Ilmo van der Löwe

Put “this” in here

here <- "this"

Page 43: Slides from R crash course by Ilmo van der Löwe

Put “this” in here

here <- "this"

variable

Page 44: Slides from R crash course by Ilmo van der Löwe

Put “this” in here

here <- "this"

a string

Page 45: Slides from R crash course by Ilmo van der Löwe

Put “this” in here

here <- "this"

assignment operator

Page 46: Slides from R crash course by Ilmo van der Löwe

Put “this” in here

here = "this"

assignment operator

Page 47: Slides from R crash course by Ilmo van der Löwe

>

Page 48: Slides from R crash course by Ilmo van der Löwe

> here

Page 49: Slides from R crash course by Ilmo van der Löwe

> here
[1] "this"
>

Page 50: Slides from R crash course by Ilmo van der Löwe

> here

Row #

[1] "this"
>
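The assignment and printing steps above can be tried as a short script. This is a minimal sketch; the variable `there` and the number range are made up for illustration. The bracketed number at the start of each output line can be read as the position of the first element printed on that line, which becomes visible with a longer vector.

```r
# Store a character string in a variable called `here`.
here <- "this"

# `=` also assigns at the top level; `<-` is the more common style in R code.
there = "that"

# Typing a name at the prompt prints its value; print() does the same in a script.
print(here)

# With a longer vector, every output line starts with the position
# of its first element in square brackets.
print(101:130)
```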

Page 51: Slides from R crash course by Ilmo van der Löwe

functions and data

Page 52: Slides from R crash course by Ilmo van der Löwe

BLACK BOX

Page 53: Slides from R crash course by Ilmo van der Löwe

INPUT ➞ BLACK BOX

Page 54: Slides from R crash course by Ilmo van der Löwe

INPUT ➞ BLACK BOX ➞ OUTPUT

Page 55: Slides from R crash course by Ilmo van der Löwe

FUNCTION

Page 56: Slides from R crash course by Ilmo van der Löwe

INPUT ➞ FUNCTION

Page 57: Slides from R crash course by Ilmo van der Löwe

INPUT ➞ FUNCTION ➞ OUTPUT

Page 58: Slides from R crash course by Ilmo van der Löwe

FUNCTIONS ARE LIKE FACTORIES.

Page 59: Slides from R crash course by Ilmo van der Löwe

INPUT ➞ ( ) ➞ OUTPUT

Page 60: Slides from R crash course by Ilmo van der Löwe

In R, parentheses mean: “DO SOMETHING”

(according to my instructions)

Page 61: Slides from R crash course by Ilmo van der Löwe

x.bar <- mean(x)

Page 62: Slides from R crash course by Ilmo van der Löwe

>

Page 63: Slides from R crash course by Ilmo van der Löwe

> mean(x

Page 64: Slides from R crash course by Ilmo van der Löwe

> mean(x
+

Page 65: Slides from R crash course by Ilmo van der Löwe

> mean(x

Waits for more

+

Page 66: Slides from R crash course by Ilmo van der Löwe
Page 67: Slides from R crash course by Ilmo van der Löwe

INPUT ➞ ( )

OUTPUT is captured into VARIABLES.
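A small sketch of the input, function, output idea. The slides never define `x`, so the numbers below are an assumption, and `centre` is a hypothetical helper written only for this example.

```r
# Hypothetical input data; the slides do not say what x contains.
x <- c(2, 4, 6, 8)

# The input goes inside the parentheses; the output is captured in a variable.
x.bar <- mean(x)
print(x.bar)

# A call may span several lines. At the interactive prompt, R shows "+"
# until the closing parenthesis arrives.
x.sd <- sd(
  x
)

# Functions you write yourself work the same way: input in, output out.
centre <- function(values) values - mean(values)
centred <- centre(x)
print(centred)
```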

Page 68: Slides from R crash course by Ilmo van der Löwe

In R, things are often stored in vectors, lists, matrices, or data frames.

Page 69: Slides from R crash course by Ilmo van der Löwe

Vector

• The work horse of R

- Even individual numbers are a special case of vectors (i.e., a vector of length one)

• All elements have to be of the same mode

- Vectors of numbers are ok

‣ c(0,1,2,3,4,5,6,7,8,9)

- So are vectors of character strings

‣ c("Ilmo","Alex","Chris")
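The "same mode" rule can be seen directly when types are mixed. In this sketch the variable names are illustrative; the behaviour shown (numbers silently becoming character strings) is standard R coercion.

```r
numbers <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
people  <- c("Ilmo", "Alex", "Chris")

# A single number is still a vector, just of length one.
single <- 42
print(length(single))

# Mixing modes does not fail: R converts everything to one common mode,
# here character.
mixed <- c(1, "two", 3)
print(class(mixed))
print(mixed)
```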

Page 70: Slides from R crash course by Ilmo van der Löwe

us <- c("Ilmo","Alex","Chris")

us[1]
us[2:3]
length(us)
class(us)

Page 71: Slides from R crash course by Ilmo van der Löwe

us <- c("Ilmo","Alex","Chris")

us[1]
us[2:3]
length(us)
class(us)

Very classy characters, indeed!

Page 72: Slides from R crash course by Ilmo van der Löwe

List

• Mix and match!

- Lists can store things of different modes

- Numeric, character, data frames...

• Many functions return a list for later use

Page 73: Slides from R crash course by Ilmo van der Löwe

me <- list(name = "Ilmo", legs = 2)

me$name
me$legs
me["name"]
me[["name"]]
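The four access forms above do not all return the same kind of object. A brief sketch of the difference, with result names chosen only for illustration:

```r
me <- list(name = "Ilmo", legs = 2)

by_dollar <- me$name       # the element itself: a character string
by_double <- me[["name"]]  # the same element, selected by name
by_single <- me["name"]    # a smaller list that still wraps the element

print(class(by_double))  # the stored value's own class
print(class(by_single))  # still a list
```

Single brackets keep the container; double brackets and `$` reach inside it.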

Page 74: Slides from R crash course by Ilmo van der Löwe

Matrices are two-dimensional vectors

A character string matrix

     [,1]    [,2]
[1,] "Ilmo"  "Alex"
[2,] "Chris" "Dacher"

A numeric matrix

     [,1] [,2]
[1,] 1.09 4.20
[2,] 2.86 2.92

Page 75: Slides from R crash course by Ilmo van der Löwe

ucb <- rbind( c("Ilmo","Alex"), c("Chris","Dacher") )

ucb[1,1]
ucb[,1]
ucb[2,2]
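A runnable version of the matrix slides, as a sketch. The numeric matrix reuses the values shown earlier in the deck; building it with `matrix()` is an assumption about how it might have been created.

```r
# Character matrix built row by row.
ucb <- rbind(c("Ilmo", "Alex"), c("Chris", "Dacher"))
print(dim(ucb))    # rows, then columns

print(ucb[1, 1])   # row 1, column 1
print(ucb[, 1])    # all rows of column 1
print(ucb[2, 2])   # row 2, column 2

# Numeric matrix; matrix() fills column by column unless byrow = TRUE.
nums <- matrix(c(1.09, 2.86, 4.20, 2.92), nrow = 2)
print(nums)
```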

Page 76: Slides from R crash course by Ilmo van der Löwe

Data Frames

• The best of both lists and matrices

- Columns and rows

‣ Each column contains data of a single mode

‣ Each row can contain data of various modes

• Usually created by reading data from a file or database
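Data frames can also be built by hand, which makes the column rule easy to check. The values below are hypothetical and exist only to show that each column keeps its own mode.

```r
# Made-up example data: one character column, one numeric column.
people <- data.frame(
  name = c("Ilmo", "Alex", "Chris"),
  legs = c(2, 2, 2),
  stringsAsFactors = FALSE
)

# Each column has a single class; a row cuts across columns of different classes.
print(sapply(people, class))
print(people[1, ])
```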

Page 77: Slides from R crash course by Ilmo van der Löwe

DATA FRAMES ARE LIKE WAREHOUSES.

Page 78: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[,]

Page 79: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[1,]

Page 80: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[,1]

Page 81: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[,"age"]

Page 82: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d$age

Page 83: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[,1:3]

Page 84: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[2,2]

Page 85: Slides from R crash course by Ilmo van der Löwe

age gender height weight

1

2

3

d[2,c("age","weight")]
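The indexing forms from the warehouse slides can be run against a small stand-in data frame. The column names match the slides; the three rows of values are invented for this sketch.

```r
# Hypothetical data with the same columns as the slides.
d <- data.frame(
  age    = c(25, 31, 47),
  gender = c("f", "m", "f"),
  height = c(165, 180, 172),
  weight = c(60, 82, 70),
  stringsAsFactors = FALSE
)

d[, ]                     # every row, every column
d[1, ]                    # first row, all columns
d[, 1]                    # first column, as a vector
d[, "age"]                # the same column, by name
d$age                     # the same column again
d[, 1:3]                  # columns 1 to 3
d[2, 2]                   # row 2, column 2
d[2, c("age", "weight")]  # row 2, two named columns
```

The part before the comma selects rows, the part after it selects columns, and leaving either side empty means "all".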

Page 86: Slides from R crash course by Ilmo van der Löwe

d <- read.csv("MyNobelPrizeData.csv")
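One way to see what `read.csv` does without the original file is to write a tiny CSV and read it back. The file contents and column names here are made up for the sketch.

```r
# Write a small, made-up CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0"), csv_path)

# read.csv() parses the file and returns a data frame;
# the first line is used for the column names.
d <- read.csv(csv_path)
str(d)

unlink(csv_path)  # clean up the temporary file
```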

What will this do?

Page 87: Slides from R crash course by Ilmo van der Löwe

d <- read.spss("thatExperiment.sav")
Error: could not find function "read.spss"

Page 88: Slides from R crash course by Ilmo van der Löwe

library("foreign")

Page 89: Slides from R crash course by Ilmo van der Löwe

library("foreign")

Minitab, S, SAS, Stata, Systat, and dBase

Page 90: Slides from R crash course by Ilmo van der Löwe

library("foreign")

Minitab, S, SAS, Stata, Systat, and dBase

...but no Excel
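Once the package is loaded, the earlier `read.spss` call becomes available. The sketch below assumes the `foreign` package is present, as it normally is in a standard R installation, and it guards against the file being missing because the slide's .sav file is not provided here.

```r
library("foreign")

# File name taken from the slide; the file itself is not available here.
sav_path <- "thatExperiment.sav"

if (file.exists(sav_path)) {
  # to.data.frame = TRUE returns a data frame rather than a list.
  d <- read.spss(sav_path, to.data.frame = TRUE)
  str(d)
} else {
  message("No file at ", sav_path, "; nothing to read.")
}
```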

Page 91: Slides from R crash course by Ilmo van der Löwe

install.packages("xlsx")

Page 92: Slides from R crash course by Ilmo van der Löwe

read.xlsx("recipes.xlsx")

Page 93: Slides from R crash course by Ilmo van der Löwe

read.xlsx("recipes.xlsx")
Error in read.xlsx("recipes.xlsx"):

Page 94: Slides from R crash course by Ilmo van der Löwe

read.xlsx("recipes.xlsx")
Error in read.xlsx("recipes.xlsx"): Please provide a sheet name OR a sheet index.

Page 95: Slides from R crash course by Ilmo van der Löwe

read.xlsx("recipes.xlsx")
Error in read.xlsx("recipes.xlsx"): Please provide a sheet name OR a sheet index.

WTF is a “sheet index”?
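A sheet index is simply the position of a worksheet inside the workbook, so 1 means the first sheet. The sketch below wraps the call in a hypothetical helper, `read_first_sheet`, that returns NULL when the xlsx package or the file is missing, since neither can be assumed here.

```r
# Hypothetical helper: read the first worksheet of an Excel file if possible.
read_first_sheet <- function(path) {
  if (!requireNamespace("xlsx", quietly = TRUE) || !file.exists(path)) {
    return(NULL)  # package not installed or file not found
  }
  # sheetIndex = 1 selects the first worksheet in the workbook.
  xlsx::read.xlsx(path, sheetIndex = 1)
}

recipes <- read_first_sheet("recipes.xlsx")
```

The `sheetName` argument works the same way but selects a worksheet by its tab name instead of its position.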

Page 96: Slides from R crash course by Ilmo van der Löwe

Two-step guide to solving R problems

Page 97: Slides from R crash course by Ilmo van der Löwe

Step 1: Search

Page 98: Slides from R crash course by Ilmo van der Löwe

help(read.xlsx)
or
?read.xlsx

R has a lovely built-in documentation system. Most often, all that you need is right there.

Step 1: Search

Page 99: Slides from R crash course by Ilmo van der Löwe
Page 100: Slides from R crash course by Ilmo van der Löwe
Page 101: Slides from R crash course by Ilmo van der Löwe

help.search("bar plot")
or
??"bar plot"

When you don’t know exactly what you are looking for, use free-text search.
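A short sketch of the search tools in one place. The help calls are wrapped in `interactive()` so that a script run does not try to open help pages; `apropos` is an addition not mentioned on the slides, and it searches object names rather than documentation text.

```r
if (interactive()) {
  help("mean")              # same as ?mean
  help.search("bar plot")   # same as ??"bar plot"
}

# apropos() lists objects on the search path whose names match a pattern.
matches <- apropos("plot")
print(head(matches))
```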

Step 1: Search

Page 102: Slides from R crash course by Ilmo van der Löwe

Google it.

You are probably not the first person to encounter the error. Paste the error message into Google and see what pops up.

Step 1: Search

Page 103: Slides from R crash course by Ilmo van der Löwe

rseek.org
stackexchange.com
reddit.com/r/rstats

Read the R expert forums. See if they have already solved the problem.

Step 1: Search

Page 104: Slides from R crash course by Ilmo van der Löwe

Step 2: Ask

Page 105: Slides from R crash course by Ilmo van der Löwe
Page 106: Slides from R crash course by Ilmo van der Löwe

Make a reproducible example.

Pin down the exact problem in as few lines of code as possible. Simplify until only the problem remains.
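What a reproducible example might look like in practice, as a sketch. The data frame and the "surprising" result are invented for illustration; `dput` and `sessionInfo` are standard R functions that make it easy for others to recreate your data and environment.

```r
# Tiny made-up data that still shows the behaviour being asked about.
df <- data.frame(x = c(1, 2, NA), y = c("a", "b", "c"))

# The one line that surprises you.
result <- mean(df$x)
print(result)

# Include these in your question so others can recreate the situation.
dput(df)       # prints code that rebuilds df exactly
sessionInfo()  # R version, platform, and loaded packages
```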

Step 2: Ask

Page 107: Slides from R crash course by Ilmo van der Löwe

Ask your friends.

Solving problems together is a great way to learn.

Step 2: Ask

Page 108: Slides from R crash course by Ilmo van der Löwe

Ask the experts online.

There’s the R mailing list, Stack Exchange, the rstats subreddit, Quora, Twitter, etc. You probably found these already with your Google searches.

Step 2: Ask

Page 109: Slides from R crash course by Ilmo van der Löwe

Step 2: Ask

They do this for a living.

Ask the stats dept experts.

Page 110: Slides from R crash course by Ilmo van der Löwe

Ask Alex or me.

Step 2: Ask

...and show us what you have tried already.

Page 111: Slides from R crash course by Ilmo van der Löwe

Let’s dive in!

Page 112: Slides from R crash course by Ilmo van der Löwe

Who has any programming experience?

Page 113: Slides from R crash course by Ilmo van der Löwe

Get your group on.

Page 114: Slides from R crash course by Ilmo van der Löwe
Page 115: Slides from R crash course by Ilmo van der Löwe

OPTIONAL

Page 116: Slides from R crash course by Ilmo van der Löwe
Page 117: Slides from R crash course by Ilmo van der Löwe
Page 118: Slides from R crash course by Ilmo van der Löwe

Source

Console

Workspace

Page 119: Slides from R crash course by Ilmo van der Löwe

Frank Anscombe
STATISTICIAN

Page 120: Slides from R crash course by Ilmo van der Löwe

ans

Page 121: Slides from R crash course by Ilmo van der Löwe

ans

Page 122: Slides from R crash course by Ilmo van der Löwe

ans

Page 123: Slides from R crash course by Ilmo van der Löwe

ans

Page 124: Slides from R crash course by Ilmo van der Löwe

   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89

Page 125: Slides from R crash course by Ilmo van der Löwe
Page 126: Slides from R crash course by Ilmo van der Löwe

a <- anscombe

Page 127: Slides from R crash course by Ilmo van der Löwe

a

Page 128: Slides from R crash course by Ilmo van der Löwe

summary(a$x1)
summary(a[,1])

summary(a[,"x1"])

They all mean the same.
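The claim that all three forms mean the same can be checked with `identical`. A minimal sketch using the built-in `anscombe` data; the result names are illustrative.

```r
a <- anscombe

by_name     <- a$x1
by_position <- a[, 1]
by_label    <- a[, "x1"]

# All three select the same column, so the comparisons return TRUE.
print(identical(by_name, by_position))
print(identical(by_name, by_label))

print(summary(by_name))
```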

Page 129: Slides from R crash course by Ilmo van der Löwe

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    4.0     6.5     9.0     9.0    11.5    14.0

Page 130: Slides from R crash course by Ilmo van der Löwe

What about the rest of a?

Page 131: Slides from R crash course by Ilmo van der Löwe

summary(a)

Page 132: Slides from R crash course by Ilmo van der Löwe

plot(a)

Page 133: Slides from R crash course by Ilmo van der Löwe
Page 134: Slides from R crash course by Ilmo van der Löwe

plot(a$x1, a$y1)
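To compare all four x/y pairs rather than only the first, the same `plot` call can be repeated in a 2-by-2 grid. This is a sketch of one way to do it and is not taken from the slides.

```r
a <- anscombe

# Arrange four plots in a 2 x 2 grid, remembering the old settings.
old_par <- par(mfrow = c(2, 2))

for (i in 1:4) {
  x_name <- paste0("x", i)
  y_name <- paste0("y", i)
  plot(a[[x_name]], a[[y_name]], xlab = x_name, ylab = y_name)
}

par(old_par)  # restore the previous plotting layout
```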

Page 135: Slides from R crash course by Ilmo van der Löwe
Page 136: Slides from R crash course by Ilmo van der Löwe

cor(a$x1, a$y1)
cor.test(a$x1, a$y1)
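Running the correlation for every pair shows why Anscombe's data are famous: the four coefficients come out almost identical, at about 0.816, even though the plots look very different. A sketch, with `r` and `test1` as illustrative names:

```r
a <- anscombe

# Pearson correlation for each of the four x/y pairs.
r <- sapply(1:4, function(i) {
  cor(a[[paste0("x", i)]], a[[paste0("y", i)]])
})
print(round(r, 3))

# cor.test() adds a significance test and a confidence interval.
test1 <- cor.test(a$x1, a$y1)
print(test1$estimate)
print(test1$p.value)
```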

Page 137: Slides from R crash course by Ilmo van der Löwe
Page 138: Slides from R crash course by Ilmo van der Löwe

a$x4 <- NULL
a$y4 <- NULL

# or, in one step, by name:
a[,c("x4","y4")] <- NULL

# or by position (only while all eight columns are still present):
a[,c(4,8)] <- NULL
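The removal commands above are alternatives: each one deletes the same two columns, so running them one after another on the same data frame would fail once the columns are gone. The sketch below applies each approach to a fresh copy; the list-style form `[c(...)] <- NULL` and negative indexing are substitutions chosen for this example.

```r
a <- anscombe

# Approach 1: one column at a time with $.
b <- a
b$x4 <- NULL
b$y4 <- NULL

# Approach 2: several columns at once, by name.
c2 <- a
c2[c("x4", "y4")] <- NULL

# Approach 3: keep everything except columns 4 and 8.
d2 <- a[, -c(4, 8)]

print(names(b))
```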

Page 139: Slides from R crash course by Ilmo van der Löwe

NULL
TRUE
FALSE
NA
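A brief sketch of how these four special values behave. The `scores` vector is made up for illustration.

```r
# NULL means "nothing here": it has length zero.
nothing <- NULL
print(length(nothing))
print(is.null(nothing))

# NA means "a value exists but is missing".
scores <- c(4, NA, 6)
print(is.na(scores))

# Missing values propagate unless you ask for them to be dropped.
print(mean(scores))
print(mean(scores, na.rm = TRUE))

# TRUE and FALSE are the logical values; comparisons with NA give NA.
print(scores > 4)
```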