Slides from R crash course by Ilmo van der Löwe
CAMBRIDGE PROSOCIALITY AND WELL-BEING LABORATORY
CRASH COURSE
DATA SCIENTIST
The Sexiest Job of the 21st Century
Statistics
Domain expertise
Hacking
BIG DATA
ish
SOCIAL NETWORK DATA
DIGITAL TRACE DATA
GLOBAL SURVEY DATA
GENETIC DATA
SPSS ain’t gonna cut it.
Windows Mac Linux
Built by scientists for scientists.
“We have named our language R – in part to acknowledge the influence of S and in part to celebrate our own efforts.”
Ross Ihaka
PROFESSOR OF STATISTICS
University of Auckland
Robert Gentleman
SENIOR DIRECTOR OF BIOINFORMATICS
Genentech
R is the most powerful statistics language
in the world.
• Open source
- Free as in speech and beer
• Cross-platform
- Runs on Windows, Mac, and Linux
• Versatile and extensible
- Over 4,000 user-contributed packages
• General-purpose programming language
- You can make it do things automagically
http://r-project.org
RStudio.org
Why use R?
R is used by the best.
"...a way to organize the brainpower of the world’s most talented data scientists..."
Hal Varian
CHIEF ECONOMIST
software on
50% of winners use R
• Everything in one system
- base: linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering etc.
- packages from multilevel modeling to medical image analysis
• Custom functionality
- Programming ➞ Automation
4,403 available packages
• Automate away “click-click-click” tasks
- More efficient work
• Share analyses and data with ease
- Better collaboration
• Make results reproducible
- Better science
How do I use R?
You use R by typing commands, not with a mouse.
R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> Prompt
How do you know what to type to R?
For beginners:
For the statistically minded:
For programmers:
The very basics
Put "this" in here
here <- "this"
here: variable
"this": a string
<- : assignment operator
here = "this"
= : assignment operator
> here
[1] "this"
[1]: Row #
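A minimal sketch of the assignment slides above, meant to be typed at the R prompt. The second variable name, also_here, is made up for illustration and is not on the slides.

```r
# Put "this" in here, using the arrow assignment operator
here <- "this"

# The equals sign also assigns at the top level
also_here = "this"   # hypothetical second variable, for comparison only

# Typing a variable name prints its value, prefixed by the index [1]
here

# Both variables now hold the same string
identical(here, also_here)
```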
functions and data
INPUT ➞ BLACK BOX ➞ OUTPUT
INPUT ➞ FUNCTION ➞ OUTPUT
FUNCTIONS ARE LIKE FACTORIES.
( INPUT ) ➞ OUTPUT
In R, parentheses mean: “DO SOMETHING”
(according to my instructions)
x.bar <- mean(x)
> mean(x
+
The + prompt: R waits for more (here, the closing parenthesis).
( INPUT )
OUTPUT is captured into VARIABLES.
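A small sketch of the input/function/output idea, using the x.bar <- mean(x) line from the slides. The values in x are made up, since the slides do not say what x holds.

```r
# INPUT: a small numeric vector (made-up values for illustration)
x <- c(2, 4, 6, 8)

# FUNCTION: mean() works on whatever sits inside the parentheses
# OUTPUT: the result is captured into the variable x.bar
x.bar <- mean(x)

# Print the captured output
x.bar
```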
In R, things are often stored in vectors, lists, matrices, or data frames.
Vector
• The workhorse of R
- Even individual numbers are a special case of vectors (i.e., a vector of one)
• All elements have to be of the same mode
- Vectors of numbers are ok
‣ c(0,1,2,3,4,5,6,7,8,9)
- So are vectors of character strings
‣ c("Ilmo","Alex","Chris")
us <- c("Ilmo","Alex","Chris")
us[1]
us[2:3]
length(us)
class(us)
Very classy characters, indeed!
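The vector slides can be tried out roughly as follows. The us vector comes from the slides; the mixed vector is an added illustration of the "same mode" rule and is not part of the original deck.

```r
us <- c("Ilmo", "Alex", "Chris")

us[1]        # the first element
us[2:3]      # the second and third elements
length(us)   # how many elements
class(us)    # what kind of elements

# Illustration (not on the slides): mixing numbers and strings
# forces everything into a single mode, here character
mixed <- c(1, "two", 3)
class(mixed)

# Even a single number is a vector of length one
length(42)
```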
List
• Mix and match!
- Lists can store things of different modes
- Numeric, character, data frames...
• Many functions return a list for later use
me <- list(name = "Ilmo", legs = 2)
me$name
me$legs
me["name"]
me[["name"]]
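A short sketch of how the four ways of reaching into the list differ. The calls to class() are added here to make the difference visible; they are not on the slides.

```r
me <- list(name = "Ilmo", legs = 2)

# $ and [[ ]] pull out the element itself
class(me$name)        # a character string
class(me$legs)        # a number
class(me[["name"]])   # same thing as me$name

# Single brackets return a smaller list that still wraps the element
class(me["name"])
```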
Matrices are two-dimensional vectors
A character string matrix
     [,1]    [,2]
[1,] "Ilmo"  "Alex"
[2,] "Chris" "Dacher"
A numeric matrix
     [,1] [,2]
[1,] 1.09 4.20
[2,] 2.86 2.92
ucb <- rbind( c("Ilmo","Alex"), c("Chris","Dacher") )
ucb[1,1]
ucb[,1]
ucb[2,2]
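The matrix example can be run as a whole like this. The dim() call is an addition for orientation; the rest follows the slides.

```r
# Stack two character vectors as the rows of a matrix
ucb <- rbind(c("Ilmo", "Alex"), c("Chris", "Dacher"))

dim(ucb)    # number of rows, then number of columns

ucb[1, 1]   # row 1, column 1
ucb[, 1]    # every row of column 1
ucb[2, 2]   # row 2, column 2
```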
Data Frames
• The best of both lists and matrices
- Columns and rows
‣ Each column contains data of a single mode
‣ Each row can contain data of various modes
• Usually created by reading data from a file or database
DATA FRAMES ARE LIKE WAREHOUSES.
A data frame d with columns age, gender, height, weight and rows 1, 2, 3:
d[,]                      everything
d[1,]                     row 1
d[,1]                     column 1
d[,"age"]                 the age column
d$age                     the age column
d[,1:3]                   columns 1 to 3
d[2,2]                    row 2, column 2
d[2,c("age","weight")]    row 2, columns age and weight
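To try the indexing slides without a data file, a tiny data frame can be built by hand. The column names follow the slides; every value below is invented for illustration.

```r
# A made-up three-row data frame with the columns shown on the slides
d <- data.frame(
  age    = c(25, 31, 47),
  gender = c("f", "m", "f"),
  height = c(165, 180, 172),
  weight = c(60, 82, 70)
)

d[1, ]                      # row 1, all columns
d[, 1]                      # column 1, all rows
d[, "age"]                  # the same column, by name
d$age                       # the same column again
d[, 1:3]                    # columns 1 to 3
d[2, 2]                     # row 2, column 2
d[2, c("age", "weight")]    # row 2, two named columns
```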
d <- read.csv("MyNobelPrizeData.csv")
What will this do?
d <- read.spss("thatExperiment.sav")
Error: could not find function "read.spss"
library("foreign")
Minitab, S, SAS, Stata, Systat, and dBase
...but no Excel
install.packages("xlsx")
library("xlsx")
read.xlsx("recipes.xlsx")
Error in read.xls("recipes.xlsx"): Please provide a sheet name OR a sheet index.
WTF is a “sheet index”?
Two-step guide to solving R problems
Step 1: Search
help(read.xlsx)
or
?read.xlsx
R has a lovely built-in documentation system. Most often, all that you need is right there.
help.search("bar plot")
or
??"bar plot"
When you don’t know exactly what you are looking for, use free-text search.
Google it.
You are probably not the first person to encounter the error. Paste the error message into Google and see what pops up.
rseek.org
stackexchange.com
reddit.com/r/rstats
Read the R expert forums. See if they have already solved the problem.
Step 2: Ask
Make a reproducible example.
Pin down the exact problem in as few lines of code as possible. Simplify until only the problem remains.
Ask your friends.
Solving problems together is a great way to learn.
Ask the experts online.
There’s the R mailing list, statsexchange, rstats reddit, Quora, Twitter etc. You probably found these already with your Google searches.
Ask the stats dept experts.
They do this for a living.
Ask Alex or me.
...and show us what you have tried already.
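One possible shape for the reproducible example mentioned under Step 2, as a sketch only. The data and the "problem" (a mean that comes back as NA) are invented for illustration.

```r
# A tiny, made-up data set instead of a private file nobody else has
d <- data.frame(x = c(1, 2, NA), y = c(2, 4, 6))

# dput() prints R code that recreates d, ready to paste into a question
dput(d)

# The fewest lines that still show the surprising behaviour
mean(d$x)

# Version and platform details that helpers often ask for
sessionInfo()
```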
Let’s dive in!
Who has any programming experience?
Get your group on.
OPTIONAL
Source
Console
Workspace
Frank Anscombe
STATISTICIAN
ans
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
a <- anscombe
a
summary(a$x1)
summary(a[,1])
summary(a[,"x1"])
They all mean the same.
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    4.0     6.5     9.0     9.0    11.5    14.0
What about the rest of a?
summary(a)
plot(a)
plot(a$x1, a$y1)
cor(a$x1, a$y1)
cor.test(a$x1, a$y1)
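A sketch that repeats the correlation for all four x/y pairs in the anscombe data, which ships with R. Only the first pair appears on the slides; the loop over the other three, and the helper names r and pair, are additions.

```r
a <- anscombe

# The correlation from the slides: first x/y pair
cor(a$x1, a$y1)

# Added illustration: the same correlation for each of the four pairs
r <- sapply(1:4, function(pair) {
  x <- a[[paste("x", pair, sep = "")]]
  y <- a[[paste("y", pair, sep = "")]]
  cor(x, y)
})
round(r, 3)
```

Comparing the four numbers with plot(a) is a quick way to see why plotting comes before trusting a single statistic.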
a$x4 <- NULL
a$y4 <- NULL
or
a[,c("x4","y4")] <- NULL
or
a[,c(4,8)] <- NULL
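A sketch comparing the three ways of dropping the x4 and y4 columns, each applied to its own fresh copy of anscombe. The copy names by_dollar, by_name and by_position are made up for illustration.

```r
# One column at a time, with $
by_dollar <- anscombe
by_dollar$x4 <- NULL
by_dollar$y4 <- NULL

# Both columns at once, by name
by_name <- anscombe
by_name[, c("x4", "y4")] <- NULL

# Both columns at once, by position (x4 is column 4, y4 is column 8)
by_position <- anscombe
by_position[, c(4, 8)] <- NULL

# The remaining columns after each approach
names(by_dollar)
names(by_name)
names(by_position)
```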
NULL
TRUE
FALSE
NA
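A few lines to poke at these four special values. The examples are illustrative additions, not taken from the slides.

```r
# NULL is "nothing at all": it has length zero
is.null(NULL)
length(NULL)

# NA is "a value that is missing": it still takes up a slot
is.na(NA)
length(NA)

# Missing values spread through calculations unless told otherwise
mean(c(1, NA, 3))
mean(c(1, NA, 3), na.rm = TRUE)

# TRUE and FALSE are the logical values
TRUE & FALSE
TRUE | FALSE
```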