Introduction to R
Uploaded by samuel-bosch
Category: Software
![Page 2: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/2.jpg)
What is R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language.
Created in 1993, license: GNU GPL, current version 3.2.3
Interpreted
C-like syntax
Functional programming language semantics (Lisp, APL)
Object oriented (3 different OO systems)
Garbage collector
Mostly call-by-value
Lexical scope
Function closure
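The last three properties in this list (call-by-value, lexical scope, closures) can be seen in a few lines. This is a minimal sketch; the names `make_counter` and `add_one` are made up for illustration and are not part of the slides.

```r
# Hypothetical helper: returns a function that remembers `count` between calls
make_counter <- function() {
  count <- 0                 # lives in the environment created by this call
  function() {
    count <<- count + 1      # `<<-` assigns in the enclosing (lexical) scope
    count
  }
}

counter <- make_counter()
counter()
counter()

# Call-by-value semantics: the function works on its own copy of `v`
add_one <- function(v) {
  v[1] <- v[1] + 1
  v
}
x <- c(1, 2)
y <- add_one(x)   # x is unchanged, y holds the modified copy
```

Each call to `make_counter()` creates a fresh environment, so two counters made this way would not share state.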
![Page 3: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/3.jpg)
Popularity
TIOBE: 18
GitHub: 12
Stack Overflow: 117341 questions (Java: 978006, Python: 507653)
Most popular tool for statistical data analysis
![Page 4: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/4.jpg)
Usage
CRAN Task Views: https://cran.r-project.org/web/views/
Statistics (frequentist and bayesian)
Machine learning and data mining
Science (mathematics, chemistry, physics, medical, ecology, genetics, economy, history, …)
Finance
Natural Language Processing
Data visualization
Analyzing spatial, spatio-temporal data and time series
…
![Page 5: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/5.jpg)
R Markdown
This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
![Page 6: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/6.jpg)
Competitors/colleagues
SAS, SPSS, Stata, Mathematica and other statistical software
Python + NumPy + pandas + matplotlib + …
MATLAB/Octave
Julia
K/J and other APL-like languages
Java (Weka), Clojure, .NET (F#), …
![Page 7: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/7.jpg)
Calling R
command line
SAS, SPSS, Stata, Statistica, JMP
Java, C++, F#
Python, Perl, Ruby, Julia
PostgreSQL: PL/R
![Page 8: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/8.jpg)
Ecosystem
IDE: RStudio or one of the alternatives (plugins for Eclipse, Visual Studio, Atom, Sublime Text, Vim, …)
Packages: CRAN (6700+ packages), Bioconductor, R-Forge, GitHub
Learning more and getting help:
Built-in documentation (?, help(), F1) and package vignettes
Official manuals: https://cran.r-project.org/manuals.html
Short reference card: https://cran.r-project.org/doc/contrib/Short-refcard.pdf
(Free) books: Advanced R and R packages by Hadley Wickham
Courses on edX and Coursera
Stack Overflow and Cross Validated (for statistical questions)
![Page 9: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/9.jpg)
View help
?Filter
![Page 10: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/10.jpg)
Operators
+ - * /
^ or ** for exponentiation
%% modulus
%/% integer division
< <= > >= == !=
!x isTRUE(x) xor(x, y)
# element-wise OR and AND
c(FALSE, FALSE) | c(TRUE, FALSE) & c(TRUE, FALSE)
## [1] TRUE FALSE
# first-element OR and AND (R 3.x behaviour; since R 4.3.0, || and && signal an error for operands longer than one)
c(FALSE, FALSE) || c(TRUE, FALSE) && c(TRUE, TRUE)
## [1] TRUE
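A few of the less familiar operators in action. This is only a sketch with made-up values; the variable names are illustrative.

```r
int_div   <- 7 %/% 2        # integer division
remainder <- 7 %% 2         # modulus
neg_mod   <- -7 %% 3        # the result takes the sign of the divisor
power     <- 2 ^ 10         # same as 2 ** 10
either    <- xor(TRUE, FALSE)

# isTRUE() is TRUE only for a single, non-missing TRUE
checks <- c(isTRUE(TRUE), isTRUE(NA), isTRUE(c(TRUE, TRUE)))
```

`isTRUE()` is handy in `if` conditions because it never returns `NA`.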
![Page 11: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/11.jpg)
Vectors
Ordered sequence of elements of the same type
a <- c(1,2,5.3,6,-2,4) # numeric vector
a[c(2,4)] # 2nd and 4th element
## [1] 2 6
names(a) <- c("c","d","e","f","g","h")
a
##    c    d    e    f    g    h
##  1.0  2.0  5.3  6.0 -2.0  4.0
![Page 12: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/12.jpg)
Vectors
a[a > 3] ## a[c(F, F, T, T, F, T)]
##   e   f   h
## 5.3 6.0 4.0
a[3:5]
##    e    f    g
##  5.3  6.0 -2.0
a[-1]
##    d    e    f    g    h
##  2.0  5.3  6.0 -2.0  4.0
![Page 13: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/13.jpg)
Vectors
a[c("c","d","e")]
##   c   d   e
## 1.0 2.0 5.3
a[a %in% c(1,2)]
## c d
## 1 2
is.null(c()) & is.null(NULL)
## [1] TRUE
![Page 14: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/14.jpg)
Vectors
c(1,2,3)[c(TRUE,FALSE,NA)]
## [1]  1 NA
c(sum=sum(a), sumna=sum(c(a,NA)), sumnona=sum(c(a,NA), na.rm = TRUE),
  mean=mean(a), sd=sd(a), max=max(1,2,a))
##       sum     sumna   sumnona      mean        sd       max
## 16.300000        NA 16.300000  2.716667  2.993604  6.000000
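Arithmetic on vectors is element-wise, and a shorter vector is recycled to the length of the longer one. A small sketch, using the same values as `a` above under the assumed name `v`:

```r
v <- c(1, 2, 5.3, 6, -2, 4)   # same values as `a` in the slides

doubled   <- v * 2             # the scalar 2 is recycled over all elements
alternate <- v + c(0, 100)     # c(0, 100) is recycled three times
positive  <- which(v > 0)      # positions (not values) that satisfy the test
v[v < 0]  <- 0                 # logical indexing also works for assignment
```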
![Page 15: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/15.jpg)
Data Types: numeric vectors
Default type for numbers
class(c(1, 2.3))
## [1] "numeric"
c(is.integer(1), is.numeric(1))
## [1] FALSE TRUE
c(seq(from = 1, to = 5, by = 2), rep(c(6,7), times = c(2,3)))
## [1] 1 3 5 6 6 7 7 7
![Page 16: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/16.jpg)
Data Types: integer vectors
as.integer(c(1,2.3,"4.5","bla"))
## Warning: NAs introduced by coercion
## [1] 1 2 4 NA
as.integer(c(TRUE,FALSE))
## [1] 1 0
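Integers can also be written directly with the `L` suffix. The sketch below uses made-up values to show integer literals, truncation and overflow.

```r
i <- 1L                                      # the L suffix creates an integer literal
types <- c(class(i), class(1), class(1:3))   # 1:3 is an integer sequence

truncated <- as.integer(c(2.9, -2.9))        # truncates towards zero

# Integers are 32-bit: going past the maximum gives NA with a warning
overflow <- suppressWarnings(.Machine$integer.max + 1L)
```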
![Page 17: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/17.jpg)
Factors
Used to encode a vector as a factor ('category'/'enumerated type')
f <- factor(c(1,1,2,2,3,3,2,1), levels=c(1,2,3), labels=c("a", "b", "c"))
f
## [1] a a b b c c b a
## Levels: a b c
table(f)
## f
## a b c
## 3 3 2
![Page 18: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/18.jpg)
Factors
as.character(f)
## [1] "a" "a" "b" "b" "c" "c" "b" "a"
as.numeric(f)
## [1] 1 1 2 2 3 3 2 1
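`as.numeric()` on a factor returns the internal level codes, which is a common surprise when the labels themselves look like numbers. A sketch with made-up values (`sizes` is a hypothetical name):

```r
sizes <- factor(c("10", "20", "10", "30"))   # made-up values for illustration

codes  <- as.numeric(sizes)                  # internal level codes
values <- as.numeric(as.character(sizes))    # the numbers that were encoded

levels(sizes)                                # the distinct labels, sorted
```

Going through `as.character()` first is the usual way to recover the original numbers.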
![Page 19: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/19.jpg)
Other Vectorial Data Types
Complex
Logical: 1 < 2, TRUE, T, FALSE, F
Character:
as.character(1.2)
## [1] "1.2"
fizz <- paste0("fi", paste(rep("z", 2), collapse = ""))
paste(fizz, "buzz", 1:3, sep="_", collapse = " | ")
## [1] "fizz_buzz_1 | fizz_buzz_2 | fizz_buzz_3"
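Beyond `paste()`, base R has a handful of string helpers that come up constantly. A brief sketch; the input string and variable names are arbitrary.

```r
s <- "Introduction to R"

n_chars <- nchar(s)                   # number of characters
upper   <- toupper(s)
first   <- substr(s, 1, 5)            # characters 1 to 5
words   <- strsplit(s, " ")[[1]]      # strsplit() returns a list, one entry per input
label   <- sprintf("%s has %d words", first, length(words))
```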
![Page 20: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/20.jpg)
Matrices
Multiple vector columns of the same type and the same length
m <- matrix(1:10, nrow=5, ncol=2, byrow = TRUE)
m[1,] # 1st row
## [1] 1 2
m[,2] # 2nd column
## [1]  2  4  6  8 10
m[1,2] # 1st row, 2nd column
## [1] 2
![Page 21: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/21.jpg)
Matrices
m * 1:5
##      [,1] [,2]
## [1,]    1    2
## [2,]    6    8
## [3,]   15   18
## [4,]   28   32
## [5,]   45   50
t(m) %*% m
##      [,1] [,2]
## [1,]  165  190
## [2,]  190  220
![Page 22: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/22.jpg)
Matrices
diag(1, nrow = 2, ncol = 2)
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
sum(c(rowSums(m), colSums(m))) == sum(2*m)
## [1] TRUE
apply(m, MARGIN = 1, function(x) { sum(x) }) == rowSums(m)
## [1] TRUE TRUE TRUE TRUE TRUE
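`apply()` works over columns as well (`MARGIN = 2`), and matrices can be extended or inverted with base functions. A sketch that rebuilds the same `m`; `square` is a made-up example matrix.

```r
m <- matrix(1:10, nrow = 5, ncol = 2, byrow = TRUE)   # same matrix as in the slides

dims      <- dim(m)                          # 5 rows, 2 columns
col_total <- apply(m, MARGIN = 2, sum)       # MARGIN = 2 works column by column
wider     <- cbind(m, total = rowSums(m))    # append a column

square  <- matrix(c(2, 0, 0, 4), nrow = 2)   # made-up invertible matrix
inverse <- solve(square)                     # matrix inverse
product <- square %*% inverse                # should be the identity matrix
```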
![Page 23: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/23.jpg)
Matrices
head(m, n = 3)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
summary(m)
##        V1          V2
##  Min.   :1   Min.   : 2
##  1st Qu.:3   1st Qu.: 4
##  Median :5   Median : 6
##  Mean   :5   Mean   : 6
##  3rd Qu.:7   3rd Qu.: 8
##  Max.   :9   Max.   :10
![Page 24: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/24.jpg)
Arrays
One, two or more dimensions
a <- array(data = t(1:24), dim = c(2,3,4))
a[1,,]
##      [,1] [,2] [,3] [,4]
## [1,]    1    7   13   19
## [2,]    3    9   15   21
## [3,]    5   11   17   23
a[1,1,1]
## [1] 1
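Arrays are filled with the first index varying fastest, which explains the values in `a[1,,]` above. A short sketch checking this with an index formula (the name `arr` is illustrative):

```r
arr <- array(1:24, dim = c(2, 3, 4))

# With dim = c(2, 3, 4), element [i, j, k] holds i + 2 * (j - 1) + 6 * (k - 1)
i <- 2; j <- 3; k <- 4
by_formula <- i + 2 * (j - 1) + 6 * (k - 1)
by_index   <- arr[i, j, k]

slice_sums <- apply(arr, MARGIN = 3, sum)   # one sum per slice along the third dimension
```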
![Page 25: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/25.jpg)
Data frames
A data frame combines columns with the same length and different data types
d <- data.frame(number=1:2, bool=c(TRUE, FALSE), string=c("y", "z"))
d$number
## [1] 1 2
d[1,c(2,3)]
##   bool string
## 1 TRUE      y
![Page 26: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/26.jpg)
Data frames
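d[,"string"]
## [1] y z
## Levels: y z
# Since R 4.0.0 the default is stringsAsFactors = FALSE, so the call above returns a character vector there
data.frame(string=c("y", "z"), stringsAsFactors = FALSE)[,1]
## [1] "y" "z"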
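Typical base R operations on a data frame are adding a column, filtering rows and sorting. A minimal sketch with a slightly larger made-up data frame:

```r
d <- data.frame(number = 1:3,
                bool   = c(TRUE, FALSE, TRUE),
                string = c("x", "y", "z"),
                stringsAsFactors = FALSE)

d$twice  <- d$number * 2             # add a derived column
kept     <- d[d$bool, ]              # rows where bool is TRUE
sorted   <- d[order(-d$number), ]    # sort by number, descending
col_cls  <- sapply(d, class)         # type of each column
```

The dplyr verbs on the next slides express the same steps with `mutate`, `filter` and `arrange`.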
![Page 27: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/27.jpg)
dplyr
Lots of operators for manipulating local and database data (SQLite, MySQL and PostgreSQL).
Basic verbs:
select
filter
arrange (= sort)
mutate
summarise
Other goodies:
piping (chaining)
database access as lazy as possible
BigQuery support (Google)
![Page 28: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/28.jpg)
dplyr
library(dplyr)
cars <- mutate(mtcars, hp_mpg = hp/mpg)
cars %>%
  group_by(cyl) %>%
  summarise(mean(disp), mean(hp), mean(mpg), mean(hp_mpg))
## Source: local data frame [3 x 5]
##
##     cyl mean(disp)  mean(hp) mean(mpg) mean(hp_mpg)
##   (dbl)      (dbl)     (dbl)     (dbl)        (dbl)
## 1     4   105.1364  82.63636  26.66364     3.244667
## 2     6   183.3143 122.28571  19.74286     6.231013
## 3     8   353.1000 209.21429  15.10000    14.419146
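For comparison, the same grouped means can be computed without dplyr. This sketch swaps in base R's `transform()` and `aggregate()`; it is not the approach used in the slides, and `by_cyl` is an illustrative name.

```r
# datasets::mtcars is used explicitly so a modified copy in the workspace is ignored
cars <- transform(datasets::mtcars, hp_mpg = hp / mpg)   # base R counterpart of mutate()

# Mean of every listed column within each value of cyl
by_cyl <- aggregate(cbind(disp, hp, mpg, hp_mpg) ~ cyl, data = cars, FUN = mean)
```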
![Page 29: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/29.jpg)
dplyr
## options("samuelb@obisdb-stage.vliz.be" = "<your password here>")
pwd <- getOption("samuelb@obisdb-stage.vliz.be")
src <- src_postgres(dbname = "obis", host = "obisdb-stage.vliz.be", port = "5432",
                    user = "samuelb", password = pwd, options = "-c search_path=obis")
tbl(src, "positions") %>%
  filter(longitude == 0 & latitude == 0) %>%  # filter first: select() drops longitude and latitude
  select(id, bottomdepth) %>%
  collect()
## Source: local data frame [1 x 2]
##
##        id bottomdepth
##     (int)       (int)
## 1 8667455        4935
![Page 30: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/30.jpg)
Lists
Ordered collection of objects
l <- list(name="Samuel", age=33,
          workdays=c("Mon","Tues","Wed", "Thurs", "Fri"))
l
## $name
## [1] "Samuel"
##
## $age
## [1] 33
##
## $workdays
## [1] "Mon"   "Tues"  "Wed"   "Thurs" "Fri"
![Page 31: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/31.jpg)
Lists
l$name
## [1] "Samuel"
l[["age"]]
## [1] 33
l[[3]]
## [1] "Mon" "Tues" "Wed" "Thurs" "Fri"
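Single brackets on a list return a smaller list, while double brackets (and `$`) return the element itself. A sketch that also adds and removes elements; the `city` field is a hypothetical addition for illustration.

```r
l <- list(name = "Samuel", age = 33,
          workdays = c("Mon", "Tues", "Wed", "Thurs", "Fri"))

sub_list <- l["age"]       # single brackets keep the list wrapper
value    <- l[["age"]]     # double brackets extract the element itself

l$city <- "Ostend"         # hypothetical extra field, added by assignment
l$age  <- NULL             # assigning NULL removes an element

lengths_by_name <- sapply(l, length)   # apply a function to every element
```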
![Page 32: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/32.jpg)
Functions
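# Pitfall: ifelse() is vectorised over its test, so a length-one test
# returns only the first element of x
sumf <- function(x, na.rm = FALSE) {
  x <- ifelse(na.rm, na.omit(x), x)
  Reduce("+", x)
}
sumf(1:3)
## [1] 1
sumf
## function(x, na.rm = FALSE) {
##   x <- ifelse(na.rm, na.omit(x), x)
##   Reduce("+", x)
## }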
![Page 33: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/33.jpg)
Functions
sumf <- function(x, na.rm = FALSE) {
  if(na.rm) {
    x <- na.omit(x)
  }
  sum <- 0
  for (element in x) {
    sum <- sum + element
  }
  sum
}
sumf(c(1, 2, NA))
## [1] NA
sumf(c(1, 2, NA), na.rm = TRUE)
## [1] 3
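The `Reduce()` approach from the first `sumf` can be kept if the scalar condition is handled with a plain `if`. A possible sketch, under the assumed name `sum_reduce`:

```r
sum_reduce <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]      # drop missing values
  }
  Reduce(`+`, x, 0)        # fold with `+`, starting from 0
}

total_plain <- sum_reduce(1:3)
total_na    <- sum_reduce(c(1, 2, NA))
total_clean <- sum_reduce(c(1, 2, NA), na.rm = TRUE)
```

Starting the fold from 0 also gives a sensible result for an empty vector.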
![Page 34: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/34.jpg)
Errors
try({
  stop("Not supported")
}, silent = TRUE)
tryCatch(expr = {
  qwerty + 1
}, error = function (e) str(e), finally = print("Finally"))
## List of 2
##  $ message: chr "object 'qwerty' not found"
##  $ call   : language doTryCatch(return(expr), name, parentenv, handler)
##  - attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
## [1] "Finally"
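A common use of `tryCatch()` is to return a fallback value instead of stopping. The sketch below handles both warnings and errors; `safe_as_number` is a hypothetical helper, not something from the slides.

```r
# Hypothetical helper: parse a number, fall back to NA on any problem
safe_as_number <- function(text) {
  tryCatch(
    {
      value <- as.numeric(text)   # raises a warning for non-numeric text
      if (value < 0) stop("negative values are not supported")
      value
    },
    warning = function(w) NA_real_,                      # handle warnings
    error   = function(e) {
      message("Error caught: ", conditionMessage(e))     # report and recover
      NA_real_
    }
  )
}

ok      <- safe_as_number("42")
bad     <- safe_as_number("bla")
refused <- safe_as_number("-1")
```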
![Page 35: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/35.jpg)
Files
f <- file(filename, open = "r")
on.exit(close(f))
readLines writeLines cat sink scan parse
url gzfile
read.table read.csv read.csv2
write.table write.csv write.csv2
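Several of these functions combined in a small round trip: write a data frame to a temporary CSV file and read it back. The data and variable names are made up for illustration.

```r
path <- tempfile(fileext = ".csv")          # temporary file, removed at the end

original <- data.frame(id = 1:3, name = c("a", "b", "c"),
                       stringsAsFactors = FALSE)
write.csv(original, path, row.names = FALSE)

lines_read <- readLines(path)               # raw text: header plus three rows
round_trip <- read.csv(path, stringsAsFactors = FALSE)

unlink(path)                                # clean up
```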
![Page 36: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/36.jpg)
Short example
aphia_ids <- c()
for (file in list.files("demo", pattern="[.]txt$", full.names=TRUE)) {
  print(file)
  species <- read.table(file, header=TRUE, sep="\t", quote="", fill=TRUE)
  exact_match <- species[species$Match.type == "exact",]
  aphia_ids <- c(aphia_ids, exact_match$AphiaID_accepted)
}
## [1] "demo/corals_red_sea_matched.txt"
## [1] "demo/red_sea_non_coral_invertebrate_1_matched.txt"
## [1] "demo/red_sea_non_coral_invertebrate_2_matched.txt"
## [1] "demo/red_sea_shore_fish_2_matched.txt"
## [1] "demo/red_sea_shore_fish_matched.txt"
paste(na.omit(aphia_ids[1:6]), collapse = ",")
## [1] "216153,216155,216154,286927,216152,210746"
![Page 37: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/37.jpg)
Data visualization
plot(runif(n=1000, 0, 0.5), runif(n=1000, 0, 1), pch=3, col="red",
     xlab="", ylab="", xlim=0:1, ylim=0:1)
points(runif(n=50, .5, 1), runif(n=50, 0, 1), pch=20, col="blue")
![Page 38: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/38.jpg)
Data visualization
library(ggplot2)
mtcars$gear <- factor(mtcars$gear, levels=c(3,4,5),
                      labels=c("3gears","4gears","5gears"))
mtcars$cyl <- factor(mtcars$cyl, levels=c(4,6,8),
                     labels=c("4cyl","6cyl","8cyl"))
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
      main="Distribution of Gas Mileage", xlab="Miles Per Gallon", ylab="Density")
## linear regression
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"), method="lm", formula=y~x,
      color=cyl, main="Regression of MPG on Weight", xlab="Weight", ylab="Miles per Gallon")
![Page 39: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/39.jpg)
Data visualization
![Page 40: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/40.jpg)
Data visualization
![Page 41: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/41.jpg)
Data visualization
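# movies: data set now shipped in the ggplot2movies package (part of ggplot2 itself before version 2.0.0)
ggplot(movies, aes(x=rating)) +
  geom_histogram(binwidth = 0.1, aes(fill = ..count..)) +
  scale_fill_gradient("Count", low = "green", high = "red")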
![Page 42: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/42.jpg)
Objects
Recommended reading: http://adv-r.had.co.nz/OO-essentials.html
S3: generic function OO, very casual system e.g. drawRect(canvas,"blue")
S4: similar to S3 but more rigid, has multiple dispatch
Reference classes: message-passing OO (like Java, C++, etc), objects are mutable
Base classes: defined in C
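To make the "generic function OO" of S3 concrete, here is a minimal sketch. The class `rect` and the generic `area` are hypothetical names chosen for illustration; only `UseMethod()`, `structure()` and `format()` are base R.

```r
# Constructor for a hypothetical S3 class "rect"
new_rect <- function(width, height) {
  structure(list(width = width, height = height), class = "rect")
}

# Generic: dispatches on the class of its first argument
area <- function(shape, ...) UseMethod("area")

area.rect    <- function(shape, ...) shape$width * shape$height
area.default <- function(shape, ...) stop("don't know how to compute the area")

# Method for the existing generic format()
format.rect <- function(x, ...) {
  paste0("<rect ", x$width, " x ", x$height, ">")
}

r <- new_rect(2, 3)
rect_area  <- area(r)
rect_label <- format(r)
```

An S3 object is just a base object with a `class` attribute, which is why the system is described as casual.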
![Page 43: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/43.jpg)
Debugging
RStudio setting: Debug -> On Error -> Break in code
DEMO
recover()
traceback()
![Page 44: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/44.jpg)
Packages
install.packages("caret") ## installs caret and its dependencies
devtools::install_github("rstudio/packrat") # install from GitHub
library(caret) # load the library and import all functions
if(!require(raster)) {
  print("raster package could not be loaded")
}
dplyr::summarise ## calling a function without importing the full package
dplyr::select ## or handle naming conflicts
![Page 45: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/45.jpg)
Packrat
Per-project private package libraries
install.packages("packrat")
packrat::init(project = ".")
install.packages("survival")
packrat::snapshot()
packrat::init()
packrat::snapshot()
packrat::restore()
packrat::clean()
packrat::bundle()
packrat::unbundle()
packrat::on()
packrat::off()
![Page 46: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/46.jpg)
Package development
devtools + roxygen2 + testthat
Advantages:
testing
documentation
versioning
distribution
Disadvantage:
more work
Get started with the book http://r-pkgs.had.co.nz/ by Hadley Wickham
![Page 47: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/47.jpg)
Testing
library(testthat)
test_that("list_datasets result is same as datasets.csv", {
  skip_on_cran()
  original <- read.csv2(data_raw_file("datasets.csv"), stringsAsFactors = FALSE)
  df <- list_datasets()
  expect_equal(nrow(df), nrow(original))
  expect_equal(df, original)
})
![Page 48: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/48.jpg)
Performance
Some resources:
compiler package (byte-code compiler)
parallel package
http://www.noamross.net/blog/2013/4/25/faster-talk.html
http://adv-r.had.co.nz/Performance.html
http://adv-r.had.co.nz/Profiling.html
http://adv-r.had.co.nz/memory.html
http://adv-r.had.co.nz/Rcpp.html
https://cran.r-project.org/web/views/HighPerformanceComputing.html
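A quick way to get a feel for these topics is to time a loop, its byte-compiled version and a vectorised equivalent. This is a rough sketch with a made-up function; timings depend on the machine, and recent R versions byte-compile functions automatically, so the gain from `cmpfun()` may be small.

```r
library(compiler)   # ships with R

# Deliberately slow: explicit loop over the elements
loop_sum_sq <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value^2
  }
  total
}

loop_compiled <- cmpfun(loop_sum_sq)   # byte-compiled version of the same function

x <- as.numeric(1:100000)
t_loop       <- system.time(r_loop <- loop_sum_sq(x))
t_compiled   <- system.time(r_compiled <- loop_compiled(x))
t_vectorised <- system.time(r_vectorised <- sum(x^2))   # vectorised alternative
```

Comparing `t_loop`, `t_compiled` and `t_vectorised` shows where the time goes; all three compute the same value.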
![Page 49: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/49.jpg)
Parallel
library(parallel)
# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)
on.exit(stopCluster(cl))
clusterExport(cl, "species")
clusterExport(cl, "background")
results <- parLapply(cl, seq(0.1, 0.9, 0.1), function(beta) {
  source("sdmExperiment.R")
  kresults <- lapply(1:10, function(k) {
    data <- species[[paste0("beta",beta)]][[paste0("k",k)]]
    cbind(beta, k, t(build_sdm_rcew(data, background)))
  })
  dplyr::bind_rows(kresults) # bind_rows() replaces the deprecated rbind_all()
})
results <- dplyr::bind_rows(results) ## combine list of data.frames
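The example above depends on project-specific files and objects. A self-contained sketch of the same `makeCluster()` / `parLapply()` pattern follows; the workload `simulate_mean` is hypothetical, and the cluster size of two workers is an arbitrary choice.

```r
library(parallel)   # ships with R

# Hypothetical workload: mean of `n` random draws, with a fixed seed per task
simulate_mean <- function(n) {
  set.seed(n)
  mean(rnorm(n))
}

cl <- makeCluster(2)                       # two local worker processes
results <- tryCatch(
  parLapply(cl, c(10, 100, 1000), simulate_mean),
  finally = stopCluster(cl)                # always shut the workers down
)

sequential <- lapply(c(10, 100, 1000), simulate_mean)
```

Because each task sets its own seed, the parallel and sequential results should match.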
![Page 50: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/50.jpg)
Web
Shiny: http://shiny.rstudio.com/
- interactive web pages
- no need for JavaScript (at least not for simple things)
- reactive programming
- typically ui.R and a server.R
- example: http://shiny.rstudio.com/gallery/movie-explorer.html
- DEMO
OpenCPU: https://www.opencpu.org/
- HTTP API for data analysis in R
RServe: https://rforge.net/Rserve/doc.html
- Binary R server
![Page 51: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/51.jpg)
Machine learning
https://cran.r-project.org/web/views/MachineLearning.html
caret, rattle
specific libraries for the different machine learning algorithms
![Page 52: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/52.jpg)
Machine Learning example
library(e1071)
train_idx <- sample(1:nrow(mtcars), nrow(mtcars)/2)
train <- mtcars[train_idx,]
test <- mtcars[-train_idx,]
model <- svm(hp ~ mpg + cyl + gear, data = train)
train_results <- predict(model, train)
test_results <- predict(model, test)
rmse <- function(error) {
  sqrt(mean(error^2))
}
![Page 53: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/53.jpg)
Machine learning example
print(paste("training rmse",rmse(train_results - train$hp)))
## [1] "training rmse 22.1904078087206"
print(paste("test rmse",rmse(test_results - test$hp)))
## [1] "test rmse 38.6500402049542"
# label each row with its split; a bare c("train","test") would be recycled alternately
plot_data <- data.frame(hp=c(train$hp,test$hp),
                        predicted=c(train_results,test_results),
                        split=rep(c("train","test"), times=c(nrow(train), nrow(test))))
![Page 54: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/54.jpg)
Machine learning example
ggplot(plot_data, aes(hp, predicted)) + geom_point(aes(colour = factor(split), shape = factor(split)))
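The same train/test workflow can be tried without installing e1071. The sketch below swaps the SVM for a plain linear regression from base R (a different model, used here only as a baseline), fixes an arbitrary seed so the split is reproducible, and uses the unmodified `datasets::mtcars`.

```r
set.seed(42)   # arbitrary seed so the split is reproducible

cars_data <- datasets::mtcars              # original data, cyl and gear still numeric
train_idx <- sample(seq_len(nrow(cars_data)), nrow(cars_data) / 2)
train <- cars_data[train_idx, ]
test  <- cars_data[-train_idx, ]

# Swapped-in model: linear regression instead of the SVM from the slides
baseline <- lm(hp ~ mpg + cyl + gear, data = train)

rmse <- function(error) sqrt(mean(error^2))
train_rmse <- rmse(predict(baseline, train) - train$hp)
test_rmse  <- rmse(predict(baseline, test) - test$hp)
```

Comparing `train_rmse` with `test_rmse` gives the same kind of overfitting check as in the SVM example.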
![Page 55: Introduction to R](https://reader031.fdocuments.in/reader031/viewer/2022020301/58f2eefd1a28ab4b788b4581/html5/thumbnails/55.jpg)
Questions
ggplot(data.frame(a="?", x=0, y=0), aes(x=x, y=y, label=a)) + geom_text(size=100)