Machine Learning with R
-
date post
19-Sep-2014 -
Category
Documents
-
view
8 -
download
3
description
Transcript of Machine Learning with R
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Machine Learning with R
Joshua Reich
April 2, 2009
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Why R?
How to find out about stuff?
What is Machine Learning?
Show me the money
Learn More
Questions
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
ML Alternatives
• Matlab
• Weka
• Python
• Stand alone (e.g. Vowpal Wabbit)
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
”The best thing about R is that it was developed by statisticians.The worst thing about R is that it was developed by statisticians.”
–Bo Cowgill, Google (at SF R Meetup)
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Why R?
• Working with the CLI - iterative discovery
• Integrated graphics
• Community supported packages (CRAN)
• ODBC Integration
• You already use it
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
How to find out about stuff?
• ?function
• help.search("search string")
• RSiteSearch("search string")
• http://rseek.org/
• names(object) or attributes(object)
• > kmeansfunction (x, centers, iter.max = 10, nstart = 1,algorithm = c("Hartigan-Wong","Lloyd", "Forgy", "MacQueen"))...
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
What is Machine Learning?
Statistics Machine Learning
Probability Model Learning ModelObservations ObservationsEstimation Training
MLE Optimization
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Semantics v. Pragmatics
• For most statistical models there are either closed form orquick numerical approximations for finding model properties -e.g., confidence intervals. Assuming you believe that yourdata generating process is accurately captured by your model,then you can make direct statements about unseen events.
• Machine learning is a close cousin to non-parametrictechniques and relies on training/testing/validation cycles,bootstrapping and cross-validation to determine measures ofreliability.
But invariably, simple models and a lot of data trump moreelaborate models based on less data
–Halevy, Norvig & Pereira.
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Inductive Bias
In the days when Sussman was a novice, Minsky oncecame to him as he sat hacking at the PDP-6.
”What are you doing?”, asked Minsky.
”I am training a randomly wired neural net to playTic-tac-toe”, Sussman replied.
”Why is the net wired randomly?”, asked Minsky.
”I do not want it to have any preconceptions of how toplay”, Sussman said.
Minsky then shut his eyes.
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Inductive Bias
”Why do you close your eyes?” Sussman asked histeacher.
”So that the room will be empty.”
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
What is Machine Learning?
• Regression vs. Classification
Y ∈ Rp
vs.
Y ∈ {Y1, Y2, . . . ,YN}
• Supervised vs. Unsupervised Learning
Y = f (X )
vs.
X1, X2, . . . ,XN
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
General Supervised Learning Framework(Ch 7 of ElemStatLearn)
• Training / Validation / Test
• Variance - Bias Decomposition: Overfitting
• Feature Selection / Regularization
• Bootstrapping / Cross-Validation
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
What we will walk through
• K-Means clustering: kmeans()
• K-Nearest Neighbours: knn()
• Regression Trees: rpart()
• Improving trees with PCA: princomp()
• Linear Discriminant Analysis: lda()
• Support Vector Machines: svm()
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
The Problem
−5 0 5 10
−10
−5
05
1015
x
y
A
AA
A
A
A
A
AAA
A
A
AA
A
AA
A
A
A
AA
A
A
A
AA
A A
A A
A
A
AA
A
A
AA A
A
A
A
A
A
A
A
A
A
A
A
A
A AA
AA
AA
AA
AA
A
A
A
AA
AA
A AA
A
A
A
A
A
A
AA
AA
A
A
A A
A
A
A
A
A
A
A
AA
A
A
AAA
A
A
AA
AA
A
A
A
A
A
A
A
AA
AA
A
A
AA
A
A
A
AA A
A
A A
A
A
A
A
A
A
A
A
A
AAA
AA
A
A
A
A
A
A
A
A
A
A
A
A
A
AA
A
A
AA
A
AA
A
A
AAA
A
A
AA
A
A
A
A
AA
A
A
A
A
A
A
AA A
A
A
A
A
A
A
A
A
A
A
A
A A
A
A
A
A
A
A
A
A
AA
A
A
A
AA
A
A
A
A
A
A
A
A
A
A A
A
AAA
A
A
A
AA
A
A
A
A
A
A
A
A
A
AA
A
A
AA
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
A
A
AA
A
AA
A A A
A
AA
AA
A
A
A
AA
A
A
A
A
AA
A
AAA
A
A
A
A
A
AA
A
A
A
AA
AA
A
AA
A
A
AA
A A
A
A
A
AAA
A
A
A
A
AA
AAA
AA
A
A
A
A
A
AA
A
AA
A
AA
A
A
AA
A
A
A
AA
A
AA
AA
AA
A
A
A
A
A
AA
A
AA
AA
A
A
A
A
A
A
A
AA
A
A
AA
AAA
A
A
A
A
A
AA
A
A
A
A
A
AA
A
A
A
A
A
AA
A
A
A
A
AA
AA A
A AA
A
A
A
A
A
A
A
A
AA
A
AAA
A
AA
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A A
A
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
AA
A
AA
A
A
A
AA
A
A
A
A
AA
AA
AA
A
A A
AA
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
AAA
A
AAA
A
A
A
A
A
A
AA
A
AA
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
AA
AA
A
A AA
A
A
A
A
A
A
A
A
AA
A
AA
A
AA
A
AA
A
A
AAA
A
A
AA
AA
A
A
A
A
A
AA
AA
A
A
A
A
A
AA
A
A
A A
A
A
A
A
A
A
A
AA
AA
A
A
A
A
A
A
A
A
A
A
A
A
A
AA
A
A
A
AAA
A
AA
A
AAA
A
AAA
A
AA
A
A
A
A
A
A
AA
A
A
A
A
A
A
AA
A
AAA
A AA
A A
A
A
A
A
A
A
A
A
A
A
A
AA
A
AAA
A
A A
AAA
A
AAA
AA
A
A
A
A
A
A
AA
A
A
A
AA
A A
A
AA
A
A
A
A
AAA
A A
A
A
AA
A
A
A
A
A
A
A
AA
A
AAA
A
A
A
A
A
A
A
A
A AA
A
A
AA
A
A
A
A
A
A
AAA
A
A
A
A
A
AA
A
AA
A
A
A
AA
AA
A
A
A
A
A
A
A
A
AA
A
A
A
A
AA
A
AA
AA
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
A
AA A
AA
A
A
AAA
AAA
A
A
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
AA
A
A
A
A
A
A
A
A
A
AAA
AA
AA
A
AA
AA
A
AA
A
A
AA
A
A
A
A
A
A
A
A
A
A
A
A
B BBB
B BBB
BB
B
B
B
BBB BBB
BBB BB
BBBB
B
BBB
BB
B
B BB B
BB B B
BB
BB
BB
B
BBB BBB B B
BB
B
B BBBB BB
BBB B BB
B BB
B
BBB
B
B B BB
B BB
BB
B BBB
BBB
BBB
BB B
BBBB
B B
BB
B BBB
B
BB
BBBB
BB BB
BB
BB B
BBB
BBB BB BBB
BBB
BB
B
BB
B
B
BB
B BB BB
B BBB B
BB BB
BB B
BB B
B
B
B
B
BB
BBB
BB
BB
B BB
B BB BBB
BB
B
B
BBB
BB
B
BBB BB
BB
B BBBB
B
BB
BBB BBB
BB
BB B BB BB BB
BBB
BB BB
BB
BB
B BBBB
BB BBB
B
BBBB
BB BB
BBB BB
B
B
BBBB
BB BB
BB
BB
BB B
BB
B
B BBBB
BBBBBB
BBB
B
BBBB
B
BB
B
BB
BBBBBB B
BB
BB
BB
B B BB
BB
BB BB
BB BB
B
B
B
BB BB
B
BB
BBB
BB BBBB BBB BBB
BBBBB
BB
BB
BBBB B BB
B BB BBB B
BB B BB BB
BBB BBB B
BBBBB BB
BB
BB
BB
B
BB
B BB
B BB
BB BBB
B
BB
BB
BB
B B
BB BBBB
BB
B BBB
BB
B
B
BB
BB
B B
BBB
B B BBB
B BB
BB BB B
BB
BB
BBB
B BB
B
BBB
BBBB
BB BBB BB
B
BB B
B
B
BB
B BB
BB
B
BB
BBB
B
BB
B
B
BBBB
B B BB
BBB BB BBB
BBB
BB B B
BBB
BBB
BB
BBB
BB
BB
BBBB
BBB
BB
BBB BB
BB
B
BB
B
BB
BBB
B
B BBBB
BB
BB
BB B B
BBB
BB
BBB B BB
BBBB
B BB BB
B
BBBB
B
BBBB BBB
B
B
B
B
B BB
B
BB
B
BBB
B
BB
BB BB
B
B BBBBB B
BB
B
B
BB BBB
BBB
BB
BBB B
B
B
B
B
B
B
B
BBB BBBB
BB
BBB B BB
BB
BB B B
B
BB B
B
BB
BB
BB
BBB
B
BBB B
BB
B
BB
B
B
B
BB
BB
BBB
BB B BBB
BBB
B
B BB
BB
B BBB BB
B
B
B
B
B BB
BBB
B
BB
B
BBB
B
BBB
BBBBB
BBBB B B
BB
B BB BBB
BBB
B BBB
B BBB
BBBB
B BB
B BBB BB BB B
BBB
B
BB BB
B BBBB
BB
B
BB BB BB BBBB
BB
B
BBB
B
BB
B
BBB
B
B
BBBB
B BB BBBB B
BBB
BBB
BB
BB
BB
BB BBB
B BBB BB BB
BB
BBB B
B B
B BB
B
B BBBB
BBBB
B
B
B
B
B BB
B
B BB
BB
BB BB
B
B
BBB
BBB
BB
BB B B
B
BBBBBB
BB
BB BB
BB
B
BB BBB B
BB
BB
BBB
B
BB
C
CCC
C C
CC
C
CC
C C C
C
CC CC
CCC
CC
CC C
CCC CC C
CC
C
CC
C
CC
CC
C
C
C
C
C
C
CC CC CC
C
C
CC CC
C
C
C
CC
CC C
C
C
CC C
CC
CC
C
C
CC
CC
CC
CCC
C
CCCC
CC
C
C
C
CCC
C
C
CC
CC
CC
CC C
CC
C
C
CCC
CCCC C
C
C
C
C
C CC
CC
C
C
CC
C
C CCC
CC
C
CC CCC
CC
C
CC C
C
C
CC
C
CCCC
CCC
CC
CC
C
C
C
C
C
C
CCC CC
CC
CC
CC
C
CC
C
CCC C
C
C
C C
C CCCCC CCC CC
C CCC
CC C
C
CCC
C
C CC C
CC
CC
CCC
C
C
CCC
CC
CC
CC
CCC C
C CC
C
C
C
C
CC
CC
C CC
C C
CC
C
C
C
C C
C
CC
CCC
CC
CC
CC
C
CC
CC CC C
C
CC
C
CC CC
CC C
CC
CC
CC C
CC
C
C
C
CCC
C
C
C
C
CCC C
CCC
CC C
C
CC C
CC
C
CC
C
C
C CC CCC
C
CC C C
C CC
CCC
C CCC C CC
CCC C
CC
C
C
C
CCC
C
CC
CCC C
CC
CC
C
C
CC
CCC
CCC
C
CC
C
C
CC
CC
CC
CCC C
C
CC
C
C
CC C
C
CC
CCC
CCC
CC C
CCC CC
CCC
C
C C
C
C
CC
C C
C
CC
CC
CC
C
CC
C
C
CC
C C
C
CCC C
CC
C
C
CCC
C
CC
CCC C
CC C
CC
CCC
C
C
C
CC
CCCCC
C
C
CC
CC
C C
C
C CCCC
C CCCC
C
C
CC
CC
CC
C C
C
CC C
CC
C CCC
C CC
C
CC C
CC C
C CC C
C
C C
CCC
CCCC
CC
C
CC CCC C
C
CC CCC
C
CC
CC
C
CC
CC C
CCC
CC
C
C
CC
C
CCC
C
C
C
C
CC
CCCC
C
CC C
C
C
CCCC CC
CC
C
C
C
CC
CC CC
C
CCC
C CC C CC CC
C
CC
C
C
C
CC
C
CC
C CC
C
CCC
C
C CC
C
C
CC
C CCC
CC C
CC
C
C
C
C
CCC
CC
CCC
C
CC
C
C
CC C
CC
C
C
C
C C
CC
CC
C
CC
CCCCC
CCC C
CC
C
CCC CC CC
C
C
C C
C
C
C
CCC C
CCC
CC
CC C
C
CC CC
C CC
CC
C C C
C
C C
CCC
C
C
CC
CCCC
C
CC
C
CC CC
C
CCCC
CC C
C C
CC C
C
C
C
CC C
C
CC
C
CC
C
CC
C
C
CC
CC
C CCC C
CCC C C
CC
C
CCCC
CC CCC
C
CC C CCCC
CCC
C
CC
C
CC
CC CCC
C CC
C CCC
CC
C
CC
CC
C C
C
C
C
CC
CC CC
C
C
C
C
CC
C
CC
CC
CC C
CCC C
C
CCC C
C
CC
CCCC
CC C
CCC
C CC C
C
C CC
C
CCC
CC
CC
CC C
CCC
C
C CCC
CCCC CC C
CC
CC CCC CC
C
C
CCCC
C
CC
CCCC
CC
C CCCC
C
C
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Machine Learning More
• MachineLearning CRAN viewhttp://cran.r-project.org/web/views/MachineLearning.html
• The caret package is a good one.
• Elements of Statistical Learninghttp://www-stat.stanford.edu/ tibs/ElemStatLearn/
• Machine Learning (Mitchell)http://www.cs.cmu.edu/ tom/mlbook.html
• Video Lectureshttp://videolectures.net/
Outline Why R? How to find out about stuff? What is Machine Learning? Show me the money Learn More Questions
Questions?