R for Data Analysis and Data Mining
©UFS
Jianping Liu
Mar 19, 2014
Outline
• R and RStudio installation
• Basics of R: data types and operators
• R for Statistical Analysis and Data Mining
What is R?
• “a language and environment for statistical computing and graphics”: a combination of statistical packages (for interactive statistical analysis) and a programming language
• A dialect of the S language, which was developed at AT&T Bell Laboratories by Rick Becker, John Chambers and Allan Wilks; R itself was created in the 1990s by Ross Ihaka and Robert Gentleman
• Runs on multiple platforms and devices: Mac OS, Windows, Linux, PC, iPhone …
• Frequent releases and bug fixes; active development
• Free
Installation of R and Resources online
• http://www.r-project.org/ # R download & installation
• http://www.rseek.org/ # web-based R search
• http://www.rstudio.com/ # RStudio installation
• http://www.rdatamining.com/ # data mining examples
• http://www.ats.ucla.edu/stat/r/ # Stat analysis examples
• http://cran.r-project.org/doc/manuals/R-intro.html
• http://www.coursera.org # R Programming starts 4/7/2014
RStudio: an integrated development environment for R
The uses of R
• R may be used as a calculator
• R provides numerical or graphical summaries of data
• R has extensive graphical abilities
• R handles a wide variety of specific analyses
• R is an interactive programming language

• Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
• S Programming (Statistics and Computing) by William N. Venables and Brian D. Ripley (Springer)
useofR
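A minimal sketch of the first two uses listed above, using only base R. The vector `x` is made-up data for illustration, and the histogram is drawn only in an interactive session so the script also runs non-interactively.

```r
# R as a calculator: usual operator precedence applies
2 + 3 * 4        # 14
(2 + 3) * 4      # 20
sqrt(16)         # 4

# Numerical summaries of a small, made-up vector
x <- c(4, 8, 15, 16, 23, 42)
mean(x)          # 18
median(x)        # 15.5
summary(x)       # min, quartiles, median, mean, max

# A graphical summary, drawn only when a user is at the console
if (interactive()) hist(x, main = "Histogram of x")
```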
Packages
• install.packages("name of the package")
• library(pkg)
• detach("package:pkg")
• update.packages()
Example:
install.packages("sos")
library(sos)
Alert: R is case sensitive
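The case-sensitivity alert can be checked directly, and the install-then-load pattern can be wrapped in a small function. `ensure_package()` is a hypothetical helper, not part of R; it is a sketch that relies only on the base functions `requireNamespace()`, `install.packages()` and `library()`.

```r
# R is case sensitive: x and X are two different variables
x <- 1
X <- 2
x == X                          # FALSE

# The installer is install.packages (all lower case);
# no function called "Install.packages" exists in a standard session
exists("install.packages")      # TRUE
exists("Install.packages")      # FALSE

# Hypothetical helper (not part of R): install a package only if it is
# missing, then attach it with library()
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)       # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# Usage: ensure_package("sos")
```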
Getting help and info
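• help(package = "sos") # documentation for a package
• ?'&&' # help on an operator
• ??audit # search the help system for "audit"
• help.search("time series")
• library(sos)
• findFn("time series") # sos package: search the help pages of R packages
• example(data.frame) # run the examples from a help page
• demo(lm.glm, package = "stats", ask = TRUE)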
helpsearch.R
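Two more base-R lookup tools complement the list above. `apropos()` and `find()` (both in the utils package) are additions not shown on the slide, and the exact set of names `apropos()` returns depends on which packages are attached.

```r
# apropos(): names of objects on the search path matching a regular expression
apropos("^read\\.")    # e.g. "read.csv", "read.table", ... (depends on attached packages)

# find(): which attached package provides an object
find("read.csv")       # "package:utils"

# args(): a function's formal arguments, without opening its help page
args(read.csv)
```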
Data Types and Basic Operations
R has five “atomic” classes of objects (a sixth, raw, is rarely used):
• Character
• Numeric (real numbers)
• Integer
• Complex
• Logical (TRUE/FALSE)
The most basic object is a vector
• A vector contains objects of the same class: c()
• A list can contain objects of different classes: list()
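A quick check of the five classes and of how `c()` and `list()` differ; the values are arbitrary examples. Note that `c()` silently coerces mixed inputs to one common class (here character), while `list()` keeps each element's own class.

```r
# One value of each of the five classes
class("a")        # "character"
class(3.14)       # "numeric"
class(5L)         # "integer" (the L suffix makes an integer)
class(1 + 2i)     # "complex"
class(TRUE)       # "logical"

# c() builds a vector; mixed inputs are coerced to a single common class
v <- c(1, "a", TRUE)
class(v)          # "character"
v                 # "1" "a" "TRUE"

# list() keeps each element's own class
l <- list(1, "a", TRUE)
sapply(l, class)  # "numeric" "character" "logical"
```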
Data Types and Basic Operations
Matrices are vectors with a dimension attribute.
• The dimension attribute is itself an integer vector of length 2 (nrow, ncol)
• Matrices are filled column-wise by default; use byrow = TRUE to fill them row-wise
Factors are used to represent categorical data.
• Factors can be unordered or ordered.
• One can think of a factor as an integer vector where each integer has a label.
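A sketch of matrix construction and of factors as labelled integer codes; the category labels "low", "medium" and "high" are made up for illustration.

```r
# matrix() fills column-wise by default
m <- matrix(1:6, nrow = 2, ncol = 3)
m
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
dim(m)                      # 2 3 (an integer vector of length 2)

# byrow = TRUE fills row-wise instead
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)
m2[1, ]                     # 1 2 3

# A matrix is just a vector with a dim attribute
v <- 1:6
dim(v) <- c(2, 3)
identical(v, m)             # TRUE

# Factors: integer codes with labels (levels)
f <- factor(c("low", "high", "medium", "low"),
            levels = c("low", "medium", "high"))
levels(f)                   # "low" "medium" "high"
as.integer(f)               # 1 3 2 1
table(f)                    # counts per level

# An ordered factor supports comparisons between levels
o <- factor(c("low", "high"), levels = c("low", "medium", "high"), ordered = TRUE)
o[1] < o[2]                 # TRUE
```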
Data Types and Basic Operations
Data frames are used to store tabular data
• They are fundamental to the use of the R modelling and graphics functions
• They are represented as a special type of list in which every element must have the same length
• Unlike matrices, data frames can store a different class of object in each column (just like lists); every element of a matrix must have the same class
• Data frames are usually created by calling read.table() or read.csv()
• They can be converted to a matrix by calling data.matrix()
datatypes
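A small sketch of the data frame properties above. The column names and values are illustrative, and `read.csv(text = ...)` stands in for reading a real file. `stringsAsFactors = FALSE` is spelled out because it only became the default in R 4.0.0.

```r
# Columns of different classes in one data frame
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(90.5, 82, 77.25),
  stringsAsFactors = FALSE
)
sapply(df, class)          # id "integer", name "character", score "numeric"

# Every column (list element) has the same length
lengths(df)                # 3 3 3

# read.csv() also accepts inline text, handy for small examples
df_csv <- read.csv(text = "x,y\n1,a\n2,b", stringsAsFactors = FALSE)
str(df_csv)

# data.matrix() gives a numeric matrix; character columns become integer codes
dm <- data.matrix(df)
dm
```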
R for Regression Analysis
http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf (J. Faraway, Practical Regression and Anova using R)
logitRegression.R
• Regression analysis is the analysis of the relationship between a response or outcome variable and a set of other variables
• The relationship is expressed through a statistical model equation that predicts a response variable (also called a dependent variable or criterion) from a function of explanatory variables (also called independent variables, predictors, factors, or carriers) and parameters
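The slide refers to a script (logitRegression.R) that is not reproduced here. As a stand-in, the sketch below fits a linear and a logistic regression to the built-in `mtcars` data set; the choice of variables (`mpg`, `am`, `wt`) and the two new weights are assumptions for illustration, not the author's example.

```r
# Linear regression: fuel economy (mpg, response) explained by weight (wt, predictor)
fit_lm <- lm(mpg ~ wt, data = mtcars)
summary(fit_lm)        # coefficients, standard errors, R-squared
coef(fit_lm)           # intercept and slope

# Logistic regression: probability of a manual transmission (am = 1) given weight
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
summary(fit_logit)

# Predicted probabilities for two hypothetical weights (wt is in 1000 lbs)
new_cars <- data.frame(wt = c(2.5, 3.5))
p_manual <- predict(fit_logit, newdata = new_cars, type = "response")
p_manual
```

The same model formula syntax (`response ~ predictors`) is shared by `lm()` and `glm()`; the `family` argument is what turns the second fit into a logistic regression.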
R for Time Series Analysis
• Introductory Time Series with R by P.S.P. Cowpertwait and A.V. Metcalfe (Springer)
http://elena.aut.ac.nz/~pcowpert/ts/#RScripts
• Time Series Analysis and Its Applications: With R Examples (3rd ed.) by R.H. Shumway and D.S. Stoffer, Springer Texts in Statistics, 2011 (package: astsa)
http://www.stat.pitt.edu/stoffer/tsa3/
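A sketch of basic time series handling in base R, using the built-in `AirPassengers` series (monthly totals, 1949 to 1960) and a small made-up quarterly series; it is not taken from either book.

```r
# AirPassengers: built-in monthly airline passenger totals, 1949 to 1960
ap <- AirPassengers
class(ap)          # "ts"
frequency(ap)      # 12 observations per year
start(ap)          # 1949 1 (January 1949)

# Build a ts object from a plain vector: made-up quarterly data from 2013 Q1
y <- ts(c(5, 7, 9, 11, 10, 12, 14, 13), start = c(2013, 1), frequency = 4)
window(y, start = c(2014, 1))   # the four quarters of 2014

# Classical decomposition into trend, seasonal and random components
dec <- decompose(ap, type = "multiplicative")
round(dec$figure, 3)            # the 12 estimated monthly seasonal factors
if (interactive()) plot(dec)

# Sample autocorrelations, printed instead of plotted
acf(ap, lag.max = 12, plot = FALSE)
```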
R Reference Card
R_referencecard_2.0
R_referencecard_regression
R_referencecard_timeseries
R_referencecard_data_mining
Data Mining with Rattle
# to install package rattle and load the GUI
install.packages("rattle", dependencies = c("Depends", "Suggests"))
library(rattle)
rattle()
• Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) by Graham Williams
• http://www.r-project.org/doc/bib/R-books.html
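Rattle is a GUI, so there is no single call to show. As a hedged, script-level sketch of one task its Cluster tab offers, the code below runs k-means directly with `stats::kmeans()` on the built-in `iris` measurements; the number of clusters, `nstart` and the seed are arbitrary choices.

```r
set.seed(42)                      # k-means starts from random centres
features <- scale(iris[, 1:4])    # standardise the four numeric measurements
km <- kmeans(features, centers = 3, nstart = 25)

km$size                           # number of flowers in each cluster
# Compare clusters with the known species labels (not used in the fit)
table(cluster = km$cluster, species = iris$Species)
```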
Drawbacks of R
• Little support for dynamic or interactive graphics
• Objects must generally be stored in physical memory
• Functionality depends on user demand and on user-contributed packages
• Not ideal for all situations
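The in-memory limitation is easy to quantify: a double takes 8 bytes, so a numeric vector of one million elements needs about 8 MB. The 10-million-row, 20-column table below is a hypothetical size used only for the arithmetic.

```r
# One million doubles: 8 bytes each plus a small fixed header
x <- numeric(1e6)
print(object.size(x), units = "MB")

# Back-of-the-envelope estimate for a hypothetical all-numeric table
# held entirely in RAM
rows <- 1e7
cols <- 20
rows * cols * 8 / 1024^3   # about 1.49 GiB
```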
Thank you!