Introduction To R
Visualization and Analysis of Big Data with the R Programming Language
Michael E. Driscoll, Ph.D.
Presented to Amyris, April 2009
“The sexy job in the next ten years will be statisticians.”
– Hal Varian, Chief Economist, Google
What can it do?
• data manipulation
• statistics
• visualization
Why is it different?
• created by statisticians
• free, open source
• extensible via packages
What is R?
Statistical Analysis
• hypothesis testing
• model fitting
• clustering
• machine learning
Data Visualization
What is R?
Data Manipulation
• database connectivity
• slicing & dicing data cubes
Statistical analysis
• fit models for the distributions of expression values
• test hypotheses about outliers
• cluster genes with similar patterns
Visualization of hybridization artifacts
I. Taming Microarray Data with Bioconductor
http://www.bioconductor.org
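The model-fitting and outlier-testing steps listed for microarray data can be sketched in base R. This is only an illustration under stated assumptions: the expression values are simulated rather than loaded through Bioconductor, the normal model and the 0.01 cutoff are choices made for the example, and the variable names (`expr`, `outliers`) are hypothetical.

```r
# Simulated log2 expression values for 1000 genes (an assumption; real data
# would be read from a microarray experiment via Bioconductor).
set.seed(42)
expr <- rnorm(1000, mean = 8, sd = 1.5)
expr[1:5] <- c(16, 17, 0.5, 15.5, 16.5)   # planted outliers for illustration

# Fit a normal model with robust estimates so outliers do not inflate the fit.
mu    <- median(expr)
sigma <- mad(expr)   # median absolute deviation, scaled to estimate the sd

# Two-sided p-value for each gene under the fitted model.
z    <- (expr - mu) / sigma
pval <- 2 * pnorm(-abs(z))

# Correct for multiple testing (Benjamini-Hochberg) and flag outliers.
padj     <- p.adjust(pval, method = "BH")
outliers <- which(padj < 0.01)
print(outliers)
```

Using the median and MAD instead of the mean and standard deviation is a design choice here: the planted outliers would otherwise pull the fitted distribution toward themselves.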
1 million transactions during this presentation
Statistical analysis
• every customer has a history of product purchases
• hierarchically cluster products and customers
• other approaches (depending on goals): singular value decomposition
Which products are ordered together?
II. Clustering Product Purchases
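A minimal sketch of hierarchically clustering products by who buys them, using only base R. The purchase matrix is simulated, and the product names, the two customer segments, the binary (Jaccard) distance, and average linkage are all assumptions chosen for illustration, not details from the talk.

```r
set.seed(1)

# Hypothetical purchase history: rows are customers, columns are products,
# 1 means the customer has bought the product at least once.
n_customers <- 60
segment <- rep(c("kitchen", "workshop"), each = n_customers / 2)
buy <- function(p_kitchen, p_workshop) {
  rbinom(n_customers, 1, ifelse(segment == "kitchen", p_kitchen, p_workshop))
}
purchases <- cbind(
  bread = buy(0.9, 0.1), butter = buy(0.9, 0.1), jam    = buy(0.9, 0.1),
  nails = buy(0.1, 0.9), hammer = buy(0.1, 0.9), screws = buy(0.1, 0.9)
)

# Distance between products: share of customers who bought one but not both.
product_dist <- dist(t(purchases), method = "binary")

# Agglomerative clustering, then cut the tree into two groups.
product_tree   <- hclust(product_dist, method = "average")
product_groups <- cutree(product_tree, k = 2)
print(product_groups)

# plot(product_tree) draws the dendrogram of products ordered together.
```

Transposing the matrix clusters products; calling `dist(purchases, method = "binary")` without `t()` would cluster customers instead. For the singular value decomposition mentioned as an alternative, base R provides `svd()`.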
2 billion clicks during this presentation
Statistical analysis
• estimate posterior distributions for click rates from observed data
• test hypothesis that the click-rate of a given ad A is greater than for ad B
How confident are we that B beats A?
III. Optimizing Online Advertising
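The question "How confident are we that B beats A?" can be approached with a small Monte Carlo sketch. The click and view counts below are invented for illustration, and the uniform Beta(1, 1) prior is an assumption; with a binomial likelihood it gives a Beta posterior for each click rate.

```r
set.seed(2009)

# Hypothetical observed data for two ads.
clicks_a <- 120; views_a <- 10000
clicks_b <- 150; views_b <- 10000

# Beta(1, 1) prior + binomial likelihood -> Beta posterior for each click rate.
n_draws <- 100000
rate_a <- rbeta(n_draws, 1 + clicks_a, 1 + views_a - clicks_a)
rate_b <- rbeta(n_draws, 1 + clicks_b, 1 + views_b - clicks_b)

# Posterior probability that ad B has the higher click rate.
p_b_beats_a <- mean(rate_b > rate_a)
print(p_b_beats_a)
```

Sampling from both posteriors and comparing draw by draw avoids deriving the distribution of the difference analytically.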
IV. A Tale of Two Pitchers
Hamels
Webb
“The best thing about R is that it was developed by statisticians. The worst thing about R is that… it was developed by statisticians.”
– Bo Cowgill, Google
R Nuts and Bolts
Data Manipulation
Getting Data In
SQL
• MySQL
• ODBC (Oracle, MS-SQL)
Excel
Matlab
Getting Data Out
Data formats:
• Delimited (CSV, Excel)
• Matlab
Graphic formats:
• Vector (PDF, EPS, SVG)
• Raster (PNG, TIFF)
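library(RMySQL)  # MySQL driver for the DBI interface
driver <- dbDriver("MySQL")
con <- dbConnect(driver, user = "tgardner", password = "julien05",
                 host = "data.amyris.com", dbname = "biofx")
resultSet <- dbSendQuery(con, "SELECT * FROM assay")
data <- fetch(resultSet, n = -1)  # n = -1 fetches all rows
dbClearResult(resultSet)
dbDisconnect(con)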
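For the "Getting Data Out" side, a short base R sketch of writing a delimited file and a vector graphic. The `results` data frame and the file names are hypothetical, and files go to a temporary directory so the example has no side effects.

```r
# Hypothetical results table standing in for data fetched from a database.
results <- data.frame(assay_id = 1:3, value = c(0.12, 0.48, 0.91))

# Delimited output: CSV that Excel and other tools can open.
csv_file <- file.path(tempdir(), "assay_results.csv")
write.csv(results, csv_file, row.names = FALSE)

# Vector graphic output: open a PDF device, plot, then close the device.
pdf_file <- file.path(tempdir(), "assay_results.pdf")
pdf(pdf_file, width = 6, height = 4)
plot(results$assay_id, results$value, type = "b",
     xlab = "Assay", ylab = "Value")
dev.off()

# Read the CSV back to confirm the round trip.
round_trip <- read.csv(csv_file)
print(round_trip)
```

Raster formats follow the same open-plot-close pattern, with `png()` or `tiff()` in place of `pdf()`.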
Statistical Methods
Extending R with Packages
CRAN http://cran.r-project.org
• ~ 2000 packages
• organized by field
• easy to install
> install.packages("lattice")
R Packages: Beautiful Colors with Colorspace
library("colorspace")
red <- LAB(50, 64, 64)
blue <- LAB(50, -48, -48)
mixcolor(0.5, red, blue)  # alpha = 0.5 blends the two colors equally
R Packages: Creating Panel Plots with Lattice
library("lattice")
xyplot(x ~ y | pitch_type, data = gameday)
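Since the `gameday` data set is not included in the slides, the panel plot above can be tried with simulated data. In this sketch the pitch types, the column values, and the output file name are assumptions made for illustration; only the `xyplot` formula mirrors the slide.

```r
library(lattice)

set.seed(7)
# Simulated stand-in for the gameday pitch data (hypothetical values).
gameday <- data.frame(
  pitch_type = factor(rep(c("curveball", "fastball", "slider"), each = 50)),
  x = rnorm(150),
  y = rnorm(150)
)

# One panel per pitch type; the formula has the form vertical ~ horizontal | group.
panel_plot <- xyplot(x ~ y | pitch_type, data = gameday,
                     layout = c(3, 1), pch = 19)

# Lattice plots are objects; printing one draws it on the open device.
plot_file <- file.path(tempdir(), "pitch_panels.pdf")
pdf(plot_file, width = 9, height = 3)
print(panel_plot)
dev.off()
```

Because `xyplot` returns an object instead of drawing immediately, the explicit `print()` is needed inside scripts and functions.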
Getting Started
Download at R-project.org
Choose a UI
• Emacs – ESS
• JGR – Java GUI for R
• Rattle
http://www.r-project.org
Getting Help
Online
• use inline help
> ?plot
• search / post at R-help http://tolstoy.newcastle.edu.au/R
Books
• Modern Applied Statistics with S, W.N. Venables & B.D. Ripley
• Use R! series (includes 20 volumes) http://www.springer.com/series/6991
Data
Desktop
Which is Easier?
Coding or Clicking
R-Based Dashboards
A Simple Script
setContentType("text/html")
png("/var/www/hello.png")
plot(sample(100, 100), col = 1:8, pch = 19)
dev.off()
cat("<html>")
cat("<body>")
cat("<h1>hello world</h1>")
cat('<img src="../hello.png">')
cat("</body>")
cat("</html>")
Download Jeff Horner’s rApache at http://biostat.mc.vanderbilt.edu/rapache/
R-Based Dashboards
http://labs.dataspora.com/gameday