Data analysis with R and Julia
![Page 1: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/1.jpg)
Data Analysis with R and Julia
Advanced Analytics and Insights
Mark Tabladillo Ph.D., Data Mining Scientist, MarkTab Inc.
![Page 2: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/2.jpg)
Networking
Interactive
![Page 3: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/3.jpg)
About MarkTab
Training and Consulting with http://marktab.com
Data Mining Resources and Blog at http://marktab.net
Twitter @marktabnet
![Page 4: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/4.jpg)
Outline
R Language
Market Analysis
Performance
Production Use
Julia Language
Performance
![Page 5: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/5.jpg)
The R Language
http://cran.r-project.org
![Page 6: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/6.jpg)
Major R Versions

| Version | Year | Description |
|---|---|---|
| 0 | 1996 | Initial release: University of Auckland, New Zealand |
| 1 | 2000 | Completeness and stability high enough to characterize a full statistical system that could be put to production use |
| 2 | 2004 | Strong enhancements of the memory management subsystem as well as several major features, including Sweave (into LaTeX or LyX) |
| 3 | 2013 | Long vectors (containing more than 2^31-1 elements), 64-bit support on all platforms, support for parallel processing, and the Matrix package |

http://www.r-project.org/
![Page 7: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/7.jpg)
How R Works
"As with an automobile, you can use R without worrying very much about how it works.
But computing with data is more complicated than driving a car (fortunately for highway safety)."
John Chambers
Software for Data Analysis, page 453
![Page 8: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/8.jpg)
R works in a shell
Cross-platform, including 32-bit (x86) and 64-bit (x64) Windows
Interactive graphical user interface (GUI) to interpret commands
Read – accept user input
Parse – interpret input using expected syntax
Evaluate – execute commands
Everything is an object
Data are stored in data frames, named lists
R implements S language grammar, with a few extensions
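The points about objects and data frames can be illustrated with a minimal sketch. The data frame and its column names (`id`, `score`) are made up for illustration and do not come from the talk.

```r
# Everything is an object: functions and numbers both have a class.
function_class <- class(mean)
number_class <- class(42)

# A data frame is stored as a named list of equal-length columns.
# The values below are hypothetical.
scores <- data.frame(id = 1:3, score = c(90, 85, 77))

is_list <- is.list(scores)              # a data frame is a list underneath
column_names <- names(scores)           # the list names are the column names
mean_score <- mean(scores[["score"]])   # extract a column by name, as with any list
```

Because a data frame is a list, the usual list tools (`names()`, `[[`, `lapply()`) apply to its columns directly.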
![Page 9: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/9.jpg)
R GUI
![Page 10: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/10.jpg)
Read-Parse-Evaluate Loop
Read → Parse → Evaluate (repeating)
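The three stages of the loop can be stepped through by hand from within R itself. This is only a sketch: a fixed string stands in for what the user would type, and the variable names are illustrative.

```r
# Read: take a command as text (a fixed string stands in for user input).
input <- "total <- sum(c(1, 2, 3))"

# Parse: check the syntax and build an unevaluated expression.
parsed <- parse(text = input)

# Evaluate: execute the parsed expression; the assignment creates `total`.
result <- eval(parsed[[1]])

print(total)
```

A syntax error in `input` would be reported by `parse()` before anything is evaluated, which is the same order of events as at the interactive prompt.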
![Page 11: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/11.jpg)
R and SQL Server
install.packages("RODBC")
library(RODBC)
MDAC Downloads
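Once RODBC is installed, a query against SQL Server might look like the sketch below. The server, database, and table names are placeholders, the helper functions are hypothetical, and the driver name `SQL Server` assumes the classic Windows ODBC driver; other drivers use other names.

```r
# Hypothetical helper: builds an ODBC connection string for SQL Server
# using Windows authentication.
build_connection_string <- function(server, database) {
  paste0("driver={SQL Server};server=", server,
         ";database=", database, ";trusted_connection=true")
}

# Hypothetical helper: runs one query and closes the connection afterwards.
run_query <- function(connection_string, query) {
  channel <- RODBC::odbcDriverConnect(connection_string)
  if (!inherits(channel, "RODBC")) stop("connection failed")
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, query)
}

connection_string <- build_connection_string("myserver", "mydb")

# Not run here, since it needs a reachable server:
# result <- run_query(connection_string, "SELECT TOP 10 * FROM mytable")
```

`sqlQuery()` returns the result set as a data frame, so the output can go straight into the usual R analysis functions.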
![Page 12: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/12.jpg)
R Market Analysis
![Page 13: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/13.jpg)
Listserv Discussion
http://r4stats.com/articles/popularity/
![Page 14: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/14.jpg)
Estimated R Usage
Estimated 250,000 people use it regularly (as of 2009)
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=2&_r=0
![Page 15: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/15.jpg)
General Forum Postings
http://r4stats.com/articles/popularity/
![Page 16: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/16.jpg)
Stack Overflow Alone
http://r4stats.com/articles/popularity/
![Page 17: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/17.jpg)
Academic Publications
http://r4stats.com/articles/popularity/
![Page 18: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/18.jpg)
Comparison of R, Matlab, SAS, Stata, SPSS
http://www.analyticbridge.com/group/productreviews2/forum/topics/product-reviews-comparing-r-matlab-sas-stata-spss
![Page 19: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/19.jpg)
R Performance
![Page 20: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/20.jpg)
R is Memory-Bound
$$\frac{\text{Memory Size}}{4} = \text{Amount of R Data}$$
Source: Joseph B. Rickert, February 14, 2013
$$\text{64-bit Memory Size} = \text{RAM}$$
$$\text{32-bit Memory Size} = \text{User Virtual Memory} - 0.5\ \text{GB} \approx 2\ \text{GB}$$
Source: http://cran.r-project.org/bin/windows/base/rw-FAQ.html retrieved March 1, 2013
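As a quick worked example of the rule of thumb, the sketch below divides memory size by four. The function name and the 16 GB machine are assumptions chosen for illustration.

```r
# Rule of thumb from the slide: workable data size is about one quarter of memory size.
max_r_data_gb <- function(memory_size_gb) memory_size_gb / 4

data_64bit <- max_r_data_gb(16)  # hypothetical 64-bit machine with 16 GB of RAM
data_32bit <- max_r_data_gb(2)   # 32-bit Windows: about 2 GB of usable memory
```

Under these assumptions the 64-bit machine handles roughly 4 GB of data comfortably and the 32-bit session roughly 0.5 GB.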
![Page 21: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/21.jpg)
R is Memory-Bound
All objects in an R session are stored in memory
R places a limit of 2^31 − 1 bytes on all object sizes, independent of RAM
The Art of R Programming, Norman Matloff
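The 2^31 − 1 bound is the largest value of R's integer type, and the size of any single object can be checked from within a session. A small sketch, with illustrative variable names:

```r
# The largest integer R can represent equals 2^31 - 1.
limit <- .Machine$integer.max
same_bound <- limit == 2^31 - 1

# object.size() reports how much memory a single object occupies.
values <- numeric(1e6)                          # one million doubles, 8 bytes each
size_bytes <- as.numeric(object.size(values))
```

Checking `object.size()` on the largest objects in a session is a simple way to see how close an analysis is to the available memory.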
![Page 22: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/22.jpg)
R Memory Management
Automatic, including garbage collection
rm() removes the binding between a name and an object, but does not immediately free the memory
gc() forces a garbage collection, at substantial computational cost
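The difference between removing a name and reclaiming memory can be seen in a short sketch. The vector size is arbitrary and the variable names are illustrative.

```r
big_vector <- numeric(1e7)            # about 80 MB of doubles
rm(big_vector)                        # the name is gone; the memory may not be released yet
name_exists <- exists("big_vector")   # FALSE once the binding is removed

memory_report <- gc()                 # force a collection; returns a usage summary matrix
print(memory_report)
```

In normal use R triggers garbage collection on its own, so an explicit `gc()` is mainly useful for inspecting memory usage.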
![Page 23: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/23.jpg)
Improving Performance
The Art of R Programming, Chapter 14, Norman Matloff
[Figure: four approaches placed by Power and Simplicity: Vectorization, Byte-Code Compilation, Parallel R, C/C++]
![Page 24: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/24.jpg)
Improving Performance

| Method | Description |
|---|---|
| C/C++ | Call C programs from R |
| Vectorization | Recode for vectorization, replacing slower functions |
| Byte-code compilation | cmpfun() |
| Parallel R | parallel package |

http://cran.r-project.org/web/views/HighPerformanceComputing.html
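Three of these methods can be tried without leaving R. The sketch below uses a toy sum-of-squares task; the function names are illustrative, and `mc.cores = 1` is chosen only so the example runs on every platform.

```r
library(compiler)  # ships with R; provides cmpfun()
library(parallel)  # ships with R; provides mclapply()

# Loop version: adds squares one element at a time.
sum_squares_loop <- function(x) {
  total <- 0
  for (value in x) total <- total + value^2
  total
}

# Vectorized version: one call operates on the whole vector.
sum_squares_vectorized <- function(x) sum(x^2)

# Byte-code compilation of the loop version.
sum_squares_compiled <- cmpfun(sum_squares_loop)

x <- 1:10
loop_result <- sum_squares_loop(x)
vectorized_result <- sum_squares_vectorized(x)
compiled_result <- sum_squares_compiled(x)

# Parallel apply: mc.cores = 1 runs everywhere; larger values fork workers on Unix-alikes.
squares <- mclapply(x, function(value) value^2, mc.cores = 1)
```

All three versions return the same total; they differ only in speed. R 3.4.0 and later byte-compile functions automatically by default, so an explicit `cmpfun()` matters mostly on older versions such as those current when this talk was given.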
![Page 25: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/25.jpg)
Improving Performance
Rprof() – profiles code to measure the time spent in each function
ff – memory-efficient storage of large data on disk, with fast access functions
bigmemory – manages massive matrices with shared memory and memory-mapped files
![Page 26: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/26.jpg)
R for Production Use
![Page 27: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/27.jpg)
Derivative Projects
RStudio – Integrated Development Environment (IDE)
Rattle – Data Mining Package
RExcel – (Statconn) Connection between R and Excel
Weka – Java-based data mining, with statistical analysis by R
RapidMiner – Java-based data mining incorporating Weka, with statistical analysis by R
Revolution Analytics – Scaling R for the Enterprise
Oracle R Enterprise – Integrated into Oracle
![Page 28: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/28.jpg)
About Statconn (as of March 2013)
Produces RAndFriends under noncommercial and commercial licenses
All the statconn tools work ONLY with 32-bit R
statconnDCOM
rcom (GPL2, but requires statconnDCOM)
RExcel 3.2.9 (ONLY 32-bit Office: 2003, 2007, 2010)
http://rcom.univie.ac.at/
![Page 29: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/29.jpg)
Sample Projects Using R
The Heritage Health Prize, Thomas Nguyen
A Direct Marketing In-flight Forecasting System, Shannon Terry & Ben Ogorek
Mining Twitter for Airline Consumer Sentiment, Jeffrey Breen
Alternative Data Sources for Measuring Market Sentiment and Events (Using R), Joe Rothermich
![Page 30: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/30.jpg)
The Julia Language
http://julialang.org/
![Page 31: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/31.jpg)
About Julia
High-level, high-performance dynamic open-source programming language for technical computing
Syntax similar to other technical computing environments
Features
Sophisticated compiler
Distributed parallel execution
Numerical accuracy
Extensive mathematical function library
Uses C, C++, Fortran libraries extensively
![Page 32: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/32.jpg)
Why Julia: “Because we are greedy”
http://julialang.org/blog/2012/04/nyc-open-stats-meetup-announcement/
![Page 33: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/33.jpg)
Julia Community
Hosted on GitHub
550 mailing list subscribers (Google Groups)
1,500 GitHub followers
190 forks
50 total contributors
As of September 2012, all contributors except the core developers had known of the language for six months or less
Julia: A Fast Dynamic Language for Technical Computing (2012), Bezanson, Karpinski, Shah, Edelman
![Page 34: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/34.jpg)
The Julia Manual
http://docs.julialang.org/en/latest/manual/
![Page 35: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/35.jpg)
Julia Mathematical Functions
http://docs.julialang.org/en/latest/manual/mathematical-operations/
![Page 36: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/36.jpg)
Julia Standard Library
http://docs.julialang.org/en/latest/stdlib/
![Page 37: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/37.jpg)
Julia Performance
![Page 38: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/38.jpg)
Key Ingredients of Julia Performance
Rich type information, provided naturally by multiple dispatch
Aggressive code specialization against run-time types
Julia’s LLVM-based just-in-time (JIT) compiler
Julia: A Fast Dynamic Language for Technical Computing (2012), Bezanson, Karpinski, Shah, Edelman
![Page 39: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/39.jpg)
Julia Performance Comparison
http://julialang.org/
![Page 40: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/40.jpg)
Julia Performance Comparison
Julia: A Fast Dynamic Language for Technical Computing (2012), Bezanson, Karpinski, Shah, Edelman
![Page 41: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/41.jpg)
Julia Recommendations
The software is ready for people already using C or Fortran
The software will develop into a usable scripting language for R users
Wait until version one for production use
![Page 42: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/42.jpg)
Send Me Your Questions
http://marktab.net
![Page 43: Data analysis with R and Julia](https://reader034.fdocuments.in/reader034/viewer/2022052504/54c6d7584a7959b7618b458c/html5/thumbnails/43.jpg)
Conclusion
R provides production-ready software for statistical analysis
Julia merits personal investment and promises high performance