If You Can’t Beat ’Em …
Some Comments on Using R Along With Other Languages
Dirk Eddelbuettel
31 July 2016
Open Source Statistical Software for Data Science: Invited Papers
Joint Statistical Meetings (JSM), Chicago, IL
JSM 2016 1/27
Welcome to Lollapalooza!
Overview
Content
• Single- or Multi-Language?
• Interlude
• Empirics
• Illustration
• Conclusion
Single- or Multi-Language?
Claim: 1 + 1 > 2
Better with more than one?
• No one language fits all
• Real-world projects are frequently multi-language
• See e.g. job ads which rarely ever list just one language
Counter-claim: 1 + 1 < 2
Or better with just one?
• Mental switching cost between languages? Possibly
• Interop difficult and less portable? Maybe, but that is an argument against weak systems / OSs
• Easier / less to learn?
• “More hoops” to code?
Mental switching costs?
So which one is it?
Open Question
• Hard to measure or test: Any empirics on real world projects?
• Code competition / comparisons (e.g. Project Euler): Are they realistic?
Interlude
John Chambers
Chambers (2008) Software for Data Analysis
Chapters 10 and 11 devoted to Interfaces I: C and Fortran and Interfaces II: Other Systems.
John Chambers
Chambers (2016) Extending R
An entire book about this with concrete Python, Julia and C++ code and examples
John Chambers
Chambers 2016, Chapter 4
The fundamental lesson about programming in the large is that it requires a correspondingly broad and flexible response. In particular, no single language or software system is likely to be ideal for all aspects. Interfacing multiple systems is the essence. Part IV explores the design of interfaces from R.
Chambers 1976
Thanks to John Chambers for a scanned copy of this historic sketch.
Empirics
Growth
[Figure: Growth of Rcpp usage on CRAN, 2010 to 2016. Number of CRAN packages using Rcpp (left axis, 0 to 700) and percentage of CRAN packages using Rcpp (right axis, 0 to 8).]
Pagerank
library(pagerank)   # github.com/andrie/pagerank
cran <- "http://cloud.r-project.org"
pr <- compute_pagerank(cran)
round(100*pr[1:5], 3)
##    Rcpp    MASS ggplot2  Matrix mvtnorm
##   2.452   1.771   1.088   0.920   0.749
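The slides do not show how compute_pagerank() works internally. As a rough illustration of the underlying idea only, here is a minimal sketch of the textbook power-iteration PageRank in C++, assuming each package "links" to the packages it depends on. The function name page_rank, the damping factor 0.85, the iteration limit and the tolerance are all assumptions made for this sketch, not part of the pagerank package.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Power-iteration PageRank on a small directed graph.
// links[i] lists the nodes that node i points to (e.g. the packages it depends on).
// Hypothetical helper for illustration; NOT the API of the pagerank R package.
std::vector<double> page_rank(const std::vector<std::vector<std::size_t>>& links,
                              double damping = 0.85,   // assumed, conventional value
                              int max_iter = 100,      // assumed iteration cap
                              double tol = 1e-10) {    // assumed convergence tolerance
    assert(damping > 0.0 && damping < 1.0);
    const std::size_t n = links.size();
    if (n == 0) return {};

    std::vector<double> rank(n, 1.0 / n);  // start from a uniform distribution
    for (int iter = 0; iter < max_iter; ++iter) {
        // Rank held by dangling nodes (no outgoing links) is spread evenly.
        double dangling = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            if (links[i].empty()) dangling += rank[i];

        // Every node receives the teleport share plus its share of dangling rank.
        std::vector<double> next(n, (1.0 - damping) / n + damping * dangling / n);

        // Each node passes its damped rank on, split equally over its links.
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j : links[i]) {
                assert(j < n);
                next[j] += damping * rank[i] / links[i].size();
            }
        }

        // Stop once the total absolute change falls below the tolerance.
        double change = 0.0;
        for (std::size_t i = 0; i < n; ++i) change += std::fabs(next[i] - rank[i]);
        rank.swap(next);
        if (change < tol) break;
    }
    return rank;
}
```

Calling page_rank on a toy graph such as {{}, {0}, {0}, {0, 1}} returns four scores that sum to one, with node 0 (the one every other node points to) ranked highest. That mirrors, in miniature, why a widely depended-upon package can end up at the top of such a ranking.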
Pagerank
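[Figure: Top 25 of Page Rank as of July 2016 (horizontal axis: page rank, 0.005 to 0.025). Packages from highest to lowest: Rcpp, MASS, ggplot2, Matrix, mvtnorm, survival, plyr, lattice, httr, stringr, sp, igraph, dplyr, RcppArmadillo, jsonlite, reshape2, foreach, XML, coda, RCurl, nlme, data.table, zoo, rgl, boot.]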
Pagerank
[Figure: Top 5 packages by page rank. Graph with nodes Rcpp, MASS, ggplot2, Matrix, mvtnorm, lattice, plyr, stringr, RColorBrewer, dichromat, munsell, labeling, stringi, magrittr, colorspace, digest, gtable, reshape2, scales, sp, MatrixModels, graph, SparseM, sfsmisc. Edge types: Imports, LinkingTo, Enhances.]
Illustration
Using R to C++ to Boost to Python, and back
Setup
py_cflags <- system("python2.7-config --cflags", intern=TRUE)
se <- Sys.setenv; ge <- Sys.getenv    # shorthands to typeset
se("PKG_CFLAGS"=sprintf("%s %s", ge("PKG_CFLAGS"), py_cflags))
se("PKG_CXXFLAGS"=sprintf("%s %s", ge("PKG_CXXFLAGS"), py_cflags))
py_ldflags <- system("python2.7-config --ldflags", intern=TRUE)
se("PKG_LIBS"=sprintf("%s %s %s", ge("PKG_LIBS"),
                      "-lboost_python-py27", py_ldflags))
Using R to C++ to Boost to Python, and back
#include <Rcpp.h>
#include <Python.h>

// [[Rcpp::export]]
void initialize_python() {
    Py_SetProgramName("");  /* optional but recommended */
    Py_Initialize();
}

// [[Rcpp::export]]
void hello_python() {
    PyRun_SimpleString("from time import time,ctime\n"
                       "print 'Today is',ctime(time())\n");
}
Using R to C++ to Boost to Python, and back
Hello, World: Called from R
initialize_python()
hello_python()
## Today is Sat Jul 30 13:38:01 2016
More at http://gallery.rcpp.org/articles/rcpp-python/
Disclaimer: For illustration purposes. Works as designed on Ubuntu. Not meant to be universally portable to all three OSs.
Conclusion
Being Polyglot
Mixing Languages
• Common
• Natural
• Unavoidable
Being Polyglot
Consequences
• Must make it easier to interoperate
• Stop bickering among ourselves
• Build systems that are larger than the sum of their parts
Being Polyglot
Just Do It
Appendix
Lars Wirzenius, “Which license is the most free?”
Free software licences can be roughly grouped into permissive and copyleft ones. […] A permissive licence lets you do things that a copyleft one forbids, so clearly the permissive licence is more free. A copyleft licence means software using it won’t ever become non-free against the wills of the copyright holders, so clearly a copyleft licence is more free than a permissive one.
Both sides are both right and wrong, of course, which is why this argument will continue forever. […]
If a discussion about the relative freedom of licence types becomes heated, step away. It’s not worth participating anymore.
http://yakking.branchable.com/posts/comparative-freeness/