English R Lightning Talks @ BURN (2014-04-22)
![Page 1: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/1.jpg)
23 April 2014
![Page 2: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/2.jpg)
László Gönczy: Exploratory data analysis: project experience and ongoing developments
Quanopt
![Page 3: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/3.jpg)
![Page 4: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/4.jpg)
![Page 5: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/5.jpg)
![Page 6: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/6.jpg)
![Page 7: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/7.jpg)
![Page 8: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/8.jpg)
![Page 9: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/9.jpg)
![Page 10: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/10.jpg)
![Page 11: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/11.jpg)
![Page 12: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/12.jpg)
![Page 13: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/13.jpg)
![Page 14: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/14.jpg)
![Page 15: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/15.jpg)
![Page 16: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/16.jpg)
![Page 17: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/17.jpg)
![Page 18: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/18.jpg)
László Gönczy: Exploratory data analysis: project experience and ongoing developments
Gergely Horváth: R workshop in Bucharest
KSH
![Page 19: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/19.jpg)
buchaRest
![Page 20: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/20.jpg)
Literature
![Page 21: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/21.jpg)
Romania - organizer
![Page 22: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/22.jpg)
Giraffe and church
![Page 23: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/23.jpg)
Big-big professor
![Page 24: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/24.jpg)
Mr V. Tepes alias Dracula
![Page 25: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/25.jpg)
Hungary
![Page 26: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/26.jpg)
![Page 27: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/27.jpg)
Serbia
![Page 28: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/28.jpg)
Ancient hero - Traian
![Page 29: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/29.jpg)
Austria
![Page 30: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/30.jpg)
![Page 31: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/31.jpg)
Romania
![Page 32: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/32.jpg)
RO – GB - NL
![Page 33: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/33.jpg)
![Page 34: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/34.jpg)
Gergely Horváth: R workshop in Bucharest
Quanopt
Imre Kocsis: Bigvis: plotting (relatively) large data in R
![Page 35: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/35.jpg)
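Budapesti Műszaki és Gazdaságtudományi Egyetem, Méréstechnika és Információs Rendszerek Tanszék (Budapest University of Technology and Economics, Department of Measurement and Information Systems)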
Bigvis: plotting (relatively) large data in R
Kocsis Imre
BURN Meetup, 2014.04.22.
![Page 36: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/36.jpg)
Let’s do Exploratory Data Analysis!
„Flight data”
2008: 113MB df
~7 million x 29
> system.time(print(qplot(data = b, x = Distance, y = AirTime)))
   user  system elapsed
  102.2    60.2   163.5
![Page 37: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/37.jpg)
SotA
Relatively Painless Visual EDA
Relatively Painless Handling of Big Data
[…]
[…]
![Page 38: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/38.jpg)
bigvis
From Hadley Wickham
A rather generic approach
o Paper: vita.had.co.nz/papers/bigvis.pdf
o Slides: files.meetup.com/1406240/bigvis.pdf
A reference implementation in R
o ggplot2 gets a huge boost
o GitHub: hadley/bigvis
![Page 39: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/39.jpg)
Big Data EDA?
Subsampling is a hassle.
You probably want…
0. For the whole data
1. Summary statistics over
2. Interval-binned data
+ Error approx. would be nice
+ Suppress outliers (or not)
![Page 40: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/40.jpg)
Put in pictures…
ggplot2 bigvis
Few seconds
![Page 41: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/41.jpg)
bigvis (simplified) workflow
Data in memory → bin() → condense()
bin(): interval binning
condense(): count, sum, mean, median, sd
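The bin-then-condense idea can be sketched in base R without the bigvis package. This is only an illustration of the pattern, not bigvis's implementation: the data are simulated stand-ins for the flight data, and the variable names (`distance`, `air_time`, `width`) are assumptions made for the example.

```r
# Simulated stand-in for the flight data (values are made up).
set.seed(42)
n <- 1e5
distance <- runif(n, 50, 3000)
air_time <- distance / 8 + rnorm(n, sd = 10)

# "bin": map every observation to a fixed-width interval.
width  <- 100
bin_id <- floor(distance / width)

# "condense": one row of summaries per bin instead of one row per point.
condensed <- data.frame(
  bin_start = sort(unique(bin_id)) * width,
  count     = as.vector(table(bin_id)),
  mean_time = as.vector(tapply(air_time, bin_id, mean))
)
head(condensed)
```

The plot is then drawn from `condensed` (a few dozen rows) rather than from the raw observations, which is where the speed-up comes from.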
![Page 42: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/42.jpg)
bigvis (simplified) workflow
condense(): count, sum, mean, median, sd
smooth(): smooth out errors
peel(): peel off outliers
![Page 43: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/43.jpg)
… and then plot with ggplot
![Page 44: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/44.jpg)
Some other aspects
Some further automatic magic with KDE
Relative error estimation with alpha / hues
Vis. patterns for (n, m)-d datasets
o n: # of binned variables
o m: # of summaries
o Dens. estimate: (1,1)-d, earlier: (2,1)-d
![Page 45: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/45.jpg)
Parallelization & decoupling?
The pattern can scale by moving concerns out of R
bin: see MapReduce
Some formulations are easy for stream processing, too
Pipeline: data → bin → summarize → smooth → visualize
![Page 46: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/46.jpg)
Parallelization & decoupling?
Summary: depends…
Distributive stats: count, sum, min, max
Algebraic stats: mean, sd, higher moments
Holistic…? (quantiles, countdistinct)
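Why distributive and algebraic statistics parallelize well can be shown with a small base R sketch: each chunk reports only a count, a sum and a sum of squares, and those partial results are added up. The chunking scheme and the helper name `partial` are assumptions for illustration, and the sum-of-squares formula is the textbook one, not a numerically robust variant.

```r
# Partial summary of one chunk: count, sum, sum of squares.
partial <- function(x) c(n = length(x), s = sum(x), ss = sum(x^2))

set.seed(1)
x <- rnorm(10000, mean = 5, sd = 2)
chunks <- split(x, rep(1:4, length.out = length(x)))

# "map": summarize each chunk independently; "reduce": add the pieces.
total <- Reduce(`+`, lapply(chunks, partial))

n        <- total[["n"]]
mean_all <- total[["s"]] / n
sd_all   <- sqrt((total[["ss"]] - n * mean_all^2) / (n - 1))
```

A median has no such fixed-size partial summary, which is why holistic statistics are the hard case.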
![Page 47: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/47.jpg)
Parallelization & decoupling?
Input: mostly „resolution” bound
R excels here
![Page 48: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/48.jpg)
Towards interactive EDA?
Bin-summarize-smooth can still take long…
Precompute/cache…
… and e.g. update after new batches
Architecture: raw data-at-rest → RDBMS / in-memory summarized data → client
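Updating a cached summary after a new batch could look roughly like the following base R sketch. The cache here is just a named vector of per-bin counts; the function name `update_counts` and the bin width are hypothetical, and a real setup would keep the cache in a database or in-memory store as the slide suggests.

```r
width <- 10
bin_start <- function(x) floor(x / width) * width

# Merge the counts of a new batch into the cached per-bin counts.
update_counts <- function(cache, batch) {
  new  <- table(bin_start(batch))
  keys <- union(names(cache), names(new))
  out  <- setNames(numeric(length(keys)), keys)
  out[names(cache)] <- out[names(cache)] + cache
  out[names(new)]   <- out[names(new)] + as.vector(new)
  out
}

cache <- numeric(0)                          # nothing precomputed yet
cache <- update_counts(cache, c(1, 5, 12))   # first batch
cache <- update_counts(cache, c(14, 27))     # later batch, raw data not re-read
cache
```

Only the new batch is scanned on each update, so the client always reads from the small summarized table.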
![Page 49: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/49.jpg)
Imre Kocsis: Bigvis: plotting (relatively) large data in R
András Tajti: Changing User Roles in an Online Forum
![Page 50: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/50.jpg)
Changing User Roles in an Online Forum
András Tajti
BURN meetup
04.23.2014.
![Page 51: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/51.jpg)
Questions
1. Can we identify patterns in user behaviour?
2. Can we detect changes in that behaviour?
Of course, we can!
I will show you one way...
![Page 52: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/52.jpg)
Theoretical tools
● You need features to describe behaviour:
– Network science
● You need to find the most important variables:
– Principal component analysis
● You need to find users with similar behaviour:
– Cluster analysis
![Page 53: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/53.jpg)
Practical tools
● To do all the computations, I used R packages:
– igraph for extracting network features
– pcaPP and rrcov for PCA
– fpc for cluster evaluation
● Of course, basic R functions were used mostly:
– princomp for PCA
– hclust for hierarchical clustering
– compiler package for faster computation
![Page 54: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/54.jpg)
What does a forum look like?
● One post is either a reply to another or not:
– One post has an out-degree of at most one
– It can have several incoming edges, as any later post can refer to it.
![Page 55: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/55.jpg)
Users' features
● To describe behaviour, I used:
– Number of posts
– Number of neighbours
– Parent users' in- and out-degree
– All of the above as ranks and relative ranks
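A few of these features can be derived from a plain reply table in base R. The sketch below is a simplification and not the talk's code (which used igraph): the `posts` table and its column names are hypothetical, and user degrees are counted per reply rather than per distinct neighbour.

```r
# Hypothetical posts table: parent_id is NA for thread starters.
posts <- data.frame(
  post_id   = 1:6,
  user      = c("ann", "bob", "ann", "cid", "bob", "ann"),
  parent_id = c(NA, 1, 2, 1, 3, 5)
)

# Every reply is one edge from the replying user to the parent post's author.
replies <- posts[!is.na(posts$parent_id), ]
edges <- data.frame(
  from = replies$user,
  to   = posts$user[match(replies$parent_id, posts$post_id)]
)

users <- sort(unique(posts$user))
features <- data.frame(
  user    = users,
  n_posts = as.vector(table(factor(posts$user, levels = users))),
  out_deg = as.vector(table(factor(edges$from, levels = users))),
  in_deg  = as.vector(table(factor(edges$to,   levels = users)))
)

# The same feature expressed as a rank (1 = most active).
features$rank_posts <- rank(-features$n_posts, ties.method = "min")
features
```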
![Page 56: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/56.jpg)
Choosing important features
● Main problem: all variables have heavy-tailed distributions
– Principal component analysis is best for normally distributed variables
– Alternatives:
  ● Robust correlation estimation
  ● Projection pursuit methods
– Winner: ROBPCA from rrcov, as PcaHubert
– Mostly the same results as the original princomp
![Page 57: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/57.jpg)
Searching groups
● Cluster analysis:
– Hierarchical, with Euclidean distance and complete linkage
– Used on the PCA scores increased with explained variance
– Technical limits on the number of clusters:
  ● Min.: 2 (the results contained groupings with at least three groups)
  ● Max.: 30 (was reached a few times)
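The PCA-then-cluster step might be sketched as follows with base R only. Several things here are assumptions: the features are simulated, `prcomp` stands in for the `princomp` / `PcaHubert` calls named in the talk, "scores increased with explained variance" is read as weighting each score column by its share of explained variance, and `k = 3` is an arbitrary choice.

```r
set.seed(7)
# Simulated heavy-tailed user features (made up for the example).
feat <- data.frame(
  n_posts = rlnorm(60, 1, 1),
  in_deg  = rlnorm(60, 0.5, 1),
  out_deg = rlnorm(60, 0.5, 1)
)

pca       <- prcomp(feat, center = TRUE, scale. = TRUE)
explained <- pca$sdev^2 / sum(pca$sdev^2)

# Weight each score column by its share of explained variance (one reading of the slide).
weighted <- sweep(pca$x, 2, explained, `*`)

# Hierarchical clustering: Euclidean distance, complete linkage.
tree   <- hclust(dist(weighted, method = "euclidean"), method = "complete")
groups <- cutree(tree, k = 3)
table(groups)
```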
![Page 58: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/58.jpg)
Selecting cluster numbers
● For every goodness measure, I was looking for
– First local minimum/maximum
– Sharpest “elbow”
![Page 59: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/59.jpg)
Select by eye
![Page 60: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/60.jpg)
What is changing?
● I used “time windows” to slice the data
● One window contained 1000 posts and their full thread
● I ran role detection for all sets
● Then compared memberships between clusters
![Page 61: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/61.jpg)
How to compare memberships?
● Some users appear in only one of the two datasets
● Two groups are similar if they have a significant number of common users:
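One simple way to quantify "common users" between clusters from two time windows is the Jaccard index. This is a stand-in chosen for illustration, not necessarily the measure used in the talk; the cluster names and members below are invented.

```r
# Jaccard index: shared users divided by all users seen in either group.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Hypothetical cluster memberships in two consecutive time windows.
window1 <- list(core = c("ann", "bob", "cid"), lurkers = c("dan", "eve"))
window2 <- list(A = c("ann", "bob", "fay"), B = c("dan", "gus"))

overlap <- outer(
  seq_along(window1), seq_along(window2),
  Vectorize(function(i, j) jaccard(window1[[i]], window2[[j]]))
)
dimnames(overlap) <- list(names(window1), names(window2))
overlap
```

Users present in only one window lower the score automatically, because they enter the union but not the intersection.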
![Page 62: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/62.jpg)
Example
![Page 63: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/63.jpg)
Example
![Page 64: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/64.jpg)
Thank You!
[email protected] @atajti
The code will be available at github.com/atajti/changingForumRoles
![Page 65: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/65.jpg)
András Tajti: Changing User Roles in an Online Forum
MTA
Dénes Tóth: Dilemmas in package development: interactive visualization, GUIs, largish data, extensibility
![Page 66: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/66.jpg)
Dilemmas in package development: interactive visualization, GUIs, largish data, extensibility
Dénes Tóth
BURN Meetup, Budapest, 23.04.2014.
![Page 67: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/67.jpg)
Dénes Tóth / [email protected] MTA TTK KPI / humlab.cogpsyphy.hu
EEG
• Electroencephalography (EEG)
– Voltage fluctuations (μV) recorded at the scalp
– A typical setup: 32-128 channels, 500-1000 Hz sampling rate, 30-90 minutes recording time, 20-30 participants → 200 MB – 2 GB / participant
– Tasks: raw data import + signal processing (filtering, resampling, artifact correction [e.g. eye movements])
– Visual inspection is unavoidable → interactive visualization is a must
![Page 68: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/68.jpg)
![Page 69: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/69.jpg)
ERP
• Cognitive experiments: what does the brain do if exposed to A versus B?
– EEG & events = event-related potentials (ERP)
– 40-200 repetitions per condition, factorial design (Fac1 x Fac2)
– Tasks: marker handling, segmentation, artifact rejection, averaging, time-frequency analyses → extract components & do statistics (e.g. clustering, ANOVA, etc.) → tremendous number of analytic possibilities
– Randomization statistics (e.g. 5000 permutations)
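Randomization statistics with 5000 permutations can be illustrated in a few lines of base R. The sketch assumes a deliberately simple case, two independent groups of simulated amplitudes, whereas real ERP designs are usually within-subject and multi-channel; all values and names are made up.

```r
set.seed(11)
# Simulated mean amplitudes (µV) in two conditions.
cond_a <- rnorm(40, mean = 1.0)
cond_b <- rnorm(40, mean = 0.2)

observed <- mean(cond_a) - mean(cond_b)
pooled   <- c(cond_a, cond_b)

# Shuffle the condition labels 5000 times and recompute the difference.
perm_diff <- replicate(5000, {
  idx <- sample(length(pooled), length(cond_a))
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value, counting the observed statistic as one permutation.
p_value <- (sum(abs(perm_diff) >= abs(observed)) + 1) / (length(perm_diff) + 1)
p_value
```

Repeating this for every channel and time point is what makes the computational load, and hence easy parallelization, a real concern.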
![Page 70: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/70.jpg)
![Page 71: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/71.jpg)
eegR package
• No dedicated comprehensive package in R for EEG, but a lot of related packages (e.g. signal, mFilter, icaOcularCorrection + one trillion statistical methods)
• Present (eegR)
– Base data class: array
– Basic operation: apply-like
– ~60 functions, ~4000 lines → appropriate for a specific workflow
– No cohesive system
![Page 72: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/72.jpg)
eegR package
• Future (dream :)
– Covers all basic analytic steps + highly extensible
– Provides workflow + GUI + scripting
– Handles out-of-memory datasets well, easy parallelization
– Interactive visualization capabilities
![Page 73: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/73.jpg)
Question I. One package or related packages?
• One package
– Pros: easy install process, better tuning options
– Cons: less general, harder to extend
• One core package + extensions
– Pros: anyone can write extensions, easy to invoke other packages
– Cons: the core package must be very well written
![Page 74: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/74.jpg)
Question I/a. How to write a good core?
• Range
– Introduce only classes, methods and utility functions, or provide a basic stand-alone package?
• Classes
– S3 / S4 / R5?
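To make the S3 option concrete, here is a minimal sketch of what a core class on top of the existing array-based data might look like. The class name `eeg`, the constructor `new_eeg` and the `srate` attribute are purely hypothetical and are not taken from eegR.

```r
# Hypothetical constructor: a 3-d array (channels x samples x trials) plus a sampling rate.
new_eeg <- function(data, srate) {
  stopifnot(is.array(data), length(dim(data)) == 3, is.numeric(srate))
  structure(data, srate = srate, class = c("eeg", "array"))
}

# An S3 print method; extension packages could add further methods the same way.
print.eeg <- function(x, ...) {
  d <- dim(x)
  cat("<eeg>", d[1], "channels x", d[2], "samples x", d[3], "trials at",
      attr(x, "srate"), "Hz\n")
  invisible(x)
}

x <- new_eeg(array(rnorm(32 * 500 * 4), dim = c(32, 500, 4)), srate = 500)
x
```

S3 keeps apply-like operations on the underlying array working unchanged; S4 or reference classes would trade that simplicity for formal validation or mutable state.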
![Page 75: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/75.jpg)
Question II. What about the user interface?
Workflow approach: R AnalyticFlow
• Pros: the natural way of EEG signal processing; unconstrained scripting
• Cons: reliability? performance?
![Page 76: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/76.jpg)
Question II. What about the user interface?
• GUI coverage
– Full GUI ←→ subtask- (function-) related GUI
• GUI type
– Desktop GUI ←→ web-based GUI
– gWidgets2 | gWidgetsWWW2 | Shiny | ...
![Page 77: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/77.jpg)
Question III. Which out-of-RAM package to choose?
• SciDB would be great, but only available on Linux
• Two candidates: ff & gdsfmt packages
– ff package: more comprehensive
– gdsfmt package: lightweight & fast
![Page 78: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/78.jpg)
Question IV. Interactive visualization?
• iPlots, playwith, ggvis etc.: good, but not efficient
• Performance issues: a 10-sec part can contain 128 x 1000 x 10 = 1,280,000 data points
• Candidates for line plots:
– Acinonyx
– rCharts w. Dygraphs
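Independently of the plotting library, one common way to ease the load is to reduce each channel to roughly screen resolution before drawing. The base R sketch below keeps the minimum and maximum of each block so that peaks survive; it is an illustration only, and the function name `decimate_minmax`, the simulated signal and the choice of 500 blocks are assumptions.

```r
# One simulated channel: 10 s at 1000 Hz.
set.seed(3)
signal <- cumsum(rnorm(10000))

# Keep the min and max of each block so that peaks survive the reduction.
decimate_minmax <- function(x, n_blocks) {
  block <- cut(seq_along(x), breaks = n_blocks, labels = FALSE)
  lo <- tapply(x, block, min)
  hi <- tapply(x, block, max)
  as.vector(rbind(lo, hi))   # interleave: lo1, hi1, lo2, hi2, ...
}

reduced <- decimate_minmax(signal, n_blocks = 500)
length(reduced)
```

With 128 channels this turns 1,280,000 points into 128,000 while preserving the visual envelope of each trace.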
![Page 79: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/79.jpg)
Question IV. Interactive visualization?
• Acinonyx (iPlots Extreme)
– Pros: very fast, iContainer
– Cons: very poor documentation, not on CRAN
• rCharts and other web-based tools, esp. JavaScript libraries
– Dygraphs: fast and nice, but no official port to rCharts
– Communication between JS and R?
![Page 80: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/80.jpg)
Thank you!
Q1: One package or related packages?
Q1a: What should the base package cover? Do I need S4 or R5?
Q2: User interface? → R AnalyticFlow, GUIs
Q3: How to handle out-of-memory data?
Q4: Interactive visualization?
![Page 81: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/81.jpg)
Dénes Tóth: Dilemmas in package development: interactive visualization, GUIs, largish data, extensibility
rapporter.net
Gergely Daróczi: pander: Transforming R objects to Pandoc’s markdown
![Page 82: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/82.jpg)
pander: A Pandoc writer in R
Transforming R objects to Pandoc’s markdown
Gergely Daróczi
[email protected]
Budapest Users of R Network
23 April 2014
![Page 83: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/83.jpg)
What is pander?
A collection of helper functions to print markdown syntax
> ?pandoc.(footnote|header|horizontal.rule|image|link|p)(.return)?
> ?pandoc.(emphasis|strikeout|strong|verbatim)(.return)?
> pandoc.strong('foobar')
**foobar**
> pandoc.strong.return('foobar')
[1] "**foobar**"
> pandoc.header('foobar', level = 2)
## foobar
> pandoc.header('foobar', style = 'setext')
foobar
======
Gergely Daróczi (rapporter.net) pander: A Pandoc writer in R 23/4/2014
![Page 84: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/84.jpg)
What is pander?
Collection of helper functions to map R objects to markdown
> ?pandoc.(list|table)(.return)?
> pandoc.list(list('foo', list('bar')))
* foo
    * bar
> pandoc.table(head(iris, 2), split.table = Inf)
-------------------------------------------------------------------
 Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
-------------- ------------- -------------- ------------- ---------
     5.1            3.5           1.4            0.2       setosa

     4.9             3            1.4            0.2       setosa
-------------------------------------------------------------------
![Page 85: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/85.jpg)
What is pander?
Collection of helper functions to map R objects to various markdown languages
> pandoc.table(head(iris, 2), split.table = Inf, style = 'rmarkdown')
|  Sepal.Length  |  Sepal.Width  |  Petal.Length  |  Petal.Width  |  Species  |
|:--------------:|:-------------:|:--------------:|:-------------:|:---------:|
|      5.1       |      3.5      |      1.4       |      0.2      |  setosa   |
|      4.9       |       3       |      1.4       |      0.2      |  setosa   |
> pandoc.table(head(iris, 2), split.table = Inf, style = 'simple')
 Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
-------------- ------------- -------------- ------------- ---------
     5.1            3.5           1.4            0.2       setosa
     4.9             3            1.4            0.2       setosa
![Page 86: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/86.jpg)
What is pander?
Collection of helper functions to map R objects to various markdown languages
> iris$Species <- 'foos and bars'; names(iris) <- gsub('.', ' ', names(iris), fixed = TRUE)
> pandoc.table(head(iris, 4), split.table = Inf, style = 'grid',
+ split.cells = 5, justify = 'left')
+----------+---------+----------+---------+------------+
| Sepal    | Sepal   | Petal    | Petal   | Species    |
| Length   | Width   | Length   | Width   |            |
+==========+=========+==========+=========+============+
| 5.1      | 3.5     | 1.4      | 0.2     | setosa     |
+----------+---------+----------+---------+------------+
| 4.9      | 3       | 1.4      | 0.2     | setosa     |
+----------+---------+----------+---------+------------+
| 4.7      | 3.2     | 1.3      | 0.2     | setosa     |
+----------+---------+----------+---------+------------+
| 4.6      | 3.1     | 1.5      | 0.2     | foos       |
|          |         |          |         | and        |
|          |         |          |         | bars       |
+----------+---------+----------+---------+------------+
![Page 87: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/87.jpg)
What is pander?
S3 method to map R objects to markdown
> ?pander(.return)?
> methods(pander)
[1] pander.anova* pander.aov* pander.cast_df* pander.character*
[5] pander.data.frame* pander.default* pander.density* pander.evals*
[9] pander.factor* pander.glm* pander.htest* pander.image*
[13] pander.list* pander.lm* pander.logical* pander.matrix*
[17] pander.NULL* pander.numeric* pander.option pander.POSIXct*
[21] pander.POSIXt* pander.prcomp* pander.rapport* pander.table*
Non-visible functions are asterisked
> pander(head(iris, 1), split.table = Inf)
-------------------------------------------------------------------
 Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
-------------- ------------- -------------- ------------- ---------
     5.1            3.5           1.4            0.2       setosa
-------------------------------------------------------------------
![Page 88: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/88.jpg)
What is pander?
S3 method to map R objects to markdown
> pander(letters[1:7])
_a_, _b_, _c_, _d_, _e_, _f_ and _g_
> pander(ks.test(runif(50), runif(50)))
---------------------------------------------------
 Test statistic   P value   Alternative hypothesis
---------------- --------- ------------------------
      0.18       _0.3959_         two-sided
---------------------------------------------------
Table: Two-sample Kolmogorov-Smirnov test: `runif(50)` and `runif(50)`
> pander(chisq.test(table(mtcars$am, mtcars$gear)))
---------------------------------------
 Test statistic   df       P value
---------------- ---- -----------------
     20.94        2   _2.831e-05_ * * *
---------------------------------------
Table: Pearson's Chi-squared test: `table(mtcars$am, mtcars$gear)`
Warning message:
In chisq.test(table(mtcars$am, mtcars$gear)) :
  Chi-squared approximation may be incorrect
![Page 89: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/89.jpg)
What is pander?
S3 method to map R objects to markdown
> pander(lm(mtcars$wt ~ mtcars$hp), summary = TRUE)
--------------------------------------------------------------
                   Estimate   Std. Error   t value   Pr(>|t|)
----------------- ---------- ------------ --------- ----------
  **mtcars$hp**    0.009401    0.00196      4.796   4.146e-05
 **(Intercept)**    1.838       0.3165      5.808   2.389e-06
--------------------------------------------------------------
-------------------------------------------------------------
 Observations   Residual Std. Error   $R^2$   Adjusted $R^2$
-------------- --------------------- ------- ----------------
      32              0.7483         0.4339       0.4151
-------------------------------------------------------------
Table: Fitting linear model: mtcars$wt ~ mtcars$hp
![Page 90: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/90.jpg)
What is pander?
S3 method to map R objects to pretty formatted markdown
> panderOptions(’table.split.table’, Inf)
> panderOptions(’table.style’, ’grid’)
> emphasize.cells(which(iris > 1.3, arr.ind = TRUE))
> pander(iris)
+----------------+---------------+----------------+---------------+------------+
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
+================+===============+================+===============+============+
| *5.1* | *3.5* | *1.4* | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *4.9* | *3* | *1.4* | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *4.7* | *3.2* | 1.3 | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *4.6* | *3.1* | *1.5* | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *5* | *3.6* | *1.4* | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *5.4* | *3.9* | *1.7* | 0.4 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *4.6* | *3.4* | *1.4* | 0.3 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *5* | *3.4* | *1.5* | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
| *4.4* | *2.9* | *1.4* | 0.2 | setosa |
+----------------+---------------+----------------+---------------+------------+
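The S3 dispatch idea behind this slide can be sketched in base R. This is only an illustration, not pander's implementation: the generic `to_markdown` and its methods are hypothetical names invented for the example, and the table syntax is a simplified pipe table rather than the grid style shown above.

```r
# Hypothetical generic (not part of pander): dispatches on the class of x.
to_markdown <- function(x, ...) UseMethod("to_markdown")

# Fallback for atomic vectors: a comma-separated inline list.
to_markdown.default <- function(x, ...) {
  paste(format(x), collapse = ", ")
}

# Data frames become a simple pipe table, one character string per line.
to_markdown.data.frame <- function(x, ...) {
  header <- paste0("| ", paste(names(x), collapse = " | "), " |")
  rule   <- paste0("|", paste(rep("---", ncol(x)), collapse = "|"), "|")
  # Paste the columns together row by row; factors are shown by label.
  cells  <- do.call(paste, c(unname(lapply(x, as.character)), sep = " | "))
  c(header, rule, paste0("| ", cells, " |"))
}

# The same call works for different classes of input.
writeLines(to_markdown(head(iris[, c("Sepal.Length", "Species")], 2)))
writeLines(to_markdown(c(1.5, 2, 3)))
```

Adding support for another class means adding one more method, which is how a single call such as `pander(x)` can cover models, tables and data frames alike.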
![Page 91: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/91.jpg)
What is pander?
Tool for literate programming like Sweave, knitr or brew
> ?Pandoc.brew
> Pandoc.brew(text = '
+ Pi equals to <%= pi %>, and the best damn cars are:
+ <%= head(mtcars, 2) %>
+ ')
Pi equals to _3.142_, and the best damn cars are:
--------------------------------------------------------
mpg cyl disp hp drat wt
------------------- ----- ----- ------ ---- ------ -----
**Mazda RX4** 21 6 160 110 3.9 2.62
**Mazda RX4 Wag** 21 6 160 110 3.9 2.875
--------------------------------------------------------
Table: Table continues below
--------------------------------------------------
qsec vs am gear carb
------------------- ------ ---- ---- ------ ------
**Mazda RX4** 16.46 0 1 4 4
**Mazda RX4 Wag** 17.02 0 1 4 4
--------------------------------------------------
![Page 92: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/92.jpg)
What is pander?
Tool for literate programming like Sweave, knitr or brew
Features of Pandoc.brew:
* loops and conditional parts of a report, just like with brew,
* capturing plots and images with an automatically applied theme,
* rendering all R objects automatically in Pandoc’s markdown,
* recording all warning/error messages plus the raw R objects, along with anything printed to stdout and the printed results,
* custom caching mechanism to disk or RAM with auto-dependency,
* converting to HTML/pdf/odt/docx in one go,
* no chunk options (only workaround),
* building reports also in an interactive session with an R5 reference class.
http://rapporter.github.io/pander/#brew-to-pandoc
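A loop inside a brew template might look like the sketch below. It assumes the pander package is installed and that `<% ... %>` control tags behave as in brew, which is what the feature list states. The variable `brew_text` and the chosen columns are made up for the illustration.

```r
library(pander)

# <% ... %> runs R code without printing its result, while <%= ... %>
# passes the result through pander(). The for loop repeats the block
# between the two control tags once per column name.
brew_text <- '
<% for (var in c("mpg", "hp")) { %>
## Column <%= var %>

<%= head(mtcars[var], 3) %>
<% } %>
'

Pandoc.brew(text = brew_text)
```

The same template could be stored in a file and passed to `Pandoc.brew` as its first argument instead of using `text`.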
![Page 94: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/94.jpg)
What is pander?
Tool for literate programming like Sweave, knitr or brew – with global options
?panderOptions
?evalsOptions
* number formatting style (decimal mark, digits, trailing spaces etc.),
* date format,
* table formats (split, alignment, caption etc.),
* vector options (separator, copula, wrapper character),
* global graph settings for base, lattice and ggplot2 calls: color palette, font settings, grid, legend position, axis labels angle etc.,
* plot dimensions, resolution,
* cache options, hooks, filter output etc.
http://rapporter.github.io/pander/#pander-options
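As a small sketch of how the global options are used, the snippet below changes two of them, renders a table and restores the previous values. It assumes the pander package is installed. The option names `digits` and `table.style` come from `?panderOptions`, while `old_digits` and `old_style` are just local helper names.

```r
library(pander)

# Called with one argument, panderOptions() returns the current value;
# called with two, it sets a new one for the rest of the session.
old_digits <- panderOptions('digits')
old_style  <- panderOptions('table.style')

panderOptions('digits', 3)              # number formatting
panderOptions('table.style', 'simple')  # one of Pandoc's table syntaxes

pander(head(mtcars[, 1:4], 3))

# Restore the earlier settings so later output is unaffected.
panderOptions('digits', old_digits)
panderOptions('table.style', old_style)
```

Because the settings are global, every later `pander` or `Pandoc.brew` call picks them up without extra arguments, which is the workaround for the missing chunk options.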
![Page 96: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/96.jpg)
What is pander?
Tool for literate programming like Sweave, knitr or brew – a quick comparison
> require(wordcloud)
> pkgs <- ctv:::.get_pkgs_from_ctv_or_repos('ReproducibleResearch')[[1]]
> wordcloud(pkgs, rep(1, times = length(pkgs)), colors = rainbow(length(pkgs)),
+ random.color = TRUE)
pander is also intended to be a wrapper around Pandoc, so it can transform markdown files to other document formats:
> ?Pandoc.convert
> Pandoc.brew(..., convert = '(html|pdf|odt|docx)', ...)
![Page 97: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/97.jpg)
Job advertisements

Data Scientist
Requirements:
* Data warehouse experience
* SQL, NoSQL
* Programming (e.g. Perl)
* English
Advantages:
* R programming
* Math or insurance degree
* German

Rails programmer
Requirements:
* 2 yrs of Rails experience
* jQuery, Ajax
* git
* work without specs :)
Advantages:
* stats knowledge
* GH and SO activity
* SaaS experience
![Page 98: English R Lightning Talks @ BURN (2014-04-22)](https://reader034.fdocuments.in/reader034/viewer/2022042821/55cf97cc550346d03393b0f6/html5/thumbnails/98.jpg)
01 László Gönczy Exploratory data analysis.
02 Gergely Horváth R workshop in Bucharest.
03 Imre Kocsis Bigvis: plotting large data in R.
04 András Tajti Changing User Roles in a Forum.
05 Dénes Tóth Dilemmas in package development.