Date posted: 20-Jan-2016
14.2 Computer software
Ross Ihaka is one of the originators of R, a software package for statistical computing that has had phenomenal uptake internationally. It can be downloaded free and easily customised for a very wide variety of applications. The package and the paper introducing it have been cited over 1700 times, by far the highest for publications in the mathematical sciences over the last ten years, worldwide. It is now disseminated from over 75 internet sites in 30 countries.
R
CRAN
The package is used both for teaching and research by hundreds of universities around the world, including Stanford, Oxford, Cambridge and Berkeley. There are over 40 books written about, or featuring, the use of R.
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?scp=2&sq=ihaka&st=cse
Time series data.
t: time (but could be space)
0 ≤ t < T (continuous)
t = 0,...,T-1 (discrete)
t = 1,...,N
(unique) value Y(t) at each t
Y in R (the real line)
{0,1}
R^p
N, category, [0, 2π)
Useful when there is a special period P
t = nP + p, p = 0,...,P-1
Y_n(p) = Y(nP + p)
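The period decomposition above can be sketched in R by folding the series into a matrix with one column per cycle. This is a minimal illustration only: the period P = 12, the number of cycles, and the simulated data are assumptions, not taken from the slides.

```r
# Hypothetical data: a noisy sinusoid with period P = 12 observed over 4 cycles.
P <- 12
n_cycles <- 4
set.seed(1)
y <- sin(2 * pi * (0:(P * n_cycles - 1)) / P) + rnorm(P * n_cycles, sd = 0.2)

# y[t + 1] stores Y(t) for t = 0, ..., N - 1 (R indexes from 1).
# matrix() fills column by column, so column n + 1 holds
# Y_n(p) = Y(nP + p) for p = 0, ..., P - 1.
Yn <- matrix(y, nrow = P, ncol = n_cycles)

# Check the indexing for one pair (n, p).
n <- 2; p <- 5
stopifnot(Yn[p + 1, n + 1] == y[n * P + p + 1])

# Overlay the cycles so they can be compared within one period.
matplot(0:(P - 1), Yn, type = "l", lty = 1:n_cycles, col = 1:n_cycles,
        xlab = "p", ylab = "Y_n(p)")
```

Overlaying the columns makes departures from a common periodic shape easy to see.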
Vector ts case: matplot(type="l", ...)
use several line types
colors
use to force comparison
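A minimal sketch of the vector case, assuming a simulated three-component series (the data, component names and colors are illustrative only). matplot() draws one curve per matrix column, so distinct line types and colors put the components on a common scale for comparison.

```r
set.seed(2)
N <- 100

# Hypothetical three-component series: a shared random-walk signal plus noise.
common <- cumsum(rnorm(N))
Y <- cbind(a = common + rnorm(N),
           b = common + rnorm(N, sd = 2),
           c = 0.5 * common + rnorm(N))

cols <- c("black", "red", "blue")

# One line per column, each with its own line type and color.
matplot(1:N, Y, type = "l", lty = 1:3, col = cols,
        xlab = "t", ylab = "Y(t)")
legend("topleft", legend = colnames(Y), lty = 1:3, col = cols)
```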
Tufte (1983). 10th or 11th century movement of planets and sun
Connected. Good for smooth series
individual data points not unambiguously displayed
irregular sampling can be unclear
plot(type="l", ...)
Symbol graph. Good for long term behavior
cannot appreciate middle and high frequency behavior
irregular sampling can be unclear
plot(type="p",...) seals T = 940
Both. Points and line segments
plot(type="b", ...)
Both. Points on top of connected
plot(type="o", ...)
plot(type="n", ...)
points(pch="*", ...)
plot(type="n", ...)
lines(lty=3, ...)
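The display types listed above can be compared side by side. The following is a sketch on a simulated series (the data are an assumption); it uses only base graphics, and sets the line type in lines() through the lty argument.

```r
set.seed(3)
t <- 1:60
y <- sin(2 * pi * t / 20) + rnorm(60, sd = 0.3)   # hypothetical smooth series plus noise

op <- par(mfrow = c(3, 2))                        # 3 x 2 grid of panels

plot(t, y, type = "l", main = 'type = "l": connected')
plot(t, y, type = "p", main = 'type = "p": symbols')
plot(t, y, type = "b", main = 'type = "b": points and segments')
plot(t, y, type = "o", main = 'type = "o": points over line')

# type = "n" sets up the axes only; the data are added afterwards.
plot(t, y, type = "n", main = 'type = "n", then points()')
points(t, y, pch = "*")

plot(t, y, type = "n", main = 'type = "n", then lines()')
lines(t, y, lty = 3)                              # lty = 3 gives a dotted line

par(op)                                           # restore graphics settings
```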
plot(type="h", ...)
Good when need to see individual values
when series long
about central value (next)
Not good when strong trend
plot(type="h",...) Amazon T = 32875
Good about central value
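A sketch of a vertical-line display about a central value. plot(type="h") draws each line from zero, so one way to centre the display is to subtract a central value first. The simulated data and the choice of the median as the centre are assumptions made for illustration.

```r
set.seed(4)
y <- 50 + rnorm(200, sd = 3)   # hypothetical series with no strong trend

centre <- median(y)            # assumed choice of central value
dev <- y - centre              # deviations about the centre

# Vertical lines run from 0 to each deviation, so the display is
# centred on the median rather than on the raw zero level.
plot(seq_along(y), dev, type = "h", xlab = "t", ylab = "Y(t) - median")
abline(h = 0)
```

With a strong trend the deviations would be dominated by the trend itself, which is why this display suits series that stay near a central level.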
Box-Jenkins Model-building strategy.
1. model specification
2. model fitting
3. model diagnostics
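The three steps can be sketched with functions from R's stats package. This is an illustration under stated assumptions, not a prescribed workflow: the data are simulated from an AR(1) model with coefficient 0.6, and the AR(1) order fitted in step 2 is chosen to match that simulation.

```r
set.seed(5)
# Hypothetical data: 300 observations from an AR(1) process.
y <- arima.sim(model = list(ar = 0.6), n = 300)

# 1. Model specification: inspect the sample ACF and PACF.
op <- par(mfrow = c(1, 2))
acf(y)
pacf(y)
par(op)

# 2. Model fitting: estimate the AR(1) model suggested by step 1.
fit <- arima(y, order = c(1, 0, 0))
print(fit)

# 3. Model diagnostics: residuals should resemble white noise.
res <- residuals(fit)
print(Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1))
tsdiag(fit)   # standardized residuals, residual ACF, Ljung-Box p-values
```

If the diagnostics suggest remaining structure, the strategy returns to step 1 with a revised specification.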
Scientific method
EDA
I. "...three of the main strategies of data analysis are: 1. graphical presentation. 2. provision of flexibility in viewpoint and in facilities, 3. intensive search for parsimony and simplicity ..."
II. "In exploratory data analysis there can be no substitute for flexibility; for adapting what is calculated - and what we hope plotted - both to the needs of the situation and the clues that the data have already provided."
III. "I would like to convince you that the histogram is old-fashioned ..."
IV. "Exploratory data analysis ... does not need probability, significance or confidence."
V. "... I hope that I have shown that exploratory data analysis is actively incisive rather than passively descriptive, with real emphasis on the discovery of the unexpected ..."
VI. "'exploratory data analysis' is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as those we believe to be there."
VII. "Exploratory data analysis isolates patterns and features of the data and reveals these forcefully to the analyst."
VIII. "If we need a short suggestion of what exploratory data analysis is, I would suggest that: 1. it is an attitude, AND 2. a flexibility, AND 3. some graph paper (or transparencies, or both)."