R freeware statistics package Tara Jenson NCAR RAL JNT Tom Hopson.

R freeware statistics package

Tara Jenson

NCAR RAL JNT

Tom Hopson

What is R?

• A statistical programming language
• In part, developed from the S Programming Language from Bell Labs (John Chambers)
• Created to allow rapid development of methods for use in different types of data.
• Create new graphics. Many default parameters are chosen, but users retain complete control.

Why R?

• R has become the dominant language in the statistical research community.
• R is Open Source and free.
• Runs on all operating systems
• Nearly 2,400 packages contributed.
• Packages and applications in nearly every field of science, business and economics.
• See R News, R Journal and Journal of Statistical Software www.jstatsoft.org
• More than 100 books with accompanying code
• Very large, active user base.

Why not R?

• NCL, IDL, Matlab, SAS, … are all viable alternatives to R. If you are a part of an active community of researchers using another language, do likewise.
• If we were biostatisticians we would be using SAS. Book Title: “Analyzing Receiver Operating Characteristic Curves with SAS”
• Consider building verification functions and utilities as part of code development. Verification need not be an external process to forecasting.

The R Community

• Developers
– R Core Group (17 members), only 2 have left since 1997
– Major update in April/October (freeze dates, beta versions, bug tracking, ...)
• Mailing lists
– Help list ~ 150 messages/day, archived, searchable.
• 5 International Conferences, 2 US, 1 China

• Source code
• Binary compilations (Windows, Mac OS, Linux)
• Documentation (Main documents, plus numerous contributed. Some in foreign languages.)
• Newsletter (replaced by R Journal.)
• Mailing list (Several search engines)
• Packages on every topic imaginable
• Wiki with examples
• Reference list of books using R. (more than 100)
• Task Manager

Everything about R is at www.r-project.org

Use R with scripts

• In Linux - Emacs Speaks Statistics
– Provides syntax-based
– Object name completion
– Key stroke short cuts
– Command history
– Alt-x R to invoke R with Xemacs.
• In Windows, use editor
– Added GUI features
– <control>R sends a line or highlighted section into R.
– Install package with GUIs
– Save graphics by point and click.
• Mac OS
– Similar to Windows with advantages of system calls.

Packages in R

• Contributed by people world wide.
• Allow scientists or statisticians to push their ideas.
• Apply and extend R capabilities to meet the needs of specific communities.
• Accompany many statistical textbooks

A sample of useful packages

• verification
• fields (spatial stats)
• RadioSonde
• extRemes
• BMA (Bayesian Model Averaging)
• ensembleBMA
• circular
• RSQLite
• Rgis, spatstat (GIS)
• ncdf (support for netcdf files)
• RColorBrewer
• randomForest

Packages

• Packages must be installed before they can be called (loaded).
• Packages must be called (loaded) before they can be used.
• Base packages are installed by default.
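
The install-then-load sequence might look like the minimal sketch below. The choice of `tools` (a base package) and the file name passed to `file_ext()` are only illustrative assumptions; the `install.packages()` line is commented out because it needs network access.

```r
# Install a contributed package once per machine (needs network access):
# install.packages("verification")

# Load ("call") a package in each session before using its functions.
# 'tools' is a base package, so it is already installed with R.
library(tools)

# A function from the loaded package is now available by name.
ext <- file_ext("intro2R.2013.R")

# Check whether a contributed package is installed, without loading it.
has_verification <- requireNamespace("verification", quietly = TRUE)
```

`requireNamespace()` returns TRUE or FALSE rather than stopping with an error, which makes it convenient for scripts that should run on machines with different sets of installed packages.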

10 most useful functions in R

• aggregate - applies a function to groups of data subset by categories.
• apply - incredibly efficient in avoiding loops. Applies functions across dimensions of arrays.
• layout - creatively divide a print region.
• xyplot (in the lattice package) slightly advanced graphic techniques
• %in% returns logical showing which elements in A are in B. (e.g. A %in% B)
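
As a rough illustration of three of these functions, here is a small sketch on made-up data. The data frame, its column names and the values are hypothetical, chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical forecast errors at two stations (made-up numbers)
df <- data.frame(station = c("A", "A", "B", "B"),
                 error   = c(1, 3, 2, 6))

# aggregate: mean error within each station
station_means <- aggregate(error ~ station, data = df, FUN = mean)

# apply: sums across rows (1) and columns (2) of a matrix, no explicit loop
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

# %in%: logical vector marking which elements of A are also in B
A <- c(1, 2, 3, 4)
B <- c(2, 4, 6)
in_B <- A %in% B
```

Note that `A %in% B` always has the length of `A`, so it can be used directly to subset `A`, as in `A[A %in% B]`.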

More top 10

• table – create contingency table counts.
• boot – apply bootstrap function correctly
• read.fwf – read fixed width format data
• par – control everything in a graph
• system( ) – allows you to call system command from R
• pairs – the most under utilized plot – plots a matrix of 4 columns in a 4x4 plot layout
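
Two of these, `table` and `read.fwf`, can be tried without any plotting device. The sketch below uses made-up forecast/observation labels and a temporary file with an assumed layout (3 characters of id, then 2 digits of value); none of it comes from the course data.

```r
# Hypothetical yes/no rain forecasts and matching observations
fcst <- c("rain", "rain", "dry", "rain")
obs  <- c("rain", "dry",  "rain", "rain")

# table: 2x2 contingency table of forecast against observed
tab <- table(forecast = fcst, observed = obs)

# read.fwf: write a small fixed-width file, then read it back
tmp <- tempfile(fileext = ".txt")
writeLines(c("ABC12", "DEF34"), tmp)
fw <- read.fwf(tmp, widths = c(3, 2), col.names = c("id", "value"))
unlink(tmp)  # remove the temporary file
```

A forecast/observed table like `tab` is the usual starting point for categorical verification scores such as hit rate and false alarm ratio.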

Login, start your windowing system.

$ R

Start R as appropriate for your platform. The R program begins, with a banner. (Within R, the prompt on the left hand side will not be shown to avoid confusion.)

help.start()

Start the HTML interface to on-line help (using a web browser available on your machine). You should briefly explore the features of this facility with the mouse. In particular, work through 1.5, 2.1 – 2.3, and appendix A (just the first one or two sections).

R Exercises

• Choose groups of 3-4 – find a computer
• Log onto machines
• Bring up at least 2 xterms
>cd /home/user/Desktop/longlead
>vi intro2R.2013.R

And work through the commands given …