Alex Douglas
An Introduction to
Scedacity
a.douglas@abdn.ac.uk
(for robust and reproducible research practices)
1
goals
• over the next three days…
- introduce you to using R for data manipulation, graphics and statistics
- provide you with the practical skills necessary to explore and visualise your data
- provide you with the practical skills necessary to start your own data analysis
- BUT cannot teach you everything there is to know about R; that's down to you
2
course structure
• course split into 6 main components
- introduction to R environment
- basic R operations
- dataframes
- graphics and data exploration
- basic statistics using R
- R programming (optional)
• work through manual and associated exercises
3
what is R?
• environment for statistical computing, graphics and programming
• originally created by Ross Ihaka and Robert Gentleman (1996)
• currently maintained by international R-core development team
• very similar to S
4
R-Project and CRAN
• more information found at www.r-project.org
• download R from http://cran.uk.r-project.org
5
why you shouldn't bother with R
relatively steep learning curve
it's command line driven
you need to learn to 'speak' R
sefnc <- function(x) {            # start function for se
  stdx <- sd(x)                   # calculate SD
  nosx <- length(x)               # calculate number obs
  sex <- stdx/(sqrt(nosx))        # calculate SE
  print(sex)                      # print SE
}                                 # end of function
6
why you should learn R
it's free and platform independent
it's the software of choice for many students, academics, industries and charities worldwide
highly flexible and extensive
encourages you to think about your analyses
7
why you should learn R
it allows you to keep an exact and reproducible record of your analyses
excellent graphics facilities
might get you that job/post-doc
R is a community, not just a bit of software
8
9
Why
fraud -> QRPs
journal policy
structural pressures
closed science
10
Using R - GUI
• fairly spartan GUI - very few menus or buttons
• R is not menu driven
[screenshot labels: console, tool bar, menu bar, command-line prompt]
14
using R
• commands are typed (or sourced) into the console window at the > prompt
> 2+2
[1] 4
• R is object orientated. You can create variables and assign value(s) to them
> a <- 2+2
> a
[1] 4
assign variables a value using the 'gets' operator (<-)
display variable value by typing variable name
15
using R
• once created, operations can be performed on variables
> a <- 2+2
> b <- 3*2
> a+b
[1] 10
• this is very powerful and flexible (as you will see)
• much of the functionality of R is enhanced by using variables called functions
16
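As a small sketch of why variables are useful, the lines below reuse stored values in a later calculation. The names `width`, `height` and `area` and their values are assumptions made up for illustration; they are not from the slides.

```r
# hypothetical example values, not taken from the course material
width  <- 4
height <- 2.5

# operations can be performed on variables once they exist
area <- width * height
area                      # 10

# assigning a new value overwrites the old one;
# dependent results are only updated when you re-run the calculation
width <- 6
area  <- width * height
area                      # 15
```

Note that `area` does not change by itself when `width` changes; the line that calculates it has to be run again.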
using R
• functions contain a set of instructions that allow you to perform specific task(s)
• you can use functions that are inbuilt in R (or R packages)
> numbers <- c(2,3,4,5,6)
> numbers
[1] 2 3 4 5 6
> mean(numbers)
[1] 4
> var(numbers)
[1] 2.5
concatenate
calculate the mean of variable numbers
calculate the variance of variable numbers
17
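A few more inbuilt functions applied to the same `numbers` variable, as a minimal sketch. All of the functions used here are part of base R; the choice of which ones to show is ours, not the slide's.

```r
numbers <- c(2, 3, 4, 5, 6)   # same values as on the slide

length(numbers)               # number of observations: 5
sd(numbers)                   # standard deviation (square root of the variance)
range(numbers)                # smallest and largest value: 2 6
summary(numbers)              # minimum, quartiles, median, mean and maximum
```

Each function takes the variable as its argument inside round brackets, in the same way as `mean(numbers)` and `var(numbers)`.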
using R
• or write your own functions
• function to calculate standard error
> sefnc <- function(x) {            # start function for se
    stdx <- sd(x)                   # calculate SD
    nosx <- length(x)               # calculate number obs
    sex <- stdx/(sqrt(nosx))        # calculate SE
    print(sex)                      # print SE
  }                                 # end of function
• using your new function
> sefnc(numbers)
[1] 0.7071068
notice the use of comments
18
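One possible variation on the standard error function, sketched under our own assumptions: it hands the result back without printing it inside the function, and can optionally drop missing values. The name `se_value` and the `na.rm` argument are hypothetical and not part of the course material.

```r
# sketch of an alternative standard error function (hypothetical name)
se_value <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]          # remove missing observations first
  }
  sd(x) / sqrt(length(x))      # the last evaluated expression is returned
}

numbers <- c(2, 3, 4, 5, 6)
se <- se_value(numbers)        # store the result in a variable
round(se, 7)                   # 0.7071068, the same value as on the slide

# with a missing value added, na.rm = TRUE gives the same answer
se_value(c(numbers, NA), na.rm = TRUE)
```

Storing the result in a variable makes it easy to reuse in later calculations.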
using R - syntax
• R is case sensitive: 'A' is not the same as 'a'
• commands are generally separated by a new line; occasionally you might see a semicolon
• anything that follows the hash symbol (#) will be ignored by R. Use this to comment your code
• a series of commands can be grouped using braces { }
• don't worry too much about spaces
• previous commands can be recalled using ↑ and ↓ keys
19
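The syntax rules above can be tried out with a few lines like these. This is only a sketch; all variable names and values are invented for illustration.

```r
A <- 1
a <- 2            # 'A' and 'a' are two different variables
A == a            # FALSE

x <- 3; y <- 4    # two commands on one line, separated by a semicolon

# braces group a series of commands;
# the value of the last command is the value of the whole group
z <- {
  total <- x + y
  total * 2
}
z                 # 14

w<-x+y            # spaces are mostly optional...
w <- x + y        # ...but they make code easier to read
```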
using R
• some functions open other devices - i.e. many graphic functions open a plotting device
- plots can be copied and pasted into Word
- or saved as a pdf, jpg etc
- you can open more than one device
20
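As a minimal sketch of saving a plot to a file, the code below opens a pdf device, draws into it and closes it again. The file name `numbers_plot.pdf` and the use of a temporary directory are assumptions for illustration.

```r
numbers  <- c(2, 3, 4, 5, 6)
out_file <- file.path(tempdir(), "numbers_plot.pdf")  # hypothetical file name

pdf(out_file)       # open a pdf graphics device
plot(numbers)       # the plot is drawn into the pdf, not on screen
dev.off()           # close the device so the file is written to disk

file.exists(out_file)
```

Closing the device with `dev.off()` is the step that is easiest to forget; the file may be incomplete until the device is closed.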
getting help in R
• R has extensive help facilities
• from within R the main method of getting help is to use the help() function
> help(plot)
or ?plot
to search for help
> help.search("plot")
or ??plot
21
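Alongside `help()` and `help.search()`, a few other functions that ship with R can be handy when you only half remember a function name or its arguments. This selection is our suggestion rather than part of the slides.

```r
found <- apropos("mean")   # names of objects whose name contains "mean"
found

args(sd)                   # show the arguments a function accepts

example(mean)              # run the examples from the help page for mean()
```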
getting help in R
• wealth of information on web
- R-project web site
- R-help mailing list with searchable archives
- free pdfs from R-project website
- UoA Rusergroup mailing list
httpsmlisthostabdnacukmailmanlistinforusergroup
- www.rseek.org
22
using an IDE: RStudio
• Using an Integrated Development Environment (IDE)
• RStudio (http://www.rstudio.com)
[screenshot labels: editor, console, workspace, graphics]
23
keep scripting
• allows you to keep a permanent and reproducible record of your analyses
• always use a script editor or IDE to write your R code rather than typing directly into console
• the more you use R the more complicated your code will become
sefnc <- function(x) {
  stdx <- sd(x)
  nosx <- length(x)
  sex <- stdx/(sqrt(nosx))
  print(sex)
}
24
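A sketch of what working from a script can look like. Normally you would type the script in an editor or IDE and save it; here it is written with `writeLines()` only so that the example is self-contained. The file name `se_analysis.R` is hypothetical.

```r
# create a small script file (normally saved from a script editor)
script_file <- file.path(tempdir(), "se_analysis.R")  # hypothetical file name
writeLines(c(
  "# se_analysis.R: standard error of some example numbers",
  "numbers <- c(2, 3, 4, 5, 6)",
  "se <- sd(numbers) / sqrt(length(numbers))"
), script_file)

source(script_file)   # run every command in the script, in order
se                    # variables created by the script are now available
```

Because the commands live in a file, the same analysis can be re-run later and should give the same result.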
course survival guide (and beyond)
• keep a careful record of your code, analysis & plots
- RStudio, MS Word, R markdown etc
• annotate your R code with plenty of comments (#)
• remember R has an extensive help facility
• ask plenty of questions
• start using R to explore and analyse your own data as soon as possible
25
where to start
• work through sections of manual
• complete exercises for each section
• datafiles and exercises can be found at
https://alexd106.github.io/intro2R
26