1
Basic R Programming for Life Science Undergraduate Students
Introductory Workshop (Session 1)
2
Scope of Introductory Workshop on R
• How to install the R platform on your machine
• How to install R packages and dependencies
• How to get help and instructions
• How to use a library
• Variables and assigning values to variables
• Data types which R accepts
• Arithmetic manipulations of variables (+ - * / %% ^ etc.)
• Browsing and managing your variables (ls, rm)
• Assigning vectors - the c() command
• Vector manipulations and referencing
• Matrices – declaration and manipulation (rows/columns) – rbind
• Data frames – import from xls/csv/txt files and statistical manipulation
• Introducing data categorisation using R datatype - Factor
• Simple graph plotting
• More statistical analysis
• Simple example of linear regression
• Quick Revision
• Future classes on R
3
What is R?
• R = software and programming language
• R is mainly used for statistical analysis and for graphics generation
• Free
• Simple and intuitive???
• Available across different platforms (Mac, Unix/Linux, Windows)
4
Starting with R
• Installation (administrator rights required): http://www.r-project.org/
Tip: install the latest version (or the last stable version)
5
Starting with R
• Installation
http://cran.bic.nus.edu.sg
6
Starting with R
• Installation
7
Your very first interface
Default prompt in R: >
8
Starting with R Packages
• Packages add functions that are not included in the "base" package
• Installation (additional packages): install.packages("package name")
• To use a package, type library(package name)
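A minimal sketch of the commands on this slide, assuming a standard R installation. The package name ggplot2 is only an example; the install lines are left as comments because they need an internet connection and only have to be run once per machine.

```r
# install.packages() downloads a package from CRAN and installs it (run once per machine).
# "ggplot2" is only an example package name.
# install.packages("ggplot2")
# dependencies = TRUE also installs the packages it suggests, not only the ones it requires
# install.packages("ggplot2", dependencies = TRUE)

# library() loads an installed package into the current session
library(stats)   # 'stats' ships with R, so no installation is needed

# Load a package only if it is installed, otherwise say how to get it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
} else {
  message("ggplot2 is not installed; run install.packages(\"ggplot2\") first")
}
```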
9
Starting with R
• Confused about an R command? Get help:
  In the R GUI/console: ?function or ??keyword
  On the web: http://cran.r-project.org or http://www.rseek.org/
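The help commands above can be tried as follows. The lines that open help pages are shown as comments to type at the console; args() and apropos() are two further base R helpers (not on the original slide) that print their answer directly, which is handy when you only half-remember a function.

```r
# Type these at the console (">" prompt); each one opens or searches the help pages:
#   ?mean                     help page of a function whose name you know
#   help("mean")              the same page, written as a function call
#   ??"standard deviation"    search every installed help page for a phrase
#   example(mean)             run the examples listed at the end of the help page

# args() prints a function's arguments without opening its help page
args(sd)            # function (x, na.rm = FALSE)

# apropos() lists objects whose names match a pattern
apropos("^mean")    # includes "mean"
```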
10
Fundamentals of Programming
• Simple data input and manipulation
• Declaration of an object (variable)
Take note that object names:
• are case sensitive (e.g. x is different from X)
• cannot contain spaces, cannot start with a number, and should avoid symbols other than . and _
• should be comprehensible
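A short sketch of assignment and the naming rules above. The object names are made up for illustration; the commented-out lines are deliberately invalid and would stop with an error if run.

```r
height <- 171          # <- assigns the value on the right to the name on the left
Height <- 155          # a different object: names are case sensitive
student_1 <- "John"    # numbers, dots and underscores are fine after the first character
print(height)          # [1] 171
print(Height)          # [1] 155
# 1student <- "John"   # error: a name cannot start with a number
# my var <- 3          # error: a name cannot contain a space
```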
11
Data types
Rich set of data types in R
Commonly encountered data types in R:
• Scalars
• Vectors (numerical, character and logical)
• Matrices (2D)
• Arrays (can have more than 2 dimensions)
• Data frames
• Lists
• Factors
See for example http://www.statmethods.net/input/datatypes.html for more details
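One example of each data type listed above, as a sketch with arbitrary values. class() reports how R stores each object; note that a "scalar" in R is simply a vector of length 1.

```r
s   <- 3.5                                     # a "scalar" is a numeric vector of length 1
num <- c(1, 2, 3)                              # numeric vector
chr <- c("A", "B")                             # character vector
lgl <- c(TRUE, FALSE)                          # logical vector
m   <- matrix(1:6, nrow = 2)                   # matrix: 2 rows x 3 columns
a   <- array(1:24, dim = c(2, 3, 4))           # array with 3 dimensions
df  <- data.frame(Name = c("John", "Mary"), Height = c(171, 155))   # data frame
lst <- list(id = 1, name = "John")             # list: elements may have different types
f   <- factor(c("M", "F", "M"))                # factor: categorical data

sapply(list(s, num, chr, lgl, df, lst, f), class)
# "numeric" "numeric" "character" "logical" "data.frame" "list" "factor"
dim(m)   # 2 3
dim(a)   # 2 3 4
```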
12
• Perform simple manipulations, e.g. arithmetic calculations
• For more built-in R arithmetic functions, visit http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html
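A few of R's arithmetic operators applied to two example values (7 and 2, chosen arbitrarily). Note that the remainder operator is %% and the exponent operator is ^.

```r
x <- 7
y <- 2
x + y       # 9    addition
x - y       # 5    subtraction
x * y       # 14   multiplication
x / y       # 3.5  division
x %% y      # 1    remainder (modulo)
x %/% y     # 3    integer division
x ^ y       # 49   exponent (x ** y gives the same result)
sqrt(49)    # 7    square root
log10(100)  # 2    base-10 logarithm
```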
13
• Remove variables when they are no longer required
• Use ls() to check whether a declared object is still kept in memory
• To remove an object from memory, use rm(x)
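A sketch of checking and tidying the workspace with ls() and rm(); x and y are throwaway example objects.

```r
x <- 5
y <- c(1, 2, 3)
ls()               # names of the objects in your workspace, including "x" and "y"
rm(x)              # remove one object
exists("x")        # FALSE: x has been removed
rm(y)
# rm(list = ls())  # removes every object in the workspace; use with care
```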
14
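• More complex data inputs
• Data vectors: an ordered set of values of the same type, stored under a single object name
[Figure: a single object X holding one value, compared with a vector X holding the values 1 2 3 4 5]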
15
• Assigning a data vector
x <- c(1,2,3,4,5)
[Figure: the values 1 2 3 4 5 stored together in the vector x]
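Once x is assigned, its elements can be referenced by position or by a condition. A minimal sketch using the same vector:

```r
x <- c(1, 2, 3, 4, 5)
x[1]        # first element: 1
x[2:4]      # elements 2 to 4: 2 3 4
x[-1]       # everything except the first element: 2 3 4 5
x[x > 3]    # elements greater than 3: 4 5
length(x)   # 5
x * 2       # arithmetic is applied to every element: 2 4 6 8 10
```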
16
• Define a vector var1 with values 1,2,3
• Define a vector var2 with values 4,5,6
• What is the value of var2[4]?
• What is the sum of var1?
• What is the R code to assign the first element of var1 to an object called subsetvar1?
• What is the product of var1 and var2?
Experiment for yourself: http://www.statmethods.net/input/datatypes.html
17
• Define a vector var1 with values 1,2,3: var1 <- c(1,2,3)
• Define a vector var2 with values 4,5,6: var2 <- c(4,5,6)
• What is the value of var2[4]? NA (var2 has only three elements)
• What is the sum of var1? sum(var1) gives 6
• What is the R code to assign the first element of var1 to an object called subsetvar1? subsetvar1 <- var1[1]
• What is the product of var1 and var2? var1 * var2 gives 4 10 18 (the vectors are multiplied element by element)
Experiment for yourself
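The answers above can be checked directly in R. This sketch also shows sum(var1 * var2), in case the dot product rather than the element-wise product was intended; that reading is an addition, not part of the original exercise.

```r
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
var2[4]                 # NA: var2 has only three elements
sum(var1)               # 6
subsetvar1 <- var1[1]   # subsetvar1 now holds 1
var1 * var2             # element-wise product: 4 10 18
sum(var1 * var2)        # 32, the dot product
```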
18
• More complex data structures: matrices
  1 3 8
  6 9 5
  4 1 7
  6 5 1
19
• Declaring a matrix
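The declaration itself appeared as an image on the slide and did not survive in this transcript. One way to build the 4 x 3 matrix y from the previous slide is with matrix() and byrow = TRUE; this is a sketch, and the original slide may have used a different call.

```r
# Fill the matrix row by row so the numbers read the same way as on the slide
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1),
            nrow = 4, ncol = 3, byrow = TRUE)
y
dim(y)      # 4 3
mean(y)     # 4.666667
sum(y[2, ]) # 20
```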
20
• Simple manipulations of a data matrix y (4 rows, 3 columns)
  1 3 8
  6 9 5
  4 1 7
  6 5 1
• y[1,] returns the first row: 1 3 8
• y[,3] returns the third column: 8 5 7 1
• Simple arithmetic manipulations: mean(y) returns 4.666667; sum(y[2,]) returns 20
• Modify and add values
  y[4,] <- c(6,2,2) replaces the fourth row:
  1 3 8
  6 9 5
  4 1 7
  6 2 2
  y <- rbind(y, c(3,9,8)) adds a fifth row:
  1 3 8
  6 9 5
  4 1 7
  6 2 2
  3 9 8
Tip: Think of rbind as "row combine"
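The modifications above, plus cbind() (the column counterpart of rbind), can be sketched as follows. The column names A, B, C and Total are hypothetical and only added to make the output readable.

```r
y <- matrix(c(1, 3, 8, 6, 9, 5, 4, 1, 7, 6, 5, 1), nrow = 4, byrow = TRUE)
y[4, ] <- c(6, 2, 2)          # overwrite row 4
y <- rbind(y, c(3, 9, 8))     # append a new row at the bottom
y <- cbind(y, rowSums(y))     # cbind ("column combine") appends a column: each row's total
colnames(y) <- c("A", "B", "C", "Total")   # hypothetical names, for readability only
y
nrow(y)                       # 5
colMeans(y)                   # mean of each column
```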
21
• More complex data structures: data frames
     Name   Height
  1  John   171cm
  2  Mary   155cm
  3  Peter  165cm
22
• Data frames
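The code that created this data frame was shown as an image. A sketch of one way to create it with data.frame(), assuming the name hfile used on the following slides and storing heights as plain numbers in cm (rather than text such as "171cm") so they can be used in calculations:

```r
hfile <- data.frame(Name   = c("John", "Mary", "Peter"),
                    Height = c(171, 155, 165))   # heights in cm
hfile
str(hfile)            # structure: 3 obs. of 2 variables
hfile$Height          # one column as a vector: 171 155 165
mean(hfile$Height)    # 163.6667
```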
23
• Reading in from input files
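The slide's own code was an image. As a sketch, read.table() reads a delimited text file; here a small tab-separated file is first written to a temporary location so the example runs anywhere. The file name heights.csv in the comment is hypothetical. Excel files are not read by base R; add-on packages such as readxl (read_excel()) are one option.

```r
# Write a small tab-separated file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("Name\tHeight", "John\t171", "Mary\t155", "Peter\t165"), path)

hfile <- read.table(path, sep = "\t", header = TRUE)   # tab-separated text file with a header row
# hfile <- read.csv("heights.csv")                     # comma-separated file (hypothetical name)
hfile
```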
24
• Simple manipulations with data frames
  (1) head(hfile,1) shows the first row of hfile
  (2) summary(hfile) summarises each column
     Name   Height
  1  John   171cm
  2  Mary   155cm
  3  Peter  165cm
• Create subsets: new <- hfile[1,]
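A sketch of the manipulations above, plus row selection by a condition and the base function subset(). The data frame is rebuilt here (heights as numbers in cm) so the block runs on its own.

```r
hfile <- data.frame(Name = c("John", "Mary", "Peter"), Height = c(171, 155, 165))
head(hfile, 1)                         # first row only
summary(hfile)                         # per-column summary (min, quartiles, mean, max for numbers)
new <- hfile[1, ]                      # subset: row 1, all columns
tall <- hfile[hfile$Height > 160, ]    # rows where Height is above 160 cm
tall$Name                              # "John" "Peter"
subset(hfile, Height < 160, select = Name)   # the same idea with subset(): Mary
```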
25
• Simple statistics with R
• Load the file "Sampledata-1.txt" into R (change the path to wherever your copy is saved):
  studentprofile <- read.table("B://Users/bchhuyng/Desktop/Sampledata-1.txt", sep="\t", header=TRUE)
• View the data loaded into R: studentprofile or head(studentprofile)
• How many categories are there in the field "Gender"? factor(studentprofile$Gender) lists them as its levels
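Because Sampledata-1.txt is not included here, this sketch uses a short made-up Gender vector to show how factor() exposes the categories. In the workshop, replace gender with studentprofile$Gender.

```r
gender <- c("M", "F", "F", "M", "F", "M", "M")   # made-up values for illustration
g <- factor(gender)
levels(g)    # "F" "M": the categories, sorted alphabetically
nlevels(g)   # 2 categories
table(g)     # students per category: F 3, M 4
```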
26
• The factor() function stores values such as "M" and "F" as categorical variables (factor levels)
[Figure: the students' Gender entries, a mixture of M and F, sorted into the two categories M and F]
27
• Usage of factor in plotting graphs
[Figure from Hu et al., 2013]
28
• Usage of factor in plotting graphs
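The plots on these slides were images. As a sketch with made-up heights and genders (not the workshop data), a factor can be plotted on its own as a bar chart of counts, split a boxplot into groups, or colour points by category:

```r
# Made-up values; the workshop uses studentprofile$Height and studentprofile$Gender
height <- c(172, 158, 161, 180, 155, 169, 175)            # cm
gender <- factor(c("M", "F", "F", "M", "F", "M", "M"))

plot(gender)                                     # a factor on its own: bar chart of counts
boxplot(height ~ gender, xlab = "Gender", ylab = "Height (cm)",
        main = "Height by gender")               # one box per factor level
plot(height, col = as.integer(gender), pch = 19) # level codes (F = 1, M = 2) pick the colours
tapply(height, gender, mean)                     # mean height per category: F 158, M 174
```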
29
• Calculate the mean and the standard deviation of the height and weight of the students, e.g.
  mean(studentprofile$Weight)
  sd(studentprofile$Weight)
  median(studentprofile$Weight)
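A sketch with made-up weights (the workshop data are not reproduced here) showing the summary functions side by side:

```r
weight <- c(52, 61, 58, 70, 66, 49, 75)   # made-up weights in kg
mean(weight)      # arithmetic mean: 431 / 7, about 61.57
sd(weight)        # sample standard deviation (divides by n - 1)
median(weight)    # middle value of the sorted data: 61
# For columns with missing values (NA), add na.rm = TRUE, e.g. mean(weight, na.rm = TRUE)
```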
30
• Simple graph plotting with R
• View the distribution of height and weight of the 100 students (data from "Sampledata-1.txt"):
  plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)
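Because the workshop data file is not included here, this sketch generates 100 made-up weights and heights with rnorm() and draws the same kind of scatter plot. pch selects the plotting symbol and cex scales its size.

```r
set.seed(1)                                          # makes the random numbers reproducible
weight <- round(rnorm(100, mean = 60, sd = 8), 1)    # made-up weights (kg)
height <- round(150 + 0.25 * weight + rnorm(100, sd = 5), 1)   # made-up heights (cm)

plot(weight, height,
     main = "Distribution of Height and Weight of students",
     xlab = "Weight (kg)", ylab = "Height (cm)",
     pch  = 19,    # plotting symbol 19 = solid circle
     cex  = 0.5)   # draw symbols at half the default size
```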
31
32
• What is the distribution of height and weight amongst students?
  hist(studentprofile$Weight, xlab="Weight (kg)", main="Frequency distribution of student weight", ylim=c(0,8), xlim=c(40,90), breaks=51)
33
• What is the distribution of height and weight amongst students?
  hist(studentprofile$Height, xlab="Height (cm)", main="Frequency distribution of student height", ylim=c(0,8), xlim=c(140,190), breaks=51)
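A sketch of the histogram with made-up heights. A single number for breaks (such as breaks=51 above) is treated by hist() only as a suggestion; passing a vector of bin edges fixes the bins exactly. hist() also returns the bin counts, which can be inspected.

```r
set.seed(2)
height <- round(rnorm(100, mean = 165, sd = 8), 1)   # made-up heights in cm

# A vector of bin edges fixes the bins exactly: here 1 cm wide, spanning all the data
edges <- seq(floor(min(height)), ceiling(max(height)) + 1, by = 1)
h <- hist(height, breaks = edges,
          xlab = "Height (cm)", main = "Frequency distribution of student height")
h$counts         # number of students in each 1 cm bin
sum(h$counts)    # 100: every student falls in exactly one bin
```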
34
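• Are the heights and weights of the sampled students normally distributed?
  ks.test(studentprofile$Height, "pnorm", mean(studentprofile$Height), sd(studentprofile$Height))
  ks.test(studentprofile$Weight, "pnorm", mean(studentprofile$Weight), sd(studentprofile$Weight))
  (ks.test(x, pnorm) on its own compares the data with a standard normal distribution, mean 0 and sd 1, which heights in cm or weights in kg can never match. Because the mean and sd are estimated from the same data, the reported p-value is not exact and tends to be too large; shapiro.test(x) is a common alternative.)
H0: The data follow the specified distribution
H1: The data do not follow the specified distribution
p-value ≤ 0.05: reject H0
p-value > 0.05: do not reject H0
CAVEAT!!! http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/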
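A sketch with made-up, genuinely normal heights that shows why the distribution's mean and sd must be supplied to ks.test(), and runs shapiro.test() for comparison. The p-values of the last two tests depend on the random sample, so no particular value is claimed for them.

```r
set.seed(3)
height <- rnorm(100, mean = 165, sd = 8)   # made-up heights (cm) drawn from a normal distribution

# Without extra arguments the data are compared with a standard normal (mean 0, sd 1),
# so the test rejects normality for any heights measured in cm
ks.test(height, "pnorm")$p.value

# Pass the sample mean and sd on to pnorm so the reference normal matches the data's scale
ks.test(height, "pnorm", mean = mean(height), sd = sd(height))$p.value

# Shapiro-Wilk test: a widely used normality test that needs no distribution parameters
shapiro.test(height)$p.value
```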
35
• Are the height and weight of students linearly correlated?
  reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)
  (equivalently: reg1 <- lm(Height ~ Weight, data = studentprofile))
36
• Are the height and weight of students linearly correlated?
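The regression output on this slide was an image. As a sketch, with a made-up data frame students standing in for studentprofile (the names students, Weight and Height are assumptions that mirror the slide), the fitted model can be inspected with summary() and coef(), and the correlation tested with cor.test():

```r
set.seed(4)
weight <- rnorm(100, mean = 60, sd = 8)                   # made-up weights (kg)
height <- 140 + 0.4 * weight + rnorm(100, sd = 4)         # made-up heights (cm): linear in weight plus noise
students <- data.frame(Weight = weight, Height = height)  # hypothetical stand-in for studentprofile

reg1 <- lm(Height ~ Weight, data = students)   # fit Height = intercept + slope * Weight
summary(reg1)                                  # coefficients with p-values, and R-squared
coef(reg1)                                     # just the intercept and the slope
cor(students$Weight, students$Height)          # Pearson correlation coefficient r
cor.test(students$Weight, students$Height)     # tests H0: the true correlation is 0
```

For a simple regression with a single predictor, the R-squared reported by summary(reg1) equals the square of the Pearson correlation returned by cor().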
37
plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)
reg1 <- lm(studentprofile$Height~ studentprofile$Weight)
abline(reg1,col=2)
38
Intro checklist: what have you learnt today?
• How to install the R platform on your machine
• How to install R packages and dependencies
• How to get help and instructions
• How to use a library
• Variables and assigning values to variables
• Data types which R accepts
• Arithmetic manipulations of variables (+ - * / %% ^ etc.)
• Browsing and managing your variables (ls, rm)
• Assigning vectors - the c() command
• Vector manipulations and referencing
• Matrices – declaration and manipulation (rows/columns) – rbind
• Data frames – import from xls/csv/txt files and statistical manipulation
• Introducing data categorisation using R datatype - Factor
• Simple graph plotting
• More statistical analysis
• Simple example of linear regression
39
References
• Crawley, M.J. (2007) The R Book.
• Maindonald, J., and Braun, W.J. (2010) Data Analysis and Graphics Using R: An Example-Based Approach.
• Kabacoff, R.I. (2012) Quick-R: Data types. http://www.statmethods.net/input/datatypes.html Accessed on 7/1/2014
• King, W.B. (2010) Doing Arithmetic in R. http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html Accessed on 7/1/2014
• Ian (2011) Normality tests don’t do what you think they do. http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/ Accessed on 7/1/2014
• Meys, J., and de Vries, A. How to Test Data Normality in a Formal Way in R. http://www.dummies.com/how-to/content/how-to-test-data-normality-in-a-formal-way-in-r.html Accessed on 7/1/2014
40
Future classes on R and packages
• R has a very rich repertoire of packages, e.g. for:
  • Statistical analysis
  • Microarray analysis
  • NGS (next-generation sequencing)
  • etc.