Basic R Programming for Life Science Undergraduate Students: Introductory Workshop (Session 1)



Scope of Introductory Workshop on R

• How to install the R platform on your machine

• How to install R packages and dependencies

• How to get help and instructions

• How to use a library

• Variables and assigning values to variables

• Data types which R accepts

• Arithmetic manipulations of variables (+ - * / ^ %% etc.)

• Browsing and managing your variables (ls, rm)

• Assigning vectors - the c() command

• Vector manipulations and referencing

• Matrices – declaration and manipulation (rows/columns) – rbind

• Data frames – import from xls/csv/txt files and statistical manipulation

• Introducing data categorisation using R datatype - Factor

• Simple graph plotting

• More statistical analysis

• Simple example of linear regression

• Quick Revision

• Future classes on R


What is R?

• R is both a software environment and a programming language

• R is mainly used for statistical analysis and for generating graphics

• Free

• Simple and intuitive ???

• Available across different platforms (Mac, Unix/Linux, Windows)


Starting with R

• Installation (administrator rights required)

http://www.r-project.org/

Tip: install the latest version (or the last stable version)


Starting with R

• Installation: download R from a CRAN mirror, e.g. http://cran.bic.nus.edu.sg
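Once R is installed, a quick way to confirm that it works is to ask it for its version at the prompt. This is a minimal sketch; the version string you see depends on the release you installed, and the "4.0.0" threshold below is only an example.

```r
# Print the version string of the running R installation
R.version.string
# getRversion() returns a version object that can be compared with a string
# ("4.0.0" is just an example threshold)
getRversion() >= "4.0.0"
# Details of the R version, operating system and attached packages
sessionInfo()
```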


Your very first interface

The default prompt in R is the > symbol


Starting with R packages

• Packages provide additional functions that are not included in the "base" package

• Installation (additional packages): install.packages("packagename")

• To use a package, type library(packagename)
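The two commands above are often combined so that a package is only downloaded when it is missing. The sketch below wraps them in a helper called load_or_install, a hypothetical name that is not part of R. It relies on the standard functions requireNamespace(), install.packages() and library(). The example call uses "stats", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (not part of base R): install a package from CRAN only
# if it is missing, then load it with library().
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # dependencies = TRUE also installs the packages that pkg depends on,
    # imports or suggests
    install.packages(pkg, dependencies = TRUE)
  }
  # character.only = TRUE tells library() to use the name stored in pkg
  library(pkg, character.only = TRUE)
  # Return TRUE (invisibly) if the package is now on the search path
  invisible(paste0("package:", pkg) %in% search())
}

# 'stats' ships with R, so nothing is downloaded here
load_or_install("stats")
```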


Starting with R

• Confused about an R command? Get help:

On the GUI: ?function (e.g. ?mean) or ??keyword (e.g. ??regression)

Via the WWW: http://cran.r-project.org or http://www.rseek.org/
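A few more built-in help tools are sketched below. The help() and help.search() calls are left as comments because they open help pages meant for interactive use. The function names queried ("mean", "sd", "regression") are only examples.

```r
# Help page for a function you know by name (same as typing ?mean):
# help("mean")
# Search all installed help pages for a keyword (same as ??regression):
# help.search("regression")

# List objects whose names contain a pattern (case-insensitive by default)
apropos("mean")
# Show a function's arguments without opening its help page
args(sd)
```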


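Fundamentals of Programming

• Simple data input and manipulation

• Declaration of an object (variable)

Take note that object names:

• are case sensitive (e.g. x is different from X)

• must not contain spaces or symbols other than "." and "_", and must not start with a number (numbers elsewhere in the name are fine, e.g. var1)

• should be comprehensible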
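The naming rules above can be checked directly at the prompt. In this sketch the object names (x, X, body_weight, var1) are arbitrary examples. make.names() is a base R function that shows how R would repair a name that breaks the rules.

```r
x <- 5              # '<-' assigns the value 5 to the object x
X <- 10             # X is a different object: names are case sensitive
body_weight <- 62.5 # letters, digits, '.' and '_' are allowed
var1 <- 3           # digits are fine, but not as the first character
x
X
# make.names() shows how R would repair invalid names
make.names(c("my var", "1st"))   # "my.var" "X1st"
```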


Data types

R has a rich set of data types. Commonly encountered data types in R:

• Scalars (in R these are simply vectors of length 1)

• Vectors (numerical, character and logical)

• Matrices (2D)

• Arrays (can have more than 2 dimensions)

• Data frames

• Lists

• Factors

See, for example, http://www.statmethods.net/input/datatypes.html for more details
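To make the list above concrete, here is one small example of each data type. The values and object names are illustrative only. The last lines show how to check what kind of object you have.

```r
s   <- 3.5                            # "scalar": a numeric vector of length 1
num <- c(1.2, 3.4)                    # numeric vector
chr <- c("A", "T", "G", "C")          # character vector
lgl <- c(TRUE, FALSE, TRUE)           # logical vector
m   <- matrix(1:6, nrow = 2)          # 2 x 3 matrix
arr <- array(1:24, dim = c(2, 3, 4))  # 3-dimensional array
df  <- data.frame(Name = c("John", "Mary"), Height = c(171, 155))
lst <- list(id = 1, genes = c("BRCA1", "TP53"))  # elements may differ in type
fac <- factor(c("M", "F", "M"))       # categorical variable with levels F, M

length(s)    # 1
class(chr)   # "character"
dim(arr)     # 2 3 4
str(lst)     # compact view of any object's structure
```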


• Perform simple manipulations, e.g. arithmetic calculations

• For more built-in R arithmetic functions, visit http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html
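The basic arithmetic operators work directly on numbers stored in objects. In this sketch, a and b are arbitrary example values. Note that R writes the remainder (modulo) operator as %% and integer division as %/%.

```r
a <- 7
b <- 2
a + b      # 9
a - b      # 5
a * b      # 14
a / b      # 3.5
a ^ b      # 49  (a ** b gives the same result)
a %% b     # 1   remainder after division (modulo)
a %/% b    # 3   integer division
sqrt(16)   # 4
abs(-3)    # 3
```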


• Remove variables when they are no longer required

• Use ls() to check whether a declared object is still kept in memory

• To remove an object from memory, use rm(x)
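A short session showing ls() and rm() together. The objects x and y are throwaway examples. exists() is used here only to confirm that x has gone.

```r
x <- c(1, 2, 3)
y <- "hello"
ls()            # names of the objects in the workspace, including "x" and "y"
rm(x)           # remove x from memory
exists("x")     # FALSE: x no longer exists
ls()            # x is gone, y is still there
# rm(list = ls()) would remove every object in the workspace, so use it with care
```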


• More complex data inputs: data vectors, i.e. several values stored together in one object

[Figure: the values 1 2 3 4 5 held in a single object X, a vector]


• Assigning a data vector

x <- c(1,2,3,4,5)
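Besides c(), R has shortcuts for building regular sequences. Arithmetic on a vector is applied to every element. This sketch uses only base R; the object names are examples.

```r
x <- c(1, 2, 3, 4, 5)    # c() combines values into one vector
y <- 1:5                 # ':' gives the same values as an integer sequence
z <- seq(2, 10, by = 2)  # 2 4 6 8 10
length(x)                # 5
x * 2                    # arithmetic applies to every element: 2 4 6 8 10
c(x, 6, 7)               # vectors can be joined: 1 2 3 4 5 6 7
```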


• Define a vector var1 with values 1, 2, 3

• Define a vector var2 with values 4, 5, 6

• What value is var2[4]?

• What is the sum of var1?

• What is the R code to assign to object subsetvar1 the first element of var1?

• What is the (element-wise) product of var1 and var2?

Experiment for yourself: http://www.statmethods.net/input/datatypes.html


• Define a vector var1 with values 1, 2, 3: var1 <- c(1,2,3)

• Define a vector var2 with values 4, 5, 6: var2 <- c(4,5,6)

• What value is var2[4]? NA (var2 has only three elements, so there is no fourth value)

• What is the sum of var1? 6, from sum(var1)

• What is the R code to assign to object subsetvar1 the first element of var1? subsetvar1 <- var1[1]

• What is the (element-wise) product of var1 and var2? 4 10 18, from var1 * var2

Experiment for yourself
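The exercise answers can be run as one script. The last three lines show other common ways of referencing vector elements: by a range, by exclusion and by a condition.

```r
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
var2[4]                # NA: var2 has only three elements
sum(var1)              # 6
subsetvar1 <- var1[1]  # 1
var1 * var2            # 4 10 18 (element by element)

# Other ways of referencing elements
var2[2:3]              # 5 6   (a range of positions)
var2[-1]               # 5 6   (everything except position 1)
var2[var2 > 4]         # 5 6   (elements meeting a condition)
```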


• More complex data structures: matrices

1 3 8
6 9 5
4 1 7
6 5 1


• Declaring a matrix
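The slide's own code was not preserved, so this is one possible way to declare the 4 x 3 matrix shown above. It assumes the name y, which the next slide uses. matrix() fills values column by column unless byrow = TRUE. rbind() builds the same matrix row by row.

```r
# byrow = TRUE fills the values row by row, matching the matrix shown above
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1), nrow = 4, ncol = 3, byrow = TRUE)
y
dim(y)        # 4 3 (rows, columns)

# Without byrow, matrix() fills column by column
m <- matrix(1:6, nrow = 2)   # columns are 1 2, then 3 4, then 5 6

# The same y can be built by binding rows together
y2 <- rbind(c(1, 3, 8), c(6, 9, 5), c(4, 1, 7), c(6, 5, 1))
identical(y, y2)
```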


• Simple manipulations of a data matrix

Matrix y (4 rows, 3 columns):

1 3 8
6 9 5
4 1 7
6 5 1

• Referencing rows and columns
> y[1,] gives 1 3 8 (row 1)
> y[,3] gives 8 5 7 1 (column 3)

• Simple arithmetic manipulations
mean(y) gives 4.666667
sum(y[2,]) gives 20

• Modify and add values
y[4,] <- c(6,2,2) replaces row 4:

1 3 8
6 9 5
4 1 7
6 2 2

y <- rbind(y, c(3,9,8)) appends a fifth row:

1 3 8
6 9 5
4 1 7
6 2 2
3 9 8

Tip: Think of rbind as "row combine"
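The manipulations above can be replayed as one script. The matrix values are those shown on the slide. rowSums() and cbind() are extra base R functions added here for comparison: cbind() is the column counterpart of rbind().

```r
y <- matrix(c(1, 3, 8, 6, 9, 5, 4, 1, 7, 6, 5, 1), nrow = 4, byrow = TRUE)
y[1, ]                     # row 1: 1 3 8
y[, 3]                     # column 3: 8 5 7 1
y[2, 3]                    # single element (row 2, column 3): 5
m_all <- mean(y)           # 4.666667
s_row2 <- sum(y[2, ])      # 20
rowSums(y)                 # 12 20 12 12
y[4, ] <- c(6, 2, 2)       # overwrite row 4
y <- rbind(y, c(3, 9, 8))  # append a fifth row
y <- cbind(y, 0)           # cbind() ("column bind") appends a column of zeros
dim(y)                     # 5 4
```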


• More complex data structures: data frames

  Name  Height
1 John  171cm
2 Mary  155cm
3 Peter 165cm
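A sketch of creating this table as a data frame in R. The object name hfile is borrowed from the later data-frame slides. Storing Height as a plain number in cm, rather than the text "171cm", is a design choice so that R can do arithmetic on the column.

```r
hfile <- data.frame(
  Name   = c("John", "Mary", "Peter"),
  Height = c(171, 155, 165)   # numbers in cm, rather than text such as "171cm"
)
hfile
str(hfile)              # one row per person, one column per variable
hfile$Height            # a single column, returned as a vector
mean(hfile$Height)      # 163.6667
nrow(hfile); ncol(hfile)
```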


• Reading in from input files
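A minimal, self-contained sketch of reading a tab-separated text file. It first writes a temporary file with the same three rows so that it runs anywhere. The CSV and Excel file names in the comments are hypothetical. read_excel() comes from the add-on readxl package, not from base R.

```r
# Write a small tab-separated file so the example runs anywhere
tmp <- tempfile(fileext = ".txt")
writeLines(c("Name\tHeight", "John\t171", "Mary\t155", "Peter\t165"), tmp)

# Tab-separated text file whose first line holds the column names
hfile <- read.table(tmp, sep = "\t", header = TRUE)
# read.delim() is a shortcut with sep = "\t" and header = TRUE preset
hfile2 <- read.delim(tmp)
hfile

# Other formats (file names are hypothetical):
# read.csv("heights.csv")              # comma-separated values
# readxl::read_excel("heights.xlsx")   # Excel; needs the readxl package
```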


• Simple manipulations with data frames

head(hfile,1)    shows only the first row

summary(hfile)   summarises each column

  Name  Height
1 John  171cm
2 Mary  155cm
3 Peter 165cm

Create subsets: new <- hfile[1,]
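More ways to subset the same small table are shown below. Heights are stored as numbers in cm, which is an assumption made so that the condition Height > 160 works. subset() is a base R alternative to square-bracket indexing.

```r
hfile <- data.frame(Name = c("John", "Mary", "Peter"),
                    Height = c(171, 155, 165))
head(hfile, 1)                        # first row only
summary(hfile)                        # per-column summary (quartiles, mean for numbers)
new <- hfile[1, ]                     # row 1, all columns
tall <- hfile[hfile$Height > 160, ]   # rows where Height is above 160 cm
heights <- hfile[, "Height"]          # one column, as a vector
tall2 <- subset(hfile, Height > 160)  # same rows as 'tall', using subset()
```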


• Simple statistics with R

• Load the file "Sampledata-1.txt" into R (change the path to the location of the file on your own machine):

studentprofile <- read.table("B://Users/bchhuyng/Desktop/Sampledata-1.txt", sep="\t", header=TRUE)

• View the data loaded into R: studentprofile, head(studentprofile)

• How many categories are there in the field "Gender"? factor(studentprofile$Gender) lists them; nlevels(factor(studentprofile$Gender)) gives the count directly
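Sampledata-1.txt is not reproduced here. This sketch therefore builds a hypothetical, randomly generated stand-in with the same three columns (Gender, Height, Weight) for 100 students. The numbers are made up and are not the workshop's data. The factor calls work the same way on the real file.

```r
# The real data would be loaded with something like:
# studentprofile <- read.table("Sampledata-1.txt", sep = "\t", header = TRUE)
# Here a hypothetical, randomly generated stand-in is used instead.
set.seed(1)
studentprofile <- data.frame(
  Gender = sample(c("M", "F"), 100, replace = TRUE),
  Height = round(rnorm(100, mean = 165, sd = 8), 1),
  Weight = round(rnorm(100, mean = 60, sd = 9), 1)
)
head(studentprofile)                      # first six rows
gender <- factor(studentprofile$Gender)   # convert the text column to a factor
levels(gender)                            # the categories: "F" "M"
nlevels(gender)                           # number of categories: 2
table(gender)                             # number of students per category
```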


• The factor() function stores values such as "M" and "F" as a categorical variable: each distinct value becomes a category (level)

[Figure: M and F labels of individual students grouped into two categories, M and F]
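Internally, a factor stores each value as an integer code that points to a level. The sketch below uses a made-up vector of five labels. It shows the default alphabetical level order and how to set your own order and labels.

```r
g <- factor(c("M", "F", "F", "M", "M"))
levels(g)        # "F" "M": levels are sorted alphabetically by default
as.integer(g)    # 2 1 1 2 2: each value is stored as an integer code
table(g)         # counts per level: F 2, M 3

# Choose the level order and give the levels readable labels
g2 <- factor(c("M", "F", "F", "M", "M"),
             levels = c("M", "F"), labels = c("Male", "Female"))
levels(g2)       # "Male" "Female"
```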


• Usage of factor in plotting graphs

Hu et al., 2013
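A sketch of two common ways a factor drives a plot: one box per level in boxplot(), and one colour per level in a scatter plot. The data frame is a hypothetical random stand-in for Sampledata-1.txt, not the workshop's data.

```r
set.seed(2)
# Hypothetical data standing in for Sampledata-1.txt
studentprofile <- data.frame(
  Gender = factor(sample(c("M", "F"), 100, replace = TRUE)),
  Height = rnorm(100, mean = 165, sd = 8),
  Weight = rnorm(100, mean = 60, sd = 9)
)

# One box per level of the factor on the right-hand side of ~
bp <- boxplot(Height ~ Gender, data = studentprofile,
              xlab = "Gender", ylab = "Height (cm)")

# Colour points by level: the integer codes 1 (F) and 2 (M) pick palette colours
plot(studentprofile$Weight, studentprofile$Height,
     col = as.integer(studentprofile$Gender), pch = 19,
     xlab = "Weight (kg)", ylab = "Height (cm)")
legend("topleft", legend = levels(studentprofile$Gender), col = 1:2, pch = 19)
```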


• Calculate the mean and the standard deviation of the height and weight of the students. E.g.

mean(studentprofile$Weight)

median(studentprofile$Weight)

sd(studentprofile$Weight)
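With a factor available, the same statistics can be computed per group using tapply(). The five heights and genders below are hypothetical values chosen so the results are easy to check by hand.

```r
# Hypothetical values for five students (not the workshop's Sampledata-1.txt)
Height <- c(171, 155, 165, 180, 158)
Gender <- factor(c("M", "F", "M", "M", "F"))

mean(Height)                   # 165.8
median(Height)                 # 165
sd(Height)                     # sample standard deviation (divides by n - 1)
tapply(Height, Gender, mean)   # mean per gender: F 156.5, M 172
```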


• Simple graph plotting with R

• View the distribution of height and weight of the 100 students (data from "Sampledata-1.txt"):

plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)
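The plot() call above, annotated argument by argument. It runs on hypothetical random data rather than Sampledata-1.txt. The sketch also writes the figure to a PDF file in the temporary directory. The file name is only an example.

```r
set.seed(5)
# Hypothetical stand-in for the 100 students in Sampledata-1.txt
Weight <- rnorm(100, mean = 60, sd = 9)
Height <- 105 + Weight + rnorm(100, sd = 5)
studentprofile <- data.frame(Weight = Weight, Height = Height)

out <- file.path(tempdir(), "height_vs_weight.pdf")  # example file name
pdf(out)                      # send the plot to a PDF file instead of the screen
plot(studentprofile$Weight, studentprofile$Height,
     main = "Distribution of Height and Weight of students",
     xlab = "Weight (kg)",    # x-axis label
     ylab = "Height (cm)",    # y-axis label
     pch = 19,                # plotting symbol 19: solid circle
     cex = 0.5)               # draw symbols at half the default size
dev.off()                     # close the file so it is written to disk
```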


• What is the distribution of height and weight amongst students?

hist(studentprofile$Weight, xlab="Weight (kg)", main="Frequency distribution of student weight", ylim=c(0,8), xlim=c(40,90), breaks=51)


hist(studentprofile$Height, xlab="Height (cm)", main="Frequency distribution of student height", ylim=c(0,8), xlim=c(140,190), breaks=51)
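When breaks is a single number, as in breaks=51 above, hist() treats it only as a suggestion for the number of bins. Passing a vector of bin edges gives exact control. This sketch uses hypothetical random heights and 5 cm bins computed to span the data. It also shows that hist() returns the bin edges and counts.

```r
set.seed(3)
Height <- rnorm(100, mean = 165, sd = 8)   # hypothetical heights in cm
# 5 cm bins that are guaranteed to span the whole range of the data
bins <- seq(floor(min(Height) / 5) * 5, ceiling(max(Height) / 5) * 5, by = 5)
h <- hist(Height, breaks = bins, xlab = "Height (cm)",
          main = "Frequency distribution of student height")
h$breaks   # bin edges
h$counts   # number of students in each bin
```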


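• Are the heights and weights of the sampled students normally distributed?

ks.test(studentprofile$Height, "pnorm", mean(studentprofile$Height), sd(studentprofile$Height))

ks.test(studentprofile$Weight, "pnorm", mean(studentprofile$Weight), sd(studentprofile$Weight))

(Written as ks.test(x, pnorm), the test compares the data with the standard normal distribution, i.e. mean 0 and standard deviation 1, so heights in cm and weights in kg would always be rejected.)

H0: The data follow the specified distribution

H1: The data do not follow the specified distribution

p-value ≤ 0.05: reject H0

p-value > 0.05: do not reject H0

CAVEAT!!! http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/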
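The effect of supplying the mean and standard deviation is easy to see on simulated data. The sketch uses hypothetical heights drawn from a normal distribution. Estimating the mean and sd from the same data makes the Kolmogorov-Smirnov p-value conservative. shapiro.test() (Shapiro-Wilk) is included as a widely used alternative designed specifically for normality.

```r
set.seed(4)
Height <- rnorm(100, mean = 165, sd = 8)   # hypothetical, truly normal heights

# Comparing with pnorm alone means N(0, 1): heights in cm can never fit it
ks.test(Height, "pnorm")$p.value           # essentially 0

# Compare with a normal distribution using the sample's own mean and sd
# (estimating them from the same data makes this KS p-value conservative)
ks.test(Height, "pnorm", mean = mean(Height), sd = sd(Height))

# Shapiro-Wilk test, a widely used test designed specifically for normality
shapiro.test(Height)
```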


• Are the height and weight of students linearly correlated? Fit a linear model of height on weight:

reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)
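A worked sketch of the regression on five hypothetical students, chosen so the fitted line can be checked by hand (slope 240/250 = 0.96 cm per kg, intercept 165 - 0.96 × 60 = 107.4). It uses the data = form of lm(), which is equivalent to writing studentprofile$ inside the formula but gives shorter coefficient names. cor.test() adds a test of Pearson's correlation.

```r
# Hypothetical measurements for five students (not from Sampledata-1.txt)
studentprofile <- data.frame(Weight = c(50, 55, 60, 65, 70),
                             Height = c(156, 159, 166, 169, 175))

# Height ~ Weight reads "Height modelled as a linear function of Weight";
# the data argument saves typing studentprofile$ in the formula
reg1 <- lm(Height ~ Weight, data = studentprofile)
coef(reg1)       # intercept about 107.4, slope about 0.96 cm per kg
summary(reg1)    # coefficient tests, R-squared, residual standard error

# Strength of the linear association (Pearson's r) and its test
cor(studentprofile$Weight, studentprofile$Height)   # about 0.99
cor.test(studentprofile$Weight, studentprofile$Height)
```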


plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)

reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)

abline(reg1, col=2)

abline() adds the fitted regression line to the existing scatter plot; col=2 draws it in red.


R intro checklist: what have you learnt today?

• How to install the R platform on your machine

• How to install R packages and dependencies

• How to get help and instructions

• How to use a library

• Variables and assigning values to variables

• Data types which R accepts

• Arithmetic manipulations of variables (+ - * / ^ %% etc.)

• Browsing and managing your variables (ls, rm)

• Assigning vectors - the c() command

• Vector manipulations and referencing

• Matrices – declaration and manipulation (rows/columns) – rbind

• Data frames – import from xls/csv/txt files and statistical manipulation

• Introducing data categorization using R datatype - Factor

• Simple graph plotting

• More statistical analysis

• Simple example of linear regression


References

• Crawley, M.J. (2007) The R Book.

• Maindonald, J., and Braun, W.J. (2010) Data Analysis and Graphics Using R – an Example-Based Approach.

• Kabacoff, R.I. (2012) Quick-R: Data types. http://www.statmethods.net/input/datatypes.html Accessed on 7/1/2014

• King, W.B. (2010) Doing Arithmetic in R. http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html Accessed on 7/1/2014

• Ian (2011) Normality tests don't do what you think they do. http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/ Accessed on 7/1/2014

• Meys, J., and de Vries, A. How to Test Data Normality in a Formal Way in R. http://www.dummies.com/how-to/content/how-to-test-data-normality-in-a-formal-way-in-r.html Accessed on 7/1/2014


Future classes on R and packages

• R has a very rich repertoire of packages, e.g. for:

• Statistical analysis

• Microarray analysis

• NGS (next-generation sequencing)

• etc.