Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1)

Scope of Introductory Workshop on R
• How to install the R platform on your machine
• How to install R packages and dependencies
• How to get help and instructions
• How to use a library
• Variables and assigning values to variables
• Data types that R accepts
• Arithmetic manipulations of variables (+ - * / %% ** etc.)
• Browsing and managing your variables (ls, rm)
• Assigning vectors – the c() command
• Vector manipulations and referencing
• Matrices – declaration and manipulation (rows/columns) – rbind
• Data frames – import from xls/csv/txt files and statistical manipulation
• Introducing data categorisation using the R datatype factor
• Simple graph plotting
• More statistical analysis
• Simple example of linear regression
• Quick revision
• Future classes on R

What is R?
• R is both a software environment and a programming language
• R is mainly used for statistical analysis and for generating graphics
• Free
• Simple and intuitive???
• Available across different platforms (Mac, Unix/Linux, Windows)

Starting with R
• Installation (administrator rights required): http://www.r-project.org/
Tip: install the latest version (or the last stable version)
• Download R from a CRAN mirror, e.g. http://cran.bic.nus.edu.sg

Your very first interface
• Default prompt in R: >

Starting with R: packages
• Packages provide additional functions that are not included in the "base" package
• Install an additional package: install.packages("packagename")
• To use a package, type library(packagename)
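
A minimal sketch of the package workflow. The package name ggplot2 is only an illustrative choice (any CRAN package works the same way), and the install line is commented out because it needs an internet connection.

```r
# Install a package once per machine (needs an internet connection).
# "ggplot2" is only an example; the line is commented out so the script runs offline.
# install.packages("ggplot2")

# Attach a package in every new R session before using its functions.
library(stats)   # stats ships with R, so this line always succeeds

# Attach a package only if it is installed; otherwise print a hint.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
} else {
  message('ggplot2 is not installed; run install.packages("ggplot2") first')
}

(.packages())    # names of the packages currently attached
```

By default install.packages() also installs the packages that the requested package depends on, which covers the "dependencies" item in the workshop scope.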

Starting with R: getting help
• Confused about an R command? Get help:
– In the R console: ?functionname opens the help page of a function you know, ??keyword searches all help pages
– On the web: http://cran.r-project.org or http://www.rseek.org/
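
The help commands above, as typed at the > prompt. The function mean() and the phrase "linear regression" are arbitrary examples; apropos() and example() are two further base-R helpers not mentioned on the slide.

```r
?mean                   # help page for a function whose name you know
help("mean")            # the same help page, written as a function call
??"linear regression"   # search every installed help page for a phrase
apropos("mean")         # list objects whose names contain "mean"
example(mean)           # run the examples at the bottom of the mean() help page
```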

Fundamentals of Programming
• Simple data input and manipulation
• Declaration of an object (variable)
Take note that object names:
• are case sensitive (i.e. x is different from X)
• cannot contain spaces or most symbols, and cannot start with a number (var1 is valid, 1var is not)
• should be comprehensible
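
A short sketch of declaring objects with the assignment operator <- and of the naming rules above; the object names are made up for illustration. make.names() is a base-R function that shows how R would turn an invalid name into a valid one.

```r
x <- 5             # assign the value 5 to the object x
X <- 10            # X is a separate object: names are case sensitive
x == X             # FALSE
var1 <- 3          # digits are allowed after the first character
height_cm <- 171   # underscores (and dots) are allowed too
# 1var <- 3        # error: a name cannot start with a digit
# my var <- 3      # error: a name cannot contain a space
make.names(c("my var", "1var"))   # "my.var" "X1var"
```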

Data types
R has a rich set of data types. Commonly encountered data types in R:
• Scalars
• Vectors (numerical, character and logical)
• Matrices (2D)
• Arrays (can have more than 2 dimensions)
• Data frames
• Lists
• Factors
See for example http://www.statmethods.net/input/datatypes.html for more details.
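
One small object of each type listed above, with class() reporting how R sees it. The object names and values are illustrative only. Note that R has no separate scalar type: a single number is simply a vector of length 1.

```r
s   <- 3.14                                  # "scalar": a numeric vector of length 1
num <- c(1, 2, 3)                            # numerical vector
chr <- c("A", "B")                           # character vector
lgl <- c(TRUE, FALSE)                        # logical vector
m   <- matrix(1:6, nrow = 2)                 # matrix: 2 rows x 3 columns
a   <- array(1:24, dim = c(2, 3, 4))         # array with 3 dimensions
df  <- data.frame(Name = c("John", "Mary"),  # data frame: columns may differ in type
                  Height = c(171, 155))
l   <- list(id = 1, tags = c("x", "y"))      # list: can hold anything, even other lists
f   <- factor(c("M", "F", "M"))              # factor: categorical values

# First class of each object
sapply(list(s, num, chr, lgl, m, a, df, l, f), function(o) class(o)[1])
```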

• Perform simple manipulations, e.g. arithmetic calculations
• For more built-in R arithmetic functions, visit http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html
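
The arithmetic operators from the workshop scope, applied to two illustrative numbers. In R the remainder (modulo) operator is %%, and ** is accepted as an alias for ^.

```r
a <- 7
b <- 2
a + b        # 9
a - b        # 5
a * b        # 14
a / b        # 3.5
a ^ b        # 49 (a ** b gives the same result)
a %% b       # 1: remainder after division (modulo)
a %/% b      # 3: integer division
sqrt(16)     # 4
log10(100)   # 2
exp(1)       # 2.718282
```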

• Remove variables when they are no longer required
• Use ls() to check whether a declared object is still kept in memory
• To remove an object from memory, use rm(x)
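
A quick sketch of browsing and cleaning up the workspace; x and y are throwaway example objects.

```r
x <- 1:5
y <- "hello"
ls()               # names of the objects in the workspace, including "x" and "y"
rm(x)              # remove x from memory
exists("x")        # FALSE: x is gone
ls()               # y is still there
# rm(list = ls())  # removes every object in the workspace; use with care
```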

• More complex data inputs
• Data vectors: several values stored in order under a single object name
[Diagram: a single object x compared with a vector x holding the values 1 2 3 4 5]

• Assigning a data vector
x <- c(1,2,3,4,5)
[Diagram: the values 1 2 3 4 5 combined into the vector x]
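
Building on x <- c(1,2,3,4,5), a sketch of the most common ways to reference (index) elements of a vector. Square brackets select by position, negative positions drop elements, and a logical condition keeps only the matching elements.

```r
x <- c(1, 2, 3, 4, 5)   # combine values into a vector
x <- 1:5                # the same values, using the colon shortcut
length(x)               # 5
x[2]                    # second element: 2
x[c(1, 3)]              # first and third elements: 1 3
x[-1]                   # all but the first element: 2 3 4 5
x[x > 3]                # elements greater than 3: 4 5
x * 2                   # arithmetic works element by element: 2 4 6 8 10
```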

Experiment for yourself
• Define a vector var1 with the values 1, 2, 3
• Define a vector var2 with the values 4, 5, 6
• What is the value of var2[4]?
• What is the sum of var1?
• What is the R code to assign the first element of var1 to the object subsetvar1?
• What is the product of var1 and var2?
See http://www.statmethods.net/input/datatypes.html

Experiment for yourself (answers)
• Define a vector var1 with the values 1, 2, 3: var1 <- c(1,2,3)
• Define a vector var2 with the values 4, 5, 6: var2 <- c(4,5,6)
• What is the value of var2[4]? NA (var2 has only three elements)
• What is the sum of var1? sum(var1) gives 6
• What is the R code to assign the first element of var1 to the object subsetvar1? subsetvar1 <- var1[1]
• What is the product of var1 and var2? var1 * var2 gives 4 10 18 (element by element)
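
The answers can be checked by running the code directly. The last line is an extra: if "product" is meant as the dot (inner) product rather than the element-wise product, sum(var1 * var2) gives it.

```r
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)
var2[4]                 # NA: there is no fourth element
sum(var1)               # 6
subsetvar1 <- var1[1]   # 1
var1 * var2             # 4 10 18 (element by element)
sum(var1 * var2)        # 32 (dot product: 4 + 10 + 18)
```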

• More complex data structures: matrices
     [,1] [,2] [,3]
[1,]    1    3    8
[2,]    6    9    5
[3,]    4    1    7
[4,]    6    5    1
• Declaring a matrix
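
The slide's own declaration code is not in this transcript, so this is a sketch that reproduces the 4-by-3 matrix shown above; naming it y matches the next slide. matrix() fills values column by column unless byrow = TRUE is given.

```r
# Enter the values row by row, hence byrow = TRUE
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1), nrow = 4, ncol = 3, byrow = TRUE)
y
dim(y)                  # 4 3: rows, then columns

# Without byrow = TRUE, matrix() fills down the columns instead
matrix(1:6, nrow = 2)   # columns are (1, 2), (3, 4), (5, 6)
```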

• Simple manipulations of the data matrix y
     [,1] [,2] [,3]
[1,]    1    3    8
[2,]    6    9    5
[3,]    4    1    7
[4,]    6    5    1
• Referencing rows and columns: y[1,] gives 1 3 8; y[,3] gives 8 5 7 1
• Simple arithmetic manipulations: mean(y) gives 4.666667; sum(y[2,]) gives 20
• Modify and add values: y[4,] <- c(6,2,2) replaces row 4, and y <- rbind(y, c(3,9,8)) appends a fifth row
Tip: Think of rbind as "row combine"
After both changes:
     [,1] [,2] [,3]
[1,]    1    3    8
[2,]    6    9    5
[3,]    4    1    7
[4,]    6    2    2
[5,]    3    9    8
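
The manipulations from this slide as one runnable script, starting from the matrix y.

```r
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1), nrow = 4, byrow = TRUE)

y[1, ]        # row 1: 1 3 8
y[, 3]        # column 3: 8 5 7 1
mean(y)       # 4.666667 (56 / 12)
sum(y[2, ])   # 20 (6 + 9 + 5)

y[4, ] <- c(6, 2, 2)        # overwrite row 4
y <- rbind(y, c(3, 9, 8))   # "row combine": append a fifth row
y
dim(y)        # 5 3
```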

• More complex data structures: data frames
  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm
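
A sketch of building this data frame in code. Heights are stored as plain numbers in cm rather than the text "171cm", an assumption that lets R compute with them; the name hfile is borrowed from the later slides.

```r
# Each argument becomes one column; all columns must have the same length
hfile <- data.frame(Name   = c("John", "Mary", "Peter"),
                    Height = c(171, 155, 165))   # height in cm
hfile
str(hfile)            # column names and types at a glance
hfile$Height          # one column, extracted as a vector
mean(hfile$Height)    # 163.6667
```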

• Reading in from input files
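
The slide's import code is not part of this transcript. The sketch below first writes a small tab-separated file to a temporary location so that it runs anywhere; with real data you would pass the path of your own file. Excel files are not handled by base R and need an add-on package such as readxl.

```r
# Write a small tab-separated file to a temporary location so the example is
# self-contained; with real data, use the path to your own file instead.
path <- tempfile(fileext = ".txt")
writeLines(c("Name\tHeight", "John\t171", "Mary\t155", "Peter\t165"), path)

# header = TRUE takes column names from the first line; sep = "\t" splits on tabs
hfile <- read.table(path, header = TRUE, sep = "\t")
hfile

# For comma-separated files, read.csv() already assumes header = TRUE and sep = ","
csv_path <- tempfile(fileext = ".csv")
write.csv(hfile, csv_path, row.names = FALSE)
hfile_csv <- read.csv(csv_path)

# Excel (.xls/.xlsx) files need an add-on package, e.g. readxl::read_excel(path)
```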

• Simple manipulations with data frames
head(hfile,1) shows the first row
summary(hfile) summarises each column
  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm
• Create subsets: new <- hfile[1,]
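
A sketch of the data frame operations above, plus two common ways to subset by a condition (the 160 cm cut-off is arbitrary).

```r
hfile <- data.frame(Name   = c("John", "Mary", "Peter"),
                    Height = c(171, 155, 165))    # height in cm

head(hfile, 1)                        # first row only
summary(hfile)                        # summary statistics for each column
new <- hfile[1, ]                     # subset: row 1, all columns
hfile[, "Height"]                     # subset: all rows, Height column only
tall <- hfile[hfile$Height > 160, ]   # rows where Height exceeds 160 cm
subset(hfile, Height > 160, select = Name)   # same rows, Name column only
```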

• Simple statistics with R
• Load the file "Sampledata-1.txt" into R (replace the path with the location of the file on your machine):
studentprofile <- read.table("B://Users/bchhuyng/Desktop/Sampledata-1.txt", sep="\t", header=TRUE)
• View the data loaded into R: studentprofile, head(studentprofile)
• How many categories are there in the field "Gender"? factor(studentprofile$Gender)
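
Sampledata-1.txt itself is not included here, so this sketch writes a hypothetical four-student file with the column names used on the slides (Gender, Height, Weight) and reads it the same way. levels(), nlevels() and table() answer the "how many categories" question more directly than printing the factor.

```r
# Hypothetical stand-in for Sampledata-1.txt: four made-up students, assuming
# tab-separated columns named Gender, Height and Weight as used in the slides
path <- tempfile(fileext = ".txt")
writeLines(c("Gender\tHeight\tWeight",
             "M\t175\t70", "F\t160\t52", "M\t168\t65", "F\t158\t50"), path)
studentprofile <- read.table(path, sep = "\t", header = TRUE)

head(studentprofile)                      # first rows (up to six)
gender <- factor(studentprofile$Gender)   # treat Gender as a categorical variable
levels(gender)                            # the categories: "F" "M"
nlevels(gender)                           # how many categories: 2
table(gender)                             # number of students in each category
```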

• The factor() function stores values such as M and F as a categorical variable, with one level per category
[Diagram: individual M and F entries grouped into the two categories M and F]
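
A small illustration of what factor() does internally, using made-up gender values: the distinct values become levels, and each entry is stored as an integer code pointing into those levels.

```r
g <- factor(c("M", "F", "F", "M", "M"))   # illustrative gender values
g               # prints the values followed by "Levels: F M"
levels(g)       # "F" "M": levels are sorted alphabetically by default
as.integer(g)   # 2 1 1 2 2: each value is stored as a code pointing into levels
summary(g)      # counts per level: F 2, M 3
```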

• Usage of factor in plotting graphs
[Figure: Hu et al., 2013]
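
The slides illustrate factors in plots with a published figure rather than R code. A minimal base-R sketch of the same idea, on made-up values: the formula Height ~ Gender splits the heights by the levels of the Gender factor and draws one box per level.

```r
# Hypothetical values, for illustration only
studentprofile <- data.frame(
  Gender = factor(c("M", "F", "M", "F", "M", "F")),
  Height = c(175, 160, 168, 158, 180, 163))

# One box per level of the Gender factor
b <- boxplot(Height ~ Gender, data = studentprofile,
             xlab = "Gender", ylab = "Height (cm)",
             main = "Height by gender")
b$names   # "F" "M": the factor levels used as groups
```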

• Calculate the mean and the standard deviation of the height and weight of the students, e.g.
mean(studentprofile$Weight)
sd(studentprofile$Weight)
median(studentprofile$Weight)
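
The descriptive statistics on a short, made-up weight vector, so that the numbers can be checked by hand. sd() returns the sample standard deviation (dividing by n - 1), and na.rm = TRUE tells these functions to ignore missing values.

```r
weight <- c(70, 52, 65, 50, 80, 55)   # hypothetical weights (kg)
mean(weight)      # 62
median(weight)    # 60, the average of the two middle values 55 and 65
sd(weight)        # sample standard deviation
summary(weight)   # minimum, quartiles, median, mean and maximum in one call
mean(c(weight, NA), na.rm = TRUE)   # 62: the missing value is dropped first
```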

• Simple graph plotting with R
• View the distribution of height and weight of the 100 students (data from "Sampledata-1.txt"):
plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)

• What is the distribution of height and weight amongst students?
hist(studentprofile$Weight, xlab="Weight (kg)", main="Frequency distribution of student weight", ylim=c(0,8), xlim=c(40,90), breaks=51)

hist(studentprofile$Height, xlab="Height (cm)", main="Frequency distribution of student height", ylim=c(0,8), xlim=c(140,190), breaks=51)
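
A runnable sketch on simulated heights (set.seed() and rnorm() stand in for the workshop data, which is not reproduced here). Passing a vector of bin edges to breaks gives exact control over the bins, whereas a single number such as breaks = 51 is treated by hist() only as a suggestion.

```r
set.seed(1)
height <- rnorm(100, mean = 165, sd = 8)   # simulated heights (cm), not the workshop data

# Bins of width 5 cm from 120 to 210 cm, wide enough to cover every value
h <- hist(height, breaks = seq(120, 210, by = 5),
          xlab = "Height (cm)", main = "Frequency distribution of student height")
h$breaks   # bin edges
h$counts   # number of students in each bin
```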

• Are the height and weight of the sampled students normally distributed?
ks.test(studentprofile$Height, "pnorm", mean=mean(studentprofile$Height), sd=sd(studentprofile$Height))
ks.test(studentprofile$Weight, "pnorm", mean=mean(studentprofile$Weight), sd=sd(studentprofile$Weight))
(Without mean and sd, pnorm refers to a normal distribution with mean 0 and standard deviation 1, which heights and weights can never match.)
H0: The data follow the specified distribution
H1: The data do not follow the specified distribution
p-value ≤ 0.05: reject H0
p-value > 0.05: do not reject H0
CAVEAT!!! http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/
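
A sketch on simulated heights (the workshop file is not reproduced here) showing why the mean and sd have to be passed to ks.test(). Estimating them from the same sample makes the Kolmogorov-Smirnov p-value only approximate (the Lilliefors correction addresses this); shapiro.test() is a common base-R alternative for samples of 3 to 5000 values.

```r
set.seed(42)
height <- rnorm(100, mean = 165, sd = 8)   # simulated heights (cm), not the workshop data

# pnorm with its defaults is the standard normal (mean 0, sd 1): the wrong reference
ks.test(height, "pnorm")$p.value           # essentially 0

# Match the reference normal to the sample's own mean and sd
# (estimating them from the same data makes this p-value approximate)
ks.test(height, "pnorm", mean = mean(height), sd = sd(height))

# Shapiro-Wilk test, a common alternative for samples of 3 to 5000 values
shapiro.test(height)
```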

• Are the height and weight of the students linearly correlated?
reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)
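
A sketch on simulated data with a built-in linear trend (not the workshop data). It fits the same model as reg1, written with a data argument so that the coefficients are labelled simply Weight rather than studentprofile$Weight, and adds cor.test() for the correlation coefficient itself.

```r
set.seed(7)
weight <- runif(100, 45, 85)                       # simulated weights (kg)
height <- 120 + 0.6 * weight + rnorm(100, sd = 5)  # simulated heights (cm) with a linear trend
studentprofile <- data.frame(Weight = weight, Height = height)

# Same model as reg1 on the slide, written with a data argument
reg1 <- lm(Height ~ Weight, data = studentprofile)
coef(reg1)                 # intercept and slope (cm of height per kg of weight)
summary(reg1)$r.squared    # share of the variation in Height explained by Weight
summary(reg1)              # estimates, standard errors and p-values

# Pearson correlation coefficient with a test of whether it differs from 0
cor.test(studentprofile$Weight, studentprofile$Height)
```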

plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (kg)", ylab="Height (cm)", pch=19, cex=0.5)
reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)
abline(reg1, col=2)
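
A self-contained version of the three lines above, again on simulated data rather than Sampledata-1.txt: scatter plot first, then the fitted regression line drawn on top with abline().

```r
set.seed(7)
weight <- runif(100, 45, 85)                       # simulated weights (kg)
height <- 120 + 0.6 * weight + rnorm(100, sd = 5)  # simulated heights (cm)
studentprofile <- data.frame(Weight = weight, Height = height)

# Scatter plot: the formula puts Height on the y axis and Weight on the x axis
plot(Height ~ Weight, data = studentprofile,
     main = "Distribution of Height and Weight of students",
     xlab = "Weight (kg)", ylab = "Height (cm)", pch = 19, cex = 0.5)

reg1 <- lm(Height ~ Weight, data = studentprofile)
abline(reg1, col = 2)   # add the fitted line; col = 2 is the red of the default palette
```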

Intro checklist: what have you learnt today?
• How to install the R platform on your machine
• How to install R packages and dependencies
• How to get help and instructions
• How to use a library
• Variables and assigning values to variables
• Data types that R accepts
• Arithmetic manipulations of variables (+ - * / %% ** etc.)
• Browsing and managing your variables (ls, rm)
• Assigning vectors – the c() command
• Vector manipulations and referencing
• Matrices – declaration and manipulation (rows/columns) – rbind
• Data frames – import from xls/csv/txt files and statistical manipulation
• Introducing data categorisation using the R datatype factor
• Simple graph plotting
• More statistical analysis
• Simple example of linear regression

References
• Crawley, M.J. (2007) The R Book.
• Maindonald, J. and Braun, W.J. (2010) Data Analysis and Graphics Using R – an Example-Based Approach.
• Kabacoff, R.I. (2012) Quick-R: Data types. http://www.statmethods.net/input/datatypes.html Accessed on 7/1/2014
• King, W.B. (2010) Doing Arithmetic in R. http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html Accessed on 7/1/2014
• Ian (2011) Normality tests don't do what you think they do. http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/ Accessed on 7/1/2014
• Meys, J. and de Vries, A. How to Test Data Normality in a Formal Way in R. http://www.dummies.com/how-to/content/how-to-test-data-normality-in-a-formal-way-in-r.html Accessed on 7/1/2014

Future classes on R and R packages
• R has a very rich repertoire of packages, for example for:
– Statistical analysis
– Microarray analysis
– NGS (next-generation sequencing)
– and many more