Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1)

Post on 22-Feb-2016

Transcript of Basic R Programming for Life Science Undergraduate Students Introductory Workshop (Session 1)

1

Basic R Programming for Life Science Undergraduate Students

Introductory Workshop (Session 1)

2

Scope of Introductory Workshop on R

• How to install the R platform on your machine
• How to install R packages and dependencies
• How to get help and instructions
• How to use a library
• Variables and assigning values to variables
• Data types which R accepts
• Arithmetic manipulations of variables (+ - * / %% ** etc.)
• Browsing and managing your variables (ls, rm)
• Assigning vectors: the c() command
• Vector manipulations and referencing
• Matrices: declaration and manipulation (rows/columns), rbind
• Data frames: import from xls/csv/txt files and statistical manipulation
• Introducing data categorisation using the R datatype Factor
• Simple graph plotting
• More statistical analysis
• Simple example of linear regression
• Quick revision
• Future classes on R

3

What is R?

• R = software and programming language
• R is mainly used for statistical analysis and for graphics generation
• Free
• Simple and intuitive?
• Available across different platforms (Mac, Unix/Linux, Windows)

4

Starting with R

• Installation (administrator rights required)

http://www.r-project.org/

Tip: install the latest version (or the last stable version)

5

Starting with R

• Installation

http://cran.bic.nus.edu.sg

6

Starting with R

• Installation

7

Your very first interface

Default prompt in R

8

Starting with R

• Packages: additional functions that are not included within the "base package"
• Installation (additional packages): install.packages("package name")
• To use a package, type library(package name)
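The two package steps above can be sketched as follows. The package name MASS is only an illustrative choice (it ships with most R installations); substitute whichever package you need.

```r
# Example package name; MASS is an assumption chosen for illustration.
pkg <- "MASS"

# Install the package only if it is not already available.
# install.packages() downloads from CRAN, so it needs an internet connection.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so that its functions can be called directly.
library(MASS)

# Confirm that the package is now attached to the session.
loaded <- "package:MASS" %in% search()
loaded
```

Installation is needed once per machine, while library() must be called in every new R session.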

9

Starting with R

• Confused about R commands? Get help:
    On the GUI: ?(function) or ??(function)
    Via WWW: http://cran.r-project.org or http://www.rseek.org/
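A short sketch of the help commands mentioned above, using mean and the keyword regression purely as example search targets:

```r
# Open the help page of a function whose exact name you know.
?mean
help("mean")                      # equivalent long form

# Search the installed documentation when you only know a keyword.
??regression
help.search("linear regression")  # long form, also accepts phrases

# Run the worked examples listed at the bottom of a help page.
example(mean)
```

Use ? when you know the function name and ?? when you are still looking for one.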

10

Fundamentals of Programming

• Simple data input and manipulation
• Declaration of object (variable)

Take note that object names
• are case sensitive (i.e. x is different from X)
• do not contain spaces or symbols (other than . and _), and do not start with a number
• should be comprehensible
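As a minimal illustration of these naming rules (the object names below are made up for the example):

```r
# Object names are case sensitive: x and X are two separate objects.
x <- 5
X <- 10
same_value <- (x == X)   # FALSE, because 5 is not equal to 10

# A comprehensible name; underscores and dots are allowed.
student_height <- 171

# Digits are allowed as long as the name does not start with one.
var1 <- 3

# These would be syntax errors, so they are left commented out:
# 1var <- 3            # starts with a number
# student height <- 1  # contains a space
```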

11

Data types

Rich set of datatypes in R. Commonly encountered datatypes in R:
• Scalars
• Vectors (numerical, character and logical)
• Matrices (2D)
• Arrays (can have more than 2 dimensions)
• Data frames
• Lists
• Factors

See for example http://www.statmethods.net/input/datatypes.html for more details
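One small example of each datatype in the list above. All object names and values are invented for illustration.

```r
scalar  <- 3.14                      # a single value (a vector of length 1)
num_vec <- c(1.5, 2, 3.25)           # numerical vector
chr_vec <- c("John", "Mary")         # character vector
lgl_vec <- c(TRUE, FALSE, TRUE)      # logical vector

m <- matrix(1:6, nrow = 2)           # 2D matrix with 2 rows and 3 columns
a <- array(1:24, dim = c(2, 3, 4))   # array with 3 dimensions

df <- data.frame(Name = c("John", "Mary"),
                 Height = c(171, 155))   # columns may differ in type

l <- list(1, "a", TRUE)              # list: elements of any type
f <- factor(c("M", "F", "M"))        # factor: categorical data

# str() gives a compact description of any object.
str(df)
str(f)
```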

12

• Perform simple manipulations e.g. arithmetic calculations

• For more built-in R arithmetic functions, visit http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html

Fundamentals of Programming
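A few arithmetic calculations to try at the prompt. The values of a and b are arbitrary.

```r
a <- 7
b <- 2

a + b     # addition: 9
a - b     # subtraction: 5
a * b     # multiplication: 14
a / b     # division: 3.5
a %% b    # modulo (remainder): 1
a %/% b   # integer division: 3
a ^ b     # exponentiation: 49
a ** b    # same as ^ in R: 49
sqrt(16)  # square root: 4
```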

13

• Remove variables when they are no longer required

• Use ls() to check whether a declared object is still kept in memory

• To remove an object from memory, use rm(x)

Fundamentals of Programming
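A minimal sketch of checking for and removing an object:

```r
x <- 5

# ls() lists the names of the objects currently in memory.
before <- "x" %in% ls()   # TRUE: x exists

# rm() deletes the object.
rm(x)

after <- "x" %in% ls()    # FALSE: x is gone
before
after
```

To clear every object at once you can use rm(list = ls()), so use it with care.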

14

• More complex data inputs
• Data vectors: an ordered list of objects

[Figure: a single object X compared with a vector X holding the values 1 2 3 4 5]

Fundamentals of Programming

15

• Assigning a data vector

[Figure: the values 1 2 3 4 5 combined into one vector]

Fundamentals of Programming

x <- c(1,2,3,4,5)

16

• Define a vector var1 with values 1,2,3

• Define a vector var2 with values 4,5,6

• What value is var2[4] ?

• What is the sum of var1 ?

• What is the R code to assign object subsetvar1 with the first element of var1.

• What is the product of var1 and var2 ?

Experiment for yourself

http://www.statmethods.net/input/datatypes.html

17

• Define a vector var1 with values 1,2,3
    var1 <- c(1,2,3)

• Define a vector var2 with values 4,5,6
    var2 <- c(4,5,6)

• What value is var2[4]?
    NA (var2 has only three elements)

• What is the sum of var1?
    sum(var1) gives 6

• What is the R code to assign object subsetvar1 with the first element of var1?
    subsetvar1 <- var1[1]

• What is the product of var1 and var2?
    var1 * var2 gives 4 10 18 (multiplication is element by element)

Experiment for yourself
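The exercise can be run in one go as below. The last two lines are extra referencing examples that are not part of the exercise.

```r
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)

var2[4]                 # NA: there is no fourth element
sum(var1)               # 6
subsetvar1 <- var1[1]   # first element of var1, i.e. 1
var1 * var2             # element-wise product: 4 10 18

# Extra referencing examples:
var1[-1]                # everything except the first element: 2 3
var2[var2 > 4]          # elements that satisfy a condition: 5 6
```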

18

• More complex data structures: matrices

Fundamentals of Programming

1 3 8
6 9 5
4 1 7
6 5 1

19

• Declaring a matrix

Fundamentals of Programming
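The code from the original slide did not survive in this transcript. One possible way to declare the 4 x 3 example matrix, assuming it is named y as on the following slide, is:

```r
# Values are typed row by row, so byrow = TRUE is required;
# by default matrix() fills column by column.
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1),
            nrow = 4, ncol = 3, byrow = TRUE)

y        # print the matrix
dim(y)   # number of rows and columns: 4 3
```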

20

• Simple manipulations of data matrix

Fundamentals of Programming

     1  2  3
1    1  3  8
2    6  9  5
3    4  1  7
4    6  5  1

• y[1,] returns 1 3 8
• y[,3] returns 8 5 7 1

• Simple arithmetic manipulations
    mean(y) returns 4.666667
    sum(y[2,]) returns 20

• Modify and add values
    y[4,] <- c(6,2,2)
    y <- rbind(y, c(3,9,8))
  Tip: Think of rbind as "row combine"

After y[4,] <- c(6,2,2):

1 3 8
6 9 5
4 1 7
6 2 2

After y <- rbind(y, c(3,9,8)):

1 3 8
6 9 5
4 1 7
6 2 2
3 9 8
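The manipulations on this slide can be reproduced as follows. The matrix() call is an assumed way of declaring y, and cbind at the end is an addition for comparison with rbind.

```r
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1),
            nrow = 4, byrow = TRUE)

y[1, ]                       # first row: 1 3 8
y[, 3]                       # third column: 8 5 7 1

mean_before <- mean(y)       # mean of all 12 values: 4.666667
row2_sum    <- sum(y[2, ])   # 6 + 9 + 5 = 20

y[4, ] <- c(6, 2, 2)         # overwrite the fourth row
y <- rbind(y, c(3, 9, 8))    # append a fifth row
y

# cbind ("column combine") works the same way for columns.
y_wide <- cbind(y, c(0, 0, 0, 0, 0))
dim(y_wide)                  # 5 4
```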

21

• More complex data structures: data frames

  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm

Fundamentals of Programming

22

• Data frames

  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm

Fundamentals of Programming
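A sketch of building the example table by hand. The name hfile is borrowed from a later slide, and storing Height as a number in centimetres (rather than text such as "171cm") is an assumption that keeps the column usable for statistics.

```r
hfile <- data.frame(
  Name   = c("John", "Mary", "Peter"),
  Height = c(171, 155, 165)    # centimetres, stored as numbers
)

hfile           # print the data frame
str(hfile)      # column names and types
hfile$Height    # access one column with $
```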

23

• Reading in from input files

Fundamentals of Programming
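The code from the original slide is not in this transcript. As a self-contained sketch, the example below first writes two small temporary files (so that it runs anywhere) and then reads them back. The file contents and the name hfile are assumptions.

```r
# Create a small comma-separated file to read.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Height", "John,171", "Mary,155", "Peter,165"), csv_path)

# read.csv() reads comma-separated files; header = TRUE uses line 1 as column names.
hfile <- read.csv(csv_path, header = TRUE)

# Create the same data as a tab-separated text file.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Name\tHeight", "John\t171", "Mary\t155", "Peter\t165"), txt_path)

# read.table() is the general reader; sep gives the column separator.
hfile_tab <- read.table(txt_path, sep = "\t", header = TRUE)

hfile
hfile_tab
# Note: Excel (xls/xlsx) files need an add-on package or can be saved as csv first.
```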

24

• Simple manipulations with data frames

Fundamentals of Programming

head(hfile,1)

summary(hfile)

  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm

Create subsets: new <- hfile[1,]
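These commands can be tried on a hand-made copy of the example table. The numeric Height column and the 160 cm cut-off are assumptions for illustration.

```r
hfile <- data.frame(
  Name   = c("John", "Mary", "Peter"),
  Height = c(171, 155, 165)   # centimetres
)

head(hfile, 1)       # first row only
summary(hfile)       # per-column summary

new     <- hfile[1, ]                    # row 1, all columns
heights <- hfile[, 2]                    # column 2 as a vector
tall    <- hfile[hfile$Height > 160, ]   # rows that meet a condition
tall
```

As with matrices, the index before the comma selects rows and the index after it selects columns.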

25

• Simple statistics with R
• Load file "Sampledata-1.txt" into R
    studentprofile <- read.table("B://Users/bchhuyng/Desktop/Sampledata-1.txt", sep="\t", header=TRUE)

• View the data loaded into R
    studentprofile
    head(studentprofile)

• How many categories are there in the field "Gender"?
    factor(studentprofile$Gender)

Fundamentals of Programming
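Sampledata-1.txt is not included with this transcript, so the sketch below builds a simulated stand-in with the same column names (Gender, Height, Weight). All values are made up.

```r
set.seed(1)   # makes the simulated numbers reproducible
studentprofile <- data.frame(
  Gender = rep(c("M", "F"), times = 50),        # 50 of each, simulated
  Height = round(rnorm(100, mean = 165, sd = 8)),
  Weight = round(rnorm(100, mean = 60, sd = 9))
)

head(studentprofile)

gender <- factor(studentprofile$Gender)
levels(gender)    # the categories, sorted alphabetically: "F" "M"
nlevels(gender)   # number of categories: 2
table(gender)     # count per category: 50 each in this simulated data
```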

26

• The “factor” function in R stores values as categorical variables

Fundamentals of Programming

[Figure: individual M and F labels scattered across the slide]

27

• Usage of factor in plotting graphs

Fundamentals of Programming

Hu et al., 2013

28

• Usage of factor in plotting graphs

Fundamentals of Programming
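The plots from the original slides are not reproduced in this transcript. As one possible illustration of a factor driving a plot, using simulated data in place of Sampledata-1.txt:

```r
set.seed(1)
studentprofile <- data.frame(
  Gender = factor(rep(c("M", "F"), times = 50)),   # simulated
  Height = rnorm(100, mean = 165, sd = 8),
  Weight = rnorm(100, mean = 60, sd = 9)
)

# A factor on the right of ~ gives one box per category.
boxplot(Height ~ Gender, data = studentprofile,
        xlab = "Gender", ylab = "Height (cm)")

# A factor can also set the colour of each point.
point_col <- as.integer(studentprofile$Gender)   # F = 1, M = 2
plot(studentprofile$Weight, studentprofile$Height,
     col = point_col, pch = 19,
     xlab = "Weight (Kg)", ylab = "Height (cm)")
legend("topleft", legend = levels(studentprofile$Gender),
       col = seq_along(levels(studentprofile$Gender)), pch = 19)
```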

29

• Calculate the mean and the standard deviation of the height and weight of the students.

E.g.
    mean(studentprofile$Weight)
    median(studentprofile$Weight)

Fundamentals of Programming
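A sketch of the full exercise, including sd() for the standard deviation. The data are simulated because Sampledata-1.txt is not available here, so the printed numbers will differ from those of the real file.

```r
set.seed(1)
studentprofile <- data.frame(
  Height = rnorm(100, mean = 165, sd = 8),   # simulated
  Weight = rnorm(100, mean = 60, sd = 9)     # simulated
)

mean(studentprofile$Weight)
median(studentprofile$Weight)
sd(studentprofile$Weight)     # standard deviation

# sapply() applies one function to every selected column.
means <- sapply(studentprofile[, c("Height", "Weight")], mean)
sds   <- sapply(studentprofile[, c("Height", "Weight")], sd)
means
sds
```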

30

• Simple graph plotting with R
• View the distribution of height and weight of the 100 students (data from "Sampledata-1.txt")

plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (Kg)", ylab="Height(cm)", pch=19, cex=0.5)

Fundamentals of Programming

31

Fundamentals of Programming

32

• What is the distribution of height and weight amongst students?

Fundamentals of Programming

hist(studentprofile$Weight,xlab="Weight (Kg)", main = "Distributional Frequency of student weight", ylim=c(0,8), xlim=c(40,90), breaks = 51)

33

• What is the distribution of height and weight amongst students?

Fundamentals of Programming

hist(studentprofile$Height,xlab="Height (cm)", main = "Distributional Frequency of student height", ylim=c(0,8), xlim=c(140,190), breaks = 51)

34

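• Are the height and weight of the sampled students normally distributed?
    ks.test(studentprofile$Height, "pnorm", mean=mean(studentprofile$Height), sd=sd(studentprofile$Height))
    ks.test(studentprofile$Weight, "pnorm", mean=mean(studentprofile$Weight), sd=sd(studentprofile$Weight))
  (The sample mean and sd are passed in because pnorm on its own tests against the standard normal with mean 0 and sd 1.)

Fundamentals of Programming

H0: The data follow a specified distribution
H1: The data do not follow the specified distribution

p-value ≤ 0.05: Reject H0

p-value > 0.05: Do not reject H0

CAVEAT!!! http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/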
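A sketch of the Kolmogorov-Smirnov test on simulated heights (the real Sampledata-1.txt is not available here). It assumes the comparison should be against a normal distribution with the sample's own mean and sd; because those parameters are estimated from the same data, the p-value is only approximate.

```r
set.seed(1)
height <- rnorm(100, mean = 165, sd = 8)   # simulated heights in cm

# Compare the sample with a normal distribution of matching mean and sd.
res <- ks.test(height, "pnorm", mean = mean(height), sd = sd(height))
res

# Decision rule from the slide.
if (res$p.value <= 0.05) {
  decision <- "Reject H0"
} else {
  decision <- "Do not reject H0"
}
decision

# A normal Q-Q plot is a useful visual check alongside the test.
qqnorm(height)
qqline(height)
```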

35

• Are the height and weight of students linearly correlated?
    reg1 <- lm(studentprofile$Height ~ studentprofile$Weight)

Fundamentals of Programming
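A sketch of fitting and inspecting the regression. The data are simulated with an assumed linear relationship between Height and Weight, so the coefficients say nothing about the real sample file.

```r
set.seed(1)
weight <- rnorm(100, mean = 60, sd = 9)                # simulated
height <- 130 + 0.6 * weight + rnorm(100, sd = 5)      # assumed relationship
studentprofile <- data.frame(Height = height, Weight = weight)

# Height is the response, Weight is the predictor.
reg1 <- lm(Height ~ Weight, data = studentprofile)

coef(reg1)       # intercept and slope
summary(reg1)    # full table, including p-values and R-squared

# Pearson correlation between the two variables.
r <- cor(studentprofile$Height, studentprofile$Weight)
r_squared <- summary(reg1)$r.squared   # equals r^2 for a single predictor

# Scatter plot with the fitted line drawn in red.
plot(studentprofile$Weight, studentprofile$Height,
     xlab = "Weight (Kg)", ylab = "Height (cm)", pch = 19, cex = 0.5)
abline(reg1, col = 2)
```

Writing the formula as Height ~ Weight with data = studentprofile is equivalent to the $ form and gives shorter coefficient names.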

36

• Are the height and weight of students linearly correlated?

Fundamentals of Programming

37

Fundamentals of Programming

plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (Kg)", ylab="Height(cm)", pch=19, cex=0.5)

reg1 <- lm(studentprofile$Height~ studentprofile$Weight)

abline(reg1,col=2)

38

Intro checklist: what have you learnt today?

• How to install the R platform on your machine
• How to install R packages and dependencies
• How to get help and instructions
• How to use a library
• Variables and assigning values to variables
• Data types which R accepts
• Arithmetic manipulations of variables (+ - * / %% ** etc.)
• Browsing and managing your variables (ls, rm)
• Assigning vectors: the c() command
• Vector manipulations and referencing
• Matrices: declaration and manipulation (rows/columns), rbind
• Data frames: import from xls/csv/txt files and statistical manipulation
• Introducing data categorisation using the R datatype Factor
• Simple graph plotting
• More statistical analysis
• Simple example of linear regression

39

References

• Crawley, M.J. (2007) The R Book.
• Maindonald, J., and Braun, W.J. (2010) Data Analysis and Graphics Using R: An Example-Based Approach.
• Kabacoff, R.I. (2012) Quick-R: Data types. http://www.statmethods.net/input/datatypes.html Accessed on 7/1/2014
• King, W.B. (2010) Doing Arithmetic in R. http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html Accessed on 7/1/2014
• Ian (2011) Normality tests don’t do what you think they do. http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/ Accessed on 7/1/2014
• Joris Meys and Andrie de Vries. How to Test Data Normality in a Formal Way in R. http://www.dummies.com/how-to/content/how-to-test-data-normality-in-a-formal-way-in-r.html Accessed on 7/1/2014

40

Future classes on R and packages

• R has a very rich repertoire of packages
• Statistical analysis
• Microarray analysis
• NGS
• Etc.