Post on 07-Aug-2020


Introduction to R 21/11/2016

C3BI

Vincent Guillemot & Anne Biton

R: presentation and installation

Where?

https://cran.r-project.org/

How to install and use it?

- Follow the steps: you don’t need advanced rights to install it!
- Open the R GUI.
- Test a command: plot(-10:10, (-10:10)^2).
- Open an R script and save it in your working directory.

RStudio

https://www.rstudio.com/


The few commands you must know

Command        What it does

read.table     Read a tabulated file.
write.table    Write a matrix or data frame.
plot           Command for graphical representation.
x <- 1         Assign something (here 1) to object x.
1:10           Create a vector containing integers 1 to 10.
x[1:10]        Extract a subvector from x.
c(2, 5)        Create a vector containing 2 and 5.
A[, 2:5]       Extract columns 2 to 5 of matrix A.
DF$variable    Extract from data frame DF its column called variable.
?rnorm         Get help on the function called rnorm.
??gaussian     Get help on the topic gaussian.
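As a quick illustration, the commands in the table can be chained in a few lines. This is only a sketch: the objects x, A and DF are invented for the example, and a temporary file stands in for a real data file.

```r
# Hypothetical objects, invented for this illustration
x <- 1                       # assign the value 1 to x
x <- c(2, 5, 1:10)           # combine values into a vector of length 12
x[1:3]                       # extract the first three elements

A <- matrix(1:20, nrow = 4)  # a 4 x 5 matrix
A[, 2:5]                     # columns 2 to 5

DF <- data.frame(variable = x[1:4], other = letters[1:4])
DF$variable                  # the column called "variable"

# Write DF to a temporary file, then read it back
f <- tempfile(fileext = ".txt")
write.table(DF, file = f, sep = "\t", row.names = FALSE)
DF2 <- read.table(f, header = TRUE, sep = "\t")
```

Typing ?read.table or ??gaussian in the console opens the corresponding help pages.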

A beginner’s test

- If you already know the previous commands → move to the back of the room: you can work independently on the handout and the exercises, and go home whenever you are finished.

- If you are not familiar with these commands, move to the front of the room.

- In any case, please ask us any R-related question during the class!

Basic commands

Prompt

A prompt is a special character that appears in the R console:

- > means that R is waiting for a command;
- + means that R is waiting for the end of the current command;
- a blank prompt means that R is computing something.

E.g., type:

1+1
2*3
Sys.sleep(10)

Brackets

Brackets   Use

()         In functions, e.g. sin(2*pi).
[]         While indexing, e.g. x[1:2].
{}         In code blocks, e.g.

{
  x <- rnorm(10)
  y <- x[1:2]
  mean(y)
}

Writing your scripts: survival tips

1. Use spaces:

   - x <- -1 is OK,
   - x<--1 not so much...

2. Indent!
3. Save your scripts, it’s so easy with RStudio.
4. Comment, comment, comment (use #).

“You are collaborating with at least one person: your future self!”

– ∼ Hadley Wickham

What this course is about

- Write short R programs
- Read and predict the outcome of simple R functions / programs
- Make graphical representations
- Read data and write tables

To go further (or more slowly), there’s a lot of material available online: Quick-R, TryR, DataCamp, cookbook-r, etc.

Types of exercises

Three levels:

1. Copy & paste some code and see what it does.
2. Read some code and explain what it does.
3. Create your own code to answer a question.

Ex. 1

1. Copy and execute the following command: log(exp(2)).
2. What does this code do: log10(10^3)?
3. Find a function to run a t-test.

R Objects

Types. . .

The type of an object is directly associated with the way it is stored in memory:

- character: let <- "a"
- double: nbr <- 2.0
- integer: intg <- 1L
- logical: TRUE or T or FALSE or F
- Particular values: NA, +Inf, NaN
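The storage type can be checked with typeof. The short sketch below reuses the object names from the list above; the checks on the particular values are added for illustration.

```r
# typeof() reports how a value is stored in memory
let  <- "a"
nbr  <- 2.0
intg <- 1L
typeof(let)    # "character"
typeof(nbr)    # "double"
typeof(intg)   # "integer"
typeof(TRUE)   # "logical"

# Particular values
is.na(NA)      # TRUE
1 / 0          # Inf
0 / 0          # NaN
is.na(0 / 0)   # TRUE: NaN also counts as missing
```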

Types. . . and classes

The class of an object describes how different values are structured within the object:

- vector: v <- c("a", "b", "a")
- factor: fac <- factor(v)
- matrix: M <- matrix(1:4, 2, 2)
- data.frame: D <- data.frame(v, fac)
- list,
- etc.

Transformations

- as.integer
- as.numeric
- as.character
- as.factor
- as.vector
- ...
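A few conversions in action, as a minimal sketch. The values are invented for the example, and the last two lines show a common pitfall when converting factors.

```r
v <- c("10", "20", "3.5")
as.numeric(v)        # 10.0 20.0 3.5
as.integer(3.9)      # 3: the decimal part is dropped, not rounded
as.character(1:3)    # "1" "2" "3"

# Pitfall: converting a factor directly returns its internal level codes
fac <- factor(c("10", "20", "10"))
as.numeric(fac)                  # 1 2 1
as.numeric(as.character(fac))    # 10 20 10
```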

Classes

Here are the classes that you need to know:

- vectors and factors,
- matrices,
- data frames,
- lists,
- functions.

FAMuSSS

FAMuSSS: The Functional Single Nucleotide Polymorphisms Associated with Human Muscle Size and Strength Study

Load an RData file

In the RData file famusss.RData, there is an example of each of the 5 R classes we mentioned:

Name        Class        Content

ndrm.diff   Vector       Difference in strength in the non-dominant arm
snp1        Factor       SNP rs577x located in the gene ACTN3
M           Matrix       Matrix containing the age, height and weight of the individuals
D           Data-frame   Sample data extracted from the FAMuSSS data
L           List         List containing various objects
bmi         Function     Computes the BMI of an individual from their weight (lb) and height (in)

Ex. 2

1. Load the objects with the following command:

load("famusss.RData")

2. Print all the objects: what type of data do they contain?
3. What is the BMI of a 70-inch-tall person weighing 150 lb?
4. What does L$Dimensions do? What does names(L) do?
5. Extract the element called GenderTable from L.

Vectors

Create them with the combine function c or with the : operator:

x <- c(1, 10, -4, 5.0)
i <- 1:10

Access the elements of a vector with square brackets:

x[1]

## [1] 1

x[3:4]

## [1] -4 5

Factors

You can create factors in a number of ways; one of them is with the function gl:

f0 <- gl(n = 3, k = 6, labels = c("CRTL", "A", "B"))

Ex. 3

1. What does f0 == "A" do?
2. What do rep and seq do?
3. Create a vector called v of length 18.
4. What does v[f0 == "A"] do?
5. Extract from v the values for which f0 is equal to "B".

Matrices

A matrix is a two-dimensional kind of vector:

A <- matrix(0, 2, 2)
B <- matrix(c("un","deux","trois","quatre"), 2, 2)
A[1,]

## [1] 0 0

B[2,2]

## [1] "quatre"
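To see how the two indices work together, here is a small sketch on a matrix with distinct values. The matrix Z is hypothetical and only serves the illustration.

```r
# Hypothetical matrix, invented for this illustration
Z <- matrix(1:6, nrow = 2, ncol = 3)  # filled column by column
dim(Z)      # 2 3
Z[1, ]      # first row: 1 3 5
Z[, 2]      # second column: 3 4
Z[2, 3]     # a single element: 6
t(Z)        # transpose, a 3 x 2 matrix
```

Leaving an index empty, as in Z[1, ], selects everything along that dimension.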

Data frames

A data frame is a two-dimensional structure that allows different types for its columns:

D <- data.frame(a=1:10, b=letters[1:10], cos=cos(1:10))
D[1:2,2:3]

##   b        cos
## 1 a  0.5403023
## 2 b -0.4161468

D$a[3]

## [1] 3

D[[1]]

## [1] 1 2 3 4 5 6 7 8 9 10

Lists

In R, data frames are special lists:

L <- list(1:10, b=3, f=cos, char=letters[5:7])
names(L)

## [1] "" "b" "f" "char"
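The elements of a list can be reached by name or by position. The sketch below reuses the list L defined above; the small data frame at the end is invented to illustrate that data frames are lists.

```r
L <- list(1:10, b = 3, f = cos, char = letters[5:7])
L$b            # 3
L[["char"]]    # "e" "f" "g"
L[[1]]         # the unnamed first element: 1 2 ... 10
L$f(0)         # calls cos(0), which is 1

# Single brackets return a sub-list, double brackets the element itself
class(L["b"])     # "list"
class(L[["b"]])   # "numeric"

# A data frame is a list of columns of equal length
D <- data.frame(a = 1:3, b = letters[1:3])
is.list(D)     # TRUE
```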

Block of code

A block gathers several commands so that they can all be executed at once!

{
  a <- 1
  b <- 2
}

It is used in

- functions,
- loops (for, while...),
- control-flow constructs (?Control).

Functions

- Syntax: f <- function(arg1=, ...) {Commands}.
- f ends with a return.
- What can f return? Whatever you like (e.g. in a list).
- Indent!
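A minimal sketch of these points: a function with a default argument that returns two results in a named list. The function summarise_vector and its arguments are hypothetical, invented for this illustration.

```r
# Hypothetical function, invented for this illustration
summarise_vector <- function(x, na.rm = TRUE) {
  m <- mean(x, na.rm = na.rm)
  s <- sd(x, na.rm = na.rm)
  # Several results are returned together in a named list
  return(list(mean = m, sd = s))
}

res <- summarise_vector(c(1, 2, 3, NA))
res$mean   # 2
res$sd     # 1
```

If return is omitted, R returns the value of the last expression evaluated in the function body.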

Ex. 4

- Create a matrix filled with random numbers (rnorm).
- Compute the sum of each column (colSums).
- Which elements are > 0?
- Create a second matrix filled with 1s. It should have the same dimensions as the first matrix.
- Combine it with the first matrix (rbind or cbind).
- Write a function returning the square and the square root of a positive real number.

for loops

Repeat a block, depending on an iterator i, n times.

for (i in 1:10) {
  j <- i^2 + i + 1
  print(j)
}

In general, we want to save the result:

s <- rep(NA, 10)
for (i in 1:10) {
  s[i] <- i^2 + i + 1
}
s
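For comparison, the same result can be obtained without a loop, because arithmetic operators in R work element by element on vectors. This is a sketch; the name s_vec is invented for the example.

```r
# Loop version, as above
s <- rep(NA, 10)
for (i in 1:10) {
  s[i] <- i^2 + i + 1
}

# Vectorised version: the formula is applied to every element of i
i <- 1:10
s_vec <- i^2 + i + 1

all(s == s_vec)   # TRUE
```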

Ex. 5

What does this loop do?

library(tm)
library(stringr)
aveu <- removePunctuation(scan("phedre.txt", what = ""))
nba <- 0 ; nbe <- 0 ; nbi <- 0 ; nbo <- 0 ; nbu <- 0

for (mot in aveu) {
  nba <- nba + str_count(mot, "a")
  nbe <- nbe + str_count(mot, "e")
  nbi <- nbi + str_count(mot, "i")
  nbo <- nbo + str_count(mot, "o")
  nbu <- nbu + str_count(mot, "u")
}

c(a=nba, e=nbe, i=nbi, o=nbo, u=nbu)

if, else

The random p-value generator:

r <- runif(1)

if (r < 0.05) {
  print("Yippee!")
} else if (r < 0.1) {
  print("I still trust my result!")
} else {
  print(" :'( ")
}

Read and write data

Many available commands

Command       Read   Save

data          Yes    No
load          Yes    No
save          No     Yes
read.table    Yes    No
write.table   No     Yes
read.*        Yes    No
write.*       No     Yes

Correspondence

Figure 1: diagrammer

data

- Example: data(cars).
- Before and after: ls().
- Class of the loaded object: class(cars).
- Quick object exploration: str(cars).
- Only the beginning of the table: head(cars).

Working directory

You may (will) want to change the working directory in which your commands will look for data and save your outputs.

You can do this:

- with the commands setwd and getwd,
- in a much simpler way with RStudio: Session → Set working directory → ...

Tabulated data

- Column names,
- lines separated with an EOL (end of line),
- column separator (tab, ;, etc.),
- the same number of columns per line.

long and wide formats: a wide table

##     ctrl trt1 trt2
##  1: 4.17 4.81 6.31
##  2: 5.58 4.17 5.12
##  3: 5.18 4.41 5.54
##  4: 6.11 3.59 5.50
##  5: 4.50 5.87 5.37
##  6: 4.61 3.83 5.29
##  7: 5.17 6.03 4.92
##  8: 4.53 4.89 6.15
##  9: 5.33 4.32 5.80
## 10: 5.14 4.69 5.26

long and wide formats: a long table

##     values  ind
##  1:   4.17 ctrl
##  2:   5.58 ctrl
##  3:   5.18 ctrl
##  4:   6.11 ctrl
##  5:   4.50 ctrl
## ---
## 26:   5.29 trt2
## 27:   4.92 trt2
## 28:   6.15 trt2
## 29:   5.80 trt2
## 30:   5.26 trt2
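One possible way to move between the two formats in base R is with unstack and stack. This is a sketch under an assumption: the values printed above appear to match the built-in PlantGrowth dataset, so it is used here, but the slides do not say how the tables were produced.

```r
# Assumption: the tables above come from the built-in PlantGrowth data
wide <- unstack(PlantGrowth, weight ~ group)  # one column per group
head(wide, 3)

long <- stack(wide)   # back to two columns, named values and ind
head(long, 3)

dim(wide)   # 10 3
dim(long)   # 30 2
```

The wide table has one row per replicate and one column per condition; the long table has one row per measurement.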

read.table

5 important parameters:

- file → where the file is,
- header → whether the first line contains the names of the columns,
- sep → column separator,
- dec → decimal separator (3,1419 or 3.1419?),
- skip → how many lines should be skipped.
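The five parameters can be seen together in a small sketch. The file content below is hypothetical: one line to skip, a header, ";" as column separator and "," as decimal separator, written to a temporary file.

```r
# Hypothetical file content, invented for this illustration
txt <- c("exported on some day",
         "id;height;group",
         "1;1,75;a",
         "2;1,62;b")
f <- tempfile(fileext = ".csv")
writeLines(txt, f)

d <- read.table(file = f, header = TRUE, sep = ";", dec = ",", skip = 1)
str(d)
d$height   # 1.75 1.62
```

With the default dec = ".", the height column would be read as text rather than as numbers.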

write.table

4 important parameters:

- x → matrix or data.frame to save,
- file → where the file should be stored,
- sep → column separator,
- dec → decimal separator (3,1419 or 3.1419?).
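A round trip through a temporary file shows these parameters at work. This is a sketch with an invented data frame; row.names and quote are two further arguments of write.table that are not listed above.

```r
# Hypothetical data frame, invented for this illustration
D <- data.frame(id = 1:3, value = c(3.1419, 2.5, 10))
f <- tempfile(fileext = ".txt")

# row.names = FALSE and quote = FALSE keep the file minimal
write.table(x = D, file = f, sep = ";", dec = ",",
            row.names = FALSE, quote = FALSE)
readLines(f)   # the first line is the header "id;value"

# Reading back with the same sep and dec recovers the data
D2 <- read.table(f, header = TRUE, sep = ";", dec = ",")
all.equal(D$value, D2$value)   # TRUE
```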

save and load

- save can write any R object into an RData file.
- load reads RData files.

Example:

x <- 1:10 ; a <- "toto" ; objetaunomtreslong <- pi
save(x, a, objetaunomtreslong, file="Sauvegarde.RData")
rm(list=ls())
load("Sauvegarde.RData")
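A variant of the example that can be run without touching the workspace or the working directory: it writes to a temporary file and removes only the two objects it created. This is a sketch; the variable names f and loaded are invented for the illustration.

```r
x <- 1:10
a <- "toto"
f <- tempfile(fileext = ".RData")   # temporary file, so nothing is overwritten
save(x, a, file = f)

rm(x, a)             # remove only these two objects
loaded <- load(f)    # load() restores the objects and returns their names
loaded               # "x" "a"
x                    # 1 2 ... 10
```

Note that rm(list=ls()) deletes every object in the workspace, so it is worth saving first.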

Plots

plot

Syntax: plot(object, ...)!

Parameter     Role

main          Main title
xlab & ylab   Axis titles
xlim & ylim   Axis limits
type          Type of graph: points, lines, etc.
col           Color, e.g. "black", "red", "green"...

Ex. 6

Apply plot to any function, e.g. choose one among the built-in functions: sin, cos, exp, log, sqrt...

Ex. 7

With plot and grid, reproduce this plot:

Figure 2: image

Add points, and lines or a function

You can draw on an existing plot with the following commands:

- points to add points,
- lines to add lines,
- plot(f, add=TRUE, ...) to add a function.
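The three commands can be combined on a single graph, as in the sketch below. The data (sine and cosine on a grid of 20 points) are chosen only for illustration.

```r
x <- seq(0, 2 * pi, length.out = 20)
plot(x, sin(x), main = "Adding to an existing plot")   # base plot
points(x, cos(x), col = "red")                         # add points
lines(x, sin(x), col = "steelblue")                    # add a line through sin(x)
plot(cos, from = 0, to = 2 * pi, add = TRUE, lty = 2)  # add the function cos
```

The first call to plot fixes the axis limits; everything added afterwards is drawn inside them.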

Ex. 8

1. Generate two variables, x and y, linearly linked to one another (do not forget to add some noise).
2. Represent the scatter plot of the two variables with plot.
3. Add to the plot the underlying linear model with lines or plot.

Colors, dashes, symbols and width

4 important parameters:

- pch: to choose the type of point (circle, triangle, etc.),
- lty: (line type) to choose the line type,
- col: (color) to choose the color,
- lwd: (line width) to set the width.
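A short sketch using the four parameters on invented data:

```r
x <- 1:10
plot(x, x^2, pch = 17, col = "red")                  # filled triangles, in red
lines(x, x^2, lty = 2, lwd = 3, col = "steelblue")   # thick dashed line
lines(x, 10 * x, lty = 3, lwd = 1)                   # thin dotted line
```

The help page ?par lists the accepted values for lty and lwd, and ?points shows the available pch symbols.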

legend

Argument   Meaning

x, y       Legend position...
legend     Legend text.
bty        Type of box: "o" (with) or "n" (without).

Ex. 9

Add a legend to this graph:

plot(1:10, type="b", col="steelblue", lwd=2)

1. Add a legend at the following coordinates: (1, 7).
2. Add a legend without a box around it, in the upper left corner of the graph.
3. Add the legend wherever you want it with locator(1).

Combining plots. . .

. . . is easy with layout!

1. Create the layout, a matrix indicating the positions and order of the plots.
2. plot the graphs to populate the layout.

Ex:

x <- rnorm(100)     # Data
M <- rbind(1, 2:3)  # 3 graphs in the layout
layout(M)           # Create the layout and put the
plot(x)             # 1st ...
hist(x)             # 2nd ...
boxplot(x)          # and 3rd graphs

Here is the layout we used:

1

2 3

The resulting plot

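[Figure: three panels following the layout. Top: scatter plot of x against Index. Bottom left: "Histogram of x", with x on the horizontal axis and Frequency on the vertical axis. Bottom right: box plot of x.]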