R language introduction


Basic introduction to R Language. A quick guide

Transcript of R language introduction

Page 1: R language introduction

R Language


Page 2: R language introduction

• 1991: Created by Ross Ihaka and Robert Gentleman

• 2000: R version 1.0.0 is released

• Latest version is 2.15.2, released in Oct ’12

• R version 2.15.3 is scheduled for release in Mar ’13 and 3.0.0 is scheduled for release in Apr ’13

• http://www.r-project.org (basic information about R)

• http://www.cran.r-project.org (base system and additional packages)

• help() or ?help, help.search() or ??help

Background

Page 3: R language introduction

04/12/2023

• R is a free software environment for statistical computing and graphics

• Very active and vibrant user community

• Graphical capabilities

• Physical memory

• Base R and around 4000 packages

Background

Page 4: R language introduction

Introduction

memory.limit(): To find out the maximum amount of memory available to R (Windows only)

memory.size(): To find out how much memory is in use (Windows only)

getwd(): Shows the path of your current working directory

setwd(path): Allows you to set a new path for your current working directory

dir(): Lists all the files in your working directory

Program Editor (open, load, run, save)

• ls(): Lists all objects in your workspace

• rm(): Removes objects from your workspace
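The workspace and directory functions on this slide can be tried together in a short session. This is only a sketch: the file name example.txt and the object names tmp_a and tmp_b are made up for illustration, and a temporary directory is used so nothing in your real working directory is touched.

```r
# Illustrative session; example.txt, tmp_a and tmp_b are hypothetical names.
old_wd <- getwd()            # remember the current working directory
setwd(tempdir())             # switch to a scratch directory
file.create("example.txt")   # create an empty file so dir() has something to list
files <- dir()               # files in the (new) working directory

tmp_a <- 1:3
tmp_b <- "text"
objs <- ls()                 # names of objects in the workspace
rm(tmp_a)                    # remove one object by name

setwd(old_wd)                # restore the original working directory
```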

Page 5: R language introduction

• Commands to R are expressions (4/3) or assignments (x <- 4/3)

• R is case sensitive

• Everything in R is an object

• Normally R objects are accessed by their names, which are made up of letters, digits (0 to 9), periods (“.”) and underscores (“_”); a name must start with a letter, or with a period that is not followed by a digit

• Every object has a class

• R has 5 basic classes of objects

character

numeric (real numbers)

integer

complex

logical (TRUE / FALSE)

• The most basic object is a vector

• A vector can only contain objects of the same class
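As a quick check of the five basic classes, class() can be called on one value of each kind. The variable names below are made up for this sketch.

```r
# One value of each basic class; the result is a named character vector.
classes <- c(
  character = class("a"),
  numeric   = class(1.5),
  integer   = class(2L),
  complex   = class(1 + 2i),
  logical   = class(TRUE)
)
print(classes)

# Object names: letters, digits, "." and "_" are allowed.
my.var_2 <- 2      # valid name
.hidden  <- 3      # valid: starts with a period not followed by a digit
```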

Introduction

Page 6: R language introduction

• Ex.

x <- 1 # assignment

print(x) # explicit printing

x # auto printing

• Ex.

x <- c(0.5, 0.6) # numeric

x <- c(TRUE, FALSE) # logical

x <- c(T, F) # logical

x <- c("a", "b", "c", "d") # character

x <- 1:20 # integer

x <- c(1+0i, 2+4i) # complex

seq(from=1, to=10, by=1)

rep(c(1,2,3,4,5), times=2, each=2)

• Q.

x <- c(1.7, "a")

x <- c(TRUE, 2)

x <- c("a", TRUE)

• When different objects are mixed in a vector, coercion occurs so that every element in the vector is of same class
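The coercion rule can be checked directly on the three mixed vectors from the question. A minimal sketch (the names a, b and d are chosen here only to keep the three results apart):

```r
a <- c(1.7, "a")    # number + character  -> character
b <- c(TRUE, 2)     # logical + number    -> numeric (TRUE becomes 1)
d <- c("a", TRUE)   # character + logical -> character

class(a)
class(b)
class(d)
```

Coercion goes towards the most general class present, so mixing anything with a character value gives a character vector.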

Background

Page 7: R language introduction

• rep(x, times, length.out, each)

• rm()

• rm(list=ls(pattern="^test"))

• rm(list=ls(pattern="test"))

• rm(list=setdiff(ls(), "test"))

• rm(list=ls())
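The rep() arguments and the rm() patterns above are easier to follow on a small example. The sketch below runs the rm() calls inside a scratch environment (e, a name made up here) so that your real workspace is left alone; the object names are also hypothetical.

```r
# rep(): times repeats the whole vector, each repeats every element,
# length.out truncates or recycles to a fixed length.
r_times <- rep(1:3, times = 2)        # 1 2 3 1 2 3
r_each  <- rep(1:3, each = 2)         # 1 1 2 2 3 3
r_len   <- rep(1:3, length.out = 5)   # 1 2 3 1 2

# rm() with patterns, demonstrated in a scratch environment.
e <- new.env()
assign("test_1",  1, envir = e)
assign("test_2",  2, envir = e)
assign("my_test", 3, envir = e)
assign("other",   4, envir = e)

# "^test" matches only names that START with "test".
rm(list = ls(envir = e, pattern = "^test"), envir = e)
after_prefix <- ls(envir = e)         # "my_test" "other"

# "test" matches names that CONTAIN "test" anywhere.
rm(list = ls(envir = e, pattern = "test"), envir = e)
after_any <- ls(envir = e)            # "other"
```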

Session 1 Remaining

Page 8: R language introduction

• Ex.

x <- 0:6

x <- c("a", "b", "c")

x <- c(1, 2, 3)

Numbers in R are generally treated as numeric (i.e. double precision real numbers)

If you explicitly want an integer then you need to specify the L suffix (e.g. 1L)

Special number Inf (1/0): it is actually a real number, and 1/Inf will give you 0

Undefined value NaN (0/0), Not a Number: it can be thought of as a missing value

• # indicates comments
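A short sketch of the number rules on this slide (the variable names are illustrative):

```r
x_num <- 1      # plain numbers are double precision ("numeric")
x_int <- 1L     # the L suffix requests an integer

class(x_num)
class(x_int)

pos_inf <- 1 / 0      # Inf
zero    <- 1 / Inf    # 0
not_num <- 0 / 0      # NaN

is.nan(not_num)       # TRUE
is.na(not_num)        # TRUE: NaN also counts as missing
is.nan(NA)            # FALSE: NA is missing but is not NaN
```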

Introduction

Page 9: R language introduction

R objects can have attributes (attributes())

class (class())

length (length())

names (colnames for a matrix), dimnames (rownames, colnames for a matrix)

dimensions (dim())

other user-defined attributes

Various data types in R

• Vectors

vector(mode, length)

• Lists: Special type of vectors which can contain objects of different classes.

x <- list(1, 2, 3, "a", "b", "c")

x <- list(a=c(1,2,3), b=1:4, c=c("a","b","c"))
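The attribute functions and the two constructors can be explored as follows. This is a sketch; the "units" attribute is a made-up example of a user-defined attribute.

```r
# vector(mode, length) creates an empty vector of the given mode.
v <- vector("numeric", length = 3)     # 0 0 0

# A list can hold elements of different classes.
x <- list(a = c(1, 2, 3), b = 1:4, c = c("a", "b", "c"))
names(x)                               # "a" "b" "c"
length(x)                              # 3
elem_classes <- sapply(x, class)       # class of each list element

# A matrix carries a dim attribute.
m <- matrix(1:6, nrow = 2, ncol = 3)
attributes(m)                          # $dim: 2 3

# User-defined attribute (hypothetical name "units").
attr(v, "units") <- "cm"
attributes(v)
```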

Data Types

Page 10: R language introduction

Matrix: vectors with dimension attribute. Dimension itself is an integer vector of length 2 (nrow, ncol). Matrices are constructed column wise.

m <- matrix(nrow=2, ncol=3)

m <- matrix(1:6, nrow=2, ncol=3)

x <- 1:3

y <- 10:12

cbind(x, y)

rbind(x,y)
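To see the column-wise filling and the difference between cbind() and rbind(), the calls above can be run and inspected like this (cb and rb are names introduced for the sketch):

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m          # first column holds 1 and 2, second 3 and 4, third 5 and 6
dim(m)     # 2 3

x <- 1:3
y <- 10:12
cb <- cbind(x, y)   # x and y become columns: 3 rows, 2 columns
rb <- rbind(x, y)   # x and y become rows:    2 rows, 3 columns
```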

Data frames (data.frame())

https://stat.ethz.ch/pipermail/r-help/attachments/20101027/05a229bb/attachment.pl

Factors: Used for categorical data, e.g. Male & Female, or analyst, senior analyst, manager etc.

x <- factor(c("a", "b", "b", "c", "c", "c", "d"))

levels(x)

unclass(x)

levels(x[4:6])

levels(x[4:6, drop=TRUE])
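A sketch of the factor calls, reading the last two lines of the slide as levels(x[4:6]) and levels(x[4:6, drop=TRUE]) (an assumption about what the slide intended):

```r
x <- factor(c("a", "b", "b", "c", "c", "c", "d"))

levels(x)                 # "a" "b" "c" "d"
codes <- unclass(x)       # underlying integer codes plus a levels attribute
table(x)                  # counts per level

# Subsetting keeps all original levels unless drop = TRUE is given.
lv_keep <- levels(x[4:6])                 # still all four levels
lv_drop <- levels(x[4:6, drop = TRUE])    # only the level actually present
```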

Data Types

Page 11: R language introduction

Converting a character variable to a date variable

as.Date(variable_name, input_format)

strptime(variable_name, input_format)

Output will be in the format %Y-%m-%d %H:%M:%S

%Y: Year with century

%m: Month as decimal number (01-12)

%d: Day of the month as decimal number (01-31)

%H: Hours as decimal number (00-23)

%M: Minutes as decimal number (00-59)

%S: Seconds as decimal number (00-59)

Converting a date variable to a character variable / formatting a date variable

strftime(date_variable_name, output_format)

format(date_variable_name, output_format)

as.character(date_variable_name, output_format)
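A minimal sketch of converting in both directions. The date strings are made up, only numeric format codes are used so the result does not depend on the locale, and tz = "UTC" is an assumption chosen to keep the example reproducible.

```r
# Character -> date / date-time
d  <- as.Date("27/10/2010", format = "%d/%m/%Y")
dt <- strptime("2010-10-27 14:05:09",
               format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(d)     # "Date"
class(dt)    # "POSIXlt" "POSIXt"

# Date -> character, with a chosen output format
d_iso  <- format(d, "%Y-%m-%d")
d_dots <- format(d, "%d.%m.%Y")
t_hm   <- format(dt, "%H:%M")
strftime(d, "%Y")    # strftime() takes the same format codes
```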

Date & Time

Page 12: R language introduction

[ always returns an object of the same class as the original; can be used to select more than one element

[[ is used to extract elements of a list or data frame; it can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame

$ is used to extract elements of a list or data frame by name; semantics are similar to [[
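The difference between the three operators shows up in the class of what comes back. A small sketch on a list and a matrix (the element names are illustrative):

```r
x <- list(var_1 = 1:10, var_2 = c("a", "b", "c"), var_3 = 0.6)

class(x[1])       # "list": [ keeps the container class
class(x[[1]])     # "integer": [[ returns the element itself
x$var_2           # same element as x[["var_2"]]

# [[ accepts a computed name; $ takes the name literally.
name <- "var_1"
by_var  <- x[[name]]   # the element called var_1
by_dollar <- x$name    # NULL: there is no element literally called "name"

# For matrices, [ drops dimensions by default.
m <- matrix(1:6, 2, 3)
m[1, 2]                   # plain number
m[1, 2, drop = FALSE]     # 1 x 1 matrix
```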

Sub-setting

Page 13: R language introduction

<: Less than

<=: Less than or equal to

>: Greater than

>=: Greater than or equal to

==: Exactly equal to

!=: Not equal to

| or ||: OR

& or &&: AND

!: NOT
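The single-character forms (| and &) work element by element on vectors, while the doubled forms (|| and &&) are meant for single TRUE/FALSE values, for example inside if(). A short sketch with a made-up vector:

```r
x <- c(1, 5, 10)

gt4     <- x > 4               # FALSE TRUE TRUE
between <- x > 4 & x < 8       # FALSE TRUE FALSE
outside <- x < 2 | x > 8       # TRUE FALSE TRUE
not5    <- !(x == 5)           # TRUE FALSE TRUE

# && and || compare single values and stop as soon as the result is known.
both_positive <- (x[1] > 0) && (x[2] > 0)   # TRUE
```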

Operators

Page 14: R language introduction

x <- c("a", "b", "c", "c", "d", "a")

x[1], x[1:4], x[x > "a"], u <- x > "a"

x <- matrix(1:6, 2, 3)

x[1,2], x[1,], x[,1], x[1,2, drop=FALSE]

x <- list(var_1=c(1:10), var_2=c("a", "b", "c"), var_3=0.6)

x[1], x[[1]], x$var_1

name <- "var_1"

x[name], x[[name]], x$name

x[c(1,3)], x[[c(1,3)]], x[[1]][[3]]

• Produce a character vector containing var_1, var_2, var_3 … var_999

• Remove missing values from x <- c(1, 2, 3, NA, 4, 5, NA, 6)

• y <- c("a", "b", NA, NA, "c", "d", "e", "f"); prepare a matrix containing two columns x & y that does not have any missing values

• What is the sum & mean of Wind for the observations which have temperature greater than 60 & month equal to 5

• How to create a new directory with a given name
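One possible way to approach the exercises above is sketched below; other solutions are equally valid. It assumes the Wind, temperature and month questions refer to the built-in airquality data set (columns Wind, Temp, Month), and the directory name my_new_dir is made up.

```r
# 1. var_1 ... var_999
vars <- paste0("var_", 1:999)

# 2. Remove missing values from x
x <- c(1, 2, 3, NA, 4, 5, NA, 6)
x_clean <- x[!is.na(x)]

# 3. Two-column matrix of x and y without missing values
y <- c("a", "b", NA, NA, "c", "d", "e", "f")
keep <- complete.cases(x, y)            # TRUE where neither x nor y is NA
m <- cbind(x = x[keep], y = y[keep])    # character matrix: cbind coerces x

# 4. Sum and mean of Wind where Temp > 60 and Month == 5
sel <- airquality$Temp > 60 & airquality$Month == 5
wind_sum  <- sum(airquality$Wind[sel])
wind_mean <- mean(airquality$Wind[sel])

# 5. Create a new directory (inside tempdir() to keep the sketch harmless)
new_dir <- file.path(tempdir(), "my_new_dir")
dir.create(new_dir)
```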

Some Examples

Page 15: R language introduction

Principal functions for reading data into R

read.table(), read.csv(): Used for reading tabular data

readLines(): For reading lines of a text file

source(): For reading in R code files

dget(): For reading in R objects that were deparsed to text with dput()

load(): For reading in saved workspaces

unserialize(): For reading single R objects in binary form

• Principal functions for writing data to files

write.table()

writeLines()

dump()

dput()

save()

serialize()
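Each writing function pairs with a reading function. The sketch below round-trips a small made-up object through three of the pairs, using temporary files so nothing is left behind in the working directory.

```r
obj <- list(a = 1:3, b = "text")

# dput() / dget(): text representation of a single object
f_txt <- tempfile(fileext = ".R")
dput(obj, file = f_txt)
obj_text <- dget(f_txt)

# save() / load(): binary file, restores objects under their original names
f_rda <- tempfile(fileext = ".RData")
save(obj, file = f_rda)
rm(obj)
load(f_rda)                      # obj exists again

# serialize() / unserialize(): single object as raw bytes
bytes <- serialize(obj, connection = NULL)
obj_bin <- unserialize(bytes)

# writeLines() / readLines(): plain text lines
f_lines <- tempfile(fileext = ".txt")
writeLines(c("first", "second"), f_lines)
lines_back <- readLines(f_lines)
```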

Reading / Writing Data Set

Page 16: R language introduction

read.table() is one of the most commonly used functions for reading data. A few important arguments:

file, name of the file to be read

header, logical indicating if the file has a header line

sep, a string indicating how the columns are separated

colClasses, a character vector indicating the class of each column in the dataset

nrows, the maximum number of rows to be read from the dataset

na.strings, a character vector of strings which are to be interpreted as NA values

comment.char, a character string indicating the comment character

skip, number of lines to skip from the beginning

stringsAsFactors, logical indicating whether character variables should be coded as factors

write.table()

x, the object to be written, preferably a matrix or a data frame

file, path and name of the file to be created

sep, a string indicating how the columns are separated

row.names, col.names, logical indicating whether the row names or column names are to be written along with x
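The arguments listed above can be seen working together in a write-then-read round trip. The data frame and its column names are invented for this sketch, and a temporary file stands in for a real path.

```r
df <- data.frame(id    = 1:3,
                 name  = c("x", "y", NA),
                 score = c(1.5, NA, 3),
                 stringsAsFactors = FALSE)

f <- tempfile(fileext = ".csv")

# Write a comma-separated file with a header line and no row names.
write.table(df, file = f, sep = ",", row.names = FALSE, col.names = TRUE)

# Read it back, stating the column classes and the NA marker explicitly.
back <- read.table(f,
                   header = TRUE,
                   sep = ",",
                   colClasses = c("integer", "character", "numeric"),
                   na.strings = "NA",
                   nrows = 10,             # upper bound on rows to read
                   comment.char = "",      # no comment character in this file
                   stringsAsFactors = FALSE)
str(back)
```

Giving colClasses and nrows is optional, but on large files it saves read.table() from having to guess column types.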

Importing / Exporting Data

Page 17: R language introduction

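attach(x): Attaches a data frame (or list) to the search path so that its columns can be referred to by name

detach(x): Removes a previously attached data frame from the search path

• summary(x): For displaying summary statistics of a data set

• str(x): For compactly displaying the structure of a data set (a different view than summary())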

• sort(): For sorting a vector or factor

• order(): For ordering along more than one variable

• merge(): Merge two data frames by common columns or row names, or do other versions of database join operations

• cut(x, breaks, labels): Divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on. cut(x, 10, 1:10)
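A sketch of the sorting, merging and binning functions on small made-up data (all object and column names are illustrative):

```r
v <- c(3, 1, 2)
sorted <- sort(v)                       # 1 2 3

# order() returns row positions, so it can sort by several columns at once.
df <- data.frame(g = c("b", "a", "b", "a"), n = c(2, 2, 1, 1))
ord <- order(df$g, df$n)                # by g, then by n within g
df_sorted <- df[ord, ]

# merge() joins on common columns; by default only matching rows are kept.
left   <- data.frame(id = 1:3, x = c("a", "b", "c"))
right  <- data.frame(id = 2:4, y = c(TRUE, FALSE, TRUE))
merged <- merge(left, right, by = "id") # ids 2 and 3

# cut() codes values by the interval they fall into.
bins <- cut(c(1, 5, 10), breaks = c(0, 3, 7, 10),
            labels = c("low", "mid", "high"))

summary(df)
str(df)
```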

Data Summary / Manipulation

Page 18: R language introduction

• pretty(x, n): Compute a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x.

pretty(x, 100)

• substr(x, start, stop) <- value: Extract or replace substrings in a character vector.

• strsplit(): Split the elements of a character vector x into substrings according to the matches to substring split within them.

• rank(): Returns the sample ranks of the values in a vector. Ties (i.e., equal values) and missing values can be handled in several ways

• aggregate(): Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

• ddply() (plyr package): For each subset of a data frame, apply function then combine results into a data frame.
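The base R functions on this slide can be tried as below. ddply() is left out because it needs the plyr package to be installed; the strings and numbers are made up, and the aggregate() call assumes the built-in airquality data set.

```r
# pretty(): round break points covering the range of the data
pretty(c(0, 97), n = 5)

# substr(): extract, or replace in place
s <- "statistics"
first4 <- substr(s, 1, 4)        # "stat"
substr(s, 1, 1) <- "S"           # s is now "Statistics"

# strsplit(): returns a list with one element per input string
parts <- strsplit("a,b,c", split = ",")[[1]]

# rank(): ties get the average rank by default
r <- rank(c(10, 30, 20, 20))     # 1.0 4.0 2.5 2.5

# aggregate(): mean Wind for each Month
agg <- aggregate(Wind ~ Month, data = airquality, FUN = mean)
```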

Data Summary / Manipulation

Page 19: R language introduction

Allows you to control the flow of execution of the program

if, else (testing a condition)

if (condition) {do something} else if (another condition) {do something different} else {do something different}

for (executing a loop a fixed number of times)

for (i in 1:10) {do something}

while (executing a loop while a condition is true)

while (condition) {do something}

repeat (execute an infinite loop)

break (break the execution of a loop)

next (skip an iteration of a loop)

return (exit a function)

• Create a vector with all integers from 1 to 1000 and replace all even numbers by their inverse
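One way to solve the exercise with an explicit loop, next to the vectorised form that is usually preferred in R, plus a small repeat loop showing next and break. This is a sketch; the variable names are illustrative.

```r
# Loop version: replace each even number by its inverse.
x <- as.numeric(1:1000)          # numeric, because 1/2 is not an integer
for (i in seq_along(x)) {
  if (x[i] %% 2 == 0) {
    x[i] <- 1 / x[i]
  }
}

# Vectorised version of the same task.
y <- as.numeric(1:1000)
even <- y %% 2 == 0
y[even] <- 1 / y[even]

# repeat with next and break: add up the odd numbers below 10.
total <- 0
i <- 0
repeat {
  i <- i + 1
  if (i %% 2 == 0) next    # skip even numbers
  if (i > 9) break         # leave the loop
  total <- total + i
}
```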

Control Structures

Page 20: R language introduction

lapply: Returns a list of the same length as X, each element of which is the result of applying FUN to the corresponding element of X

lapply(airquality, mean)

• Calculate sum of all the variables of the airquality dataset excluding NAs

sapply: sapply is a user-friendly version of lapply, by default returning a vector or matrix if appropriate

sapply(airquality, mean)

• Repeat the problem presented for lapply using sapply and see the difference

apply: Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix

apply(airquality, 1, sum)

• Calculate deciles including min and max of all the variables of the dataset airquality excluding NAs

• Calculate square of each element of a matrix with dimensions 10 & 2 and entries 1 to 20
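A possible set of answers to the exercises on this slide, using the built-in airquality data set. It is a sketch, not the only solution; extra arguments such as na.rm = TRUE are simply passed on to the applied function.

```r
# Sum of every variable, ignoring NAs: list result vs. vector result
sums_list <- lapply(airquality, sum, na.rm = TRUE)
sums_vec  <- sapply(airquality, sum, na.rm = TRUE)

# Deciles (0%, 10%, ..., 100%) of every variable, ignoring NAs.
# sapply() simplifies the result to a matrix: one column per variable.
deciles <- sapply(airquality, quantile,
                  probs = seq(0, 1, by = 0.1), na.rm = TRUE)

# Square each element of a 10 x 2 matrix holding 1 to 20.
m  <- matrix(1:20, nrow = 10, ncol = 2)
sq <- apply(m, c(1, 2), function(v) v^2)   # margins 1 and 2: every cell
# m^2 gives the same values directly, since arithmetic is vectorised.
```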

Loop Functions

Page 21: R language introduction

tapply: Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors

tapply(airquality$Ozone, airquality$Month, sum)

• Calculate sum of Ozone variable for observations having month equal to 5

mapply: mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each argument, the second elements, the third elements, and so on

mapply(rep, 1:4, 4:1)

• Calculate sum of two lists with dimensions 10 & 2 and having entries 1 to 20, 101 to 120, 201 to 220 & 301 to 320
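A sketch of tapply() and mapply() on the built-in airquality data. na.rm = TRUE is added here because Ozone contains missing values; without it the monthly sums come back as NA. The pair_sums example is a small illustration of element-wise pairing, not a full answer to the list exercise.

```r
# Sum of Ozone per month; the result is named by month.
by_month <- tapply(airquality$Ozone, airquality$Month, sum, na.rm = TRUE)
may_sum  <- by_month[["5"]]

# mapply() walks over its arguments in parallel:
# rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1)
reps <- mapply(rep, 1:4, 4:1)

# Element-wise sums of two sequences.
pair_sums <- mapply(function(a, b) a + b, 1:3, 101:103)   # 102 104 106
```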

Loop Functions

Page 22: R language introduction

plot(x,y)

hist(x)

par()

pch: plotting symbol

lty: line type

lwd: line width

col: plotting color

las: axis label orientation

bg: background color

mar: margin size

oma: outer margin size

mfrow: number of plots per row, column (plots are filled row-wise)

mfcol: number of plots per row, column (plots are filled column-wise)

Plotting Functions

Page 23: R language introduction

lines: add lines to the plot

points: add points to the plot

text: add text labels to the plot

title: add annotations to x, y axis labels, title, subtitle, outer margin

mtext: add text to the margins of the plot

axis: add axis ticks/labels
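The par() settings and the annotation functions can be combined as in the sketch below. It draws into a temporary PDF file so it also runs in a non-interactive session; drop the pdf() and dev.off() lines to draw on screen. The data come from the built-in airquality data set, and the colours, labels and symbols are arbitrary choices.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)                                    # open a file graphics device

old_par <- par(mfrow = c(1, 2),             # 1 row, 2 columns of plots
               mar = c(4, 4, 2, 1),         # margins: bottom, left, top, right
               las = 1)                     # horizontal axis labels

# Left panel: scatter plot with added line, point, text and title
plot(airquality$Wind, airquality$Temp,
     pch = 19, col = "blue", xlab = "Wind", ylab = "Temp")
lines(lowess(airquality$Wind, airquality$Temp), lty = 2, lwd = 2)
points(mean(airquality$Wind), mean(airquality$Temp), pch = 4, col = "red")
text(mean(airquality$Wind), mean(airquality$Temp), labels = "mean", pos = 4)
title(main = "Temp vs Wind")

# Right panel: histogram with margin text
hist(airquality$Temp, main = "", xlab = "Temp")
mtext("airquality", side = 3, line = 0.5)

par(old_par)                                # restore previous settings
dev.off()                                   # close the device, write the file
```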

Plotting Functions

Page 24: R language introduction

function(): arguments are matched in the order exact match -> partial match -> positional match

Return value of a function is the last expression in the function body to be evaluated

Functions can be nested, so that a function can be defined inside another function

Functions can be passed as arguments to other functions
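The points above can be illustrated with three small functions. The names power, make_adder and apply_twice are made up for this sketch.

```r
# The value of the last evaluated expression is returned.
power <- function(base, exponent = 2) {
  base^exponent
}

p_pos     <- power(3)                       # positional match, default exponent
p_exact   <- power(exponent = 3, base = 2)  # exact name match, any order
p_partial <- power(2, exp = 3)              # "exp" partially matches "exponent"

# A function defined inside another function; it remembers n.
make_adder <- function(n) {
  function(x) x + n
}
add5 <- make_adder(5)

# A function passed as an argument to another function.
apply_twice <- function(f, x) f(f(x))
twice <- apply_twice(add5, 1)
```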

Functions

Page 25: R language introduction

• Primary tools for debugging functions in R

traceback: prints out the function call stack after an error occurs; does nothing if there is no error

debug: flags a function for debug mode which allows you to step through execution of a function one line at a time

browser: suspends the execution of a function whenever it is called and puts the function in debug mode

trace: allows you to insert debugging code into a function at specific places

recover: allows you to modify the error behavior so that you can browse the function call stack

Debugging

Page 26: R language introduction

Indications that something is not right

message: a generic notification/diagnostic message produced by the message function; execution of the function continues

warning: an indication that something is wrong but not necessarily fatal, produced by the warning function; execution of the function continues

error: an indication that a fatal problem has occurred, produced by the stop function; execution stops

condition: a generic concept for indicating that something unexpected can occur; programmers can create their own conditions
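The three kinds of condition can be produced and caught as in the sketch below. The function check_value is made up for illustration, and tryCatch() is base R's condition handler, which the slide itself does not mention.

```r
check_value <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")   # error: execution stops
  if (x < 0) warning("x is negative")             # warning: execution continues
  message("checking done")                        # message: execution continues
  sqrt(abs(x))
}

# Catch the error and keep its text instead of halting.
res_err <- tryCatch(check_value("a"),
                    error = function(e) conditionMessage(e))

# Catch the warning; the handler's value replaces the function result.
res_warn <- tryCatch(check_value(-4),
                     warning = function(w) conditionMessage(w))

# Normal call, with the diagnostic message silenced.
res_ok <- suppressMessages(check_value(4))
```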

Debugging

Page 27: R language introduction

Thanks a lot