R part iii

9
Data available in R The command data() loads data-sets available in R head() and tail() command displays first few or last few values str() shows the structure of an R object class() shows the class of an R object What does “ts” stand for? > data() > data("AirPassengers") > head(AirPassengers) [1] 112 118 132 129 121 135 > tail(AirPassengers) [1] 622 606 508 461 390 432 > str(AirPassengers) Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ... > class(AirPassengers) [1] "ts" > help(ts)

description

Praxis Weekend Business Analytics

Transcript of R part iii

Page 1: R part iii

Data available in R

• The command data() loads data-sets available in R• head() and tail() command displays first few or last few

values• str() shows the structure of an R object• class() shows the class of an R object• What does “ts” stand for?

> data()> data("AirPassengers")> head(AirPassengers)[1] 112 118 132 129 121 135> tail(AirPassengers)[1] 622 606 508 461 390 432> str(AirPassengers) Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ... > class(AirPassengers)[1] "ts"> help(ts)

Page 2: R part iii

Try runif() and plot() commands ….runif(10) [1] 0.14350413 0.54293576 0.62881627 0.30278850 0.28030129 0.03784996 0.49483957 [8] 0.23571517 0.40072956 0.20327478> plot(runif(10))

The runif() command generates U(0,1)10 random numbers between 0 and 1.

These numbers have been plotted by the plot() function.

Page 3: R part iii

A dataset in R: iris> head(iris) Sepal.Length Sepal.Width Petal.Length Petal.Width Species1 5.1 3.5 1.4 0.2 setosa2 4.9 3.0 1.4 0.2 setosa3 4.7 3.2 1.3 0.2 setosa4 4.6 3.1 1.5 0.2 setosa5 5.0 3.6 1.4 0.2 setosa6 5.4 3.9 1.7 0.4 setosa

The iris dataset contains measurement of 150 flowers, 50 each from 3 species : iris setosa, versicolor and virginica.

> tail(iris)

Sepal.Length Sepal.Width Petal.Length Petal.Width Species145 6.7 3.3 5.7 2.5 virginica146 6.7 3.0 5.2 2.3 virginica147 6.3 2.5 5.0 1.9 virginica148 6.5 3.0 5.2 2.0 virginica149 6.2 3.4 5.4 2.3 virginica150 5.9 3.0 5.1 1.8 virginica

Page 4: R part iii

Data frame in R: iris

• As you see, iris is not a simple vector but a composite “data frame” object made up of several component vectors as you can see in the output of class(iris)

• You can think of a data frame as a matrix-like object - each row for each observational unit (here, a flower) - each column for each measurement made on the unit• But the str() function gives you more concise description

on iris.

> str(iris)'data.frame':150 obs. of 5 variables: $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ... $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ... $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ... $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...> class(iris)[1] "data.frame"

Page 5: R part iii

Use of $ operator: iris

Note that $-operator extracts individual components of a data frame. Try summary() and IQR() commands on iris$Sepal.Length and study the data

> iris$Sepal.Length [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7 5.1 [21] 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9 4.4 5.1 [41] 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 [61] 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 [81] 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7[101] 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0[121] 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9[141] 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9

Page 6: R part iii

summary() command: iris

• Note the different output formats of using summary()• Species is summarized (by frequency distribution) as it is a

categorical variable • The entire data frame iris is summarized by combining the

summaries of its components

> summary(iris$Sepal.Length) Min. 1st Qu. Median Mean 3rd Qu. Max. 4.300 5.100 5.800 5.843 6.400 7.900 > summary(iris$Species) setosa versicolor virginica 50 50 50 > summary(iris) Sepal.Length Sepal.Width Petal.Length Petal.Width Species Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50 Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50 Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800 Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500

Page 7: R part iii

class() command: iris

• Note that each R object has a class (“numeric”, “factor” etc.) • summary() is referred to as a generic function• When summary() is applied, R figures out the appropriate

method and calls it

> class(iris$Sepal.Length)[1] "numeric"> class(iris$Species)[1] "factor"> class(iris)[1] "data.frame"

Page 8: R part iii

More on summary() command

• Objects of class “factor” are handled by summary.factor() • “data.frame”s are handled by summary.data.frame()• Numeric vectors are handled by summary.default()

> methods(summary) [1] summary.aov summary.aovlist summary.aspell* [4] summary.connection summary.data.frame summary.Date [7] summary.default summary.ecdf* summary.factor [10] summary.glm summary.infl summary.lm [13] summary.loess* summary.manova summary.matrix [16] summary.mlm summary.nls* summary.packageStatus* [19] summary.PDF_Dictionary* summary.PDF_Stream* summary.POSIXct [22] summary.POSIXlt summary.ppr* summary.prcomp* [25] summary.princomp* summary.srcfile summary.srcref [28] summary.stepfun summary.stl* summary.table [31] summary.tukeysmooth*   Non-visible functions are asterisked

Page 9: R part iii

Try the following ….

• attach() and detach() with iris• xx <- 1:12 and then dim(xx) <- c(3,4)• apply nrow(xx) and ncol(xx)• dim(xx) <- c(2,2,3)• yy <- matrix(1:12, nrows=3, byrow=TRUE• rownames(yy) <- LETTERS[1:3]• use colnames()• zz <- cbind(A=1:4, B=5:8, C=9:12)• rbind(zz,0)