Data Manipulation and Statistical analysis - Analysis of Variance





Introduction to R:

Data Manipulation and Statistical analysis

Violeta I. Bartolome, Senior Associate Scientist

Crop Research Informatics Laboratory

International Rice Research Institute

Analysis of Variance

Sample Data from a Randomized Complete Block (RCB) Experiment

Save the data in comma-delimited (CSV) format

Import file to R

mydata <- read.table(
  "RCBdata.csv",   # name of csv file
  sep = ",",       # separator
  header = TRUE)   # first row holds the column names
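A self-contained sketch of the import step. It assumes a small hypothetical RCB layout (columns block, trt and yield with made-up values) because the contents of RCBdata.csv are not shown, and it writes that file to a temporary location so the code runs anywhere. read.csv() is a base R wrapper around read.table() with header = TRUE and sep = "," already set.

```r
# Hypothetical RCB data: 3 treatments in 4 blocks (made-up yields)
rcb <- data.frame(
  block = rep(1:4, each = 3),
  trt   = rep(c("T1", "T2", "T3"), times = 4),
  yield = c(5.1, 6.0, 4.8, 5.4, 6.3, 5.0, 4.9, 5.8, 4.7, 5.2, 6.1, 5.1)
)

# Save in comma-delimited format (stand-in for RCBdata.csv)
csv_file <- tempfile(fileext = ".csv")
write.csv(rcb, csv_file, row.names = FALSE)

# Import, as on the slide
mydata <- read.table(csv_file, sep = ",", header = TRUE)
str(mydata)   # 12 obs. of 3 variables: block, trt, yield
```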


factor()

• factor() is used to encode a vector as a factor

• Example
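A minimal sketch of why factor() matters here: block codes imported from a CSV file arrive as numbers, and aov() would then fit them as a single continuous covariate (1 df) instead of a grouping variable. The vector below is hypothetical; with imported data the same step would be written mydata$block <- factor(mydata$block).

```r
block <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)   # hypothetical block codes, read as numbers
is.numeric(block)                      # TRUE: would be treated as a covariate

block <- factor(block)                 # encode as a factor (grouping variable)
levels(block)                          # "1" "2" "3"
nlevels(block)                         # 3
```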


Factor

• A vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length

• Two types of factor
o Unordered – levels are binary or nominal
o Ordered – levels follow a natural order (ordinal), e.g., quantitative levels such as fertilizer rates
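The two types can be illustrated with hypothetical vectors: factor() creates an unordered factor by default, while ordered = TRUE (with the levels listed in their natural order) creates an ordered one, for which comparisons such as < are meaningful.

```r
# Unordered (nominal): variety labels have no natural order
variety <- factor(c("Var_B", "Var_A", "Var_C", "Var_A"))
levels(variety)       # "Var_A" "Var_B" "Var_C" (sorted alphabetically)
is.ordered(variety)   # FALSE

# Ordered: nitrogen rates have a natural order
n_rate <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"), ordered = TRUE)
is.ordered(n_rate)    # TRUE
n_rate < "high"       # TRUE FALSE TRUE TRUE
```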


levels()

• Provides access to the levels attribute of a variable.

• Usage:

levels(x) # returns the value of the levels

levels(x) <- value # sets the attribute

• Example
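A short sketch of both usages on a hypothetical treatment factor. Assigning to levels(x) relabels the existing levels in order; it does not reorder or recode the observations.

```r
trt <- factor(c("T1", "T2", "T3", "T1", "T2", "T3"))   # hypothetical treatments
levels(trt)                                           # "T1" "T2" "T3"

levels(trt) <- c("Control", "Low N", "High N")        # rename T1, T2, T3 in order
trt    # Control Low N High N Control Low N High N
```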

aov()

• Fit an analysis of variance model

• Usage

aov(formula, data = NULL)

• formula: response ~ mean.formula + Error(strata.formula), where the Error() term is optional and is used for designs with more than one error stratum

• Model for an RCB single factor design:
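One common way to write the single-factor RCB model in aov() is response ~ block + treatment, with both terms coded as factors. The sketch assumes hypothetical column names (block, trt, yield) and simulated data with 4 blocks and 5 treatments, so it runs without RCBdata.csv.

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)

# yield = overall mean + block effect + treatment effect + error
fit <- aov(yield ~ block + trt, data = rcb)
fit   # terms, sums of squares and residual df
```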


aov()

• RCB factorial design

• Split-plot design

• Split-split-plot design

• Strip-plot design
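Possible aov() formulas for these designs, written as formula objects. They assume hypothetical factor columns block, A, B, C (split-split-plot), H and V (strip-plot) and a response y, and follow the Error() convention of the npk example on R's ?aov help page; they are a sketch, not a copy of the formulas on the original slides. In the multi-stratum formulas, block enters only through Error(), so the block stratum supplies the Block line of the ANOVA.

```r
# RCB factorial: blocks plus all main effects and interactions of A and B
rcb_factorial <- y ~ block + A * B

# Split-plot: A on main plots (block:A stratum gives Error (a)); B on subplots
split_plot <- y ~ A * B + Error(block / A)

# Split-split-plot: A main plots, B subplots, C sub-subplots
split_split_plot <- y ~ A * B * C + Error(block / A / B)

# Strip-plot: horizontal (H) and vertical (V) strips each get an error stratum
strip_plot <- y ~ H * V + Error(block + block:H + block:V)

# Usage with a data frame d holding these columns: aov(split_plot, data = d)
```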

Defining ANOVA Models

• a + b # additive effects of a and b

• a:b # interaction between a and b

• a*b # same as a + b + a:b

• ^n # includes all interactions up to order n

Ex. (a+b+c)^2 is identical to a + b + c + a:b + a:c + b:c

• b%in%a # effects of b are nested in a

a/b # identical to a+a:b

• a-b # removes the effect of b

Ex. y~x-1 forces a model without an intercept
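The expansions can be checked with terms(): its term.labels attribute lists the terms a formula generates, and its intercept attribute is 0 when the intercept has been removed. The variable names are placeholders; no data are needed.

```r
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a * b)               # "a" "b" "a:b"
term_labels(y ~ (a + b + c)^2)       # "a" "b" "c" "a:b" "a:c" "b:c"
term_labels(y ~ a / b)               # "a" "a:b"
term_labels(y ~ a * b - a:b)         # "a" "b"
attr(terms(y ~ x - 1), "intercept")  # 0: model without an intercept
```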

summary()

• Generic function used to produce summaries of the results of various model fitting functions

• Example:
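For an aov fit, summary() returns the ANOVA table (Df, Sum Sq, Mean Sq, F value, Pr(>F)). A sketch on simulated RCB data with hypothetical columns block, trt and yield; the last row of the table is the error (residual) line.

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

tab <- summary(fit)[[1]]                  # the ANOVA table as a data frame
tab
tab[["Df"]]                               # 3 4 12: blocks, treatments, error
ms_error <- tab[["Mean Sq"]][nrow(tab)]   # error mean square (MSerror)
```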

residuals()

• Generic function which extracts model residuals from objects returned by modeling functions

• Example:
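residuals() returns one value per plot: the observed response minus the fitted value. A sketch on simulated data with hypothetical columns block, trt and yield:

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

e <- residuals(fit)   # one residual per plot
head(e)
all.equal(unname(e), rcb$yield - unname(fitted(fit)))   # TRUE
```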


predict()

• Generic function for predictions from the results of various model fitting functions

• Example
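Without newdata, predict() returns the fitted values; with newdata it predicts for chosen factor combinations, which must use the same factor levels as the data. A sketch on simulated data with hypothetical columns block, trt and yield:

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

head(predict(fit))   # same values as fitted(fit)

# Predicted yield for treatment T3 in block 2
new_plot <- data.frame(block = factor("2", levels = levels(rcb$block)),
                       trt   = factor("T3", levels = levels(rcb$trt)))
predict(fit, newdata = new_plot)
```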

coef()

• Generic function which extracts model coefficients from objects returned by modeling functions

• Example
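With R's default treatment contrasts, coef() reports an intercept plus effects relative to the first block and the first treatment. In a balanced RCB such as the simulated one below (hypothetical columns), each treatment coefficient equals that treatment's mean minus the mean of T1.

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

b <- coef(fit)
b   # (Intercept), block2..block4, trtT2..trtT5
trt_means <- tapply(rcb$yield, rcb$trt, mean)
all.equal(unname(b["trtT2"]), unname(trt_means["T2"] - trt_means["T1"]))   # TRUE
```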

df.residual()

• Returns the residual degrees of freedom extracted from a fitted model object

• Example
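For an RCB with r blocks and t treatments the error df is (r - 1)(t - 1), which df.residual() returns directly; this is the DFerror value that LSD.test() takes. A sketch on simulated data with hypothetical columns:

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

df.residual(fit)   # (4 - 1) * (5 - 1) = 12
```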

deviance()

• Returns the deviance or residual sum of squares of a fitted model object

• Example
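For an aov fit, deviance() is the error sum of squares, so dividing it by df.residual() gives the error mean square (the MSerror argument of LSD.test()). A sketch on simulated data with hypothetical columns:

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

sse <- deviance(fit)                    # error (residual) sum of squares
all.equal(sse, sum(residuals(fit)^2))   # TRUE
ms_error <- sse / df.residual(fit)      # error mean square
```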


names()

• Lists the names of the elements (components) of an object

• Example
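Applied to a fitted aov object, names() lists its components, each of which can then be extracted with $. A sketch on simulated data with hypothetical columns:

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

names(fit)        # "coefficients" "residuals" "effects" ... "model"
fit$df.residual   # same value as df.residual(fit)
```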

Residual plot
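A residuals-versus-fitted plot checks whether the errors have roughly constant variance. This sketch draws it with base graphics into a temporary PDF so it also runs in scripts; interactively, the pdf() and dev.off() lines can be dropped, and plot(fit, which = 1) gives a similar built-in plot. Data and column names are hypothetical.

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

out <- tempfile(fileext = ".pdf")
pdf(out)                          # send the plot to a file
plot(fitted(fit), residuals(fit),
     xlab = "Fitted values", ylab = "Residuals", main = "Residual plot")
abline(h = 0, lty = 2)            # residuals should scatter evenly around zero
invisible(dev.off())
```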

Comparing treatment means

• Using the agricolae package

library(agricolae)
LSD.test(y,             # response variable
         trt,           # variable whose levels are to be compared
         DFerror,       # error df
         MSerror,       # error mean square
         alpha = 0.05,  # significance level
         group = TRUE,  # TRUE: letter groupings; FALSE: pairwise comparisons
         main = NULL)   # title

HSD.test() can be used if the number of treatments is 6 or more.
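The least significant difference behind an LSD comparison can also be computed by hand, which helps when checking output: for r replications, LSD = t(1 - α/2, DFerror) × sqrt(2 × MSerror / r), and two treatment means differ significantly when their difference exceeds it. This is a base-R sketch on simulated data with hypothetical columns; it does not reproduce agricolae's grouping letters.

```r
set.seed(1)   # hypothetical RCB data: 4 blocks x 5 treatments
rcb <- expand.grid(block = factor(1:4), trt = factor(paste0("T", 1:5)))
rcb$yield <- 5 + 0.3 * as.integer(rcb$trt) + rnorm(nrow(rcb), sd = 0.4)
fit <- aov(yield ~ block + trt, data = rcb)

alpha    <- 0.05
r        <- nlevels(rcb$block)                 # replications per treatment
ms_error <- deviance(fit) / df.residual(fit)   # MSerror
lsd      <- qt(1 - alpha / 2, df.residual(fit)) * sqrt(2 * ms_error / r)

trt_means <- tapply(rcb$yield, rcb$trt, mean)
sig <- abs(outer(trt_means, trt_means, "-")) > lsd   # TRUE: pair differs at alpha
sig
```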


Bar Graph with Mean Comparison

Split-plot

ANOVA for Split-plot Design

Source of Variation    df
Block                  r-1
Factor A (A)           a-1
Error (a)              (r-1)(a-1)
Factor B (B)           b-1
A x B                  (a-1)(b-1)
Error (b)              a(r-1)(b-1)
Total                  rab-1

(r = number of replications, a = levels of factor A on main plots, b = levels of factor B on subplots)
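A sketch of a split-plot fit whose strata reproduce the df column above, assuming simulated data with hypothetical columns block, A (main plot), B (subplot) and y, with r = 4, a = 3 and b = 4. The block:A stratum carries Error (a) and the Within stratum carries Error (b).

```r
set.seed(2)   # hypothetical split-plot data: r = 4, a = 3, b = 4
sp <- expand.grid(block = factor(1:4), A = factor(1:3), B = factor(1:4))
sp$y <- 6 + 0.5 * as.integer(sp$A) + 0.2 * as.integer(sp$B) +
  rnorm(nrow(sp), sd = 0.5)

fit_sp <- aov(y ~ A * B + Error(block / A), data = sp)
summary(fit_sp)
# Expected df: block 3; A 2, Error (a) 6; B 3, A:B 6, Error (b) 27
```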

SEDs for Split-plot

Number  Type of pair comparison between
1  Two main-plot means (averaged over all subplot treatments)
   SED = $\sqrt{2E_a/(rb)}$; t-value = $\mathrm{tinv}(\alpha, df_a)$
2  Two subplot means (averaged over all main-plot treatments)
   SED = $\sqrt{2E_b/(ra)}$; t-value = $\mathrm{tinv}(\alpha, df_b)$
3  Two subplot means at the same main-plot treatment
   SED = $\sqrt{2E_b/r}$; t-value = $\mathrm{tinv}(\alpha, df_b)$
4  Two main-plot means at the same or different subplot treatment
   SED = $\sqrt{2\left[(b-1)E_b + E_a\right]/(rb)}$; t-value = $\mathrm{tinv}(\alpha, df_{ab})$


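Satterthwaite degrees of freedom

$$df_{ab} = \frac{\left[(b-1)E_b + E_a\right]^2}{\dfrac{\left[(b-1)E_b\right]^2}{df_b} + \dfrac{E_a^2}{df_a}}$$

Degrees of freedom and error MS: $df_a$, $df_b$ are the degrees of freedom and $E_a$, $E_b$ the mean squares of Error (a) and Error (b).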
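The split-plot SED table and the Satterthwaite formula can be evaluated together. This is a sketch with a hypothetical helper split_plot_sed() and illustrative mean squares (not real data); it assumes tinv(α, df) is the two-tailed critical t value, qt(1 - α/2, df), and multiplies SED by t to get an LSD for each type of comparison.

```r
# Hypothetical helper: SED, df and critical t for the four split-plot comparisons
split_plot_sed <- function(Ea, Eb, dfa, dfb, r, a, b, alpha = 0.05) {
  pooled <- (b - 1) * Eb + Ea                                # error combination for comparison 4
  df_ab  <- pooled^2 / (((b - 1) * Eb)^2 / dfb + Ea^2 / dfa) # Satterthwaite df
  df     <- c(dfa, dfb, dfb, df_ab)
  sed    <- c(sqrt(2 * Ea / (r * b)),       # 1: main-plot means
              sqrt(2 * Eb / (r * a)),       # 2: subplot means
              sqrt(2 * Eb / r),             # 3: subplot means, same main plot
              sqrt(2 * pooled / (r * b)))   # 4: main-plot means, same/different subplot
  t_crit <- qt(1 - alpha / 2, df)           # tinv(alpha, df), two-tailed
  data.frame(comparison = 1:4, sed = sed, df = df, t = t_crit, lsd = sed * t_crit)
}

# Illustrative values: r = 4, a = 3, b = 4, so df_a = 6 and df_b = 27
res <- split_plot_sed(Ea = 2, Eb = 1, dfa = 6, dfb = 27, r = 4, a = 3, b = 4)
res   # Satterthwaite df for comparison 4 is 25 with these values
```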

Strip-plot

ANOVA for Strip-plot Design

Source of Variation      df
Block                    r-1
Horizontal Factor (H)    h-1
Error (a)                (r-1)(h-1)
Vertical Factor (V)      v-1
Error (b)                (r-1)(v-1)
H x V                    (h-1)(v-1)
Error (c)                (r-1)(h-1)(v-1)
Total                    rhv-1

(r = number of replications, h = levels of the horizontal factor, v = levels of the vertical factor)
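A strip-plot sketch, assuming simulated data with hypothetical columns block, H, V and y (r = 3, h = 3, v = 4). Listing block:H and block:V in Error() gives Error (a) and Error (b) their own strata, and the remaining Within stratum holds H x V and Error (c).

```r
set.seed(3)   # hypothetical strip-plot data: r = 3, h = 3, v = 4
st <- expand.grid(block = factor(1:3), H = factor(1:3), V = factor(1:4))
st$y <- 5 + 0.4 * as.integer(st$H) + 0.3 * as.integer(st$V) +
  rnorm(nrow(st), sd = 0.5)

fit_st <- aov(y ~ H * V + Error(block + block:H + block:V), data = st)
summary(fit_st)
# Expected df: block 2; H 2, Error (a) 4; V 3, Error (b) 6; H:V 6, Error (c) 12
```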


SEDs for Strip-plot

Number  Type of pair comparison between
1  Two horizontal means (averaged over all vertical treatments)
   SED = $\sqrt{2E_a/(rv)}$; t-value = $\mathrm{tinv}(\alpha, df_a)$
2  Two vertical means (averaged over all horizontal treatments)
   SED = $\sqrt{2E_b/(rh)}$; t-value = $\mathrm{tinv}(\alpha, df_b)$
3  Two vertical means at the same horizontal treatment
   SED = $\sqrt{2\left[(h-1)E_c + E_b\right]/(rh)}$; t-value = $\mathrm{tinv}(\alpha, df_{bc})$
4  Two horizontal means at the same vertical treatment
   SED = $\sqrt{2\left[(v-1)E_c + E_a\right]/(rv)}$; t-value = $\mathrm{tinv}(\alpha, df_{ac})$

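Satterthwaite degrees of freedom

$$df_{ac} = \frac{\left[(v-1)E_c + E_a\right]^2}{\dfrac{\left[(v-1)E_c\right]^2}{df_c} + \dfrac{E_a^2}{df_a}}$$

$$df_{bc} = \frac{\left[(h-1)E_c + E_b\right]^2}{\dfrac{\left[(h-1)E_c\right]^2}{df_c} + \dfrac{E_b^2}{df_b}}$$

Degrees of freedom and EMS: $df_a$, $df_b$, $df_c$ are the degrees of freedom and $E_a$, $E_b$, $E_c$ the error mean squares of Error (a), Error (b) and Error (c).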
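The strip-plot SEDs and Satterthwaite df can be computed in one step. This is a sketch with a hypothetical helper strip_plot_sed() and illustrative mean squares (not real data), assuming tinv(α, df) is the two-tailed critical t value, qt(1 - α/2, df).

```r
# Hypothetical helper: SED, df and critical t for the four strip-plot comparisons
strip_plot_sed <- function(Ea, Eb, Ec, dfa, dfb, dfc, r, h, v, alpha = 0.05) {
  pool_bc <- (h - 1) * Ec + Eb   # comparison 3: vertical means, same horizontal level
  pool_ac <- (v - 1) * Ec + Ea   # comparison 4: horizontal means, same vertical level
  df_bc <- pool_bc^2 / (((h - 1) * Ec)^2 / dfc + Eb^2 / dfb)   # Satterthwaite
  df_ac <- pool_ac^2 / (((v - 1) * Ec)^2 / dfc + Ea^2 / dfa)   # Satterthwaite
  df  <- c(dfa, dfb, df_bc, df_ac)
  sed <- c(sqrt(2 * Ea / (r * v)),        # 1: horizontal means
           sqrt(2 * Eb / (r * h)),        # 2: vertical means
           sqrt(2 * pool_bc / (r * h)),   # 3: vertical means, same horizontal
           sqrt(2 * pool_ac / (r * v)))   # 4: horizontal means, same vertical
  t_crit <- qt(1 - alpha / 2, df)         # tinv(alpha, df), two-tailed
  data.frame(comparison = 1:4, sed = sed, df = df, t = t_crit, lsd = sed * t_crit)
}

# Illustrative values: r = 3, h = 3, v = 4, so df_a = 4, df_b = 6, df_c = 12
res <- strip_plot_sed(Ea = 2, Eb = 3, Ec = 1, dfa = 4, dfb = 6, dfc = 12,
                      r = 3, h = 3, v = 4)
res
```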