Regression and Correlation Analysis

Leilani Nora, Assistant Scientist

Introduction to R: Data Management and Statistical Analysis

CORRELATION ANALYSIS

DATAFRAME : corr.csv

• Consider data on grain yield and the N, P, K content of plants, taken from several samples.

DATA FRAME:GYNPK

Read data file corr.csv

> GYNPK <- read.table("corr.csv", header=TRUE, sep=",")

> GYNPK

   GY14   N   P   K
1  1678 1.0 0.1 0.4
2  4265 1.2 0.1 0.4
3  2431 1.1 0.1 0.4
4  2431 1.0 0.1 0.4
5  4461 1.2 0.1 0.4
. . .
48 5483 1.7 0.2 0.3
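
The read step can be tried without the original file. The sketch below is only an illustration: it writes a small temporary CSV (an assumed stand-in for corr.csv, holding just the five rows visible in the listing) and reads it back with read.csv(), which is read.table() with header=TRUE and sep="," as defaults. The names csv_path and demo are hypothetical.

```r
# Hypothetical stand-in for corr.csv: the first five rows of the listing above
csv_path <- tempfile(fileext = ".csv")
writeLines(c("GY14,N,P,K",
             "1678,1.0,0.1,0.4",
             "4265,1.2,0.1,0.4",
             "2431,1.1,0.1,0.4",
             "2431,1.0,0.1,0.4",
             "4461,1.2,0.1,0.4"), csv_path)

# read.csv() is read.table() with header=TRUE and sep="," as defaults
demo <- read.csv(csv_path)

str(demo)      # column names and types
head(demo, 3)  # first three rows
nrow(demo)     # number of observations read
```

Checking str() right after reading is a quick way to confirm that numeric columns were not read as text.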

CORRELATION ANALYSIS : correlation()

• correlation() computes the correlation coefficients and p-values between all pairs of variables in a data table. The results are similar to those from SAS.

Usage

> correlation(x, y=NULL, method="pearson", alternative="two.sided", ...)

# x and y – table, matrix or vector

# method – "pearson", "kendall", "spearman"

# alternative – "two.sided", "less", "greater"

• Required package is agricolae.

> library(agricolae)

> corrGY <- correlation(GYNPK)

> corrGY


$correlation
      GY14     N     P     K
GY14  1.00  0.72  0.38 -0.40
N     0.72  1.00  0.02 -0.34
P     0.38  0.02  1.00 -0.35
K    -0.40 -0.34 -0.35  1.00

$pvalue
             GY14            N           P           K
GY14 1.000000e+00 1.084596e-08 0.007979778 0.005289776
N    1.084596e-08 1.000000e+00 0.868611288 0.017985414
P    7.979778e-03 8.686113e-01 1.000000000 0.016134208
K    5.289776e-03 1.798541e-02 0.016134208 1.000000000

$n.obs
[1] 48
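
The method and alternative choices can also be explored with base R alone. The sketch below uses cor() and cor.test() from the stats package rather than agricolae, on a small made-up data frame (toy and its values are hypothetical, not taken from corr.csv), so it only illustrates what the arguments do.

```r
# Hypothetical toy data (not the values in corr.csv)
toy <- data.frame(
  yield = c(1.7, 4.3, 2.4, 2.5, 4.5, 5.5, 3.1, 3.9),
  n     = c(1.0, 1.2, 1.1, 1.0, 1.2, 1.7, 1.1, 1.3),
  k     = c(0.40, 0.38, 0.41, 0.43, 0.36, 0.30, 0.39, 0.37)
)

# Matrix of pairwise coefficients; method can be "pearson", "kendall" or "spearman"
r_pearson  <- cor(toy, method = "pearson")
r_spearman <- cor(toy, method = "spearman")

# Test for a single pair; alternative can be "two.sided", "less" or "greater"
two_sided <- cor.test(toy$yield, toy$n, alternative = "two.sided")
one_sided <- cor.test(toy$yield, toy$n, alternative = "greater")

round(r_pearson, 2)
two_sided$p.value
one_sided$p.value
```

For a positive Pearson correlation, the "greater" p-value is half the two-sided one, which is why the choice of alternative should be made before looking at the data.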

• Package ‘Deducer’ provides an intuitive graphical data analysis interface for use with JGR.

• JGR is a Java GUI for R: a cross-platform, universal and unified graphical user interface for R.

• This package was released on August 2, 2009, with 33 functions.

• One of the functions in package Deducer is cor.matrix().

CORRELATION ANALYSIS : cor.matrix()

• cor.matrix() creates a correlation matrix, using a test function to assess the significance of each correlation coefficient, r.

Usage

> cor.matrix(variables, data, test=cor.test, method, ...)

# variables – an expression denoting a set of variables

# data – a data frame

# test – a function to test the significance of the correlation coefficient

# method – "pearson", "kendall", "spearman"


> library(Deducer)

> corrGY2 <- cor.matrix(GY14:K,data=GYNPK)

> corrGY2

Pearson's product-moment correlation

                 GY14   N                 P                 K
GY14  cor        1      0.7157            0.3785            -0.3964
      N          48     48                48                48
      CI*               (0.5417,0.8309)   (0.1058,0.5983)   (-0.6116,-0.1265)
      stat**            6.95 (46)         2.774 (46)        -2.928 (46)
      p-value           0.0000            0.0080            0.0053
-----------
N     . . .
P     . . .
K     . . .
-----------
HA: two.sided
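
The stat, CI and p-value rows follow from the coefficient and the sample size alone. As a worked check (a sketch that assumes a 95% confidence level and the usual Pearson formulas, which is what cor.test() uses by default), the GY14-N cell can be recomputed from r = 0.7157 and n = 48; the results should come out close to the printed 6.95 and (0.5417, 0.8309).

```r
# Values taken from the GY14-N cell of the printed matrix
r <- 0.7157   # correlation coefficient
n <- 48       # number of observations

# Test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
z     <- atanh(r)
se    <- 1 / sqrt(n - 3)
ci_95 <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(t_stat, 2)
round(ci_95, 4)
p_value
```

The "(46)" printed next to each statistic is the degrees of freedom, n - 2.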

CORRELATION ANALYSIS : print.cor.matrix()

• print.cor.matrix() prints an object of class "cor.matrix" in a readable layout.

Usage

> print.cor.matrix(x, digits=4, N=TRUE, CI=TRUE, stat=TRUE, p.value=TRUE, ...)

# x - object of class "cor.matrix"

# digits - number of digits to round to

# N - logical, prints a row for sample size

# CI - logical, prints a row for confidence intervals if they exist

# stat - logical, prints a row for test statistics

# p.value - logical, prints a row for p-values


> print.cor.matrix(corrGY2, digits=4, N=FALSE, CI=FALSE, stat=FALSE)

Pearson's product-moment correlation

                GY14     N        P        K
GY14  cor       1        0.7157   0.3785   -0.3964
      p-value            0.0000   0.0080   0.0053
-----------
N     cor       0.7157   1        0.02452  -0.3402
      p-value   0.0000            0.8686   0.0180
-----------
P     cor       0.3785   0.02452  1        -0.3456
      p-value   0.0080   0.8686            0.0161
-----------
K     cor       -0.3964  -0.3402  -0.3456  1
      p-value   0.0053   0.0180   0.0161
-----------
HA: two.sided

REGRESSION ANALYSIS

DATAFRAME : SRATE.csv

• Consider grain yield data for six seeding-rate levels.

DATA FRAME:SRATE

Read data file SRATE.csv

> SRATE <- read.table("SRATE.csv", header=TRUE, sep=",")

> SRATE

  Seedrate  GYield
1       25 5.30425
2       50 5.12400
3       75 5.07025
4      100 4.84775
5      125 4.70800
6      150 4.70325

REGRESSION ANALYSIS : lm()

• lm(), which stands for Linear Model, fits linear models. It can be used to carry out regression, single-stratum ANOVA, ANCOVA and multiple linear regression.

Usage

> lm(formula, data, na.action, model=TRUE, ...)

# formula – a model formula. A typical model has the form "response ~ terms"

# data – a data frame

# na.action – how to handle NAs in the data; the default is "na.omit", and "na.exclude" can also be useful

# model – logical; if TRUE, the model frame is returned as a component of the fit
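
The difference between the two na.action settings is easiest to see on a small case. The sketch below is an illustration only: it types in the Seedrate and GYield values from the SRATE listing and replaces one yield with NA (that missing value is an assumption added for the demonstration; srate_na, fit_omit and fit_exclude are hypothetical names).

```r
# Seedrate/GYield values typed in from the SRATE listing; the NA is added for illustration
srate_na <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, NA, 4.84775, 4.70800, 4.70325)
)

# formula is "response ~ terms"; data= lets the formula use bare column names
fit_omit    <- lm(GYield ~ Seedrate, data = srate_na, na.action = na.omit)
fit_exclude <- lm(GYield ~ Seedrate, data = srate_na, na.action = na.exclude)

# Both fits use the same five complete rows, so the coefficients agree
coef(fit_omit)
coef(fit_exclude)

# na.omit drops the incomplete row; na.exclude keeps its position as NA
length(residuals(fit_omit))
length(residuals(fit_exclude))
```

na.exclude is convenient when residuals or fitted values have to be put back alongside the original rows of the data frame.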

> ModelGY <- lm(SRATE$GYield~SRATE$Seedrate)

> ModelGY

Call:
lm(formula = SRATE$GYield ~ SRATE$Seedrate)

Coefficients:
   (Intercept)  SRATE$Seedrate
      5.324283       -0.004168

• The result of lm() is a model object.
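
A model object is a list with standard extractor functions. The sketch below assumes the six rows shown in the SRATE listing, typed in by hand, and uses the data= form of the call; the fitted numbers need not match the printed output if SRATE.csv held different values. The name fit and the seed rate of 110 are arbitrary choices for illustration.

```r
# Values typed in from the SRATE listing
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)

fit <- lm(GYield ~ Seedrate, data = SRATE)

class(fit)       # "lm"
names(fit)       # components stored in the object
coef(fit)        # intercept and slope
fitted(fit)      # predicted yield at each seed rate
residuals(fit)   # observed minus fitted

# Predict at a seed rate that was not observed (110 is an arbitrary example value)
predict(fit, newdata = data.frame(Seedrate = 110))
```

Fitting with data = SRATE rather than SRATE$GYield ~ SRATE$Seedrate is what makes predict() with newdata work, because the model then knows its predictor by the column name Seedrate.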

REGRESSION ANALYSIS : summary()

• The function summary() is used to obtain and print a summary of the fitted model: the residuals, the coefficient table, R-squared and the overall F-test.

> summary(ModelGY)

Residuals:
        1         2         3         4         5         6
 0.292567 -0.096083 -0.045633 -0.059733 -0.095283  0.004167

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     5.324283   0.154081  34.555 4.18e-06 ***
SRATE$Seedrate -0.004168   0.001583  -2.634    0.058 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1655 on 4 degrees of freedom
Multiple R-squared: 0.6342, Adjusted R-squared: 0.5428
F-statistic: 6.936 on 1 and 4 DF, p-value: 0.05796
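
The pieces of the summary can be pulled out by name, which also makes two identities of simple linear regression easy to check: R-squared equals the squared Pearson correlation, and the F statistic equals the square of the slope's t value. The sketch below assumes the six rows of the SRATE listing typed in by hand, so its numbers need not match the printed output if SRATE.csv held different values.

```r
# Values typed in from the SRATE listing
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)

fit <- lm(GYield ~ Seedrate, data = SRATE)
s   <- summary(fit)

s$coefficients   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared      # multiple R-squared
s$fstatistic     # F value with numerator and denominator df

# In simple linear regression, R-squared is the squared Pearson correlation
r <- cor(SRATE$Seedrate, SRATE$GYield)
all.equal(r^2, s$r.squared)

# ... and the F statistic is the square of the slope's t value
t_slope <- s$coefficients["Seedrate", "t value"]
all.equal(t_slope^2, unname(s$fstatistic["value"]))

anova(fit)       # the ANOVA table itself comes from anova()
```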

SCATTERPLOT : plot() and abline()

> plot(SRATE$Seedrate, SRATE$GYield,
       main="ScatterPlot of Mean Yield",
       xlab="Seedrate", ylab="Mean Yield", col="red")

> abline(ModelGY, col="blue", lty=3)

• abline(lm.object) draws the fitted line, using the intercept (a) and slope (b) from the lm object.

• lm.object – a regression object; its first two coefficients are taken to be the intercept and slope.
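
To make the intercept-and-slope point concrete, the sketch below draws the same line twice: once from the model object and once from its two coefficients. It assumes the six SRATE rows typed in by hand and opens a null PDF device (pdf(NULL), an assumption so the code runs without a display); in an interactive session the pdf() and dev.off() lines can be dropped.

```r
# Values typed in from the SRATE listing
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)

pdf(NULL)  # assumption: null device so the sketch runs without a display
plot(SRATE$Seedrate, SRATE$GYield, xlab = "Seedrate", ylab = "Mean Yield")

# Passing the model object ...
abline(fit, col = "blue", lty = 3)

# ... is equivalent to passing its intercept (a) and slope (b) explicitly
ab <- coef(fit)
abline(a = ab[[1]], b = ab[[2]], col = "darkgreen", lty = 2)
invisible(dev.off())
```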

SCATTERPLOT : mtext()

> mtext("GYield=(5.324-0.0042Seedrate) with r=-0.7964", side=3, cex=0.7)

• mtext(text, side=3, …) displays text in the top margin of the plot

# text – a character string or expression specifying the text to be written

# side – the side of the plot on which to display the text

1 – bottom  2 – left

3 – top  4 – right
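
Typing the equation into mtext() by hand invites copy errors, so one option is to build the label from the fitted model. The sketch below is one possible way, not the slides' method: it assumes the six SRATE rows typed in by hand, a hypothetical label format built with sprintf(), and a null PDF device so it runs without a display.

```r
# Values typed in from the SRATE listing
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)

a <- coef(fit)[[1]]                       # intercept
b <- coef(fit)[[2]]                       # slope
r <- cor(SRATE$Seedrate, SRATE$GYield)    # Pearson correlation

# %+.4f prints the slope with an explicit sign
label <- sprintf("GYield = %.3f %+.4f*Seedrate, r = %.4f", a, b, r)

pdf(NULL)  # assumption: null device so the sketch runs without a display
plot(SRATE$Seedrate, SRATE$GYield, xlab = "Seedrate", ylab = "Mean Yield")
abline(fit, col = "blue", lty = 3)
mtext(label, side = 3, cex = 0.7)         # side = 3 is the top margin
invisible(dev.off())

label
```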

SCATTERPLOT : title() and mtext()

> plot(…) # same as previous slide

> abline(…) # same as previous slide

> mtext("GYield=(5.324-0.0042Seedrate) with r=-0.9773", side=3, cex=0.7)

SCATTERPLOT

[Figure: scatterplot titled "ScatterPlot of Mean Yield"; x-axis "Seedrate", y-axis "Mean of yield"; top-margin annotation "GYield=(5.324-0.0042Seedrate) with r=-0.7964"]

RESIDUAL PLOT

> plot(ModelGY$fitted.values, ModelGY$residuals,
       main="Residual Plot", xlab="Fitted",
       ylab="Residuals", col="red")

> abline(h=0, col="blue", lty=3)

# draws a blue dotted horizontal line at y=0
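
The reference line at zero is a natural baseline because, for a least-squares fit with an intercept, the residuals sum to zero and are uncorrelated with the fitted values. The sketch below checks this on the six SRATE rows typed in by hand, using the extractor functions fitted() and residuals() and a null PDF device (an assumption so it runs without a display).

```r
# Values typed in from the SRATE listing
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)

fit_vals <- fitted(fit)
res      <- residuals(fit)

pdf(NULL)  # assumption: null device so the sketch runs without a display
plot(fit_vals, res, main = "Residual Plot",
     xlab = "Fitted", ylab = "Residuals", col = "red")
abline(h = 0, col = "blue", lty = 3)
invisible(dev.off())

# With an intercept in the model, least-squares residuals sum to zero
# and are uncorrelated with the fitted values (up to rounding error)
sum(res)
cor(fit_vals, res)
```

A curved or funnel-shaped pattern around the zero line would suggest that a straight-line model or the constant-variance assumption is inadequate.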


THANK YOU! ☺☺☺☺

Please do Exercise E
