Regression and Correlation Analysis
Leilani Nora, Assistant Scientist
Introduction to R: Data Management and Statistical Analysis

CORRELATION ANALYSIS
DATAFRAME : corr.csv
• Consider the data on grain yield and the N, P, and K content of the plant, taken from several samples.
DATA FRAME : GYNPK
Read the data file corr.csv
> GYNPK <- read.table("corr.csv", header=TRUE, sep=",")
> GYNPK
GY14 N P K
1 1678 1.0 0.1 0.4
2 4265 1.2 0.1 0.4
3 2431 1.1 0.1 0.4
4 2431 1.0 0.1 0.4
5 4461 1.2 0.1 0.4
. . .
48 5483 1.7 0.2 0.3
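read.csv() is a wrapper around read.table() with header=TRUE and sep="," (plus a few other defaults) already set, so either call reads a comma-separated file. A minimal sketch, assuming the original corr.csv is not at hand: it writes the five rows printed above to a temporary file and checks that both readers return the same data frame. The object names GYNPK.a and GYNPK.b are illustrative only.

```r
# Assumption: corr.csv is not available, so write the five rows printed
# above to a temporary file and read that instead.
tmp <- tempfile(fileext = ".csv")
writeLines(c("GY14,N,P,K",
             "1678,1.0,0.1,0.4",
             "4265,1.2,0.1,0.4",
             "2431,1.1,0.1,0.4",
             "2431,1.0,0.1,0.4",
             "4461,1.2,0.1,0.4"), tmp)

# The call used in the slides
GYNPK.a <- read.table(tmp, header = TRUE, sep = ",")
# read.csv() already defaults to header = TRUE and sep = ","
GYNPK.b <- read.csv(tmp)

identical(GYNPK.a, GYNPK.b)   # same data frame either way
str(GYNPK.b)
```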
CORRELATION ANALYSIS : correlation()
• correlation() computes the correlation coefficients and their p-values for every pair of variables in a data table. The results are similar to those from SAS.
• The required package is agricolae.
Usage
> correlation(x, y=NULL, method="pearson", alternative="two.sided", …)
# x and y – table, matrix or vector
# method – "pearson", "kendall", "spearman"
# alternative – "two.sided", "less", "greater"
> library(agricolae)
> corrGY <- correlation(GYNPK)
> corrGY
$correlation
GY14 N P K
GY14 1.00 0.72 0.38 -0.40
N 0.72 1.00 0.02 -0.34
P 0.38 0.02 1.00 -0.35
K -0.40 -0.34 -0.35 1.00
$pvalue
GY14 N P K
GY14 1.000000e+00 1.084596e-08 0.007979778 0.005289776
N 1.084596e-08 1.000000e+00 0.868611288 0.017985414
P 7.979778e-03 8.686113e-01 1.000000000 0.016134208
K 5.289776e-03 1.798541e-02 0.016134208 1.000000000
$n.obs
[1] 48
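The same kind of table can be built with base R alone, which helps when agricolae is not installed: cor() returns the coefficient matrix and cor.test() returns each pairwise p-value. A minimal sketch; cor_with_p() is a hypothetical helper, not an agricolae function, and four columns of the built-in mtcars data stand in for GYNPK because corr.csv is not reproduced here in full.

```r
# Hypothetical helper, not part of agricolae: correlation matrix plus a
# matrix of two-sided p-values, using only base R.
cor_with_p <- function(df, method = "pearson") {
  vars <- names(df)
  k <- length(vars)
  r <- cor(df, method = method)
  # p-values start at 1 so the diagonal matches agricolae's layout
  p <- matrix(1, k, k, dimnames = list(vars, vars))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      p[i, j] <- p[j, i] <- cor.test(df[[i]], df[[j]], method = method)$p.value
    }
  }
  list(correlation = round(r, 2), pvalue = p, n.obs = nrow(df))
}

# Stand-in data: four numeric columns of the built-in mtcars data set.
# With the workshop data this would be cor_with_p(GYNPK).
res <- cor_with_p(mtcars[, c("mpg", "wt", "hp", "qsec")])
res
```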
• Package 'Deducer' is an intuitive graphical data analysis package for use with JGR.
• JGR is a Java GUI for R: a cross-platform, universal, and unified graphical user interface for R.
• This package was released on August 2, 2009, with 33 functions.
• One of the functions in package Deducer is cor.matrix().
CORRELATION ANALYSIS : cor.matrix()
• cor.matrix() creates a correlation matrix, using a function to test the significance of each correlation coefficient, r.
Usage
> cor.matrix(variables, data, test=cor.test, method, …)
# variables – an expression denoting a set of variables
# data – a data frame
# test – a function to test the significance of the correlation coefficient
# method – "pearson", "kendall", "spearman"
> library(Deducer)
> corrGY2 <- cor.matrix(GY14:K,data=GYNPK)
> corrGY2
Pearson's product-moment correlation
                GY14   N                 P                 K
GY14  cor       1      0.7157            0.3785            -0.3964
      N         48     48                48                48
      CI*              (0.5417,0.8309)   (0.1058,0.5983)   (-0.6116,-0.1265)
      stat**           6.95 (46)         2.774 (46)        -2.928 (46)
      p-value          0.0000            0.0080            0.0053
-----------
N     . . .
P     . . .
K     . . .
-----------
HA: two.sided
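The CI* and stat** rows can be recomputed from r and n alone. Pearson's test uses t = r*sqrt(n-2)/sqrt(1-r^2) on n-2 degrees of freedom, and the confidence interval comes from Fisher's z-transform, z = atanh(r), with standard error 1/sqrt(n-3), back-transformed with tanh(). These are the formulas base R's cor.test() uses; assuming cor.matrix() relies on its default test=cor.test, the sketch below should reproduce the GY14 vs N entries up to rounding.

```r
# Values read off the cor.matrix() output for GY14 vs N
r <- 0.7157
n <- 48

# t test of H0: rho = 0, on n - 2 degrees of freedom
df.t   <- n - 2
t.stat <- r * sqrt(df.t) / sqrt(1 - r^2)
p.val  <- 2 * pt(-abs(t.stat), df.t)        # two-sided p-value

# 95% confidence interval via Fisher's z-transform
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

round(c(t = t.stat, lower = ci[1], upper = ci[2]), 4)
```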
CORRELATION ANALYSIS : print.cor.matrix()
• print.cor.matrix() prints an object of class "cor.matrix" in a readable layout.
Usage
> print.cor.matrix(x, digits=4, N=TRUE, CI=TRUE, stat=TRUE, p.value=TRUE, …)
# x – object of class "cor.matrix"
# digits – number of digits to round to
# N – logical, prints a row for sample size
# CI – logical, prints a row for confidence intervals, if they exist
# stat – logical, prints a row for test statistics
# p.value – logical, prints a row for p-values
> print.cor.matrix(corrGY2, digits=4,
N=FALSE, CI=FALSE, stat=FALSE)
Pearson's product-moment correlation
                 GY14      N         P         K
GY14  cor        1         0.7157    0.3785    -0.3964
      p-value              0.0000    0.0080    0.0053
-----------
N     cor        0.7157    1         0.02452   -0.3402
      p-value    0.0000              0.8686    0.0180
-----------
P     cor        0.3785    0.02452   1         -0.3456
      p-value    0.0080    0.8686              0.0161
-----------
K     cor        -0.3964   -0.3402   -0.3456   1
      p-value    0.0053    0.0180    0.0161
-----------
HA: two.sided
REGRESSION ANALYSIS
DATAFRAME : SRATE.csv
• Consider grain yield data for six seed rates.
DATA FRAME : SRATE
Read the data file SRATE.csv
> SRATE <- read.table("SRATE.csv", header=TRUE, sep=",")
> SRATE
Seedrate GYield
1 25 5.30425
2 50 5.12400
3 75 5.07025
4 100 4.84775
5 125 4.70800
6 150 4.70325
REGRESSION ANALYSIS : lm()
• lm(), which stands for "linear model", fits linear models. It can be used to carry out simple and multiple linear regression, single-stratum analysis of variance (ANOVA), and analysis of covariance (ANCOVA).
Usage
> lm(formula, data, na.action, model=TRUE, …)
# formula – a model formula. A typical model has the form "response ~ terms"
# data – a data frame
# na.action – a function indicating what happens when the data contain NAs; the default is "na.omit", and "na.exclude" can also be useful
# model – logical; if TRUE, the model frame is returned as a component of the fit
> ModelGY <- lm(SRATE$GYield~SRATE$Seedrate)
> ModelGY
Call:
lm(formula = SRATE$GYield ~ SRATE$Seedrate)
Coefficients:
(Intercept) SRATE$Seedrate
5.324283 -0.004168
• The result of lm() is a model object of class "lm".
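For a single predictor, lm() gives the usual least-squares estimates: slope b = Sxy/Sxx and intercept a = ybar - b*xbar. A minimal sketch, assuming SRATE is rebuilt from the six rows printed above rather than read from SRATE.csv. It also passes data=SRATE so the formula can use bare column names, which labels the coefficient Seedrate rather than SRATE$Seedrate. The names fit, a, and b are illustrative.

```r
# Assumption: SRATE is rebuilt from the six rows printed above
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)

# data= lets the formula refer to the columns by name
fit <- lm(GYield ~ Seedrate, data = SRATE)

# Closed-form least-squares estimates for a straight line
x <- SRATE$Seedrate
y <- SRATE$GYield
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
a <- mean(y) - b * mean(x)                                       # intercept

rbind(lm = coef(fit), by.hand = c(a, b))
```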
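REGRESSION ANALYSIS : summary()
• summary() obtains and prints a summary of the fitted model: the residuals, the coefficient estimates with their standard errors and t-tests, the residual standard error, R-squared, and the overall F-test. The ANOVA table itself is printed by anova(ModelGY).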
> summary(ModelGY)
Residuals:
1 2 3 4 5 6
0.292567 -0.096083 -0.045633 -0.059733 -0.095283 0.004167
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.324283 0.154081 34.555 4.18e-06 ***
SRATE$Seedrate -0.004168 0.001583 -2.634 0.058 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1655 on 4 degrees of freedom
Multiple R-squared: 0.6342, Adjusted R-squared: 0.5428
F-statistic: 6.936 on 1 and 4 DF, p-value: 0.05796
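Several numbers in this summary are tied together. In simple linear regression the F statistic is the square of the slope's t value, the adjusted R-squared is 1 - (1 - R^2)(n - 1)/(n - p - 1) with p = 1 predictor, and the correlation r is the square root of R^2 carrying the slope's sign. A minimal worked check using the printed values; because they are rounded, the results agree only to about three decimals. The resulting r = -0.7964 is the value written into the plot subtitle with mtext().

```r
# Rounded values copied from the summary(ModelGY) output above
t.slope <- -2.634   # t value for SRATE$Seedrate
r2      <- 0.6342   # Multiple R-squared
n       <- 6        # number of observations
p       <- 1        # number of predictors

F.stat <- t.slope^2                                     # F = t^2 with one predictor
adj.r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)          # adjusted R-squared
r      <- sign(t.slope) * sqrt(r2)                      # r takes the sign of the slope
p.F    <- pf(F.stat, p, n - p - 1, lower.tail = FALSE)  # p-value of the F test

round(c(F = F.stat, adj.r2 = adj.r2, r = r, p.value = p.F), 4)
```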
SCATTERPLOT : plot() and abline()
> plot(SRATE$Seedrate, SRATE$GYield,
main="ScatterPlot of Mean Yield",
xlab="Seedrate", ylab="Mean Yield", col="Red")
> abline(ModelGY, col="blue", lty=3)
• abline(lm.object) adds the fitted regression line to the current plot, using the intercept (a) and slope (b) from the lm object; lty=3 draws it as a dotted line.
• lm.object – a regression object whose first two coefficients are taken to be the intercept and slope.
SCATTERPLOT : mtext()
> mtext("GYield=(5.324-0.0042Seedrate) with r=-0.7964", side=3, cex=0.7)
• mtext(text, side=3, …) writes text in a margin of the plot; side=3 places it on top, and cex=0.7 shrinks it to 70% of the default size.
# text – a character expression specifying the text to be written
# side – the side of the plot on which to display the text:
1 – bottom, 2 – left, 3 – top, 4 – right
SCATTERPLOT : title() and mtext()
> plot(…) # same as previous slide
> abline(…) # same as previous slide
> mtext("GYield=(5.324-0.0042Seedrate) with r=-0.9773", side=3, cex=0.7)
SCATTERPLOT
[Figure: "ScatterPlot of Mean Yield", mean of yield vs. Seedrate, subtitled "GYield=(5.324-0.0042Seedrate) with r=-0.7964"]
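Typing the equation into mtext() by hand risks a mismatch with the fitted model. A minimal sketch that builds the label with sprintf() from coef() and cor() instead, assuming SRATE is rebuilt from the six printed rows. pdf(NULL) opens a null graphics device so the sketch runs without a screen; drop that line and dev.off() in an interactive session. The names fit, cf, r, and lab are illustrative.

```r
# Assumption: SRATE is rebuilt from the six rows printed above
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)
cf  <- unname(coef(fit))                 # intercept, slope
r   <- cor(SRATE$Seedrate, SRATE$GYield)

# %+.4f prints the slope with its sign, giving "a-bSeedrate" or "a+bSeedrate"
lab <- sprintf("GYield=(%.3f%+.4fSeedrate) with r=%.4f", cf[1], cf[2], r)

pdf(NULL)   # null device so the sketch runs without a screen
plot(SRATE$Seedrate, SRATE$GYield, main = "ScatterPlot of Mean Yield",
     xlab = "Seedrate", ylab = "Mean Yield", col = "red")
abline(fit, col = "blue", lty = 3)
mtext(lab, side = 3, cex = 0.7)
invisible(dev.off())

lab
```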
RESIDUAL PLOT
> plot(ModelGY$fitted.values, ModelGY$residuals,
main="Residual Plot", xlab="Fitted",
ylab="Residuals", col="red")
> abline(h=0, col="blue", lty=3)
# draws a blue dotted horizontal line at y = 0
RESIDUAL PLOT
[Figure: "Residual Plot", residuals vs. fitted values]
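fitted() and residuals() are the extractor functions for the same components used above (ModelGY$fitted.values and ModelGY$residuals). Each residual is the observed value minus the fitted value, and with an intercept in the model the residuals sum to zero. A minimal sketch, assuming SRATE is rebuilt from the printed rows; plot(fit, which=1) is base R's built-in "Residuals vs Fitted" diagnostic, an alternative to drawing the plot by hand.

```r
# Assumption: SRATE is rebuilt from the six rows printed above
SRATE <- data.frame(
  Seedrate = c(25, 50, 75, 100, 125, 150),
  GYield   = c(5.30425, 5.12400, 5.07025, 4.84775, 4.70800, 4.70325)
)
fit <- lm(GYield ~ Seedrate, data = SRATE)

fv  <- fitted(fit)      # same values as fit$fitted.values
res <- residuals(fit)   # same values as fit$residuals

# Residual = observed - fitted; with an intercept they sum to (numerically) zero
all.equal(unname(res), SRATE$GYield - unname(fv))
sum(res)

pdf(NULL)               # null device; omit in an interactive session
plot(fit, which = 1)    # built-in "Residuals vs Fitted" diagnostic plot
invisible(dev.off())
```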
THANK YOU! ☺☺☺☺
Please do Exercise E