Correlation analysis notes
COMPILED BY MUTHAMA, JAPHETH MUTINDA
CORRELATION
INTRODUCTION
Objectives of the presentation
After going through this presentation, the listener is expected to:
1. Be able to present the results of analysed research data.
2. Make effective interpretation of the relationship between research variables.
3. Draw implications or inferences from the variables in the study model.
Definition
Correlation (r) is the statistical measure of how two variables move
in relation to each other.
It measures the relative strength of the linear relationship between two
variables.
Correlation is expressed as the correlation coefficient, which ranges
between -1 and +1.
Coefficient of correlation
The coefficient of correlation is a technique for determining the degree of
correlation between two or more variables across the different values of the
study variables.
The correlation, if any, found through this approach is used in statistical
methods that formulate a mathematical model of the relationship amongst
variables. Such a model can then be used to predict the values of the
dependent variable, given the values of the independent variable.
The sample correlation coefficient (r) measures the degree of linearity in the relationship between X and Y.
-1 ≤ r ≤ +1
r = 0 : Indicates no linear relationship between the research variables
The + and – signs indicate positive linear correlations and negative linear
correlations respectively; r = +1 and r = -1 indicate perfect correlation.
[Figure: Coefficient of Correlation Analysis, showing a strong negative relationship and a strong positive relationship]
Interpreting Correlation Coefficient (r)
1) Strong correlation: r > 0.70 or r < –0.70
2) Moderate correlation: r is between 0.30 and 0.70
or r is between –0.30 and –0.70
3) Weak correlation: r is between 0 and 0.30 or r is between 0 and –0.30.
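The cut-offs above can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and the notes do not say which band the exact values 0.30 and 0.70 belong to, so placing them in the moderate band is an assumption.

```python
def correlation_strength(r: float) -> str:
    """Label r using the 0.30 and 0.70 cut-offs from these notes.

    Assumption: |r| of exactly 0.30 or 0.70 is treated as moderate,
    because the notes leave the boundary values unassigned.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear relationship"
    size = abs(r)
    if size > 0.70:
        label = "strong"
    elif size >= 0.30:
        label = "moderate"
    else:
        label = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


print(correlation_strength(-0.92))
print(correlation_strength(0.45))
```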
Methods of studying Correlation
Correlation can be determined by use of the following methods:
1. Scatter Diagram Method
2. Karl Pearson's Coefficient of Correlation Method
3. Spearman's Rank Correlation Method
SCATTER DIAGRAMS
This is a graph in which the individual data points are plotted in two dimensions, as presented below.
[Figure: two scatter plots, labelled "Very good fit" and "Moderate fit"]
Points clustered closely around a line show a strong correlation. The line is a good predictor (good fit) of the data. The more spread out the points, the weaker the correlation and the poorer the fit.
The line is a REGRESSION line (Y = a + bX)
A strong relationship simply means a good linear fit
Coefficient of determination and the regression line
NOTE:
1. The coefficient of determination (r²) is a measure of how well the
regression line represents the data: it is the proportion of the total
variation in Y that is explained by the line of best fit.
2. If the regression line passed exactly through every point on the scatter plot, it would explain all of the variation.
3. The further the line is from the points, the less of the variation it is able to explain.
Cont…
For example, in the case of variables X and Y: if r = 0.922, then r² = 0.850.
This means that 85% of the total variation in Y can be explained by the linear relationship between X and Y (as described by the regression equation).
The other 15% of the total variation in Y therefore remains unexplained.
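The arithmetic in this example can be checked with a short sketch. The function name is illustrative only; the value r = 0.922 is the one quoted in the example.

```python
def explained_variation(r: float) -> tuple[float, float]:
    """Return (explained %, unexplained %) of the variation in Y for a given r."""
    r_squared = r ** 2              # coefficient of determination
    explained = r_squared * 100
    return explained, 100 - explained


explained, unexplained = explained_variation(0.922)
print(f"r^2 = {0.922 ** 2:.3f}")
print(f"explained: {explained:.0f}%, unexplained: {unexplained:.0f}%")
```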
Karl Pearson’s coefficient of correlation (or simple correlation)
This is the most widely used method of measuring the degree of relationship between two variables. It is defined as the measure of the strength of the linear relationship between two variables, expressed as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.
This coefficient assumes the following:
(i) that there is a linear relationship between the two variables;
(ii) that the two variables are causally related, which means that one of the variables is independent and the other one is dependent;
(iii) that a large number of independent causes are operating in both variables so as to produce a normal distribution.
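Karl Pearson's coefficient of correlation can be worked out thus:
$$r_{xy} = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\;\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}}$$
- Shared variability of the X and Y variables: on the top
- Individual variability of the X and Y variables: at the bottom
OR
$$r = \frac{\operatorname{cov}(x, y)}{\sigma_x \cdot \sigma_y}$$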
Illustration
From the following data, find the coefficient of correlation by Karl Pearson's method
X: 6, 2, 10, 4, 8
Y: 9, 11, 5, 8, 7
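Solution (cont.)
$$\bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8$$
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{-26}{\sqrt{40 \times 20}} = \frac{-26}{\sqrt{800}} = -0.92$$
where x = X - \bar{X} and y = Y - \bar{Y} are the deviations from the means.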
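The illustration can be checked numerically with a minimal sketch in plain Python. The function names are illustrative; the two functions follow the raw-sums form and the deviations-from-the-mean form of Pearson's coefficient, and both should give about -0.92 for this data.

```python
from math import sqrt

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]


def pearson_raw(xs, ys):
    """Pearson's r from raw sums: n*sum(XY) - sum(X)*sum(Y) over the two root terms."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


def pearson_deviations(xs, ys):
    """Pearson's r from deviations about the means: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


print(round(pearson_raw(X, Y), 2))
print(round(pearson_deviations(X, Y), 2))
```

The negative sign reflects that Y tends to fall as X rises in this data set.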
Spearman's rank coefficient
This is the technique of determining the degree of correlation between two
variables in the case of ordinal data, where ranks are given to the different
values of the variables.
The main objective of the coefficient is to determine the extent to which
the two sets of rankings are similar or dissimilar.
This method is mainly used to determine correlation when the data are not
available in numerical form, or when only the order of the values matters.
Thus, when the values of the two variables are converted to their ranks and
the correlation is obtained, the correlation is known as rank correlation.
Computation of Rank Correlation
Spearman’s rank correlation coefficient ρ can be calculated when
• Actual ranks are given
• Ranks are not given, but grades are given and not repeated
• Ranks are not given, and grades are given and repeated
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where D = R_x - R_y, R_x = rank of X, and R_y = rank of Y.
Illustration
Calculate Spearman's rank correlation coefficient between advertisement cost and sales from the following data
Advertisement cost: 39, 65, 62, 90, 82, 75, 25, 98, 36, 78
Sales (Shs): 47, 53, 58, 86, 62, 68, 60, 91, 51, 84

 X    Y    R-x   R-y    D    D²
39   47     8    10    -2     4
65   53     6     8    -2     4
62   58     7     7     0     0
90   86     2     2     0     0
82   62     3     5    -2     4
75   68     5     4     1     1
25   60    10     6     4    16
98   91     1     1     0     0
36   51     9     9     0     0
78   84     4     3     1     1

ΣD² = 30
Cont…
$$R = 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6(30)}{10^3 - 10} = 1 - \frac{180}{990} = 0.82$$
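The ranking and the final figure can be reproduced with a short sketch. The helper names are illustrative, and the ranking function assumes there are no repeated values, which holds for this data; rank 1 goes to the largest value, as in the illustration.

```python
cost = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]


def ranks_descending(values):
    """Rank 1 = largest value. Assumes no ties (no repeated values)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rank(xs, ys):
    """Return (R, sum of D^2) using R = 1 - 6*sum(D^2) / (N^3 - N)."""
    n = len(xs)
    rank_x, rank_y = ranks_descending(xs), ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n), sum_d2


r, sum_d2 = spearman_rank(cost, sales)
print(sum_d2, round(r, 2))
```

Data with repeated values would need averaged ranks and a tie correction, which this sketch does not attempt.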
Nonlinear Relationships
In correlation analysis, not all relationships are linear.
Where there is clear evidence of a nonlinear relationship, DO NOT use Pearson's Product Moment Correlation (r) to summarize the strength of the relationship between Y and X.
[Figure: scatter graph of a nonlinear correlation]
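A small made-up example shows why. In the sketch below, Y is completely determined by X through Y = X², yet Pearson's r comes out as zero because the relationship is not linear. The data and the function name are assumptions chosen for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Hypothetical data: a perfect but curved (U-shaped) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

print(pearson_r(xs, ys))
```

A scatter diagram of such data would reveal the curve immediately, which is why plotting the points first is a useful check before quoting r.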
Conclusions
Correlation is the linear association between two numeric variables, e.g. variables X and Y.
The correlation (r) ranges from -1 to +1, where -1 ≤ r ≤ 1.
If r < 0 then there is a negative correlation between X and Y, i.e. as X increases Y generally decreases.
If r > 0 then there is a positive correlation between X and Y, i.e. as X increases Y generally increases.
The closer r is to 0, the weaker the linear association between X and Y.
A diagram explaining different strengths of correlations
The value of r ranges between (-1) and (+1).
The value of r denotes the strength of the association, as illustrated by the following diagram.
[Diagram: scale of r from -1 to +1 marked at -1, -0.75, -0.25, 0, 0.25, 0.75 and 1. Between 0.75 and 1 in size: strong; between 0.25 and 0.75: intermediate; between 0 and 0.25: weak; r = 0: no relation; r = -1 or r = +1: perfect correlation. Positive side: direct; negative side: indirect.]
Example of graphs and their interpretation
Negative and positive correlations
No Relationship (r = .00)
Information about Explanatory Flexibility tells you nothing about Emotional Insight
[Scatter plot: Explanatory Flexibility (x-axis) against ASIS - Emotional Insight (y-axis)]
THANK YOU