Prediction. Prediction is a scientific guess of unknown by using the known data. Statistical...
-
Upload
oscar-barnett -
Category
Documents
-
view
218 -
download
0
Transcript of Prediction. Prediction is a scientific guess of unknown by using the known data. Statistical...
Prediction
Prediction
• Prediction is a scientific guess of unknown by using the known data.
• Statistical prediction is based on correlation. If the correlation among variables is high, the success in prediction will be high.– So, if there is a perfect correlation among variables,
then the prediction will be perfect. • Today, we will study on the bivariate linear
correlation and regression. That is, linear relation between two variables.
Prediction
• Today, we will study on the bivariate linear correlation and regression. That is, linear relation between two variables. – In a graduate level of statistic class, you will learn curvilinear
correlation/regression for U-shaped relations and Cannonical Correlation for the relations among three or more variables
• By using statistical predictions (regression), we can answer such questions:– What is the height of a person whose weight is 80 kg?– If a students’ OSS score is 168, what will be his/her GPA in
university?
Prediction
• As you remember, correlation coefficient provides us a measure of how strongly two variables are related. – Pearson r is one of the correlation coefficients and
it is calculated on the least squared deviations. • By using that method, we can draw a
hypothetic line between the scores on a scatter diagram.– This line represents the best fit of the predicted
scores
• In this example, the correlation between weight and height is perfect (r= +1). So, all the dots in the diagram are on the hypothetic line (the line of best fit).
• The formula of the line is Y=aX+c. For these distributions, it is Y=1x + 100.
Weight HeightAli 54 154Kezban 55 155Can 65 165Sevgi 45 145Kenan 65 165Arda 66 166Zehra 54 154Selin 56 156Nalan 67 167Serap 67 167
r=1 40 45 50 55 60 65 70130
135
140
145
150
155
160
165
170
Terminology:• The line in this diagram is regression line. • The equation of the line is regression equation (Y=aX+c).
– For correlation, it is not important which axis represents which variable, since there is no direction in correlation.
– For regression, Y axis is always shows the predicted (dependent) variable, and X axis represents the predictor (independent variable)
• Be careful about the language of regression. The equation is always read as regression of Y on X.
Weight Height
Ali 54 154
Kezban 55 155
Can 65 165
Sevgi 45 145
Kenan 65 165
Arda 66 166
Zehra 54 154
Selin 56 156
Nalan 67 167
Serap 67 167
r=1 40 45 50 55 60 65 70130
135
140
145
150
155
160
165
170
• Now let’s focus on the other examples in which the relation between two variables is less perfect.
• As you can see, when the correlation become less perfect, dots in the diagram starts to diverge from the regression line
Weight HeightAli 53 154Kezban 54 155Can 65 165Sevgi 43 145Kenan 68 165Arda 67 166Zehra 53 154Selin 54 156Nalan 65 167Serap 66 167
r=.986 40 45 50 55 60 65 70130
135
140
145
150
155
160
165
170
• In this example, the correlation is far from being perfect. – So, the dots are far away from the regression line.
Weight HeightAli 54 154Kezban 61 155Can 45 165Sevgi 46 145Kenan 72 165Arda 68 166Zehra 57 154Selin 71 156Nalan 47 167Serap 49 167
r=.08640 45 50 55 60 65 70 75
130
135
140
145
150
155
160
165
170
The Criterion of Best Fit• By using the regression equation,
we can calculate predicted values of Y. – Let’s say these predicted values
are Yl
• As you can see in the scatter diagram, these predicted values of Yl are not the same of Y.– The discrepancy with Y and Yl is
the error of prediction. • These discrepancies are
presented as vertical lines in the figure. – As you can see, the higher the r,
the lower the error is.
The Regression Equation
• The regression of Y on X (Standart-Score Formula)– zl
y = rzx • zl
y = the predicted standart-score value of Y• r = correlation coefficient• rzx = the standart-score value of X from which zl
y is predicted
• As you can see, – 1. zl
y = rzx when r=1
– 2. zly is 0 when zx is 0. That is mean of X overlaps with mean of
Yl
• To calculate the predicted values of Y from raw scores of X, we can use a simpler formula
The Regression Equation
• In this formula– Yl = the predicted raw
score in Y– Sx and Sy= the two
standart deviations– r= correlation coefficient
• Now, lets try to calculate Yl for the distribution presented in Table 1.
Note that this form of the formula is similar toY=aX+c
Error of PredictionThe Standard Error of Estimate
• The regression line is a kind of a mean of the score in the scatter plot
• The discrepancy from regression line is then a kind of a deviation form mean (line)
• So, we can use a similar formula to SD in order to calculate standard discrepancy
Error of PredictionThe Standard Error of Estimate
• A preferred formula for Syx is Syx = Sy
Limits of Error in Estimating Y from X
• Assume that actual scores of Y are normally distributed about predicted scores– So, we can use normal curve to calculate the limits of
error in prediction• In a Normal Distribuion – 68% of the Y scores fall within the limits Y(mean) +/- 1 SD– 95 % of the Y scores fall within the limits Y(mean) +/- 1.96
SD– 99 % of the Y scores fall within the limits Y(mean) +/- 2.58
SD
Limits of Error in Estimating Y from X
• Given that regression line is a kind of a mean and standard error is a kind of a SD
• We can see that – 68% of actual Y values fall within the limits
Y(mean) +/- 1 Syx – 95 % of actual Y values fall within the limits
Y(mean) +/- 1.96 Syx – 99 % of actual Y values fall within the limits
Y(mean) +/- 2.58 Syx