MARE 250 Dr. Jason Turner Correlation & Linear Regression.

Post on 22-Dec-2015


MARE 250 Dr. Jason Turner

Correlation & Linear Regression

Means Tests vs. Associations

Means tests (t-test, ANOVA) – test for differences between/among means (responses between/among factors)

Associations – tests for relationships between/among variables (responses)

Linear Regression

Linear regression investigates and models the linear relationship between a response (Y) and predictor(s) (X)

Both the response and predictors are continuous variables (“Responses”)

Linear regression analysis is used to:

- determine how the response variable changes as a particular predictor variable changes

- predict the value of the response variable for any value of the predictor variable

Regression vs. Correlation

Linear regression investigates and models the linear relationship between a response (Y) and predictor(s) (X)

Both the response and predictors are continuous variables (“Responses”)

Correlation coefficient (Pearson) – measures the extent of a linear relationship between two continuous variables (“Responses”)

When Regression vs. Correlation?

Linear regression – used to predict values, extrapolate from the data, and quantify the change in one variable versus the other; the relationship has a direction

Correlation coefficient (Pearson) – used to determine whether there is a relationship or not

IF Regression – then it matters which variable is the Response (Y) and which is the predictor (X)

Y – (Dependent variable)

X – (Independent variable)

X causes change in Y (Y outcome dependent upon X)

Y does not cause change in X (X – Independent)

Linear Regression

Regression provides a line that "best" fits the data (from response & predictor)

The least-squares criterion (the method used to draw this "best" line) requires that the best-fitting regression line is the one with the smallest sum of squared error terms (the vertical distances of the points from the line).
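As a rough illustration of the least-squares criterion, the sketch below fits a line with the standard closed-form formulas and checks that nudging the slope or intercept only increases the sum of squared errors. The data values and the function names are made up for illustration; they are not from the course.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) for the line y = b0 + b1*x with the smallest
    sum of squared vertical errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # y intercept
    return b0, b1


def sum_squared_error(x, y, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data (assumed for this example only)
salinity = [28.0, 30.0, 32.0, 34.0, 36.0]
urchins = [2.0, 3.0, 5.0, 4.0, 7.0]

b0, b1 = least_squares_fit(salinity, urchins)
best_sse = sum_squared_error(salinity, urchins, b0, b1)

# Any other line should do worse under the least-squares criterion
nudged_slope_sse = sum_squared_error(salinity, urchins, b0, b1 + 0.05)
nudged_intercept_sse = sum_squared_error(salinity, urchins, b0 + 0.5, b1)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, SSE = {best_sse:.2f}")
print(f"SSE with nudged slope = {nudged_slope_sse:.2f}")
print(f"SSE with nudged intercept = {nudged_intercept_sse:.2f}")
```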

Linear Regression

The R2 and adjusted R2 values represent the proportion of variation in the response data explained by the predictors

Adjusted R2 is a modified R2 that has been adjusted for the number of terms in the model. If you include unnecessary terms, R2 can be artificially high
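One common way to compute these two quantities is sketched below. It assumes the usual definitions, R2 = 1 - SSE/SST and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. The data and the fitted coefficients are hypothetical.

```python
def r_squared(y, fitted):
    """Proportion of variation in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_error / ss_total


def adjusted_r_squared(r2, n, n_predictors):
    """R2 penalized for the number of predictor terms in the model."""
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical data and its least-squares line (assumed for illustration)
salinity = [28.0, 30.0, 32.0, 34.0, 36.0]
urchins = [2.0, 3.0, 5.0, 4.0, 7.0]
b0, b1 = -13.4, 0.55

fitted = [b0 + b1 * x for x in salinity]
r2 = r_squared(urchins, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(urchins), n_predictors=1)

print(f"R2 = {r2:.1%}, adjusted R2 = {adj_r2:.1%}")
```

With only one predictor and few observations, the adjusted value comes out noticeably lower than the unadjusted one.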


Is This Them? Are These They?

y = b0 + b1x

y = dependent variable

b0 and b1 are constants

b0 = y intercept

b1 = slope

x = independent variable

Urchin density = b0 + b1(salinity)

Effects of Outliers

Outliers may be influential observations

A data point whose removal causes the regression equation (line) to change considerably

Consider removal, as with any outlier

If there is no explanation for the point – up to the researcher
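A simple way to see whether a point is influential is to refit the line without it and compare slopes. The sketch below does this for each point in turn. The data are invented so that the last observation pulls the line strongly; the function name is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical data: the last point sits far from the rest
percent_rock = [10.0, 20.0, 30.0, 40.0, 90.0]
urchins = [1.0, 2.0, 3.0, 4.0, 0.0]

_, full_slope = fit_line(percent_rock, urchins)

# Refit with each point left out and record how far the slope moves
slope_changes = []
for i in range(len(urchins)):
    x_rest = percent_rock[:i] + percent_rock[i + 1:]
    y_rest = urchins[:i] + urchins[i + 1:]
    _, slope = fit_line(x_rest, y_rest)
    slope_changes.append(abs(slope - full_slope))
    print(f"without point {i}: slope = {slope:+.4f}")

most_influential = slope_changes.index(max(slope_changes))
print(f"full-data slope = {full_slope:+.4f}")
print(f"most influential point: index {most_influential}")
```

In this made-up example, dropping the last point flips the slope from negative to positive, which is the kind of considerable change the slide describes.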

Warning on Regression

Regression is based upon the assumption that the data points are scattered about a straight line

What can we do to determine if a Regression is warranted?

Correlation Coefficient (r) (Pearson) – measures the extent of a linear relationship between two continuous variables (responses)

Pearson correlation of cexa Ant and cexa post = 0.811

P-Value = 0.000

IF p < 0.05 THEN the linear correlation between the two variables is significantly different than 0

IF p > 0.05 THEN you cannot assume a linear relationship between the two variables
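For reference, the Pearson coefficient can be computed by hand as sketched below. The p-value that statistics packages report is conventionally based on the test statistic t = r * sqrt((n - 2) / (1 - r^2)), compared against a t distribution with n - 2 degrees of freedom; this sketch stops at the t statistic. The data are hypothetical and are not the cexa measurements.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired measurements (assumed for illustration)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(f"r = {r:.3f}, t = {t:.3f} on {len(x) - 2} degrees of freedom")
```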

Correlation Coefficient


“R2 D2 it is you, it is you”

Coefficient of Determination (R2) – the proportion of the total variability in the response(s) that is attributable to its dependence on the predictors (factors) in the model

R2 – used for assessing the “goodness of fit” of a regression model

Should use Adjusted R2 as it is a more conservative measure

R2 values range from 0 to 100%. An R2 of 100% means that all of the variability in the data can be explained by the model

Coefficient Relationships

The coefficient of determination (r2) is the square of the linear correlation coefficient (r)
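As a worked example, and assuming a simple regression with a single predictor (where this relationship holds), squaring the Pearson correlation of 0.811 reported earlier for cexa Ant and cexa post gives:

```latex
r^2 = (0.811)^2 \approx 0.658
```

That is, roughly 66% of the variation in one variable would be explained by a straight-line relationship with the other.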

Next Week

Regression Analysis: _ Urchins versus % Rock

The regression equation is

_ Urchins = - 0.557 + 0.0361 % Rock

Predictor   Coef       SE Coef   T       P
Constant    -0.5569    0.3820    -1.46   0.146
% Rock      0.036116   0.0062    5.80    0.000

S = 3.27363 R-Sq = 11.0% R-Sq(adj) = 10.6%
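To connect this output back to y = b0 + b1x, the sketch below plugs the reported coefficients into the equation to predict the urchin response at a chosen % rock, and checks that the T column is the coefficient divided by its standard error. The function name and the 50% input are assumptions for illustration. With R-Sq at 11.0%, any single prediction should be treated as rough.

```python
# Coefficients copied from the regression output above
B0 = -0.5569      # Constant (y intercept)
B1 = 0.036116     # % Rock (slope)
SE_B0 = 0.3820    # standard error of the constant


def predict_urchins(percent_rock):
    """Predicted urchin response from the fitted line y = b0 + b1*x."""
    return B0 + B1 * percent_rock


# Hypothetical input: a site with 50% rock cover
prediction = predict_urchins(50.0)
print(f"predicted urchins at 50% rock: {prediction:.4f}")

# T is the coefficient divided by its standard error
t_constant = B0 / SE_B0
print(f"T for the constant: {t_constant:.2f}")
```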