Class27_RegressionNCorrHypoTest


  • 7/31/2019 Class27_RegressionNCorrHypoTest

    SW318 Social Work Statistics, Slide 1

    Regression Analysis

    We have previously studied Pearson's r correlation coefficient and the r² coefficient of determination as measures of association for evaluating the relationship between an interval-level independent variable and an interval-level dependent variable.

    These statistics are components of a broader set of statistical techniques for evaluating the relationship between two interval-level variables, called regression analysis (sometimes referred to in combination as correlation and regression analysis).

  • Slide 3

    Elements of Regression Analysis

    We will first review previous material on regression and correlation:

    The scatterplot or scattergram

    The regression equation

    Then we will use a hypothesis test to examine whether the relationships found in our sample data apply to the population represented by the sample.

  • Slide 4

    Purpose of Regression Analysis

    The purpose of regression analysis is to answer the same three questions that have been identified as requirements for understanding the relationships between variables:

    Is there a relationship between the two variables?

    How strong is the relationship?

    What is the direction of the relationship?

  • Slide 5

    Scatterplots - 1

    The relationship between two interval variables can be graphed as a scatterplot or scatter diagram, which shows the position of all of the cases in an x-y coordinate system.

    The independent variable is plotted on the x-axis, or the horizontal axis.

    The dependent variable is plotted on the y-axis, or the vertical axis.

    Each dot in the body of the chart represents the intersection of one case's values on the x-axis and the y-axis.

  • Slide 6

    Scatterplots - 2

    The trendline, or regression line, is plotted on the chart in a contrasting color.

    The overall pattern of the dots, or data points, succinctly summarizes the nature of the relationship between the two variables. The clarity of the pattern formed by the dots can be enhanced by drawing a straight line through the cluster such that the line touches every dot or comes as close to doing so as possible. This summarizing line is called the regression line.

    We will see later how this line is obtained, but for now, we will look at how it helps us understand the scatterplot.

  • Slide 7

    Scatterplots - 3

    The pattern of the points on the scatterplot gives us information about the relationship between the variables. The regression line, drawn in red, makes it easier for us to understand the scatterplot.

  • Slide 8

    The Uses of Scatterplots

    Scatterplots give us information about our three questions about the relationship between two interval variables:

    Is there a relationship between the two variables?

    How strong is the relationship?

    What is the direction of the relationship?

    In addition, the regression line on the scatterplot can be used to estimate the value of the dependent variable for any value of the independent variable.

  • Slide 9

    Scatterplots: Evidence of a Relationship

    [Figure: two scatterplots with regression lines. First plot: SES scale score (x) vs. work orientation scale score (y). Second plot: vocabulary aptitude score (x) vs. composite aptitude score (y).]

    When there is no relationship between two variables, the regression line is parallel to the horizontal axis.

    When there is a relationship between two variables, the regression line lies at an angle to the horizontal axis, sloping either upward or downward.

    The angle between the regression line and the horizontal x-axis therefore provides evidence of a relationship: if there is no relationship, the regression line will be parallel to the axis.

  • Slide 10

    Scatterplots: Strength of a Relationship

    [Figure: the same two scatterplots: SES scale score vs. work orientation scale score, and vocabulary aptitude score vs. composite aptitude score.]

    In this scatterplot, the points are very spread out around the regression line. The relationship is weak.

    The spread of the points around the regression line is narrow, indicating a stronger relationship.

    We should check the scale of the vertical axis to make sure the narrow band is not the result of an excessively large scale (a very wide range of values on the axis).

    The strength of a relationship is indicated by the narrowness of the band of points spread around the regression line: the tighter the band, the stronger the relationship.

  • Slide 11

    Scatterplots: Direction of Relationship

    When the regression line slopes upward to the right, there is a positive, or direct, relationship between the variables. When the regression line slopes downward, the relationship is negative, or inverse.

    [Figure: two scatterplots with regression lines: self concept scale score (x) vs. mathematics aptitude score (y), and vocabulary aptitude score (x) vs. composite aptitude score (y).]

    In this scatterplot, the regression line slopes upward to the right, indicating a positive or direct relationship. The values of both variables increase and decrease at the same time.

    In this scatterplot, the regression line slopes downward to the right, indicating a negative or inverse relationship. The values of the variables move in opposite directions.

  • Slide 12

    Scatterplots: Predicting Scores

    [Figure: scatterplot of vocabulary aptitude score (x) vs. composite aptitude score (y) with its regression line, used to read off a predicted score.]

    For any value of the independent variable on the horizontal x-axis, the predicted value for the dependent variable will be the corresponding value on the vertical y-axis.

    For a value of the independent variable on the horizontal axis (e.g., 52), we draw a line upward to the regression line.

    We draw a perpendicular line from the value on the x-axis to the regression line.

    The estimate for the dependent variable is obtained by drawing a line parallel to the x-axis from the regression line to the vertical y-axis and reading the value where this line crosses the y-axis (e.g., 50).

  • Slide 13

    The Effect of Scaling on the Scatterplot

    The scale used for the vertical y-axis can change the appearance of the scatterplot and alter our interpretation of the strength of the relationship. The three scatterplots on this slide all use the same data.

    [Figure: vocabulary aptitude score (x) vs. composite aptitude score (y), plotted once with the y-axis scaled from 0 to 80 and once with the y-axis scaled from 0 to 160.]

    In this plot, I doubled the range of the y-axis scale to 0 to 160, drawing the points closer together and making the relationship appear stronger.

    [Figure: the same data with the y-axis scaled from 25 to 75.]

    In this plot, I have narrowed the range of the y-axis scale to 25 to 75, spreading the points and making the relationship appear weaker.

    In the original plot, the y-axis is scaled from 0 to 80.

  • Slide 14

    The Assumption of Linearity

    An underlying assumption of regression analysis is that the relationship between the variables is linear, meaning that the points in the scatterplot must form a pattern that can be approximated with a straight line.

    While we could test the assumption of linearity with a statistical test, we will make a visual assessment of scatterplots.

    If the scatterplot indicates that the points do not follow a linear pattern, the techniques of linear correlation and regression should not be applied.

  • Slide 15

    Examples of Linear Relationships

    [Figure: scatterplot of male life expectancy (x) vs. female life expectancy (y).]

    These two scatterplots are for data on the poverty of nations. Both plots show strong linear relationships: the points are evenly distributed on either side of the regression line.

    [Figure: scatterplot of female life expectancy (x) vs. infant mortality rate (y).]

  • Slide 16

    Examples of Non-linear Relationships

    These scatterplots show non-linear relationships. The points are not evenly distributed on either side of the regression line. We will often see a concentration of points on one side of the regression line and an absence of points on the other side.

    [Figure: three scatterplots of national data with curved patterns; recoverable axis labels are death rate and birth rate, male and female life expectancy, and gross national product and male life expectancy.]

  • Slide 17

    The Regression Equation

    The regression equation is the algebraic formula for the regression line, which states the mathematical relationship between the independent and the dependent variable.

    We can use the regression line to estimate the value of the dependent variable for any value of the independent variable.

    The stronger the relationship between the independent and dependent variables, the closer these estimates will come to the actual score that each case had on the dependent variable.

  • Slide 18

    Components of the Regression Equation

    The regression equation has two components.

    The first component is a number called the y-intercept that defines where the line crosses the vertical y-axis.

    The second component is called the slope of the line, and is a number that multiplies the value of the independent variable.

    These two elements are combined in the general form of the regression equation:

    the estimated score on the dependent variable = the y-intercept + (the slope × the score on the independent variable)

  • Slide 19

    The Standard Form of the Regression Equation

    The standard form of the regression equation or formula is:

    Y = a + bX

    where

    Y is the estimated score for the dependent variable

    X is the score for the independent variable

    b is the slope of the regression line, or the multiplier of X

    a is the intercept, or the point where the regression line crosses the vertical y-axis

  • Slide 20

    Depicting the Regression Equation

    [Figure: the line y = 1.0 + 0.5x plotted with x and y each running from 0.0 to 5.0.]

    The y-intercept is the point on the vertical y-axis where the regression line crosses the axis, i.e. 1.0.

    The slope is the multiplier of x. It is the amount of change in y for a change of one unit in x.

    If x changes one unit from 2.0 to 3.0, depicted by the blue arrow, y will change by 0.5 units, from 2.0 to 2.5, as depicted by the red arrow.

    The regression equation includes both the y-intercept and the slope of the line. The y-intercept is 1.0 and the slope is 0.5.
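    As a check on the arithmetic on this slide, here is a minimal Python sketch of the equation y = 1.0 + 0.5x. It assumes only the intercept (a = 1.0) and slope (b = 0.5) shown above; the function name `predict` is hypothetical.

```python
def predict(x: float, a: float = 1.0, b: float = 0.5) -> float:
    """Return the estimated y from the regression equation y = a + b*x."""
    return a + b * x


# The intercept is the prediction when x = 0.
print(predict(0.0))                  # 1.0

# A one-unit change in x (2.0 -> 3.0) changes the prediction by the slope.
print(predict(2.0), predict(3.0))    # 2.0 2.5
print(predict(3.0) - predict(2.0))   # 0.5
```

    Changing `a` moves the whole line up or down; changing `b` changes how steeply predictions rise or fall per unit of x.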

  • Slide 21

    Deriving the Regression Equation

    In this plot, none of the points fall on the regression line.

    The difference between the actual value of the dependent variable and the predicted value for each point is shown by the red lines. This difference is called the residual, and represents the error between the actual and predicted values.

    The regression equation is computed to minimize the total amount of error in predicting values for the dependent variable. The method for deriving the equation is called the "method of least squares," meaning that the regression line minimizes the sum of the squared residuals, or errors between actual and predicted values.

    [Figure: scatterplot with the fitted line y = 0.8 + 0.6x, with x and y from 0 to 5; red lines mark each point's residual.]
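    For one independent variable, the method of least squares has a closed-form answer: b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The Python sketch below applies these standard formulas. The sample points are invented for illustration (they are not the data behind the plot above), and the function names are hypothetical.

```python
from statistics import mean


def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Total squared error between actual y and predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical points, none of which lie exactly on a single line.
xs = [1, 2, 3, 4, 5]
ys = [1.2, 2.3, 2.4, 3.6, 3.8]

a, b = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)

# Moving either coefficient away from the least squares values adds error.
assert sum_squared_residuals(xs, ys, a + 0.1, b) > best
assert sum_squared_residuals(xs, ys, a, b + 0.1) > best
print(round(a, 3), round(b, 3), round(best, 3))
```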

  • Slide 22

    Interpreting the Regression Equation: the Intercept

    The intercept is the point on the vertical axis where the regression line crosses the axis. It is the predicted value for the dependent variable when the independent variable has a value of zero.

    This may or may not be useful information, depending on the context of the problem.

  • Slide 23

    Interpreting the Regression Equation: the Slope

    The slope is interpreted as the amount of change in the predicted value of the dependent variable associated with a one-unit change in the value of the independent variable.

    If the slope has a negative sign, the direction of the relationship is negative or inverse, meaning that the scores on the two variables move in opposite directions.

    If the slope has a positive sign, the direction of the relationship is positive or direct, meaning that the scores on the two variables move in the same direction.

  • Slide 24

    Interpreting the Regression Equation: when the Slope equals 0

    If there is no relationship between two variables, the slope of the regression line is zero and the regression line is parallel to the horizontal axis.

    A slope of zero means that the predicted value of the dependent variable will not change, no matter what value of the independent variable is used.

    If there is no relationship, using the regression equation to predict values of the dependent variable is no improvement over using the mean of the dependent variable.
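    The link between a zero slope and the mean follows from the standard least squares formulas for the slope and intercept, written here in the notation Y = a + bX. Because the intercept is a = Ȳ - bX̄, a slope of zero leaves every prediction equal to the mean:

```latex
b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad a = \bar{Y} - b\,\bar{X}

b = 0 \;\Rightarrow\; a = \bar{Y} \;\Rightarrow\; Y = a + bX = \bar{Y} \text{ for every } X
```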

  • Slide 25

    Assumptions Required for Utilizing a Regression Equation

    The assumptions required for utilizing a regression equation are the same as the assumptions for the test of significance of a correlation coefficient.

    Both variables are interval level.

    Both variables are normally distributed.

    The relationship between the two variables is linear.

    The variance of the values of the dependent variable is uniform for all values of the independent variable (equality of variance).

  • Slide 26

    Assumption of Normality

    Strictly speaking, the significance test requires that the two variables be bivariate normal, meaning that the combined distribution of the two variables is normal.

    It is usually assumed that the variables are bivariate normal if each variable is normally distributed, so this assumption is tested by checking the normality of each variable.

    Each variable will be considered normal if its skewness and kurtosis statistics fall between -1.0 and +1.0, or if the sample size is large enough to apply the Central Limit Theorem.
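    A minimal Python sketch of the ±1.0 rule of thumb. It assumes the adjusted sample formulas for skewness (G1) and excess kurtosis (G2), the versions commonly reported by statistical packages such as SPSS; the function names and the sample scores are hypothetical. The kurtosis formula needs at least four cases.

```python
from math import sqrt


def _standardized(values):
    """Return each value's distance from the mean in sample-SD units."""
    n = len(values)
    m = sum(values) / n
    s = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / s for v in values]


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1); needs n >= 3."""
    n = len(values)
    return n / ((n - 1) * (n - 2)) * sum(z ** 3 for z in _standardized(values))


def kurtosis(values):
    """Sample excess kurtosis (G2), about 0 for normal data; needs n >= 4."""
    n = len(values)
    fourth = sum(z ** 4 for z in _standardized(values))
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


def looks_normal(values, limit=1.0):
    """Rule of thumb: skewness and kurtosis both within +/- limit."""
    return abs(skewness(values)) <= limit and abs(kurtosis(values)) <= limit


# Hypothetical scores, for illustration only.
print(looks_normal([2, 4, 4, 4, 5, 5, 7, 9]))  # True
print(looks_normal([1, 2, 3, 4, 5]))           # False: kurtosis is -1.2
```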

  • Slide 27

    Assumption of Linearity

    Linearity means that the pattern of the points in a scatterplot forms a band, like the pattern in the chart on the right:

    [Figure: two scatterplots; recoverable axis labels are male life expectancy vs. female life expectancy, and infant mortality rate vs. female life expectancy.]

    When the pattern of the points follows a curve, like the scatterplot on the right, the correlation coefficient will not accurately measure the relationship.

  • Slide 28

    Test of Linearity

    The test of linearity is a diagnostic statistical test of the null hypothesis that the linear model is an appropriate fit for the data points. The desired outcome for this test is to fail to reject the null hypothesis.

    If the probability for the test of linearity statistic is less than or equal to the level of significance for the problem, we reject the null hypothesis, concluding that the relationship is not linear and regression analysis is not appropriate for the relationship between the two variables.

    If the probability for the test of linearity statistic is greater than the level of significance for the problem, we fail to reject the null hypothesis and conclude that we satisfy the assumption of linearity.

  • Slide 29

    Assumption of Homoscedasticity

    Homoscedasticity (equality of variances) means that the points are evenly dispersed on either side of the regression line for the linear relationship.

    [Figure: two scatterplots: birth rate (x) vs. infant mortality rate (y), and infant mortality rate (x) vs. female life expectancy (y).]

    In this scatterplot, the points extend about the same distance above and below the regression line for most of the length of the regression line. This scatterplot meets the assumption of homoscedasticity.

    In this scatterplot, the spread of the points around the regression line is narrower at the left end of the regression line than at the right end. This funnel shape is typical of a scatterplot showing violations of the assumption of homoscedasticity.

  • Slide 30

    Test of Homoscedasticity

    When we compared groups, we used the Levene test of population variances to test the assumption that the group variances were equal.

    In order to use this test for the assumption of homoscedasticity, we will convert the interval-level independent variable into a dichotomous variable, with low scores in one group and high scores in the other group. We can then compare the variances of the dependent variable for the two groups derived from the independent variable.

  • Slide 31

    Levene Test of Homogeneity of Variances

    The Levene test of equality of population variances tests whether or not the variances for the two groups are equal. It is a test of the research hypothesis that the variance (dispersion) of the group with low scores is different from the variance of the group with high scores. The null hypothesis is that the variances (dispersion) of both groups are equal.

    If the probability of the test statistic is greater than 0.05, we do not reject the null hypothesis and conclude that the variances are equal. This is the desired outcome.

    If the probability of the test statistic is less than or equal to 0.05, we conclude the variances are different and regression analysis is not an appropriate test for the relationship between the two variables.
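    The two steps above (dichotomize the independent variable, then compare the dependent-variable variances of the two groups) can be sketched in Python. This is a minimal sketch with hypothetical function names and data. It assumes cases at or below the median of the independent variable go into the low group, and it computes only Levene's W statistic (deviations from group means). The probability compared with 0.05 comes from the F distribution with (k - 1, N - k) degrees of freedom, which SPSS, or a library routine such as SciPy's `scipy.stats.levene` with `center='mean'`, computes for you.

```python
from statistics import mean, median


def split_by_median(xs, ys):
    """Dichotomize x at its median (assumption: cases at or below the
    median form the low group) and return the y scores of each group."""
    cut = median(xs)
    low = [y for x, y in zip(xs, ys) if x <= cut]
    high = [y for x, y in zip(xs, ys) if x > cut]
    return low, high


def levene_w(*groups):
    """Levene's W statistic using deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean.
    z = []
    for g in groups:
        center = mean(g)
        z.append([abs(v - center) for v in g])
    z_means = [mean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical scores: x = independent variable, y = dependent variable.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 2, 3, 2, 4, 6]
low, high = split_by_median(xs, ys)   # [1, 2, 3] and [2, 4, 6]
print(round(levene_w(low, high), 3))  # 0.8
```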

  • Slide 32

    The hypothesis test of r²

    The purpose of the hypothesis test of r² is to test whether our findings apply to the population represented by the sample.

    When we studied association between two interval variables, we stated that Pearson's r correlation coefficient and its square, the coefficient of determination, measure the strength of the relationship between two interval variables. When the correlation coefficient and coefficient of determination are zero (0), there is no relationship.

    The hypothesis test of r² is a test of whether or not r² is larger than zero in the population.

  • Slide 33

    The hypothesis test of r²

    The research hypothesis states that r² is larger than zero (a relationship exists).

    The null hypothesis states that r² is equal to zero (no relationship).

    Recall that we interpreted the coefficient of determination r² as the reduction in error attributable to the relationship between the variables.

    The test statistic is an ANOVA F-test, which tests whether or not the reduction in error associated with using the regression equation is really greater than zero.

  • Slide 34

    How the regression ANOVA test works

    We are interested in the relationship between family size and number of credit cards.

    We will use the sample data we used for correlation and regression to examine how the hypothesis test for r² works.

  • Slide 35

    The scatter diagram or scatterplot

    The independent variable is plotted on the x or horizontal axis.

    The dependent variable is plotted on the y or vertical axis.

  • Slide 36

    The mean as the best guess

    Without taking into account the independent variable, our best guess for the number of credit cards for any subject is the mean, 7.0.

  • Slide 37

    Errors using the mean as estimate

    Errors are measured by computing the difference between the mean and each Y value, squaring the differences, and then summing them.

    When we compute the answer in SPSS, it will tell us that the total amount of error is 22.0.
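    The slides do not list the raw scores, so this Python sketch assumes a hypothetical sample of eight families whose credit card counts give the same mean (7.0) and the same total error (22.0) that SPSS reports here. It shows the computation described above: subtract the mean, square, and sum.

```python
# Hypothetical data (assumption): number of credit cards for eight families.
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

# Best guess without the independent variable: the mean.
y_bar = sum(credit_cards) / len(credit_cards)

# Total error: squared difference between each score and the mean, summed.
total_error = sum((y - y_bar) ** 2 for y in credit_cards)

print(y_bar, total_error)  # 7.0 22.0
```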

  • Slide 38

    The regression line

    The regression line minimizes the error (the best-fitting, or least squares, line).

  • Slide 39

    The equation for the regression line

    SPSS will give us the formula for the regression line in the form Y = a + bX, or for these variables:

    Number of Credit Cards = 2.871 + .971 × Family Size

  • Slide 40

    PRE reduction in error

    Error using mean only (total)                        22.000
    Error using regression line                           5.486
    Reduction in error associated with the regression    16.514
    PRE measure (r²) = (22.0 - 5.486) / 22.0             .751

    SPSS also tells us the amount of error using only the mean and using the regression line.
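    Continuing with a hypothetical eight-family sample (an assumption, since the slides report only summary figures), this Python sketch fits the least squares line, compares the error around the mean with the error around the line, and computes the PRE measure r². With these assumed scores it reproduces the values on the slides.

```python
# Hypothetical data (assumption): family size (X) and number of credit cards (Y).
family_size = [2, 2, 4, 4, 5, 5, 6, 6]
credit_cards = [4, 6, 6, 7, 8, 7, 8, 10]

n = len(family_size)
x_bar = sum(family_size) / n
y_bar = sum(credit_cards) / n

# Least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(family_size, credit_cards))
sxx = sum((x - x_bar) ** 2 for x in family_size)
b = sxy / sxx
a = y_bar - b * x_bar

# Error using the mean only vs. error using the regression line.
error_mean = sum((y - y_bar) ** 2 for y in credit_cards)
error_line = sum((y - (a + b * x)) ** 2 for x, y in zip(family_size, credit_cards))
reduction = error_mean - error_line
r_squared = reduction / error_mean  # proportional reduction in error

print(round(a, 3), round(b, 3))                                          # 2.871 0.971
print(round(error_line, 3), round(reduction, 3), round(r_squared, 3))  # 5.486 16.514 0.751
```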

  • Slide 41

    The ANOVA test for the regression

    The F statistic is calculated as the ratio of the error reduced by the regression to the error remaining, each divided by its degrees of freedom.

    If the ratio were close to 1, the regression would reduce no more error than we would expect by chance, there would be no evidence of a relationship, and the p-value would not let us reject the null hypothesis.

    In this problem, the amount of error reduced by the regression is large relative to the amount remaining, so the F statistic is large and the p-value (0.005) is smaller than the alpha level of significance, so we reject the null hypothesis.
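    A minimal Python sketch of the F ratio, using the sums of squares from the slides. The slides do not state the sample size, so n = 8 is an assumption (it matches the hypothetical eight-family sample used in the earlier sketches); with one independent variable the degrees of freedom are 1 and n - 2.

```python
# Sums of squares from the slides; the sample size is an assumption.
reduction = 16.514   # error reduced by the regression (regression sum of squares)
remaining = 5.486    # error remaining (residual sum of squares)
n = 8                # hypothetical number of families
k = 1                # number of independent variables

ms_regression = reduction / k           # df = k
ms_residual = remaining / (n - k - 1)   # df = n - k - 1
f_stat = ms_regression / ms_residual
print(round(f_stat, 2))  # 18.06

# Decision rule, using the probability reported on the slide.
alpha = 0.05
p_value = 0.005
print("reject H0" if p_value <= alpha else "fail to reject H0")  # reject H0
```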

  • Slide 42

    Interpreting Pearson's r correlation coefficient

    The square root of r² is Pearson's r, the correlation coefficient, which takes the same sign as the slope.

    If we want to characterize the strength of the relationship, we compare the size of r to the interpretive guidelines for measures of association.

  • Slide 43

    Interpreting the direction of the relationship

    To interpret the direction of the relationship between the variables, we look at the coefficient for the independent variable.

    In this example, the coefficient of 0.971 is positive, so we would interpret this relationship as:

    Families with more members had more credit cards.
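    In a regression with one independent variable, r and the slope b always share the same sign, so r can be recovered from r² and the sign of b. A minimal Python sketch using the figures on these slides (r² = .751, b = .971); the function names are hypothetical.

```python
from math import copysign, sqrt


def pearson_r(r_squared: float, slope: float) -> float:
    """Recover Pearson's r from r squared, taking the sign of the slope b."""
    return copysign(sqrt(r_squared), slope)


def direction(slope: float) -> str:
    """Describe the direction of the relationship from the sign of b."""
    if slope > 0:
        return "positive (direct)"
    if slope < 0:
        return "negative (inverse)"
    return "none"


r = pearson_r(0.751, 0.971)
print(round(r, 3), direction(0.971))  # 0.867 positive (direct)
```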

  • Slide 44

    Testing Assumptions in Homework Problems

    The process of testing assumptions can easily overwhelm the task of testing the significance of the relationship.

    Since our emphasis here is testing the hypothesis that the relationship is generalizable to the population represented by the sample data, we will assume that our data satisfy the assumptions without explicitly testing them.

  • Slide 45

    Homework Problem Questions

    The question in the homework problems requires us to look at three things:

    Does the hypothesis test support the existence of a relationship in the population?

    Is the strength of the relationship characterized correctly?

    Is the direction of the relationship between the variables correctly stated?

  • Slide 46

    Practice Problem 1

    This question asks you to use linear regression to examine the relationship between [marital] and [age]. Linear regression requires that the dependent variable and the independent variables be interval. Ordinal variables may be included as interval variables if a caution is added to any true findings.

    The dependent variable [marital] is nominal level, which does not satisfy the requirement for a dependent variable. The independent variable [age] is interval level, satisfying the requirement for an independent variable.

  • Slide 47

    Practice Problem - 2

    This question asks you to use linear regression to examine the relationship between [fund] and [attend]. The level of measurement requirements for linear regression are satisfied: [fund] is ordinal level, and [attend] is ordinal level. A caution is added because ordinal level variables are included in the analysis.

    Given the assumption that the distributional requirements for linear regression are satisfied, you can conduct a linear regression using SPSS without examining distributional assumptions for the variables.

  • Slide 48

    Linear Regression Hypothesis Test in SPSS (1)

    You can conduct a linear regression using:

    Analyze > Regression > Linear

  • Slide 49

    Linear Regression Hypothesis Test in SPSS (2)

    Move the dependent variable to the Dependent: box and the independent variable to the Independent(s): box, and then click the OK button.

  • Slide 50

    Linear Regression Hypothesis Test in SPSS (3)

    Based on the ANOVA table for the linear regression (F(1, 604) = 70.579, p

  • Slide 51

    Linear Regression Hypothesis Test in SPSS (4)

    Given the significant F-test result, the correlation coefficient (R) can be interpreted.

    The correlation coefficient for the relationship between the independent variable and the dependent variable was 0.323, which would be characterized as a weak relationship using the rule of thumb that a correlation between 0.0 and 0.20 is very weak; 0.20 to 0.40 is weak; 0.40 to 0.60 is moderate; 0.60 to 0.80 is strong; and greater than 0.80 is very strong.

    The relationship between the independent variable and the dependent variable was incorrectly characterized as a moderate relationship. The relationship should have been characterized as a weak relationship.

    The answer to the problem is false.
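    The rule of thumb above can be written as a small Python function (the name `characterize` is hypothetical). The slide's ranges share their endpoints, so this sketch assumes each boundary value belongs to the stronger category, and it uses the absolute value of r on the assumption that a negative correlation of the same size is equally strong.

```python
def characterize(r: float) -> str:
    """Label the strength of a correlation using the slide's rule of thumb.

    Assumptions: boundary values go to the stronger category, and the
    sign of r is ignored when judging strength."""
    size = abs(r)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


print(characterize(0.323))  # weak
print(characterize(0.122))  # very weak
```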

  • Slide 52

    Practice Problem 3

    This question asks you to use linear regression to examine the relationship between [educ] and [age]. [educ] and [age] are interval level, satisfying the level of measurement requirements for regression.

    Given the assumption that the distributional requirements for linear regression are satisfied, you can conduct a linear regression using SPSS without examining distributional characteristics of variables.

  • Slide 53

    Linear Regression Hypothesis Test in SPSS (5)

    You can conduct a linear regression using:

    Analyze > Regression > Linear

  • Slide 54

    Linear Regression Hypothesis Test in SPSS (6)

    Move the dependent variable to the Dependent: box and the independent variable to the Independent(s): box, and then click the OK button.
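    SPSS does the arithmetic behind Analyze > Regression > Linear. As a sketch of the main quantities it reports for one independent variable (not SPSS's own implementation), the Python below computes the b coefficient, the constant, r, and the ANOVA F ratio with degrees of freedom (1, n - 2) from paired scores. The function name `simple_regression` and the five-case data set are hypothetical; the p-value for F is left to SPSS, since computing it requires the F distribution.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with one independent variable x.

    Returns the constant a, the slope b, Pearson's r, and the ANOVA
    F ratio with its degrees of freedom (1, n - 2).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired scores for at least 3 cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx               # slope: the b coefficient
    a = mean_y - b * mean_x     # intercept: the constant
    r = sxy / sqrt(sxx * syy)   # signed r; SPSS's Model Summary shows R = |r|
    ss_regression = b * sxy     # variation in y explained by x
    ss_residual = syy - ss_regression
    df_residual = n - 2
    f_ratio = (ss_regression / 1) / (ss_residual / df_residual)
    return {"a": a, "b": b, "r": r, "F": f_ratio, "df": (1, df_residual)}


# Hypothetical scores: x is the independent, y the dependent variable.
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(fit["b"], 3), round(fit["F"], 3))  # 0.6 4.5
```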


    Linear Regression Hypothesis Test in SPSS (7)

    Based on the ANOVA table for the linear regression (F(1, 659) = 9.983, p = 0.002), there was a statistically significant relationship between the dependent variable "highest year of school completed" and the independent variable "age". Since the probability of the F statistic (p = 0.002) was less than or equal to the level of significance (0.05), the null hypothesis that the correlation coefficient (R) was equal to 0 was rejected.

    The research hypothesis that there was a relationship between the variables was supported.
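    The decision rule used on these slides (reject the null hypothesis that R = 0 when the p-value of the F statistic is less than or equal to the level of significance) is sketched below in Python. The function name `f_test_decision` is hypothetical, and 0.05 is assumed as the default level of significance, as in these problems.

```python
def f_test_decision(p_value: float, alpha: float = 0.05) -> str:
    """Apply the slides' rule: reject H0 (R = 0) when p <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject H0: research hypothesis supported"
    return "fail to reject H0: research hypothesis not supported"


print(f_test_decision(0.002))  # reject H0: research hypothesis supported
print(f_test_decision(0.606))  # fail to reject H0: research hypothesis not supported
```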


    Linear Regression Hypothesis Test in SPSS (8)

    Given the significant F-test result, the correlation coefficient (R) can be interpreted.

    The correlation coefficient for the relationship between the independent variable and the dependent variable was 0.122, which can be characterized as a very weak relationship.
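    With a single independent variable, the F ratio in the ANOVA table and R carry the same information: F = R^2 (n - 2) / (1 - R^2), which rearranges to R^2 = F / (F + df), where df is the residual (second) degrees of freedom. As a consistency check, not a step the homework requires, the Python sketch below recovers the R values reported in these slides from their F statistics; the helper name `r_from_f` is hypothetical, and small discrepancies would come from rounding in the printed F.

```python
from math import sqrt


def r_from_f(f_ratio: float, df_residual: int) -> float:
    """Recover R from F(1, df_residual) for a regression with one predictor.

    Rearranges F = R**2 * df_residual / (1 - R**2) into
    R**2 = F / (F + df_residual). R is unsigned, as SPSS reports it.
    """
    r_squared = f_ratio / (f_ratio + df_residual)
    return sqrt(r_squared)


print(round(r_from_f(9.983, 659), 3))   # 0.122, the [educ] by [age] example
print(round(r_from_f(70.579, 604), 3))  # 0.323, the earlier practice problem
```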


    Linear Regression Hypothesis Test in SPSS (9)

    The b coefficient for the independent variable "age" was -.021, indicating an inverse relationship with the dependent variable.

    Higher numeric values for the independent variable "age" [age] are associated with lower numeric values for the dependent variable "highest year of school completed" [educ].

    The statement in the problem that "survey respondents who were older had completed more years of school" is incorrect. The direction of the relationship is stated incorrectly.
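    Direction is read from the sign of b, and b also gives the predicted change in the dependent variable for a one-unit change in the independent variable. A minimal Python sketch with hypothetical function names; the label "direct" for a positive slope is an assumption, since these slides only use the term "inverse".

```python
def describe_direction(b: float) -> str:
    """Direction of the relationship from the sign of the b coefficient."""
    if b > 0:
        return "direct"   # assumed label: higher x goes with higher y
    if b < 0:
        return "inverse"  # higher x goes with lower y
    return "none"         # a slope of exactly 0 means no linear trend


def predicted_change(b: float, change_in_x: float) -> float:
    """Change in the predicted value of y for a given change in x."""
    return b * change_in_x


print(describe_direction(-0.021))              # inverse
print(round(predicted_change(-0.021, 10), 2))  # -0.21
```

    With b = -.021, a respondent ten years older is predicted to have completed about 0.21 fewer years of school, which is why a claim that older respondents completed more schooling has the direction wrong.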


    Practice Problem 4

    This question asks you to use linear regression to examine the relationship between [sei] and [age]. [sei] and [age] are interval level, satisfying the level of measurement requirements for regression.

    Given the assumption that the distributional requirements for linear regression are satisfied, you can conduct a linear regression using SPSS without examining the distributional characteristics of the variables.


    Linear Regression Hypothesis Test in SPSS (10)

    You can conduct a linear regression using:

    Analyze > Regression > Linear


    Linear Regression Hypothesis Test in SPSS (11)

    Move the dependent variable to the Dependent: box and the independent variable to the Independent(s): box, and then click the OK button.


    Linear Regression Hypothesis Test in SPSS (12)

    Based on the ANOVA table for the linear regression (F(1, 629) = .266, p = 0.606), there was no statistically significant relationship between the dependent variable "socioeconomic index" and the independent variable "age". Since the probability of the F statistic (p = 0.606) was greater than the level of significance (0.05), the null hypothesis that the correlation coefficient (R) was equal to 0 was not rejected.

    The research hypothesis that there was a relationship between the variables was not supported.


    Steps in solving Linear Regression Hypothesis Test Problems - 1

    The following is a guide to the decision process for answering homework problems about Linear Regression Hypothesis Test problems:

    Are the dependent and independent variables ordinal or interval level?

    No: Incorrect application of a statistic.

    Yes: Make sure that the problem states the assumption that the distributional requirements for linear regression are satisfied. Otherwise, you have to check the assumption first. Our regression problems will assume that the assumptions are met.


    Steps in solving Linear Regression Hypothesis Test Problems - 2

    Conduct the linear regression analysis.

    False

    Is the p-value in the ANOVA table for the F ratio test


    Steps in solving Linear Regression Hypothesis Test Problems - 3

    Is the direction of the relationship correctly stated?

    No: False.

    Yes: Is either of the variables ordinal level?

    If either variable is ordinal: True with caution.

    If neither variable is ordinal: True.
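    The decision guide can be sketched as a Python function. This is a hypothetical encoding, not an official answer key: the part of the flowchart after the F-test question is cut off in this transcript, so the sketch assumes that a p-value greater than the level of significance leads to "False". It also adds a strength check leading to "False", because the practice problem in which R = 0.323 was described as moderate was marked false; where that check sits in the original flowchart is an assumption. Like the slides, it assumes the distributional requirements are met and that the problem claims a relationship exists.

```python
def evaluate_problem(
    *,
    levels_ok: bool,
    p_value: float,
    strength_correct: bool,
    direction_correct: bool,
    either_ordinal: bool,
    alpha: float = 0.05,
) -> str:
    """Hypothetical encoding of the decision guide for regression problems.

    levels_ok: both variables are ordinal or interval level.
    p_value: p-value of the F ratio in the ANOVA table.
    strength_correct / direction_correct: the problem states them correctly.
    either_ordinal: at least one of the variables is ordinal level.
    """
    if not levels_ok:
        return "Incorrect application of a statistic"
    # Distributional requirements are assumed to be met, as in the slides.
    if p_value > alpha:
        return "False"  # assumption: no significant relationship means False
    if not strength_correct:
        return "False"  # assumption: placement of the strength check
    if not direction_correct:
        return "False"
    if either_ordinal:
        return "True with caution"
    return "True"


# [educ] and [age]: significant F test, but the direction was misstated.
# strength_correct=True is illustrative only; the result is False either way.
print(evaluate_problem(levels_ok=True, p_value=0.002, strength_correct=True,
                       direction_correct=False, either_ordinal=False))  # False
```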