Mediha Ass of Spss


  • 8/6/2019 Mediha Ass of Spss

    1/23

    SUBMITTED TO: MS. FARHAT IQBAL

    SUBMITTED BY: MEDIHA WAHEED

    MAJOR: BBA (SEM-8)

    DATE: 12TH MAY, 2011

    Data Analysis: Correlation & Regression Tests


    QUESTION NO. 1

    The chief of police has given the department statistician the task of determining whether air
    temperature is related to the number of traffic accidents in the city each day. On each of 8
    randomly selected days the statistician records the maximum temperature and the number of
    accidents that took place in the city. The results are:

    Maximum Temperature (C) 14 23 16 22 30 34 19 27

    No. of traffic accidents 4 6 3 6 8 11 5 7

    Find the correlation coefficient. Test the hypothesis that air temperature is related to the number

    of accidents.

    SOLUTION

    STEP 1

    Hypothesis formulation:

    Symbolically:

    H0: r = 0

    HA: r ≠ 0

    Theoretically:

    H0: Number of accidents is not related to air temperature

    HA: Number of accidents is related to air temperature

    STEP 2

    Determine the level of significance:

    Alpha (α) = 5% or 0.05


    STEP 3

    Correlations

                                                   Max.temp   No.of.traffic.accidents
    Max.temp                  Pearson Correlation  1          .962**
                              Sig. (2-tailed)                 .000
                              N                    8          8
    No.of.traffic.accidents   Pearson Correlation  .962**     1
                              Sig. (2-tailed)      .000
                              N                    8          8

    **. Correlation is significant at the 0.01 level (2-tailed).

    STEP 4

    Decision making:

    As the correlation coefficient (r) is 0.962, it shows a very strong positive linear relationship
    between X and Y, that is, between air temperature and number of accidents. The p-value (.000)
    is less than alpha (α) = 0.05, so we reject H0 and conclude that the number of accidents is
    related to air temperature: as air temperature increases or decreases, the number of accidents
    also tends to increase or decrease.
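The SPSS figure can be cross-checked by hand. The sketch below is an illustration, not part of the original SPSS output: it computes Pearson's r from the eight observations and the usual t statistic for testing r = 0. The function name `pearson_r` and the critical value 2.447 (two-tailed 5% point of t with 6 degrees of freedom, taken from a t-table) are assumptions introduced here.

```python
import math

temps = [14, 23, 16, 22, 30, 34, 19, 27]   # maximum temperature (C)
accidents = [4, 6, 3, 6, 8, 11, 5, 7]      # traffic accidents per day


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(temps, accidents)
n = len(temps)

# t statistic for H0: r = 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
T_CRIT = 2.447  # assumed table value: two-tailed, alpha = 0.05, df = 6

print("r =", round(r, 3))
print("reject H0:", abs(t_stat) > T_CRIT)
```

The rounded r agrees with the .962 shown in the Correlations table, and the t statistic exceeds the critical value, which matches the decision to reject H0.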

    QUESTION NO.2:

    Checkout operators to be employed by a supermarket chain are given a one-week training
    period and then a speed and accuracy test. Separate scores are recorded for the two aspects.
    For ten randomly chosen operators, the results were as follows (recorded on a scale of 0 to 100):

    Speed score (X) 81 90 62 80 43 76 58 82 90 36

    Accuracy Score(Y) 27 38 37 60 65 52 82 47 58 18


    a) Draw a scatter diagram and decide which correlation test should be employed on this

    data.

    b) Calculate the value of correlation coefficient and interpret

    c) Test the value of r for significance

    SOLUTION

    PART A:

    As the scatter plot shows a non-linear relationship, we use the Spearman rank correlation
    coefficient (r) test. No direction of the relationship is specified in advance, so we opt for
    the two-tailed Spearman correlation (r) test.


    PART B:

    STEP 1

    Hypothesis formulation:

    Symbolically:

    H0: r = 0

    HA: r ≠ 0

    Theoretically:

    H0: The accuracy score is not related to the speed of the operators.

    HA: The accuracy score is related to the speed of the operators.

    STEP 2

    Determine the level of significance:

    Alpha (α) = 5% or 0.05

    STEP 3

    SPSS test of correlation and its result

    Correlations (Spearman's rho)

                                              Speed.Score   Accuracy.Score
    Speed.Score      Correlation Coefficient  1.000         -.061
                     Sig. (2-tailed)          .             .868
                     N                        10            10
    Accuracy.Score   Correlation Coefficient  -.061         1.000
                     Sig. (2-tailed)          .868          .
                     N                        10            10


    STEP 4

    As the value of the correlation coefficient (r) = -0.061, it shows a very weak negative
    relationship between the speed and accuracy scores. The speed of the operators in the
    supermarket chain is only slightly negatively related to their accuracy; the accuracy score
    is hardly dependent on the speed of the operators.
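For readers who want to see where -0.061 comes from, the following sketch ranks both sets of scores (tied values share the average of their ranks) and then applies the Pearson formula to the ranks. This is one common way to compute Spearman's rho; the helper names `average_ranks` and `pearson_r` are introduced here for illustration and are not SPSS functions.

```python
import math

speed = [81, 90, 62, 80, 43, 76, 58, 82, 90, 36]
accuracy = [27, 38, 37, 60, 65, 52, 82, 47, 58, 18]


def average_ranks(values):
    """Rank from smallest to largest; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rho = pearson_r(average_ranks(speed), average_ranks(accuracy))
print("Spearman rho =", round(rho, 3))
```

The two speed scores of 90 are tied and each receive rank 9.5, which is why the tie-aware calculation is used rather than the shortcut formula based on squared rank differences.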

    PART C:

    Value of r for significance:

    To test the value of r for significance, we compare the p-value given in the above table
    with the level of significance, alpha (α) = 5% or 0.05.

    P-value = 0.868

    Alpha (α) = 5% or 0.05

    P-value > Alpha (α)

    0.868 > 0.05

    As our p-value is greater than the level of significance, we fail to reject the null
    hypothesis (H0). The data give no evidence in favor of HA.

    This shows that the Spearman correlation coefficient (r) is not significant for the given
    speed and accuracy scores.
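The same decision can be reached without the p-value by converting the coefficient to a t statistic and comparing it with a table value. This is a minimal sketch under two assumptions: the t approximation is used for the rank correlation, and 2.306 is the two-tailed 5% critical value of t with 8 degrees of freedom from a t-table. The function name `correlation_t` is hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


t_stat = correlation_t(-0.061, 10)
T_CRIT = 2.306  # assumed table value: two-tailed, alpha = 0.05, df = 8

significant = abs(t_stat) > T_CRIT
print("t =", round(t_stat, 3), "significant:", significant)
```

The statistic is far inside the acceptance region, consistent with the p-value of 0.868 being greater than 0.05.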

    QUESTION NO. 3.

    The expenditure on child care facilities in the previous year by a random sample of 6 local
    councils, and the number of children under age 5 living in the electorates, are shown below:

    Council 1 2 3 4 5 6

    Expenditures (000Rs.) 125 180 154 90 102 63

    Number of Children 1723 2510 1856 1525 1624 920

    a) Draw a scatter diagram of the data

    b) Find the least squares regression line of expenditures on number of children

    c) Interpret the four tables of results


    d) Draw the line on the scatter diagram. Comment on whether you feel the line is a good fit
    of the data.

    e) Using the estimated line, predict the expenditures of a local council that has 1250
    children under the age of 5.

    f) Confirm the significance of alpha.

    SOLUTION

    PART A

    Scatter diagram


    PART B

    STEP 1

    Hypothesis formulation:

    Symbolically:

    H0: β = 0

    HA: β ≠ 0

    Theoretically:

    H0: The expenditure on child care facilities is not dependent on the number of children.

    HA: The expenditure on child care facilities is dependent on the number of children.

    STEP 2

    Determine the level of significance:

    Alpha (α) = 5% or 0.05

    STEP 3

    Compute the SPSS regression test

    Variables Entered/Removed (b)

    Model   Variables Entered    Variables Removed   Method
    1       No.of.Children (a)   .                   Enter

    a. All requested variables entered.
    b. Dependent Variable: Expenditures


    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .949 (a)   .900       .875                15.17804

    a. Predictors: (Constant), No.of.Children

    Coefficients (a)

                        Unstandardized         Standardized                  95.0% Confidence
                        Coefficients           Coefficients                  Interval for B
    Model               B         Std. Error   Beta           t       Sig.   Lower Bound   Upper Bound
    1  (Constant)       -15.185   23.164                      -.656   .548   -79.498       49.128
       No.of.Children   .079      .013         .949           6.012   .004   .043          .116

    a. Dependent Variable: Expenditures

    ANOVA (b)

    Model           Sum of Squares   df   Mean Square   F        Sig.
    1  Regression   8326.508         1    8326.508      36.144   .004 (a)
       Residual     921.492          4    230.373
       Total        9248.000         5

    a. Predictors: (Constant), No.of.Children
    b. Dependent Variable: Expenditures
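The Model Summary and ANOVA tables are linked: R Square, Adjusted R Square, the standard error of the estimate and F can all be recomputed from the two sums of squares and the sample size. The sketch below uses the standard textbook formulas with the values reported in the tables; the variable names are chosen here for illustration.

```python
import math

ss_regression = 8326.508   # from the ANOVA table
ss_residual = 921.492      # from the ANOVA table
n = 6                      # number of councils
k = 1                      # number of predictors

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
r = math.sqrt(r_squared)

# Adjusted R Square penalises R Square for the number of predictors
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Standard error of the estimate is the square root of the residual mean square
std_error_estimate = math.sqrt(ss_residual / (n - k - 1))

# F is the ratio of the regression mean square to the residual mean square
f_ratio = (ss_regression / k) / (ss_residual / (n - k - 1))

print(round(r, 3), round(r_squared, 3), round(adjusted_r_squared, 3))
print(round(std_error_estimate, 3), round(f_ratio, 3))
```

Up to rounding, the results agree with the reported R = .949, R Square = .900, Adjusted R Square = .875, standard error 15.178 and F = 36.144.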


    Least squares regression line of expenditures on number of children

    Ŷi      = α + β xi = (-15.185) + (0.079) xi
    S.E     =            (23.164)    (0.013)
    t-value =            (-0.656)    (6.012)
    p-value =            (0.548)     (0.004)

    R = 0.949, R² = 0.900
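The intercept and slope of this line follow from the least squares formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The sketch below applies them to the six councils as a check on the SPSS coefficients; the function name `least_squares` is an assumption made for this example.

```python
children = [1723, 2510, 1856, 1525, 1624, 920]   # x: children under 5
expenditure = [125, 180, 154, 90, 102, 63]       # y: expenditure (000 Rs.)


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares(children, expenditure)
print("intercept =", round(intercept, 3))
print("slope     =", round(slope, 3))
```

Rounded to three decimals the values match the B column of the Coefficients table (-15.185 and .079).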

    STEP 4

    Decision making

    As the p-value of the slope is less than the level of significance, alpha (α) = 5% or 0.05,
    we reject the null hypothesis (H0) and accept the alternative hypothesis (HA). The result
    gives evidence against H0 and in favor of HA.

    P-value = 0.004 < α = 0.05

    This confirms the significance of beta (β), the slope coefficient: expenditure is
    significantly related to the number of children.

    PART C

    Interpretation of above results from the table:


    Interpretation of Alpha (α):

    Alpha (α) is the intercept: the point where x = 0 and the regression line crosses the y-axis.

    Here, when the number of children (x) is 0, the estimated expenditure (y) is -15.185 (000 Rs.).
    A negative expenditure has no practical meaning; x = 0 lies far outside the observed range of
    920 to 2510 children, so the intercept only fixes the position of the line.

    Interpretation of Beta (β)

    Beta (β) is the slope of the regression line: a one-unit change in x leads to a 0.079-unit
    increase in Y. For each additional child, expenditure increases by 0.079 (000 Rs.).

    Interpretation of correlation coefficient (R)

    As our correlation coefficient (r) is 0.949, it shows a strong positive linear relationship
    between the number of children and expenditure. When the number of children increases or
    decreases, the expenditure also tends to increase or decrease.

    Interpretation of coefficient of determination (R²)

    As our coefficient of determination (R²) is 0.900 or 90%, it shows that 90% of the variation
    in y is explained by the x variable: 90% of the variation in expenditure is explained by the
    number of children.

    Interpretation of ANOVA Table

    Total sum of squares = regression sum of squares + residual sum of squares

    Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²

    9248.000 = 8326.508 + 921.492

    Interpretation of Total Deviation in Y, Σ(Yi − Ȳ)²

    The total deviation is the sum of squared differences between each actual value of Y and the
    mean of Y. Here the value 9248.000 represents the total deviation in Y, Σ(Yi − Ȳ)²: the total
    variation of the actual expenditures around their mean is 9248, measured in squared units of
    (000 Rs.).

    Interpretation of regression sum of squares, Σ(Ŷi − Ȳ)²


    Out of the total deviation, 8326.508 ≈ 8327 units is the deviation explained by the estimated
    regression line of y on x. In the given data, 8327 units of deviation are explained by the
    estimated regression line of expenditures on number of children.

    Interpretation of residual sum of squares, Σ(Yi − Ŷi)²

    Out of the total deviation, 921.492 ≈ 921 units is the deviation not explained by the
    estimated regression line of y on x. In the given data, 921 units of deviation are not
    explained by the estimated regression line of expenditures on number of children.
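The three sums of squares can be recomputed directly from the data, which also shows that the total splits exactly into an explained and an unexplained part. This is a sketch of the textbook decomposition, not SPSS's internal routine; the variable names are chosen for this example.

```python
children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n

# Least squares line of expenditure on children
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in children]

# Total: actual values around the mean of Y
ss_total = sum((y - mean_y) ** 2 for y in expenditure)
# Regression: fitted values around the mean of Y (explained part)
ss_regression = sum((f - mean_y) ** 2 for f in fitted)
# Residual: actual values around the fitted values (unexplained part)
ss_residual = sum((y - f) ** 2 for y, f in zip(expenditure, fitted))

print(round(ss_total, 3), round(ss_regression, 3), round(ss_residual, 3))
```

The results agree with the ANOVA table (9248.000, 8326.508 and 921.492) up to rounding, and the regression and residual parts add up to the total.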

    PART D

    Determination of whether the estimated line is a good

    fit of the data.


    Interpretation:

    As the above scatter plot depicts a linear relationship, the estimated line, or regression
    model, is a good fit of the given data on expenditure and number of children. It is a good
    fit because the coefficient of determination (R²) is 0.900 or 90%, which is high enough to
    support a well-fitted model, and the data points lie close to the fitted line.

    PART E

    Using the estimated line, prediction of the expenditures of a local council that has 1250
    children

    Putting x = 1250 into the estimated regression line gives the estimated value of y.

    Ŷi = α + β xi = (-15.185) + (0.079) xi
       = (-15.185) + (0.079)(1250)
       = 83.565

    The predicted expenditure on child care facilities for a council with 1250 children under
    the age of 5 is therefore 83.565 (000 Rs.).
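The prediction is a direct substitution into the estimated line. The short sketch below repeats it with the rounded coefficients from the Coefficients table and also checks that 1250 lies inside the observed range of children, so the prediction is an interpolation rather than an extrapolation. The function name `predict_expenditure` is hypothetical.

```python
children = [1723, 2510, 1856, 1525, 1624, 920]  # observed x values


def predict_expenditure(n_children, intercept=-15.185, slope=0.079):
    """Predicted expenditure (000 Rs.) from the rounded SPSS coefficients."""
    return intercept + slope * n_children


predicted = predict_expenditure(1250)
within_observed_range = min(children) <= 1250 <= max(children)

print("predicted expenditure (000 Rs.):", round(predicted, 3))
print("x inside observed range:", within_observed_range)
```

Using unrounded coefficients would shift the prediction slightly; the figure of 83.565 depends on the three-decimal values reported by SPSS.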

    PART F

    Checking the significance of Alpha (α)

    STEP 1

    Hypothesis formulation:

    Symbolically:

    H0: α = 0

    HA: α ≠ 0


    Theoretically:

    H0: The intercept of the regression line of expenditure on number of children is zero.

    HA: The intercept of the regression line of expenditure on number of children is not zero.

    STEP 2

    Determine the level of significance:

    Alpha (α) = 5% or 0.05

    STEP 3

    Checking the p-value for alpha from the above tables and estimated regression model

    Ŷi      = α + β xi = (-15.185) + (0.079) xi
    S.E     =            (23.164)    (0.013)
    t-value =            (-0.656)    (6.012)
    p-value =            (0.548)     (0.004)

    For the intercept, it is shown that

    t-value = -0.656

    p-value = 0.548

    STEP 4

    Decision making

    As our p-value is greater than the level of significance, alpha (α) = 5% or 0.05, we fail to
    reject the null hypothesis (H0). The data give no evidence in favor of HA.

    P-value = 0.548 > α = 0.05


    This confirms that alpha (α), the intercept of the simple linear regression, is
    statistically insignificant.
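The t-values in the Coefficients table are each coefficient divided by its standard error. As an illustration, the sketch below rebuilds both standard errors from the data using the usual simple-regression formulas and compares the t statistics with a table value. The critical value 2.776 (two-tailed 5% point of t with 4 degrees of freedom) is an assumption taken from a t-table, and the variable names are chosen here.

```python
import math

children = [1723, 2510, 1856, 1525, 1624, 920]
expenditure = [125, 180, 154, 90, 102, 63]

n = len(children)
mean_x = sum(children) / n
mean_y = sum(expenditure) / n
sxx = sum((x - mean_x) ** 2 for x in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(children, expenditure))

slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error of the estimate from the residual sum of squares
residual_ss = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(children, expenditure))
s = math.sqrt(residual_ss / (n - 2))

se_slope = s / math.sqrt(sxx)
se_intercept = s * math.sqrt(1 / n + mean_x ** 2 / sxx)

t_slope = slope / se_slope
t_intercept = intercept / se_intercept

T_CRIT = 2.776  # assumed table value: two-tailed, alpha = 0.05, df = 4
print("intercept significant:", abs(t_intercept) > T_CRIT)
print("slope significant:", abs(t_slope) > T_CRIT)
```

The intercept's t statistic stays inside the critical bounds while the slope's does not, which matches the p-values of 0.548 and 0.004.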

    QUESTION NO. 4

    Consider the following data for the variables X and Y.

    X 1 2 3 4 5 6 7 8 9

    Y 1 4 9 16 25 49 64 81 93

    a) Plot the data on a scatter diagram.

    b) Find the least squares regression line of Y on X.

    c) Draw the line on the diagram.

    d) Interpret the results.

    e) Use the line found in (b) to predict the value of Y for an X value of 6. Can you make
    a better prediction without using this line? Why or why not?

    f) Confirm the significance of alpha.

    SOLUTION

    PART A


    Scatter diagram

    PART B

    STEP 1

    Hypothesis formulation:

    Symbolically:

    H0: β = 0

    HA: β ≠ 0

    Theoretically:


    H0: The Y variable is not dependent on the variable X

    HA: The Y variable is dependent on the variable X

    STEP 2

    Determine the level of significance:

    Alpha (α) = 5% or 0.05

    STEP 3

    Compute the SPSS regression test

    Variables Entered/Removed (b)

    Model   Variables Entered   Variables Removed   Method
    1       X (a)               .                   Enter

    a. All requested variables entered.
    b. Dependent Variable: Y

    ANOVA (b)

    Model           Sum of Squares   df   Mean Square   F         Sig.
    1  Regression   9176.067         1    9176.067      124.982   .000 (a)
       Residual     513.933          7    73.419
       Total        9690.000         8

    a. Predictors: (Constant), X
    b. Dependent Variable: Y

    Model Summary

    Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
    1       .973 (a)   .947       .939                8.56849

    a. Predictors: (Constant), X



    Least squares regression line of Y on X

    Ŷi      = α + β xi = (-23.833) + (12.367) xi
    S.E     =            (6.225)     (1.106)
    t-value =            (-3.829)    (11.180)
    p-value =            (0.006)     (0.000)

    R = 0.973, R² = 0.947

    STEP 4

    Decision making

    Coefficients (a)

                    Unstandardized         Standardized                    95.0% Confidence
                    Coefficients           Coefficients                    Interval for B
    Model           B         Std. Error   Beta           t        Sig.   Lower Bound   Upper Bound
    1  (Constant)   -23.833   6.225                       -3.829   .006   -38.553       -9.114
       X            12.367    1.106        .973           11.180   .000   9.751         14.982

    a. Dependent Variable: Y


    As the p-value of the slope is less than the level of significance, alpha (α) = 5% or 0.05,
    we reject the null hypothesis (H0) and accept the alternative hypothesis (HA).

    P-value = 0.000 < α = 0.05

    This confirms the significance of beta (β), the slope coefficient: variable Y is
    significantly related to variable X.
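The 95% confidence bounds in the Coefficients table carry the same information as the p-values: an interval that excludes zero corresponds to a coefficient that is significant at the 5% level. The sketch below rebuilds the bounds as estimate ± t × standard error. The critical value 2.3646 (two-tailed 5% point of t with 7 degrees of freedom) is an assumption taken from a t-table, and `confidence_interval` is a hypothetical helper.

```python
T_CRIT_7DF = 2.3646  # assumed table value: two-tailed, alpha = 0.05, df = 7


def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) bounds: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# B and Std. Error as reported in the Coefficients table
intercept_ci = confidence_interval(-23.833, 6.225, T_CRIT_7DF)
slope_ci = confidence_interval(12.367, 1.106, T_CRIT_7DF)

print("intercept:", [round(b, 3) for b in intercept_ci])
print("slope:    ", [round(b, 3) for b in slope_ci])
```

Both intervals exclude zero, in line with the reported p-values of .006 and .000; small differences in the third decimal come from using rounded inputs.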

    PART C

    Determination of whether the estimated line is a good

    fit of the data.


    Interpretation:

    As the above scatter plot depicts a linear relationship, the estimated line, or regression
    model, is a good fit of the given data on the Y and X variables. It is a good fit because
    the coefficient of determination (R²) is 0.947, or about 95%, which is high enough to support
    a well-fitted model, and the data points lie close to the fitted line.

    PART D

    Interpretation of above results from the table:

    Interpretation of Alpha (α):

    Alpha (α) is the intercept: the point where x = 0 and the regression line crosses the y-axis.

    Here, when variable X = 0, the estimated value of variable Y is -23.833 units.

    Interpretation of Beta (β)

    Beta (β) is the slope of the regression line: a one-unit change in x leads to a 12.367-unit
    increase in Y. If X increases by one unit, Y increases by 12.367 units.

    Interpretation of correlation coefficient (R)

    As our correlation coefficient (r) is 0.973, it shows a strong positive linear relationship
    between X and Y. When X increases or decreases, Y also tends to increase or decrease.

    Interpretation of coefficient of determination (R²)

    As our coefficient of determination (R²) is 0.947, or 94.7% ≈ 95%, it shows that about 95%
    of the variation in Y is explained by the X variable.

    Interpretation of Anova Table


    Total sum of squares = regression sum of squares + residual sum of squares

    Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ(Yi − Ŷi)²

    9690.000 = 9176.067 + 513.933

    Interpretation of Total Deviation in Y, Σ(Yi − Ȳ)²

    The total deviation is the sum of squared differences between each actual value of Y and the
    mean of Y. Here the value 9690.000 represents the total deviation in Y, Σ(Yi − Ȳ)²: the total
    variation of the actual Y values around their mean is 9690 squared units.

    Interpretation of regression sum of squares, Σ(Ŷi − Ȳ)²

    Out of the total deviation, 9176.067 ≈ 9176 units is the deviation explained by the estimated
    regression line of y on x. In the given data, 9176 units of deviation are explained by the
    estimated regression line of variable Y on variable X.

    Interpretation of residual sum of squares, Σ(Yi − Ŷi)²

    Out of the total deviation, 513.933 ≈ 514 units is the deviation not explained by the
    estimated regression line of y on x. In the given data, 514 units of deviation are not
    explained by the estimated regression line of variable Y on variable X.

    PART E

    Using the estimated line, prediction of the estimated value of Y for an X value of 6.

    Putting x = 6 into the estimated regression line gives the estimated value of y.

    Ŷi = α + β xi = (-23.833) + (12.367) xi
       = (-23.833) + (12.367)(6)
       = 50.369

    When the value of X is 6, the predicted or estimated value of Y is 50.369, or about 50 units.


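    Prediction without using the estimated regression line.

    Without this line, the average value of Y (38) would be used as the predicted value for
    every X. Because the regression line explains about 95% of the variation in Y (R² = 0.947),
    the average is unlikely to give a better prediction than the line: at X = 6 the observed Y
    is 49, which is much closer to 50.369 than to 38.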
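One way to examine the question is to compare the two candidate predictions numerically. The sketch below fits the line to the nine observations, predicts Y at X = 6 from both the line and the average of Y, and compares the total squared error each predictor leaves over the whole data set. It is an illustration added here, with variable names chosen for the example.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 4, 9, 16, 25, 49, 64, 81, 93]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((a - mean_x) ** 2 for a in x)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Two candidate predictions for X = 6
line_prediction = intercept + slope * 6
mean_prediction = mean_y
observed_at_6 = y[x.index(6)]

# Squared error left over by each predictor across all nine points
sse_line = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
sse_mean = sum((b - mean_y) ** 2 for b in y)

print("line:", round(line_prediction, 3), "mean:", mean_prediction,
      "observed:", observed_at_6)
print("SSE line:", round(sse_line, 3), "SSE mean:", round(sse_mean, 3))
```

The two error totals are the residual and total sums of squares from the ANOVA table, so the comparison restates R² in terms of prediction error. The line's prediction here uses unrounded coefficients and so differs from 50.369 in the third decimal.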

    PART F

    Checking the significance of Alpha (α)

    STEP 1

    Hypothesis formulation:

    Symbolically:

    H0: α = 0

    HA: α ≠ 0

    Theoretically:

    H0: The intercept of the regression line of Y on X is zero.

    HA: The intercept of the regression line of Y on X is not zero.

    STEP 2

    Determine the level of significance:

    Alpha (α) = 5% or 0.05

    STEP 3

    Checking the p value for alpha from the above tables

    and estimated regression model

    Ŷi = α + β xi = (-23.833) + (12.367) xi

    S.E = (6.225) (1.106)

    t-value = (-3.829) (11.180)


    p-value = (0.006) (0.000)

    As, it is shown that

    t-value = -3.829

    p-value = 0.006

    STEP 4

    Decision making

    As our p-value is less than the level of significance, alpha (α) = 5% or 0.05, we reject the
    null hypothesis (H0) and accept the alternative hypothesis (HA). The result gives evidence
    against H0 and in favor of HA.

    P-value = 0.006 < α = 0.05

    This confirms that alpha (α), the intercept of the simple linear regression, is
    statistically significant.