MATH30-6 Lecture 4.pptx

  • 8/10/2019 MATH30-6 Lecture 4.pptx

    1/32

    MATH30-6

    Probability and Statistics

    Multivariate Analysis

    Objectives

    At the end of the lesson, the students are expected to:

    - Construct a scatter diagram;

    - Use simple linear regression for building empirical models of engineering and scientific data;

    - Understand how the method of least squares is used to estimate the parameters in a linear regression model; and

    - Interpret the different values obtained.

    Deterministic Relationship

    A model that predicts a variable perfectly

    Example:

    The displacement ($d_t$) of a particle at a certain time $t$ is related to its velocity by $d_t = d_0 + vt$, where $d_0$ = displacement of the particle from the origin at time $t = 0$; and $v$ = velocity.

    Regression Analysis

    The collection of statistical tools that are used to model and explore relationships between variables that are related in a nondeterministic manner

    Used because there are many situations where the relationship between variables is not deterministic

    Examples:

    - The electrical energy consumption of a house ($y$) is related to the size of the house ($x$, in ft$^2$).

    - The fuel usage of an automobile ($y$) is related to the vehicle weight ($x$).

    Simple Linear Regression

    Single regressor variable or predictor variable $x$ and a dependent or response variable $Y$

    The expected value of $Y$ for each value of $x$ is $E(Y \mid x) = \beta_0 + \beta_1 x$, where the intercept $\beta_0$ and slope $\beta_1$ are unknown regression coefficients.

    We assume $Y$ can be described by the model $Y = \beta_0 + \beta_1 x + \epsilon$ (Equation 11-2), where $\epsilon$ is a random error with mean zero and (unknown) variance $\sigma^2$.

    Simple Linear Regression

    The random errors corresponding to different observations are also assumed to be uncorrelated random variables.

    The regression model may be thought of as an empirical model.

    Method of Least Squares

    Suppose that we have $n$ pairs of observations, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. See Fig. 11-3.

    The estimates of $\beta_0$ and $\beta_1$ should result in a line that is (in some sense) a best fit to the data.

    German scientist Karl Gauss (1777-1855) proposed estimating the parameters $\beta_0$ and $\beta_1$ in Equation 11-2 to minimize the sum of squares of the vertical deviations in Fig. 11-3.

    This criterion for estimating the regression coefficients is called the method of least squares.

    Method of Least Squares

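    Using Equation 11-2 ($Y = \beta_0 + \beta_1 x + \epsilon$), we may express the $n$ observations in the sample as

    $$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$

    Equation (11-3)

    and the sum of the squares of the deviations of the observations from the true regression line is

    $$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

    Equation (11-4)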
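
    As a small illustration of Equation (11-4), the sketch below evaluates the sum of squared deviations for two candidate lines. The data points and the function name `sum_of_squares` are hypothetical and chosen only for illustration; they do not come from the lecture.

```python
def sum_of_squares(xs, ys, b0, b1):
    """Return L = sum of (y_i - b0 - b1*x_i)^2, as in Equation (11-4)."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.0]

# Two candidate lines through the origin with different slopes.
L_poor = sum_of_squares(xs, ys, b0=0.0, b1=1.0)
L_good = sum_of_squares(xs, ys, b0=0.0, b1=2.0)

# The line that tracks the points more closely gives the smaller L.
print(f"L for slope 1: {L_poor:.2f}")
print(f"L for slope 2: {L_good:.2f}")
```

    Least squares picks, among all candidate lines, the one for which this quantity is smallest.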

    Method of Least Squares

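    The least squares estimators of $\beta_0$ and $\beta_1$, say $\hat{\beta}_0$ and $\hat{\beta}_1$, must satisfy, with $L$ the sum of squared deviations in Equation (11-4),

    $$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$$

    $$\left.\frac{\partial L}{\partial \beta_1}\right|_{\hat{\beta}_0, \hat{\beta}_1} = -2 \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)\, x_i = 0$$

    Equations (11-5)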

    Method of Least Squares

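    Simplifying Equations (11-5)

    $$n \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

    $$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$

    Equations 11-6 (least squares normal equations)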
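
    The normal equations form a 2-by-2 linear system in the two estimates. A minimal sketch of solving that system directly with Cramer's rule is given below; the function name `solve_normal_equations` and the sample data are illustrative assumptions, not part of the lecture.

```python
def solve_normal_equations(xs, ys):
    """Solve the least squares normal equations for (b0, b1).

    System:  n*b0      + sum(x)*b1   = sum(y)
             sum(x)*b0 + sum(x^2)*b1 = sum(x*y)
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))

    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    # Cramer's rule for the 2-by-2 system above.
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical data generated exactly from y = 1 + 2x.
b0, b1 = solve_normal_equations([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(b0, b1)
```

    Because the sample points lie exactly on a line, the solution recovers that line's intercept and slope.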

    Least Squares Estimates

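    $$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

    Equation 11-7

    $$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_i - \dfrac{\left(\sum_{i=1}^{n} y_i\right)\left(\sum_{i=1}^{n} x_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$

    Equation 11-8

    where $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.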

    Least Squares Estimates

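    Notationally, it is occasionally convenient to give special symbols to the numerator and denominator of Equation 11-8. Given data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, let

    $$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$

    Equation 11-10 (denominator) and

    $$S_{xy} = \sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

    Equation 11-11 (numerator)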
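
    A short sketch of the computational forms in Equations 11-10 and 11-11, and of the slope and intercept built from them. The function names and the five data points are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def s_xx(xs):
    """Computational form of Equation 11-10."""
    n = len(xs)
    return sum(x * x for x in xs) - sum(xs) ** 2 / n


def s_xy(xs, ys):
    """Computational form of Equation 11-11."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def least_squares(xs, ys):
    """Return (intercept, slope): slope = S_xy / S_xx, intercept = ybar - slope * xbar."""
    slope = s_xy(xs, ys) / s_xx(xs)
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return intercept, slope


# Hypothetical data: xbar = 3, ybar = 4, S_xx = 10, S_xy = 6.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = least_squares(xs, ys)
print(f"S_xx = {s_xx(xs)}, S_xy = {s_xy(xs, ys)}")
print(f"fitted line: y = {intercept:.1f} + {slope:.1f} x")
```

    For these points the fitted line works out to an intercept of 2.2 and a slope of 0.6.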

    Fitted or Estimated Regression Line

    11.2/398 The grades of a class of 9 students on a midterm report ($x$) and on the final examination ($y$) are as follows:

    $x$   77  50  71  72  81  94  96  99  67
    $y$   82  66  78  34  47  85  99  99  68

    (a) Estimate the linear regression line.

    (b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report.
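
    One possible way to work both parts numerically is sketched below, using the computational forms of Equations 11-7, 11-8, 11-10 and 11-11. The variable names are illustrative, and the script is a check on hand calculation rather than the lecture's own solution.

```python
# Data from Exercise 11.2: midterm report (x) and final examination (y).
x = [77, 50, 71, 72, 81, 94, 96, 99, 67]
y = [82, 66, 78, 34, 47, 85, 99, 99, 68]
n = len(x)

# S_xx and S_xy in their computational forms (Equations 11-10 and 11-11).
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

slope = s_xy / s_xx                          # Equation 11-8
intercept = sum(y) / n - slope * sum(x) / n  # Equation 11-7

# Part (b): predicted final grade for a midterm grade of 85.
predicted = intercept + slope * 85

print(f"fitted line: y = {intercept:.4f} + {slope:.4f} x")
print(f"predicted final grade at x = 85: {predicted:.1f}")
```

    With these data the fitted line is roughly $\hat{y} = 12.06 + 0.777x$, which gives a predicted final grade of about 78.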

    Fitted or Estimated Regression Line

    10-11/424 An article in the Journal of Monetary Economics assesses the relationship between percentage growth in wealth over a decade and a half of savings for baby boomers of age 40 to 55 with these people's income quartiles. The article presents a table showing five income quartiles, and for each quartile there is a reported percentage growth in wealth. The data are as follows. Run a simple linear regression of these five pairs of numbers and estimate a linear relationship between income and percentage growth in wealth.

    Income quartile      1     2     3     4     5
    Wealth growth (%)    17.3  23.6  40.2  45.8  56.8
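
    A minimal sketch of this regression, this time using the deviation forms of $S_{xx}$ and $S_{xy}$ (sums of products of deviations from the means). Variable names are illustrative assumptions.

```python
# Data from Exercise 10-11: income quartile (x) and wealth growth in % (y).
quartile = [1, 2, 3, 4, 5]
growth = [17.3, 23.6, 40.2, 45.8, 56.8]
n = len(quartile)

x_bar = sum(quartile) / n
y_bar = sum(growth) / n

# Deviation forms of S_xx and S_xy.
s_xx = sum((x - x_bar) ** 2 for x in quartile)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(quartile, growth))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"growth = {intercept:.2f} + {slope:.2f} * quartile")
```

    The estimated slope is about 10.12 percentage points of wealth growth per income quartile, with an intercept of about 6.38.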

    Fitted or Estimated Regression Line

    10-12/424 A financial analyst at Goldman Sachs ran a regression analysis of monthly returns on a certain investment ($Y$) versus returns for the same month on the Standard & Poor's index ($X$). The regression results included $S_{xx} = 765.98$ and $S_{xy} = 934.49$. Give the least-squares estimate of the regression slope parameter.

    Correlation

    The degree of linear association between the two random variables $X$ and $Y$

    Indicated by the correlation coefficient $\rho$

    $\rho$ is the population (true) correlation coefficient, estimated by $r$, the sample correlation coefficient or Pearson product-moment correlation coefficient

    $\rho$ can take on any value from $-1$, through 0, to 1.

    Possible Interpretations of $\rho$

    1. When $\rho$ is equal to zero, there is no correlation. That is, there is no linear relationship between the two random variables.

    2. When $\rho = 1$, there is a perfect, positive, linear relationship between the two variables. That is, whenever one of the variables, $X$ or $Y$, increases, the other variable also increases; and whenever one of the variables decreases, the other one must also decrease.

    3. When $\rho = -1$, there is a perfect negative linear relationship between $X$ and $Y$. When $X$ or $Y$ increases, the other variable decreases; and when one decreases, the other one must increase.

    Possible Interpretations of $\rho$

    4. When the value of $\rho$ is between 0 and 1 in absolute value, it reflects the relative strength of the linear relationship between the two variables. For example, a correlation of 0.90 implies a relatively strong positive relationship between the two variables. A correlation of $-0.70$ implies a weaker, negative (as indicated by the minus sign), linear relationship. A correlation $\rho = 0.30$ implies a relatively weak (positive) linear relationship between $X$ and $Y$.

    Sample Correlation Coefficient

    The estimate of $\rho$

    Also referred to as the Pearson product-moment correlation coefficient
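
    In the notation of Equations 11-10 and 11-11, the sample correlation coefficient is commonly written as shown below. The symbol $S_{yy}$ is introduced here by analogy with $S_{xx}$; it is an assumption about notation rather than something stated in the lecture.

```latex
r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}},
\qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2
       = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}
```

    Since $S_{xx}$ and $S_{yy}$ are sums of squares, the sign of $r$ is the sign of $S_{xy}$, and therefore also the sign of the fitted slope $S_{xy}/S_{xx}$.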

    Sample Correlation Coefficient

    Interpretations of $r$

    ±1.00            perfect positive (negative) correlation
    ±0.91 - ±0.99    very high positive (negative) correlation
    ±0.71 - ±0.90    high positive (negative) correlation
    ±0.51 - ±0.70    moderate positive (negative) correlation
    ±0.31 - ±0.50    low positive (negative) correlation
    ±0.01 - ±0.30    negligible positive (negative) correlation
     0.00            no correlation
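
    The table above can be turned into a small lookup, sketched below. The function name `interpret_r` is hypothetical, and rounding $r$ to two decimals before the lookup is an assumption made here because the table's bands are stated to two decimal places.

```python
def interpret_r(r):
    """Map a sample correlation coefficient to the verbal scale in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    # Assumption: compare the magnitude after rounding to two decimals,
    # since the bands in the table are given to two decimal places.
    size = round(abs(r), 2)
    if size == 0.0:
        return "no correlation"

    sign = "positive" if r > 0 else "negative"
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.91:
        strength = "very high"
    elif size >= 0.71:
        strength = "high"
    elif size >= 0.51:
        strength = "moderate"
    elif size >= 0.31:
        strength = "low"
    else:
        strength = "negligible"
    return f"{strength} {sign} correlation"


print(interpret_r(0.6))
print(interpret_r(-0.25))
```

    These labels are a descriptive convention; they do not replace a formal test of whether $\rho$ differs from zero.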

    Coefficient of Determination

    Denoted by $r^2$

    A descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data

    Ordinarily, we do not use $r^2$ for inference about $\rho^2$.

    Coefficient of Determination

    11.13/400 A study of the amount of rainfall and the quantity of air pollution removed produced the following data:

    Daily Rainfall, $x$ (0.01 cm)    Particulate Removed, $y$ ($\mu$g/m$^3$)
    4.3                              126
    4.5                              121
    5.9                              116
    5.6                              118
    6.1                              114
    5.2                              118
    3.8                              132
    2.1                              141
    7.5                              108
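
    A sketch of computing the sample correlation coefficient and the coefficient of determination for these data. It uses the usual definition $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where $S_{yy}$ is defined like $S_{xx}$ but for $y$ (a notational assumption); variable names are illustrative.

```python
import math

# Data from Exercise 11.13: daily rainfall (x) and particulate removed (y).
rainfall = [4.3, 4.5, 5.9, 5.6, 6.1, 5.2, 3.8, 2.1, 7.5]
removed = [126, 121, 116, 118, 114, 118, 132, 141, 108]
n = len(rainfall)

# Corrected sums of squares and cross products (computational forms).
s_xx = sum(x * x for x in rainfall) - sum(rainfall) ** 2 / n
s_yy = sum(y * y for y in removed) - sum(removed) ** 2 / n
s_xy = sum(x * y for x, y in zip(rainfall, removed)) - sum(rainfall) * sum(removed) / n

r = s_xy / math.sqrt(s_xx * s_yy)  # sample correlation coefficient
r_squared = r ** 2                 # coefficient of determination

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

    Here $r$ is close to $-0.98$, so $r^2$ is about 0.96: the fitted line accounts for roughly 96% of the variability in the particulate removed.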

    Coefficient of Determination

    11.43/436 With reference to Exercise 11.13 on page 400, assume a bivariate normal distribution for $x$ and $y$.

    (a) Calculate $r$.

    (b) Test the null hypothesis that $\rho = -0.5$ against the alternative that

    Summary

    A scatter diagram displays observations on two variables, $x$ and $y$. Each observation is represented by a point showing its $x$-$y$ coordinates. The scatter diagram can be very effective in revealing the joint variability of $x$ and $y$ or the nature of the relationship between them.

    The method of least squares is used to estimate the parameters of a system by minimizing the sum of the squares of the differences between the observed values and the fitted or predicted values from the system.

    Summary

    Generally, correlation is a measure of the interdependence among data. The concept may include more than two variables. The term is most commonly used in a narrow sense to express the relationship between quantitative variables or ranks.

    The correlation coefficient ($r$) is a dimensionless measure of the linear association between two variables, lying in the interval from $-1$ to $+1$, with zero indicating the absence of correlation (but not necessarily the independence of the two variables).

    Summary

    The coefficient of determination ($r^2$) is often used to judge the adequacy of a regression model. Its value tells us that the model accounts for $100r^2$% of the variability in the data.

    References

    Aczel and Sounderpandian. Business Statistics, 7th Ed. 2008

    Montgomery and Runger. Applied Statistics and Probability for Engineers, 5th Ed. 2011

    Walpole, et al. Probability and Statistics for Engineers and Scientists, 9th Ed. 2012, 2007, 2002