Applied Quantitative Analysis and Practices LECTURE#25 By Dr. Osman Sadiq Paracha.

Applied Quantitative Analysis and Practices

LECTURE#25

By

Dr. Osman Sadiq Paracha

Previous Lecture Summary Inference about the slope t test for slope Inference about the slope (t test example) F test for significance Confidence Interval estimate for slope Confidence Interval concept and calculation

Inferences About the Slope: t Test

t test for a population slope Is there a linear relationship between X and Y?

Null and alternative hypotheses H0: β1 = 0 (no linear relationship) H1: β1 ≠ 0 (linear relationship does exist)

Test statistic

1b

11STAT S

βbt

2nd.f.

where:

b1 = regression slope coefficient

β1 = hypothesized slope

Sb1 = standard error of the slope

Inferences About the Slope: t Test Example

House Price in $1000s

(y)

Square Feet (x)

245 1400

312 1600

279 1700

308 1875

199 1100

219 1550

405 2350

324 2450

319 1425

255 1700

(sq.ft.) 0.1098 98.25 price house

Estimated Regression Equation:

The slope of this model is 0.1098

Is there a relationship between the square footage of the house and its sales price?


H0: β1 = 0

H1: β1 ≠ 0

Coefficients Standard Error t Stat P-value

Intercept 98.24833 58.03348 1.69296 0.12892

Square Feet 0.10977 0.03297 3.32938 0.01039

1bSb1

329383032970

0109770

S

βbt

1b

11STAT

..

.


Test Statistic: tSTAT = 3.329

There is sufficient evidence that square footage affects house price

Decision: Reject H0

Reject H0Reject H0

a/2=.025

-tα/2

Do not reject H0

0 tα/2

a/2=.025

-2.3060 2.3060 3.329

d.f. = 10- 2 = 8

H0: β1 = 0

H1: β1 ≠ 0


H0: β1 = 0

H1: β1 ≠ 0 Coefficients Standard Error t Stat P-value

Intercept 98.24833 58.03348 1.69296 0.12892

Square Feet 0.10977 0.03297 3.32938 0.01039

p-value

There is sufficient evidence that square footage affects house price.

Decision: Reject H0, since p-value < α

F Test for Significance

F Test statistic:

where

MSE

MSRFSTAT

1kn

SSEMSE

k

SSRMSR

where FSTAT follows an F distribution with k numerator and (n – k - 1) denominator degrees of freedom

(k = the number of independent variables in the regression model)

F-Test for Significance

Regression Statistics

Multiple R 0.76211

R Square 0.58082

Adjusted R Square 0.52842

Standard Error 41.33032

Observations 10

ANOVA df SS MS F Significance F

Regression 1 18934.9348 18934.9348 11.0848 0.01039

Residual 8 13665.5652 1708.1957

Total 9 32600.5000

11.08481708.1957

18934.9348

MSE

MSRFSTAT

With 1 and 8 degrees of freedom

p-value for the F-Test

H0: β1 = 0

H1: β1 ≠ 0

= .05

df1= 1 df2 = 8

Test Statistic:

Decision:

Conclusion:

Reject H0 at = 0.05

There is sufficient evidence that house size affects selling price0

= .05

F.05 = 5.32Reject H0Do not

reject H0

11.08FSTAT MSE

MSR

Critical Value:

F = 5.32

F Test for Significance(continued)

F

Confidence Interval Estimate for the Slope

Confidence Interval Estimate of the Slope:

Excel Printout for House Prices:

At 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

1b2/1 Sb αt

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

d.f. = n - 2

Since the units of the house price variable is $1000s, we are 95% confident that the average impact on sales price is between $33.74 and $185.80 per square foot of house size

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 98.24833 58.03348 1.69296 0.12892 -35.57720 232.07386

Square Feet 0.10977 0.03297 3.32938 0.01039 0.03374 0.18580

This 95% confidence interval does not include 0.

Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance

Confidence Interval Estimate for the Slope (continued)

Confidence Interval Example

Cereal fill example Population has µ = 368 and σ = 15. If you take a sample of size n = 25 you know

368 ± 1.96 * 15 / = (362.12, 373.88). 95% of the intervals formed in this manner will contain µ.

When you don’t know µ, you use X to estimate µ If X = 362.3 the interval is 362.3 ± 1.96 * 15 / = (356.42, 368.18) Since 356.42 ≤ µ ≤ 368.18 the interval based on this sample makes a

correct statement about µ.

But what about the intervals from other possible samples of size 25?

25

25

Confidence Interval Example(continued)

Sample # XLowerLimit

UpperLimit

Containµ?

1 362.30 356.42 368.18 Yes

2 369.50 363.62 375.38 Yes

3 360.00 354.12 365.88 No

4 362.12 356.24 368.00 Yes

5 373.88 368.00 379.76 Yes

Confidence Interval Example

In practice you only take one sample of size n In practice you do not know µ so you do not

know if the interval actually contains µ However you do know that 95% of the intervals

formed in this manner will contain µ Thus, based on the one sample, you actually

selected you can be 95% confident your interval will contain µ (this is a 95% confidence interval)

(continued)

Note: 95% confidence is based on the fact that we used Z = 1.96.

Estimation Process

(mean, μ, is unknown)

Population

Random Sample

Mean X = 50

Sample

I am 95% confident that μ is between 40 & 60.

General Formula

The general formula for all confidence intervals is:

Point Estimate ± (Critical Value)(Standard Error)

Where:•Point Estimate is the sample statistic estimating the population parameter of interest

•Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level

•Standard Error is the standard deviation of the point estimate

Confidence Intervals

Population Mean

σ Unknown

ConfidenceIntervals

PopulationProportion

σ Known

Confidence Interval for μ(σ Known)

Assumptions Population standard deviation σ is known Population is normally distributed If population is not normal, use large sample (n > 30)

Confidence interval estimate:

where is the point estimate

Zα/2 is the normal distribution critical value for a probability of /2 in each tail is the standard error

n

σ/2ZX α

X

nσ/

Finding the Critical Value, Zα/2

Consider a 95% confidence interval:

Zα/2 = -1.96 Zα/2 = 1.96

0.05 so 0.951 αα

0.0252

α

0.0252

α

Point EstimateLower Confidence Limit

UpperConfidence Limit

Z units:

X units: Point Estimate

0

1.96/2Z α

Common Levels of Confidence

Commonly used confidence levels are 90%, 95%, and 99%

Confidence Level

Confidence Coefficient,

Zα/2 value

1.28

1.645

1.96

2.33

2.58

3.08

3.27

0.80

0.90

0.95

0.98

0.99

0.998

0.999

80%

90%

95%

98%

99%

99.8%

99.9%

1

μμx

Intervals and Level of Confidence

Confidence Intervals

Intervals extend from

to

(1-)100%of intervals constructed contain μ;

()100% do not.

Sampling Distribution of the Mean

n

σ2/αZX

n

σ2/αZX

x

x1

x2

/2 /21

Example

A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.

Determine a 95% confidence interval for the true mean resistance of the population.

2.4068 1.9932

0.2068 2.20

)11(0.35/ 1.96 2.20

n

σ/2 ZX

μ

α

Example

A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.

Solution:

(continued)

DCOVA

Interpretation

We are 95% confident that the true mean resistance is between 1.9932 and 2.4068 ohms

Although the true mean may or may not be in this interval, 95% of intervals formed in this manner will contain the true mean

DCOVA

t Test for a Correlation Coefficient

Hypotheses

H0: ρ = 0 (no correlation between X and Y)

H1: ρ ≠ 0 (correlation exists)

Test statistic

(with n – 2 degrees of freedom)

2n

r1

ρ-rt

2STAT

0 b if rr

0 b if rr

where

12

12

t-test For A Correlation Coefficient

Is there evidence of a linear relationship between square feet and house price at the .05 level of significance?

H0: ρ = 0 (No correlation)

H1: ρ ≠ 0 (correlation exists)

=.05 , df = 10 - 2 = 8

3.329

210

.7621

0.762

2n

r1

ρrt

22STAT

(continued)

t-test For A Correlation Coefficient

Conclusion:There is evidence of a linear association at the 5% level of significance

Decision:Reject H0

Reject H0Reject H0

a/2=.025

-tα/2

Do not reject H0

0 tα/2

a/2=.025

-2.3060 2.3060

3.329

d.f. = 10-2 = 8

3.329

210

.7621

0.762

2n

r1

ρrt

22STAT

(continued)

Pitfalls of Regression Analysis

Lacking an awareness of the assumptions underlying least-squares regression

Not knowing how to evaluate the assumptions Not knowing the alternatives to least-squares

regression if a particular assumption is violated Using a regression model without knowledge of

the subject matter Extrapolating outside the relevant range

Strategies for Avoiding the Pitfalls of Regression

Start with a scatter plot of X vs. Y to observe possible relationship

Perform residual analysis to check the assumptions Plot the residuals vs. X to check for violations of

assumptions such as homoscedasticity Use a histogram, stem-and-leaf display, boxplot,

or normal probability plot of the residuals to uncover possible non-normality

Strategies for Avoiding the Pitfalls of Regression

If there is violation of any assumption, use alternative methods or models

If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals

Avoid making predictions or forecasts outside the relevant range

(continued)

Lecture Summary Confidence Interval Concept Confidence Interval examples t test for correlation coefficient Application of t test in correlation coefficient Pitfalls of Regression Strategies to avoid pitfalls in regression SPSS Application in regression

Applied Quantitative Analysis and Practices LECTURE#25 By Dr. Osman Sadiq Paracha.

Documents

Transcript of Applied Quantitative Analysis and Practices LECTURE#25 By Dr. Osman Sadiq Paracha.