Goals in Statistic


Transcript of Goals in Statistic

Page 1: Goals in Statistic

Chapter Three Describing Data: Numerical Measures

GOALS
When you have completed this chapter, you will be able to:
ONE: Calculate the arithmetic mean, median, mode, weighted mean, and the geometric mean.
TWO: Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
THREE: Identify the position of the arithmetic mean, median, and mode for both a symmetrical and a skewed distribution.

Page 2: Goals in Statistic

Chapter Three Describing Data: Numerical Measures

GOALS (continued)
FOUR: Compute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.
FIVE: Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
SIX: Understand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations.

Page 3: Goals in Statistic

Characteristics of the Mean

The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.

The major characteristics of the mean are:
• It requires the interval scale.
• All values are used.
• It is unique.
• The sum of the deviations from the mean is 0.
• It is calculated by summing the values and dividing by the number of values.

Page 4: Goals in Statistic

Population Mean

For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where
• µ is the population mean,
• N is the total number of observations,
• X is a particular value,
• Σ indicates the operation of adding.

Page 5: Goals in Statistic

Example 1

A Parameter is a measurable characteristic of a population.

The Kiers family owns four cars. The following is the current mileage on each of the four cars:

56,000, 23,000, 42,000, 73,000

Find the mean mileage for the cars.

$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$
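The mean calculation in Example 1 can be checked with a few lines of Python. This is a minimal sketch; the names `mean` and `mileages` are illustrative and not part of the slides.

```python
# Mileage of the four Kiers family cars (treated as the whole population).
mileages = [56_000, 23_000, 42_000, 73_000]


def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


population_mean = mean(mileages)
print(population_mean)
```

The same function applies to the sample mean, since only the interpretation (population versus sample) changes, not the arithmetic.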

Page 6: Goals in Statistic

Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where n is the total number of values in the sample.

Page 7: Goals in Statistic

Example 2

A statistic is a measurable characteristic of a sample.

A sample of five executives received the following bonus last year ($000):

14.0, 15.0, 17.0, 16.0, 15.0

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$

Page 8: Goals in Statistic

Properties of the Arithmetic Mean

Every set of interval-level and ratio-level data has a mean.

All the values are included in computing the mean.

A set of data has a unique mean.

The mean is affected by unusually large or small data values.

The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.


Page 9: Goals in Statistic

Example 3

Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:

$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$
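The fifth property can be verified numerically. The sketch below uses the values from Example 3; the variable names are illustrative.

```python
values = [3, 8, 4]

# Mean of the three values (5.0).
mean_value = sum(values) / len(values)

# Deviation of each value from the mean: -2, 3, -1.
deviations = [x - mean_value for x in values]

# The deviations cancel out, so their sum is zero.
total_deviation = sum(deviations)
print(deviations, total_deviation)
```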

Page 10: Goals in Statistic

Weighted Mean

The Weighted Mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$

Page 11: Goals in Statistic

Example 4

During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15.

Compute the weighted mean of the price of the drinks.

$$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$$
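A short sketch of the weighted mean formula applied to Example 4 follows. It uses the $1.15 price that appears in the slide's worked computation; the function and variable names are illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.50, 0.75, 0.90, 1.15]   # price per drink, in dollars
drinks_sold = [5, 15, 15, 15]       # number of drinks sold at each price

avg_price = weighted_mean(prices, drinks_sold)
print(round(avg_price, 2))
```

Here the weights are the numbers of drinks sold, so the result is the average price paid per drink rather than the simple average of the four price points.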

Page 12: Goals in Statistic

The Median

The Median is the midpoint of the values after they have been ordered from the smallest to the largest.

• There are as many values above the median as below it in the data array.
• The median is found at the (n+1)/2 ranked observation. For an even set of values, this position falls between the two middle numbers, and the median is their arithmetic average.

Page 13: Goals in Statistic

The median (continued)

The ages for a sample of five college students are: 21, 25, 19, 20, 22.

Arranging the data in ascending order gives:

19, 20, 21, 22, 25.

Thus the median is 21.

Page 14: Goals in Statistic

Example 5

The heights of four basketball players, in inches, are: 76, 73, 80, 75.

Arranging the data in ascending order gives:

73, 75, 76, 80

The median is found at the (n+1)/2 = (4+1)/2 = 2.5th data point, halfway between 75 and 76.

Thus the median is 75.5.
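Both median examples can be reproduced with Python's standard `statistics` module, which sorts the data and averages the two middle values when the count is even. The variable names below are illustrative.

```python
import statistics

ages = [21, 25, 19, 20, 22]      # odd number of values
heights = [76, 73, 80, 75]       # even number of values

# Odd count: the middle value of the sorted data.
median_age = statistics.median(ages)

# Even count: the average of the two middle values (75 and 76).
median_height = statistics.median(heights)

print(median_age, median_height)
```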

Page 15: Goals in Statistic

Properties of the Median

• There is a unique median for each data set.
• It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.
• It can be computed for ratio-level, interval-level, and ordinal-level data.
• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

Page 16: Goals in Statistic

The Mode: Example 6

The Mode is another measure of location and represents the value of the observation that appears most frequently.

Example 6: The exam scores for ten nursing students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.

Data can have more than one mode. If it has two modes, it is referred to as bimodal; if three modes, trimodal; and so on.
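As a quick check of Example 6, `statistics.multimode` (Python 3.8 or later) returns every value that shares the highest frequency, which also covers the bimodal case. The second data set is a made-up illustration, not from the slides.

```python
import statistics

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

# 81 appears three times, more than any other score.
modes = statistics.multimode(scores)

# Hypothetical bimodal data: 1 and 2 both appear twice.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])

print(modes, bimodal_modes)
```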

Page 17: Goals in Statistic

The Relative Positions of the Mean, Median, and Mode

Symmetric distribution: A distribution having the same shape on either side of the center.

Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution. It can be negatively skewed or positively skewed.

Page 18: Goals in Statistic

The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution

Zero skewness: Mean = Median = Mode

[Figure: symmetric curve with the Mode, Median, and Mean marked at the same central point.]

Page 19: Goals in Statistic

The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution

• Positively skewed: Mean and median are to the right of the mode.

Mean > Median > Mode

[Figure: right-skewed curve with the Mode, Median, and Mean marked from left to right.]

Page 20: Goals in Statistic

The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution

• Negatively skewed: Mean and median are to the left of the mode.

Mean < Median < Mode

[Figure: left-skewed curve with the Mean, Median, and Mode marked from left to right.]

Page 21: Goals in Statistic

Measures of Dispersion

Dispersion refers to the spread or variability in the data.

Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Range = Largest value – Smallest value

[Figure: chart with a horizontal axis from 0 to 12 and a vertical axis from 0 to 30.]

Page 22: Goals in Statistic

Example 9

The following are the scores of the 25 Master of Nursing students on the first long examination:

65.0  60.0  59.0  81.0  73.0
48.0  95.0  63.0  92.0  53.0
78.0  56.0  79.0  95.0  78.0
80.0  89.0  79.0  97.0  69.0
65.0  77.0  80.0  83.0  75.0

Highest score: 97
Lowest score: 48

Range = Highest value – Lowest value = 97 – 48 = 49

Page 23: Goals in Statistic

Mean Deviation

Mean Deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean.

$$MD = \frac{\sum |X - \bar{X}|}{n}$$

The main features of the mean deviation are:
• All values are used in the calculation.
• It is not unduly influenced by large or small values.

Page 24: Goals in Statistic

Example 10

The weights of a sample of crates containing books for the bookstore (in pounds) are:

103, 97, 101, 106, 103

Find the mean deviation.

$$\bar{X} = 102$$

The mean deviation is:

$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|103 - 102| + \dots + |103 - 102|}{5} = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$$
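The mean deviation in Example 10 can be sketched in Python as follows. The function name `mean_deviation` is illustrative.

```python
def mean_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    n = len(values)
    mean_value = sum(values) / n
    return sum(abs(x - mean_value) for x in values) / n


crate_weights = [103, 97, 101, 106, 103]   # pounds

md = mean_deviation(crate_weights)
print(md)
```

Taking absolute values is what keeps the deviations from cancelling to zero, as they would if summed with their signs.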

Page 25: Goals in Statistic

Variance and Standard Deviation

Variance: The arithmetic mean of the squared deviations from the mean.

Standard deviation: The square root of the variance.

Page 26: Goals in Statistic

Population Variance

The major characteristics of the Population Variance are:
• All values are used in the calculation.
• Because the deviations are squared, it is influenced by extreme values more strongly than the mean deviation is.

Page 27: Goals in Statistic

Variance and standard deviation

Population Variance formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where
• X is the value of an observation in the population,
• µ is the arithmetic mean of the population,
• N is the number of observations in the population.

Population Standard Deviation formula:

$$\sigma = \sqrt{\sigma^2}$$

Page 28: Goals in Statistic

Sample variance and standard deviation

Sample variance (s²):

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample standard deviation (s):

$$s = \sqrt{s^2}$$

Page 29: Goals in Statistic

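Example 11

The hourly wages earned by a sample of five students are:

$7, $5, $11, $8, $6.

Find the sample variance and standard deviation.

$$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$$

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \dots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$$

$$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$$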
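Example 11 can be checked with the standard library: `statistics.variance` and `statistics.stdev` use the n − 1 divisor of the sample formulas, while `statistics.pvariance` uses the population divisor N. The variable names are illustrative.

```python
import statistics

wages = [7, 5, 11, 8, 6]   # hourly wages in dollars

sample_variance = statistics.variance(wages)   # divides by n - 1
sample_std = statistics.stdev(wages)           # square root of the sample variance

# For comparison: treating the same five values as a whole population
# divides by N instead, giving a smaller value.
population_variance = statistics.pvariance(wages)

print(round(sample_variance, 2), round(sample_std, 2))
```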

Page 30: Goals in Statistic

Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least:

$$1 - \frac{1}{k^2}$$

where k is any constant greater than 1.
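To get a feel for the bound, the sketch below evaluates Chebyshev's expression for a few values of k. The function name is illustrative.

```python
def chebyshev_min_proportion(k):
    """Lower bound on the proportion of values within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


bound_2 = chebyshev_min_proportion(2)   # at least 75% within 2 standard deviations
bound_3 = chebyshev_min_proportion(3)   # at least about 88.9% within 3

print(bound_2, round(bound_3, 3))
```

These bounds hold for any distribution, which is why they are looser than the Empirical Rule percentages for bell-shaped data.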

Page 31: Goals in Statistic

Interpretation and Uses of the Standard Deviation

Empirical Rule: For any symmetrical, bell-shaped distribution:
• About 68% of the observations will lie within 1σ of the mean.
• About 95% of the observations will lie within 2σ of the mean.
• Virtually all the observations will be within 3σ of the mean.

Page 32: Goals in Statistic

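Interpretation and Uses of the Standard Deviation

[Figure: bell-shaped curve showing the relationship between σ and µ. Horizontal axis: µ − 3σ, µ − 2σ, µ − 1σ, µ, µ + 1σ, µ + 2σ, µ + 3σ. About 68% of the observations lie within µ ± 1σ, 95% within µ ± 2σ, and 99.7% within µ ± 3σ.]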

Page 33: Goals in Statistic

The Mean of Grouped Data

The Mean of a sample of data organized in a frequency distribution is computed by the following formula:

$$\bar{X} = \frac{\sum fX}{n}$$

Page 34: Goals in Statistic

From the example on hours spent in studying, we have

Hours studying   Class midpoint (X)   Frequency (f)   fX
30.2 – 35.1      32.65                1               32.65
25.2 – 30.1      27.65                3               82.95
20.2 – 25.1      22.65                7               158.55
15.2 – 20.1      17.65                11              194.15
10.2 – 15.1      12.65                8               101.20
Total                                 n = 30          ∑fX = 569.5

Therefore, the mean hours spent in studying is 569.5 / 30 = 18.98 hours
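The grouped-mean computation above can be sketched in Python. Each class is represented by its midpoint; the tuple layout `(lower, upper, frequency)` is an illustrative choice, not something the slides prescribe.

```python
# (lower limit, upper limit, frequency) for each class of hours studying.
classes = [
    (10.2, 15.1, 8),
    (15.2, 20.1, 11),
    (20.2, 25.1, 7),
    (25.2, 30.1, 3),
    (30.2, 35.1, 1),
]

n = sum(f for _, _, f in classes)

# Sum of frequency * class midpoint over all classes.
sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)

grouped_mean = sum_fx / n
print(n, round(sum_fx, 2), round(grouped_mean, 2))
```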

Page 35: Goals in Statistic

The Median of Grouped Data

The Median of a sample of data organized in a frequency distribution is computed by:

$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i)$$

where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the class width.

Page 36: Goals in Statistic

Finding the Median Class

To determine the median class for grouped data:
1. Construct a cumulative frequency distribution.
2. Divide the total number of data values by 2.
3. Determine which class will contain this value. For example, if n = 30, then 30/2 = 15, so determine which class will contain the 15th value.

Page 37: Goals in Statistic

From the example on hours spent in studying

Hours studying   Lower limit   f    Cumulative frequency
30.2 – 35.1      30.2          1    30
25.2 – 30.1      25.2          3    29
20.2 – 25.1      20.2          7    26
15.2 – 20.1      15.2          11   19
10.2 – 15.1      10.2          8    8

Page 38: Goals in Statistic

n/2 = 30/2 = 15. The class 15.2 – 20.1 (the second class counting from the lowest), whose cumulative frequency of 19 is the first to reach 15, is the median class. The lower limit is L = 15.2. The cumulative frequency (CF) that precedes the median class is 8. The frequency (f) of the median class is 11. The class width is i = 5.

Therefore, the median is:

$$\text{Median} = 15.2 + \frac{15 - 8}{11}\,(5) = 18.4$$
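The steps for locating the median class and applying the formula can be sketched as a small function. It assumes the classes are listed in ascending order and share one class width; the function name and data layout are illustrative.

```python
def grouped_median(classes, width):
    """Median of grouped data.

    classes: list of (lower limit, frequency) pairs in ascending order.
    width:   common class width i.
    """
    n = sum(f for _, f in classes)
    half = n / 2
    cumulative = 0   # cumulative frequency preceding the current class (CF)
    for lower, f in classes:
        if cumulative + f >= half:
            # This is the median class: interpolate within it.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("classes must contain at least one observation")


hours = [(10.2, 8), (15.2, 11), (20.2, 7), (25.2, 3), (30.2, 1)]
median_hours = grouped_median(hours, width=5)
print(round(median_hours, 1))
```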

Page 39: Goals in Statistic

The Mode of Grouped Data

The Mode for grouped data is approximated by the formula

$$\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right)(i)$$

where
L = lower limit of the modal class interval
d1 = difference between the frequency of the modal class interval and that of the next class lower in value
d2 = difference between the frequency of the modal class interval and that of the next class higher in value
i = class width

Page 40: Goals in Statistic

From the example on the hours spent in studying, we have

• The modal class is 15.2 – 20.1, with the highest frequency of 11 (L = 15.2).
• d1 = 11 - 8 = 3
• d2 = 11 - 7 = 4
• i = 5

Therefore, the mode is

$$\text{Mode} = 15.2 + \left(\frac{3}{3 + 4}\right)(5) = 17.3$$

Since mean = 18.98 > median = 18.4 > mode = 17.3, the distribution of the number of hours spent in studying is positively skewed.
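A minimal sketch of the grouped-mode approximation follows. It assumes ascending classes of equal width and, as an assumption of this sketch only, treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(classes, width):
    """Approximate mode of grouped data.

    classes: list of (lower limit, frequency) pairs in ascending order.
    width:   common class width i.
    """
    freqs = [f for _, f in classes]
    m = freqs.index(max(freqs))                          # index of the modal class
    below = freqs[m - 1] if m > 0 else 0                 # next class lower in value
    above = freqs[m + 1] if m < len(freqs) - 1 else 0    # next class higher in value
    d1 = freqs[m] - below
    d2 = freqs[m] - above
    return classes[m][0] + d1 / (d1 + d2) * width


hours = [(10.2, 8), (15.2, 11), (20.2, 7), (25.2, 3), (30.2, 1)]
mode_hours = grouped_mode(hours, width=5)
print(round(mode_hours, 1))
```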

Page 41: Goals in Statistic

Variance for Grouped Data

$$s^2 = \frac{n \sum f x^2 - \left(\sum f x\right)^2}{n(n - 1)}$$

where
n = sample size
f = frequency
x = class marks

Page 42: Goals in Statistic

From the example on hours spent in studying, we have

Hours studying   X       f    fX       fX²
30.2 – 35.1      32.65   1    32.65    1066.02
25.2 – 30.1      27.65   3    82.95    2293.57
20.2 – 25.1      22.65   7    158.55   3591.16
15.2 – 20.1      17.65   11   194.15   3426.75
10.2 – 15.1      12.65   8    101.20   1280.18
Total                    30   ∑fX = 569.5   ∑fX² = 11657.68

Page 43: Goals in Statistic

Therefore, the variance is

$$s^2 = \frac{30(11657.68) - (569.5)^2}{30(30 - 1)} = \frac{25400.15}{870} \approx 29.20$$

And the standard deviation is

$$s = \sqrt{29.20} \approx 5.4$$
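The grouped variance can be recomputed directly from the class marks and frequencies, which avoids rounding in the intermediate fX² column. The variable names in this sketch are illustrative.

```python
import math

class_marks = [32.65, 27.65, 22.65, 17.65, 12.65]   # x
frequencies = [1, 3, 7, 11, 8]                      # f

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, class_marks))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, class_marks))

# Computational formula for the sample variance of grouped data.
grouped_variance = (n * sum_fx2 - sum_fx**2) / (n * (n - 1))
grouped_std = math.sqrt(grouped_variance)

print(round(grouped_variance, 2), round(grouped_std, 1))
```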

Page 44: Goals in Statistic

Chapter Four
One-Sample Tests of Hypothesis

GOALS
When you have completed this chapter, you will be able to:
ONE: Define a hypothesis and hypothesis testing.
TWO: Describe the five-step hypothesis testing procedure.
THREE: Distinguish between a one-tailed and a two-tailed test of hypothesis.
FOUR: Conduct a test of hypothesis about a population mean.

Page 45: Goals in Statistic

Chapter Four continued
One-Sample Tests of Hypothesis

GOALS
When you have completed this chapter, you will be able to:
FIVE: Conduct a test of hypothesis about a population proportion.
SIX: Define Type I and Type II errors.

Page 46: Goals in Statistic

What is a Hypothesis?


A HYPOTHESIS is defined by Webster as “a tentative theory or supposition provisionally adopted to explain certain facts and to guide in the investigation of others”.

A statistical hypothesis is an assertion or statement, which may or may not be true, concerning one or more populations.

Page 47: Goals in Statistic

Example of Statistical Hypotheses

• A leading drug in the treatment of hypertension has an advertised therapeutic success rate of 84%. A medical researcher believes he has found a new drug for treating hypertensive patients that has a higher therapeutic success rate than the leading drug but with fewer side effects. He should assume that it is no better than the leading drug and then set out to reject this contention. The two statements, "the new drug is no better than the old one" (p = 0.84) and "the new drug is better than the old one" (p > 0.84), are examples of statistical hypotheses.

Page 48: Goals in Statistic

What is Hypothesis Testing?

Hypothesis testing: A procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected.

Page 49: Goals in Statistic

Types of Hypothesis

1. Null Hypothesis (Ho): the hypothesis that two or more variables are not related or that two or more statistics (e.g. means for two different groups) are not significantly different. It is a negation of the theory that the researcher would like to derive. In the above example, the statement ‘the new drug is no better than the old one’ is an example of a null hypothesis. It is usually constructed to enable the researcher to evaluate his own theory or the research hypothesis. In other words, the null hypothesis is stated with the sole purpose of rejecting it, thereby accepting the research hypothesis. The equality symbol (=) is commonly used in stating the null hypothesis.

Page 50: Goals in Statistic

Types of Hypothesis

2. Alternative Hypothesis (H1): the hypothesis derived from the theory of the investigator. It generally states a specified relationship between two or more variables, or that two or more statistics differ significantly. In other words, it is the operational statement of the investigator’s research hypothesis. The second statement, ‘the new drug is better than the old one’ (above example), is an example of an alternative hypothesis. The symbols commonly used are >, <, and ≠.

Page 51: Goals in Statistic

Two ways of stating the alternative hypothesis:

• Predictive H1 (one-tailed or directional): specifies the type of relationship existing between two or more variables (e.g. direct or inverse relationship) or specifies the direction of the difference between two or more statistics (e.g. µ1 > µ2 or µ1 < µ2).
• Non-predictive H1 (two-tailed or non-directional): does not specify the type of relationship or the direction of the difference (e.g. µ1 ≠ µ2).

Page 52: Goals in Statistic

One-Tailed and Two-Tailed Tests

A test of any hypothesis where the alternative hypothesis is predictive, such as

Ho: µ1 = µ2
H1: µ1 > µ2 or µ1 < µ2

is called a one-tailed test. The rejection region for the alternative hypothesis µ1 > µ2 lies entirely in the right tail of the distribution, while the rejection region for the alternative hypothesis µ1 < µ2 lies entirely in the left tail.

Page 53: Goals in Statistic

A test of any hypothesis where the alternative hypothesis is non-predictive, such as

Ho: µ1 = µ2
H1: µ1 ≠ µ2

is called a two-tailed test, since the rejection region is split into two equal parts placed in each tail of the distribution.

Page 54: Goals in Statistic

More Examples of Stating Hypotheses

Example 1. Suppose you want to study the association between the job satisfaction of the employees and the labor turnover in a certain private hospital.

Ho: There is no significant relationship between job satisfaction and labor turnover in a certain private hospital; stated differently, job satisfaction does not significantly affect labor turnover.

H1: There is an inverse relationship between job satisfaction and labor turnover; more specifically, when job satisfaction decreases, labor turnover increases. (predictive)

Page 55: Goals in Statistic

More Examples . . .

Example 2. A researcher is conducting a study to determine if suicide incidence among teenagers can be attributed to drug use.

Ho: There is no significant difference between the suicide rates of teenagers who use drugs and those who do not.

H1: The suicide rates of teenagers who use drugs are significantly higher than the suicide rates of non-users. (predictive)

H1: There is a significant difference between the suicide rates of teenagers who use drugs and those who do not. (non-predictive)

Page 56: Goals in Statistic

More Examples . . .

Example 3. A social researcher is conducting a study to determine if the level of women’s participation in community extension programs of the barangay can be affected by their educational attainment, occupation, income, civil status, and age.

Ho : The level of women’s participation in community extension programs is not affected by their educational attainment, occupation, income, civil status and age.

H1 : The level of women’s participation in community extension programs is affected by their educational attainment, occupation, income, civil status and age.

Page 57: Goals in Statistic

Hypothesis Testing

Step 1: State null and alternate hypotheses
Step 2: Select a level of significance
Step 3: Identify the test statistic
Step 4: Formulate a decision rule
Step 5: Take a sample, arrive at a decision

Outcome: either do not reject the null, or reject the null and accept the alternate.

Page 58: Goals in Statistic

Step One: State the null and alternate hypotheses

Three possibilities regarding means:

H0: µ = 0    H0: µ = 0    H0: µ = 0
H1: µ ≠ 0    H1: µ > 0    H1: µ < 0

The null hypothesis always contains equality.

Page 59: Goals in Statistic

Step Two: Select a Level of Significance.

Level of Significance: The probability of rejecting the null hypothesis when it is actually true; the level of risk in so doing.

Type I Error: Rejecting the null hypothesis when it is actually true (α).

Type II Error: Accepting the null hypothesis when it is actually false (β).

Page 60: Goals in Statistic

Step Two: Select a Level of Significance.

Risk table

Null hypothesis   Researcher accepts Ho   Researcher rejects Ho
Ho is true        Correct decision        Type I error (α)
Ho is false       Type II error (β)       Correct decision

Page 61: Goals in Statistic

Step Three: Select the test statistic.

Test statistic: A value, determined from sample information, used to determine whether or not to reject the null hypothesis.

Examples: z, t, F, χ²

z Distribution as a test statistic:

$$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

The z value is based on the sampling distribution of X̄, which is normally distributed when the sample is reasonably large (recall the Central Limit Theorem).

Page 62: Goals in Statistic

Step Four: Formulate the decision rule.

Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.

[Figure: sampling distribution of the statistic z for a right-tailed test at the .05 level of significance. The critical value is 1.65; the area to its left is the "do not reject" region (probability = .95) and the area to its right is the region of rejection (probability = .05).]

Page 63: Goals in Statistic

Decision Rule

Reject the null hypothesis and accept the alternate hypothesis if

Computed -z < Critical -z

or

Computed z > Critical z

Page 64: Goals in Statistic

Using the p-Value in Hypothesis Testing

p-Value: The probability, assuming that the null hypothesis is true, of finding a value of the test statistic at least as extreme as the computed value for the test. It is calculated from the probability distribution function or by computer.

Decision Rule
• If the p-Value is larger than or equal to the significance level, α, H0 is not rejected.
• If the p-Value is smaller than the significance level, α, H0 is rejected.

Page 65: Goals in Statistic

Interpreting p-values

• .05 < p ≤ .10: SOME evidence Ho is not true
• .01 < p ≤ .05: STRONG evidence Ho is not true
• .001 < p ≤ .01: VERY STRONG evidence Ho is not true

Page 66: Goals in Statistic

Step Five: Make a decision.


Page 67: Goals in Statistic

Test Concerning Means (σ is known or n ≥ 30): a sample mean is to be compared with the population mean.

a.) Ho: µ = µo, H1: µ < µo
b.) Ho: µ = µo, H1: µ > µo
c.) Ho: µ = µo, H1: µ ≠ µo

Test Statistic:

$$z = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma}$$

Rejection Region:
a.) Z < -Zα
b.) Z > Zα
c.) Z < -Zα/2 or Z > Zα/2

Page 68: Goals in Statistic

An Example

A private university hypothesized that the mean starting monthly salary of its graduates is P8000 with a standard deviation of P1500. A sample of 25 employed graduates showed an average starting salary of P7500. Test this hypothesis at the 5% level of significance.

Solution:
Ho: µ = 8000
H1: µ ≠ 8000
Level of significance: α/2 = 0.05/2 = 0.025 (two-tailed)
Rejection Region: Z > 1.96 or Z < -1.96 (refer to Table A.1 Appendix)

Page 69: Goals in Statistic

Computation of the Test Statistic:
Given: x̄ = 7500, σ = 1500, n = 25

$$z = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} = \frac{(7500 - 8000)\sqrt{25}}{1500} = -1.67$$

Decision: Since the value of the test statistic (z = -1.67) is greater than the tabular value of -1.96, do not reject Ho. The sample does not provide sufficient evidence against the claim of the private university.
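The salary example can be reproduced with the standard library's `statistics.NormalDist`, which supplies both the critical value and a p-value. This is a sketch using σ = 1500 as stated in the problem; the function and variable names are illustrative.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for a test about one mean when sigma is known."""
    return (sample_mean - mu0) * math.sqrt(n) / sigma


z = one_sample_z(sample_mean=7500, mu0=8000, sigma=1500, n=25)

alpha = 0.05
critical_z = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.96
reject_h0 = abs(z) > critical_z

# Two-tailed p-value: probability of a statistic at least this extreme under Ho.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(critical_z, 2), reject_h0)
```

Because the p-value exceeds 0.05, the p-value rule and the critical-value rule lead to the same decision.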

Page 70: Goals in Statistic

Test Concerning Means (σ is unknown and n < 30)

a.) Ho: µ = µo, H1: µ < µo
b.) Ho: µ = µo, H1: µ > µo
c.) Ho: µ = µo, H1: µ ≠ µo

Test Statistic:

$$t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s}$$

Rejection Region (with n - 1 degrees of freedom):
a.) t < -tα
b.) t > tα
c.) t < -tα/2 or t > tα/2

Page 71: Goals in Statistic

An Example

A perfume company claims that their best-selling brand, Blue Ginger, has an average purity of 87.9%. Twenty bottles were examined and found to have an average purity of 85.6% with a standard deviation of 8.3%. Test at the 1% significance level whether the perfume company is telling the truth.

Solution:
HO: The average purity of Blue Ginger is equal to 87.9% (µ = 0.879).
H1: The average purity of Blue Ginger is less than 87.9% (µ < 0.879).
Significance Level: 1%
Rejection Region: t < -2.539 (with α = 0.01 and 19 degrees of freedom, i.e. 20 - 1 = 19)

Page 72: Goals in Statistic

Computation of the Test Statistic: t-test

Given: x̄ = 0.856, µo = 0.879, s = 0.083, n = 20

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.856 - 0.879}{0.083 / \sqrt{20}} = -1.24$$

Decision: Since -1.24 is outside the rejection region, we do not reject HO at the 1% significance level. The sample does not provide sufficient evidence that the average purity of Blue Ginger is less than 87.9%, so the perfume company's claim is not contradicted.
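The t statistic for the perfume example can be computed as below. The Python standard library has no t distribution, so this sketch simply reuses the tabled critical value 2.539 quoted in the solution; the names are illustrative.

```python
import math


def one_sample_t(sample_mean, mu0, s, n):
    """t statistic for a test about one mean when sigma is unknown."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


t = one_sample_t(sample_mean=0.856, mu0=0.879, s=0.083, n=20)

critical_t = 2.539            # t(0.01, 19 degrees of freedom), taken from the slide
reject_h0 = t < -critical_t   # left-tailed test

print(round(t, 2), reject_h0)
```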

Page 73: Goals in Statistic

Test Concerning Proportion

Tests of hypotheses concerning proportions are required in many areas. A drug manufacturer is certainly interested in knowing what fraction of the patients recover after taking the medicine. In business, manufacturing firms are concerned about the proportion of defective items when a shipment is made. A social researcher might be interested in determining the proportion of people favoring the legalization of abortion in the Philippines.

Page 74: Goals in Statistic

Steps in testing a proportion

1. Ho: p = po
2. H1: p < po, p > po, or p ≠ po
3. Choose a level of significance of size α.
4. Establish the rejection region.
5. Computation:

$$z = \frac{p - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}}$$

where p = sample proportion, po = population proportion, and n = sample size.

6. Make a decision; that is, reject Ho if z falls in the rejection region, otherwise do not reject Ho.

Page 75: Goals in Statistic

An Example

A television station intends to stop airing a “telenovela” show, “Maganda ang Buhay”, if less than 35% of its intended televiewers watch the said program. A random sample of 1500 households yields an average viewing percentage of 32.9%. Should the television company pull out the airing of the show? Use the 5% significance level.

Page 76: Goals in Statistic

Solution:
HO: The average viewing percentage of the telenovela “Maganda ang Buhay” is equal to 35% (p = .35).
H1: The average viewing percentage of the telenovela “Maganda ang Buhay” is less than 35% (p < .35).
Significance Level: 5%
Rejection Region: z < -zα, that is, z < -1.65

Page 77: Goals in Statistic

Computation of Test Statistics: z-test

Given: n = 1500, p = 0.329, po = 0.35

$$z = \frac{p - p_0}{\sqrt{\dfrac{p_0 (1 - p_0)}{n}}} = \frac{0.329 - 0.35}{\sqrt{\dfrac{0.35 (1 - 0.35)}{1500}}} = -1.71$$

Decision: Since -1.71 is within the rejection region, we reject HO at the 5% significance level. Therefore, the average viewing percentage of the telenovela “Maganda ang Buhay” is less than 35%, and the television company should pull out the airing of the show.
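A minimal sketch of the one-proportion z-test for the telenovela example follows. It uses the critical value 1.65 from the solution rather than computing one; the names are illustrative.

```python
import math


def one_proportion_z(p, p0, n):
    """z statistic comparing a sample proportion p with a hypothesized p0."""
    return (p - p0) / math.sqrt(p0 * (1 - p0) / n)


z = one_proportion_z(p=0.329, p0=0.35, n=1500)

critical_z = 1.65             # z at the 5% level, as used on the slide
reject_h0 = z < -critical_z   # left-tailed test

print(round(z, 2), reject_h0)
```

Note that the standard error uses the hypothesized proportion po, not the sample proportion, because the test statistic is evaluated under the assumption that Ho is true.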

Page 78: Goals in Statistic

Exercises

1. A sample of size 60 has a mean of 12.8 and a standard deviation of 2.5. Test the hypothesis that the population mean is 12 using a.) a one-tailed test at 0.01 level and b.) a two-tailed test at 0.05 level.

2. Suppose it is known that the mean annual income of assembly line workers in a certain plant is P100,000 with a standard deviation of P7000. You suspect that workers with active union interests have higher than average incomes and take a random sample of 85 of these active members, obtaining a mean of P105,000. Can you say that active union members have significantly higher incomes? Use 5% level of significance.

Page 79: Goals in Statistic

3. Last year the employees of the city sanitation department donated an average of 250 pesos to the volunteer rescue squad. Test the hypothesis at 0.01 level that the average contribution this year is still 250 if a random sample of 15 employees showed an average of 265 pesos with a standard deviation of 15 pesos. Assume that the donations are approximately normally distributed.

4. Suppose that in the past 40% of all adults favored capital punishment. Do we have reason to believe that the proportion of adults favoring capital punishment today has increased if, in a random sample of 200 adults, 120 favor capital punishment? Use a 0.05 level of significance.

Page 80: Goals in Statistic

5. The average height of males in the freshmen class of a certain university has been 65.8 inches, with a standard deviation of 3.2 inches. Is there reason to believe that there has been an increase in the average height if a random sample of 40 males in the present freshmen class have an average height of 67.5 inches? Use a 1% level of significance.

6. A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 25 women was taken and was found to have a mean height of 1.572 meters, with a standard deviation of .12 meters. Is there reason to believe that the 25 women in the sample are significantly taller than the others at .05 level of significance?

Page 81: Goals in Statistic

7. A random sample of 100 recorded deaths in the Philippines during the past year showed an average life span of 71.8 years with a standard deviation of 8.9 years. Does this seem to indicate that the average life span today is greater than 70 years? Use a 0.05 level of significance.

8. A mayoralty candidate in a certain city expects that approximately 60% of the voters will favor him in the coming election. To support his claim, he let a social researcher conduct a survey consisting of 100 randomly selected voters in the different barangays comprising the city. Results showed that 70 interviewed voters favor this particular mayoralty candidate. Is this sufficient evidence to conclude that the proportion of voters favoring this candidate in the coming election is higher than what he expected? Use .01 level of significance.

Page 82: Goals in Statistic

Chapter Five

Two-Sample Tests of Hypothesis

GOALS
When you have completed this chapter, you will be able to:
ONE: Conduct a test of hypothesis about the difference between two independent population means.
TWO: Conduct a test of hypothesis regarding the difference in two population proportions.
THREE: Conduct a test of hypothesis about the mean difference between paired or dependent observations.

Page 83: Goals in Statistic

Chapter Five continued
Two-Sample Tests of Hypothesis

GOALS
When you have completed this chapter, you will be able to:
FOUR: Understand the difference between dependent and independent samples.

Page 84: Goals in Statistic

Difference-of-Means Test (σ1 and σ2 are known): two sample means are being compared.

a.) Ho: µ1 = µ2, H1: µ1 < µ2
b.) Ho: µ1 = µ2, H1: µ1 > µ2
c.) Ho: µ1 = µ2, H1: µ1 ≠ µ2

Test Statistic:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Page 85: Goals in Statistic

Rejection Region:
a.) Z < -Zα
b.) Z > Zα
c.) Z < -Zα/2 or Z > Zα/2

An Example

A university investigation team conducted a study to determine whether car ownership of students affects their academic performance. A random sample of 50 students who are non-car owners showed a grade point average (GPA) of 85 with a standard deviation of 10.2, while a group of 60 car owners got an average grade of 80 with a standard deviation of 8.9. Do the data provide sufficient evidence to indicate that the non-car owners are better than car owners in terms of academic performance? Test using the 5% level of significance.

Page 86: Goals in Statistic

Solution:
Let µ1 = mean grade for non-car owners
µ2 = mean grade for car owners

Ho: µ1 = µ2
H1: µ1 > µ2 (one-tailed)

Level of significance: α = 0.05
Rejection Region: Z > 1.645 (refer to Table A.1 Appendix)

Computation of the Test Statistic:
Given:
x̄1 = 85, s1 = 10.2, n1 = 50
x̄2 = 80, s2 = 8.9, n2 = 60

Page 87: Goals in Statistic

The value of the z statistic will be

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(85 - 80) - 0}{\sqrt{\dfrac{(10.2)^2}{50} + \dfrac{(8.9)^2}{60}}} = 2.71$$

Decision: Since the value of the test statistic (z = 2.71) is greater than the tabular value of 1.645, reject Ho. The data provide sufficient evidence to indicate that non-car owners have better academic performance than car owners.
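The two-sample z computation can be sketched as follows. As in the worked example, the sample standard deviations stand in for σ1 and σ2 because both samples are large; the function and argument names are illustrative.

```python
import math


def two_sample_z(x1, sd1, n1, x2, sd2, n2, d0=0.0):
    """z statistic for the difference between two independent means."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((x1 - x2) - d0) / standard_error


# Non-car owners (group 1) versus car owners (group 2).
z = two_sample_z(x1=85, sd1=10.2, n1=50, x2=80, sd2=8.9, n2=60)

critical_z = 1.645           # right-tailed test at the 5% level, from the slide
reject_h0 = z > critical_z

print(round(z, 2), reject_h0)
```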

Page 88: Goals in Statistic

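Difference-of-Means Test (σ1 = σ2 = σ but unknown)

a.) Ho: µ1 = µ2, H1: µ1 < µ2
b.) Ho: µ1 = µ2, H1: µ1 > µ2
c.) Ho: µ1 = µ2, H1: µ1 ≠ µ2

Test Statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - d_0}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where

$$S_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

with degrees of freedom v = n1 + n2 - 2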

Page 89: Goals in Statistic

Rejection Region:
a.) t < -tα
b.) t > tα
c.) t < -tα/2 or t > tα/2

An Example

A businessman is planning to put up either a computer store or a video store. He randomly selected 10 computer stores and 10 video stores. The average weekly incomes are P45,000 and P53,000 respectively, with corresponding standard deviations of P17,500 and P15,000. At the 1% significance level, indicate whether he should be convinced to put up a computer store or a video store.

Page 90: Goals in Statistic

Solution: Let $\mu_1$ = mean weekly income of computer stores

$\mu_2$ = mean weekly income of video stores

Ho: $\mu_1 = \mu_2$

H1: $\mu_1 \neq \mu_2$ (two-tailed)

Significance Level: 1%

Rejection Region: $t < -t_{0.005,\,18}$ or $t > t_{0.005,\,18}$, that is, $t < -2.878$ or $t > 2.878$

Page 91: Goals in Statistic

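Computation of Test Statistic: t-test

$n_1 = n_2 = 10$, $\bar{x}_1$ = P45,000, $\bar{x}_2$ = P53,000, $s_1$ = P17,500, $s_2$ = P15,000

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$t = \frac{45{,}000 - 53{,}000}{\sqrt{\dfrac{(10 - 1)(17{,}500)^2 + (10 - 1)(15{,}000)^2}{10 + 10 - 2}\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = -1.1$$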

Page 92: Goals in Statistic

Decision: Since -1.1 is not within the rejection region, we do not reject Ho at the 1% significance level. The data show no significant difference between the mean weekly incomes of computer stores and video stores, so they give the businessman no basis for preferring one type of store over the other.
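As a check on the store example, the pooled-variance t statistic can be computed directly. The sketch below assumes equal population variances, as the test itself does; the function name `pooled_t` is illustrative, and the critical value 2.878 is taken from the example rather than computed.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """t statistic for two independent samples with a common, unknown variance."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - d0) / standard_error
    df = n1 + n2 - 2
    return t, df

# Computer stores first, video stores second (weekly income in pesos)
t_stat, df = pooled_t(45_000, 17_500, 10, 53_000, 15_000, 10)

critical_t = 2.878                     # t(0.005, 18) as quoted in the example
reject_h0 = abs(t_stat) > critical_t   # two-tailed test

print(f"t = {t_stat:.2f} with {df} degrees of freedom, reject Ho: {reject_h0}")
```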

Page 93: Goals in Statistic

Testing the Difference of Two Proportions

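a.) Ho: $P_1 = P_2$   b.) Ho: $P_1 = P_2$   c.) Ho: $P_1 = P_2$

H1: $P_1 < P_2$   H1: $P_1 > P_2$   H1: $P_1 \neq P_2$

Test Statistic:

$$z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

where

$$p_1 = \frac{x_1}{n_1}, \quad p_2 = \frac{x_2}{n_2}, \quad p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Rejection Region:

a.) $Z < -Z_{\alpha}$   b.) $Z > Z_{\alpha}$   c.) $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$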

Page 94: Goals in Statistic

An Example

Consider random samples of 85 married couples last year and another 100 couples this year. Data show that 70 of last year’s married couples and 95 of this year’s married couples are entrepreneurs. Using the 1% significance level, can it be stated that the proportion of entrepreneurs has significantly increased from last year to this year?

Solution: Let P1 and P2 be the true proportion of married couples last year and this year who are entrepreneurs, respectively.

Page 95: Goals in Statistic

Ho: The proportion of entrepreneurs has not significantly increased from last year to this year ($P_1 = P_2$).

H1: The proportion of entrepreneurs has significantly increased from last year to this year ($P_1 < P_2$).

Significance Level: 1%

Rejection Region: $z < -z_{\alpha}$, that is, $z < -z_{0.01} = -2.33$

Page 96: Goals in Statistic

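Computation of Test Statistic: z-test

$n_1 = 85$, $n_2 = 100$

$$p_1 = \frac{70}{85} = 0.823, \quad p_2 = \frac{95}{100} = 0.95$$

$$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{85(0.823) + 100(0.95)}{85 + 100} = 0.89$$

$$z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{0.823 - 0.95}{\sqrt{(0.89)(1 - 0.89)\left(\dfrac{1}{85} + \dfrac{1}{100}\right)}} = -2.73$$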

Page 97: Goals in Statistic

Decision: Since -2.73 is within the rejection region, reject Ho at the 1% significance level. Therefore, the proportion of entrepreneurs has significantly increased from last year to this year.
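The two-proportion computation can be reproduced as follows. This is a sketch with illustrative names (`two_proportion_z` is not from the slides). Because the code carries the proportions at full precision instead of rounding them to 0.823 and 0.89, it gives a statistic of about -2.76 rather than the -2.73 shown in the example; the decision is the same either way.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference of two proportions using the pooled estimate."""
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error

# Last year: 70 entrepreneurs out of 85 couples; this year: 95 out of 100
z = two_proportion_z(70, 85, 95, 100)

critical_z = -2.33          # left-tailed critical value at alpha = 0.01, as quoted in the example
reject_h0 = z < critical_z

print(f"z = {z:.2f}, reject Ho: {reject_h0}")
```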

Page 98: Goals in Statistic

Dependent Samples

Samples are called dependent if any one of the following cases is true.

1. Before and after experiments - an experimental variable is being measured in terms of its effectiveness. This can be done by comparing the observations taken from the same group before and after the introduction of the experimental variable. If significant difference exists, then the experimental variable is said to be effective.

2. Two different groups matched pair by pair with respect to some relevant characteristics. The primary purpose of matching is to ensure that the observations differ only because of the experimental variable being tested.

Page 99: Goals in Statistic

• A study is conducted to determine the effect of fraternity membership on the performance of students. One way to measure the effect of fraternity membership (the experimental variable) is to take, for instance, 30 students and compare their performance (grades) before and after joining a fraternity (case 1). Another way is to compare the performance of two groups of students, those who are fraternity members and those who are not (case 2). The students in the first group should be carefully matched with the students in the second group with respect to relevant characteristics such as IQ level, study habits, gender, etc. This helps ensure that if a significant difference in performance exists, it can be attributed to the fact that the first group of students are fraternity members and the second group are non-members.

Page 100: Goals in Statistic

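a.) Ho: $\mu_1 = \mu_2$   b.) Ho: $\mu_1 = \mu_2$   c.) Ho: $\mu_1 = \mu_2$

H1: $\mu_1 < \mu_2$   H1: $\mu_1 > \mu_2$   H1: $\mu_1 \neq \mu_2$

Test Statistic:

$$t = \frac{(\bar{d} - d_0)\sqrt{n}}{S_d}$$

where:

$$S_d = \sqrt{\frac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)}}$$

with $v = n - 1$, where n is the number of pairs and $d_i$ is the difference within the i-th pair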

Page 101: Goals in Statistic

Rejection Region: a.) $t < -t_{\alpha}$   b.) $t > t_{\alpha}$   c.) $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

Example: If you wished to measure the effectiveness of a new diet, you would weigh the dieters at the start and at the finish of the program.
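To make the paired-sample statistic concrete, here is a minimal sketch of the diet scenario. The weights below are hypothetical numbers invented purely for illustration; they do not come from the slides. The function name `paired_t` is also an assumption, and the differences are taken as before minus after.

```python
import math

def paired_t(before, after, d0=0.0):
    """t statistic for dependent (paired) samples, with d = before - after."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)                      # number of pairs
    sum_d = sum(diffs)
    sum_d_sq = sum(d * d for d in diffs)
    s_d = math.sqrt((n * sum_d_sq - sum_d**2) / (n * (n - 1)))
    d_bar = sum_d / n
    t = (d_bar - d0) * math.sqrt(n) / s_d
    return t, n - 1                     # statistic and degrees of freedom

# Hypothetical weights (kg) for four dieters; NOT data from the slides
weights_before = [80, 75, 90, 68]
weights_after = [78, 74, 85, 67]

t_stat, df = paired_t(weights_before, weights_after)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The resulting t would then be compared with the tabular value for n - 1 degrees of freedom at the chosen significance level.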

Page 102: Goals in Statistic

Chapter Six
Analysis of Variance

GOALS
When you have completed this chapter, you will be able to:

ONE Understand the purpose of Analysis of Variance.

TWO Discuss the general idea of analysis of variance.

THREE Organize data into a one-way

Page 103: Goals in Statistic

Chapter Six continued
Analysis of Variance

GOALS
When you have completed this chapter, you will be able to:

FOUR Define and understand the terms treatments and blocks.

FIVE Conduct a test of hypothesis among three or more treatment means.

Page 104: Goals in Statistic

Purpose of ANOVA

• The Analysis of Variance (ANOVA for short) is a technique designed to test whether or not more than two sample means differ significantly from each other.

• In the preceding chapter, we learned that the t-test or the z-test is used to test for the significance of difference between two sample means. ANOVA, therefore, which can be used to test for the equality of several means simultaneously is an extension of the t-test or the z-test which can handle only two means at a time.

Page 105: Goals in Statistic

Illustration

• Suppose, for example, that three methods of teaching (A, B, C) are being compared in terms of effectiveness and are applied to three groups of students. If the researcher uses the t-test, he will have to test the following pairs separately: A and B, A and C, and B and C. In other words, the researcher would be using the t-test formula three times, which would mean spending much time and effort. Of course, there is a possibility that none of the pairs are significantly different, in which case the time and effort would be wasted.

Page 106: Goals in Statistic

Purpose of ANOVA

• Now, if the researcher uses ANOVA, he can take all three sample means simultaneously, and the test stops if the conclusion is that there is no significant difference. However, if ANOVA shows a significant difference among the three sample means (A, B, C), then the t-test must be used to find which pairs of means differ significantly. Another way is to use Duncan's Multiple Range Test (DMRT), which is beyond the scope of this manual.

Page 107: Goals in Statistic

Why the name analysis of variance?

• The phrase analysis of variance means that we will be analyzing the total variation in a set of observations that can be attributed to specific sources or causes of variation. With reference to the above example, two such specific sources of variation might be (1) actual differences among the three methods, shown in the varying performance of the three groups of students (treatment), and (2) chance, which in problems like this is usually called experimental error.

Page 108: Goals in Statistic

Underlying assumptions for ANOVA

ANOVA requires the following conditions:

The sampled populations follow the normal distribution.

The populations have equal standard deviations.

The samples are independent.

Page 109: Goals in Statistic

Steps in ANOVA

• The following are the steps and computations involved in the Analysis of Variance technique.

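Ho: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$ (for k sample means)

H1: At least two means differ significantly

Test Statistic:

$$F = \frac{MSTr}{MSE}$$

Rejection Region: $F > F_{tabular}(\alpha,\ k-1,\ n-k)$

where

$$MSTr\ (\text{Treatment Mean Square}) = \frac{SSTr}{k - 1}$$

$$SSTr\ (\text{Treatment Sum of Squares}) = \frac{(ct_1)^2}{n_1} + \frac{(ct_2)^2}{n_2} + \dots + \frac{(ct_k)^2}{n_k} - \frac{(\text{Grand Total})^2}{n}$$

Here $ct_j$ is the column total of the observations under treatment j, $n_j$ is the number of observations under treatment j, and n is the total number of observations.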

Page 110: Goals in Statistic

Computational Formulas

$$MSE\ (\text{Error Mean Square}) = \frac{SSE}{n - k}$$

$$SSE\ (\text{Error Sum of Squares}) = TSS - SSTr$$

$$TSS\ (\text{Total Sum of Squares}) = \sum_{i=1}^{n} x_i^2 - \frac{(\text{Grand Total})^2}{n}$$

Page 111: Goals in Statistic

Analysis of Variance (ANOVA) Table

Source of Variation   Degrees of Freedom (df)   Sum of Squares (SS)   Mean Square (MS)   F Value
Treatment             k-1                       SSTr                  MSTr               F = MSTr/MSE
Error                 n-k                       SSE                   MSE
Total                 n-1                       TSS

Page 112: Goals in Statistic

Example 1

ECP Restaurants specialize in meals for families. The owner recently developed a new meat loaf dinner. Before making it a part of the regular menu, she decides to test it in several of her restaurants.

She would like to know if there is a difference in the mean number of dinners sold per day at Restaurants A, B, and C. Use the .05 significance level.

Page 113: Goals in Statistic

Example 1 continued

Number of Dinners Sold by Restaurant

Day      Restaurant A   Restaurant B   Restaurant C
Day 1    13             10             18
Day 2    12             12             16
Day 3    14             13             17
Day 4    12             11             17
Day 5                                  17

Page 114: Goals in Statistic

Example 1 continued

Solution:

Ho: $\mu_A = \mu_B = \mu_C$

H1: At least two means differ significantly

Level of significance is .05.

Page 115: Goals in Statistic

Example 1 continued

Rejection Region: The numerator degrees of freedom, k-1, equal 3-1 or 2. The denominator degrees of freedom, n-k, equal 13-3 or 10. The value of F at 2 and 10 degrees of freedom is 4.10. Thus, Ho is rejected if F > 4.10.

Using the data provided, the ANOVA calculations follow.

Page 116: Goals in Statistic

Computations

$$TSS = \sum_{i=1}^{n} x_i^2 - \frac{(\text{Grand Total})^2}{n} = 13^2 + 12^2 + \dots + 17^2 - \frac{(182)^2}{13} = 2634 - 2548 = 86$$

$$SSTr = \frac{(ct_1)^2}{n_1} + \frac{(ct_2)^2}{n_2} + \dots + \frac{(ct_k)^2}{n_k} - \frac{(\text{Grand Total})^2}{n} = \left(\frac{51^2}{4} + \frac{46^2}{4} + \frac{85^2}{5}\right) - \frac{(182)^2}{13} = 2624.25 - 2548 = 76.25$$

$$SSE = TSS - SSTr = 86 - 76.25 = 9.75$$

Page 117: Goals in Statistic

Example 1 continued

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square         F
Treatments            76.25            3-1 = 2              76.25/2 = 38.125    38.125/.975 = 39.103
Error                 9.75             13-3 = 10            9.75/10 = .975
Total                 86.00            13-1 = 12
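The computational formulas for TSS, SSTr, SSE and F can be traced in code using the restaurant data. This is a sketch, not a replacement for statistical software; the function name `one_way_anova` and the dictionary keys are illustrative, and the critical value 4.10 is the tabular value quoted in the example.

```python
def one_way_anova(groups):
    """One-way ANOVA using the computational (shortcut) formulas."""
    all_values = [x for group in groups for x in group]
    n = len(all_values)                 # total number of observations
    k = len(groups)                     # number of treatments
    correction = sum(all_values) ** 2 / n          # (Grand Total)^2 / n
    tss = sum(x * x for x in all_values) - correction
    sstr = sum(sum(group) ** 2 / len(group) for group in groups) - correction
    sse = tss - sstr
    mstr = sstr / (k - 1)
    mse = sse / (n - k)
    return {"TSS": tss, "SSTr": sstr, "SSE": sse,
            "MSTr": mstr, "MSE": mse, "F": mstr / mse,
            "df_treatment": k - 1, "df_error": n - k}

# Dinners sold per day at Restaurants A, B and C
restaurant_a = [13, 12, 14, 12]
restaurant_b = [10, 12, 13, 11]
restaurant_c = [18, 16, 17, 17, 17]

result = one_way_anova([restaurant_a, restaurant_b, restaurant_c])
critical_f = 4.10                       # F(.05; 2, 10) as quoted in the example
reject_h0 = result["F"] > critical_f

print(f"SSTr = {result['SSTr']}, SSE = {result['SSE']}, F = {result['F']:.3f}")
```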

Page 118: Goals in Statistic

Example 1 continued

Since an F of 39.103 > the critical F of 4.10, the decision is to reject the null hypothesis and conclude that at least two of the treatment means are not the same.

The mean number of meals sold at the three locations is not the same. Specifically, as shown in Duncan's Multiple Range Test, the mean number of meals sold at Restaurant C is significantly higher than at A and B, but A and B do not differ significantly.

The ANOVA tables on the next two slides are from the SPSS and EXCEL systems.

Page 119: Goals in Statistic

Example 1 continued

ANOVA
volsold

                 Sum of Squares   df   Mean Square   F        Sig.
Between Groups   76.250           2    38.125        39.103   .000
Within Groups    9.750            10   .975
Total            86.000           12

volsold
Duncan

                     Subset for alpha = 0.05
Restaurant   N       1          2
2            4       11.5000
1            4       12.7500
3            5                  17.0000
Sig.                 .094       1.000

Means for groups in homogeneous subsets are displayed.

Page 120: Goals in Statistic

Example 1 continued

Anova: Single Factor

SUMMARY

Groups   Count   Sum   Average   Variance
Aynor    4       51    12.75     0.92
Loris    4       46    11.50     1.67
Lander   5       85    17.00     0.50

ANOVA

Source of Variation   SS      df   MS      F       P-value   F crit
Between Groups        76.25   2    38.13   39.10   2E-05     4.10
Within Groups         9.75    10   0.98
Total                 86.00   12

Page 121: Goals in Statistic

When I reject the null hypothesis that the means are equal, I want to know which treatment means differ.

Some common post hoc tests are Duncan's Multiple Range Test (DMRT) and Tukey's Test.

Page 122: Goals in Statistic

Chapter Seven
Linear Regression and Correlation

GOALS
When you have completed this chapter, you will be able to:

ONE Draw a scatter diagram.

TWO Understand and interpret the terms dependent variable and independent variable.

THREE Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.

FOUR Conduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.

Goals

Page 123: Goals in Statistic

Chapter Seven continued
Linear Regression and Correlation

GOALS
When you have completed this chapter, you will be able to:

FIVE Calculate the least squares regression line and interpret the slope and intercept values.

SIX Construct and interpret a confidence interval and prediction interval for the dependent variable.

SEVEN Set up and interpret an ANOVA table.

Goals

Page 124: Goals in Statistic

Correlation Analysis

The Independent Variable provides the basis for estimation. It is the predictor variable.

Correlation Analysis is a group of statistical techniques to measure the association between two variables.

A Scatter Diagram is a chart that portrays the relationship between two variables.

The Dependent Variable is the variable being predicted or estimated.

[Scatter diagram: Advertising Minutes and $ Sales. Horizontal axis: Advertising Minutes (70 to 190). Vertical axis: Sales ($ thousands) (0 to 30).]

Page 125: Goals in Statistic

The Coefficient of Correlation, r

Negative values indicate an inverse relationship and positive values indicate a direct relationship.

The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.

[Scale for Pearson's r: -1, 0, 1]

Also called Pearson’s r and Pearson’s product moment correlation coefficient.

It requires interval or ratio-scaled data.

It can range from -1.00 to 1.00.

Values of -1.00 or 1.00 indicate perfect and strong correlation.

Values close to 0.0 indicate weak correlation.

Page 126: Goals in Statistic

Perfect Negative Correlation

[Scatter diagram of Y against X, both axes scaled from 0 to 10]

Page 127: Goals in Statistic

Perfect Positive Correlation

[Scatter diagram of Y against X, both axes scaled from 0 to 10]

Page 128: Goals in Statistic

Zero Correlation

[Scatter diagram of Y against X, both axes scaled from 0 to 10]

Page 129: Goals in Statistic

Strong Positive Correlation

[Scatter diagram of Y against X, both axes scaled from 0 to 10]

Page 130: Goals in Statistic

Formula for r

We calculate the coefficient of correlation from the following formula.

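$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$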

Page 131: Goals in Statistic

Coefficient of Determination

The coefficient of determination ($r^2$) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).

It is the square of the coefficient of correlation. It ranges from 0 to 1. It does not give any information on the direction of the relationship between the variables.

Page 132: Goals in Statistic

Example

Suppose an investigator wants to determine the extent to which a relationship exists between size of income and the number of years of education the individual has completed. The following table shows the data of ten individuals. Compute and interpret r.

Page 133: Goals in Statistic

Individual #   Y, Income (thousand pesos)   X, Education (years)
1              45                           20
2              63                           19
3              36                           16
4              52                           20
5              29                           12
6              33                           14
7              48                           16
8              55                           18
9              72                           20
10             66                           22

Page 134: Goals in Statistic

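Solution

Preliminary computations of the above data: $n = 10$, $\sum x = 177$, $\sum y = 499$, $\sum xy = 9{,}173$, $\sum x^2 = 3{,}221$, $\sum y^2 = 26{,}793$. Therefore,

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

$$r = \frac{10(9173) - (177)(499)}{\sqrt{\left[10(3221) - (177)^2\right]\left[10(26793) - (499)^2\right]}} = .83$$

The value of $r^2$, which is $(.83)^2 = .6889$, means that 68.89% of the total variability in income is explained by its linear relationship with education.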
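The coefficient of correlation for the income and education data can be verified with a short script. This is a sketch with illustrative names (`pearson_r`, `education`, `income`); it applies the same sums-based formula as the solution. Carrying full precision gives r of about 0.834 and an r squared of about 0.696, which differs slightly from .6889 because the slide squares the rounded value .83.

```python
import math

# Data from the example: X = years of education, Y = income in thousand pesos
education = [20, 19, 16, 20, 12, 14, 16, 18, 20, 22]
income = [45, 63, 36, 52, 29, 33, 48, 55, 72, 66]

def pearson_r(x, y):
    """Pearson's r from the raw sums, as in the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

r = pearson_r(education, income)
r_squared = r ** 2   # coefficient of determination

print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```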

Page 135: Goals in Statistic

Testing for the Significance of r

The sample correlation coefficient r is a value computed from a random sample of n pairs of measurements. Different random samples of size n from the same population will generally produce different values of r. Therefore, we need to test for the significance of the computed r value. The null hypothesis is $\rho = 0$ against the alternative $\rho \neq 0$. Rejection of Ho leads to the conclusion that the existing relationship is significant at the $\alpha$ level. However, failure to reject Ho would imply that the relationship is not significant and can thus be attributed to chance.

From the above result, test the null hypothesis that there is no linear relationship between income and education. Use a 0.05 level of significance.

Page 136: Goals in Statistic

Solution:

Ho: $\rho = 0$

H1: $\rho \neq 0$ (two-tailed)

Level of Significance: $\alpha = 0.05$, so $\alpha/2 = 0.025$ in each tail

Rejection Region: $t > 2.306$ or $t < -2.306$ (with $n - 2 = 8$ degrees of freedom)

Computation of the Test Statistic:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{.83\sqrt{8}}{\sqrt{1 - (.83)^2}} = 4.21$$

Decision: Since the value of the test statistic (t = 4.21) is greater than the tabular value of 2.306, reject the null hypothesis of no linear relationship and conclude that there is a significant relationship between education and income. Specifically, the higher the educational attainment, the higher the income earned.
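The significance test for r is easy to reproduce from the sums given in the solution. The sketch below uses an illustrative function name, `correlation_t`, and takes the tabular value 2.306 from the example. With the unrounded r of about 0.834 the statistic comes out near 4.28, slightly different from the value on the slide, which works from r rounded to two decimals; the decision is the same.

```python
import math

def correlation_t(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

# Sums from the income and education example
n = 10
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 177, 499, 9173, 3221, 26793

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)
t_stat = correlation_t(r, n)

critical_t = 2.306                     # t(0.025, 8) as quoted in the example
reject_h0 = abs(t_stat) > critical_t   # two-tailed test

print(f"r = {r:.4f}, t = {t_stat:.2f}, reject Ho: {reject_h0}")
```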

Page 137: Goals in Statistic

Regression Analysis

In Regression Analysis we use the independent variable (X) to estimate the dependent variable (Y).

The relationship between the variables is linear.

Both variables must be at least interval scale.

The least squares criterion is used to determine the equation. That is, the term $\sum (Y - Y')^2$ is minimized.

Page 138: Goals in Statistic

Assumptions Underlying Linear Regression

For each value of X, there is a group of Y values, and these Y values are normally distributed.

The means of these normal distributions of Y values all lie on the straight line of regression.

The standard deviations of these normal distributions are the same.

The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.

Page 139: Goals in Statistic

Regression Analysis

The regression equation is Y(hat)= a + bX (1)

where Y(hat) is the predicted value of Y for any X.

a is the Y-intercept. It is the estimated Y value when X = 0.

b is the slope of the line, or the average change in Y(hat) for each change of one unit in X.

The least squares principle is used to obtain a and b.

Page 140: Goals in Statistic

Deterministic Model

Model (1) is called a deterministic model. It gives an exact relationship between x and y. This model expresses that y is determined exactly by x and that for a given value of x there is one and only one value of y.

Page 141: Goals in Statistic

Probabilistic Model

However, in many cases the relationship between variables is not exact. For instance, if y is food expenditure and x is income, then model (1) would state that food expenditure is determined by income only and that all households with the same income spend the same amount on food. But food expenditure is determined by many other variables, such as household size, taste, and preference, which explains why different households with the same income spend different amounts of money on food. Hence, to take these variables into consideration and to make our model complete, we add another term to the right side of model (1), called the random error term ($\varepsilon$).

• The complete regression model is written as

$y = A + Bx + \varepsilon$ (2)

Page 142: Goals in Statistic

The regression model (2) is called a probabilistic model or a statistical relationship.

The random error term ($\varepsilon$) is included in the model to represent the following two phenomena.

• Missing or omitted variables. As mentioned earlier, food expenditure is affected by many variables other than income. The random error term is included to capture the effect of all those missing variables that have not been included in the model.

• Random variation. Human behavior is unpredictable. For example, a household may have many parties during the month and may spend more than usual on food during that month. This variation in food expenditure may be called random variation.

Page 143: Goals in Statistic

Regression Analysis

The least squares principle is used to obtain a and b. The equations to determine a and b are:

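$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \bar{y} - b\bar{x}$$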

Page 144: Goals in Statistic

• An Example

We have already established a significant linear relationship between income and education (i.e., number of years in education). We may now formulate an equation allowing us to predict the income of a person given the number of years of education completed. Referring to the above example, we find that

$n = 10$, $\sum X_i = 177$, $\sum Y_i = 499$, $\sum X_i^2 = 3221$, $\sum Y_i^2 = 26793$, $\sum X_i Y_i = 9173$, $\bar{Y} = 49.9$, $\bar{X} = 17.7$

Page 145: Goals in Statistic

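Therefore,

$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(9173) - (177)(499)}{10(3221) - (177)^2} = 3.87$$

And

$$a = \bar{y} - b\bar{x} = 49.9 - 3.87(17.7) \approx -18.59$$

The model for prediction purposes is given by

$$\hat{y} = -18.58 + 3.87x$$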

Page 146: Goals in Statistic

The equation can be interpreted as follows: for each additional year of educational attainment, there corresponds an increase of 3.87 (thousand pesos), or 3,870 pesos, in the income of a person.

Page 147: Goals in Statistic

We can use the regression equation to estimate values of Y.

The estimated income of a person who has spent 16 years in education will be:

$$\text{Income} = -18.58 + 3.87(16) = 43.34$$
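The slope, intercept and prediction can be recomputed from the ten data pairs. This is a sketch with illustrative names (`least_squares`, `predicted_income`). Because it keeps b at full precision instead of rounding it to 3.87, the intercept and the prediction come out as roughly -18.55 and 43.33, which differ from the slide's figures only in the second decimal place.

```python
# Data from the example: X = years of education, Y = income in thousand pesos
education = [20, 19, 16, 20, 12, 14, 16, 18, 20, 22]
income = [45, 63, 36, 52, 29, 33, 48, 55, 72, 66]

def least_squares(x, y):
    """Intercept a and slope b of the least squares line Y(hat) = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    a = sum_y / n - b * sum_x / n       # a = y-bar minus b times x-bar
    return a, b

a, b = least_squares(education, income)
predicted_income = a + b * 16           # estimate for 16 years of education

print(f"Y(hat) = {a:.2f} + {b:.2f}X; estimate at X = 16: {predicted_income:.2f}")
```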

Page 148: Goals in Statistic

Standard Error of Estimate (SEE)

– a measure of the variability around the regression line, i.e., the dispersion around the regression line

– it tells how much variation there is in the dependent variable between the raw value and the expected value in the regression

– the SEE allows us to generate the confidence interval on the regression line, as we did in the estimation of means
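The slides describe the standard error of estimate without giving a formula. The sketch below assumes the usual definition for simple linear regression, the square root of the sum of squared residuals divided by n - 2; that formula and the function name `standard_error_of_estimate` are assumptions added here, applied to the income and education data from the earlier example.

```python
import math

# Data from the income and education example
education = [20, 19, 16, 20, 12, 14, 16, 18, 20, 22]   # X, years
income = [45, 63, 36, 52, 29, 33, 48, 55, 72, 66]      # Y, thousand pesos

def standard_error_of_estimate(x, y):
    """SEE = sqrt(sum of squared residuals / (n - 2)); assumed standard definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                       # least squares slope
    a = mean_y - b * mean_x             # least squares intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))

see = standard_error_of_estimate(education, income)
print(f"SEE = {see:.2f} thousand pesos")
```

A larger SEE means the observed incomes are scattered more widely around the fitted line, so interval estimates built on it are wider.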

Page 149: Goals in Statistic

Exercise

It is generally believed that the number of road accidents is inversely related to road width. The following data show the results of a study of the number of accidents occurring annually on roads of different widths:

Road width (in feet) (X):   75 52 60 33 22 40 70 35 55 80
Number of accidents (Y):    40 84 55 92 90 86 38 88 78 32

– Draw the scatter diagram.
– Find the correlation coefficient and test for its significance.
– Establish a regression equation of the form Y = a + bX.
– Predict the number of accidents for a 50 ft road width.