Goals in Statistic
-
Upload
jay-eamon-reyes-mendros -
Category
Documents
-
view
36 -
download
5
Transcript of Goals in Statistic
![Page 1: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/1.jpg)
Chapter Three Describing Data: Numerical Measures
GOALSWhen you have completed this chapter, you will be able to:ONECalculate the arithmetic mean, median, mode, weighted mean, and the geometric mean.TWO Explain the characteristics, uses, advantages, and disadvantages of each measure of location.THREEIdentify the position of the arithmetic mean, median, and mode for both a symmetrical and a skewed distribution.
Goals
3- 1
![Page 2: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/2.jpg)
FOURCompute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.
Describing Data: Numerical Measures
FIVEExplain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
SIXUnderstand Chebyshev’s theorem and the Empirical Rule as they relate to a set of observations.
Goals
Chapter Three 3- 2
![Page 3: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/3.jpg)
Characteristics of the Mean
It is calculated by summing the values and dividing by the number of values.
It requires the interval scale.All values are used.It is unique.The sum of the deviations from the mean is 0.
The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.
The major characteristics of the mean are: A verag e J oe
3- 3
![Page 4: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/4.jpg)
Population Mean
N
X
where µ is the population meanN is the total number of observations.X is a particular value. indicates the operation of adding.
For ungrouped data, the
Population Mean is the sum of all the population values divided by the total number of population values:
3- 4
![Page 5: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/5.jpg)
Example 1
500,484
000,73...000,56
N
X
Find the mean mileage for the cars.
A Parameter is a measurable characteristic of a population.
The Kiers family owns four cars. The following is the current mileage on each of the four cars.
56,000
23,000
42,000
73,000
3- 5
![Page 6: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/6.jpg)
Sample Mean
n
XX
where n is the total number of values in the sample.
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
3- 6
![Page 7: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/7.jpg)
Example 2
4.155
77
5
0.15...0.14
n
XX
A statistic is a measurable characteristic of a sample.
A sample of five executives received the following bonus last year ($000):
14.0, 15.0, 17.0, 16.0, 15.0
3- 7
![Page 8: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/8.jpg)
Properties of the Arithmetic Mean
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.
Properties of the Arithmetic Mean 3- 8
![Page 9: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/9.jpg)
Example 3
0)54()58()53()( XX
Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth
property
3- 9
![Page 10: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/10.jpg)
Weighted Mean
)21
)2211
...(
...(
n
nnw
www
XwXwXwX
The Weighted Mean of a set of numbers X1, X2, ..., Xn, with
corresponding weights w1, w2, ...,wn, is computed from the following
formula:
3- 10
![Page 11: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/11.jpg)
Example 4
89.0$50
50.44$1515155
)15.1($15)90.0($15)75.0($15)50.0($5
wX
During a one hour period on a hot Saturday afternoon cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.10.
Compute the weighted mean of the price of the drinks.
3- 11
![Page 12: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/12.jpg)
The Median
There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers and is
found at the (n+1)/2 ranked observation.
The Median is the midpoint of the values after they have been ordered from the smallest to the largest.
3- 12
![Page 13: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/13.jpg)
The ages for a sample of five college students are:21, 25, 19, 20, 22.
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.
The median (continued)
3- 13
![Page 14: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/14.jpg)
Example 5
Arranging the data in ascending order gives:
73, 75, 76, 80
Thus the median is 75.5.
The heights of four basketball players, in inches, are: 76, 73, 80, 75.
The median is found at the (n+1)/2 =
(4+1)/2 =2.5th data point.
3- 14
![Page 15: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/15.jpg)
Properties of the Median
There is a unique median for each data set.It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.It can be computed for ratio-level, interval-level, and ordinal-level data.It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
Properties of the Median
3- 15
![Page 16: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/16.jpg)
The Mode: Example 6
Example 6: The exam scores for ten nursing students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.
Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes, trimodal, and the like.
The Mode is another measure of location and represents the value of the observation that appears most frequently.
3- 16
![Page 17: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/17.jpg)
Symmetric distribution: A distribution having the same shape on either side of the center
Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution.
Can be negatively skewed, or positively skewed
The Relative Positions of the Mean, Median, and Mode
3- 17
![Page 18: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/18.jpg)
The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Zero skewness Mean
=Median
=Mode
M o d e
M ed ia n
M ea n
3- 18
![Page 19: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/19.jpg)
The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution
• Positively skewed: Mean and median are to the right of the mode.
Mean>Median>Mode
M o d e
M ed ia n
M ea n
3- 19
![Page 20: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/20.jpg)
Negatively Skewed: Mean and Median are to the left of the Mode.
Mean<Median<Mode
The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution
M o d eM ea n
M ed ia n
3- 20
![Page 21: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/21.jpg)
Dispersion refers to the spread or variability in the data.
Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.
Range = Largest value – Smallest value
Measures of Dispersion
0
5
10
15
20
25
30
0 2 4 6 8 10 12
3- 21
![Page 22: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/22.jpg)
The following represents the scores of the 25 master of Nursing students during the first long examination.
65.0 60.0 59.0 81.0 73.048.0 95.0 63.0 92.0 53.078.0 56.0 79.0 95.0 78.080.0 89.0 79.0 97.0 69.065.0 77.0 80.0 83.0 75.0
Example 9
Highest score:97 Lowest score:48
Range = Highest value – lowest value= 97 - 48= 49
3- 22
![Page 23: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/23.jpg)
Mean Deviation
The arithmetic mean of the
absolute values of the
deviations from the arithmetic
mean.
The main features of the mean deviation are:
All values are used in the calculation.
It is not unduly influenced by large or small values.
Mean Deviation
M D = X - X
n
3- 23
![Page 24: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/24.jpg)
The weights of a sample of crates containing books for the bookstore (in pounds ) are:
103, 97, 101, 106, 103Find the mean deviation.
X = 102
The mean deviation is:
4.25
541515
102103...102103
n
XXMD
Example 10
3- 24
![Page 25: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/25.jpg)
Variance: the arithmetic mean of the squared
deviations from the mean.
Standard deviation: The square root of the variance.
Variance and standard Deviation
3- 25
![Page 26: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/26.jpg)
Not influenced by extreme values.All values are used in the calculation.
The major characteristics of the
Population Variance are:
Population Variance
3- 26
![Page 27: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/27.jpg)
Population Variance formula:
(X - )2
N =
X is the value of an observation in the population
m is the arithmetic mean of the population
N is the number of observations in the population
Population Standard Deviation formula:
2Variance and standard deviation
3- 27
![Page 28: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/28.jpg)
Sample variance (s2)
s 2 =(X - X ) 2
n -1
Sample standard deviation (s)
2ss
Sample variance and standard deviation
3- 28
![Page 29: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/29.jpg)
40.75
37
n
XX
30.515
2.2115
4.76...4.77
1
2222
n
XXs
Example 11
The hourly wages earned by a sample of five students are:
$7, $5, $11, $8, $6.
Find the sample variance and standard deviation.
30.230.52 ss
3- 29
![Page 30: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/30.jpg)
Chebyshev’s theorem: For any set of observations, the minimum proportion of the values that lie within k standard deviations of the mean is at
least:
where k is any constant greater than 1.
2
11
k
Chebyshev’s theorem
3- 30
![Page 31: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/31.jpg)
Empirical Rule: For any symmetrical, bell-shaped distribution:
About 68% of the observations will lie within 1s of the mean
About 95% of the observations will lie within 2s of the mean
Virtually all the observations will be within 3s of the mean
Interpretation and Uses of the Standard Deviation
3- 31
![Page 32: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/32.jpg)
Bell -Shaped Curve showing the relationship between and .
-m 3s -m s -1m s m +1m s +m s +3m s
68%
95%99.7%
Interpretation and Uses of the Standard Deviation
3- 32
![Page 33: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/33.jpg)
The Mean of Grouped Data
n
fXX
The Mean of a sample of data organized in a frequency
distribution is computed by the following formula:
3- 33
![Page 34: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/34.jpg)
From the example on hours spent in studying, we have
Hours studying Class Midpoint(X) Frequency(f) fX
30.2 – 35.1 32.65 1 32.65
25.2 – 30.1 27.65 3 82.95
20.2 – 25.1 22.65 7 158.55
15.2 – 20.1 17.65 11 194.15
10.2 – 15.1 12.65 8 101.2
∑fX=569.5
Therefore, the mean hours spent in studying is 569.5 / 30 = 18.98 hours
![Page 35: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/35.jpg)
The Median of Grouped Data
)(2 if
CFn
LMedian
where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the class width.
The Median of a sample of data organized in a frequency distribution is computed by:
3- 35
![Page 36: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/36.jpg)
Finding the Median Class
To determine the median class for grouped data
Construct a cumulative frequency distribution.
Divide the total number of data values by 2.
Determine which class will contain this value. For example, if n=30, 30/2 = 15, then determine which class will contain the 15th value.
3- 36
![Page 37: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/37.jpg)
From the example on hours spent in studying
Hours Studying
Lower Limit
f Cumulative Frequency
30.2 – 35.1 30.2 1 30
25.2 – 30.1 25.2 3 29
20.2 – 25.1 20.2 7 26
15.2 – 20.1 15.2 11 19
10.2 – 15.1 10.2 8 8
![Page 38: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/38.jpg)
n/2 = 30/2 = 15. The second class (15.2 – 20.1) having a cumulative frequency of 19 is the median class. The lower limit is 15.2. The cumulative frequency (CF) that precedes the median class is 8. The frequency (f) of the median class is 11.The class width, i = 5.
Therefore, the median is:Median = 15.2 + (15-8)(5) = 18.4 11
![Page 39: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/39.jpg)
The Mode of Grouped Data
The Mode for grouped data is approximated by the formula
3- 39
))((21
1 idd
dLMode
21
1
dd
dLbMo (i)
whereL = lower limit of the modal class intervald1 = diff. between the freq. of the modal CI
and the next class lower in valued2 = diff. between the freq. of the modal CI
and the next class higher in value i = class width
![Page 40: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/40.jpg)
From the example on the hours spent in studying, we have
• the modal class is 15.2 – 20.1 with the highest frequency of 11(L = 15.2)
• d1 =11-8 = 3• d2 = 11-7 = 4• i = 5• Therefore, the mode is
Mode = 15.2 +[3/(3+4)](5) = 17.3Since the Mean = 18.98 > median = 18.4> mode = 17.3,
therefore the distribution of the number of hours spent in studying is positively skewed.
![Page 41: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/41.jpg)
Variance for Grouped Data
222
)1(
nn
fxfxns
wheren = sample sizef = frequencyx = class marks
![Page 42: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/42.jpg)
From the example on hours spent in studying, we have
Hours studying X f fX fX2
30.2 – 35.1 32.65 1 32.65 1066.02
25.2 – 30.1 27.65 3 82.95 2293.57
20.2 – 25.1 22.65 7 158.55 3591.16
15.2 – 20.1 17.65 11 194.15 3426.75
10.2 – 15.1 12.65 8 101.2 1280.18
∑fX=569.5 ∑fX2=11657.68
![Page 43: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/43.jpg)
19.29870
25.25399
)130(30
)5.569()65.11657(30 22
s
And the standard deviation is
S = (29.19)1/2 = 5.4
Therefore, the variance is
![Page 44: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/44.jpg)
Chapter FourOne-Sample Tests of Hypothesis
GOALSWhen you have completed this chapter, you will be able to:
ONEDefine a hypothesis and hypothesis testing.TWO Describe the five step hypothesis testing procedure.THREEDistinguish between a one-tailed and a two-tailed test of hypothesis.FOURConduct a test of hypothesis about a population mean.
![Page 45: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/45.jpg)
Chapter Ten continued
GOALSWhen you have completed this chapter, you will be able to:
FIVE Conduct a test of hypothesis about a population proportion.SIXDefine Type I and Type II errors.
One-Sample Tests of Hypothesis
![Page 46: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/46.jpg)
What is a Hypothesis?
What is a Hypothesis?
A HYPOTHESIS is defined by Webster as “a tentative theory or supposition provisionally adopted to explain certain facts and to guide in the investigation of others”.
A statistical hypothesis is an assertion or statement that may or may not be true concerning one or more population.
![Page 47: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/47.jpg)
Example of Statistical Hypotheses
• A leading drug in the treatment of hypertension has an advertised therapeutic success rate of 84%. A medical researcher believes he has found a new drug for treating hypertensive patients that has higher therapeutic success rate than the leading drug but with fewer side effects. He should assume that it is no better than the leading drug and then set out to reject this contention. The two statements: the new drug is no better than the old one (p = 0.84) and the new drug is better than the old one (p > 0.84) are examples of statistical hypothesis.
![Page 48: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/48.jpg)
What is Hypothesis Testing?
Hypothesis testing
Based on sample evidence and
probability theory
Used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable
and should be rejected
![Page 49: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/49.jpg)
Types of Hypothesis1. Null Hypothesis (Ho) - the hypothesis that two or more variables are not related or that two or more statistics (e.g. means for two different groups) are not significantly different. It is a negation of the theory that the researcher would like to derive. In the above example, the statement ‘the new drug is no better than the old one’ is an example of a null hypothesis. It is usually constructed to enable the researcher to evaluate his own theory or the research hypothesis. In other words, the null hypothesis is stated with the sole purpose of rejecting it, thereby accepting the research hypothesis. Equality symbol (=) is commonly used in stating the null hypothesis.
![Page 50: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/50.jpg)
Types of Hypothesis2. Alternative Hypothesis (H1) – the hypothesis derived from the theory of the investigator and generally state a specified relationship between two or more variables or that two or more statistics significantly differ. In other words, it is the operational statement of the investigator’s research hypothesis. The second statement ‘the new drug is better than the old one’ (above example) is an example of alternative hypothesis. The symbols commonly used are >, < and .
![Page 51: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/51.jpg)
Two ways of stating the alternative hypothesis: Predictive H1 (One-tailed or directional) – specifies the type of relationship existing between two or more variables (e.g. direct or inverse relationship) or specifies the direction of the difference between two or more statistics (e.g. 1> 2 or 1 < 2).Non-predictive H1 (Two-tailed or non-directional )– does not specify the type of relationship or the direction of the difference ( e.g. 1 2)
![Page 52: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/52.jpg)
One-Tailed and Two-Tailed TestsA test of any hypothesis where the alternative hypothesis is predictive such asHo: 1 = 2
H1 : 1 > 2 or 1 < 2
is called a one-tailed test. The rejection region for the alternative hypothesis 1 > 2 lies entirely in the right tail of the distribution, while the rejection region for the alternative hypothesis 1 < 2 lies entirely in the left tail.
![Page 53: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/53.jpg)
A test of any hypothesis where the alternative hypothesis is non-predictive such asHo: 1 = 2
H1 : 1 2
is called a two-tailed test, since the rejection region is split into two equal parts placed in each tail of the distribution.
![Page 54: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/54.jpg)
More Examples of Stating HypothesisExample1. Suppose you want to study the association between job satisfaction of the employees and the labor turnover in a certain private hospital.
Ho: There is no significant relationship between job satisfaction and the labor turnover in a certain private school or stated differently, job satisfaction does not significantly affect labor turnover
H1: There is an inverse relationship between job satisfaction and labor turnover, more specifically, when job satisfaction decreases, labor turnover increases.(predictive)
![Page 55: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/55.jpg)
More Examples . . .Example 2. A researcher is conducting a study to determine if suicide incidence among teenagers can be attributed to drug use.
Ho: There is no significant difference between the suicide rates of teenagers who use drugs and those who do not.
H1: Suicidal rates of teenagers who use drugs are significantly higher than the suicidal rates of non-users.(predictive)
H1 : There is a significant difference between the suicide rates of teenagers who use drugs and those who do not.(non-predictive)
![Page 56: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/56.jpg)
More Examples . . .Example 3. A social researcher is conducting a study to determine if the level of women’s participation in community extension programs of the barangay can be affected by their educational attainment, occupation, income, civil status, and age.
Ho : The level of women’s participation in community extension programs is not affected by their educational attainment, occupation, income, civil status and age.
H1 : The level of women’s participation in community extension programs is affected by their educational attainment, occupation, income, civil status and age.
![Page 57: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/57.jpg)
Hypothesis Testing
D o n o t re jec t n u ll R e jec t n u ll an d accep t a lte rn a te
S tep 5 : Take a sam p le , a rrive a t a d ec is ion
S tep 4 : F orm u la te a d ec is ion ru le
S tep 3 : Id en tify th e tes t s ta tis t ic
S tep 2 : S e lec t a leve l o f s ig n ifican ce
S tep 1 : S ta te n u ll an d a lte rn a te h yp o th eses
![Page 58: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/58.jpg)
Three possibilities
regarding means
H0: m = 0H1: m = 0
H0: m = 0H1: m > 0
H0: m = 0H1: m < 0
Step One: State the null and alternate hypotheses
The null hypothesis
always contains equality.
3 hypotheses about means
![Page 59: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/59.jpg)
Step Two: Select a Level of Significance.
The probability of rejecting the null hypothesis when it is actually true; the
level of risk in so doing.
Rejecting the null hypothesis when it is actually true ( ).a
Accepting the null hypothesis when it is actually false ( ).b
Level of Significance
Type I Error
Type II Error
![Page 60: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/60.jpg)
Step Two: Select a Level of Significance.
Researcher Null Accepts RejectsHypothesis Ho Ho
Ho is true
Ho is false
Correctdecision
Type I error( )a
Type IIError
( )b
Correctdecision
Risk table
![Page 61: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/61.jpg)
Step Three: Select the test statistic.
A value, determined from sample information, used to determine whether or not to reject the null hypothesis.
Examples: z, t, F, c2
Test statistic z Distribution as a test statistic
n/
X
z
The z value is based on the sampling distribution of X, which is normally distributed when the sample is reasonably large (recall Central Limit Theorem).
![Page 62: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/62.jpg)
Step Four: Formulate the decision rule.Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected.
0 1.65
D o not
re ject
[P robability = .95]
R egion of
re jection
[P robability= .05]
C ritica l va lue
Sampling DistributionOf the Statistic z, aRight-Tailed Test, .05Level of Significance
![Page 63: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/63.jpg)
Reject the null hypothesis and accept the alternate hypothesis if
Computed -z < Critical -z
or
Computed z > Critical z
Decision Rule
Decision Rule
![Page 64: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/64.jpg)
Using the p-Value in Hypothesis Testing
If the p-Value is larger than or equal to the significance level, a, H0 is not rejected.
p-ValueThe probability, assuming that the null hypothesis is true, of finding a value of the test statistic at least as extreme as the computed value for the test
Calculated from the probability distribution function or by computer
Decision Rule
If the p-Value is smaller than the significance level, a, H0 is rejected.
![Page 65: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/65.jpg)
> .0 5 .1 0p
> .0 1 .0 5p
Interpreting p-values
SOME evidence Ho is not true
> .0 0 1 .0 1p
STRONG evidence Ho is not true VERY STRONG evidence Ho is not true
![Page 66: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/66.jpg)
Step Five: Make a decision.
Movie
![Page 67: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/67.jpg)
Test Concerning Means ( is known or n 30) – a sample mean is to be compared with the population mean.a.) Ho: = o b.) Ho: = o c.) Ho: = o
H1: < o H1: > o H1: o
Test Statistic:
Rejection Region:a.) Z < -Z b.) Z > -Z c.) Z < -Z Z > Z/2
nx
z o )(
![Page 68: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/68.jpg)
An Example A private university hypothesized that the mean starting monthly salary of its
graduates is P8000 with a standard deviation of P1500. A sample of 25 employed graduates showed an average starting salary of P7500. Test this hypothesis at 5% level of significance. Solution:
Ho: = 8000H1: 8000Level of significance: /2 = 0.05/2 = 0.025 (two-tailed)
Rejection Region: Z 1.96 and Z -1.96 (refer to Table A.1 Appendix)
![Page 69: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/69.jpg)
Computation of the Test Statistic: Given: = 7500, = 900, n = 25
Decision: Since the value of the test statistic (z = -1.67) is greater than the tabular value of –1.96, then do not reject Ho and conclude that the claim of the private university is true.
67.11500
25)80007500()(
nx
z o
![Page 70: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/70.jpg)
Test Concerning Means ( is unknown and n < 30)a.) Ho: = o b.) Ho: = o c.) Ho: = o
H1: < o H1: > o H1: o
Test Statistic:
Rejection Region: a.) t < -t b.) .) t > -t c.) .) t < -t/2 and t > t/2
s
nxt o )(
![Page 71: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/71.jpg)
An Example A perfume company claims that their best selling brand, Blue Ginger, has an average purity of 87.9%. Twenty bottles were examined to have an average purity of 85.6% with a standard deviation of 8.3%. Test at the 1% significance level whether the perfume company is telling the truth.Solution:HO : The average purity of blue ginger is equal to 87.9% ( = 0.879).H1 : The average purity of blue ginger is less than 87.9%, or ( < 0.879).Significance Level : 1%Rejection Region: t - 2.539 (with = 0.01 and degrees of freedom of 19, i.e.20 – 1 = 19
![Page 72: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/72.jpg)
Computation of the Test Statistic: t – test
Decision: Since –1.24 is outside the rejection region, then we accept HO: at the 1% significance level. Therefore, the average purity of blue ginger is equal or greater than 87.9%. Therefore, the perfume company is telling the truth.
856.0x 879.0O 083.0s 20n
n
sx
t O
20
083.0879.0856.0
24.1
![Page 73: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/73.jpg)
Test Concerning Proportion Tests of hypotheses concerning proportions are required in many areas. A drug manufacturer is certainly interested in knowing what fraction of the patients recover after taking the medicine. In business, manufacturing firms are concerned about the proportion of defective when shipment is made. A social researcher might be interested in determining the proportion of people favoring the legalization of abortion in the Philippines.
![Page 74: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/74.jpg)
Steps in testing a proportion1. Ho: p = po
2. H1: p < po , p > po or p po
Choose a level of significance of size .Establish the rejection regionComputation: where p= sample proportion
p0 = population proportionn = sample size
Make a decision that is, reject Ho if z falls in the rejection region, otherwise do not reject Ho.
n
pp
ppz
OO
O
)1(
![Page 75: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/75.jpg)
An Example A television station intends to stop airing a “telenovela” show, “Maganda ang Buhay”, if less than 35% of its intended televiewers watch the said program. A random sample of 1500 households yields an average viewing percentage of 32.9%. Should the television company pull out the airing of the show? Use the 5% significance level.
![Page 76: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/76.jpg)
Solution:HO: The average viewing percentage of the telenovela “Maganda ang Buhay” is equal 35% (p = .35). H1: The average viewing percentage of the telenovela “Maganda ang Buhay” is less than 35% (p <.35) Significance Level: 5% Rejection Region:
zz 65.1z
![Page 77: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/77.jpg)
Computation of Test Statistics: z – test
Decision: Since –1.71 is within the rejection region, then we reject HO at the 5% significance level. Therefore, the average viewing percentage of the telenovela “Maganda ang Buhay” is less than 35% and the television company should pull out the airing of the show.
1500n 329.0p 35.0Op
n
pp
ppz
OO
O
)1(
1500
)35.01(35.0
35.0329.0
71.1
![Page 78: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/78.jpg)
Exercises1. A sample of size 60 has a mean of 12.8 and a standard deviation of 2.5. Test the
hypothesis that the population mean is 12 using a.) a one-tailed test at 0.01 level and b.) a two-tailed test at 0.05 level.
2. Suppose it is known that the mean annual income of assembly line workers in a certain plant is P100,000 with a standard deviation of P7000. You suspect that workers with active union interests have higher than the average incomes and take a random sample of 85 of these active members, obtaining a mean of P105,000. Can you say that active union members have significantly higher incomes? Use 5% level of significance.
![Page 79: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/79.jpg)
3. Last year the employees of the city sanitation department donated an average of 250 pesos to the volunteer rescue squad. Test the hypothesis at 0.01 level that the average contribution this year is still 250 if a random sample of 15 employees showed an average of 265 pesos with a standard deviation of 15 pesos. Assume that the donations are approximately normally distributed.4. Suppose that in the past 40% of all adults favored capital punishment. Do we have reason to believe that the proportion of adults favoring capital punishment today has increased if, in a random sample of 200 adults 120 favor capital punishment? Use a 0.05 level of significance.
![Page 80: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/80.jpg)
5. The average height of males in the freshmen class of a certain university has been 65.8 inches, with a standard deviation of 3.2 inches. Is there reason to believe that there has been an increase in the average height if a random sample of 40 males in the present freshmen class have an average height of 67.5 inches? Use a 1% level of significance. 6.A researcher knows that the average height of Filipino women is 1.525 meters. A random sample of 25 women was taken and was found to have a mean height of 1.572 meters, with a standard deviation of .12 meters. Is there reason to believe that the 25 women in the sample are significantly taller than the others at .05 level of significance?
![Page 81: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/81.jpg)
7. A random sample of 100 recorded deaths in the Philippines during the past year showed an average life span of 71.8 years with a standard deviation of 8.9 years. Does this seem to indicate that the average life span today is greater than 70 years? Use a 0.05 level of significance.8. A mayoralty candidate in a certain city expects that approximately 60% of the voters will favor him in the coming election. To support his claim, he let a social researcher conduct a survey consisting of 100 randomly selected voters in the different barangays comprising the city. Results showed that 70 interviewed voters favor this particular mayoralty candidate. Is this sufficient evidence to conclude that the proportion of voters favoring this candidate in the coming election is higher than what he expected. Use .01 level of significance.
![Page 82: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/82.jpg)
Chapter Five
Two-Sample Tests of HypothesisGOALS
When you have completed this chapter, you will be able to:
TWOConduct a test of hypothesis regarding the difference in two population proportions.
THREEConduct a test of hypothesis about the mean difference between paired or dependent observations.
ONEConduct a test of hypothesis about the difference between two independent population means.
![Page 83: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/83.jpg)
Chapter Eleven continued
Two Sample Tests of Hypothesis
GOALSWhen you have completed this chapter, you will be able to:
FOURUnderstand the difference between dependent and independent samples.
![Page 84: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/84.jpg)
Difference – of – Means Test ( 1 and 2 are known) – two sample means are being compared.
a.) Ho: 1 = 2 b.) Ho: 1 = 2 c.) Ho: 1
= 2
H1: 1 < 2 H1: 1 > 2 H1: 1 2
Test Statistic:
2
22
1
21
021 )(
nn
dxxz
![Page 85: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/85.jpg)
Rejection Region:a.) Z < -Z b.) Z > Z c.) Z < -Z/2 or Z > Z/2
An Example A university investigation team conducted a study to determine whether car ownership of students affects their academic performance. A random sample of 50 students who are non-car owners showed a grade point average (GPA) of 85 with a standard deviation of 10.2 while a group of 60 car owners got an average grade of 80 with a standard deviation of 8.9. Do the data provide sufficient evidence to indicate that the non-car owners are better than car owners in terms of academic performance? Test using 5 % level of significance.
![Page 86: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/86.jpg)
Solution:Let 1 = mean grade for non-car owners
2 = mean grade for car owners
Ho: 1 = 2
H1: 1 > 2 (one-tailed)
Level of significance: = 0.05Rejection Region: Z 1.645 ( refer to Table A.1 Appendix)
Computation of the Test Statistic:Given:
60,9.8,80
50,2.10,85
2222
111 1
nsx
nsx
![Page 87: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/87.jpg)
The value of the z statistic will be
Decision: Since the value of the test statistic (z= 2.71) is greater than the tabular value of 1.645, then reject Ho. The data provide sufficient evidence to indicate non-car owners have better academic performance than car owners.
71.2
60
)9.8(
50
)2.10(
0)8085()(22
2
22
1
21
021
nn
dxxz
![Page 88: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/88.jpg)
Difference – of – Means Test ( 1 = 2 = but unknown)
a.) Ho: 1 = 2 b.) Ho: 1 = 2 c.) Ho: 1 = 2
H1: 1 < 2 H1: 1 > 2 H1: 1 2
Test Statistic:
where
with degrees of freedom v = n1 + n2 -2
21
021
11
)(
nnSp
dxxt
2
)1()1(
21
222
211
nn
snsnSp
![Page 89: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/89.jpg)
Rejection Region: a.) t < -t b.) t > t c.) t < -t/2
t > t/2
An Example A businessman is planning to put up either a computer store or a video store. He randomly selected 10 computer stores and 10 video stores. The average weekly incomes are P45,000 and P53,000 respectively with corresponding standard deviations of P17,500 and P15,000. At the 1% significance level, indicate whether he should be convinced to put up a computer store or a video store.
![Page 90: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/90.jpg)
Solution:Let 1 = mean weekly income of a computer stores
2 = mean weekly income of video stores
Ho: 1 = 2
H1: 1 ≠ 2 (two-tailed)
Significance Level: 1%Rejection Region: and and t> 2.878
18,005.tt 18,005.tt
878.2t
![Page 91: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/91.jpg)
Computation of Test Statistics: t – test
2121
222
211
21
112
)1()1(nnnn
snsn
xxt
1021 nn
000,451 Px 000,532 Px 500,171 P 000,152 P
1.1
10
1
10
1
21010
)000,15)(110()500,17)(110(
000,53000,4522
t
![Page 92: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/92.jpg)
Decision: Since –1.1 is not within the rejection region, then we accept HO : at the 1% significance level. Therefore, there is no significant difference whether the businessman puts up either a computer store or video store. Therefore, opening a computer store is as profitable as opening a video store.
![Page 93: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/93.jpg)
Testing the Difference of Two Proportions
a) Ho: P1 = P 2 b.) Ho: P1 = P2 c.) Ho: P1 = P2
H1: P1 < P2 H1: P1 > P2 H1: P1 P2
Test Statistic:
where Rejection Region:
a.) Z < -Z b.) Z > Z c.) Z < -Z/2 and Z > Z/2
21
21
11)1(
nnpp
ppz
2
22
1
11 ,
n
xp
n
xp
21
2211
nn
pnpnp
![Page 94: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/94.jpg)
An Example
Consider random samples of 85 married couples last year and another 100 couples this year. Data show that 70 of last year’s married couples and 95 of this year’s married couples are entrepreneurs. Using the 1% significance level, can it be stated that the proportion of entrepreneurs has significantly increased from last year to this year?
Solution: Let P1 and P2 be the true proportion of married couples last year and this year who are entrepreneurs, respectively.
![Page 95: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/95.jpg)
HO: The proportion of entrepreneurs has not significantly increased from last year to this year, or (p1 = p2 ).
H1: The proportion of entrepreneurs significantly increased from last year to this year, or . (p1 < p2 ).
Significance Level : 1%Rejection Region:
2/zz
005.zz 33.2z
![Page 96: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/96.jpg)
Computation of Test Statistics: z – test
21
21
11)1(
nnpp
ppz
851 n 1002 n 823.085
701 p 95.0
100
952 p
89.010085
)95.0(100)823.0(85
21
2211
nn
pnpnp
73.2
100
1
85
1)89.01)(89.0(
95.0823.0
z
![Page 97: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/97.jpg)
Decision: Since –2.73 is within the rejection region, then reject HO : at the 1% significance level. Therefore, the proportion of entrepreneurs has significantly increased from last year to this year.
![Page 98: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/98.jpg)
Dependent Samples
Samples are called dependent if any one of the following cases is true.
1. Before and after experiments - an experimental variable is being measured in terms of its effectiveness. This can be done by comparing the observations taken from the same group before and after the introduction of the experimental variable. If significant difference exists, then the experimental variable is said to be effective.
2.Two different groups matched pair by pair with respect to some relevant characteristics. The primary purpose of matching is to ensure that observations may differ because of the experimental variable being tested.
![Page 99: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/99.jpg)
• A study is conducted to determine the effect of fraternity membership to the performance of the students. One way to measure the effect of fraternity membership (experimental variable) is to consider say for instance 30 students and their performance (grades) before and after membership to fraternity are being compared (case 1). Another way is to compare the performance of two groups of students, those who are fraternity members and those who are non-fraternity members. The students in the first group should be carefully matched with the students in the second group with respect to some relevant characteristics such as IQ level, study habits, gender, etc. This will ensure that if a significant difference in the performance exists, then this could solely be attributed to the fact that the first group of students are fraternity members and the second group are non-members.
![Page 100: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/100.jpg)
a.) Ho: 1 = 2 b.) Ho: 1 = 2 c.) Ho: 1 = 2
H1: 1 < 2 H1: 1 > 2 H1: 1 2
Test Statistic:
where:
Sd
nddt
)( 0
)1(
)()( 2
nn
ddnSd i
ei
with v = n –1, n is the number of pairs
![Page 101: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/101.jpg)
Rejection Region:a.) t < -t b.) t > t c.) t < -t/2 and t > t/2
Example. If you wished to measure the effectiveness of a new diet you would weigh the dieters at the start and at the finish of the program
![Page 102: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/102.jpg)
Chapter SixAnalysis of Variance
GOALSWhen you have completed this chapter, you will be able to:
ONE Understand the purpose of Analysis of Variance.TWO
Discuss the general idea of analysis of variance.
THREE
Organize data into a one-way
![Page 103: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/103.jpg)
Chapter Twelve continuedAnalysis of Variance
GOALSWhen you have completed this chapter, you will be able to:
FOURDefine and understand the terms treatments and blocks.FIVEConduct a test of hypothesis among three or more treatment means.
![Page 104: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/104.jpg)
Purpose of ANOVA
• The Analysis of Variance (ANOVA for short) is a technique designed to test whether or not more than two sample means differ significantly from each other.
• In the preceding chapter, we learned that the t-test or the z-test is used to test for the significance of difference between two sample means. ANOVA, therefore, which can be used to test for the equality of several means simultaneously is an extension of the t-test or the z-test which can handle only two means at a time.
![Page 105: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/105.jpg)
Illustration• Suppose for example, three methods of teaching (A,
B, C) are being compared in terms of effectiveness and are being employed to three groups of students. If the researcher uses the t-test, he will have to test separately for the following pairs: A and B, A and C, and B and C. In other words, the researcher would be using the t-test formula three times which would mean spending so much time and effort. Of course there is a possibility that none of the pairs are significantly different and so it is a waste of time and effort on the part of the researcher.
![Page 106: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/106.jpg)
Purpose of ANOVA• Now, if the researcher uses ANOVA, he could take all
the three sample means simultaneously and the test stops if the conclusion arrived at is that of no significant difference. However, if the conclusion arrived at in using ANOVA shows significant difference between the three sample means (A, B, C), then the t-test must be used to find which pair of means differ significantly. Another way is to use the Duncan’s Multiple Range Test (DMRT) which is beyond the scope of this manual.
![Page 107: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/107.jpg)
Why the name analysis of variance? • The phrase analysis of variance means that we will be
analyzing the total variation in a set of observations that can be attributed to specific sources or causes of variation. With reference to the above example, two such specific sources of variations might be (1) actual differences in the three methods that can be shown in the varying performance of the three group of students (treatment), and (2) chance, which in problems like this is usually called experimental error.
![Page 108: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/108.jpg)
The populations have equal standard deviations.
ANOVA requires the following conditions
Underlying assumptions for ANOVA
The sampled populations follow the normal distribution.
The samples are independent
![Page 109: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/109.jpg)
Steps in ANOVA
• The following are the steps and computations involved in the Analysis of Variance technique.
Ho: 1 = 2 = 3 = . . . = k ( for k sample means)
H1 : At least two means differ significantly
Test Statistic: Rejection Region: F > F tabular (, k-1, n-k)where
MSE
MSTrF
1)(
k
SSTrTreatmentMeanSquareMSTr
n
GrandTotal
n
ct
n
ct
n
ct
tesTreatmenSumofSquarSSTr
k
k22
2
22
1
21 )(
...
)(
![Page 110: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/110.jpg)
Computational Formulas
kn
SSEErrorMeanSquareMSE
)(
SSTrTSSesErrorSumofSquarSSE )(
n
GrandTotalxSquaresTotalSumofTSS i
n
i
22
1
)()(
![Page 111: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/111.jpg)
Source of Variation Degrees of Freedom(df)
Sum of Squares(SS)
Mean Square(MS)
FValue
Treatment k-1 SSTr MSTr F = MSTr/ MSE
Error n-k SSE MSE
Total n-1 TSS
Analysis of Variance (ANOVA) Table
![Page 112: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/112.jpg)
ECP Restaurants specialize in meals for families. The owner recently developed a new meat loaf dinner. Before making it a part of the regular menu she decides to test it in several of her restaurants.
Example 1
She would like to know if there is a difference in the mean number of dinners sold per day at Restaurants A, B, and C. Use the .05 significance level.
![Page 113: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/113.jpg)
Number of Dinners Sold by Restaurant
RestaurantDay
A B C
Day 1Day 2Day 3Day 4Day 5
13121412
10121311
1816171717
Example 1 continued
![Page 114: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/114.jpg)
Solution:Ho: mA = mB = mC H1: At least 2 means differ significantly
Level of significance is .05.
Example 2 continued
![Page 115: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/115.jpg)
Rejection Region:The numerator degrees of freedom, k-1, equal 3-1 or 2. The denominator degrees of freedom, n-k, equal 13-3 or 10. The value of F at 2 and 10 degrees of freedom is 4.10. Thus, H0 is rejected if F>4.10
Example 2 continued
Using the data provided, the ANOVA calculations follow.
![Page 116: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/116.jpg)
Computations
= 132 + 122 + . . .+ 172 - (182) 2 / 13= 2634 – 2548 = 86
n
GrandTotalxSquaresTotalSumofTSS i
n
i
22
1
)()(
25.76254825.262413
)182()
5
85
4
46
4
51(
)(...
)(
2222
22
2
22
1
21
n
GrandTotal
n
ct
n
ct
n
ct
tesTreatmenSumofSquarSSTr
k
k
75.925.7686)( SSTrTSSesErrorSumofSquarSSE
![Page 117: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/117.jpg)
ANOVA TableSource of Variation
Sum of Squares
Degrees of
Freedom
Mean Square
F
Treatments 76.25 3-1=2
76.25/2=38.125 38.125
.975= 39.103
Error 9.75 13-3=10
9.75/10=.975
Total 86.00 13-1=12
Example 1 continued
![Page 118: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/118.jpg)
Example 1 continued
The ANOVA tables on the next two slides are from the SPSS and EXCEL systems
The mean number of meals sold at the three locations is not the same. Specifically, as shown in the Duncan’s Multiple Range Test, the meals sold at Restaurant C is significantly higher than A and B, but A and B do not differ significantly.
Since an F of 39.103 > the critical F of 4.10, the decision is to reject the null hypothesis and conclude that
At least two of the treatment means are not the same.
![Page 119: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/119.jpg)
Example 1 continuedANOVAvolsold
Sum of Squares df Mean Square F Sig.Between Groups 76.250 2 38.125 39.103 .000
Within Groups 9.750 10 .975Total 86.000 12
volsoldDuncanRestaurant N Subset for alpha = 0.05
1 22 4 11.50001 4 12.75003 5 17.0000Sig. .094 1.000Means for groups in homogeneous subsets are displayed.
![Page 120: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/120.jpg)
SUMMARY
Groups Count Sum Average Variance
Aynor 4 51 12.75 0.92
Loris 4 46 11.50 1.67
Lander 5 85 17.00 0.50
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 76.25 2 38.13 39.10 2E-05 4.10
Within Groups 9.75 10 0.98
Total 86.00 12
Anova: Single Factor
Example 2 continued
![Page 121: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/121.jpg)
Some common Post Hoc Tests are Duncan’s Multiple Range Test of DMRT and Tukey’s Test.
When I reject the null hypothesis that the
means are equal, I want to know which treatment
means differ.
![Page 122: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/122.jpg)
Chapter SevenLinear Regression and Correlation
GOALSWhen you have completed this chapter, you will be able to:
ONEDraw a scatter diagram.TWO Understand and interpret the terms dependent variable and independent variable.THREECalculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.FOURConduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.
Goals
![Page 123: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/123.jpg)
Chapter Thirteen continuedLinear Regression and Correlation
GOALSWhen you have completed this chapter, you will be able to:
FIVE Calculate the least squares regression line and interpret the slope and intercept values.SIXConstruct and interpret a confidence interval and prediction interval for the dependent variable.SEVENSet up and interpret an ANOVA table.
Goals
![Page 124: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/124.jpg)
Correlation Analysis
The Independent Variable provides the basis for estimation. It is the predictor variable.
Correlation Analysis is a group of statistical techniques to measure the association between two variables.
A Scatter Diagram is a chart that portrays the relationship between two variables.
The Dependent Variable is the variable being predicted or estimated.
Advertising Minutes and $ Sales
0
5
10
15
20
25
30
70 90 110 130 150 170 190
Advertising Minutes
Sale
s ($
thou
sand
s)
![Page 125: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/125.jpg)
The Coefficient of Correlation, r
Negative values indicate an inverse relationship and positive values indicate a direct relationship.
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.
-1 10
P earson's r
Also called Pearson’s r and Pearson’s product moment correlation coefficient.
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
![Page 126: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/126.jpg)
Perfect Negative Correlation
0 1 2 3 4 5 6 7 8 9 10
10 9 8 7 6 5 4 3 2 1 0
X
Y
![Page 127: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/127.jpg)
0 1 2 3 4 5 6 7 8 9 10
10 9 8 7 6 5 4 3 2 1 0
X
Y
Perfect Positive Correlation
![Page 128: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/128.jpg)
0 1 2 3 4 5 6 7 8 9 10
10 9 8 7 6 5 4 3 2 1 0
X
Y
Zero Correlation
![Page 129: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/129.jpg)
0 1 2 3 4 5 6 7 8 9 10
10 9 8 7 6 5 4 3 2 1 0
X
Y
Strong Positive Correlation
![Page 130: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/130.jpg)
Formula for r
We calculate the coefficient of correlation from the following formula.
])()(][)()([
))((2222 yynxxn
yxxynr
![Page 131: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/131.jpg)
Coefficient of Determination
It is the square of the coefficient of correlation. It ranges from 0 to 1.It does not give any information on the direction of the relationship between the variables.
The coefficient of determination (r2) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).
![Page 132: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/132.jpg)
Example
Suppose an investigator wants to determine the extent to which a relationship exists between size of income and the number of years of education the individual has completed. The following table shows the data of ten individuals. Compute and interpret r.
![Page 133: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/133.jpg)
Individual # Y, Income (P,000) X, Education (years) 1 45 20
2 63 19 3 36 16 4 52 20 5 29 12 6 33 14 7 48 16 8 55 18 9 72 20 10 66 22
![Page 134: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/134.jpg)
SolutionPreliminary computations of the above data: n = 10 x = 177 y = 499 xy= 9,173 x2
= 3,221 y2 = 26,793 Therefore,
])()(][)()([
))((2222 yynxxn
yxxynr
83.])499()26793(10][)177()3221(10[
)499)(177()9173(1022
r
The value of r2 which is (.83) 2 = .6889 means that 68.89% of the total variability in income is being explained by its linear relationship with education.
![Page 135: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/135.jpg)
Testing for the Significance of rThe sample correlation coefficient r is a value computed from a random sample of n pairs of measurements. Different random samples of size n from the same population will generally produce different values of r. Therefore, we need to test for the significance of the computed r value. The null hypothesis will be = 0 against the alternative that 0. The rejection of Ho will lead to a conclusion that the existing relationship is significant at alpha level. However, failure to reject Ho would imply that the relationship is not significant and thus it can be attributed to chance.From the above result, test the null hypothesis that there is no linear relationship between income and education.. Use a 0.05 level of significance.
![Page 136: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/136.jpg)
Solution: Ho: = 0H1 : 0 (two-tailed)
Level of Significance: 0.05 / 2 = 0.025Rejection Region: t > 2.306 and t < -2.306
Computation of the Test Statistic:
Decision: Since the value of the test statistic (t = 4.26) is greater
than the tabular value of 2.306, therefore reject the null hypothesis of no linear relationship and conclude that there is a significant relationship between education and income. Specifically, the higher the educational attainment, the higher the income earned
26.4)83(.1
)8(83.
1
)2(22
r
nrt
![Page 137: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/137.jpg)
Regression Analysis
The least squares criterion is used to determine the equation. That is the term S(Y – Y’)2 is minimized.
In Regression Analysis we use the independent variable (X) to estimate the dependent variable (Y).
The relationship between the
variables is linear.
Both variables must be at least
interval scale.
![Page 138: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/138.jpg)
The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
For each value of X, there is a group of Y values, and these Y values are normally distributed.
Assumptions Underlying Linear Regression
The means of these normal distributions of Y values all lie on the straight line of regression.
The standard deviations of these normal distributions are the same.
![Page 139: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/139.jpg)
Regression Analysis
The regression equation is Y(hat)= a + bX (1)
where Y(hat) is the predicted value of Y for any X.
a is the Y-intercept. It is the estimated Y value when X=0
b is the slope of the line, or the average change in Y’ for each change of one unit in X
The least squares principle is used to obtain a and b.
![Page 140: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/140.jpg)
Deterministic ModelModel (1) is called a deterministic model. It
gives an exact relationship between x and y. This model expresses that y is determined exactly by x and for a given value of x there is one and only one value of y.
![Page 141: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/141.jpg)
Probabilistic ModelHowever, in many cases the relationship between variables is
not exact. For instance, if y is food expenditure and x is income, then model (1) would state that food expenditure is determined by income only and that all households with the same income will spend the same amount of food. But food expenditure is determined by many other variables such as household size, taste and preference which explains why different households with the same income spend different amounts of money for food. Hence, to take these variables into considerations and to make our model complete, we add another term to the right side of model(1) called the random error term ( ).
• The complete regression model is written asy = A + Bx + (2)
![Page 142: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/142.jpg)
The regression model (2) is called a probabilistic model or a statistical relationship.
The random error term () is included in the model to represent the following two phenomena.
• Missing or omitted variables. As mentioned earlier, food expenditure is affected by many variables other than income. The random error term is included to capture the effect of all those missing variables that have not been included in the model.
• Random variation. Human behavior is unpredictable. For example, a household may have many parties during the month and may spend more than usual on food during that month. This variation in food expenditure may be called random variation.
![Page 143: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/143.jpg)
Regression Analysis
The least squares principle is used to obtain a and b. The equations to determine a and b are:
22 )()(
))((
xxn
yxxynb
![Page 144: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/144.jpg)
• An ExampleWe have already established a significant linear relationshipbetween income and education (i.e. number of years ineducation). We may now formulate an equation allowing us topredict the income of a person given the number of years heattended for his education. Referring to the above example,
wefind thatn = 10 Xi = 177 Yi = 499 Xi2 = 3221 Yi2 = 2679 XiYi =
9173 Y(bar) = 49.9 X(bar) = 17.7
![Page 145: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/145.jpg)
Therefore,
And
The model for prediction purposes is given by
87.3)177()3221(10
)499)(177()9173(10
)()(
))((222
xxn
yxxynb
59.18)7.17(87.39.49 xbya
xy 87.358.18
![Page 146: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/146.jpg)
The equation can be interpreted as per additional year in
educational attainment, there corresponds 3.87 or 3,870 pesos
increase in the income of a person.
![Page 147: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/147.jpg)
We can use the regression equation to estimate values of Y.
The estimated income of a person who have spent 16 years of education will be:
34.43)16(87.358.18 Income
![Page 148: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/148.jpg)
Standard Error of Estimate (SEE) – a measure of the variability of the regression line, i.e. the
dispersion around the regression line – it tells how much variation there is in the dependent
variable between the raw value and the expected value in the regression
– this SEE allows us to generate the confidence interval on the regression line as we did in the estimation of means
![Page 149: Goals in Statistic](https://reader037.fdocuments.in/reader037/viewer/2022102805/552faa29550346d52b8b464c/html5/thumbnails/149.jpg)
ExerciseIt is generally known that the number of road accidents is
inversely proportional with road width. The following data show the results of a study indicating the number of accidents occurring annually at roads with different widths:
Road width (in feet) (X)75 52 60 33 22 40 70 35 55 80Number of accidents (Y)40 84 55 92 90 86 38 88 78 32
– Draw the scatter diagram.– Find the correlation coefficient and test for its significance.– Establish a regression equation of the form Y = a + bX– Predict the number of accidents for a 50 ft road width.