Inferential Statistics
![Page 1: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/1.jpg)
Inferential Statistics
Inferential statistics: the part of statistics that allows researchers to generalize their findings beyond the data collected.
Statistical inference: a procedure for making inferences or generalizations about a larger population from a sample of that population.
Research is about trying to make valid inferences.
![Page 2: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/2.jpg)
How Statistical Inference Works
![Page 3: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/3.jpg)
Basic Terminology
Population (statistical population):
• Any collection of entities that have at least one characteristic in common
• A collection (an aggregate) of measurements about which an inference is desired
• Everything you wish to study
Parameter: the numbers that describe characteristics of scores in the population (mean, variance, standard deviation, correlation coefficient, etc.)
![Page 4: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/4.jpg)
A Population of Values: Body Weight Data (kg)
N = 28, μ = 44, σ² = 1.214
[Figure: the 28 body weights shown on the slide are 42 (×2), 43 (×7), 44 (×12), 45 (×3), 46 (×4)]
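As a quick check on the slide's numbers, the sketch below rebuilds the population from the value frequencies read off the slide (an assumption about how its scattered values decode) and recomputes N, μ and σ² with Python's standard `statistics` module. Note that `pvariance` divides by N, as a population variance should.

```python
from statistics import mean, pvariance

# Assumed decoding of the slide's scattered values: weight (kg) -> count.
counts = {42: 2, 43: 7, 44: 12, 45: 3, 46: 4}
population = [w for w, c in counts.items() for _ in range(c)]

N = len(population)              # population size
mu = mean(population)            # population mean, μ
sigma2 = pvariance(population)   # population variance, σ² (divides by N)

print(f"N = {N}, mu = {mu:.1f}, sigma^2 = {sigma2:.3f}")  # N = 28, mu = 44.0, sigma^2 = 1.214
```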
![Page 5: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/5.jpg)
Basic Terminology
Sample:
• A part of the population
• A finite number of measurements chosen from a population
Statistics: the numbers that describe characteristics of scores in the sample (mean, variance, standard deviation, correlation coefficient, reliability coefficient, etc.)
![Page 6: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/6.jpg)
n = 1 value … (drawn from the population of body weights, kg)
x1: 43
X: student body weight
![Page 7: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/7.jpg)
n = 2 values … (drawn from the population of body weights, kg)
x1: 43, x2: 44
X: student body weight
![Page 8: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/8.jpg)
n = 3 values … (drawn from the population of body weights, kg)
x1: 43, x2: 44, x3: 45
X: student body weight
![Page 9: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/9.jpg)
n = 4 values … (drawn from the population of body weights, kg)
x1: 43, x2: 44, x3: 45, x4: 44
X: student body weight
![Page 10: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/10.jpg)
n = 5 values … (drawn from the population of body weights, kg)
x1: 43, x2: 44, x3: 45, x4: 44, x5: 44
A simple random sample: a sample that has been selected in such a way that all members of the population have an equal chance of being picked.
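A simple random sample can be sketched with `random.sample`, which picks k distinct positions from a list so that every individual (not every distinct value) has the same chance of selection. The population list is the same assumed decoding of the body-weight slide; the seed is arbitrary and only makes the run repeatable.

```python
import random

# Assumed decoding of the body-weight slide: value (kg) -> number of students.
counts = {42: 2, 43: 7, 44: 12, 45: 3, 46: 4}
population = [w for w, c in counts.items() for _ in range(c)]

rng = random.Random(2024)              # arbitrary seed, for repeatability only
sample = rng.sample(population, k=5)   # 5 students drawn without replacement
print(sample)
```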
![Page 11: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/11.jpg)
Basic concepts of statistics
• Measures of central tendency
• Measures of dispersion & variability
![Page 12: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/12.jpg)
Measures of central tendency
Arithmetic mean (= simple average)
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where Σ denotes summation, $X_i$ is a measurement, i is the index of measurement, and n is the sample size.
• The best estimate of the population mean, μ, is the sample mean, $\bar{X}$.
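The formula translates directly into code. A minimal sketch, applied to the n = 5 sample drawn on the previous slides:

```python
def sample_mean(xs):
    """Arithmetic mean: the sum of the measurements divided by the sample size n."""
    return sum(xs) / len(xs)

x = [43, 44, 45, 44, 44]   # x1..x5 from the sampling slides
print(sample_mean(x))      # 44.0
```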
![Page 13: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/13.jpg)
Measures of variability
All describe how “spread out” the data are.
1. Sum of squares: the sum of squared deviations from the mean
• For a sample,
$$SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$$
![Page 14: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/14.jpg)
2. Average or mean sum of squares = variance, s²:
• For a sample,
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Why n – 1?
![Page 15: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/15.jpg)
n – 1 represents the degrees of freedom, ν (Greek letter “nu”), or the number of independent quantities in the estimate s².
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
• The deviations from the mean always sum to zero,
$$\sum_{i=1}^{n} (X_i - \bar{X}) = 0$$
therefore, once n – 1 of the deviations are specified, the last deviation is already determined.
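A small sketch of the last three slides on the same sample: it computes SS and s² with the n – 1 divisor, then confirms that the deviations sum to zero, so the last deviation is fixed once the other n – 1 are known.

```python
x = [43, 44, 45, 44, 44]          # the n = 5 sample from the slides
n = len(x)
x_bar = sum(x) / n

deviations = [xi - x_bar for xi in x]
ss = sum(d ** 2 for d in deviations)     # sum of squares, SS
s2 = ss / (n - 1)                        # sample variance: divide by df = n - 1

print(ss, s2)                            # 2.0 0.5
print(sum(deviations))                   # 0.0: the deviations always sum to zero
# Hence the last deviation is fixed once the other n - 1 are known:
print(-sum(deviations[:-1]) == deviations[-1])   # True
```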
![Page 16: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/16.jpg)
3. Standard deviation, s
• For a sample,
$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
• Variance has squared measurement units; to regain the original units, take the square root.
![Page 17: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/17.jpg)
4. Standard error of the mean
• For a sample,
$$s_{\bar{X}} = \sqrt{\frac{s^2}{n}}$$
The standard error of the mean is a measure of variability among the means of repeated samples from a population.
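The standard deviation and standard error follow from s². The sketch below uses the standard library's `statistics.stdev`, which already applies the n – 1 divisor, on the same n = 5 sample.

```python
import math
import statistics

x = [43, 44, 45, 44, 44]
n = len(x)
s = statistics.stdev(x)      # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)        # standard error of the mean = sqrt(s^2 / n)
print(round(s, 3), round(se, 3))   # 0.707 0.316
```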
![Page 18: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/18.jpg)
Basic Statistical Symbols
![Page 19: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/19.jpg)
A Population of Values: Body Weight Data (kg)
N = 28, μ = 44, σ² = 1.214
![Page 20: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/20.jpg)
Repeated random sampling, each with sample size n = 5 values …
Sample 1: 43
![Page 21: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/21.jpg)
Repeated random sampling, each with sample size n = 5 values …
Sample 1: 43 44
![Page 22: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/22.jpg)
Repeated random sampling, each with sample size n = 5 values …
Sample 1: 43 44 45
![Page 23: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/23.jpg)
Repeated random sampling, each with sample size n = 5 values …
Sample 1: 43 44 45 44
![Page 24: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/24.jpg)
Repeated random sampling, each with sample size n = 5 values …
Sample 1: 43 44 45 44 44
![Page 25: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/25.jpg)
Repeated random sampling, each with sample size n = 5 values …
Sample 1 mean: $\bar{X}$ = 44
![Page 26: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/26.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 2: 46
![Page 27: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/27.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 2: 46 44
![Page 28: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/28.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 2: 46 44 46
![Page 29: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/29.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 2: 46 44 46 45
![Page 30: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/30.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 2: 46 44 46 45 44
![Page 31: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/31.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 2 mean: $\bar{X}$ = 45
![Page 32: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/32.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 3: 42
![Page 33: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/33.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 3: 42 42
![Page 34: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/34.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 3: 42 42 43
![Page 35: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/35.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 3: 42 42 43 45
![Page 36: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/36.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 3: 42 42 43 45 43
![Page 37: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/37.jpg)
Repeated random samples, each with sample size n = 5 values …
Sample 3 mean: $\bar{X}$ = 43
![Page 38: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/38.jpg)
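Summary

| Sample | Sampling 1 | Sampling 2 | Sampling 3 |
|---|---|---|---|
| First | 43 (-1) | 46 (+1) | 42 (-1) |
| Second | 44 (+0) | 44 (-1) | 42 (-1) |
| Third | 45 (+1) | 46 (+1) | 43 (+0) |
| Fourth | 44 (+0) | 45 (+0) | 45 (+2) |
| Fifth | 44 (+0) | 44 (-1) | 43 (+0) |
| Average | 44 | 45 | 43 |
| Sum of squares | 2 | 4 | 6 |
| Mean square | 0.50 | 1.00 | 1.50 |
| Standard deviation | 0.707 | 1.00 | 1.225 |

(Values in parentheses are deviations from each sample's average.)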
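The summary table can be reproduced in a few lines; here “mean square” is the sample variance, SS/(n – 1). A sketch using only the standard library:

```python
import statistics

samples = {
    "Sampling 1": [43, 44, 45, 44, 44],
    "Sampling 2": [46, 44, 46, 45, 44],
    "Sampling 3": [42, 42, 43, 45, 43],
}

summary = {}
for name, xs in samples.items():
    avg = statistics.fmean(xs)
    ss = sum((x - avg) ** 2 for x in xs)     # sum of squares
    ms = ss / (len(xs) - 1)                  # mean square = s^2
    sd = ms ** 0.5                           # standard deviation
    summary[name] = (avg, ss, ms, round(sd, 3))
    print(name, summary[name])
```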
![Page 39: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/39.jpg)
For a large enough number of large samples, the frequency distribution of the sample means (= sampling distribution) approaches a normal distribution.
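The claim can be checked by simulation. The sketch below is an illustration under stated assumptions: it draws 20,000 samples of n = 5 *with replacement* (`random.Random.choices`) from the assumed decoding of the body-weight population, so the theoretical standard deviation of the sample means is σ/√n. The mean of the simulated sample means should sit close to μ = 44 and their spread close to σ/√n ≈ 0.49; a histogram of `means` would show the bell shape. (Sampling without replacement from only 28 values would give a slightly smaller spread.)

```python
import random
import statistics

# Assumed decoding of the body-weight slide: value (kg) -> number of students.
counts = {42: 2, 43: 7, 44: 12, 45: 3, 46: 4}
population = [w for w, c in counts.items() for _ in range(c)]
sigma = statistics.pstdev(population)           # population SD, about 1.102

rng = random.Random(7)                          # arbitrary seed, for repeatability
n, reps = 5, 20_000
# Each sample: n draws WITH replacement, so the draws are independent.
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

print(round(statistics.fmean(means), 3))                              # close to mu = 44
print(round(statistics.stdev(means), 3), round(sigma / n ** 0.5, 3))  # both close to 0.49
```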
![Page 40: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/40.jpg)
Normal distribution: bell-shaped curve
[Figure: frequency (y-axis) of the sample means (x-axis), forming a bell-shaped curve]
![Page 41: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/41.jpg)
Testing statistical hypotheses between 2 means
1. State the research question in terms of statistical hypotheses.
We always start with a statement that hypothesizes “no difference”, called the null hypothesis, H0.
H0: The mean height of female students is equal to the mean height of male students.
![Page 42: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/42.jpg)
Then we formulate a statement that must be true if the null hypothesis is false, called the alternate hypothesis, HA.
HA: The mean height of female students is not equal to the mean height of male students.
If we reject H0 as a result of sample evidence, then we conclude that HA is true.
![Page 43: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/43.jpg)
2. Choose an appropriate statistical test that would allow you to reject H0 if H0 were false, e.g., Student’s t test for hypotheses about means.
William Sealey Gosset (“Student”)
![Page 44: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/44.jpg)
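t statistic:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$$
where $\bar{X}_1$ is the mean of sample 1, $\bar{X}_2$ is the mean of sample 2, and $s_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference between the sample means. To estimate $s_{\bar{X}_1 - \bar{X}_2}$, we must first know the relation between the two populations.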
![Page 45: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/45.jpg)
How can we evaluate the success of this experimental design class?
• Compare the statistics and experimental design scores of several students
• Compare the experimental design scores of several students from two serial classes
• Compare the experimental design scores of several students from two different classes
![Page 46: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/46.jpg)
1. Comparing the statistics and experimental design scores of several students
• Same students → dependent populations → identical variance
• Different students → independent populations → identical variance or not identical variance
![Page 47: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/47.jpg)
2. Comparing the experimental design scores of several students from two serial classes
• Different students → independent populations → identical variance or not identical variance
![Page 48: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/48.jpg)
3. Comparing the experimental design scores of several students from two different classes
• Different students → independent populations → identical variance or not identical variance
![Page 49: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/49.jpg)
Relation between populations
• Dependent populations
• Independent populations
  1. Identical (homogeneous) variance
  2. Not identical (heterogeneous) variance
![Page 50: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/50.jpg)
Dependent Populations
• Sample
• Null hypothesis: the mean difference is equal to $d_0$
• Test statistic:
$$t = \frac{\bar{d} - d_0}{SE_{\bar{d}}}$$
• Null distribution: t with n – 1 df (*n is the number of pairs)
• Compare: how unusual is this test statistic?
  P < 0.05: reject H0; P > 0.05: fail to reject H0
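A minimal sketch of the paired (dependent-populations) statistic, assuming the differences are taken as first list minus second list, $SE_{\bar{d}} = s_d/\sqrt{n}$, and $d_0 = 0$ by default. The four pairs of scores are hypothetical and only illustrate the arithmetic; if SciPy is available, `scipy.stats.ttest_rel(after, before)` should report the same t.

```python
import math
import statistics

def paired_t(x, y, d0=0.0):
    """Paired t statistic t = (mean(d) - d0) / (s_d / sqrt(n)); returns (t, n - 1)."""
    d = [xi - yi for xi, yi in zip(x, y)]     # one difference per pair
    n = len(d)
    se = statistics.stdev(d) / math.sqrt(n)   # SE of the mean difference
    return (statistics.fmean(d) - d0) / se, n - 1

# Hypothetical paired scores for four students (illustration only).
after = [6, 9, 6, 10]
before = [5, 7, 6, 8]
t, df = paired_t(after, before)
print(round(t, 3), df)  # 2.611 3
```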
![Page 51: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/51.jpg)
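Independent Population with homogeneous variances
Pooled variance:
$$s_p^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu_1 + \nu_2}$$
where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
Then,
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$$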
![Page 52: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/52.jpg)
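Independent Population with homogeneous variances
$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}, \qquad SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$$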
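The pooled-variance t test can be sketched with the standard library. As input it reuses the ten scores from the “Pair-t-Test” slide later in this deck (Amir … Ana versus Budi … Betty); the result should match that slide's t_calc = 1.581 with 8 df.

```python
import math
import statistics

def pooled_t(y1, y2):
    """Two-sample t with pooled variance; returns (t, df1 + df2)."""
    n1, n2 = len(y1), len(y2)
    df1, df2 = n1 - 1, n2 - 1
    sp2 = (df1 * statistics.variance(y1) + df2 * statistics.variance(y2)) / (df1 + df2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # SE of the difference in means
    return (statistics.fmean(y1) - statistics.fmean(y2)) / se, df1 + df2

group_a = [6, 8, 10, 6, 10]   # Amir, Abas, Abi, Aura, Ana
group_b = [9, 4, 7, 5, 5]     # Budi, Berta, Bambang, Banu, Betty
t, df = pooled_t(group_a, group_b)
print(round(t, 3), df)  # 1.581 8
```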
![Page 53: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/53.jpg)
When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution.
The shape of the t distribution depends on the degrees of freedom, ν = n – 1.
![Page 54: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/54.jpg)
[Figure: t distributions for ν = 1, 5, and 25; as ν grows, t approaches the standard normal, Z = t(ν = ∞)]
![Page 55: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/55.jpg)
[Figure: distribution of t centred at 0; the area of acceptance (0.95) lies between the lower and upper critical values, with an area of rejection of 0.025 in each tail, for α = 0.05]
The distribution of a test statistic is divided into an area of acceptance and an area of rejection.
![Page 56: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/56.jpg)
Critical t for a test about equality (two-tailed) = $t_{\alpha(2),\nu}$
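Critical values are normally read from a t table (or, with SciPy, `scipy.stats.t.ppf(1 - alpha / 2, df)`). To keep the example dependency-free, the sketch below integrates the t density numerically with Simpson's rule and solves for the two-tailed critical value by bisection; it is a teaching sketch, not a production quantile routine. It shows the critical value shrinking toward the Z value 1.96 as ν grows, and gives about 2.306 for ν = 8, the table value used later in these slides.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the pdf on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value t_{alpha(2), df}, found by bisection on [0, 100]."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)   # two-tailed Z critical value, about 1.96
for df in (1, 5, 8, 25):
    print(df, round(t_critical(0.05, df), 3))
print("Z", round(z, 3))
```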
![Page 57: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/57.jpg)
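Independent Population with heterogeneous variances
$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
$$df = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$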
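Welch's statistic and its approximate degrees of freedom can be sketched the same way. Run on the “Pair-t-Test” scores (equal sample sizes and equal sample variances), it reduces to the pooled result, t ≈ 1.581 with df = 8; with unequal variances the df is usually not a whole number. With SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
import statistics

def welch_t(y1, y2):
    """Welch's t for unequal variances; returns (t, Welch-Satterthwaite df)."""
    v1 = statistics.variance(y1) / len(y1)    # s1^2 / n1
    v2 = statistics.variance(y2) / len(y2)    # s2^2 / n2
    t = (statistics.fmean(y1) - statistics.fmean(y2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(y1) - 1) + v2 ** 2 / (len(y2) - 1))
    return t, df

group_a = [6, 8, 10, 6, 10]   # Amir, Abas, Abi, Aura, Ana
group_b = [9, 4, 7, 5, 5]     # Budi, Berta, Bambang, Banu, Betty
t, df = welch_t(group_a, group_b)
print(round(t, 3), round(df, 3))   # 1.581 8.0
```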
![Page 58: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/58.jpg)
Analysis of Variance (ANOVA)
![Page 59: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/59.jpg)
Independent t-test
Compares the means of one variable for TWO groups of cases.
Statistical formula:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_{\bar{X}_1 - \bar{X}_2}}$$
Meaning: compare the ‘standardized’ mean difference.
But this is limited to two groups. What if there are more than 2 groups?
• Pairwise t-tests (previous example)
• ANOVA (Analysis of Variance)
![Page 60: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/60.jpg)
From T Test to ANOVA
1. Pairwise t-tests
If you compare three or more groups using t-tests at the usual 0.05 level of significance, you have to compare each pair (A to B, A to C, B to C), so the chance of getting at least one wrong result (a false rejection of H0) would be about 1 – (0.95 × 0.95 × 0.95) = 14.3%, treating the three tests as independent. Multiple t-tests increase the false alarm rate.
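The 14.3% figure is the familywise error rate for three independent tests at α = 0.05. A short sketch (treating the pairwise tests as independent, which is an approximation) shows how quickly it grows with the number of groups k, since k groups need k(k – 1)/2 pairwise comparisons. The function name `familywise_error` is our own.

```python
from itertools import combinations

def familywise_error(k, alpha=0.05):
    """Number of pairwise tests for k groups, and P(at least one false rejection)."""
    m = len(list(combinations(range(k), 2)))   # k(k - 1) / 2 pairs
    return m, 1 - (1 - alpha) ** m

for k in range(3, 7):
    m, fwe = familywise_error(k)
    print(k, m, round(fwe, 3))
```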
![Page 61: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/61.jpg)
2. Analysis of Variance
A t-test uses the mean difference. Similarly, ANOVA uses the observed variance among the group means.
The logic behind ANOVA:
• If the groups are from the same population, the variance among means will be small (note that the means from the groups are not exactly the same).
• If the groups are from different populations, the variance among means will be large.
![Page 62: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/62.jpg)
What is ANOVA?
Analysis of Variance: a procedure designed to determine whether the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable.
Assumptions:
• Each independent variable is categorical (nominal scale). Independent variables are called factors, and their values are called levels.
• The dependent variable is numerical (ratio scale).
![Page 63: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/63.jpg)
What is ANOVA?
The basic idea of ANOVA:
The “variance” of the dependent variable given the influence of one or more independent variables {Expected Sum of Squares for a Factor} is checked to see whether it is significantly greater than the “variance” of the dependent variable assuming no influence of the independent variables {also known as the Mean-Square-Error (MSE)}.
![Page 64: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/64.jpg)
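Pair-t-Test

| Student | Score | Student | Score |
|---|---|---|---|
| Amir | 6 | Budi | 9 |
| Abas | 8 | Berta | 4 |
| Abi | 10 | Bambang | 7 |
| Aura | 6 | Banu | 5 |
| Ana | 10 | Betty | 5 |
| Average | 8 | Average | 6 |
| n | 5 | n | 5 |
| Var. (sample) | 4 | Var. (sample) | 4 |

Pooled var. = 4; t_calc = 1.581; t_table (α = 0.05, two-tailed, df = 8) = 2.306. Since t_calc < t_table, H0 is not rejected.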
![Page 65: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/65.jpg)
ANOVA TABLE OF 2 POPULATIONS

| Source of variation | SS | DF | Mean square (M.S.) |
|---|---|---|---|
| Between populations | SS_between | 1 | MSB = SSB / DFB |
| Within populations | SS_within | (n1 – 1) + (n2 – 1) | MSW = SSW / DFW |
| TOTAL | SS_total | n1 + n2 – 1 | S² |
![Page 66: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/66.jpg)
ANOVA TABLE OF 2 POPULATIONS

| Source of variation | SS | DF | Mean square (M.S.) |
|---|---|---|---|
| Between populations | 10 | 1 | 10 |
| Within populations | 32 | 8 | 4 |
| TOTAL | 42 | 9 | |

F_calc = 2.50
F_table = 5.318
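The two-group table can be recomputed from the “Pair-t-Test” scores, which appear to be the data behind it (the sums of squares match). The sketch also shows that, for two groups, F equals the square of the pooled t: 1.581² ≈ 2.50, just as the F table value 5.318 ≈ 2.306².

```python
import statistics

group_a = [6, 8, 10, 6, 10]    # Amir, Abas, Abi, Aura, Ana
group_b = [9, 4, 7, 5, 5]      # Budi, Berta, Bambang, Banu, Betty
groups = [group_a, group_b]

all_obs = [x for g in groups for x in g]
grand = statistics.fmean(all_obs)                        # grand mean = 7.0

ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
df_between, df_within = len(groups) - 1, len(all_obs) - len(groups)
f = (ss_between / df_between) / (ss_within / df_within)

# Pooled t: MSW is the pooled variance s_p^2.
sp2 = ss_within / df_within
t = (statistics.fmean(group_a) - statistics.fmean(group_b)) / (sp2 * (1 / 5 + 1 / 5)) ** 0.5
print(ss_between, ss_within, f, round(t ** 2, 2))  # 10.0 32.0 2.5 2.5
```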
![Page 67: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/67.jpg)
Rationale for ANOVA
• We can break the total variance in a study into meaningful pieces that correspond to treatment effects and error. That’s why we call this Analysis of Variance.
$\bar{X}_G$: the grand mean, taken over all observations.
$\bar{X}_A$: the mean of any group.
$\bar{X}_{A_1}$: the mean of a specific group (1 in this case).
$X_i$: the observation or raw data for the ith subject.
![Page 68: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/68.jpg)
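The ANOVA Model
$$(X_i - \bar{X}_G) = (\bar{X}_A - \bar{X}_G) + (X_i - \bar{X}_A)$$
The deviation of trial i from the grand mean is the sum of a treatment effect, $(\bar{X}_A - \bar{X}_G)$, and error, $(X_i - \bar{X}_A)$. Squaring and summing over all observations (the cross-product terms sum to zero) gives
SS Total = SS Treatment + SS Error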
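The partition can be verified observation by observation. The sketch below reuses the three samples from the “Summary” slide purely as example numbers (k = 3 groups of 5); each deviation from the grand mean is split into a treatment part and an error part, and the squared pieces add up.

```python
samples = [
    [43, 44, 45, 44, 44],   # Sampling 1
    [46, 44, 46, 45, 44],   # Sampling 2
    [42, 42, 43, 45, 43],   # Sampling 3
]
all_obs = [x for g in samples for x in g]
grand = sum(all_obs) / len(all_obs)          # grand mean, X_G

ss_total = ss_treat = ss_error = 0.0
for g in samples:
    group_mean = sum(g) / len(g)             # X_A for this group
    for x in g:
        # (X_i - X_G) = (X_A - X_G) + (X_i - X_A)
        ss_total += (x - grand) ** 2
        ss_treat += (group_mean - grand) ** 2
        ss_error += (x - group_mean) ** 2

print(ss_total, ss_treat, ss_error)  # 22.0 10.0 12.0
```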
![Page 69: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/69.jpg)
Analysis of Variance
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.
Use the sample results to test the following hypotheses:
H0: μ1 = μ2 = μ3 = . . . = μk
Ha: Not all population means are equal
If H0 is rejected, we cannot conclude that all population means are different.
Rejecting H0 means that at least two population means have different values.
![Page 70: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/70.jpg)
Assumptions for Analysis of Variance
• For each population, the response variable is normally distributed.
• The variance of the response variable, denoted σ², is the same for all of the populations.
• The effect of the independent variable is additive.
• The observations must be independent.
![Page 71: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/71.jpg)
Analysis of Variance:
• Between-Treatments Estimate of Population Variance
• Within-Treatments Estimate of Population Variance
• Comparing the Variance Estimates: The F Test
• ANOVA Table
• Testing for the Equality of k Population Means
![Page 72: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/72.jpg)
Between-Treatments Estimate of Population Variance
A between-treatments estimate of σ² is called the mean square due to treatments (MSTR).
$$MSTR = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1}$$
The numerator of MSTR is called the sum of squares due to treatments (SSTR).
The denominator of MSTR represents the degrees of freedom associated with SSTR.
![Page 73: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/73.jpg)
Within-Treatments Estimate of Population Variance
The estimate of σ² based on the variation of the sample observations within each treatment is called the mean square due to error (MSE).
$$MSE = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k}$$
The numerator of MSE is called the sum of squares due to error (SSE).
The denominator of MSE represents the degrees of freedom associated with SSE.
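Both mean squares can be computed from group summaries alone ($n_j$, $\bar{x}_j$, $s_j^2$). The sketch below plugs in the summaries of the three samples from the “Summary” slide, again only as example numbers; their ratio is the F statistic used in the next slides.

```python
# Summaries of the three samples on the "Summary" slide (example numbers only).
n = [5, 5, 5]                 # group sizes n_j
means = [44, 45, 43]          # group means x̄_j
variances = [0.5, 1.0, 1.5]   # group variances s_j^2

k = len(n)
n_total = sum(n)                                                   # n_T
grand = sum(nj * mj for nj, mj in zip(n, means)) / n_total         # overall mean

mstr = sum(nj * (mj - grand) ** 2 for nj, mj in zip(n, means)) / (k - 1)
mse = sum((nj - 1) * vj for nj, vj in zip(n, variances)) / (n_total - k)
print(mstr, mse, mstr / mse)  # 5.0 1.0 5.0
```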
![Page 74: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/74.jpg)
Comparing the Variance Estimates: The F Test
If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to k – 1 and MSE d.f. equal to n_T – k.
If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates σ².
Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.
![Page 75: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/75.jpg)
Test for the Equality of k Population Means
Hypotheses:
H0: μ1 = μ2 = μ3 = . . . = μk
Ha: Not all population means are equal
Test statistic: F = MSTR/MSE
![Page 76: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/76.jpg)
Test for the Equality of k Population Means
Rejection rule:
• Using the test statistic: reject H0 if F > F_α
• Using the p-value: reject H0 if p-value < α
where the value of F_α is based on an F distribution with k – 1 numerator degrees of freedom and n_T – k denominator degrees of freedom.
![Page 77: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/77.jpg)
The figure below shows the rejection region associated with a level of significance equal to α, where F_α denotes the critical value.
[Figure: Sampling Distribution of MSTR/MSE; values of MSTR/MSE below the critical value F_α: do not reject H0; values above it: reject H0]
![Page 78: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/78.jpg)
ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Squares | F |
|---|---|---|---|---|
| Treatment | SSTR | k – 1 | MSTR | MSTR/MSE |
| Error | SSE | n_T – k | MSE | |
| Total | SST | n_T – 1 | | |

SST divided by its degrees of freedom, n_T – 1, is simply the overall sample variance that would be obtained if we treated the entire set of n_T observations as one data set.
$$SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 = SSTR + SSE$$
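A compact sketch that assembles the whole table from raw data and checks the remark above: SST/(n_T – 1) equals the ordinary sample variance of all n_T observations pooled together. The data are again the three “Summary” samples, used only as example numbers; the function name `anova_table` is our own.

```python
import statistics

def anova_table(groups):
    """One-way ANOVA rows (source, SS, df, MS) and F = MSTR / MSE."""
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand = statistics.fmean(all_obs)
    sstr = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups)
    sse = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    sst = sum((x - grand) ** 2 for x in all_obs)
    mstr, mse = sstr / (k - 1), sse / (n_total - k)
    rows = [
        ("Treatment", sstr, k - 1, mstr),
        ("Error", sse, n_total - k, mse),
        ("Total", sst, n_total - 1, sst / (n_total - 1)),
    ]
    return rows, mstr / mse

samples = [[43, 44, 45, 44, 44], [46, 44, 46, 45, 44], [42, 42, 43, 45, 43]]
rows, f = anova_table(samples)
for source, ss, df, ms in rows:
    print(f"{source:<10}{ss:>8.2f}{df:>4}{ms:>8.3f}")
print(f"F = {f:.2f}")

# SST / (n_T - 1) equals the ordinary sample variance of all observations.
pooled_all = [x for g in samples for x in g]
print(abs(rows[2][3] - statistics.variance(pooled_all)) < 1e-12)  # True
```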
![Page 79: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/79.jpg)
What does ANOVA tell us?
ANOVA will tell us whether we have sufficient evidence to say that measurements from at least one treatment differ significantly from at least one other. It will not tell us which ones differ, or how many differ.
![Page 80: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/80.jpg)
ANOVA vs t-test
• ANOVA is like a t-test among multiple data sets simultaneously
  • t-tests can only be done between two data sets, or between one set and a “true” value
• ANOVA uses the F distribution instead of the t-distribution
• ANOVA assumes that all of the data sets have equal variances
  • Use caution on close decisions if they don’t
![Page 81: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/81.jpg)
ANOVA – a Hypothesis Test
H0: There is no significant difference among the results provided by treatments.
Ha: At least one of the treatments provides results significantly different from at least one other.
![Page 82: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/82.jpg)
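Linear Model

$$Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$$

By definition, $\sum_{j=1}^{t} \tau_j = 0$.
The experiment produces $(r \times t)$ $Y_{ij}$ data values.
The analysis produces estimates of $\mu, \tau_1, \tau_2, \ldots, \tau_t$. (We can then get estimates of the $\varepsilon_{ij}$ by subtraction.)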
![Page 83: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/83.jpg)
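$$
\begin{array}{cccccc}
1 & 2 & 3 & 4 & \cdots & t \\
\hline
Y_{11} & Y_{12} & Y_{13} & Y_{14} & \cdots & Y_{1t} \\
Y_{21} & Y_{22} & Y_{23} & Y_{24} & \cdots & Y_{2t} \\
Y_{31} & Y_{32} & Y_{33} & Y_{34} & \cdots & Y_{3t} \\
Y_{41} & Y_{42} & Y_{43} & Y_{44} & \cdots & Y_{4t} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
Y_{r1} & Y_{r2} & Y_{r3} & Y_{r4} & \cdots & Y_{rt} \\
\hline
\bar{Y}_{\bullet 1} & \bar{Y}_{\bullet 2} & \bar{Y}_{\bullet 3} & \bar{Y}_{\bullet 4} & \cdots & \bar{Y}_{\bullet t}
\end{array}
$$

$\bar{Y}_{\bullet 1}, \bar{Y}_{\bullet 2}, \ldots, \bar{Y}_{\bullet t}$ are the column means.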
![Page 84: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/84.jpg)
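$$\bar{Y}_{\bullet\bullet} = \frac{1}{t}\sum_{j=1}^{t} \bar{Y}_{\bullet j} = \text{"GRAND MEAN"}$$

(assuming the same number of data points in each column; otherwise, $\bar{Y}_{\bullet\bullet}$ = the mean of all the data)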
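The parenthetical matters. The mean of the column means equals the mean of all the data only when every column has the same number of points. Here is a small check in Python on made-up (hypothetical) numbers. The helper names are our own.

```python
from statistics import mean

def grand_mean(columns):
    """Mean of all observations, pooled across columns."""
    return mean(x for col in columns for x in col)

def mean_of_column_means(columns):
    """Average of the column means, each column weighted equally."""
    return mean(mean(col) for col in columns)

balanced = [[6, 8, 10], [9, 4, 5]]          # hypothetical: equal column sizes
unbalanced = [[6, 8, 10], [9, 4, 5, 7, 5]]  # hypothetical: unequal column sizes

print(grand_mean(balanced), mean_of_column_means(balanced))      # both 7
print(grand_mean(unbalanced), mean_of_column_means(unbalanced))  # 6.75 vs 7
```

In the unbalanced case the larger second column pulls the pooled mean toward its own mean (6). The simple average of column means ignores column size.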
![Page 85: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/85.jpg)
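MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
$\bar{Y}_{\bullet\bullet}$ estimates $\mu$
$\bar{Y}_{\bullet j} - \bar{Y}_{\bullet\bullet}$ estimates $\tau_j$ $(= \mu_j - \mu)$ (for all $j$)
These estimates are based on Gauss' (1796) PRINCIPLE OF LEAST SQUARES and on COMMON SENSE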
![Page 86: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/86.jpg)
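MODEL: $Y_{ij} = \mu + \tau_j + \varepsilon_{ij}$
If you insert the estimates into the MODEL,
(1) $Y_{ij} = \bar{Y}_{\bullet\bullet} + (\bar{Y}_{\bullet j} - \bar{Y}_{\bullet\bullet}) + \hat{\varepsilon}_{ij}$,
it follows that our estimate of $\varepsilon_{ij}$ is
(2) $\hat{\varepsilon}_{ij} = Y_{ij} - \bar{Y}_{\bullet j}$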
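The same estimates can be computed directly. This minimal sketch computes $\hat{\mu} = \bar{Y}_{\bullet\bullet}$, $\hat{\tau}_j = \bar{Y}_{\bullet j} - \bar{Y}_{\bullet\bullet}$ and the residuals of equation (2). It assumes a balanced layout (the same $r$ in every column) stored as one list per column. The data are the class grades from the example later in this section.

```python
from statistics import mean

# One list per column j = 1..t; balanced layout with r = 5 rows each
columns = [[6, 8, 10, 6, 10], [9, 4, 7, 5, 5], [4, 10, 10, 5, 6]]

col_means = [mean(col) for col in columns]   # Ybar_.j
mu_hat = mean(col_means)                     # Ybar_.. (valid as-is for a balanced design)
tau_hat = [m - mu_hat for m in col_means]    # Ybar_.j - Ybar_..
# Residuals, equation (2): eps_hat_ij = Y_ij - Ybar_.j
eps_hat = [[y - m for y in col] for col, m in zip(columns, col_means)]

print(mu_hat, tau_hat)  # mu_hat = 7, tau_hat = [1, -1, 0]

# The estimated effects satisfy sum(tau_j) = 0
assert sum(tau_hat) == 0
# Equation (1): mu_hat + tau_hat_j + eps_hat_ij rebuilds every observation
for col, t_j, e_col in zip(columns, tau_hat, eps_hat):
    for y, e in zip(col, e_col):
        assert mu_hat + t_j + e == y
```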
![Page 87: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/87.jpg)
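Then, $Y_{ij} = \bar{Y}_{\bullet\bullet} + (\bar{Y}_{\bullet j} - \bar{Y}_{\bullet\bullet}) + (Y_{ij} - \bar{Y}_{\bullet j})$, or

$$\underbrace{(Y_{ij} - \bar{Y}_{\bullet\bullet})}_{\text{total variability in } Y} = \underbrace{(\bar{Y}_{\bullet j} - \bar{Y}_{\bullet\bullet})}_{\text{variability in } Y \text{ associated with } X} + \underbrace{(Y_{ij} - \bar{Y}_{\bullet j})}_{\text{variability in } Y \text{ associated with all other factors}} \qquad (3)$$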
![Page 88: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/88.jpg)
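If you square both sides of (3) and double-sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which "cancel"]

$$\underbrace{\sum_{j=1}^{t}\sum_{i=1}^{r}\left(Y_{ij}-\bar{Y}_{\bullet\bullet}\right)^2}_{\text{TSS: total sum of squares}} = \underbrace{r\sum_{j=1}^{t}\left(\bar{Y}_{\bullet j}-\bar{Y}_{\bullet\bullet}\right)^2}_{\text{SSBC: sum of squares between columns}} + \underbrace{\sum_{j=1}^{t}\sum_{i=1}^{r}\left(Y_{ij}-\bar{Y}_{\bullet j}\right)^2}_{\text{SSW (SSE): sum of squares within columns}}$$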
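Most of the "unpleasant algebra" comes from the cross term. Squaring the right side of (3) gives two squared terms plus a cross term. The worked step below, in the same notation, shows why the cross term vanishes and where the factor $r$ in SSBC comes from:

$$\sum_{j=1}^{t}\sum_{i=1}^{r}\left(Y_{ij}-\bar{Y}_{\bullet\bullet}\right)^2 = \sum_{j=1}^{t}\sum_{i=1}^{r}\left(\bar{Y}_{\bullet j}-\bar{Y}_{\bullet\bullet}\right)^2 + 2\sum_{j=1}^{t}\left(\bar{Y}_{\bullet j}-\bar{Y}_{\bullet\bullet}\right)\sum_{i=1}^{r}\left(Y_{ij}-\bar{Y}_{\bullet j}\right) + \sum_{j=1}^{t}\sum_{i=1}^{r}\left(Y_{ij}-\bar{Y}_{\bullet j}\right)^2$$

$$\sum_{i=1}^{r}\left(Y_{ij}-\bar{Y}_{\bullet j}\right) = r\bar{Y}_{\bullet j} - r\bar{Y}_{\bullet j} = 0 \ \text{ for every } j, \qquad \sum_{i=1}^{r}\left(\bar{Y}_{\bullet j}-\bar{Y}_{\bullet\bullet}\right)^2 = r\left(\bar{Y}_{\bullet j}-\bar{Y}_{\bullet\bullet}\right)^2$$

Deviations from a column mean sum to zero within that column, so the cross term is zero. The first sum does not depend on $i$, so it collapses to $r$ times a single sum over $j$, which is SSBC.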
![Page 89: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/89.jpg)
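ANOVA TABLE

| Source of Variation | SS | DF | Mean Square (M.S.) |
|---|---|---|---|
| Among treatments (among columns) | SSBC | $t - 1$ | $\text{MSBC} = \text{SSBC}/(t - 1)$ |
| Within columns (due to error) | SSW | $(r - 1)\,t$ | $\text{MSW} = \text{SSW}/[(r - 1)\,t]$ |
| TOTAL | TSS | $tr - 1$ | |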
![Page 90: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/90.jpg)
Hypotheses:
$H_0: \tau_1 = \tau_2 = \cdots = \tau_t = 0$
$H_1$: not all $\tau_j = 0$
Or
$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$ (all column means are equal)
$H_1$: not all $\mu_j$ are EQUAL
![Page 91: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/91.jpg)
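Assuming $H_0$ is true, the probability law of $F_{\text{calc}} = \text{MSBC}/\text{MSW}$ is the F distribution with $(t - 1,\ (r - 1)t)$ degrees of freedom.
[Figure: the F distribution, with the critical value from the F table marked "Table Value".]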
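The "Table Value" can also be computed instead of read from a printed F table. A sketch using SciPy's `stats.f.ppf` (inverse CDF of the F distribution) follows, assuming a balanced layout of $t$ columns with $r$ observations each. The helper name `anova_critical_value` is our own.

```python
from scipy import stats

def anova_critical_value(t, r, alpha=0.05):
    """Upper-alpha critical value of F for t columns with r observations each."""
    df_between = t - 1        # numerator degrees of freedom
    df_within = (r - 1) * t   # denominator degrees of freedom
    return stats.f.ppf(1 - alpha, df_between, df_within)

# t = 3 columns of r = 5 observations gives F(2, 12)
crit = anova_critical_value(t=3, r=5)
print(round(crit, 3))  # 3.885
```

$H_0$ is rejected at level alpha when $F_{\text{calc}}$ exceeds this value.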
![Page 92: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/92.jpg)
Example: Reed Manufacturing
The Faculty of Agriculture, GMU would like to know whether the teaching quality of experimental design is similar among classes.
A simple random sample of 5 students from each of 3 classes was taken, and their grades in experimental design were collected.
![Page 93: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/93.jpg)
Sample Data

| Observation | Advance | Broadway | Cindy |
|---|---|---|---|
| 1 | 6 | 9 | 4 |
| 2 | 8 | 4 | 10 |
| 3 | 10 | 7 | 10 |
| 4 | 6 | 5 | 5 |
| 5 | 10 | 5 | 6 |
| Sample Mean | 8 | 6 | 7 |
| Sample Variance | 4 | 4 | 8 |

Example: Grade of experimental design
![Page 94: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/94.jpg)
Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal
where:
$\mu_1$ = mean grade of the Advance class
$\mu_2$ = mean grade of the Broadway class
$\mu_3$ = mean grade of the Cindy class
Example: Experimental Design
![Page 95: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/95.jpg)
Mean Square Due to Treatments
Since the sample sizes are all equal,
$\bar{\bar{x}} = (8 + 6 + 7)/3 = 7$
$\text{SSTR} = 5(8 - 7)^2 + 5(6 - 7)^2 + 5(7 - 7)^2 = 10$
$\text{MSTR} = 10/(3 - 1) = 5$
Mean Square Due to Error
$\text{SSE} = 4(4) + 4(4) + 4(8) = 64$
$\text{MSE} = 64/(15 - 3) = 5.33$
![Page 96: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/96.jpg)
F-Test
If $H_0$ is true, the ratio MSTR/MSE should be near 1 because both MSTR and MSE are estimating $\sigma^2$. If $H_a$ is true, the ratio should be significantly larger than 1 because MSTR tends to overestimate $\sigma^2$.
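A Monte Carlo sketch with NumPy can illustrate this argument. The assumptions are ours: normal errors with $\sigma = 1$, 3 groups of 5 to match the example's layout, and, for $H_a$, an arbitrary shift of $2\sigma$ in the first group's mean. Under $H_0$ the ratio averages about $12/10 = 1.2$, which is the mean of an $F(2, 12)$ variable and close to 1. It exceeds 3.885 in roughly 5% of samples. Under the shifted alternative the ratios are much larger.

```python
import numpy as np

def simulate_f(k=3, n=5, shift=0.0, reps=20_000, seed=0):
    """Simulate MSTR/MSE for k groups of n normal observations (sigma = 1).

    `shift` is added to the first group's mean only; shift = 0 means H0 is true.
    """
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=(reps, k, n))
    data[:, 0, :] += shift
    group_means = data.mean(axis=2)   # shape (reps, k)
    grand = data.mean(axis=(1, 2))    # shape (reps,)
    mstr = n * ((group_means - grand[:, None]) ** 2).sum(axis=1) / (k - 1)
    mse = ((data - group_means[:, :, None]) ** 2).sum(axis=(1, 2)) / (k * n - k)
    return mstr / mse

f_h0 = simulate_f()
f_ha = simulate_f(shift=2.0)
print(f_h0.mean(), np.mean(f_h0 > 3.885))  # mean near 1.2; about 5% exceed F.05
print(f_ha.mean(), np.mean(f_ha > 3.885))  # much larger ratios, frequent rejection
```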
![Page 97: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/97.jpg)
Rejection Rule
Using the test statistic: Reject $H_0$ if $F > 3.89$
Using the p-value: Reject $H_0$ if p-value < .05
where $F_{.05} = 3.89$ is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom.
![Page 98: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/98.jpg)
Test Statistic
$F = \text{MSTR}/\text{MSE} = 5.00/5.33 = 0.938$
Conclusion
$F = 0.938 < F_{.05} = 3.89$, so we do not reject $H_0$. There is no significant difference in quality among the experimental design classes.
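The whole test can be cross-checked outside Excel. The sketch below uses SciPy's `stats.f_oneway`, which is our assumption and not the slides' tool, on the three class samples. It reproduces $F = 0.9375$, and the p-value of about 0.42 agrees with the decision above.

```python
from scipy import stats

advance = [6, 8, 10, 6, 10]
broadway = [9, 4, 7, 5, 5]
cindy = [4, 10, 10, 5, 6]

res = stats.f_oneway(advance, broadway, cindy)
print(f"F = {res.statistic:.4f}, p = {res.pvalue:.4f}")  # F = 0.9375, p = 0.4185

# p-value rejection rule at alpha = .05
reject = res.pvalue < 0.05
print("Reject H0" if reject else "Do not reject H0")     # Do not reject H0
```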
![Page 99: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/99.jpg)
ANOVA Table

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F calc. |
|---|---|---|---|---|
| Among classes | 10 | 2 | 5.00 | 0.938 |
| Within classes | 64 | 12 | 5.33 | |
| Total | 74 | 14 | | |
![Page 100: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/100.jpg)
Using Excel's Anova: Single Factor Tool
Step 1 Select the Tools pull-down menu
Step 2 Choose the Data Analysis option
Step 3 Choose Anova: Single Factor from the list of Analysis Tools
![Page 101: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/101.jpg)
Step 4 When the Anova: Single Factor dialog box appears:
• Enter B1:D6 in the Input Range box
• Select Grouped By Columns
• Select Labels in First Row
• Enter .05 in the Alpha box
• Select Output Range
• Enter A8 (your choice) in the Output Range box
• Click OK
![Page 102: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/102.jpg)
Value Worksheet (top portion)

| | A | B | C | D |
|---|---|---|---|---|
| 1 | Observation | Advance | Broadway | Cindy |
| 2 | 1 | 6 | 9 | 4 |
| 3 | 2 | 8 | 4 | 10 |
| 4 | 3 | 10 | 7 | 10 |
| 5 | 4 | 6 | 5 | 5 |
| 6 | 5 | 10 | 5 | 6 |
![Page 103: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/103.jpg)
Value Worksheet (bottom portion)
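SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Advance | 5 | 40 | 8 | 4 |
| Broadway | 5 | 30 | 6 | 4 |
| Cindy | 5 | 35 | 7 | 8 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Among Groups | 10 | 2 | 5.000 | 0.9375 | 0.4185 | 3.88529 |
| Within Groups | 64 | 12 | 5.333 | | | |
| Total | 74 | 14 | | | | |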
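The SUMMARY block is easy to reproduce without Excel. Below is a minimal plain-Python sketch, assuming the grades are stored per class as in cells B1:D6. Excel's Variance column is the sample variance with an $n - 1$ divisor, which is what `statistics.variance` computes.

```python
from statistics import mean, variance

# Grades per class, as entered in cells B1:D6 of the worksheet
groups = {
    "Advance": [6, 8, 10, 6, 10],
    "Broadway": [9, 4, 7, 5, 5],
    "Cindy": [4, 10, 10, 5, 6],
}

# Count, Sum, Average and sample Variance (n - 1 divisor) for each group
summary = {
    name: (len(xs), sum(xs), mean(xs), variance(xs)) for name, xs in groups.items()
}

print(f"{'Groups':<10}{'Count':>6}{'Sum':>6}{'Average':>9}{'Variance':>10}")
for name, (count, total, avg, var) in summary.items():
    print(f"{name:<10}{count:>6}{total:>6}{avg:>9}{var:>10}")
```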
![Page 104: Inferential Statistics](https://reader036.fdocuments.in/reader036/viewer/2022062502/568149f4550346895db725b9/html5/thumbnails/104.jpg)
Using the p-Value
• The p-value for F = 0.9375 with 2 and 12 degrees of freedom is .4185
• The rejection rule is "Reject $H_0$ if p-value < .05"
• Thus, we do not reject $H_0$ because the p-value = .4185 > α = .05
• We conclude that the quality of the experimental design classes is similar