Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)
Date posted: 19-Dec-2015
![Page 1: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/1.jpg)
Probability & Statistical Inference Lecture 8
MSc in Computing (Data Analytics)
![Page 2: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/2.jpg)
Introduction
In the previous lecture we were concerned with the analysis of data where we compared the means of two samples.
Frequently, data contain more than two samples; for example, an experiment may compare several treatments.
In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called 'Analysis of Variance', or ANOVA for short.
![Page 3: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/3.jpg)
Total Sum of Squares
Data set: 14, 12, 10, 6, 4, 2
Group A: 6, 4, 2
Group B: 14, 12, 10
Overall mean: 8
Total sum of squares:
$$SST = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$$
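The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the calculation above; the variable names are illustrative and not part of the lecture.

```python
# Data from the slide: two groups of three observations each.
group_a = [6, 4, 2]
group_b = [14, 12, 10]
data = group_a + group_b

# Overall mean of all six observations.
overall_mean = sum(data) / len(data)

# Total sum of squares: squared distance of every observation from the overall mean.
sst = sum((x - overall_mean) ** 2 for x in data)

print(overall_mean, sst)  # 8.0 112.0
```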
![Page 4: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/4.jpg)
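Between Group Variation
Sum of squares of the model:
$$SSM = n_a(\mu - \mu_a)^2 + n_b(\mu - \mu_b)^2 = 3(8-4)^2 + 3(8-12)^2 = 96$$
where $\mu$ is the overall mean, $\mu_a$ and $\mu_b$ are the group means, and $n_a$ and $n_b$ are the group sizes.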
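As a quick check of the between-group calculation, the following sketch weights each group's squared distance from the overall mean by the group size. The dictionary layout and names are assumptions made for illustration.

```python
# Groups from the running example.
groups = {"A": [6, 4, 2], "B": [14, 12, 10]}

all_values = [x for values in groups.values() for x in values]
overall_mean = sum(all_values) / len(all_values)

# Sum of squares of the model: group size times the squared
# difference between the overall mean and the group mean.
ssm = 0.0
for name, values in groups.items():
    group_mean = sum(values) / len(values)
    ssm += len(values) * (overall_mean - group_mean) ** 2

print(ssm)  # 96.0
```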
![Page 5: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/5.jpg)
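Within Group Variation
Sum of squares of the error:
$$SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$$
$$= (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2 = 16$$
where $x_{ij}$ is the $j$-th observation in group $i$, $\bar{x}_i$ is the mean of group $i$, $k$ is the number of groups and $n_i$ is the number of observations in group $i$.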
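The within-group calculation can be sketched the same way. The example also checks that the three sums of squares from the running example add up, that is SST = SSM + SSE (112 = 96 + 16). The helper `mean` is defined here only for illustration.

```python
groups = [[6, 4, 2], [14, 12, 10]]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Within-group variation: each observation against its own group mean.
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Recompute the other two sums of squares to check the partition.
all_values = [x for g in groups for x in g]
overall_mean = mean(all_values)
sst = sum((x - overall_mean) ** 2 for x in all_values)
ssm = sum(len(g) * (mean(g) - overall_mean) ** 2 for g in groups)

print(sse, ssm, sst)  # 16.0 96.0 112.0
```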
![Page 6: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/6.jpg)
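Structure of the Data

| Group | Observations | Total | Mean |
|---|---|---|---|
| 1 | $x_{11}, x_{12}, \ldots, x_{1n}$ | $x_1$ | $\bar{x}_1$ |
| 2 | $x_{21}, x_{22}, \ldots, x_{2n}$ | $x_2$ | $\bar{x}_2$ |
| ... | ... | ... | ... |
| a | $x_{a1}, x_{a2}, \ldots, x_{an}$ | $x_a$ | $\bar{x}_a$ |
| Total | | | $\bar{x}$ |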
![Page 7: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/7.jpg)
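ANOVA Table

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat |
|---|---|---|---|---|
| Model | $a - 1$ | $SSM = \sum_{i=1}^{a} n_i(\bar{x}_i - \bar{x})^2$ | $MSM = SSM/(a-1)$ | $MSM/MSE$ |
| Error | $n - a$ | $SSE = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_i)^2$ | $MSE = SSE/(n-a)$ | |
| Total | $n - 1$ | $SST = \sum_{i=1}^{a}\sum_{j=1}^{n_i}(x_{ij} - \bar{x})^2$ | $SST/(n-1)$ | |

Where: $n$ is the sample size and $a$ is the number of groups; $n_i$ is the number of observations in group $i$, $\bar{x}_i$ is the mean of group $i$ and $\bar{x}$ is the overall mean.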
![Page 8: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/8.jpg)
ANOVA Table – Original Example

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat |
|---|---|---|---|---|
| Model | 2 - 1 = 1 | 96 | 96 | 24 |
| Error | 6 - 2 = 4 | 16 | 4 | |
| Total | 6 - 1 = 5 | 112 | | |

Where: n = 6 is the sample size and a = 2 is the number of groups
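The table for the original example can be reproduced with a small function. This is a minimal sketch rather than a library routine: the name `one_way_anova` and the dictionary keys are hypothetical, and the function returns only the quantities shown in the table (no p-value).

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of groups."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)  # total sample size
    a = len(groups)      # number of groups
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group (model) and within-group (error) sums of squares.
    ssm = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    msm = ssm / (a - 1)
    mse = sse / (n - a)
    return {
        "df_model": a - 1, "df_error": n - a, "df_total": n - 1,
        "ssm": ssm, "sse": sse, "sst": ssm + sse,
        "msm": msm, "mse": mse, "f": msm / mse,
    }

table = one_way_anova([[6, 4, 2], [14, 12, 10]])
print(table["df_model"], table["ssm"], table["msm"], table["f"])  # 1 96.0 96.0 24.0
print(table["df_error"], table["sse"], table["mse"])              # 4 16.0 4.0
print(table["df_total"], table["sst"])                            # 5 112.0
```

Because the group sizes are read from the data, the same function also accepts groups of unequal size.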
![Page 9: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/9.jpg)
Model Assumptions
- Independence of observations within and between samples
- Normality of the sampling distribution
- Equal variance (this is also called the homoscedasticity assumption)
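One informal way to look at the equal-variance assumption is to compare the sample variances of the groups. The sketch below does this for the two-group example from earlier in the lecture; it is an eyeball check only, not a formal test, and with three observations per group it says very little.

```python
from statistics import variance

groups = {"A": [6, 4, 2], "B": [14, 12, 10]}

# Sample variance (n - 1 in the denominator) of each group.
variances = {name: variance(values) for name, values in groups.items()}

# Ratio of the largest to the smallest variance; a value near 1 is
# consistent with the equal-variance assumption.
ratio = max(variances.values()) / min(variances.values())

print(variances, ratio)
```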
![Page 10: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/10.jpg)
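The ANOVA Equation
We can describe the observations in the data table using the following equation:
$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n$$
Where: $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$-th group, $\epsilon_{ij}$ is a random error term, $a$ is the number of groups and $n$ is the number of observations in each group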
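To make the model concrete, the sketch below generates data from an overall mean, one effect per group, and random errors. It assumes the errors are normally distributed with a common standard deviation `sigma`, in line with the model assumptions; the function name, parameter values and seed are illustrative only.

```python
import random

def simulate_one_way(mu, effects, n, sigma, seed=0):
    """Simulate n observations per group: overall mean + group effect + normal error."""
    rng = random.Random(seed)
    return [
        [mu + tau + rng.gauss(0.0, sigma) for _ in range(n)]
        for tau in effects
    ]

# Two groups whose true means are 8 - 4 = 4 and 8 + 4 = 12.
data = simulate_one_way(mu=8.0, effects=[-4.0, 4.0], n=3, sigma=2.0)
for i, group in enumerate(data, start=1):
    print(i, [round(y, 2) for y in group])
```

Setting `sigma=0.0` removes the error term, so every observation equals its group mean.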
![Page 11: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/11.jpg)
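ANOVA Hypotheses
We wish to test the hypotheses:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \qquad H_1: \text{at least two of the group means differ}$$
The analysis of variance partitions the total variability into two parts: the variability between groups (SSM) and the variability within groups (SSE), so that SST = SSM + SSE.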
![Page 12: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/12.jpg)
Example
![Page 13: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/13.jpg)
Graphical Display of Data
Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment
![Page 14: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/14.jpg)
Example
We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:
The ANOVA table is below:
![Page 15: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/15.jpg)
Example
The p-value is less than 0.05; therefore H0 can be rejected, and we can conclude that hardwood concentration affects the mean tensile strength of the paper: at least one concentration has a mean that differs from the others.
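The hardwood data are not reproduced in this transcript, so the sketch below applies the same decision rule to the two-group example from earlier in the lecture (F = 24 with 1 and 4 degrees of freedom). It approximates the upper-tail F probability by numerical integration of the incomplete beta function; the function name `f_test_p_value` is hypothetical, and a statistics package would normally be used instead.

```python
import math

def f_test_p_value(f_stat, df1, df2, steps=10_000):
    """Approximate P(F > f_stat) for an F(df1, df2) distribution.

    Uses P(F > f) = I_x(df2/2, df1/2) with x = df2 / (df2 + df1 * f),
    where I_x is the regularized incomplete beta function, evaluated
    with the midpoint rule. Assumes f_stat > 0 and df2 >= 2 so that
    the integrand stays bounded on [0, x].
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_stat)
    h = x / steps
    integral = h * sum(
        ((k + 0.5) * h) ** (a - 1) * (1.0 - (k + 0.5) * h) ** (b - 1)
        for k in range(steps)
    )
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta

p_value = f_test_p_value(24.0, 1, 4)
reject_h0 = p_value < 0.05
print(round(p_value, 4), reject_h0)  # roughly 0.008, True
```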
![Page 16: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/16.jpg)
Demo
![Page 17: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/17.jpg)
Confidence Interval about the mean
For 20% hardwood, the resulting confidence interval on the mean is
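The interval for the hardwood data appears only on the slide image. As an illustration, the sketch below assumes the usual form of the interval, group mean ± t × sqrt(MSE / group size), with the t value taken at the error degrees of freedom, and applies it to Group B of the two-group example from earlier in the lecture (mean 12, MSE = 4, 3 observations, 4 error degrees of freedom). The critical value 2.776 is the tabulated two-sided 95% t value for 4 degrees of freedom and is passed in by hand.

```python
import math

def mean_confidence_interval(group_mean, mse, group_size, t_crit):
    """Confidence interval for one group mean using the pooled MSE from the ANOVA."""
    half_width = t_crit * math.sqrt(mse / group_size)
    return group_mean - half_width, group_mean + half_width

T_CRIT_95_DF4 = 2.776  # from a t table: alpha/2 = 0.025, 4 degrees of freedom

low, high = mean_confidence_interval(12.0, 4.0, 3, T_CRIT_95_DF4)
print(round(low, 2), round(high, 2))  # 8.79 15.21
```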
![Page 18: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/18.jpg)
Confidence Interval on the Difference of Two Treatments
For the hardwood concentration example,
![Page 19: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/19.jpg)
An Unbalanced Experiment
![Page 20: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/20.jpg)
Multiple Comparisons Following the ANOVA
The least significant difference (LSD) is
If the sample sizes are different in each treatment:
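The LSD formulas themselves appear only on the slide image. The sketch below assumes the standard Fisher form, LSD = t × sqrt(MSE × (1/n_i + 1/n_j)), with t taken at the error degrees of freedom; when both groups have the same size m this reduces to t × sqrt(2 × MSE / m). It is applied to the two-group example from earlier in the lecture, with the t value 2.776 (two-sided 95%, 4 degrees of freedom) taken from a t table.

```python
import math

def least_significant_difference(mse, n_i, n_j, t_crit):
    """Fisher's LSD for comparing two group means with sizes n_i and n_j."""
    return t_crit * math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))

# Two-group example: MSE = 4, three observations per group.
lsd = least_significant_difference(4.0, 3, 3, 2.776)

# Two means are declared different when their absolute difference exceeds the LSD.
difference = abs(12.0 - 4.0)
significant = difference > lsd
print(round(lsd, 2), significant)  # 4.53 True
```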
![Page 21: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/21.jpg)
Example: Multi-comparison Test
![Page 22: Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)](https://reader030.fdocuments.in/reader030/viewer/2022032800/56649d2c5503460f94a02ff8/html5/thumbnails/22.jpg)
Example: Multi-comparison Test