Analyzing OS Fingerprints using Neural Networks and Statistical Machinery
Analyzing Statistical Results
Transcript of Analyzing Statistical Results
![Page 1: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/1.jpg)
Interpreting Analysis Results
1
![Page 2: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/2.jpg)
2
Chi-square
Gamma
Z and t Statistics
Correlation coefficient and the R2 value
Significance and the p-value
1-sided and 2-sided tests
Our Test Statistics
![Page 3: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/3.jpg)
Variable Types• Nominal, Ordinal, and Interval
Nominal Example: Type of Anesthetic (e.g. Local, General, and Regional)
Ordinal Example: Hospital Trauma Levels (e.g. Level-1, Level-2, Level-3)
Interval Example: Drug Dosage (e.g. mg of substance)
• There are multiple kinds of tests for each type of variable depending on assumptions being made and what is being tested for (e.g. means, variances, association, etc.)
Our focus is on ordinal variablesand testing for associations
3
![Page 4: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/4.jpg)
Does not measure strength of an association, but, rather, it indicates how much evidence there is that the variables are or are not independent
The c2 statistic ranges from 0 to ∞ (always ≥ 0)
The larger the c2 value, the more confident we can be that the variables are not independence
c2 Test of Independence
Example: Assume we have two ordinal variables (Hospital Size and the Relative Cost of the Laryngoscope being used):
H0: There is no association between variables (c2 value is near zero)
H1: There is an association between variables
4
![Page 5: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/5.jpg)
Sample 1
Statistical Independence
5
![Page 6: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/6.jpg)
The c2 Test Statistic
Remember: This number does not reflect the strength of an association. It only provides evidence that an association actually exists.
6
(45%) (40%) (15%)
7753
149
4769
132
18
4926
“Observed Value”
“Expected Value”
Sample 2
![Page 7: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/7.jpg)
The c2 Distribution Okay…so what does c2 = 83 tell us?
With 4 degrees of freedom (d.f.) and an a = 0.10, our benchmark (critical value) for dependence is 7.78
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 5 10 15 20 25 30 35 40
d.f. = 4
7.78
Area = a = 0.10
7
![Page 8: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/8.jpg)
The c2 Distribution
8
H0: There is no association between variables (c2 value is near zero)
H1: There is an association between variables
We reject H0 if our c2 value is > our “benchmark” value of 7.78
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
0 5 10 15 20 25 30 35 40
d.f. = 4
7.78
Area = a = 0.10
![Page 9: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/9.jpg)
-0.02
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0 10 20 30 40
d.f. = 5 d.f. = 10 d.f. = 20
Increasing d.f.
Degrees of Freedom = (# of rows – 1) x (# of columns – 1)
Increasing d.f.
3 x 3
5 x 5
9
The c2 Distribution
![Page 10: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/10.jpg)
So, how do we determine the strength and direction of an established association
between variables?
10
![Page 11: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/11.jpg)
Concordant
(x1, y2)
(x2, y3)
X
Y
11
![Page 12: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/12.jpg)
Discordant
(x1, y2)
(x2, y1)
X
Y
12
![Page 13: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/13.jpg)
The Gamma Value (g) Appropriate for ordinal-by-ordinal data
Generally more powerful test statistic than the c2 test statistic (i.e. we are able to detect significant differences with greater ease)
Because ordinal variables can be ordered, we can talk about the direction of association between them (i.e. positive or negative)
13
![Page 14: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/14.jpg)
C = Concordant Pairs
D = Discordant Pairs
lies between -1 and 1, where a value of 0means that there is no relation between the two ordinal variables and | | = 1 represents the strongest associations.
Measure of Association
14
![Page 15: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/15.jpg)
g follows a normal distribution such that our z statistic becomes:
Comparing our z statistic with a, we can determine if the variables are statistically independent
Testing Independence with g
z =s.e.
Standard Error (s.e.) = sample standard deviation / √ n
15
H0: There is no association between variables (g value is near zero)
H1: There is an association between variables
![Page 16: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/16.jpg)
Testing Independence with g
Using g to test for independence does not rely on the degrees of freedom
0
0.005
0.01
0.015
0.02
0.025
0.03
0.035
0.04
0.045
-4 -3 -2 -1 0 1 2 3 4
Assuming an a of 0.10 then our benchmark is z0 = 1.285
If z > z0 = 1.285 then reject H0
Rejection of the H0 means that the variables are not independent (see previous slide)
1.285
Area = a = 0.10
16
![Page 17: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/17.jpg)
Why do we not just use g and forget about c2 since g is a more
powerful statistic?
17
![Page 18: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/18.jpg)
c2 and g An ordinal measure of association (i.e. g ) may equal 0
when the variables are actually statistically dependent
c2 = 437, g = 0
18
![Page 19: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/19.jpg)
Association vs. Causation
If then ?
Stock Market
Hemline Theory = Association without Causation
19
![Page 20: Analyzing Statistical Results](https://reader034.fdocuments.in/reader034/viewer/2022052601/5596a7941a28abf1318b45aa/html5/thumbnails/20.jpg)
Association vs. Causation Causation can often be obscure or counterintuitive so
how do we tell the difference from Association? Performing controlled comparisons
Increasing the variable resolutions (i.e. the number
of data points)
Sequence Analysis (time becomes a variable)
20