The Importance of Normality
Orange Empire – February 8, 2016
Larry Bartkus
What is Normality?
– The normal (or Gaussian) distribution is a very commonly
occurring continuous probability distribution
– Normal distributions are extremely important in statistics and are
often used in the natural and social sciences
– The Gaussian distribution is sometimes informally called the bell
curve
– Physical quantities that are expected to be the sum of many
independent processes (such as measurement errors) often
have a distribution very close to the normal
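The last point above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the exponential error source, the choice of 50 sources, the seed, and the helper name are all assumptions made for illustration. It shows that a single skewed error source is far from normal, while the sum of many independent ones has skewness and excess kurtosis close to zero.

```python
import random
import statistics


def skewness_and_excess_kurtosis(values):
    """Return the population skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(2016)  # fixed seed so the run is repeatable

# One strongly skewed error source (exponential, chosen only for illustration).
single = [rng.expovariate(1.0) for _ in range(10000)]

# A quantity that is the sum of 50 such independent sources.
summed = [sum(rng.expovariate(1.0) for _ in range(50)) for _ in range(10000)]

single_skew, single_kurt = skewness_and_excess_kurtosis(single)
summed_skew, summed_kurt = skewness_and_excess_kurtosis(summed)

print(f"single source: skewness {single_skew:.2f}, excess kurtosis {single_kurt:.2f}")
print(f"sum of 50:     skewness {summed_skew:.2f}, excess kurtosis {summed_kurt:.2f}")
```

A normal distribution has skewness 0 and excess kurtosis 0, so the closer the second line is to zero, the more bell-shaped the summed quantity is.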
Normal Data vs Non-normal Data
Normal Data
– Commonly found in nature
– Easy to identify
– No major mathematical manipulations are necessary
– Easy to explain and justify
– Solid predictions can readily be made
Non-normal Data
– Often difficult to recognize and sometimes hidden
– Can be confused with outliers
– Statistical manipulations are sometimes needed to utilize the distribution
– Difficult to explain or justify
Normal Distribution
Non-normal Distribution
[Figure: Probability Density Function, f(t) versus Time (t), for the Lognormal, Weibull, and Exponential distributions, each with λ = 0]
Why Graph Data
[Figure: histogram of A with 95% confidence intervals for the mean and median]
Summary for A
Anderson-Darling Normality Test: A-Squared 0.27, P-Value 0.660
Mean 7.0000
StDev 2.4495
Variance 6.0000
Skewness -0.000000
Kurtosis -0.544920
N 36
Minimum 2.0000
1st Quartile 5.0000
Median 7.0000
3rd Quartile 9.0000
Maximum 12.0000
95% Confidence Interval for Mean: 6.1712 to 7.8288
95% Confidence Interval for Median: 6.0000 to 8.0000
95% Confidence Interval for StDev: 1.9867 to 3.1952
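As a check on how the 95% confidence interval for the mean in Summary for A is obtained, the standard t-based interval can be recomputed from the reported N, Mean, and StDev. This is a minimal sketch: the critical value is hard-coded from a t table, which is an assumption of the example rather than something stated on the slide.

```python
import math

# Values reported in "Summary for A".
n = 36
mean = 7.0000
stdev = 2.4495

# Two-sided 95% critical value of Student's t with n - 1 = 35 degrees of
# freedom, hard-coded from a t table (assumption of this sketch). With SciPy
# installed it could be computed as scipy.stats.t.ppf(0.975, 35).
t_crit = 2.0301

standard_error = stdev / math.sqrt(n)
margin = t_crit * standard_error
lower, upper = mean - margin, mean + margin

print(f"95% CI for the mean: {lower:.4f} to {upper:.4f}")
# prints: 95% CI for the mean: 6.1712 to 7.8288
```

The result agrees with the interval shown in the summary. The intervals for the median and the standard deviation use different methods and are not reproduced here.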
Why Graph Data
[Figure: histogram of B with 95% confidence intervals for the mean and median]
Summary for B
Anderson-Darling Normality Test: A-Squared 0.61, P-Value 0.101
Mean 7.0000
StDev 2.4260
Variance 5.8857
Skewness -0.216101
Kurtosis -0.923506
N 36
Minimum 2.0000
1st Quartile 5.0000
Median 7.5000
3rd Quartile 9.0000
Maximum 11.0000
95% Confidence Interval for Mean: 6.1791 to 7.8209
95% Confidence Interval for Median: 5.7354 to 8.0000
95% Confidence Interval for StDev: 1.9677 to 3.1646
Assessing for Normality – Distribution Analyzer
Assessing for Normality – Anderson Darling
[Figure: histogram of C2 with 95% confidence intervals for the mean and median]
Summary for C2
Anderson-Darling Normality Test: A-Squared 2.77, P-Value < 0.005
Mean 3.9883
StDev 4.1086
Variance 16.8803
Skewness 1.90297
Kurtosis 3.02385
N 30
Minimum 0.3286
1st Quartile 1.3757
Median 2.5724
3rd Quartile 4.8224
Maximum 16.4155
95% Confidence Interval for Mean: 2.4541 to 5.5224
95% Confidence Interval for Median: 1.8680 to 3.7050
95% Confidence Interval for StDev: 3.2721 to 5.5232
Project: NORMALITY EXAMPLES 2011-06-22.MPJ
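The A-Squared and P-Value entries in these summaries come from the Anderson-Darling test. Below is a minimal sketch of how such figures can be computed with only the Python standard library. The function names and the two illustrative data sets are hypothetical, and the p-value approximation is the commonly cited D'Agostino and Stephens set of formulas, used here as an assumption rather than as a statement of what the software behind the slides does. Applied to the reported A-Squared values for A and B, it gives p-values close to the ones shown.

```python
import math
from statistics import NormalDist, fmean, stdev


def normal_cdf(z):
    """Standard normal CDF via erfc, which stays accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def ad_p_value(a2, n):
    """Approximate p-value for an A-squared statistic (normality, estimated mean and sd)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        # Exponent is capped at 0 so the result can never exceed 1.
        p = math.exp(min(0.0, 1.2937 - 5.709 * a + 0.0186 * a * a))
    elif a > 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    elif a > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)
    return min(max(p, 0.0), 1.0)


def anderson_darling(data):
    """Return (A-squared, approximate p-value) for a test of normality."""
    n = len(data)
    mean, sd = fmean(data), stdev(data)
    z = sorted((x - mean) / sd for x in data)
    total = 0.0
    for i in range(n):
        # log F(z_(i)) + log(1 - F(z_(n+1-i))), using 1 - F(z) = F(-z)
        total += (2 * i + 1) * (math.log(normal_cdf(z[i])) + math.log(normal_cdf(-z[n - 1 - i])))
    a2 = -n - total / n
    return a2, ad_p_value(a2, n)


# Illustrative data only: evenly spaced quantiles of a normal and of an
# exponential distribution, so the example is fully deterministic.
n = 100
normal_like = [NormalDist(10, 1).inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

a2_normal, p_normal = anderson_darling(normal_like)
a2_skewed, p_skewed = anderson_darling(skewed)
print(f"normal-like data: A-squared {a2_normal:.3f}, p-value {p_normal:.3f}")
print(f"skewed data:      A-squared {a2_skewed:.3f}, p-value {p_skewed:.3g}")

# P-values recomputed from the A-Squared values reported for A and B.
print(f"A: {ad_p_value(0.27, 36):.3f}   B: {ad_p_value(0.61, 36):.3f}")
```

A small p-value (conventionally below 0.05) is evidence against normality, as in Summary for C2. A large p-value, as for A and B, means the test found no evidence against it.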
Assessing for Normality – Ryan Joiner
[Figure: Probability Plot of C2 (Normal), Percent versus C2]
Mean 3.988
StDev 4.109
N 30
RJ 0.863
P-Value <0.010
Assessing for Normality – Ryan Joiner
[Figure: Probability Plot of EW Data 4 (Normal), Percent versus EW Data 4]
Mean 0.2581
StDev 0.0006815
N 60
RJ 1.000
P-Value >0.100
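The RJ value on these probability plots is a correlation between the ordered data and the normal scores that define the straight line on the plot. A value near 1 means the points follow the line. The sketch below is a hedged illustration, not the slides' own method: the function names and data are hypothetical, the plotting positions (i - 3/8)/(n + 1/4) are the form usually given for this statistic, and the 5% critical-value formula is the approximation commonly attributed to Ryan and Joiner, which should be checked against a published table before being relied on.

```python
import math
from statistics import NormalDist


def ryan_joiner(data):
    """Correlation between the ordered data and approximate normal scores."""
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_y = sum(ordered) / n
    mean_b = sum(scores) / n
    sxy = sum((y - mean_y) * (b - mean_b) for y, b in zip(ordered, scores))
    syy = sum((y - mean_y) ** 2 for y in ordered)
    sbb = sum((b - mean_b) ** 2 for b in scores)
    return sxy / math.sqrt(syy * sbb)


def critical_value_05(n):
    """Approximate 5% critical value (assumed formula; verify against a table)."""
    return 1.0063 - 0.1288 / math.sqrt(n) - 0.6118 / n + 1.3505 / n ** 2


# Illustrative, deterministic data: evenly spaced quantiles of a normal
# distribution and of an exponential distribution.
n = 100
normal_like = [NormalDist(0.2581, 0.0007).inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]

rj_normal = ryan_joiner(normal_like)
rj_skewed = ryan_joiner(skewed)
crit = critical_value_05(n)

print(f"critical value at n = {n}: {crit:.4f}")
print(f"normal-like data: RJ = {rj_normal:.4f}")
print(f"skewed data:      RJ = {rj_skewed:.4f}")
```

Normality is rejected when RJ falls below the critical value, which is the situation for C2 (RJ 0.863, P-Value <0.010) and not for EW Data 4 (RJ 1.000, P-Value >0.100).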
Sensitive to Normality
Non-Sensitive
– Confidence intervals on means
– T-tests
– ANOVA, including DOE and Regression
– X-bar Charts
Sensitive
– Confidence intervals on standard deviations
– Tolerance Intervals, Reliability Intervals
– Variance tests
– Cpk, Cp, Ppk, Pp
– I-Charts
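To see why capability indices sit in the Sensitive column, the sketch below computes a one-sided capability index and the out-of-specification fraction it implies under a normal assumption, then compares that with the fraction actually observed in right-skewed data. Everything here is hypothetical: the data are evenly spaced lognormal quantiles generated for illustration, and the upper specification limit of 8.0 is invented.

```python
import math
from statistics import NormalDist, fmean, stdev

# Hypothetical right-skewed process data: evenly spaced lognormal quantiles.
n = 1000
std_normal = NormalDist()
data = [math.exp(std_normal.inv_cdf((i + 0.5) / n)) for i in range(n)]

usl = 8.0  # hypothetical upper specification limit

mean = fmean(data)
sd = stdev(data)

# One-sided capability index against the upper limit, using the overall
# standard deviation (the Ppk-style calculation when only a USL exists).
ppk_upper = (usl - mean) / (3 * sd)

# Fraction above the USL predicted by treating the data as normal...
predicted = 1 - NormalDist(mean, sd).cdf(usl)
# ...and the fraction actually observed in the data.
observed = sum(x > usl for x in data) / n

print(f"Ppk (upper) = {ppk_upper:.2f}")
print(f"predicted fraction above USL (normal assumption): {predicted:.5f}")
print(f"observed fraction above USL:                      {observed:.5f}")
```

With skewed data of this kind, the normal assumption understates the fraction beyond the limit on the long-tail side, so the capability index looks better than the process really is.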
7 Common Reasons for Failing Normality
1) Measurement Resolution
Equipment lacks granularity
2) Data Shift
A shift occurred in the process while the data were being collected
3) Multiple Sources of Data
Multiple operators, machines, lots of material, etc.
4) Truncated Data
There’s a stop somewhere or there’s a sorting process
[Figure: histogram of N(10,1) Truncate at 9 with 95% confidence intervals for the mean and median]
Summary for N(10,1) Truncate at 9
Anderson-Darling Normality Test: A-Squared 2.11, P-Value < 0.005
Mean 10.245
StDev 0.826
Variance 0.682
Skewness 0.656745
Kurtosis -0.222344
N 172
Minimum 9.002
1st Quartile 9.610
Median 10.083
3rd Quartile 10.771
Maximum 12.618
95% Confidence Interval for Mean: 10.121 to 10.370
95% Confidence Interval for Median: 9.934 to 10.264
95% Confidence Interval for StDev: 0.747 to 0.923
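The effect of truncation in the N(10,1) example can be reproduced in a few lines. This is a sketch under stated assumptions: instead of random draws it uses 2000 evenly spaced quantiles of N(10,1) so the result is deterministic, and the helper name is hypothetical. Removing everything below 9, as a sorting step would, raises the mean, shrinks the standard deviation, and produces positive skewness, the same pattern seen in the summary.

```python
from statistics import NormalDist, fmean, pstdev


def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Deterministic stand-in for a large sample from N(10, 1).
n = 2000
process = NormalDist(10, 1)
full = [process.inv_cdf((i + 0.5) / n) for i in range(n)]

# A sorting or inspection step that removes every part below 9.
kept = [x for x in full if x >= 9.0]

skew_full = skewness(full)
skew_kept = skewness(kept)

print(f"before truncation: mean {fmean(full):.3f}, sd {pstdev(full):.3f}, skewness {skew_full:.3f}")
print(f"after truncation:  mean {fmean(kept):.3f}, sd {pstdev(kept):.3f}, skewness {skew_kept:.3f}")
print(f"fraction of parts kept: {len(kept) / n:.3f}")
```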
5) Presence of Outliers
The data have one or more anomalous points that differ greatly from the rest of the population.
6) Too Much Data
Too many data points (more than 100) are involved in the assessment.
7) Underlying Distribution is Not Normal
Some processes are not expected to be normal, such as time, microbiological results, and pull force.
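Reason 6 can look surprising, so here is a hedged illustration of it. A normality test gains power as the sample grows, so a departure from normality that is too small to matter at N = 50 can be flagged decisively at N = 5000. The sketch uses evenly spaced quantiles of a logistic distribution (symmetric, bell-shaped, slightly heavier tails than the normal) as hypothetical data, and the same assumed Anderson-Darling p-value approximation (D'Agostino and Stephens) that is commonly cited. None of the names or data come from the slides.

```python
import math
from statistics import fmean, stdev


def normal_cdf(z):
    """Standard normal CDF via erfc, accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def anderson_darling(data):
    """Return (A-squared, approximate p-value) for a test of normality."""
    n = len(data)
    mean, sd = fmean(data), stdev(data)
    z = sorted((x - mean) / sd for x in data)
    total = sum(
        (2 * i + 1) * (math.log(normal_cdf(z[i])) + math.log(normal_cdf(-z[n - 1 - i])))
        for i in range(n)
    )
    a2 = -n - total / n
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a >= 0.6:
        p = math.exp(min(0.0, 1.2937 - 5.709 * a + 0.0186 * a * a))
    elif a > 0.34:
        p = math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    elif a > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)
    return a2, min(max(p, 0.0), 1.0)


def logistic_quantiles(n):
    """Evenly spaced quantiles of the standard logistic distribution."""
    return [math.log(p / (1 - p)) for p in ((i + 0.5) / n for i in range(n))]


# Same underlying shape, two different sample sizes.
a2_small, p_small = anderson_darling(logistic_quantiles(50))
a2_large, p_large = anderson_darling(logistic_quantiles(5000))

print(f"N = 50:   A-squared {a2_small:.3f}, p-value {p_small:.3f}")
print(f"N = 5000: A-squared {a2_large:.3f}, p-value {p_large:.3g}")
```

The shape of the data is identical in both runs. Only the sample size changes, which is why a failed normality test on a very large data set should be read alongside a probability plot rather than on its own.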
[Figure: histogram of EW Data 2 with 95% confidence intervals for the mean and median]
Summary for EW Data 2
Anderson-Darling Normality Test: A-Squared 0.70, P-Value 0.060
Mean 110.71
StDev 7.99
Variance 63.87
Skewness -1.17564
Kurtosis 1.99926
N 26
Minimum 86.20
1st Quartile 104.40
Median 112.90
3rd Quartile 116.80
Maximum 122.40
95% Confidence Interval for Mean: 107.48 to 113.94
95% Confidence Interval for Median: 107.73 to 116.08
95% Confidence Interval for StDev: 6.27 to 11.03
[Figure: histogram of N(5,1) Rounded to 0.1 with 95% confidence intervals for the mean and median]
Summary for N(5,1) Rounded to 0.1
Anderson-Darling Normality Test: A-Squared 0.54, P-Value 0.167
Mean 4.9674
StDev 1.0190
Variance 1.0384
Skewness -0.0380120
Kurtosis 0.0171328
N 1000
Minimum 1.5000
1st Quartile 4.3000
Median 5.0000
3rd Quartile 5.6000
Maximum 8.2000
95% Confidence Interval for Mean: 4.9042 to 5.0306
95% Confidence Interval for Median: 4.9000 to 5.1000
95% Confidence Interval for StDev: 0.9762 to 1.0658
Non-normal Data
Let’s First Treat the Data as Normal
Group Discussion
– What are the consequences of failing a capability study that should be acceptable because the normality assumption was wrong?
– What are the consequences of accepting a capability study as satisfactory when it should fail because the normality assumption was wrong?
Any Questions?