Complexity Challenges to the Discovery of Relationships in...
Transcript of Complexity Challenges to the Discovery of Relationships in...
Complexity Challenges to the Discovery of Relationships in Eddy Current Non-destructive Test Data
CPT John R. BrenceUnited States Military Academy
Donald E. Brown, PhDUniversity of Virginia
2Complexity Challenges in Eddy Current NDT Data
Outline
• Background Information• Eddy Current Non-destructive Tests (NDT)• Approach• Algorithms• Results • Interpretation of Results• Conclusions• Future Research• Questions
3Complexity Challenges in Eddy Current NDT Data
Background1 of 2
• Many commercial & military aircraft reached or exceeded original design life– USAF aircraft 20-35+ years old– KC-135 (40 yrs. old) extended for 25 years– Civilian airlines Boeing 727 family – Introduced in
60’s
• Corrosion is a serious threat especially to older aircraft– Significant increase in maintenance costs– Increasing concern about structural integrity
4Complexity Challenges in Eddy Current NDT Data
BackgroundAloha Flight 243
2 of 2
5Complexity Challenges in Eddy Current NDT Data
Relationship Hypothesis1 of 2
• Relationship between calibration specimen results & classifying corrosion on KC-135 parts
• Current artificial corrosion processes show similar characteristics as those found in naturally corroded lap joints
6Complexity Challenges in Eddy Current NDT Data
Relationship Hypothesis2 of 2
Natural Corrosion(KC-135 Specimen)
Artificial Corrosion(Calibration Specimen)
7Complexity Challenges in Eddy Current NDT Data
Eddy Current Non-destructive Tests
• Problems with current visual representation– Requires considerable expertise to create and
interpret– Need for visual clarity leads to data generalization,
averaging, or overlooked points ~ accuracy?– Missed corrosion may cause a catastrophic accident
8Complexity Challenges in Eddy Current NDT Data
ApproachOutline
• Data acquisition• Data transformation & consistency• Model development & feature selection• Model training & testing • Model Evaluation• Model Selection
9Complexity Challenges in Eddy Current NDT Data
ApproachGraphic
DatatransformationData acquisition Consistent
Data
No
Datamining
Modelselection
Model development
Model testing& evaluation
Iterate
YesFeatureselection
Variabletransform
DatatransformationData acquisition Consistent
Data
No
Datamining
Modelselection
Model development
Model testing& evaluation
Iterate
YesFeatureselectionFeatureselection
Variabletransform
10Complexity Challenges in Eddy Current NDT Data
ApproachData Acquisition
• Institute of Aerospace Research
– Calibration specimens– Retired KC-135 specimens
• Data
– Induced Voltage measurements from multi-frequency scans– Calibration specimen E1
n
m
1
Scan Direction
nm
n
m
1
Scan DirectionScan Direction
nm
11Complexity Challenges in Eddy Current NDT Data
ApproachData Transformation & Consistency
1 of 5
Eddy Current Specimen E1 data
– 4 different scan frequencies are the 4 predictor variables (5.5 kHz, 8 kHz, 17 kHz, & 30 kHz)
– Merged 4 scan frequency files into one file
12Complexity Challenges in Eddy Current NDT Data
ApproachData Transformation & Consistency
2 of 5
5.5 kHz 8 kHz
17 kHz 30 kHz
13Complexity Challenges in Eddy Current NDT Data
ApproachData Transformation & Consistency
3 of 5
Image20
Image60
Finagle
Image25
Results from PicView Program
Starred areas show whichpicture is used to model specific loss area.
14Complexity Challenges in Eddy Current NDT Data
ApproachData Transformation & Consistency
4 of 5
Specimen E1: EC Bitmap ≠ data set
0%5%7.5%10%
12.5%
15%
17.5%
40% 35% 30%
45% 50% 27.5%
22.5%20% 25%
Specimen E1: EC data set format
10%12.5%15%17.5%
20%
22.5%
25%
45% 40% 7.5%
50% 35% 5%
30%27.5% 0%
15Complexity Challenges in Eddy Current NDT Data
ApproachData Transformation & Consistency
5 of 5Eddy Current Sensitivity to Milled Thickness Loss
-0.5
0
0.5
1
1.5
2
2.5
3
3.5
4
0 5 10 15 20 25 30 35 40 45 50
percent material removed
5.5 kHz
8 kHz
17 kHz
30 kHz
Specimen E1, Eddy Current Scan Data Mapping Validation
-0.50
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
0 5 10 15 20 25 30 35 40 45 50
percent material loss (%)av
erag
e vo
ltage
resp
onse
(V)
5.5 kHz (k5)8 kHz (k8)17 kHz (k17)30 kHz (k30)
Graph resulting fromdata transformationGraph from original study
16Complexity Challenges in Eddy Current NDT Data
ApproachModel Development & Feature Selection
• Eddy Current– Four predictors and one response variable– Looked at histograms of variables to categorize
the observation’s distribution – Used scaling and transformations of predictors
• Feature Selection (E.G., regression)– Stepwise, Forward, Backward selection– Maximum R2
adj, Mallows CP
17Complexity Challenges in Eddy Current NDT Data
ApproachModel Training & Testing
• Calibration specimen data used for training ~ Eddy Current specimen E1
• Training and Test data configuration (in general)
– 75% training (120,456 observations)– 25% test (40,152 observations)
18Complexity Challenges in Eddy Current NDT Data
ApproachModel Training Evaluation
• Akaike Information Criterion• Schwartz Criterion• Coefficient of multiple determination (R2)• Adjusted R2
• Mallows Cp• Mean Absolute Error• Mean Squared Error
19Complexity Challenges in Eddy Current NDT Data
ApproachModel Selection
Selection of best modeling methodology based on root mean squared error calculation on test set
( )2
1
1 ∑=
−=N
jjSETTEST yy
NMSE
20Complexity Challenges in Eddy Current NDT Data
Algorithms1 of 4
Multiple Regression
Considered polynomial, interaction, and transformed terms
ipipiii XXXY εββββ +++++= −− 1,12,21,10 ...
21Complexity Challenges in Eddy Current NDT Data
Algorithms2 of 4
Regression TreesLeast Squares example
Y Mean Value = 15Std dev = 0.425
Y Mean Value = 45Std dev = 0.334
Y Mean Value = 35Std dev = 0.401
Y Mean Value = 22.5Std dev = 0.396
Y Mean Value = 7.5Std dev = 0.297
Y Mean Value = 5Std dev = 0.185
Y Mean Value = 15Std dev = 0.501
Is k5 ≤ 1.16
Y Mean Value = 12.5Std dev = 0.446
Y Mean Value = 30Std dev = 0.420
YES NO
Is k8 ≤ 0.85 Is k17 ≥ 0.42YES NO YES NO
Is k30 ≤ 0.25YES NO
Y Mean Value = 15Std dev = 0.425
Y Mean Value = 45Std dev = 0.334
Y Mean Value = 35Std dev = 0.401
Y Mean Value = 22.5Std dev = 0.396
Y Mean Value = 7.5Std dev = 0.297
Y Mean Value = 5Std dev = 0.185
Y Mean Value = 15Std dev = 0.501
Is k5 ≤ 1.16
Y Mean Value = 12.5Std dev = 0.446
Y Mean Value = 30Std dev = 0.420
YES NO
Is k8 ≤ 0.85 Is k17 ≥ 0.42YES NO YES NO
Is k30 ≤ 0.25YES NO
22Complexity Challenges in Eddy Current NDT Data
Algorithms3 of 4
Polynomial Networks
Doublet
Input A
Input B
Input C
Input D
Triplet
Single
Doublet
Normalizers
1st Layer
2d Layer
3d Layer
Unitizers
Output
Doublet
Input A
Input B
Input C
Input D
Triplet
Single
Doublet
Normalizers
1st Layer
2d Layer
3d Layer
Unitizers
Output
23Complexity Challenges in Eddy Current NDT Data
Algorithms4 of 4
Ordinal Logistic Regression
1
0
P(Y≤ j)
P(Y≤ 2)
P(Y≤ 3)P(Y≤ 1)
Predictor Value(s)
1
0
P(Y≤ j)
P(Y≤ 2)
P(Y≤ 3)P(Y≤ 1)
1
0
P(Y≤ j)
P(Y≤ 2)
P(Y≤ 3)P(Y≤ 1)
Predictor Value(s)
24Complexity Challenges in Eddy Current NDT Data
Results Multiple Regression
1 of 7
• The more complex the model, the better it did with both training and test datasets
• Best model incorporated transformed 4th
order polynomial and interaction terms
• Problem ~ Heteroscedasticity(non-constant variance)
25Complexity Challenges in Eddy Current NDT Data
Results Regression Trees
2 of 7
• Program limitations for data size – 60,000 observation training dataset– 60,000 observation test dataset
• Least squares tree tested 2611 trees
• Least absolute deviation tested 172 trees
26Complexity Challenges in Eddy Current NDT Data
Results Regression Trees
3 of 7
• Least absolute deviation regression tree was the best
– Fewer nodes 819 vs. 1857– Smaller Complexity value: –1.0 vs 37.6– Smaller Root MSE for test set
27Complexity Challenges in Eddy Current NDT Data
Results Regression Trees
4 of 7
Least Squares Regression Tree
Least Absolute Deviation Regression Tree
28Complexity Challenges in Eddy Current NDT Data
Results Polynomial Networks
5 of 7
29Complexity Challenges in Eddy Current NDT Data
Results Ordinal Logistic Regression
6 of 7
• The more complex the model, the better it did with both training and test datasets
• Best model incorporated transformed 4th
order polynomial and interaction terms
30Complexity Challenges in Eddy Current NDT Data
Results Model Selection
7 of 7
RT MSE VAR5.388 29.0305.610 31.4684.872 23.7390.566 0.320LAD Regression Tree
ModelOverall Model Comparision by Test Set
Multiple Regression Model 8Logistic Regression Model 8
Polynomial Network
31Complexity Challenges in Eddy Current NDT Data
Interpretation of ResultsMore Complex = Better Model
1 of 3
x1
x2
Stitching Effect
Y=0 Y=1 Y=2 Y=4
x1
x2
Stitching Effect
Y=0 Y=1 Y=2 Y=4
Dataset requirescomplex parametric models
Or
Non-parametric models
32Complexity Challenges in Eddy Current NDT Data
Interpretation of ResultsMore Complex = Better Model
2 of 3
STITCHINGEFFECT
33Complexity Challenges in Eddy Current NDT Data
Interpretation of ResultsWhy LAD RT does well
3 of 3
LAD regression tree does well because:
– Robust in presence of heteroscedasticity
– Partitioning provides for improved accuracy
– Uses “stitching” to capture nonlinearities
34Complexity Challenges in Eddy Current NDT Data
Conclusions1 of 2
Maintenance Operations
– Showed that an algorithm can be developed to assist operators in maintenance decisions
– Showed how to transform and clean the eddy current data for analysis
– Provided a basis for choosing among competing algorithms for actual implementation
35Complexity Challenges in Eddy Current NDT Data
Conclusions2 of 2
Methodological
– Provided a formal approach for comparing different data mining techniques on real corrosion data
– Showed that real data sets can produce highly complex relationships (contrast with Ockham’srazor) and that models can be found to handle these complexities
– Demonstrated the power of tree-based methods to treat nonlinearities in the data through “stitching”, which was formerly thought to be a disadvantage
36Complexity Challenges in Eddy Current NDT Data
Future Research• Classification algorithms
• If time-lapse available ~ time series analysis
• Spatial models (correlation between corrosion areas)
• Other non-parametric techniques
• Application of a “known” naturally corroded specimen as test dataset
Questions ?
NASA’s “Vomit Comet”