-
Les Bordelon, US Air Force SES (Retired)
NATO Lecture Series SCI-176, Mission Systems Engineering
November 2006
Design of Experiments Applied to Flight Testing
-
2
Why Is This Important? (Bottom line up front)
Every testing organization should establish policy and procedures and formulate testing philosophy and testing approaches to assure scientific validity and maximum efficiency in accomplishing ground and flight tests.
DOE covers almost all of the above: procedures, testing philosophy, and a testing approach directly related to scientific "validity" and maximum efficiency.
-
3
Why Discuss This Here
Typical test engineers are talented: "...those smart guys..."
Technical experts within their discipline (AE, EE, ME, CE) and generally in flight testing
However: limited knowledge in test design ... alpha, beta, sigma, delta, p, and n or N
Test design is not an art ... it is a science
These decisions are too important to be left to professional opinion alone; our decisions should be based on mathematical fact
We owe it to our customers (those who pay us to work)
We owe it to "those guys on the pointy end of the sword" (those who ultimately use our products)
We don't want to deny the warfighter a better weapon system
We don't want to send the warfighter a degraded system we think is better
-
4
Overview
Background on Design of Experiments
The Challenge of Test
What's the Best method of test?
How We Design Operational and Developmental Tests: 4 Stages
Summary
-
5
A beer and a blemish …
1906: W.S. Gosset, a Guinness chemist
Draw a yeast culture sample. How much yeast is in this culture? Guess too little: incomplete fermentation; too much: bitter beer.
He wanted to get it right.
1998: Mike Kelly, an engineer at a contact lens company
Draw a sample from a 15K lot. How many defective lenses? Guess too little: mad customers; too much: destroy good product.
He wanted to get it right.
-
6
What is Design of Experiments (DOE)
Designed Experiment (n.): Purposeful control of the inputs (factors) in such a way as to deduce their relationships (if any) with the outputs (responses).
A systematic approach to investigation of a system or process
A series of structured tests in which planned changes are made to input variables, and the effects of these changes on defined outputs are assessed
Maximizes information gained while minimizing resources
-
7
Why Use Design of Experiments
The T&E team must be able to:
Mathematically cite a test design's planned false positive and false negative values
False positive (alpha) error: concluding a difference exists when in fact there is none
False negative (beta) error: failing to detect a difference when in fact there is one
Test and evaluate the system, as appropriate, over a broad battlespace
Provide, with the test results, the actual (i.e., measured) false positive and false negative values
Focus on achieving high Confidence (1 - alpha) and Power (1 - beta) over a broad battlespace; this assures scientific validity and maximum efficiency.
-
8
Historical DOE Timeline (1800-2000)1
Milestones, in roughly chronological order:
Least Squares (Gauss, Legendre)
Hypothesis Tests (Gauss)
Regression Concepts (Pearson, Galton)
t-test (Gosset)
Factorial Experiments and ANOVA (Fisher)
2^k Factorial Designs (Yates)
Formalized Hypothesis Tests (Neyman, Pearson)
Fractional Factorial Designs (Finney, Rao)
Central Composite Designs (Box, Wilson)
Taguchi develops his methods
2^(k-p) Fractional Factorial Resolution (Box, Hunter)
Optimal Designs (Kiefer, Wolfowitz)
Box-Behnken Designs (Box, Behnken)
Algorithm for D-optimal designs (Johnson, Nachtsheim)
Detecting dispersion effects via ratio of variances (Montgomery)
Factorial designs and ANOVA are DOE. DOE was first developed and used in crop trials by Sir R. A. Fisher, a mathematician and geneticist.
1. Source: Appendix K, Understanding Industrial Designed Experiments, Schmidt and Launsby, 1998
-
9
The central test challenge …
In all our testing, we reach into the bowl (reality) and draw a sample of operational performance.
Consider an "Improved Missile A." Suppose an historical 80% hit rate. Is the new version at least as good?
We don't know in advance which bowl we are handed: the one where the system works, or the one where the system doesn't.
The central challenge of test: what's in the bowl?
-
10
Start -- Blank Sheet of Paper
Let's draw a sample of n shots. How many is enough to get it right?
3, because that's how much money/time we have
8, because I'm an 8-guy
10, because I'm challenged by fractions
30, because something good happens at 30!
Let's start with 10 and see ...
-
11
We seek to balance our chance of errors
Putting them together, we see we can trade one error for the other (α for β).
We can also increase the sample size to decrease our risks in testing.
These statements are not opinion; they are mathematical fact and an inescapable challenge in testing.
There are two other ways out: factorial designs and real-valued Measures of Performance (MOPs).
Getting it right: Confidence in stating results; Power to find small differences.
[Figure: Simulated sampling distributions of hits in 10 shots. Top: Missile A OK (80% Pk), we should field it; this decision rule is wrong 65% of the time. Bottom: Missile A poor (70% Pk), we should fail it; wrong 10% of the time.]
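The two "wrong" rates above come from binomial sampling distributions. A minimal sketch with `scipy` follows; the pass threshold of 9 hits out of 10 is an illustrative assumption (the slide does not state its decision rule, so the computed rates differ slightly from the 65%/10% shown):

```python
from scipy.stats import binom

n = 10          # shots in the sample
threshold = 9   # assumed decision rule: field the missile if hits >= 9

# Error against a good missile (true Pk = 0.80): we wrongly fail to field it
p_good = 0.80
err_fail_good = binom.cdf(threshold - 1, n, p_good)     # P(hits <= 8)

# Error against a degraded missile (true Pk = 0.70): we wrongly field it
p_bad = 0.70
err_field_bad = 1 - binom.cdf(threshold - 1, n, p_bad)  # P(hits >= 9)

print(f"P(fail a good missile)  = {err_fail_good:.3f}")   # ~0.62
print(f"P(field a bad missile)  = {err_field_bad:.3f}")   # ~0.15
```

Moving the threshold trades one error for the other, exactly as the slide says; only a larger n shrinks both at once.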
-
12
A Drum Roll, Please …
For α = β = 5% and δ = 10% degradation in Pk, with a pass/fail (hit/miss) response: N ≈ 133
But if we measure miss distance, a continuous response, for the same confidence and power: N ≈ 11
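The mechanics behind these sample sizes can be sketched with the standard normal-approximation formulas. This is not the exact calculation behind the slide's N ≈ 133 (the slide does not state which test it used, and the continuous-response effect size of one standard deviation is my assumption); it shows why a continuous response is dramatically cheaper than a pass/fail one:

```python
import math
from scipy.stats import norm

alpha = beta = 0.05            # desired false-positive and false-negative risks
p0, p1 = 0.80, 0.70            # historical Pk vs. a 10% degradation
z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)

# Pass/fail response: normal-approximation two-proportion sample size
n_binary = math.ceil((z_a + z_b) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1))
                     / (p0 - p1) ** 2)

# Continuous response (e.g., miss distance): detect a shift of delta = 1 sigma
delta_over_sigma = 1.0         # assumed effect size, not given on the slide
n_continuous = math.ceil((z_a + z_b) ** 2 / delta_over_sigma ** 2)

print(n_binary, n_continuous)
```

The binary response needs hundreds of shots under this formulation; the continuous one needs about 11 for the same α and β, matching the slide's point that what you measure matters as much as how many you shoot.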
-
13
Recap – First Two Challenges
Challenge 1: effect of sample size on errors (Depth of Test)
Challenge 2: measuring the best response (rich and relevant)
So: it matters how many we do, and it matters what we measure.
There is a third challenge: Breadth of testing, searching the employment environment or battlespace.
-
14
Battlespace Breadth: Missile A Attack Process
[Process diagram: Mission Planning (targets, weather, threats, attacks, weapons) -> Ground Checks (boresight, video/displays, slew, track, FOV) -> Flight Checks (boresight, video, slew, track, FOV) -> Attacks (planned/random, low/med/high, formations) -> Target Acquisition (designation, RDR/TGP, NAV/HUD) -> Track/Lock (switchology, WFOV/NFOV, EO contrast) -> Launch (range cues, max/min, flight profile, weapon guide, impact point)]
The test team (Ops, analyst, PM, engineer) breaks the process down:
What are the events, steps, and outcomes?
What are the ops choices and conditions at each step?
What is the size of the ops test space, and how do we measure success?
Result: Battlespace Conditions
-
15
Measures of Performance by Type
Objective: target acquisition range, seeker lock-on range, launch range, mean radial impact distance, probability of target impact, reliability
Subjective: interoperability, human factors, tech data, support equipment, tactics
Battlespace Conditions for Missile A Case
Present B/D/G variants: seekers degrading. Replace EO/IR versions with new H and K variants.
Operational Question: Does Missile A still perform at least as well as previously?

Condition | Settings | # Levels
Missile Variant | H, K, B, G | 4
Launch Platform | Aircraft A, Aircraft B, Aircraft C | 3
Launch Rail | LR 1, LR 2 | 2
Target | Point, Area | 2
Time of Day | Dawn/Dusk, Mid-Day | 3
Environment | Forest, Desert, Snow | 3
Weather | Clear (+7nm), Haze (3-7nm), Low Ceiling/Visibility (...) | 3
-
16
Actual Missile A Test: A Great Success
Extensive captive carry: 22 sorties, approx. 100 sim shots
Old/new seekers on each wing to equalize weather
3 platforms; wet and dry sites
Results: approx. 2x acq/trk range; 9 shots comparable to current performance

Typical Missile A Run Card
Mission: MAV-1, Wet Range. Range Time: 0700-0900 (Dawn). Target: Point (Tank). Weather: As Scheduled. Launch Airspeed: 400 KIAS.
F-16 #1: Left Wing, Missile A-1 on LR-2; Right Wing, Missile A-2 on LR-2
F-16 #2: Left Wing, Missile A-1 on LR-2; Right Wing, Missile A-2 on LR-2

Run# | Target Type | Altitude | Sun Angle | Missile Type | Cueing*
1 | Tank/Truck | 18,000' | Sun at 6 | V2 | Visual
2 | Tank/Truck | 18,000' | Sun at 6 | V3 | Visual
3 | Tank/Truck | 1,500' | Sun at 3/9 | V2 | NAV/GPS
4 | Tank/Truck | 1,500' | Sun at 3/9 | V3 | NAV/GPS
5 | Tank/Truck | 1,500' | Sun at 6 | V2 | Radar
6 | Tank/Truck | 1,500' | Sun at 6 | V3 | Radar
7 | Tank/Truck | 18,000' | Sun at 3/9 | V2 | Infra-Red
8 | Tank/Truck | 18,000' | Sun at 3/9 | V3 | Infra-Red
(A second, otherwise identical card pairs V2 against V1.)
* After simulated pickle, simulate missile flyout by overflying the target and recording seeker video.
-
17
What's Best? Many methods of test have been tried:
Intuition; One Factor At a Time (OFAT); Scenario (best guess or case); +2 (DWWDLT, Rent-a-TIS)
Our second example will characterize the Aircraft A radar Target Location Error as a function of several factors.
Process: Radar Ground Mapping
Inputs (test conditions): angle off nose, aircraft tail number, calibration date of radar, target elevation, target range, time of last doppler update, target RCS, operator skill level, altitude
Output (response): target location accuracy
-
18
Radar Target Location Error
How big is the test space? Consider 9 factors at two levels each.
For example, set angle off nose to 15 or 30 degrees.
If we tested each possible combination, how many would there be? 2^9 = 512.
How should we examine this test space?
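The size of the full-factorial test space can be enumerated directly. A small sketch (the factor names are the nine from the radar example, as labeled on the later OFAT slides):

```python
from itertools import product

factors = ["altitude", "target_rcs", "angle_off_nose", "tail_number",
           "target_range", "radar_cal_date", "doppler_update",
           "target_elevation", "operator_skill"]

# Every factor at two levels: the full-factorial test space
test_space = list(product(["low", "high"], repeat=len(factors)))
print(len(test_space))  # 2**9 = 512
```

Each added two-level factor doubles the space, which is why exhaustive testing is rarely affordable and structured designs matter.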
-
19
Intuition
One way to examine the test space is to rely on intuition or subject matter expertise.
Intuition can greatly benefit a test; however, it:
Requires deep subject matter knowledge that is not always available
Typically only "discovers" what you already believe to be true
Lacks objective proof of conclusions
Intuition can assist with test design, but it should never be the sole strategy.
-
20
One Factor At Time (OFAT)
Find a nominal configuration for all nine factors and perform the test
Vary the first factor (keeping the others constant) and perform the test
Determine the best setting for the first factor and fix it for the remainder of the test
Vary the next factor and determine its best setting
Continue until all factor settings have been determined

Factor | Low Setting | High Setting
Operator skill | New | Experienced
Target elevation | Low | High
Doppler update time | > 30 minutes | < 5 minutes
Radar cal date | > week | < week
Target range | Near | Far
Aircraft tail number | 04 | 02
Angle off nose | 15 degrees | 30 degrees
Target RCS | Small | Large
Altitude | Low | High

Nominal settings are circled [on the original slide].
-
21
OFAT Design looks like
11 combinations tested (10 repetitions of each combination)
No math model to predict the response for the other 501 combos
No ability to estimate whether variables interact
At two levels each variable, we have 2x2x2 ... = 2^9 = 512 combinations
Total number of combinations = ?
[Table: OFAT cases over factors A-J with responses Y1 ... Y10 averaged. Case 1: all factors low. Case 2: A high, others low. Each later case fixes the "best" settings found so far and sets the next factor high, through case 11.]
We're going to run 10 because we're 10-guys and that's what we do.
Factors: A. Aircraft altitude; B. Target RCS; C. Angle off nose; D. Aircraft tail number; E. Target range; F. Time since radar calibration; G. Time since doppler update; H. Target elevation; J. Years OSO experience
-
22
Typical OFAT Test: Cookie Baking ...
Objective is to maximize "cookie goodness" (process yield).
[Figures: First, set temperature and vary time (scatterplot of process yield vs. time, 40-180). Then set the "best" time and vary temperature (scatterplot of yield vs. temperature, 205-255). A contour plot of yield over time and temperature summarizes the search.]
-
23
OFAT Assumption ...
[Contour plot: yield over time and temperature, with contours (65-82%) at an angle to the axes. The OFAT optimum reaches 80%; the true optimum is 90+%.]
If contours are not aligned with the axes, we miss the optimum.
The problem is not simply to find the best time and then the best temperature; we must find out whether the variables interact.
(Response contours are often ridges, saddles, and other shapes at an angle to our control axes.)
Orthogonal DOE Factorial Design
Orthogonal DOE Factorial Design
-
24
Aircraft A Radar: No interaction
Parallel slopes of A with B: the response at a level of B does not depend on the level of A.
Also, we can talk about variable settings independently: "15 degrees off nose is always best."
Simple model: Y = b0 + b1*Range + b2*Angle
[Figures: Lateral Error vs. Angle and Range, with parallel lines for 15 deg and 30 deg off nose from near to far target range; Effect of Users on OB Query Time, response time (sec) vs. number of concurrent users for small and large file sizes.]
-
25
Aircraft A Radar: The Meaning of Interaction
OFAT cannot detect interaction.
Nonparallel slopes of A with B: the response at a level of B does depend on the level of A.
Also, we must talk about variable settings together: "Long range targets are detected best at 15 degrees off nose."
More complex model with interaction term: Y = b0 + b1*Range + b2*Angle + b3*Range*Angle
[Figures: Effect of Network Load on OB Query Time, time to respond (sec) vs. number of concurrent users for small and large file sizes; Lateral Error vs. Angle and Range, with nonparallel lines for 15 deg and 30 deg off nose from near to far target range.]
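The interaction model Y = b0 + b1*Range + b2*Angle + b3*Range*Angle can be fit by ordinary least squares on a coded factorial. A sketch on made-up data (the coefficient values are invented for illustration, not real radar results; with no noise the fit recovers them exactly):

```python
import numpy as np

# Coded 2x2 factorial with 2 replicates: Range and Angle at -1/+1
rng_lvl   = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
angle_lvl = np.array([-1, +1, -1, +1, -1, +1, -1, +1])

# Invented "lateral error" generated from known coefficients
b0, b1, b2, b3 = 10.0, 2.0, -1.0, 3.0
y = b0 + b1 * rng_lvl + b2 * angle_lvl + b3 * rng_lvl * angle_lvl

# Design matrix for Y = b0 + b1*Range + b2*Angle + b3*Range*Angle
X = np.column_stack([np.ones_like(rng_lvl), rng_lvl, angle_lvl,
                     rng_lvl * angle_lvl])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # recovers approximately [10, 2, -1, 3]
```

A nonzero b3 is exactly the nonparallel-lines case: the effect of Range depends on Angle. OFAT never varies both factors together, so it can never estimate b3.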
-
26
Scenario or Case Design
Factor settings are based on scenarios. Examples:
Easy, Hard scenarios
High Risk, Low Risk scenarios
Large, Small scenarios
Location-specific scenarios
Generally leaves a number of questions unanswered

CASE 1 picks one combination from the factor table:
Factor | Low Setting | High Setting
Operator skill | New | Experienced
Target elevation | Low | High
Doppler update time | > 30 minutes | < 5 minutes
Radar cal date | > week | < week
Target range | Near | Far
Aircraft tail number | 04 | 02
Angle off nose | 15 degrees | 30 degrees
Target RCS | Small | Large
Altitude | Low | High
-
27
Scenario or Case Design

Case | A | B | C | D | E | F | G | H | J | (responses Y1 ... Y17, averaged)
1 | low | hi | low | hi | low | hi | low | low | low
2 | hi | hi | low | low | low | low | low | low | low
3 | low | low | hi | low | hi | low | low | low | low
4 | low | low | low | low | low | hi | low | low | low
5 | low | low | low | low | low | low | low | hi | hi
6 | low | low | low | low | low | hi | hi | low | low
7 | low | low | low | low | low | low | low | low | low
8 | low | low | hi | hi | hi | low | low | low | low
9 | low | hi | low | low | hi | low | low | low | low
10 | low | low | low | low | low | low | low | low | low
11 | hi | hi | hi | hi | hi | hi | hi | hi | hi

Best guess means choosing those combinations the subject expert feels most likely contain the answer. Usually, the organization has a magic number of replications (3, 8, 30 ...) they believe will be "good" in some unspecified sense. Often they say "significant" or "the data is normal."

Case | A | B | C | D | E | Ybar
1 | low | hi | hi | low | low | 20
2 | hi | low | low | hi | low | 9
3 | low | hi | hi | low | low | 3
4 | low | low | low | low | low | 8
5 | hi | hi | low | low | low | 7
6 | low | hi | hi | low | low | 8
7 | low | low | hi | hi | hi | 24
8 | low | low | hi | low | low | 16
9 | low | hi | low | hi | low | 2
10 | hi | hi | hi | hi | hi | 30
11 | low | low | low | low | low | 6

If runs 1-11 were best guess, what can we conclude?
1. Does factor E affect the average?
2. Can we separate the effects of B and C?
3. What D, E combo is missing?
-
28
Radar Mapping in One Mission
Problem: Characterize Aircraft A radar coordinate accuracy for a variety of operational setups.
Design: Response: absolute error. Conditions: angle, side of nose, tail number, target, and range to target. 4 replicates.
Results: Similar accuracy across scan volume, target type, and tail number.
Result: A single two-aircraft mission answered the accuracy questions raised by 7 previous missions using conventional test methods.
[Figure: Angular Error in Target Coordinates, Aircraft A Radar Mapping. Angular error (mils) vs. angle, left and right side of nose, at 15 and 30 miles.]
-
29
How We Design Tests in 4 Stages
I. Project Description and Decomposition
1. Statement of the Problem: why this, why now
2. Objective of the Experiment: screen, model, characterize, optimize, compare ...
3. Response Variables: the process outputs
4. Potential Causal Variables: look for underlying physical variables
II. Plan the Test Matrix
1. Establish Experimental Constraints: duration, analysis cycle time, reporting times ...
2. Rank Order Factors: experimental, fixed, noise ...
3. Select Statistical Design: factorial, 2^k, fractional factorial, nested, split plot ...
4. Write the Test Plan to include sample matrices, sample data, and sample output
III. Produce the Observations
1. Randomize run order and block as needed
2. Execute and control factors; respond to the unexpected; record anomalies
IV. Ponder the Results
1. Acquire, reduce, look, explore, analyze ...
2. Draw conclusions, redesign, assess results, and plan further tests (as needed) ...
Plan (example matrix):
| In Front, Face East | In Front, Face West | In Back, Face East | In Back, Face West
Eyes Open, Left Hand | 0.43 | 0.58 | 0.52 | 0.40
Eyes Open, Right Hand | 0.62 | 0.29 | 0.28 | 0.36
Eyes Closed, Left Hand | 0.62 | 0.57 | 0.47 | 0.40
Eyes Closed, Right Hand | 0.42 | 0.26 | 0.42 | 0.47
Produce / Ponder: [Cause-Effect (CNX) Diagram: causes grouped under Manpower, Materials, Methods, Machines, Measurements, and Milieu (Environment), feeding the response/effect.]
-
30
Other Recent Project Highlights ...
2x increase in Missile A firing envelope in a week's testing
50% reduction in expenditures and runs, with increased data fidelity
45% reduction in sorties; helped define test objectives and procedures
33% reduction in bombs
25% reduction in resources using a screening method
$36K in lab costs saved on a $144K Expeditionary-Deployable Oxygen Concentrator System test
-
31
Summary
By the nature of test, we face inescapable risks in getting it right:
Fielding a poor-performing system
Failing to field a good system
Risks are knowable; we want to quantify and manage these risks
We need to adopt a disciplined, structured, systematic approach. Design of Experiments meets the need.
The recommendations are too important to be left to professional opinion alone; they need to be mathematical fact.
DOE is not statistics. DOE is a test strategy: 12 steps in 4 blocks. DOE is a test design tool.
DOE supplies the tools to explore multiple conditions while retaining power and confidence.
DOE will help put the capital "E" back in "T&e".
Following these practices gives the Best tests for our resources.
-
32
Backups
-
33
Challenge 3: Broader -- How Do Designed Experiments Solve This?
Designed Experiment (n.): Purposeful control of the inputs (factors) in such a way as to deduce their relationships (if any) with the outputs (responses).
Shooting Missiles
Inputs (Conditions): Missile A; Slew Sensor (TP, Radar); Target Type; Platform
Outputs (MOPs): Hits/misses; RMS trajectory deviation; P(damage); Miss distance (m)
-
34
Cause and Effect Diagram to Elicit Missile A Shot Test Conditions
[Cause-Effect diagram: branches for Manpower, Materials, Methods, Machines, Measurements, and Milieu (Environment), feeding the response: Acq Rng (km).]
Cause labels: C = Constant (procedures); N = Noise, no control; X = Control variable

Table of Conditions
N | Variable | Units | Range | Priority | Exp. Control | Design Range
1 | Target | Qual | many | H | X | trk, tank, bld
2 | Platform | Qual | Acft A/B | H | X | Acft A/B
3 | Time of Day | hour | 0-24 | H | X | Dawn, Noon
4 | Humidity | gm/m3 | 6-30 | H | X | Dry/Wet
5 | Tgt Velocity | mph | 0-40 | H | C | stationary
6 | Operator Skill | Qual | L, M, H | M | C | H (captive runs)
7 | Missile Model | Qual | V1234 | L | C | V1234
8 | Wind | m/s | 0-10 | L | N | measure, R
-
35
Coded Formulation Example -- Factorials (Solution)
Note the geometric shape you built: a cube.
All variables are orthogonal (at right angles).
General algorithm to build: vary A for half of the runs, B for half of A, and C for half of B ...

Case | Mav Ver | Target | T.O. Day
1 | -1 | -1 | -1
2 | -1 | -1 | +1
3 | -1 | +1 | -1
4 | -1 | +1 | +1
5 | +1 | -1 | -1
6 | +1 | -1 | +1
7 | +1 | +1 | -1
8 | +1 | +1 | +1

[Figure: cube with axes Msl Ver, Target, and T.O. Day; corners at the -/+ levels.]
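The coded 2^3 table can be generated and its orthogonality checked in a few lines. A sketch (standard Yates-style ordering, last factor alternating fastest, matching the table above):

```python
from itertools import product
import numpy as np

# Build the 2^3 coded design: first factor varies slowest,
# matching "vary A for half of the runs, B for half of A, C for half of B"
design = np.array(list(product([-1, +1], repeat=3)))
print(design)

# Orthogonality: every pair of columns has a zero dot product, so the
# Gram matrix is 8*I and each factor's effect is estimated independently
print(design.T @ design)
```

Orthogonality is what lets a factorial estimate every main effect (and interaction) from the same 8 runs, instead of spending runs on one factor at a time.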
-
36
Projection Property of Factorial Designs
Two-level factorials are powerful for exploring
They form the basis for many more elegant designs
They can be augmented to fit more complicated models
Factorials project (collapse) into replicates in fewer variables
[Figure: a 2^3 cube in Mav Version, Target, and Time of Day collapses into a replicated 2^2 square in Mav Version and Time of Day when Target is dropped.]
-
37
Desired End State
Leadership comprehends what DOE is and why to implement it
Most organizations' leadership supports the use of design of experiments where appropriate
Many new programs can benefit from the application of DOE
-
38
Why DOE -- Postscript
[Figure: process knowledge over time, with a GAP between the current level of knowledge and perfect knowledge.]
We use DOE to interrogate the process and improve our knowledge of how our process works. The goal is a systematic method to efficiently and unambiguously improve our outcomes.
Compared to any other systematic method, DOE designs:
Yield Better process understanding
Can be planned and analyzed Faster
Are Cheaper, using only 20-80% of the usual resources
-
39
But Why Learn DOE?
Why: It's a way to build better tests
DOE is faster, less expensive, and more informative than alternative methods
Results are objective, credible, and easy to defend
Planning and analysis are systematic and require less effort
It's a way to succeed if you lack test experience
Why not ...
"If DOE is so good, why haven't I heard of it before?"
"Aren't these ideas new and unproven?"
"It takes too many assumptions -- it's all theoretical!"
"To call in the statistician after the experiment is ... asking him to perform a postmortem examination: he may be able to say what the experiment died of."
DOE founder Sir Ronald A. Fisher, address to the Indian Statistical Congress, 1938
-
40
Why DOE tests are best: scientific number of runs (deep) and robust design (broad)
Factorial (crossed) designs let us learn more from the same number of assets
We can also use factorials to reduce assets while maintaining confidence and power
Or we can combine the two

4 reps, 1 variable:
15 deg: 4 runs | 30 deg: 4 runs

2 reps, 2 variables:
| 15 deg | 30 deg
10 mile | 2 | 2
20 mile | 2 | 2

1 rep, 3 variables:
| 15 deg | 30 deg
Truck Tgt, 10 mile | 1 | 1
Truck Tgt, 20 mile | 1 | 1
Bldg Tgt, 10 mile | 1 | 1
Bldg Tgt, 20 mile | 1 | 1

1/2 rep, 4 variables: [half fraction adding Tail 236 / Tail 103: one run in half of the 16 cells]

All four designs share the same power and confidence
-
41
Challenge 3: Systematically Search the Relevant Battlespace
Factorial (crossed) designs let us learn more from the same number of assets
We can also use factorials to reduce assets while maintaining confidence and power
Or we can combine the two

4 reps, 1 variable:
MSA-V1: 4 runs | MSA-V2: 4 runs

2 reps, 2 variables:
| MSA-V1 | MSA-V2
Truck | 2 | 2
Tank | 2 | 2

1 rep, 3 variables:
| MSA-V1 | MSA-V2
Site A (Wet), Truck | 1 | 1
Site A (Wet), Tank | 1 | 1
Site B (Dry), Truck | 1 | 1
Site B (Dry), Tank | 1 | 1

1/2 rep, 4 variables: [half fraction adding Dawn (Transition) / Midday (Stable): one run in half of the 16 cells]

All four designs share the same power and confidence