-
Les Bordelon, US Air Force SES (Retired)
NATO Lecture Series SCI-176, Mission Systems Engineering
November 2006
Design of Experiments Applied to Flight Testing
-
2
Why Is This Important? (Bottom line up front)
Every testing organization should establish policy and procedures and formulate testing philosophy and testing approaches to assure scientific validity and maximum efficiency in accomplishing ground and flight tests.
DOE covers almost all of the above: procedures, testing philosophy, and a testing approach directly related to scientific "validity" and maximum efficiency.
-
3
Why Discuss This Here
Typical test engineers are talented: "...those smart guys..."
Technical experts within their discipline (AE, EE, ME, CE) and generally in flight testing
However: limited knowledge in test design ... alpha, beta, sigma, delta, p, and n or N
Test design is not an art ... it is a science
These decisions are too important to be left to professional opinion alone; our decisions should be based on mathematical fact
We owe it to our customers (those who pay us to work)
We owe it to "those guys on the pointy end of the sword" (those who ultimately use our products)
We don't want to deny the warfighter a better weapon system
We don't want to send the warfighter a degraded system we think is better
-
4
Overview
Background on Design of Experiments
The Challenge of Test
What's the Best method of test?
How We Design Operational and Developmental Tests: 4 Stages
Summary
-
5
A beer and a blemish …
1906: W.S. Gosset, a Guinness chemist
Draw a yeast culture sample. How much yeast is in this culture? Guess too little: incomplete fermentation; too much: bitter beer.
He wanted to get it right.
1998: Mike Kelly, an engineer at a contact lens company
Draw a sample from a 15K lot. How many defective lenses? Guess too little: mad customers; too much: destroy good product.
He wanted to get it right.
-
6
What is Design of Experiments (DOE)
Designed Experiment (n.): Purposeful control of the inputs (factors) in such a way as to deduce their relationships (if any) with the outputs (responses).
A systematic approach to investigation of a system or process
A series of structured tests in which planned changes are made to input variables, and the effects of these changes on defined outputs are assessed
Maximizes information gained while minimizing resources
-
7
Why Use Design of Experiments
The T&E team must be able to:
Mathematically cite a test design's planned false positive and false negative values
False positive (alpha) error: concluding a difference exists when in fact there is none
False negative (beta) error: failing to detect a difference when in fact there is one
Test and evaluate the system, as appropriate, over a broad battlespace
Provide, with the test results, the actual (i.e., measured) false positive and false negative values
Focus on achieving high Confidence (1 - alpha) and Power (1 - beta) over a broad battlespace; this assures scientific validity and maximum efficiency.
-
8
Historical DOE Timeline (1800-2000)1
Milestones, in roughly chronological order:
Least Squares (Gauss, Legendre)
Hypothesis Tests (Gauss)
Regression Concepts (Pearson, Galton)
t-test (Gosset)
Factorial Experiments and ANOVA (Fisher)
2^k Factorial Designs (Yates)
Formalized Hypothesis Tests (Neyman, Pearson)
Fractional Factorial Designs (Finney, Rao)
Central Composite Designs (Box, Wilson)
Taguchi develops his methods
2^(k-p) Fractional Factorial Resolution (Box, Hunter)
Optimal Designs (Kiefer, Wolfowitz)
Box-Behnken Designs (Box, Behnken)
Algorithm for D-optimal designs (Johnson, Nachtsheim)
Detecting dispersion effects via ratio of variances (Montgomery)
Factorial designs and ANOVA are DOE. DOE was first developed and used in crop trials by Sir R. A. Fisher, a mathematician and geneticist.
1. Source: Appendix K, Understanding Industrial Designed Experiments, Schmidt and Launsby, 1998
-
9
The central test challenge …
In all our testing, we reach into the bowl (reality) and draw a sample of operational performance.
Consider an "Improved Missile A." Suppose an historical 80% hit rate. Is the new version at least as good?
We don't know in advance which bowl we are handed: the one where the system works, or the one where the system doesn't.
The central challenge of test: what's in the bowl?
-
10
Start -- Blank Sheet of Paper
Let's draw a sample of n shots. How many is enough to get it right?
3, because that's how much money/time we have
8, because I'm an 8-guy
10, because I'm challenged by fractions
30, because something good happens at 30!
Let's start with 10 and see ...
-
11
We seek to balance our chance of errors
Putting them together, we see we can trade one error for the other (α for β).
We can also increase the sample size to decrease our risks in testing.
These statements are not opinion; they are mathematical fact and an inescapable challenge in testing.
There are two other ways out: factorial designs and real-valued Measures of Performance (MOPs).
Getting it right: Confidence in stating results; Power to find small differences.
[Figure: Simulated sampling distributions of hits in 10 shots. Top: Missile A OK (80% Pk), we should field it; this decision rule is wrong 65% of the time. Bottom: Missile A poor (70% Pk), we should fail it; wrong 10% of the time.]
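The two "wrong" rates above come from binomial sampling distributions. A minimal sketch with `scipy` follows; the pass threshold of 9 hits out of 10 is an illustrative assumption (the slide does not state its decision rule, so the computed rates differ slightly from the 65%/10% shown):

```python
from scipy.stats import binom

n = 10          # shots in the sample
threshold = 9   # assumed decision rule: field the missile if hits >= 9

# Error against a good missile (true Pk = 0.80): we wrongly fail to field it
p_good = 0.80
err_fail_good = binom.cdf(threshold - 1, n, p_good)     # P(hits <= 8)

# Error against a degraded missile (true Pk = 0.70): we wrongly field it
p_bad = 0.70
err_field_bad = 1 - binom.cdf(threshold - 1, n, p_bad)  # P(hits >= 9)

print(f"P(fail a good missile)  = {err_fail_good:.3f}")   # ~0.62
print(f"P(field a bad missile)  = {err_field_bad:.3f}")   # ~0.15
```

Moving the threshold trades one error for the other, exactly as the slide says; only a larger n shrinks both at once.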
-
12
A Drum Roll, Please …
For α = β = 5% and δ = 10% degradation in Pk, with a pass/fail (hit/miss) response: N ≈ 133
But if we measure miss distance, a continuous response, for the same confidence and power: N ≈ 11
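The mechanics behind these sample sizes can be sketched with the standard normal-approximation formulas. This is not the exact calculation behind the slide's N ≈ 133 (the slide does not state which test it used, and the continuous-response effect size of one standard deviation is my assumption); it shows why a continuous response is dramatically cheaper than a pass/fail one:

```python
import math
from scipy.stats import norm

alpha = beta = 0.05            # desired false-positive and false-negative risks
p0, p1 = 0.80, 0.70            # historical Pk vs. a 10% degradation
z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)

# Pass/fail response: normal-approximation two-proportion sample size
n_binary = math.ceil((z_a + z_b) ** 2 * (p0 * (1 - p0) + p1 * (1 - p1))
                     / (p0 - p1) ** 2)

# Continuous response (e.g., miss distance): detect a shift of delta = 1 sigma
delta_over_sigma = 1.0         # assumed effect size, not given on the slide
n_continuous = math.ceil((z_a + z_b) ** 2 / delta_over_sigma ** 2)

print(n_binary, n_continuous)
```

The binary response needs hundreds of shots under this formulation; the continuous one needs about 11 for the same α and β, matching the slide's point that what you measure matters as much as how many you shoot.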
-
13
Recap – First Two Challenges
Challenge 1: effect of sample size on errors (Depth of Test)
Challenge 2: measuring the best response (rich and relevant)
So: it matters how many we do, and it matters what we measure.
There is a third challenge: Breadth of testing, searching the employment environment or battlespace.
-
14
Battlespace Breadth: Missile A Attack Process
[Process diagram: Mission Planning (targets, weather, threats, attacks, weapons) -> Ground Checks (boresight, video/displays, slew, track, FOV) -> Flight Checks (boresight, video, slew, track, FOV) -> Attacks (planned/random, low/med/high, formations) -> Target Acquisition (designation, RDR/TGP, NAV/HUD) -> Track/Lock (switchology, WFOV/NFOV, EO contrast) -> Launch (range cues, max/min, flight profile, weapon guide, impact point)]
The test team (Ops, analyst, PM, engineer) breaks the process down:
What are the events, steps, and outcomes?
What are the ops choices and conditions at each step?
What is the size of the ops test space, and how do we measure success?
Result: Battlespace Conditions
-
15
Measures of Performance by Type
Objective: target acquisition range, seeker lock-on range, launch range, mean radial impact distance, probability of target impact, reliability
Subjective: interoperability, human factors, tech data, support equipment, tactics
Battlespace Conditions for Missile A Case
Present B/D/G variants: seekers degrading. Replace EO/IR versions with new H and K variants.
Operational Question: Does Missile A still perform at least as well as previously?

Condition | Settings | # Levels
Missile Variant | H, K, B, G | 4
Launch Platform | Aircraft A, Aircraft B, Aircraft C | 3
Launch Rail | LR 1, LR 2 | 2
Target | Point, Area | 2
Time of Day | Dawn/Dusk, Mid-Day | 3
Environment | Forest, Desert, Snow | 3
Weather | Clear (+7nm), Haze (3-7nm), Low Ceiling/Visibility (...) | 3
-
16
Actual Missile A Test: A Great Success
Extensive captive carry: 22 sorties, approx. 100 sim shots
Old/new seekers on each wing to equalize weather
3 platforms; wet and dry sites
Results: approx. 2x acq/trk range; 9 shots comparable to current performance

Typical Missile A Run Card
Mission: MAV-1, Wet Range. Range Time: 0700-0900 (Dawn). Target: Point (Tank). Weather: As Scheduled. Launch Airspeed: 400 KIAS.
F-16 #1: Left Wing, Missile A-1 on LR-2; Right Wing, Missile A-2 on LR-2
F-16 #2: Left Wing, Missile A-1 on LR-2; Right Wing, Missile A-2 on LR-2

Run# | Target Type | Altitude | Sun Angle | Missile Type | Cueing*
1 | Tank/Truck | 18,000' | Sun at 6 | V2 | Visual
2 | Tank/Truck | 18,000' | Sun at 6 | V3 | Visual
3 | Tank/Truck | 1,500' | Sun at 3/9 | V2 | NAV/GPS
4 | Tank/Truck | 1,500' | Sun at 3/9 | V3 | NAV/GPS
5 | Tank/Truck | 1,500' | Sun at 6 | V2 | Radar
6 | Tank/Truck | 1,500' | Sun at 6 | V3 | Radar
7 | Tank/Truck | 18,000' | Sun at 3/9 | V2 | Infra-Red
8 | Tank/Truck | 18,000' | Sun at 3/9 | V3 | Infra-Red
(A second, otherwise identical card pairs V2 against V1.)
* After simulated pickle, simulate missile flyout by overflying the target and recording seeker video.
-
17
What's Best? Many methods of test have been tried:
Intuition; One Factor At a Time (OFAT); Scenario (best guess or case); +2 (DWWDLT, Rent-a-TIS)
Our second example will characterize the Aircraft A radar Target Location Error as a function of several factors.
Process: Radar Ground Mapping
Inputs (test conditions): angle off nose, aircraft tail number, calibration date of radar, target elevation, target range, time of last doppler update, target RCS, operator skill level, altitude
Output (response): target location accuracy
-
18
Radar Target Location Error
How big is the test space? Consider 9 factors at two levels each.
For example, set angle off nose to 15 or 30 degrees.
If we tested each possible combination, how many would there be? 2^9 = 512.
How should we examine this test space?
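The size of the full-factorial test space can be enumerated directly. A small sketch (the factor names are the nine from the radar example, as labeled on the later OFAT slides):

```python
from itertools import product

factors = ["altitude", "target_rcs", "angle_off_nose", "tail_number",
           "target_range", "radar_cal_date", "doppler_update",
           "target_elevation", "operator_skill"]

# Every factor at two levels: the full-factorial test space
test_space = list(product(["low", "high"], repeat=len(factors)))
print(len(test_space))  # 2**9 = 512
```

Each added two-level factor doubles the space, which is why exhaustive testing is rarely affordable and structured designs matter.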
-
19
Intuition
One way to examine the test space is to rely on intuition or subject matter expertise.
Intuition can greatly benefit a test; however, it:
Requires deep subject matter knowledge that is not always available
Typically only "discovers" what you already believe to be true
Lacks objective proof of conclusions
Intuition can assist with test design, but it should never be the sole strategy.
-
20
One Factor At Time (OFAT)
Find a nominal configuration for all nine factors and perform the test
Vary the first factor (keeping the others constant) and perform the test
Determine the best setting for the first factor and fix it for the remainder of the test
Vary the next factor and determine its best setting
Continue until all factor settings have been determined

Factor | Low Setting | High Setting
Operator skill | New | Experienced
Target elevation | Low | High
Doppler update time | > 30 minutes | < 5 minutes
Radar cal date | > week | < week
Target range | Near | Far
Aircraft tail number | 04 | 02
Angle off nose | 15 degrees | 30 degrees
Target RCS | Small | Large
Altitude | Low | High

Nominal settings are circled [on the original slide].
-
21
OFAT Design looks like
11 combinations tested (10 repetitions of each combination)
No math model to predict the response for the other 501 combos
No ability to estimate whether variables interact
At two levels each variable, we have 2x2x2 ... = 2^9 = 512 combinations
Total number of combinations = ?
[Table: OFAT cases over factors A-J with responses Y1 ... Y10 averaged. Case 1: all factors low. Case 2: A high, others low. Each later case fixes the "best" settings found so far and sets the next factor high, through case 11.]
We're going to run 10 because we're 10-guys and that's what we do.
Factors: A. Aircraft altitude; B. Target RCS; C. Angle off nose; D. Aircraft tail number; E. Target range; F. Time since radar calibration; G. Time since doppler update; H. Target elevation; J. Years OSO experience
-
22
Typical OFAT Test: Cookie Baking ...
Objective is to maximize "cookie goodness" (process yield).
[Figures: First, set temperature and vary time (scatterplot of process yield vs. time, 40-180). Then set the "best" time and vary temperature (scatterplot of yield vs. temperature, 205-255). A contour plot of yield over time and temperature summarizes the search.]
-
23
OFAT Assumption ...
[Contour plot: yield over time and temperature, with contours (65-82%) at an angle to the axes. The OFAT optimum reaches 80%; the true optimum is 90+%.]
If contours are not aligned with the axes, we miss the optimum.
The problem is not simply to find the best time and then the best temperature; we must find out whether the variables interact.
(Response contours are often ridges, saddles, and other shapes at an angle to our control axes.)
Orthogonal DOE Factorial Design
Orthogonal DOE Factorial Design
-
24
Aircraft A Radar: No interaction
Parallel slopes of A with B: the response at a level of B does not depend on the level of A.
Also, we can talk about variable settings independently: "15 degrees off nose is always best."
Simple model: Y = b0 + b1*Range + b2*Angle
[Figures: Lateral Error vs. Angle and Range, with parallel lines for 15 deg and 30 deg off nose from near to far target range; Effect of Users on OB Query Time, response time (sec) vs. number of concurrent users for small and large file sizes.]
-
25
Aircraft A Radar: The Meaning of Interaction
OFAT cannot detect interaction.
Nonparallel slopes of A with B: the response at a level of B does depend on the level of A.
Also, we must talk about variable settings together: "Long range targets are detected best at 15 degrees off nose."
More complex model with interaction term: Y = b0 + b1*Range + b2*Angle + b3*Range*Angle
[Figures: Effect of Network Load on OB Query Time, time to respond (sec) vs. number of concurrent users for small and large file sizes; Lateral Error vs. Angle and Range, with nonparallel lines for 15 deg and 30 deg off nose from near to far target range.]
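The interaction model Y = b0 + b1*Range + b2*Angle + b3*Range*Angle can be fit by ordinary least squares on a coded factorial. A sketch on made-up data (the coefficient values are invented for illustration, not real radar results; with no noise the fit recovers them exactly):

```python
import numpy as np

# Coded 2x2 factorial with 2 replicates: Range and Angle at -1/+1
rng_lvl   = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
angle_lvl = np.array([-1, +1, -1, +1, -1, +1, -1, +1])

# Invented "lateral error" generated from known coefficients
b0, b1, b2, b3 = 10.0, 2.0, -1.0, 3.0
y = b0 + b1 * rng_lvl + b2 * angle_lvl + b3 * rng_lvl * angle_lvl

# Design matrix for Y = b0 + b1*Range + b2*Angle + b3*Range*Angle
X = np.column_stack([np.ones_like(rng_lvl), rng_lvl, angle_lvl,
                     rng_lvl * angle_lvl])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # recovers approximately [10, 2, -1, 3]
```

A nonzero b3 is exactly the nonparallel-lines case: the effect of Range depends on Angle. OFAT never varies both factors together, so it can never estimate b3.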
-
26
Scenario or Case Design
Factor settings are based on scenarios. Examples:
Easy, Hard scenarios
High Risk, Low Risk scenarios
Large, Small scenarios
Location-specific scenarios
Generally leaves a number of questions unanswered

CASE 1 picks one combination from the factor table:
Factor | Low Setting | High Setting
Operator skill | New | Experienced
Target elevation | Low | High
Doppler update time | > 30 minutes | < 5 minutes
Radar cal date | > week | < week
Target range | Near | Far
Aircraft tail number | 04 | 02
Angle off nose | 15 degrees | 30 degrees
Target RCS | Small | Large
Altitude | Low | High
-
27
Scenario or Case Design

Case | A | B | C | D | E | F | G | H | J | (responses Y1 ... Y17, averaged)
1 | low | hi | low | hi | low | hi | low | low | low
2 | hi | hi | low | low | low | low | low | low | low
3 | low | low | hi | low | hi | low | low | low | low
4 | low | low | low | low | low | hi | low | low | low
5 | low | low | low | low | low | low | low | hi | hi
6 | low | low | low | low | low | hi | hi | low | low
7 | low | low | low | low | low | low | low | low | low
8 | low | low | hi | hi | hi | low | low | low | low
9 | low | hi | low | low | hi | low | low | low | low
10 | low | low | low | low | low | low | low | low | low
11 | hi | hi | hi | hi | hi | hi | hi | hi | hi

Best guess means choosing those combinations the subject expert feels most likely contain the answer. Usually, the organization has a magic number of replications (3, 8, 30 ...) they believe will be "good" in some unspecified sense. Often they say "significant" or "the data is normal."

Case | A | B | C | D | E | Ybar
1 | low | hi | hi | low | low | 20
2 | hi | low | low | hi | low | 9
3 | low | hi | hi | low | low | 3
4 | low | low | low | low | low | 8
5 | hi | hi | low | low | low | 7
6 | low | hi | hi | low | low | 8
7 | low | low | hi | hi | hi | 24
8 | low | low | hi | low | low | 16
9 | low | hi | low | hi | low | 2
10 | hi | hi | hi | hi | hi | 30
11 | low | low | low | low | low | 6

If runs 1-11 were best guess, what can we conclude?
1. Does factor E affect the average?
2. Can we separate the effects of B and C?
3. What D, E combo is missing?
-
28
Radar Mapping in One Mission
Problem: Characterize Aircraft A radar coordinate accuracy for a variety of operational setups.
Design: Response: absolute error. Conditions: angle, side of nose, tail number, target, and range to target. 4 replicates.
Results: Similar accuracy across scan volume, target type, and tail number.
Result: A single two-aircraft mission answered the accuracy questions raised by 7 previous missions using conventional test methods.
[Figure: Angular Error in Target Coordinates, Aircraft A Radar Mapping. Angular error (mils) vs. angle, left and right side of nose, at 15 and 30 miles.]
-
29
How We Design Tests in 4 Stages
I. Project Description and Decomposition
1. Statement of the Problem: why this, why now
2. Objective of the Experiment: screen, model, characterize, optimize, compare ...
3. Response Variables: the process outputs
4. Potential Causal Variables: look for underlying physical variables
II. Plan the Test Matrix
1. Establish Experimental Constraints: duration, analysis cycle time, reporting times ...
2. Rank Order Factors: experimental, fixed, noise ...
3. Select Statistical Design: factorial, 2^k, fractional factorial, nested, split plot ...
4. Write the Test Plan to include sample matrices, sample data, and sample output
III. Produce the Observations
1. Randomize run order and block as needed
2. Execute and control factors; respond to the unexpected; record anomalies
IV. Ponder the Results
1. Acquire, reduce, look, explore, analyze ...
2. Draw conclusions, redesign, assess results, and plan further tests (as needed) ...
Plan (example matrix):
| In Front, Face East | In Front, Face West | In Back, Face East | In Back, Face West
Eyes Open, Left Hand | 0.43 | 0.58 | 0.52 | 0.40
Eyes Open, Right Hand | 0.62 | 0.29 | 0.28 | 0.36
Eyes Closed, Left Hand | 0.62 | 0.57 | 0.47 | 0.40
Eyes Closed, Right Hand | 0.42 | 0.26 | 0.42 | 0.47
Produce / Ponder: [Cause-Effect (CNX) Diagram: causes grouped under Manpower, Materials, Methods, Machines, Measurements, and Milieu (Environment), feeding the response/effect.]
-
30
Other Recent Project Highlights ...
2x increase in Missile A firing envelope in a week's testing
50% reduction in expenditures and runs, with increased data fidelity
45% reduction in sorties; helped define test objectives and procedures
33% reduction in bombs
25% reduction in resources using a screening method
$36K in lab costs saved on a $144K Expeditionary-Deployable Oxygen Concentrator System test
-
31
Summary
By the nature of test, we face inescapable risks in getting it right:
Fielding a poor-performing system
Failing to field a good system
Risks are knowable; we want to quantify and manage these risks
We need to adopt a disciplined, structured, systematic approach. Design of Experiments meets the need.
The recommendations are too important to be left to professional opinion alone; they need to be mathematical fact.
DOE is not statistics. DOE is a test strategy: 12 steps in 4 blocks. DOE is a test design tool.
DOE supplies the tools to explore multiple conditions while retaining power and confidence.
DOE will help put the capital "E" back in "T&e".
Following these practices gives the Best tests for our resources.
-
32
Backups
-
33
Challenge 3: Broader -- How Do Designed Experiments Solve This?
Designed Experiment (n.): Purposeful control of the inputs (factors) in such a way as to deduce their relationships (if any) with the outputs (responses).
Shooting Missiles
Inputs (Conditions): Missile A; Slew Sensor (TP, Radar); Target Type; Platform
Outputs (MOPs): Hits/misses; RMS trajectory deviation; P(damage); Miss distance (m)
-
34
Cause and Effect Diagram to Elicit Missile A Shot Test Conditions
[Cause-Effect diagram: branches for Manpower, Materials, Methods, Machines, Measurements, and Milieu (Environment), feeding the response: Acq Rng (km).]
Cause labels: C = Constant (procedures); N = Noise, no control; X = Control variable

Table of Conditions
N | Variable | Units | Range | Priority | Exp. Control | Design Range
1 | Target | Qual | many | H | X | trk, tank, bld
2 | Platform | Qual | Acft A/B | H | X | Acft A/B
3 | Time of Day | hour | 0-24 | H | X | Dawn, Noon
4 | Humidity | gm/m3 | 6-30 | H | X | Dry/Wet
5 | Tgt Velocity | mph | 0-40 | H | C | stationary
6 | Operator Skill | Qual | L, M, H | M | C | H (captive runs)
7 | Missile Model | Qual | V1234 | L | C | V1234
8 | Wind | m/s | 0-10 | L | N | measure, R
-
35
Coded Formulation Example -- Factorials (Solution)
Note the geometric shape you built: a cube.
All variables are orthogonal (at right angles).
General algorithm to build: vary A for half of the runs, B for half of A, and C for half of B ...

Case | Mav Ver | Target | T.O. Day
1 | -1 | -1 | -1
2 | -1 | -1 | +1
3 | -1 | +1 | -1
4 | -1 | +1 | +1
5 | +1 | -1 | -1
6 | +1 | -1 | +1
7 | +1 | +1 | -1
8 | +1 | +1 | +1

[Figure: cube with axes Msl Ver, Target, and T.O. Day; corners at the -/+ levels.]
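The coded 2^3 table can be generated and its orthogonality checked in a few lines. A sketch (standard Yates-style ordering, last factor alternating fastest, matching the table above):

```python
from itertools import product
import numpy as np

# Build the 2^3 coded design: first factor varies slowest,
# matching "vary A for half of the runs, B for half of A, C for half of B"
design = np.array(list(product([-1, +1], repeat=3)))
print(design)

# Orthogonality: every pair of columns has a zero dot product, so the
# Gram matrix is 8*I and each factor's effect is estimated independently
print(design.T @ design)
```

Orthogonality is what lets a factorial estimate every main effect (and interaction) from the same 8 runs, instead of spending runs on one factor at a time.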
-
36
Projection Property of Factorial Designs
Two-level factorials are powerful for exploring
They form the basis for many more elegant designs
They can be augmented to fit more complicated models
Factorials project (collapse) into replicates in fewer variables
[Figure: a 2^3 cube in Mav Version, Target, and Time of Day collapses into a replicated 2^2 square in Mav Version and Time of Day when Target is dropped.]
-
37
Desired End State
Leadership comprehends what DOE is and why to implement it
Most organizations' leadership supports the use of design of experiments where appropriate
Many new programs can benefit from the application of DOE
-
38
Why DOE -- Postscript
[Figure: process knowledge over time, with a GAP between the current level of knowledge and perfect knowledge.]
We use DOE to interrogate the process and improve our knowledge of how our process works. The goal is a systematic method to efficiently and unambiguously improve our outcomes.
Compared to any other systematic method, DOE designs:
Yield Better process understanding
Can be planned and analyzed Faster
Are Cheaper, using only 20-80% of the usual resources
-
39
But Why Learn DOE?
Why: It's a way to build better tests
DOE is faster, less expensive, and more informative than alternative methods
Results are objective, credible, and easy to defend
Planning and analysis are systematic and require less effort
It's a way to succeed if you lack test experience
Why not ...
"If DOE is so good, why haven't I heard of it before?"
"Aren't these ideas new and unproven?"
"It takes too many assumptions -- it's all theoretical!"
"To call in the statistician after the experiment is ... asking him to perform a postmortem examination: he may be able to say what the experiment died of."
DOE founder Sir Ronald A. Fisher, address to the Indian Statistical Congress, 1938
-
40
Why DOE tests are best: scientific number of runs (deep) and robust design (broad)
Factorial (crossed) designs let us learn more from the same number of assets
We can also use factorials to reduce assets while maintaining confidence and power
Or we can combine the two

4 reps, 1 variable:
15 deg: 4 runs | 30 deg: 4 runs

2 reps, 2 variables:
| 15 deg | 30 deg
10 mile | 2 | 2
20 mile | 2 | 2

1 rep, 3 variables:
| 15 deg | 30 deg
Truck Tgt, 10 mile | 1 | 1
Truck Tgt, 20 mile | 1 | 1
Bldg Tgt, 10 mile | 1 | 1
Bldg Tgt, 20 mile | 1 | 1

1/2 rep, 4 variables: [half fraction adding Tail 236 / Tail 103: one run in half of the 16 cells]

All four designs share the same power and confidence
-
41
Challenge 3: Systematically Search the Relevant Battlespace
Factorial (crossed) designs let us learn more from the same number of assets
We can also use factorials to reduce assets while maintaining confidence and power
Or we can combine the two

4 reps, 1 variable:
MSA-V1: 4 runs | MSA-V2: 4 runs

2 reps, 2 variables:
| MSA-V1 | MSA-V2
Truck | 2 | 2
Tank | 2 | 2

1 rep, 3 variables:
| MSA-V1 | MSA-V2
Site A (Wet), Truck | 1 | 1
Site A (Wet), Tank | 1 | 1
Site B (Dry), Truck | 1 | 1
Site B (Dry), Tank | 1 | 1

1/2 rep, 4 variables: [half fraction adding Dawn (Transition) / Midday (Stable): one run in half of the 16 cells]

All four designs share the same power and confidence