Introduction to the Design and Analysis of Experiments

Violet R. Syrotiuk, School of Computing, Informatics, and Decision Systems Engineering

Transcript of "Introduction to the Design and Analysis of Experiments"

Page 1

Introduction to the Design and Analysis of Experiments

Violet R. Syrotiuk, School of Computing, Informatics, and Decision Systems Engineering

Page 2

Complex Engineered Systems

•  What makes an engineered system complex?
   –  Its large size.
   –  Humans control their structure, operation, and evolution over time.

•  Examples of complex engineered networks:
   –  The power grid.
   –  Transportation networks.
   –  Computer networks (e.g., the Internet).

Page 3

Experimentation

•  Experimentation is a way to improve our understanding of CESs.

•  Experiments are used widely, e.g., for:
   –  Process characterization and optimization.
   –  Improving the reliability and performance of products and processes.
   –  Product/process design and development.
   –  Achieving product and process robustness.

Page 4

“All experiments are designed experiments; some are poorly designed, some are well-designed.”

George E. P. Box

Page 5

Engineering Experiments

• Reduce time to design/develop new products & processes

• Improve performance of existing processes

• Improve reliability and performance of products

• Achieve product & process robustness

• Evaluation of materials, design alternatives, setting component & system tolerances, etc.

A General Model of a Process/System†

† “Design and Analysis of Experiments,” by Douglas C. Montgomery, Wiley, 8th edition, 2013.

Page 6

Four Eras in the History of DoE

•  The agricultural origins, 1918-1940s:
   –  R. A. Fisher and his co-workers.
   –  Profound impact on agricultural science.
   –  Factorial designs, ANOVA.

•  The first industrial era, 1951-late 1970s:
   –  Box and Wilson, response surfaces.
   –  Applications in the chemical and process industries.

Page 7

Four Eras in the History of DoE (cont’d)

•  The second industrial era, late 1970s-1990:
   –  Quality improvement initiatives in many companies.
   –  Taguchi and robust parameter design, process robustness.

•  The modern era, beginning circa 1990.

Page 8

Experimentation in CENs

•  From a recent workshop report†: “The science of experiment design is widely used in science and engineering disciplines, but is often ignored in the study of complex engineered networks. This in turn has led to a shortage of simulations that we can believe in, of experiments driven by empirical data, and of results that are statistically illuminating and reproducible in this field.”

† Networking and Information Technology Research and Development (NITRD), Large Scale Networking (LSN), Workshop Report on Complex Engineered Networks, September 2012.

Page 9

Factorial Designs

•  In a factorial experiment, all possible combinations of factor levels are tested.

•  The golf experiment:
   –  Type of driver.
   –  Type of ball.
   –  Walking versus riding.
   –  Type of beverage.
   –  Time of round, etc.

Page 10

The Experimental Design

•  An experiment is given by an N × k array.
   –  The k columns correspond to the factors.

•  Each factor Fi, 1 ≤ i ≤ k, has a set of levels Li.

•  Each of the N rows corresponds to a test in which each factor Fi is set to a level in Li.

•  For the two-factor factorial experiment:

   Test   Ball   Driver
    1      B      O
    2      B      R
    3      T      O
    4      T      R
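This slide describes the design as an N × k array built from all combinations of the factor levels. As a minimal sketch (not part of the handout), the Python below enumerates such a full factorial as a Cartesian product; the factor names and level codes follow the slide's two-factor golf table, and everything else is an assumption for illustration.

    from itertools import product

    # Factors and their levels; the Ball/Driver codes follow the slide's table.
    # Any dict of factor -> list of levels works the same way.
    factors = {
        "Ball":   ["B", "T"],
        "Driver": ["O", "R"],
    }

    def full_factorial(factors):
        """Return all N = |L1| * ... * |Lk| level combinations as a list of dicts."""
        names = list(factors)
        return [dict(zip(names, combo)) for combo in product(*factors.values())]

    for test, run in enumerate(full_factorial(factors), start=1):
        print(test, run)   # 1 {'Ball': 'B', 'Driver': 'O'}, 2 {'Ball': 'B', 'Driver': 'R'}, ...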

Page 11

Factorial Designs with Several Factors

Page 12

A Fractional Factorial
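The slide's figure is not reproduced in this transcript. As a hedged illustration of the general idea only (not the slide's own example), the sketch below builds a 2^(3-1) half fraction of three two-level factors A, B, C by keeping the runs that satisfy the generator C = AB, so only 4 of the 8 full-factorial runs are performed.

    from itertools import product

    # Hypothetical half fraction of a 2^3 design, factors coded as -1/+1.
    full = list(product([-1, +1], repeat=3))            # all 8 runs (A, B, C)

    # Keep only the runs satisfying the generator C = A*B.
    half_fraction = [(a, b, c) for (a, b, c) in full if c == a * b]

    for run in half_fraction:
        print(run)   # (-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)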

Page 13

Statistical Rigour

•  Understanding common statistical methods is invaluable in being able to represent results coherently and accurately.

•  When measuring a system that does not have fixed behaviour:
   –  Perform multiple measurements (replicates).
   –  Statistical error arises from variation that is uncontrolled; it is generally unavoidable.

Page 14

Sample Distributions

•  The relationship between the measures of centrality (mean, median, and mode) gives hints about the distribution of the data collected.

Excerpt from Small et al., “Does Systems Research Measure Up?” (Harvard CS TR-16-97), reproduced on the slide:

…sense of how the data is distributed and what the expected behavior of the system will be.

In general, if the mean and median are rather close, but the mode is vastly different (or there are two candidates for the mode), a bimodal or multi-modal distribution is suggested (see Figure 1b). As described above in Section 3.2.3, the standard deviation of a bimodal distribution can be quite large, which can serve as a check on the assumption that a distribution is normal.

It is important to note that these guidelines are not fool-proof; comparing the mean, median, and mode can only suggest the type of distribution from which data was collected. Unfortunately, there is no rule of thumb that always works, and when in doubt, the best course of action is to plot the data, look at it, and try to determine what is happening.

It is critical to select the appropriate metric of centrality in order to properly present data. “No mathematical rule can tell us which measure of central tendency will be most appropriate for any particular problem. Proper decisions rest upon knowledge of all factors in a given case, and upon basic honesty” [Gould96].

6.2 Expressing Variation

Measures of centrality are not sufficient to completely describe a data set. It is often helpful to include a measure of the variance of the data. A small variance implies that the mean is a good representative of the data, whereas a large variance implies that it is a poor one. In the papers we surveyed, we found that fewer than 15% of experiments included some measure of variance.

The most commonly used measure of variance is the standard deviation, which is a measure of how widely spread the data points are. As a rule of thumb, in a normal distribution, about 2/3 of the data falls within one standard deviation of the mean (in either direction, on the horizontal axis). 95% of the data falls within two standard deviations of the mean, and three standard deviations account for more than 99% of the data.

For example, in Figure 1a, which follows a normal distribution, the mean, median, and mode are equal, and the standard deviation is approximately 40% of the mean. However, in Figure 1b, which shows a bimodal distribution, the mean, median, and mode are quite different, and the standard deviation is 75% of the mean.³ Figure 1c shows an exponential distribution where the median and mode are close, but rather different than the mean. (We discuss techniques for determining the distribution of a data set in Section 6.3.)

3. The large standard deviation here is because the distribution is bimodal, but bimodal distributions do not necessarily have to have a large standard deviation. The peaks of a bimodal distribution can be close together; in this example they are not.

Another metric for analyzing the usefulness of the mean in an experiment is the margin of error. The margin of error expresses a range of values about the mean in which there is a high level of confidence that the true value falls. For example, if one were concluding that the latency of a disk seek is within four percent of the mean, …

[Figure 1. Sample distributions. The relationship between the mean, median, and mode gives hints about the distribution of the data collected. In a normal distribution, the mean is representative of the data set, while in an exponential distribution, the mode and median are more representative. In a bimodal distribution, no single metric accurately describes the data. Panels: A: Normal Distribution; B: Bimodal Distribution; C: Exponential Distribution. Each panel marks the mean, median, and mode.]
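To illustrate the excerpt's guideline about comparing measures of centrality, here is a small sketch that is not part of the slides: it computes the mean, median, a crude histogram-based mode estimate, and the standard deviation for synthetic normal, exponential, and bimodal samples. The sample sizes and distribution parameters are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic samples loosely mirroring Figure 1 (parameters are arbitrary).
    samples = {
        "normal":      rng.normal(loc=10.0, scale=4.0, size=5000),
        "exponential": rng.exponential(scale=1000.0, size=5000),
        "bimodal":     np.concatenate([rng.normal(8, 2, 2500), rng.normal(30, 2, 2500)]),
    }

    def crude_mode(data, bins=50):
        """Estimate the mode as the centre of the fullest histogram bin."""
        counts, edges = np.histogram(data, bins=bins)
        i = np.argmax(counts)
        return 0.5 * (edges[i] + edges[i + 1])

    for name, data in samples.items():
        mean, median, mode = data.mean(), np.median(data), crude_mode(data)
        sd = data.std(ddof=1)
        # Mean ~= median ~= mode suggests a normal shape; a mode far from the
        # mean/median (or two competing modes) suggests skewed or bimodal data.
        print(f"{name:11s} mean={mean:8.1f} median={median:8.1f} "
              f"mode~{mode:8.1f} sd={sd:8.1f} (sd/mean={sd / mean:.2f})")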


Page 15

Expressing Variation

•  Measures of centrality are not sufficient to completely describe a data set.

•  It is often helpful to include a measure of the variance of the data.
   –  A small variance implies that the mean is a good representative of the data, whereas a large variance implies that it is a poor one.

•  The most commonly used measure of variance is the standard deviation.
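As a hedged illustration of the standard deviation as a measure of spread, and of the 2/3, 95%, 99% rule of thumb quoted in the excerpt on the previous page, the sketch below (not from the handout) summarizes simulated replicate measurements; the measurement model and sample size are assumptions.

    import random
    import statistics

    random.seed(1)

    # Assume 100 replicate measurements of a response with uncontrolled noise.
    replicates = [50.0 + random.gauss(0.0, 5.0) for _ in range(100)]

    mean = statistics.fmean(replicates)
    sd = statistics.stdev(replicates)          # sample standard deviation
    print(f"mean = {mean:.2f}, standard deviation = {sd:.2f}")

    # Rule of thumb for roughly normal data: ~2/3, ~95%, >99% of the points
    # should fall within 1, 2, and 3 standard deviations of the mean.
    for k in (1, 2, 3):
        within = sum(abs(x - mean) <= k * sd for x in replicates) / len(replicates)
        print(f"within {k} sd: {within:.0%}")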

Page 16

Margin of Error

•  Another metric for analyzing the usefulness of the mean is the margin of error.
   –  The margin of error expresses a range of values about the mean in which there is a high level of confidence that the true value falls.
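One common way to compute such a margin of error is a confidence interval for the mean based on the t distribution, assuming the measurement error is roughly normal. The sketch below is not from the handout; the latency data and the use of scipy.stats are assumptions for illustration.

    import numpy as np
    from scipy import stats

    # Hypothetical replicate latency measurements (ms).
    data = np.array([102.1, 98.4, 101.7, 99.9, 100.8, 103.2, 97.6, 100.3])

    n = len(data)
    mean = data.mean()
    sem = data.std(ddof=1) / np.sqrt(n)              # standard error of the mean

    # 95% margin of error from the t distribution with n-1 degrees of freedom.
    t_crit = stats.t.ppf(0.975, df=n - 1)
    margin = t_crit * sem

    print(f"mean = {mean:.2f} ms, margin of error = ±{margin:.2f} ms (95% confidence)")
    print(f"interval: [{mean - margin:.2f}, {mean + margin:.2f}] ms")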

Page 17

Graphing and Error Margins

•  The value of the error margins depicts the results in completely different ways.†

Excerpt (continued) from the same report†:

…the margin of error would be four percent. Assuming that this margin of error had been computed for a 0.05 level of significance, then if the experiment were repeated 100 times, 95 of those times the observed latency would be within four percent of the value computed in the corresponding experiment.

Figure 2 is an example of the importance of showing the margin of error. In our example, Figure 2a is put forward to support a claim that a new technique has reduced latency by 10%. However, this graph does not include any indication of the margin of error, or confidence intervals on the data. If the margin of error is small, as in Figure 2b, it is reasonable to believe that latency has been reduced. Figure 2c, however, shows a margin of error that is as large as the stated improvement. The 10% reduction in latency falls within the error bars, and might have arisen from experimental error.

It is very useful to be able to place results in the context of an error margin, and it is essential to be able to do so when trying to determine the value of a new technique.

A related problem, which appears when measurements are taken, is mistaking measurement precision for measurement accuracy. For example, on many versions of Unix, gettimeofday() returns the current time in microseconds (its precision), but is only updated every ten milliseconds (its accuracy). Timing measurements taken using gettimeofday() on these systems will be rounded up (or down) to the nearest multiple of ten milliseconds. In situations such as these, it is critical to be aware not only of how precise a measurement is, but also how accurate. On a system with a 10 ms clock granularity, it is a waste of time to attempt to make distinctions at the microsecond level.

6.3 Probability Distributions and Testing

As stated above, normal distributions are commonly found in nature, but rarely found in computer science. When measuring experimental systems, one is more likely to encounter other types of distributions. Unfortunately, it is not a trivial task to correctly identify which distribution best models a given dataset. From a statistical point of view, an estimate of the mean and standard deviation can be calculated from measured data, but without knowing the actual distribution, it is impossible to calculate the true mean and standard deviation. Fortunately, there are simple methods for determining the distribution of a dataset.

Plotting a histogram of the values in a sampled data set is an easy way to get an idea of what type of distribution the data follows. Figure 1 shows examples of several common distributions with noticeably different shapes. Normal distributions (Figure 1a) are common in the natural sciences, and often represent the characteristics of repeated samples of a homogeneous population. As mentioned in Section 6.1, skewed distributions often occur when some phenomenon limits either the high or low values in a distribution. Personal income is an example of a skewed distribution. An exponential distribution (Figure 1c) might be seen when modeling a continuous memoryless system, such as the inter-arrival time of net…

[Figure 2. Graphing and Error Margins. The value of the error margins will depict the results in completely different ways. Panels: A: Latency Improvement without error margins; B: Latency Improvement with small error margins; C: Latency Improvement with large error margins. Each panel plots Latency (ms) against Number of packets for two curves, Graph 1 and Graph 2.]


† C. Small, N. Ghosh, H. Saleeb, M. Seltzer, and K. Smith, “Does Systems Research Measure Up?” Harvard Computer Science Group, Technical Report TR-16-97.
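Since the excerpt argues that a claimed improvement should be shown with its error margins, here is a hedged matplotlib sketch, not the report's own figure, that plots two latency curves with error bars. All data values, and the choice of ±1 standard deviation as the error bar, are assumptions.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical latency measurements (ms) versus number of packets, for a
    # baseline and a "new technique", with per-point standard deviations.
    packets = np.array([2, 4, 6, 8, 10, 12])
    baseline = np.array([120, 118, 115, 113, 112, 110])
    improved = np.array([108, 107, 104, 102, 101, 99])
    baseline_err = np.array([3, 3, 4, 3, 4, 3])
    improved_err = np.array([4, 3, 3, 4, 3, 4])

    plt.errorbar(packets, baseline, yerr=baseline_err, capsize=3, label="Graph 1 (baseline)")
    plt.errorbar(packets, improved, yerr=improved_err, capsize=3, label="Graph 2 (new technique)")
    plt.xlabel("Number of packets")
    plt.ylabel("Latency (ms)")
    plt.title("Latency improvement with error margins")
    plt.legend()
    plt.show()
    # If the error bars of the two curves overlap, the apparent improvement may
    # fall within experimental error, as in Figure 2c of the excerpt.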

Page 18

Probability Distributions & Testing

•  Plotting a histogram of the values in a sampled data set is an easy way to get an idea of what type of distribution the data follows.

•  The χ² test can be used to determine if sampled data follows a specific distribution.
   –  χ² can be used to obtain a p-value from a family of χ² distributions; the larger the p-value, the higher the probability that the measured distribution matches the candidate distribution.
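A hedged sketch of one way to carry out such a goodness-of-fit check with scipy: bin the sampled data, compute the expected counts in the same bins under a candidate distribution (here an exponential fitted by its mean), and apply the χ² test. The data, the quantile-based binning, and the candidate distribution are all assumptions for illustration.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)

    # Hypothetical measurements, e.g. inter-arrival times (ms).
    data = rng.exponential(scale=10.0, size=500)

    # Bin the data with quantile-based edges so each bin has a reasonable count.
    edges = np.quantile(data, np.linspace(0.0, 1.0, 9))   # 8 bins
    observed, _ = np.histogram(data, bins=edges)

    # Expected counts over the same bins under a candidate exponential
    # distribution fitted by its mean, rescaled so both totals match.
    candidate = stats.expon(scale=data.mean())
    probs = np.diff(candidate.cdf(edges))
    expected = probs / probs.sum() * observed.sum()

    # One parameter (the scale) was estimated from the data, so drop one extra
    # degree of freedom beyond the usual number-of-bins minus one.
    chi2_stat, p_value = stats.chisquare(observed, expected, ddof=1)
    print(f"chi-square = {chi2_stat:.2f}, p-value = {p_value:.3f}")
    # A small p-value is evidence against the candidate distribution;
    # a large p-value means the data is consistent with it.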

Page 19

Basic Statistical Concepts

•  Hypothesis testing: a statistical hypothesis is a statement either about the parameters of a probability distribution or the parameters of a model.

   H0: μ1 = μ2   (null hypothesis)
   H1: μ1 ≠ μ2   (alternative hypothesis)

•  If the null hypothesis is rejected when it is true, a type I error has occurred.

•  If the null hypothesis is not rejected when it is false, a type II error has been made.
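As a hedged illustration of testing H0: μ1 = μ2 against H1: μ1 ≠ μ2, the sketch below applies a two-sample t-test from scipy to two sets of hypothetical measurements; the data and the 0.05 significance level are assumptions.

    from scipy import stats

    # Hypothetical response measurements under two treatments (e.g. two drivers).
    sample1 = [245, 250, 248, 252, 247, 249, 251, 246]
    sample2 = [252, 255, 251, 257, 254, 253, 256, 250]

    # Two-sided two-sample t-test of H0: mu1 == mu2 vs H1: mu1 != mu2.
    t_stat, p_value = stats.ttest_ind(sample1, sample2)
    print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")

    alpha = 0.05
    if p_value < alpha:
        print("Reject H0: the means differ at the 0.05 level.")
    else:
        print("Fail to reject H0 at the 0.05 level.")
    # Rejecting H0 when it is actually true would be a type I error;
    # failing to reject H0 when it is false would be a type II error.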

Page 20

Analysis of Variance (ANOVA)

•  Analysis of the fixed effects model.
   –  Estimation of model parameters.

•  Model adequacy checking.
   –  The normality assumption.
   –  Residuals: plots in time, versus fitted values, versus other variables.

•  Practical interpretation of results.
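A minimal sketch (an assumed example, not from the handout) of a one-way fixed-effects ANOVA with scipy, testing whether the mean response differs across three factor levels, followed by a simple residual check against the normality assumption.

    import numpy as np
    from scipy import stats

    # Hypothetical responses at three levels of a single factor.
    level_a = np.array([18.2, 19.1, 17.8, 18.5, 18.9])
    level_b = np.array([20.3, 21.0, 19.8, 20.6, 20.1])
    level_c = np.array([18.0, 18.4, 17.6, 18.8, 18.1])

    # One-way fixed-effects ANOVA: H0 is that all level means are equal.
    f_stat, p_value = stats.f_oneway(level_a, level_b, level_c)
    print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")

    # Model adequacy check: residuals = observation - fitted (group) mean.
    residuals = np.concatenate([g - g.mean() for g in (level_a, level_b, level_c)])
    shapiro_stat, shapiro_p = stats.shapiro(residuals)
    print(f"Shapiro-Wilk on residuals: W = {shapiro_stat:.3f}, p-value = {shapiro_p:.3f}")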

Page 21

Response Surface Methodology Framework

•  Factor screening.

•  Finding the region of the optimum.

•  Modelling and optimization of the response.
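To make the modelling-and-optimization step concrete, here is a hedged sketch (not from the handout) that fits a second-order (quadratic) response surface in two coded factors by ordinary least squares with numpy and locates its stationary point; the run settings and responses are assumed values for illustration only.

    import numpy as np

    # Assumed experimental runs: coded factor settings x1, x2 and measured response y.
    x1 = np.array([-1, -1,  1,  1,  0,  0,  0, -1,  1,  0], dtype=float)
    x2 = np.array([-1,  1, -1,  1,  0, -1,  1,  0,  0,  0], dtype=float)
    y  = np.array([76, 79, 80, 85, 90, 84, 86, 83, 88, 91], dtype=float)

    # Second-order model: y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + b22*x2^2
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, b1, b2, b12, b11, b22 = coef
    print("fitted coefficients:", np.round(coef, 3))

    # Stationary point of the fitted surface: solve grad = 0 for (x1, x2).
    B = np.array([[2 * b11, b12],
                  [b12, 2 * b22]])
    g = np.array([-b1, -b2])
    x_star = np.linalg.solve(B, g)
    print("stationary point (coded units):", np.round(x_star, 3))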

Page 22

Other Aspects of RSM

•  Robust parameter design and process robustness studies.
   –  Find levels of controllable variables that optimize mean response and minimize variability in the response transmitted from “noise” variables.
   –  Original approaches due to Taguchi (1980s).
   –  Modern approach based on RSM.

Page 23

Summary

•  There is much known about designing and analyzing experiments!
   –  Follow good practices to improve the repeatability and reproducibility of your experiments.