Factorial Experiments 2^k

Post on 14-Apr-2017


1

Statistics

CSE 807

2

Experimental Design and Analysis

How to:
• Design a proper set of experiments for measurement or simulation.
• Develop a model that best describes the data obtained.
• Estimate the contribution of each alternative to the performance.
• Isolate the measurement errors.
• Estimate confidence intervals for model parameters.
• Check if the alternatives are significantly different.
• Check if the model is adequate.

3

Example

• Personal workstation design.
• Processor: 68000, Z80, or 8086.
• Memory size: 512K, 2M, or 8M bytes.
• Number of disks: One, two, three, or four.
• Workload: Secretarial, managerial, or scientific.
• User education: High school, college, or post-graduate level.

4

Terminology

• Response Variable: Outcome.
  E.g., throughput, response time.
• Factors: Variables that affect the response variable.
  E.g., CPU type, memory size, number of disk drives, workload used, and user’s educational level.
  Also called predictor variables or predictors.
• Levels: The values that a factor can assume.
  E.g., the CPU type has three levels: 68000, Z80, or 8086. The number of disk drives has four levels.
  Also called treatments.

5

Terminology (cont’d)

• Primary Factors: The factors whose effects need to be quantified.
  E.g., CPU type, memory size, and number of disk drives.
• Secondary Factors: Factors whose impact need not be quantified.
  E.g., the workloads.
• Replication: Repetition of all or some experiments.

6

Terminology (cont’d)

• Design: The number of experiments, the factor levels, and the number of replications for each experiment.
  E.g., full factorial design with 5 replications: 3 × 3 × 4 × 3 × 3 = 324 experiments, each repeated five times.
• Experimental Unit: Any entity that is used for experiments.
  E.g., users. Generally, there is no interest in comparing the units.
  Goal: minimize the impact of variation among the units.
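The count of 324 experiments can be checked by enumerating the design. The sketch below assumes the factor levels of the workstation example; the dictionary keys and variable names are illustrative only.

```python
from itertools import product

# Factor levels taken from the workstation example (names are illustrative).
factors = {
    "cpu": ["68000", "Z80", "8086"],
    "memory": ["512K", "2M", "8M"],
    "disks": [1, 2, 3, 4],
    "workload": ["secretarial", "managerial", "scientific"],
    "education": ["high school", "college", "post-graduate"],
}

# A full factorial design uses every combination of factor levels.
design = list(product(*factors.values()))
replications = 5

num_experiments = len(design)                      # 3 * 3 * 4 * 3 * 3
num_observations = num_experiments * replications  # each experiment repeated 5 times
print(num_experiments, num_observations)
```

With five replications the study needs 324 × 5 = 1620 observations in total.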

7

Terminology (cont’d)

• Interaction => Effect of one factor depends upon the level of the other.

Non-interacting Factors
        A1    A2
B1       3     5
B2       6     8

Interacting Factors
        A1    A2
B1       3     5
B2       6     9
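One way to test a 2 × 2 table for interaction is the difference of differences: compare the effect of moving from A1 to A2 at B1 with the same effect at B2. The sketch below applies this to the two tables (non-interacting: 3, 5 / 6, 8; interacting: 3, 5 / 6, 9). The function name and dictionary layout are assumptions made for illustration.

```python
def interaction(table):
    """Difference of differences for a 2x2 table keyed by (a_level, b_level)."""
    effect_of_a_at_b1 = table[("A2", "B1")] - table[("A1", "B1")]
    effect_of_a_at_b2 = table[("A2", "B2")] - table[("A1", "B2")]
    # Zero means the effect of A does not depend on the level of B.
    return effect_of_a_at_b2 - effect_of_a_at_b1

non_interacting = {("A1", "B1"): 3, ("A2", "B1"): 5,
                   ("A1", "B2"): 6, ("A2", "B2"): 8}
interacting = {("A1", "B1"): 3, ("A2", "B1"): 5,
               ("A1", "B2"): 6, ("A2", "B2"): 9}

print(interaction(non_interacting))  # 0: A adds 2 at both levels of B
print(interaction(interacting))      # 1: A adds 2 at B1 but 3 at B2
```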

8

Common Mistakes in Experimentation

1. The variation due to experimental error is ignored.
2. Important parameters are not controlled.
3. Effects of different factors are not isolated.
4. Simple one-factor-at-a-time designs are used.
5. Interactions are ignored.
6. Too many experiments are conducted.
   Better: two phases.

9

Types of Experimental Designs

• Simple Designs: Vary one factor at a time.
  – # of Experiments = $1 + \sum_{i=1}^{k} (n_i - 1)$, where $n_i$ is the number of levels of factor $i$.
  – Not statistically efficient.
  – Wrong conclusions if the factors have interaction.
  – Not recommended.
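As a quick check of the simple-design count, the sketch below evaluates 1 + Σ(n_i − 1) for the workstation example, assuming level counts of 3, 3, 4, 3, and 3. The function name is illustrative.

```python
def simple_design_runs(levels):
    """One baseline run plus (n_i - 1) extra runs for each factor varied alone."""
    return 1 + sum(n - 1 for n in levels)

# Levels per factor: CPU, memory, disks, workload, education.
levels = [3, 3, 4, 3, 3]
runs = simple_design_runs(levels)
print(runs)  # 12
```

Twelve runs is far fewer than the 324 combinations of these levels, but the saving comes at the cost of the drawbacks listed above.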

10

Types of Experimental Designs (cont’d)

• Full Factorial Design: All combinations.
  – # of Experiments = $\prod_{i=1}^{k} n_i$
  – Can find the effect of all factors.
  – Too much time and money.
  – May try a 2^k design first.

11

Types of Experimental Designs (cont’d)

• Fractional Factorial Designs:
  – Save time and expense.
  – Less information.
  – May not get all interactions.
  – Not a problem if the interactions are negligible.

12

A Sample Fractional Factorial Design

Experiment   CPU      Memory   Workload      Educational
Number                Level    Type          Level
1            68000    512K     Managerial    High School
2            68000    2M       Scientific    Post-graduate
3            68000    8M       Secretarial   College
4            Z80      512K     Scientific    College
5            Z80      2M       Secretarial   High School
6            Z80      8M       Managerial    Post-graduate
7            8086     512K     Secretarial   Post-graduate
8            8086     2M       Managerial    College
9            8086     8M       Scientific    High School
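A full factorial design over these four three-level factors would need 3^4 = 81 experiments; the sample design uses 9. The sketch below checks one property of the nine runs: every pair of factors sees each combination of their levels exactly once. The helper name `is_pairwise_balanced` is hypothetical, introduced only for this illustration.

```python
from collections import Counter
from itertools import combinations

# The nine runs: (CPU, memory level, workload type, educational level).
runs = [
    ("68000", "512K", "Managerial", "High School"),
    ("68000", "2M", "Scientific", "Post-graduate"),
    ("68000", "8M", "Secretarial", "College"),
    ("Z80", "512K", "Scientific", "College"),
    ("Z80", "2M", "Secretarial", "High School"),
    ("Z80", "8M", "Managerial", "Post-graduate"),
    ("8086", "512K", "Secretarial", "Post-graduate"),
    ("8086", "2M", "Managerial", "College"),
    ("8086", "8M", "Scientific", "High School"),
]

def is_pairwise_balanced(runs):
    """True if every pair of factors covers each level combination exactly once."""
    for i, j in combinations(range(len(runs[0])), 2):
        counts = Counter((run[i], run[j]) for run in runs)
        levels_i = {run[i] for run in runs}
        levels_j = {run[j] for run in runs}
        all_pairs_present = len(counts) == len(levels_i) * len(levels_j)
        if not all_pairs_present or set(counts.values()) != {1}:
            return False
    return True

print(is_pairwise_balanced(runs))  # True
```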

13

Exercise

• The performance of a system being designed depends upon the following three factors:
  a. CPU type: 68000, 8086, 80286
  b. Operating system type: CPM, MS-DOS, UNIX
  c. Disk drive type: A, B, C
  How many experiments are required to analyze the performance if
  a. There is significant interaction among factors.
  b. There is no interaction among factors.
  c. The interactions are small compared to main effects.

14

2^k Factorial Designs

• k factors, each at two levels.
• Easy to analyze.
• Helps in sorting out the impact of factors.
• Good at the beginning of a study.
• Valid only if the effect is unidirectional.
  E.g., memory size, the number of disk drives.

15

2^2 Factorial Designs

• Two factors, each at two levels.

Performance in MIPS

Cache Size    Memory Size
              4M Bytes    16M Bytes
1K            15          45
2K            25          75

$x_A = -1$ if 4M bytes memory, $+1$ if 16M bytes memory
$x_B = -1$ if 1K bytes cache, $+1$ if 2K bytes cache

16

Model

$y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$

Substituting the four observations:
$15 = q_0 - q_A - q_B + q_{AB}$
$45 = q_0 + q_A - q_B - q_{AB}$
$25 = q_0 - q_A + q_B - q_{AB}$
$75 = q_0 + q_A + q_B + q_{AB}$

Solving for the four unknowns:
$y = 40 + 20 x_A + 10 x_B + 5 x_A x_B$

Interpretation:
• Mean performance = 40 MIPS
• Effect of memory = 20 MIPS
• Effect of cache = 10 MIPS
• Interaction between memory and cache = 5 MIPS
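Because the model has four parameters and there are four observations, the fitted model should reproduce every observation exactly. A minimal check, with the function name `predict` chosen only for illustration:

```python
def predict(x_a, x_b, q0=40, q_a=20, q_b=10, q_ab=5):
    """Evaluate y = q0 + qA*xA + qB*xB + qAB*xA*xB for coded levels -1 or +1."""
    return q0 + q_a * x_a + q_b * x_b + q_ab * x_a * x_b

# Observed MIPS keyed by (xA, xB): xA codes memory size, xB codes cache size.
observations = {(-1, -1): 15, (1, -1): 45, (-1, 1): 25, (1, 1): 75}

residuals = {levels: y - predict(*levels) for levels, y in observations.items()}
print(residuals)  # every residual is 0
```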

17

Computation of Effects

Experiment    A     B     y
1            -1    -1     y1
2             1    -1     y2
3            -1     1     y3
4             1     1     y4

Model: $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$

Substitution:
$y_1 = q_0 - q_A - q_B + q_{AB}$
$y_2 = q_0 + q_A - q_B - q_{AB}$
$y_3 = q_0 - q_A + q_B - q_{AB}$
$y_4 = q_0 + q_A + q_B + q_{AB}$

18

Computation of Effects (cont’d)

Solution:
$q_0 = \frac{1}{4}(y_1 + y_2 + y_3 + y_4)$
$q_A = \frac{1}{4}(-y_1 + y_2 - y_3 + y_4)$
$q_B = \frac{1}{4}(-y_1 - y_2 + y_3 + y_4)$
$q_{AB} = \frac{1}{4}(y_1 - y_2 - y_3 + y_4)$

Notice that effects are linear combinations of responses.
For $q_A$, $q_B$, and $q_{AB}$ the sum of the coefficients is zero => contrasts.
Notice (each product is divided by 4):
$q_A$ = Column A × Column y
$q_B$ = Column B × Column y
$q_{AB}$ = Column A × Column B × Column y
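The column-product rule translates directly into code. This is a minimal sketch assuming the responses are listed in the order of the experiment table (A alternating fastest); the function name `effects_2x2` is illustrative.

```python
def effects_2x2(y):
    """Effects of a 2^2 design from responses [y1, y2, y3, y4]."""
    col_i = [1, 1, 1, 1]
    col_a = [-1, 1, -1, 1]
    col_b = [-1, -1, 1, 1]
    col_ab = [a * b for a, b in zip(col_a, col_b)]  # entry-wise product of A and B

    def column_times_y(col):
        return sum(sign * value for sign, value in zip(col, y))

    n = len(y)
    return {
        "q0": column_times_y(col_i) / n,
        "qA": column_times_y(col_a) / n,
        "qB": column_times_y(col_b) / n,
        "qAB": column_times_y(col_ab) / n,
    }

effects = effects_2x2([15, 45, 25, 75])  # memory-cache data
print(effects)  # {'q0': 40.0, 'qA': 20.0, 'qB': 10.0, 'qAB': 5.0}
```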

19

Sign Table Method

  I     A     B    AB     y
  1    -1    -1     1    15
  1     1    -1    -1    45
  1    -1     1    -1    25
  1     1     1     1    75
160    80    40    20    Total
 40    20    10     5    Total/4

20

Allocation of Variation

• Importance of a factor = proportion of the variation explained
• Sample variance of y:
  $s_y^2 = \frac{\sum_{i=1}^{2^2} (y_i - \bar{y})^2}{2^2 - 1}$
• Variation of y = numerator of the sample variance = sum of squares total (SST):
  $\mathrm{SST} = \sum_{i=1}^{2^2} (y_i - \bar{y})^2$

21

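Allocation of Variation (cont’d)

For a 2^2 design:
$\mathrm{SST} = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2$

• Variation due to A: $\mathrm{SSA} = 2^2 q_A^2$
• Variation due to B: $\mathrm{SSB} = 2^2 q_B^2$
• Variation due to interaction: $\mathrm{SSAB} = 2^2 q_{AB}^2$

SST = SSA + SSB + SSAB
Fraction of variation explained by A = $\frac{\mathrm{SSA}}{\mathrm{SST}}$
Variation ≠ Variance: the variation is only the numerator of the variance.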

22

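Derivation

Model: $y_i = q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi}$

Notice:
1. The sum of entries in each column is zero:
   $\sum_{i=1}^{4} x_{Ai} = 0; \quad \sum_{i=1}^{4} x_{Bi} = 0; \quad \sum_{i=1}^{4} x_{Ai} x_{Bi} = 0$
2. The sum of the squares of entries in each column is 4:
   $\sum_{i=1}^{4} x_{Ai}^2 = 4; \quad \sum_{i=1}^{4} x_{Bi}^2 = 4; \quad \sum_{i=1}^{4} (x_{Ai} x_{Bi})^2 = 4$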

23

Derivation (cont’d)

3. The columns are orthogonal (inner product of any two columns is zero):
   $\sum_{i=1}^{4} x_{Ai} x_{Bi} = 0$
   $\sum_{i=1}^{4} x_{Ai} (x_{Ai} x_{Bi}) = 0$
   $\sum_{i=1}^{4} x_{Bi} (x_{Ai} x_{Bi}) = 0$
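The three properties used in the derivation (zero column sums, squared entries summing to 4, and mutual orthogonality) can be checked directly on the sign columns of the 2^2 design. A small sketch, with variable names chosen for illustration:

```python
from itertools import combinations

# Sign columns of the 2^2 design, in the order of the experiment table.
x_a = [-1, 1, -1, 1]
x_b = [-1, -1, 1, 1]
x_ab = [a * b for a, b in zip(x_a, x_b)]
columns = {"A": x_a, "B": x_b, "AB": x_ab}

def dot(u, v):
    """Inner product of two columns."""
    return sum(p * q for p, q in zip(u, v))

# 1. Each column sums to zero.
sums_to_zero = all(sum(col) == 0 for col in columns.values())
# 2. The squares of the entries in each column sum to 4.
squares_sum_to_four = all(dot(col, col) == 4 for col in columns.values())
# 3. Any two distinct columns have a zero inner product.
mutually_orthogonal = all(
    dot(columns[m], columns[n]) == 0 for m, n in combinations(columns, 2)
)

print(sums_to_zero, squares_sum_to_four, mutually_orthogonal)  # True True True
```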

24

Derivation (cont’d)

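Sample mean:

$\bar{y} = \frac{1}{4} \sum_{i=1}^{4} y_i$
$= \frac{1}{4} \sum_{i=1}^{4} (q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})$
$= \frac{1}{4} \sum_{i=1}^{4} q_0 + \frac{1}{4} q_A \sum_{i=1}^{4} x_{Ai} + \frac{1}{4} q_B \sum_{i=1}^{4} x_{Bi} + \frac{1}{4} q_{AB} \sum_{i=1}^{4} x_{Ai} x_{Bi}$
$= q_0$

The last three sums vanish because the entries in each column sum to zero.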

25

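Derivation (cont’d)

Variation of y:

$\sum_{i=1}^{4} (y_i - \bar{y})^2 = \sum_{i=1}^{4} (q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})^2$
$= \sum_{i=1}^{4} (q_A x_{Ai})^2 + \sum_{i=1}^{4} (q_B x_{Bi})^2 + \sum_{i=1}^{4} (q_{AB} x_{Ai} x_{Bi})^2 + \text{product terms}$
$= q_A^2 \sum_{i=1}^{4} x_{Ai}^2 + q_B^2 \sum_{i=1}^{4} x_{Bi}^2 + q_{AB}^2 \sum_{i=1}^{4} (x_{Ai} x_{Bi})^2 + 0$
$= 4 q_A^2 + 4 q_B^2 + 4 q_{AB}^2$

The product terms are zero because the columns are orthogonal.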

26

Example

Memory-cache study:

$\bar{y} = \frac{1}{4}(15 + 45 + 25 + 75) = 40$

Total variation:

$\sum_{i=1}^{4} (y_i - \bar{y})^2 = 25^2 + 5^2 + 15^2 + 35^2 = 2100 = 4 \times 20^2 + 4 \times 10^2 + 4 \times 5^2$

• Total variation = 2100
• Variation due to memory = 1600 (76%)
• Variation due to cache = 400 (19%)
• Variation due to interaction = 100 (5%)
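The allocation for the memory-cache study can be reproduced in a few lines by computing SST directly from the responses and again from the effects. This is a sketch; the dictionary labels are illustrative.

```python
y = [15, 45, 25, 75]  # memory-cache responses in MIPS
mean = sum(y) / len(y)

# SST computed directly from the deviations about the mean.
sst_direct = sum((value - mean) ** 2 for value in y)

# SST computed from the effects: each term is 2^2 * q^2.
effects = {"memory (A)": 20, "cache (B)": 10, "interaction (AB)": 5}
sum_of_squares = {name: 4 * q ** 2 for name, q in effects.items()}
sst_from_effects = sum(sum_of_squares.values())

percent = {name: round(100 * ss / sst_from_effects)
           for name, ss in sum_of_squares.items()}
print(sst_direct, sst_from_effects)  # 2100.0 2100
print(percent)  # {'memory (A)': 76, 'cache (B)': 19, 'interaction (AB)': 5}
```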

27

Case Study: Interconnection Networks

Memory interconnection networks: Omega and Crossbar.
Memory reference patterns: Random and Matrix.
Fixed factors:
1. Number of processors was fixed at 16.
2. Queued requests were not buffered but blocked.
3. Circuit switching instead of packet switching.
4. Random arbitration instead of round robin.
5. Infinite interleaving of memory => no memory bank contention.

28

2^2 Design for Interconnection Networks

Factors Used in the Interconnection Network Study

Symbol   Factor                  Level -1    Level 1
A        Type of the network     Crossbar    Omega
B        Address pattern used    Random      Matrix

Responses

 A     B     Throughput T    90% Transit N    Response R
-1    -1     0.6041          3                1.655
-1     1     0.4220          5                2.378
 1    -1     0.7922          2                1.262
 1     1     0.4717          4                2.190

29

Interconnection Network Study (cont’d)

Parameter    Mean Estimate                   Variation Explained
             T          N        R           T         N       R
q0           0.5725     3.5      1.871
qA           0.0595    -0.5     -0.145       17.2%     20%     10.9%
qB          -0.1257     1.0      0.413       77.0%     80%     87.8%
qAB         -0.0346     0.0      0.051        5.8%      0%      1.3%
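The estimates above can be recomputed from the four runs. The sketch below rests on two assumptions: the runs are taken in the order (A, B) = (-1, -1), (-1, 1), (1, -1), (1, 1), and the first throughput value is 0.6041. Both are the choices that reproduce the listed estimates, including the mean throughput of 0.5725. The function name `analyze_2x2` is illustrative.

```python
def analyze_2x2(signs_a, signs_b, y):
    """Return (effects, percent of variation explained) for a 2^2 design."""
    n = len(y)
    signs_ab = [a * b for a, b in zip(signs_a, signs_b)]
    columns = {"q0": [1] * n, "qA": signs_a, "qB": signs_b, "qAB": signs_ab}
    effects = {name: sum(s * v for s, v in zip(col, y)) / n
               for name, col in columns.items()}
    # Each non-mean effect contributes 2^2 * q^2 to the total variation.
    squares = {name: n * q ** 2 for name, q in effects.items() if name != "q0"}
    sst = sum(squares.values())
    percent = {name: 100 * ss / sst for name, ss in squares.items()}
    return effects, percent

# Assumed run order: (A, B) = (-1, -1), (-1, 1), (1, -1), (1, 1).
signs_a = [-1, -1, 1, 1]
signs_b = [-1, 1, -1, 1]
responses = {
    "T": [0.6041, 0.4220, 0.7922, 0.4717],  # throughput (first value assumed)
    "N": [3, 5, 2, 4],                      # 90% transit time
    "R": [1.655, 2.378, 1.262, 2.190],      # response time
}

results = {name: analyze_2x2(signs_a, signs_b, y) for name, y in responses.items()}
for name, (effects, percent) in results.items():
    print(name, effects, percent)
```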

30

Interpretation of Results

• Average throughput = 0.5725
• Most effective factor = B = reference pattern
  => The address patterns chosen are very different.
• Reference pattern has an effect of -0.1257 and explains 77% of the variation in throughput.
• Effect of network type = 0.0595
  Omega networks = Average + 0.0595
  Crossbar networks = Average - 0.0595
  Difference between the two = 0.119
• Slight interaction (-0.0346) between reference pattern and network type.

31

General 2^k Factorial Designs

• k factors at two levels each.
• 2^k experiments.
• 2^k effects (counting the mean $q_0$):
  – k main effects
  – $\binom{k}{2}$ two-factor interactions
  – $\binom{k}{3}$ three-factor interactions
  – ...
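The effect counts are binomial coefficients, and together with the mean they add up to 2^k. A short sketch, with the function name chosen for illustration:

```python
from math import comb

def effect_counts(k):
    """Number of effects of each order in a 2^k design (order 1 = main effects)."""
    return {order: comb(k, order) for order in range(1, k + 1)}

counts = effect_counts(3)
total_with_mean = 1 + sum(counts.values())  # the mean q0 plus all effects
print(counts)           # {1: 3, 2: 3, 3: 1}
print(total_with_mean)  # 8 == 2**3
```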

32

2^k Design Example

Three factors in designing a machine:
• Cache size
• Memory size
• Number of processors

Factor                        Level -1    Level 1
A   Memory Size               4MB         16MB
B   Cache Size                1kB         2kB
C   Number of Processors      1           2

33

2^k Design Example (cont’d)

Cache Size    4M Bytes            16M Bytes
              1 Proc    2 Proc    1 Proc    2 Proc
1K Byte       14        46        22        58
2K Byte       10        50        34        86

  I     A     B     C    AB    AC    BC   ABC     y
  1    -1    -1    -1     1     1     1    -1    14
  1     1    -1    -1    -1    -1     1     1    22
  1    -1     1    -1    -1     1    -1     1    10
  1     1     1    -1     1    -1    -1    -1    34
  1    -1    -1     1     1    -1    -1     1    46
  1     1    -1     1    -1     1    -1    -1    58
  1    -1     1     1    -1    -1     1    -1    50
  1     1     1     1     1     1     1     1    86
320    80    40   160    40    16    24     8    Total
 40    10     5    20     5     2     3     1    Total/8
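The sign table method generalizes to any k. The sketch below builds the sign columns for an arbitrary number of two-level factors and applies them to the eight responses of this example, assuming the rows are ordered with the first factor alternating fastest. The function name `sign_table_effects` is hypothetical.

```python
from itertools import combinations, product
from math import prod

def sign_table_effects(factor_names, y):
    """Effects of a 2^k design via the sign table method."""
    k = len(factor_names)
    # Reverse each tuple so the first factor alternates fastest down the rows.
    rows = [levels[::-1] for levels in product((-1, 1), repeat=k)]
    effects = {}
    for order in range(k + 1):
        for combo in combinations(range(k), order):
            name = "".join(factor_names[i] for i in combo) or "I"
            # Column entry = product of the signs of the factors involved
            # (the empty product is 1, which gives the I column).
            column = [prod(row[i] for i in combo) for row in rows]
            effects[name] = sum(s * v for s, v in zip(column, y)) / len(y)
    return effects

y = [14, 22, 10, 34, 46, 58, 50, 86]
effects = sign_table_effects(["A", "B", "C"], y)
print(effects)
```

For these responses the ABC column total is 8, which gives an ABC effect of 1.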

34

Analysis

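$\mathrm{SST} = 2^3 (q_A^2 + q_B^2 + q_C^2 + q_{AB}^2 + q_{AC}^2 + q_{BC}^2 + q_{ABC}^2)$
$= 8 (10^2 + 5^2 + 20^2 + 5^2 + 2^2 + 3^2 + 1^2)$
$= 800 + 200 + 3200 + 200 + 32 + 72 + 8 = 4512$

Percentages explained (A, B, C, AB, AC, BC, ABC):
18% + 4% + 71% + 4% + 1% + 2% + 0% = 100%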
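The percentages follow from the effects in a few lines. A sketch, assuming the effect values 10, 5, 20, 5, 2, 3, and 1 for A, B, C, AB, AC, BC, and ABC:

```python
# Effects of the 2^3 memory/cache/processor example.
effects = {"A": 10, "B": 5, "C": 20, "AB": 5, "AC": 2, "BC": 3, "ABC": 1}
n_runs = 2 ** 3

# Each effect contributes 2^3 * q^2 to the total variation.
sum_of_squares = {name: n_runs * q ** 2 for name, q in effects.items()}
sst = sum(sum_of_squares.values())
percent = {name: round(100 * ss / sst) for name, ss in sum_of_squares.items()}
most_important = max(sum_of_squares, key=sum_of_squares.get)

print(sst)             # 4512
print(percent)         # A 18, B 4, C 71, AB 4, AC 1, BC 2, ABC 0
print(most_important)  # C
```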

Number of Processors (C) is the most important factor

35

Exercise

Analyze the 2^3 design:

        A1              A2
        C1      C2      C1      C2
B1      100     15      120     10
B2      40      30      20      50

a. Quantify main effects and all interactions.
b. Quantify percentages of variation explained.
c. Sort the variables in the order of decreasing importance.