
IIE Transactions (2011) 43, 471–482
Copyright © "IIE" ISSN: 0740-817X print / 1545-8830 online
DOI: 10.1080/0740817X.2010.532854

A cautious approach to robust design with model parameter uncertainty

    DANIEL W. APLEY1,∗ and JEONGBAE KIM2

1Department of Industrial Engineering & Management Sciences, Northwestern University, Evanston, IL 60208-3119, USA. E-mail: [email protected]
2Korea Telecom Headquarters, 206 Jungja-Dong Bundang-Gu, Seongnam, Kyunggi, Korea, 463-711. E-mail: [email protected]

    Received September 2009 and accepted August 2010

Industrial robust design methods rely on empirical process models that relate an output response variable to a set of controllable input variables and a set of uncontrollable noise variables. However, when determining the input settings that minimize output variability, model uncertainty is typically neglected. Using a Bayesian problem formulation similar to what has been termed cautious control in the adaptive feedback control literature, this article develops a cautious robust design approach that takes model parameter uncertainty into account via the posterior (given the experimental data) parameter covariance. A tractable and interpretable expression for the posterior response variance and mean square error is derived that is well suited for numerical optimization and that also provides insight into the impact of parameter uncertainty on the robust design objective. The approach is cautious in the sense that as parameter uncertainty increases, the input settings are often chosen closer to the center of the experimental design region or, more generally, in a manner that mitigates the adverse effects of parameter uncertainty. A brief discussion on an extension of the approach to consider model structure uncertainty is presented.

Keywords: Robust parameter design, cautious control, model uncertainty, Bayesian estimation, quality control, variation reduction, Six Sigma

    1. Introduction

In robust parameter design, which has received considerable attention from academia and industry, one optimally selects the levels of a set of controllable variables (a.k.a. inputs) in order to minimize variability in an output response variable, while keeping the mean of the response variable close to a target. The component of the response variability that can be affected by adjusting the inputs is typically assumed to be due to a set of uncontrollable (a.k.a. noise) variables. Hence, minimizing response variability amounts to choosing the inputs so that the output response is robust or insensitive to variations in the noise variables. Two main approaches to this problem are Taguchi's robust parameter design (Taguchi, 1986; Nair, 1992; Wu and Hamada, 2000), which employs signal-to-noise ratios and crossed-array experimental designs, and response surface methodology in conjunction with combined-array designs (Vining and Myers, 1990; Shoemaker et al., 1991; Myers et al., 1992; Lucas, 1994; Khattree, 1996).

*Corresponding author

This article is focused on the response surface approach, which is often advocated because of its stricter adherence to well-established techniques for statistical modeling, analysis, and experimental design. Consider the following response surface model, which is widely assumed in robust design studies (Myers and Montgomery, 2002). The output response y is represented as

    y = α + β′g(x) + γ′w + w′Bx + ε (1)

where x = [x1, x2, . . . , xp]′ is a vector of p controllable input variables, w = [w1, w2, . . . , wm]′ is a vector of m uncontrollable noise variables, and ε is the model residual error. It is assumed that w is random with mean zero and known covariance matrix Σw (typically diagonal) and that ε is normally distributed with mean zero and variance σ², independent of w. Each element of the l-length vector g(x) = [g1(x), g2(x), . . . , gl(x)]′ is a known function of the p controllable input variables. The scalar α, the l-length vector β, the m-length vector γ, and the m × p matrix B comprise the model parameters (excluding σ, which we treat differently), which we denote collectively by the vector θ = [α β′ γ′ b1′ b2′ · · · bp′]′, where bi denotes the ith column of B. If g(x) = x, for example, the model includes the main effects of x and its interactions with w. A more common choice for g(x) in robust design studies is g(x) =




[x1, x2, . . . , xp, x1², x2², . . . , xp², x1x2, x1x3, . . . , x1xp, x2x3, . . . , xp−1xp]′, in which case the model is full quadratic in x.

The standard dual response approach is to select x to minimize

Varε,w(y | θ, σ) = (γ + Bx)′Σw(γ + Bx) + σ², (2)

subject to the constraint that the mean

Eε,w(y | θ, σ) = α + β′g(x) (3)

equals some specified target T. Alternatively, one may select x to minimize a Mean Square Error (MSE) objective function Eε,w[(y − T)² | θ, σ]. The subscripts on the variance and expectation operators indicate which random variables the operations are with respect to, and we have written them as conditioned on the model parameters θ and σ. The reason for the latter is that when optimizing x, one generally estimates the parameters using experimental design and analysis techniques and then views the estimates as if they were the true parameters. Hence, parameter uncertainty due to estimation error is generally neglected, which, as noted in Shoemaker et al. (1991), could actually result in an increase in the response variance.
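As a toy illustration of Equations (2) and (3), the CE variance and mean for a candidate x can be evaluated in a few lines; this is a minimal numpy sketch with made-up parameter values (none of the numbers come from the paper), using the linear-effects choice g(x) = x:

```python
import numpy as np

# Hypothetical fitted parameters for p = 2 inputs and m = 2 noise variables.
alpha = 1.0
beta = np.array([0.5, -0.3])          # coefficients of g(x) = x
gamma = np.array([0.8, 0.2])          # main effects of the noise variables
B = np.array([[0.4, 0.1],             # m x p input-by-noise interaction matrix
              [0.0, 0.3]])
Sigma_w = np.eye(2)                   # known noise covariance
sigma2 = 0.1                          # residual variance

def ce_variance(x):
    """Var_{eps,w}(y | theta, sigma) = (gamma + B x)' Sigma_w (gamma + B x) + sigma^2."""
    v = gamma + B @ x
    return v @ Sigma_w @ v + sigma2

def ce_mean(x):
    """E_{eps,w}(y | theta, sigma) = alpha + beta' g(x), with g(x) = x here."""
    return alpha + beta @ x

x = np.array([0.2, -0.5])
print(ce_variance(x), ce_mean(x))  # 0.7914 and 1.25 for these made-up values
```

The dual response approach would then minimize `ce_variance(x)` subject to `ce_mean(x) = T`.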

This article develops a method for taking parameter uncertainty into consideration when choosing the input settings. The objective is to find robust design input settings for which the response is robust to parameter estimation errors, as well as to the noise w. The Bayesian MSE, which is referred to as the Cautious Robust Design (CRD) objective function in this article, is minimized. Here, the CRD objective function is defined as

JCRD(x) = Eε,w,θ,σ[(y − T)² | Y], (4)

where Y denotes the observed response values over the experiment from which the parameters are estimated. The subscripts θ and σ are added on the expectation operator to indicate that it is with respect to the posterior distribution of the parameters, given the data Y, in addition to the distributions of w and ε. It will be shown that JCRD(x) can be expressed as a quite tractable function of x, θ̂, σ̂², Σθ, Σw, and T, where θ̂ and σ̂² denote the posterior means (point estimates) of θ and σ², and Σθ denotes the posterior covariance matrix of θ. Thus, minimizing JCRD(x) will yield optimal x settings that are a function of the posterior covariance Σθ, thereby taking into account parameter uncertainty. The Bayesian strategy of minimizing an objective function of the form of Equation (4) bears close resemblance to what has been referred to as cautious control in the adaptive control literature (Åström and Wittenmark, 1995). Hence the use of the CRD terminology.

The CRD objective function (or the posterior variance of the response in an analogous dual response CRD formulation, which will be considered later in this article) is a natural extension of the standard robust design objective function and, hence, should have familiar conceptual appeal to practitioners. The resulting CRD approach has a number of attractive characteristics. It leads to a relatively simple, closed-form expression for the objective function. Other Bayesian approaches for considering parameter uncertainty in robust design (reviewed in Section 2) require Monte Carlo simulation to calculate the objective function. A different Monte Carlo simulation must be conducted for each x of interest, which prohibits analytical optimization of the objective function and complicates numerical optimization. The analytical expressions provide a convenient smooth function that can be easily evaluated within an optimization routine. More generally, these analytical expressions provide insight into the mechanisms behind robustness to parameter uncertainty or the lack thereof.

The format of the remainder of this article is as follows. Section 2 reviews prior Bayesian and frequentist approaches for considering model structure and parameter uncertainty in experimental-based process optimization and robust design. Section 3 discusses the prior and posterior distributions for the parameters and provides expressions for θ̂, σ̂², and Σθ. These in turn are used in Section 4 to develop a tractable expression for JCRD(x). For a special case of the model (1) that is linear in x (i.e., g(x) = x), a closed-form expression exists for the x that minimizes the CRD objective function. This provides insight into how CRD ensures robustness to parameter uncertainty, which is discussed in Section 5. In Section 6, a leaf spring manufacturing example from the literature is used to illustrate CRD and compare it to standard robust design in which parameter uncertainty is neglected. Although the CRD objective function considers variability due to parameter uncertainty on par with variability due to the noise variables, it has a natural decomposition into (i) the standard robust design expression for the response variability that results when the parameters are treated as known; and (ii) the additional response variability due to parameter uncertainty. Section 7 continues the leaf spring example to illustrate how this decomposition provides insight into whether the current experiment yields sufficient information for optimizing the process versus whether additional experimentation is needed to reduce parameter uncertainty. Section 8 discusses the distinctions between parameter uncertainty and noise variability. Section 9 considers the implications of having constraints on the input variables, which are common in robust design optimization problems. Section 10 briefly discusses an extension of the CRD approach that considers uncertainty in the model structure. Section 11 discusses an extension of the CRD concepts to dual-response robust design, in which the objective is to minimize Varε,w,θ,σ[y | Y], subject to the constraint Eε,w,θ,σ[y | Y] = T. Section 12 concludes the article.

2. Review of prior work on model and parameter uncertainty in experimental-based process optimization and robust design

Cautious adaptive feedback control strategies that involve a Bayesian MSE objective function have been widely



investigated (Åström and Wittenmark, 1995). Recently, Apley (2004) and Apley and Kim (2004) have investigated non-adaptive versions of cautious control. Their work was in the context of automatic feedback control, in which the input settings x are actively adjusted online as each new response observation is obtained. In the context of robust design, in which the online input settings are held fixed at some optimized values based on an offline experiment, the use of the Bayesian MSE objective function (4) follows Apley and Kim (2002) and Kim (2002). A number of other approaches have also been proposed for taking into account model uncertainty in robust design and, more generally, in experimental-based process optimization. Box and Hunter (1954) developed a confidence region for the values of x that constitute a stationary point for (e.g., that maximize or minimize) a response surface. The confidence region is with respect to uncertainty in the parameters, which is taken into account from a frequentist perspective via the dependence of the confidence region on the distribution of the parameter estimates. Peterson et al. (2002) developed an alternative confidence region that distinguishes saddle points from minima/maxima and that can handle constraints on the inputs. In another frequentist approach, Myers and Montgomery (2002, p. 576) derived an unbiased estimate of Varε,w(y | θ, σ) in Equation (2), which involves parameter uncertainty via the dependence of their bias correction term on the covariance matrix of θ̂. They recommended minimizing the unbiased variance estimate and also discussed a graphical approach in which one plots unbiased estimates of the mean and variance in Equations (2) and (3) as functions of x, while simultaneously displaying a confidence region for the true x settings that minimize Varε,w(y | θ, σ). Miró-Quesada and Del Castillo (2004) used the same bias correction term as Myers and Montgomery (2002), but their objective was to minimize an unbiased estimate of Varθ̂,w(ŷ(w) | θ, σ), where ŷ(w) is Equation (1) with θ̂ substituted for θ. Parameter uncertainty was taken into account from a frequentist perspective by virtue of the variance operation being with respect to θ̂, as well as w, which resulted in an expression that was a function of the covariance matrix of θ̂. In a sense that will be discussed in Section 11, the CRD objective function that is adopted results in a more complete accounting of parameter uncertainty. Sahni et al. (2009) considered model uncertainty in the context of mixture-process optimization. Monroe et al. (2010) considered the effects of model uncertainty on the selection of optimal designs for accelerated life tests.

A number of Bayesian approaches have also been proposed for taking model uncertainty into account. Chipman (1998) considered the posterior distribution of {θ, σ} | Y and then recommended a Monte Carlo simulation in which values of {θ, σ} are drawn from their posterior distribution and substituted into Equations (2) and (3) (or any other robust design criterion). A separate Monte Carlo simulation is conducted for each value of x of interest. The average and/or sample variance of Equations (2) and (3) over the Monte Carlo simulation can guide a designer in choosing x settings for which the response mean and variance are robust to uncertainty in θ. Chipman (1998) also considered uncertainty in the model structure. Using the approach of Box and Meyer (1993), Chipman calculated the posterior probabilities that each model within some class (e.g., all models consisting of subsets of the individual terms in Equation (1)) is the true one, and then within the Monte Carlo simulation they drew the model structures, as well as the parameters, from their posterior distributions.

Peterson (2004) and Miró-Quesada et al. (2004) proposed a Bayesian approach in which one calculates the posterior (given Y) probability that y falls within some specified tolerance interval. The posterior distribution of y | Y considers uncertainty/randomness in w, ε, θ, and σ. Peterson (2004) considered the noiseless case (i.e., terms involving w absent from Equation (1)), and Miró-Quesada et al. (2004) extended the approach to include noise. Rajagopal and Del Castillo (2005) and Rajagopal et al. (2005) further extended the approach to incorporate uncertainty in the model structure, the former treating the noiseless case and the latter including noise. They used the approach of Box and Meyer (1993), also used by Chipman (1998), to calculate the posterior model probabilities. For the analyses with no noise, they utilized analytical expressions for the posterior distribution of y | Y (a t-distribution under a certain choice of priors). For the analyses with noise terms in the model, they relied heavily on Monte Carlo simulation to calculate the objective function for each x of interest, as in Chipman (1998).

Relative to the aforementioned Bayesian approaches for robust design with noise, a primary advantage of the proposed approach is that it is possible to derive a relatively simple closed-form analytical expression for the objective function. As mentioned in the Introduction, the analytical expression provides insight into robustness issues; facilitates optimization; and offers a natural decomposition of the response variability into the standard robust design component (i.e., assuming θ coincides with θ̂) and the additional component due to parameter uncertainty. This allows one to conveniently plot the individual components versus x. A plot of the additional variability due to parameter uncertainty is informative when deciding whether further experimentation is necessary to reduce parameter uncertainty, which is illustrated with examples later. Moreover, graphical exploration of each component plot is quite useful if there are other design considerations (qualitative or quantitative) that are difficult to incorporate into a formal mathematical optimization criterion, as is often the case in practical robust design problems.

Another obvious difference between the approach proposed in this article and the approaches of Peterson (2004) and Miró-Quesada et al. (2004) is that the underlying design criteria are quite different. When deciding which method is more appropriate, one should also consider



which criterion is more physically meaningful for the problem at hand. For some problems, minimizing the posterior MSE may be more meaningful than maximizing the posterior probability that y falls within some specified tolerance interval, and vice versa for other problems.

    3. Prior and posterior distributions for the parameters

Let Y denote the n × 1 vector of observations of y obtained from an experiment, over which the x and w settings were varied according to some design matrix Z. By this it is meant that for the n observations, the model (1) can be written as

Y = Zθ + ε,

where ε is an n × 1 Gaussian random vector with zero mean and covariance matrix σ²In, and In denotes the n × n identity matrix. If k denotes the dimension of the parameter vector θ, each column of the n × k matrix Z corresponds to a single term in Equation (1). Each row of Z consists of a "1" (corresponding to the intercept term α) and the values of {gi(x): i = 1, 2, . . . , l}, {wj: j = 1, 2, . . . , m}, and {xiwj: i = 1, 2, . . . , p; j = 1, 2, . . . , m} for a single experimental run.
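A row of Z can be assembled mechanically from a run's (x, w) settings. The helper below is a sketch (the function name and column ordering are my own, chosen to match the ordering of θ given in the Introduction), shown here for the linear-effects choice g(x) = x:

```python
import numpy as np

def model_row(x, w, g):
    """One row of Z for model (1): intercept, g(x) terms, w terms,
    then the w*x_i products grouped by column b_i of B
    (matching theta = [alpha, beta', gamma', b_1', ..., b_p']')."""
    interactions = np.concatenate([x[i] * w for i in range(len(x))])
    return np.concatenate([[1.0], g(x), w, interactions])

g = lambda x: x  # linear-effects model, g(x) = x
row = model_row(np.array([1.0, -1.0]), np.array([1.0]), g)
# row = [1, 1, -1, 1, 1, -1]: intercept, x1, x2, w1, w1*x1, w1*x2
```

Stacking one such row per experimental run (n rows in all) gives the n × k matrix Z.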

For the linear Gaussian model a common choice of prior distributions, which will be assumed in this work, is an uninformative (locally flat) prior for log(σ) and θ | σ ∼ Nk(µ, σ²Φ). In other words, the prior distribution of σ is ∝ 1/σ, and the prior distribution of θ given σ is multivariate normal with mean µ (some specified k × 1 vector) and covariance matrix σ²Φ, where Φ is some specified k × k matrix. One often selects Φ to be diagonal with diagonal entries {φi: i = 1, 2, . . . , k}, in which case σ²φi is the prior variance of θi, the ith element of θ. Most of the prior Bayesian treatments of parameter uncertainty reviewed in Section 2 have assumed these priors. Notice that letting φi → ∞ represents minimal prior knowledge of θi.

Under these priors, it can be shown (Bunke and Bunke, 1986, p. 439) that the posterior distribution of θ given Y and σ is

θ | Y, σ ∼ Nk(µY, σ²ΣY),

where µY = [Φ⁻¹ + Z′Z]⁻¹[Φ⁻¹µ + Z′Y] and ΣY = [Φ⁻¹ + Z′Z]⁻¹. It can also be shown that the posterior distribution of σ⁻² | Y is gamma. The full posterior distributions will not be needed to derive an expression for JCRD(x), as will be seen in the following section. All that are needed are the posterior mean and covariance of θ | Y and the posterior mean of σ² | Y. Because σ⁻² | Y is gamma, it follows that the posterior mean of σ² | Y is (Bunke and Bunke, 1986, A 2.22)

σ̂² = Eσ[σ² | Y] = ([θ̂ − µ]′Φ⁻¹[θ̂ − µ] + [Y − Zθ̂]′[Y − Zθ̂]) / (n − 2), (5)

    where

θ̂ = Eθ,σ[θ | Y] = Eσ[Eθ[θ | σ, Y] | Y] = Eσ[µY | Y] = µY = [Φ⁻¹ + Z′Z]⁻¹[Φ⁻¹µ + Z′Y], (6)

    and

Σθ = Eθ,σ[(θ − θ̂)(θ − θ̂)′ | Y] = Eσ[Eθ[(θ − θ̂)(θ − θ̂)′ | σ, Y] | Y] = Eσ[σ²ΣY | Y] = σ̂²ΣY = σ̂²[Φ⁻¹ + Z′Z]⁻¹, (7)

are the posterior mean and covariance of θ | Y. Notice that with minimal prior knowledge of θ (i.e., Φ⁻¹ → 0k), Equations (6) and (7) reduce to the standard least squares parameter estimates and covariance matrix, respectively, albeit using a different estimate of σ². For minimal prior knowledge of θ, Equation (5) reduces to the residual sum of squares [Y − Zθ̂]′[Y − Zθ̂] divided by n − 2.
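Equations (5) through (7) translate directly into a few lines of linear algebra. The sketch below (function name hypothetical) also illustrates the minimal-prior limit Φ⁻¹ → 0, in which the posterior mean reduces to ordinary least squares:

```python
import numpy as np

def posterior_moments(Z, Y, mu, Phi_inv, n):
    """Posterior moments under the Section 3 priors.
    Phi_inv is the inverse of the prior scale matrix Phi;
    Phi_inv = 0 encodes minimal prior knowledge of theta."""
    M = np.linalg.inv(Phi_inv + Z.T @ Z)
    theta_hat = M @ (Phi_inv @ mu + Z.T @ Y)                        # Eq. (6)
    resid = Y - Z @ theta_hat
    dev = theta_hat - mu
    sigma2_hat = (dev @ Phi_inv @ dev + resid @ resid) / (n - 2)    # Eq. (5)
    Sigma_theta = sigma2_hat * M                                    # Eq. (7)
    return theta_hat, sigma2_hat, Sigma_theta

# Minimal-prior limit recovers least squares (here an exact linear fit, so
# the residual sum of squares, and hence sigma2_hat, is zero).
Z = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.array([0.0, 1.0, 2.0, 3.0])
th, s2, St = posterior_moments(Z, Y, np.zeros(2), np.zeros((2, 2)), len(Y))
```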

Joseph (2006) and Joseph and Delaney (2007) investigated an alternative choice of prior covariance for θ that consisted of choosing Φ so that the resulting prior distribution of the response is in agreement (over some full factorial grid in the design space) with a specified Gaussian random process model for the response. This may be useful in highly fractionated designs in which one prefers to retain many high-order terms in the model.

4. A closed-form expression for the CRD objective function

The results of the previous section yield a rather tractable expression for JCRD(x). Toward this end, let θ̃ = θ − θ̂ denote the vector of parameter estimation errors, and let α̃, β̃, γ̃, and B̃ be defined similarly. Substituting θ = θ̂ + θ̃ for the parameters in Equation (1) gives

y − T = {α̂ + β̂′g(x) + γ̂′w + w′B̂x − T} + {α̃ + β̃′g(x) + γ̃′w + w′B̃x} + ε. (8)

Since the posterior distribution of θ | Y is multivariate normal with mean θ̂ and covariance Σθ, the posterior distribution of θ̃ | Y is multivariate normal with mean zero and the same covariance. Moreover, θ̃ | Y is independent of w, which denotes some future noise that is independent of the experimental data Y. Although θ̃ | Y is not independent of ε | Y (because their distributions both depend on σ), it is straightforward to show that they are uncorrelated, which is the property that is needed in the following.

Substituting Equation (8) into Equation (4) gives

JCRD(x) = (α̂ + β̂′g(x) − T)² + (γ̂ + B̂x)′Σw(γ̂ + B̂x) + Σα + g′(x)Σβg(x) + x′Ax + 2g′(x)Σβα + 2x′a + d + σ̂², (9)

where Σα = Eθ,σ(α̃² | Y), Σβ = Eθ,σ(β̃β̃′ | Y), and Σβα = Eθ,σ(β̃α̃ | Y) denote the posterior variances/covariances



of α and β, and we define A = Eθ,σ(B̃′ΣwB̃ | Y), d = Eθ,σ(γ̃′Σwγ̃ | Y), and a = Eθ,σ(B̃′Σwγ̃ | Y). In arriving at terms like x′Ax in Equation (9), we have used the relationship Ew,θ,σ[(w′B̃x)² | Y] = Ew,θ,σ(x′B̃′ww′B̃x | Y) = Eθ,σ(Ew(x′B̃′ww′B̃x | θ, σ, Y) | Y) = x′Eθ,σ(B̃′ΣwB̃ | Y)x = x′Ax. The scalar d, the p × p matrix A, and the p × 1 vector a can be readily constructed from the covariance matrices Σw and Σθ using the relationships d = Eθ,σ(γ̃′Σwγ̃ | Y) = trace(ΣγΣw), Ai,j = Eθ,σ(b̃i′Σwb̃j | Y) = trace(Σbibj Σw), and ai = Eθ,σ(b̃i′Σwγ̃ | Y) = trace(Σbiγ Σw), where Σbibj = Eθ,σ(b̃i b̃j′ | Y) and Σbiγ = Eθ,σ(b̃i γ̃′ | Y). Notice that Σα, Σβ, Σβα, Σγ, Σbibj, and Σbiγ are all directly available as submatrices of Σθ. Consequently, given g(x), θ̂, σ̂², Σθ, Σw, and T, Equation (9) can be easily evaluated analytically, without the need for Monte Carlo simulation.

Equation (9) has a revealing interpretation in terms of the effects of parameter uncertainty on the posterior MSE. Since Eε,w,θ,σ(y | Y) = α̂ + β̂′g(x), the first term in Equation (9) is the component of the MSE due to differences between the posterior mean of y and the target. The second term represents the variance of y that is due to the random noise variables w, under the assumption that the true parameters are equal to their estimates. This assumption is often referred to as Certainty Equivalence (CE) in the adaptive control literature. Borrowing this terminology, the analogous CE objective function is the familiar standard robust design expression

JCE(x) = Eε,w[(y − T)² | θ = θ̂, σ = σ̂] = (α̂ + β̂′g(x) − T)² + (γ̂ + B̂x)′Σw(γ̂ + B̂x) + σ̂². (10)

Notice that we can write Equation (9) as JCRD(x) = JCE(x) + Jθ(x), where

Jθ(x) = Σα + g′(x)Σβg(x) + x′Ax + 2g′(x)Σβα + 2x′a + d (11)

represents the additional MSE due to parameter uncertainty. As parameter uncertainty decreases (i.e., as Σθ → 0k), all of the terms in Equation (11) disappear. As parameter uncertainty increases, Jθ(x) increases, because each term in Equation (11) is proportional to elements of the posterior covariance Σθ.
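Equations (9) through (11) can be evaluated with ordinary matrix operations once the submatrices of Σθ are extracted. In the sketch below, the function name and dictionary keys are my own labels for those blocks (not notation from the paper), and the trailing usage values are made up for illustration:

```python
import numpy as np

def j_crd(x, g, est, S, Sigma_w, T, sigma2_hat):
    """CRD objective (9) as J_CE (10) plus J_theta (11).
    est = (alpha_hat, beta_hat, gamma_hat, B_hat);
    S = dict of submatrices of Sigma_theta (keys are my own labels)."""
    a_h, b_h, g_h, B_h = est
    p = len(x)
    # d = trace(Sigma_gamma Sigma_w), A_ij = trace(Sigma_{b_i b_j} Sigma_w),
    # a_i = trace(Sigma_{b_i gamma} Sigma_w)
    d = np.trace(S['gamma'] @ Sigma_w)
    A = np.array([[np.trace(S['b'][i][j] @ Sigma_w) for j in range(p)]
                  for i in range(p)])
    a = np.array([np.trace(S['bg'][i] @ Sigma_w) for i in range(p)])
    gx = g(x)
    v = g_h + B_h @ x
    J_ce = (a_h + b_h @ gx - T) ** 2 + v @ Sigma_w @ v + sigma2_hat
    J_th = (S['alpha'] + gx @ S['beta'] @ gx + x @ A @ x
            + 2 * gx @ S['beta_alpha'] + 2 * x @ a + d)
    return J_ce + J_th, J_ce, J_th

# Tiny illustration: p = 1, m = 1, with all uncertainty concentrated in b_1.
blocks = {'alpha': 0.0, 'beta': np.zeros((1, 1)), 'beta_alpha': np.zeros(1),
          'gamma': np.zeros((1, 1)), 'b': [[np.array([[2.0]])]],
          'bg': [np.zeros((1, 1))]}
est = (0.0, np.array([1.0]), np.array([0.0]), np.array([[0.0]]))
J, J_ce_part, J_th_part = j_crd(np.array([1.0]), lambda x: x, est, blocks,
                                np.array([[1.0]]), T=0.0, sigma2_hat=0.0)
```

In the illustration J_CE = 1 and J_theta = 2, so uncertainty in b_1 dominates the posterior MSE at x = 1.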

5. A closed-form solution for xCRD with only linear effects

When g(x) = x, the model (1) reduces to the linear effects model y = α + β′x + γ′w + w′Bx + ε, in which case we can obtain a closed-form solution for the optimal CRD settings when there are no constraints on x (constraints are discussed in Section 9). Although the linear model is not as broadly applicable as the model with quadratic g(x), the closed-form solution for the linear case provides insight into the nature of CRD.

Substituting g(x) = x in Equation (9) and setting the partial derivative equal to zero gives the optimal CRD input settings:

xCRD = [β̂β̂′ + B̂′ΣwB̂ + Σβ + A]⁻¹{(T − α̂)β̂ − B̂′Σwγ̂ − Σβα − a}. (12)
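Equation (12) is a single linear solve. A sketch (function name and trailing test values hypothetical; the covariance terms are passed in directly):

```python
import numpy as np

def x_crd(alpha_hat, beta_hat, gamma_hat, B_hat, Sigma_w,
          Sigma_beta, A, Sigma_beta_alpha, a, T):
    """Closed-form CRD settings (12) for the linear-effects model g(x) = x."""
    M = np.outer(beta_hat, beta_hat) + B_hat.T @ Sigma_w @ B_hat + Sigma_beta + A
    rhs = ((T - alpha_hat) * beta_hat - B_hat.T @ Sigma_w @ gamma_hat
           - Sigma_beta_alpha - a)
    return np.linalg.solve(M, rhs)

# Scalar illustration: with Sigma_beta = 1 the CE answer x = 1 is
# halved to x = 0.5, i.e., pulled toward the design center.
x = x_crd(0.0, np.array([1.0]), np.array([0.0]), np.array([[0.0]]),
          np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]]),
          np.array([0.0]), np.array([0.0]), 1.0)
```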

In contrast, the input settings that minimize the analogous CE objective function (10) are

xCE = [β̂β̂′ + B̂′ΣwB̂]⁻¹{(T − α̂)β̂ − B̂′Σwγ̂}, (13)

which follows from Equation (12) with all parameter covariance terms set equal to zero. Notice that if p > m + 1, the matrix β̂β̂′ + B̂′ΣwB̂ in Equation (13) is not invertible, and the inputs that minimize JCE are not unique. In this case, replacing the inverse of β̂β̂′ + B̂′ΣwB̂ by its singular value decomposition pseudoinverse corresponds to taking xCE to be the minimum-norm solution. This minimum-norm solution is used for all of the CE examples in this article.
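A sketch of Equation (13), using the pseudoinverse so that the minimum-norm solution is returned automatically when p > m + 1 (function name and example values are hypothetical):

```python
import numpy as np

def x_ce(alpha_hat, beta_hat, gamma_hat, B_hat, Sigma_w, T):
    """CE settings (13); pinv yields the minimum-norm solution when
    beta_hat beta_hat' + B_hat' Sigma_w B_hat is singular (p > m + 1)."""
    M = np.outer(beta_hat, beta_hat) + B_hat.T @ Sigma_w @ B_hat
    rhs = (T - alpha_hat) * beta_hat - B_hat.T @ Sigma_w @ gamma_hat
    return np.linalg.pinv(M) @ rhs

# p = 3, m = 1 with only x1 active: M has rank 1, and the pseudoinverse
# leaves the unidentified directions x2, x3 at zero (minimum norm).
x0 = x_ce(0.0, np.array([1.0, 0.0, 0.0]), np.array([0.0]),
          np.array([[0.0, 0.0, 0.0]]), np.array([[1.0]]), 1.0)
```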

Comparing Equations (12) and (13), the origin of the term cautious in CRD becomes more apparent. Larger parameter uncertainty (as measured by, say, the eigenvalues of the positive semi-definite matrices Σβ and A) results in the inverse of the matrix in brackets in Equation (12), and thus xCRD, being smaller than if parameter uncertainty were neglected. Strictly speaking, larger parameter uncertainty causes xCRD to be closer to the center of the experimental design region, but this translates to xCRD being smaller if x is coded so that the zero vector represents the center. In this sense, the optimal settings for x are chosen more cautiously in CRD than if parameter uncertainty is neglected.

The effects of parameter uncertainty on xCRD are even more apparent in the special case that a fractional factorial design (with no aliasing of terms that are included in the model) is used and x and w are transformed to a scale for which they take on values of ±1 over the experiment. With an orthogonal design matrix Z, and assuming a non-informative prior for θ, the posterior covariance becomes Σθ = σ̂²n⁻¹Ik. Thus, Σβα and a are zero, Σβ = σ̂²n⁻¹Ip and A = σ̂²n⁻¹trace(Σw)Ip, which, when substituted into Equation (12), yields

xCRD = [β̂β̂′ + B̂′ΣwB̂ + σ̂²n⁻¹{1 + trace(Σw)}Ip]⁻¹{(T − α̂)β̂ − B̂′Σwγ̂}.

When parameter uncertainty (as measured by σ̂²n⁻¹, the posterior variance of the estimated parameters) is zero, xCRD coincides with xCE. As parameter uncertainty increases, the xCRD settings shrink monotonically toward 0, the center of the experimental design region.
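The shrinkage effect of the ridge-like term σ̂²n⁻¹{1 + trace(Σw)}Ip can be verified numerically; the parameter values below are made up for illustration:

```python
import numpy as np

# Hypothetical estimates for p = 2 inputs, m = 1 noise variable.
beta_hat = np.array([1.0, 0.5])
B_hat = np.array([[0.2, -0.1]])
gamma_hat = np.array([0.3])
Sigma_w = np.array([[1.0]])
alpha_hat, T = 0.0, 1.0
p = len(beta_hat)

M0 = np.outer(beta_hat, beta_hat) + B_hat.T @ Sigma_w @ B_hat
rhs = (T - alpha_hat) * beta_hat - B_hat.T @ Sigma_w @ gamma_hat

norms = []
for s2_over_n in [0.0, 0.05, 0.2, 1.0]:   # increasing parameter uncertainty
    ridge = s2_over_n * (1.0 + np.trace(Sigma_w)) * np.eye(p)
    x = np.linalg.solve(M0 + ridge, rhs)  # Eq. (12), orthogonal-design case
    norms.append(np.linalg.norm(x))
# ||x_CRD|| decreases as s2_over_n grows: the settings shrink toward the
# design center, recovering x_CE when s2_over_n = 0.
```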



Table 1. Description of variables in the leaf spring example

Variable   Represents            Low level   High level
x1         Temperature           1840        1880
x2         Heating time          25          23
x3         Hold down time        2           3
x4         Transfer time         12          10
w1         Quench temperature    130–150     150–170

    6. An example

The use of the proposed approach is illustrated with data from an experiment involving the manufacture of truck leaf springs, originally analyzed in Pignatiello and Ramberg (1985) and later in Chipman (1998). There are four controllable variables and one noise variable, whose descriptions are given in Table 1. All temperatures are in degrees Fahrenheit, and all times are in seconds. The high and low values in Table 1 correspond to ±1 values for x and w, which are expressed in coded units for the analyses in this article. The output variable y is the free height of the leaf spring, for which the target is T = 8 inches. The experiment was three replicates of a crossed-array design with a two-level fractional factorial in x, the data for which are shown in Table 2. No quadratic effects can be estimated with this experiment, and, as shown in Chipman (1998), all x-by-x interactions appear insignificant. Hence, throughout the analysis, the linear effects model y = α + β′x + γw1 + w1Bx + ε discussed in Section 5 is used (i.e., Equation (1) with g(x) = x and m = 1).

Let φi → ∞ in order to represent minimal prior knowledge of θ, in which case Equations (5) and (6) yield the point estimates α̂ = 7.636, β̂ = [0.111, −0.088, −0.014, 0.052]′, γ̂ = −0.062, B̂ = [0.016, 0.037, 0.005, −0.018], and σ̂² = (0.186)². Since the design matrix was orthogonal, Z′Z = nI10 = 48I10, and the parameter covariance becomes Σθ = σ̂²[Σ⁻¹ + Z′Z]⁻¹ = n⁻¹σ̂²I10 = (0.0268)²I10. The noise variance was assumed to be Σw = 1.

Table 2. Response data for the leaf spring example

                      w1 = −1             w1 = +1
x1   x2   x3   x4
−1   −1   −1   −1   7.78  7.50  7.78    7.25  7.81  7.12
 1   −1   −1    1   8.15  7.88  8.18    7.88  7.88  7.44
−1    1   −1    1   7.50  7.50  7.56    7.56  7.50  7.50
 1    1   −1   −1   7.59  7.63  7.56    7.75  7.75  7.56
−1   −1    1    1   7.94  7.32  8.00    7.44  7.88  7.44
 1   −1    1   −1   7.69  7.56  8.09    7.69  8.06  7.62
−1    1    1   −1   7.56  7.18  7.62    7.18  7.44  7.25
 1    1    1    1   7.56  7.81  7.81    7.50  7.69  7.59

(The three entries under each w1 level are the three replicate observations.)
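With the noninformative prior, the point estimates above reduce to ordinary least squares applied to the 48 observations in Table 2. The following sketch (numpy assumed; the first three response columns of each row are read as the w1 = −1 replicates, consistent with the reported estimates) reconstructs the regression matrix Z:

```python
import numpy as np

# Table 2 runs; within each row the first three responses are at w1 = -1 and
# the last three at w1 = +1.
design = np.array([[-1, -1, -1, -1], [1, -1, -1, 1], [-1, 1, -1, 1],
                   [1, 1, -1, -1], [-1, -1, 1, 1], [1, -1, 1, -1],
                   [-1, 1, 1, -1], [1, 1, 1, 1]], dtype=float)
y = np.array([[7.78, 7.50, 7.78, 7.25, 7.81, 7.12],
              [8.15, 7.88, 8.18, 7.88, 7.88, 7.44],
              [7.50, 7.50, 7.56, 7.56, 7.50, 7.50],
              [7.59, 7.63, 7.56, 7.75, 7.75, 7.56],
              [7.94, 7.32, 8.00, 7.44, 7.88, 7.44],
              [7.69, 7.56, 8.09, 7.69, 8.06, 7.62],
              [7.56, 7.18, 7.62, 7.18, 7.44, 7.25],
              [7.56, 7.81, 7.81, 7.50, 7.69, 7.59]])

# Regression matrix Z for y = alpha + beta'x + gamma*w + w*(Bx) + eps (10 columns).
rows, resp = [], []
for x_run, y_run in zip(design, y):
    for j, obs in enumerate(y_run):
        w = -1.0 if j < 3 else 1.0
        rows.append(np.r_[1.0, x_run, w, w * x_run])
        resp.append(obs)
Z, resp = np.array(rows), np.array(resp)

theta = np.linalg.lstsq(Z, resp, rcond=None)[0]
alpha, beta, gamma, Bhat = theta[0], theta[1:5], theta[5], theta[6:10]
```

This reproduces α̂ = 7.636, β̂ = [0.111, −0.088, −0.014, 0.052]′, γ̂ = −0.062, and B̂ = [0.016, 0.037, 0.005, −0.018] to within 0.001 (the B̂2 entry computes to 0.0365).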


Fig. 1. Plots of JCRD, JCE, and Jθ versus the first two inputs for the leaf spring example. JCRD results in a smaller optimal value for x1, for which the adverse effects of parameter uncertainty are lessened.

Neglecting parameter uncertainty, the CE input settings from Equation (13) are xCE = [3.43, 0.24, −0.01, 0.09]′. In comparison, the CRD input settings from Equation (12) are xCRD = [2.51, −0.45, −0.10, 0.38]′.
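Both sets of settings can be reproduced from the reported point estimates. In the sketch below (numpy assumed), the CE solution is taken as the minimum-norm (pseudoinverse) solution, since with the parameter-uncertainty term removed the matrix in Equation (12) has rank 2; this is an assumption about Equation (13), but it reproduces the reported xCE, and the small discrepancies (about 0.02 in x2) are attributable to the rounding of the point estimates:

```python
import numpy as np

# Point estimates from Section 6 (noninformative prior).
alpha, T = 7.636, 8.0
beta  = np.array([0.111, -0.088, -0.014, 0.052])
gamma = -0.062
Bhat  = np.array([0.016, 0.037, 0.005, -0.018])  # m = 1, so B is 1 x 4
sigma2, n, Sw = 0.186**2, 48, 1.0                # Sigma_w = 1 (scalar)

M0 = np.outer(beta, beta) + Sw * np.outer(Bhat, Bhat)
b  = (T - alpha) * beta - Sw * gamma * Bhat

# CE settings: parameter uncertainty ignored; M0 is rank 2, so the
# minimum-norm (pseudoinverse) solution is used.
x_ce = np.linalg.pinv(M0) @ b

# CRD settings (Equation (12)): ridge-like term sigma^2 n^-1 {1 + tr(Sigma_w)} I.
lam = sigma2 / n * (1.0 + Sw)
x_crd = np.linalg.solve(M0 + lam * np.eye(4), b)
```

The CRD settings come out strictly smaller in norm than the CE settings, as the shrinkage argument of Section 5 predicts.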

Figure 1 plots JCRD(x), JCE(x), and Jθ(x) versus x1 and x2, with x3 and x4 held fixed at −0.10 and 0.38, respectively (their optimal CRD values). Notice that as x1 increases,


JCRD increases more so than JCE, because Jθ(x) increases as one attempts to extrapolate to values of x1 beyond the experimental region. This is the primary net effect of parameter uncertainty in this example, and it has the beneficial consequence of helping to ensure that the CRD x1 setting is closer to the center of the experimental region without adding an explicit constraint (explicit constraints will be discussed in Section 9). Recall that the CE setting for x1 is 3.43, which is far outside the experimental region. The CRD setting for x1 is 2.51, which, while still far enough outside the experimental region to cause concern, is substantially smaller than the CE setting. The posterior MSEs corresponding to the optimal CE and CRD input settings are JCRD(xCE) = 0.053 and JCRD(xCRD) = 0.048. Hence, a modest 10% improvement in the posterior MSE is achieved by taking parameter uncertainty into account when selecting the optimal input settings. For this example, there was a relatively large number of experimental runs (n = 48) and, correspondingly, a relatively small level of parameter uncertainty. In the next section it will be demonstrated that the differences between the CRD and CE settings are much more pronounced for larger parameter uncertainty.

7. Assessing the impact of parameter uncertainty and the need for further experimentation

For higher levels of parameter uncertainty than in the preceding example, the posterior MSE when using xCRD may be much lower than when using xCE, as will be demonstrated shortly. However, even if the CRD settings are used, one may find that the inflation of the MSE due to parameter uncertainty is still unacceptably large. The decomposition of the CRD objective function (9) into JCRD(x) = JCE(x) + Jθ(x), where JCE(x) and Jθ(x) are given by Equations (10) and (11), respectively, can aid in assessing whether this is the case. Recall that Jθ(x) represents the additional MSE due to parameter uncertainty and that when parameter uncertainty disappears, Jθ(x) reduces to zero, and JCRD(x) reduces to JCE(x). A simple plot of Jθ(x) versus x, as in the bottom panel of Fig. 1, can be used to assess the net effect of parameter uncertainty in terms of its direct impact on the robust design objective. This, in turn, provides insight into whether additional experimentation is necessary to reduce parameter uncertainty, which is illustrated with a continuation of the leaf spring example.

Example continued: The n = 48 response observations in Table 2 represent three replicates of a 2⁵⁻¹ fractional factorial design in x and w. Suppose, instead, that only a single replicate was conducted, resulting in only n = 16 runs, and that σ̂ increased from 0.186 to 0.372 (i.e., doubled). Rather than arbitrarily choose one of the three replicates from Table 2 to retain, it will simply be assumed that the point estimates from the previous section (α̂ = 7.636, β̂ = [0.111, −0.088, −0.014, 0.052]′, γ̂ = −0.062, and B̂ = [0.016, 0.037, 0.005, −0.018]) came from a single replicate, with σ̂² = (0.372)².


Fig. 2. Plots of JCRD, JCE, and Jθ versus the first two inputs for the modified leaf spring example with n = 16 and σ̂ = 0.372, instead of n = 48 and σ̂ = 0.186. The effects of parameter uncertainty now dominate, resulting in a more cautious CRD setting for x1.

In terms of the analysis for this case, the net effect is that the parameter covariance matrix increases by a factor of 12: from Σθ = (0.0268)²I10 to Σθ = σ̂²[Σ⁻¹ + Z′Z]⁻¹ = n⁻¹σ̂²I10 = (0.0928)²I10. The CE input settings, which neglect parameter uncertainty, remain unchanged from their


earlier values: xCE = [3.43, 0.24, −0.01, 0.09]′. In contrast, the CRD input settings change from xCRD = [2.51, −0.45, −0.10, 0.38]′ (for the original Fig. 1 example) to xCRD = [1.10, −0.66, −0.11, 0.40]′ (for the example with smaller n and larger σ̂). This has the benefit of further reducing the magnitude of the largest input x1 to the point that it is almost within the experimental region.

Figure 2 plots JCRD(x) and its two components, JCE(x) and Jθ(x), versus x1 and x2. The other two inputs, x3 and x4, were held fixed at their optimal CRD values of −0.11 and 0.40, respectively. A few points are worth noting. JCRD(x) increases dramatically as x1 increases, because of the very pronounced component due to parameter uncertainty (Jθ, plotted in the bottom panel of Fig. 2). The contribution of Jθ(x) to the MSE is much larger for the higher levels of parameter uncertainty considered in this example, especially for large values of x1. Consequently, JCRD(x) penalizes large values of x1 more than was the case for the Fig. 1 example, which explains why the optimal x1 setting is substantially smaller in this case. The posterior MSEs corresponding to the optimal CE and CRD input settings are JCRD(xCE) = 0.360 and JCRD(xCRD) = 0.219. Thus, with the larger parameter uncertainty in this example, neglecting parameter uncertainty when selecting the inputs results in a 64% larger posterior MSE than when CRD is used.

At the optimal CRD inputs, JCRD(xCRD) = 0.219 decomposes into its two components: JCE(xCRD) = 0.170 and Jθ(xCRD) = 0.049. The contribution of parameter uncertainty to the posterior MSE is roughly 29% of the CE contribution. Based on this, one might decide that further experimentation is required to reduce the parameter uncertainty. When assessing whether further experimentation is needed, one should also consider the relative contributions of the two components of JCRD at the optimal CE inputs xCE. These are the input settings that would be used if there were no parameter uncertainty and the true parameters coincided with the point estimates. Because the point estimates are the posterior mean of θ, they do, after all, represent one's best guess at the true parameters. Following this line of reasoning, one might be interested in the following questions.
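The decomposition can be checked numerically. In the sketch below (numpy assumed), Jθ is evaluated using the orthogonal-design reduction Jθ(x) = (σ̂²/n)(1 + x′x){1 + trace(Σw)}, which follows from Σθ = n⁻¹σ̂²I10 and the component expressions quoted in Section 5; treating this reduction as an assumption, it reproduces the 0.170/0.049 split at xCRD reported above:

```python
import numpy as np

# Point estimates and the n = 16, sigma-hat = 0.372 uncertainty level.
alpha, T = 7.636, 8.0
beta  = np.array([0.111, -0.088, -0.014, 0.052])
gamma = -0.062
Bhat  = np.array([0.016, 0.037, 0.005, -0.018])
sigma2, n, Sw = 0.372**2, 16, 1.0

def J_ce(x):
    # Certainty-equivalence MSE: (mean offset)^2 + noise variance + error variance.
    return (alpha + beta @ x - T)**2 + Sw * (gamma + Bhat @ x)**2 + sigma2

def J_theta(x):
    # Additional MSE from parameter uncertainty; for this orthogonal design with
    # a noninformative prior, Equation (11) collapses to
    # (sigma^2/n) (1 + x'x) (1 + trace(Sigma_w)).
    return sigma2 / n * (1.0 + x @ x) * (1.0 + Sw)

x_crd = np.array([1.10, -0.66, -0.11, 0.40])
x_ce  = np.array([3.43,  0.24, -0.01, 0.09])
assert abs(J_ce(x_crd) + J_theta(x_crd) - 0.219) < 2e-3
```

The same two functions evaluated at xCE reproduce the 0.138 and 0.221 figures discussed below.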

1. What benchmark MSE could be achieved in the hypothetical scenario that we know the true parameters and they happen to coincide with their point estimates?

2. How much will the reality of parameter uncertainty add to the MSE if we use the inputs optimized under the hypothetical benchmark scenario?

The answers to these two questions are precisely JCE(xCE) and Jθ(xCE), respectively. For the results shown in Fig. 2, we have JCE(xCE) = 0.138 and Jθ(xCE) = 0.221. This hypothetical benchmark MSE of 0.138 is better than JCE(xCRD) = 0.170, the analogous value using the cautious inputs xCRD. Based on this, one might hesitate to so quickly rule out using the potentially very good xCE settings in favor of

the more-robust-to-parameter-uncertainty xCRD settings. However, neither should one just go ahead and use xCE, considering that parameter uncertainty at xCE could be expected to almost triple the MSE (from 0.138 to 0.138 + 0.221 = 0.360). One might conclude from this analysis that further experimentation is necessary to reduce parameter uncertainty to levels at which one can use input settings with greater confidence.

Further experimentation also encompasses confirmation experiments, which are generally considered sound practice in any response surface optimization. In the preceding example, it had appeared that xCE may result in a lower MSE than xCRD if one neglects parameter uncertainty (JCE(xCE) = 0.138 versus JCE(xCRD) = 0.170). However, after performing the analyses described in the preceding paragraph, it is also clear that the MSE could in fact be substantially higher (e.g., 0.360 versus 0.138) because of parameter uncertainty if one uses xCE. Consequently, if one wished to entertain the notion of using xCE in hopes of achieving a lower MSE, then at the very least one should run a confirmation experiment at xCE. After running a confirmation experiment, the entire analysis should be repeated to update all relevant posterior distributions.

    8. Parameter versus noise uncertainty

In the CRD paradigm, when formulating the objective function and solving for the optimal inputs, no distinction has been made between parameter uncertainty and noise uncertainty. However, the two forms of uncertainty are of course very different: the noise variables w are true random variables and will vary from part to part or batch to batch. In contrast, the parameters θ are fixed (but unknown) quantities. They have been assigned a probability distribution only as a convenient means of quantifying their uncertainty. Instead of considering the uncertainties in the noise and parameters on par in the CRD criterion, it is straightforward to keep them distinct by decomposing

JCRD(x) = JCE(x) + Jθ(x) = (α̂ + β̂′g(x) − T)² + (γ̂ + B̂x)′Σw(γ̂ + B̂x) + Jθ(x) + σ̂²,

similar to what was done when plotting the individual components JCE(x) and Jθ(x) in Figs. 1 and 2.

The term (γ̂ + B̂x)′Σw(γ̂ + B̂x) represents the contribution of noise variability to the MSE, whereas the term Jθ(x) represents the contribution of parameter uncertainty. The term (α̂ + β̂′g(x) − T)² represents the contribution of an off-target mean. These terms could all be plotted individually to understand their relative contributions to the MSE, keeping the effects of noise variability distinct from parameter uncertainty. Considering them together by minimizing JCRD(x) is simply a convenient means of mitigating the adverse impact of parameter uncertainty.


9. Input constraints and extrapolation beyond the experimental region

In robust design optimization, it is common to incorporate constraints on the inputs. Constraints can arise because the inputs are physically or economically restricted to some region, or simply because the experimental region is finite and extrapolation beyond it is viewed as risky. Regarding the latter, CRD inherently penalizes extrapolation beyond the experimental region, where the effects of parameter uncertainty are greater. However, constraining the inputs to be near the experimental region while optimizing the CE objective function may also possess an inherent form of caution.

To illustrate this, let xCE,con denote the input settings that minimize JCE(x) under the constraint that x lies within the experimental region (e.g., that each element of x lies between −1 and 1). For the example of Fig. 2, numerical optimization reveals that xCE,con = [1, −1, −1, 1]′, for which the posterior MSE is JCRD(xCE,con) = 0.246. This is only a modest 12% larger than the posterior MSE for the optimal CRD inputs [JCRD(xCRD) = 0.219].
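A coarse grid search over the experimental region (a stand-in for the numerical optimizer used in the text; numpy assumed, with the orthogonal-case Jθ reduction treated as an assumption) recovers the constrained CE corner and the reported posterior MSE:

```python
import numpy as np
from itertools import product

# Point estimates and the n = 16 uncertainty level of the Fig. 2 example.
alpha, T = 7.636, 8.0
beta  = np.array([0.111, -0.088, -0.014, 0.052])
gamma = -0.062
Bhat  = np.array([0.016, 0.037, 0.005, -0.018])
sigma2, n, Sw = 0.372**2, 16, 1.0

def J_ce(x):
    return (alpha + beta @ x - T)**2 + Sw * (gamma + Bhat @ x)**2 + sigma2

def J_theta(x):
    # Orthogonal-design reduction of Equation (11).
    return sigma2 / n * (1.0 + x @ x) * (1.0 + Sw)

def J_crd(x):
    return J_ce(x) + J_theta(x)

# Minimize J_CE over a 9-point-per-axis grid in the box [-1, 1]^4.
grid = np.linspace(-1.0, 1.0, 9)
best = min((np.array(pt) for pt in product(grid, grid, grid, grid)), key=J_ce)
```

Since JCE is convex, the grid minimum at the corner [1, −1, −1, 1]′ agrees with the constrained optimum reported in the text, and evaluating J_crd there gives the 0.246 figure.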

The examples of Figs. 1 and 2 are regular orthogonal designs for which Σβα and a are zero and, hence, the effect of parameter uncertainty is lowest at the origin x = 0 (see Equation (11)). In this case, minimizing JCE while constraining x to be closer to the origin inherently results in more cautious input settings than if JCE is minimized without constraints. Furthermore, suppose one conducted a ridge analysis in which JCE is optimized under the constraint that Jθ = λ for a range of values λ > 0. Because the CRD solution lies somewhere along the ridge path, there exists some λ > 0 for which the constrained CE optimization solution coincides with the CRD solution.

The situation becomes more nuanced for non-orthogonal designs. Consider a further modification of the leaf spring example with everything the same as in the example in Fig. 2, except that only n = 13 runs are conducted. Suppose that the three omitted runs (relative to the example in Fig. 2, for which n = 16) are three of the four runs at the {x1, x2} = {1, −1} corner. In particular, suppose the omitted runs are {x1, x2, x3, x4, w} = {1, −1, −1, 1, −1}, {1, −1, 1, −1, −1}, and {1, −1, −1, 1, 1}. In this case, the optimal CRD input settings are xCRD = [0.62, −0.08, 0.17, 0.38]′, for which JCRD(xCRD) = 0.278. Figure 3 plots JCRD(x), JCE(x), and Jθ(x) versus x1 and x2, with x3 and x4 held fixed at their optimal CRD values of 0.17 and 0.38, respectively. Notice that the effects of parameter uncertainty are much higher at the {x1, x2} = {1, −1} corner in the bottom panel of Fig. 3.

In comparison, the optimal constrained CE inputs are xCE,con = [1, −1, −1, 1]′ (from numerical optimization), for which the posterior MSE is JCRD(xCE,con) = 0.413. This is almost 50% larger than the posterior MSE for the optimal CRD inputs [JCRD(xCRD) = 0.278].


Fig. 3. Plots of JCRD, JCE, and Jθ versus the first two inputs for the modified leaf spring example with n = 13 and σ̂ = 0.372. Because there was only one run at {x1, x2} = {1, −1}, there is higher uncertainty at this corner.

The reason why simply constraining the CE inputs to the experimental region did not provide sufficient caution in this case is that the optimal constrained CE settings happened to fall in the corner of the experimental region for which the effects of parameter uncertainty were large (the three omitted runs were


deliberately chosen so as to create this scenario). In situations like this, in which the design matrix is not orthogonal and the effects of parameter uncertainty are larger in certain directions of the input space, CRD automatically accounts for the nuanced characteristics of the parameter uncertainty.

Inclusion of model structure uncertainty within the CRD framework, as discussed in the following section, would tend to further penalize extrapolation outside the experimental region. More generally, it would penalize choosing input settings that are far from design points. Consider the two-level fractional factorial design with no center points that was used in the example of Fig. 2. Under the linear model assumption, the smallest uncertainty is at the origin (see Equation (11) and the bottom panel of Fig. 2), even though there were no design points there. However, if one considers the possible presence of a quadratic term, then the uncertainty would be much larger at the origin. Similarly, the possible presence of quadratic terms would greatly inflate the uncertainty as one extrapolates outside the range of the experimental region.

    10. Consideration of model structure uncertainty

Following the approach of Chipman (1998), which was also adopted by Apley and Kim (2002), Kim (2002), Rajagopal and Del Castillo (2005), Rajagopal et al. (2005), and Ng (2010) when considering model structure uncertainty in robust design, the CRD results can be extended to account for model structure uncertainty. Let {S1, S2, . . . , Sq} denote the set of all candidate model structures, where each model structure consists of subsets of the gi(x), wj, and xiwj interaction terms in Equation (1). Under certain assumptions on the prior probabilities, the posterior probabilities {π1, π2, . . . , πq} that each model structure holds can be calculated in much the same manner as the posterior distributions for the parameters (refer to Box and Meyer (1993) or Chipman (1998) for details). The CRD strategy would be to select the x settings to minimize the weighted sum

J(x) = ∑ᵢ₌₁^q πᵢ Jᵢ(x),

where Ji(x) = Eε,w,θ,σ[(y − T)² | Si] is the MSE from Equation (5) under the assumption that model structure Si holds. Since some of the models will exclude subsets of the controllable inputs, uncontrollable noise, and/or their interactions, one must be careful in summing the Ji(x) across the different models in any optimization algorithm. The most straightforward way to do this is to include all of the input and noise variables in each Ji(x). If Si excludes a particular main effect or interaction term, the corresponding element of θ̂ (and the variance/covariance for that parameter) would be set equal to zero when forming Ji(x).
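The zeroing device described above can be sketched as follows. The two candidate structures and their posterior probabilities here are hypothetical, invented purely for illustration, and each Ji uses the orthogonal-case uncertainty term restricted to the active parameters (the paper's Ji would use the full posterior under each Si):

```python
import numpy as np

alpha, T = 7.636, 8.0
beta  = np.array([0.111, -0.088, -0.014, 0.052])
gamma = -0.062
Bhat  = np.array([0.016, 0.037, 0.005, -0.018])
sigma2, n, Sw = 0.372**2, 16, 1.0

def J_i(x, active):
    """MSE under a structure keeping only the inputs flagged in `active` (the
    flag applies to both the main effect and its interaction with w). Excluded
    terms get point estimate and posterior variance set to zero."""
    b  = np.where(active, beta, 0.0)
    Bm = np.where(active, Bhat, 0.0)
    ce = (alpha + b @ x - T)**2 + Sw * (gamma + Bm @ x)**2 + sigma2
    jt = sigma2 / n * (1.0 + x[active] @ x[active]) * (1.0 + Sw)
    return ce + jt

# S1 = all four inputs active; S2 = x3 and x4 (and their interactions) excluded.
structures = [np.array([True, True, True, True]),
              np.array([True, True, False, False])]
pi = [0.7, 0.3]  # hypothetical posterior structure probabilities

def J_avg(x):
    return sum(p * J_i(x, s) for p, s in zip(pi, structures))
```

Because every Ji is defined over the full input vector, J_avg(x) can be handed directly to any optimizer, exactly as the text recommends.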

    11. A dual-response version of CRD

The CRD problem has been formulated as minimizing the single MSE criterion Eε,w,θ,σ[(y − T)² | Y]. It is straightforward to extend the CRD approach to the analogous dual-response criterion in which Varε,w,θ,σ[y | Y] is minimized subject to the constraint Eε,w,θ,σ[y | Y] = T. Then, it can be written that

Eε,w,θ,σ[y | Y] = α̂ + β̂′g(x),

and

Varε,w,θ,σ[y | Y] = Eε,w,θ,σ[(y − T)² | Y] − (Eε,w,θ,σ[y | Y] − T)²
= JCRD(x) − (α̂ + β̂′g(x) − T)²
= (γ̂ + B̂x)′Σw(γ̂ + B̂x) + Σα + g′(x)Σβg(x) + x′Ax + 2g′(x)Σβα + 2x′a + d + σ̂²,   (14)

where Equation (9) has been used for JCRD(x). For this formulation, the use of Lagrange multipliers when performing the constrained optimization may be helpful. For the special case that g(x) = x, the variance expression is quadratic in x, and the constraint is linear. In this case, using Lagrange multipliers, it is straightforward to show (see Kim (2002)) that the closed-form solution is

xCRD,dual = D⁻¹β̂ (T − α̂ + z′D⁻¹β̂) / (β̂′D⁻¹β̂) − D⁻¹z,   (15)

where D = B̂′ΣwB̂ + Σβ + A, and z = a + Σβα + B̂′Σwγ̂.

The dual-response CRD approach bears some resemblance to the two frequentist approaches mentioned in Section 2 for taking parameter uncertainty into account in dual-response robust design. As a dual-response objective function, Miró-Quesada and Del Castillo (2004) proposed using an unbiased estimate of Varθ̂,w(ŷ(w) | θ, σ), where ŷ(w) = α̂ + β̂′g(x) + γ̂′w + w′B̂x. Their objective reduces to minimizing (γ̂ + B̂x)′Σw(γ̂ + B̂x) + g′(x)Σβ̂g(x), where the parameter estimates and covariance matrix Σβ̂ are from standard least squares. For comparison purposes, suppose a non-informative prior is assumed in the CRD approach, so that the posterior parameter estimates and covariance matrix coincide with their least squares counterparts. The objective function of Miró-Quesada and Del Castillo (2004) is then missing a number of terms related to parameter uncertainty that are present in the CRD objective function (14); in particular, the quantity x′Ax + 2x′a + d is missing. Consequently, although their approach considers uncertainty in β in the same manner as does CRD, it does not take into account uncertainty in B. It is worth noting that the quantity x′Ax + 2x′a + d appears at one point in the derivations of Miró-Quesada and Del Castillo (2004), but it is later subtracted out as a result of the fact that it is precisely the bias correction suggested by Myers and


Montgomery (2002, p. 576) when estimating the response variance.

To contrast the two approaches, reconsider the example of Fig. 3, discussed in Section 9. From Equation (15), the CRD inputs are xCRD,dual = [2.10, −0.86, 0.40, 1.18]′, for which the posterior MSE is JCRD(xCRD,dual) = 0.480. In comparison, the optimal inputs using the criterion of Miró-Quesada and Del Castillo (2004), which are given by Equation (15) with A, a, and Σβα all set equal to zero, are xMdC = [2.17, −0.85, 0.31, 1.00]′, for which the posterior MSE is JCRD(xMdC) = 0.481. Two things are apparent. First, the dual-response solution of Miró-Quesada and Del Castillo (2004) is very similar to the dual-response CRD solution for this example. Second, both of the dual-response solutions call for much larger input settings and incur substantially higher posterior MSE than the non-dual CRD solution. Recall that xCRD = [0.62, −0.08, 0.17, 0.38]′, for which JCRD(xCRD) = 0.278. Evidently, even though parameter uncertainty is incorporated into the objective function (14), the hard constraint that the posterior response mean α̂ + β̂′g(x) equals the target can outweigh the penalty that Equation (14) places on using input settings for which the effects of parameter uncertainty are large.
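Equation (15) is straightforward to implement. The n = 13 example above cannot be reproduced here without its non-orthogonal posterior covariance, so the sketch below (numpy assumed) applies the closed form to the orthogonal n = 16 leaf spring case, where Σβα and a are zero so that D and z simplify; the key property checked is that the posterior mean constraint holds exactly at the solution:

```python
import numpy as np

def x_crd_dual(alpha, beta, D, z, T):
    """Equation (15): minimize the posterior variance x'Dx + 2x'z (+ const)
    subject to the posterior mean constraint alpha + beta'x = T."""
    Dinv_beta = np.linalg.solve(D, beta)
    Dinv_z = np.linalg.solve(D, z)
    return Dinv_beta * (T - alpha + z @ Dinv_beta) / (beta @ Dinv_beta) - Dinv_z

# Orthogonal n = 16 case: Sigma_beta_alpha and a are zero, so
# D = B'SwB + (sigma^2/n)(1 + tr Sw) I and z = B'Sw*gamma.
beta  = np.array([0.111, -0.088, -0.014, 0.052])
Bhat  = np.array([0.016, 0.037, 0.005, -0.018])
gamma = -0.062
alpha, T, Sw = 7.636, 8.0, 1.0
s2n = 0.372**2 / 16
D = Sw * np.outer(Bhat, Bhat) + s2n * (1.0 + Sw) * np.eye(4)
z = Sw * gamma * Bhat
x_dual = x_crd_dual(alpha, beta, D, z, T)

# The posterior mean is exactly on target at the closed-form solution.
assert abs(alpha + beta @ x_dual - T) < 1e-9
```

One can also verify local optimality directly: perturbing x_dual along any direction t with β̂′t = 0 (which preserves the constraint) cannot decrease the quadratic variance expression, since D is positive definite.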

Myers and Montgomery (2002, p. 576) recommend minimizing (γ̂ + B̂x)′Σw(γ̂ + B̂x) − x′Ax − 2x′a − d + σ̂², which they derive as an unbiased estimate of Varε,w(y | θ, σ). Notice the minus sign on the three bias correction terms. From a frequentist perspective, the minus signs are reasonable if the objective is purely to obtain an unbiased estimate of the response variance. From a Bayesian perspective, however, it is counterproductive as a strategy for minimizing the response variance in a manner that takes into account parameter uncertainty. It would tend to call for input settings that make x′Ax + 2x′a larger, which, from Equation (14), serves to increase the posterior response variance rather than decrease it. Even from a frequentist perspective, it has some undesirable properties as an objective function. Because A is positive semi-definite, including the −x′Ax term in the objective function will tend to call for larger x settings than if the standard variance objective function (γ̂ + B̂x)′Σw(γ̂ + B̂x) + σ̂² is used. In fact, if parameter uncertainty in B is large enough that B̂′ΣwB̂ − A has a negative eigenvalue, their objective function to be minimized (the unbiased expression for the variance) is unbounded from below, which will call for infinitely large x settings. The cautious approach, on the other hand, tends to call for smaller x settings as parameter uncertainty increases, which seems a more intuitively appealing way to mitigate the adverse effects of parameter uncertainty.
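The unboundedness is easy to demonstrate with a small hypothetical example (scalar x and w; the numbers are invented, not from the paper): whenever the quadratic coefficient B̂′ΣwB̂ − A is negative, the bias-corrected variance estimate decreases without limit as |x| grows:

```python
import numpy as np

# Hypothetical scalar illustration of the Myers-Montgomery objective:
# (gamma + Bx)' Sw (gamma + Bx) - x'Ax - 2x'a - d + sigma^2.
Bhat, Sw = 0.05, 1.0          # small estimated interaction
A = 0.10                      # large parameter-uncertainty term: Bhat^2*Sw - A < 0
gamma, a, d, sigma2 = -0.06, 0.0, 0.0, 0.15**2

def mm_objective(x):
    return Sw * (gamma + Bhat * x)**2 - A * x**2 - 2 * a * x - d + sigma2

vals = [mm_objective(x) for x in (0.0, 10.0, 100.0, 1000.0)]
assert vals[0] > vals[1] > vals[2] > vals[3]  # decreases without bound in |x|
```

The same numbers plugged into the CRD variance (14), where the x′Ax term enters with a plus sign, instead grow with |x|, which is the cautious behavior described above.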

    12. Conclusions

This article has investigated a Bayesian MSE objective function as a means of taking parameter uncertainty into

account in robust design optimization. A key feature of this approach is that a tractable, closed-form expression for the CRD objective function is obtained as a function of the input settings. The only posterior information needed to calculate the CRD objective function is the posterior mean (i.e., the point estimates) and posterior covariance matrix of the parameters, for which simple expressions have been provided in Section 3. The resulting CRD objective function (9) is a quadratic function of g(x) and x and, hence, is well behaved for numerical optimization, at least for the typical g(x) considered in robust design studies.

The term cautious robust design is fitting. As has been demonstrated, the approach tends to call for more cautious x settings than if parameter uncertainty is neglected. By cautious is meant x settings at which the effects of parameter uncertainty are mitigated. For rotatable, orthogonal designs, for which parameter uncertainty is the same in all directions of the parameter space, the cautious settings are ones that are closer to the center of the experimental design region. For non-orthogonal experimental designs such as in the example of Fig. 3, on the other hand, the cautious nature of the CRD solution is more nuanced: the effect of parameter uncertainty is larger in certain directions or regions of the input space, and CRD automatically takes this into account by selecting x settings that avoid them.

It has been shown that the CRD objective function, which is a Bayesian MSE, decomposes naturally into three components: the first component is the square of the response mean deviation from target; the second component is the certainty equivalence variance, i.e., what would result from noise and random error variability if there were no parameter uncertainty; and the third component is the additional variance due to the uncertainty in the parameters, characterized by their posterior distribution. It has been demonstrated that this decomposition is particularly useful in determining whether further experimentation is necessary to reduce parameter uncertainty. The assessment of parameter uncertainty is entirely objective oriented, in the sense that the third component quantifies the extent to which parameter uncertainty inflates the MSE objective function. The current authors believe that this is a very direct and appropriate way of quantifying parameter uncertainty and facilitating efficient experimentation.

A dual-response version of CRD has also been investigated. In situations with no parameter uncertainty, a dual-response criterion may have a conceptual advantage over a single MSE criterion, since the former constrains the response mean to be on target. In the presence of parameter uncertainty, however, the dual-response constraint that the posterior mean is on target is less meaningful: enforcing the constraint α̂ + β̂′g(x) = T will generally not ensure that the actual response mean α + β′g(x) equals the target for a specific realization of α and β. Ironically, it could happen that the response mean using the CRD MSE criterion is closer to the target than when the constraint α̂ + β̂′g(x) = T is


enforced. Furthermore, it was demonstrated in the example that enforcing the hard constraint α̂ + β̂′g(x) = T results in settings that are substantially less robust to parameter errors. For these reasons, the authors believe that the CRD MSE criterion of Equation (9) is preferable to the dual-response criterion when parameter uncertainty is large.

    Acknowledgements

This work was supported in part by the National Science Foundation under grant CMMI-0758557. We also thank two anonymous referees and the Department Editor, Russell Barton, for their many helpful comments.

    References

Apley, D.W. (2004) A cautious minimum variance controller with ARIMA disturbances. IIE Transactions, 36, 417–432.

Apley, D.W. and Kim, J.B. (2002) A cautious approach to robust design with model uncertainty. In Proceedings of the 2002 Industrial Engineering Research Conference, IIE, Orlando, FL, paper 2164.

Apley, D.W. and Kim, J.B. (2004) Cautious control of industrial process variability with uncertain input and disturbance model parameters. Technometrics, 46(2), 188–199.

Åström, K.J. and Wittenmark, B. (1995) Adaptive Control, second edition, Addison-Wesley, New York, NY.

Box, G.E.P. and Hunter, J.S. (1954) A confidence region for the solution of a set of simultaneous equations with an application to experimental design. Biometrika, 41, 190–199.

Box, G.E.P. and Meyer, R.D. (1993) Finding the active factors in fractional screening experiments. Journal of Quality Technology, 25, 94–105.

Bunke, H. and Bunke, O. (1986) Statistical Inference in Linear Models: Statistical Methods of Model Building, Volume I, Wiley, New York, NY.

Chipman, H. (1998) Handling uncertainty in analysis of robust design experiments. Journal of Quality Technology, 30, 11–17.

Joseph, V.R. (2006) A Bayesian approach to the design and analysis of fractionated experiments. Technometrics, 48, 219–229.

Joseph, V.R. and Delaney, J.D. (2007) Functionally induced priors for the analysis of experiments. Technometrics, 49, 1–11.

Khattree, R. (1996) Robust parameter design: a response surface approach. Journal of Quality Technology, 28, 187–198.

Kim, J.B. (2002) A cautious approach to minimizing industrial process variability. Ph.D. dissertation, Department of Industrial Engineering, Texas A&M University.

Lucas, J.M. (1994) How to achieve a robust process using response surface methodology. Journal of Quality Technology, 26, 248–260.

Miró-Quesada, G. and Del Castillo, E. (2004) Two approaches for improving the dual response method in robust parameter design. Journal of Quality Technology, 36, 154–168.

Miró-Quesada, G., Del Castillo, E. and Peterson, J. (2004) A Bayesian approach for multiple response surface optimization in the presence of noise variables. Journal of Applied Statistics, 31, 251–270.

Monroe, E.M., Pan, R., Anderson-Cook, C.M., Montgomery, D.C. and Borror, C.M. (2010) Sensitivity analysis of optimal designs for accelerated life testing. Journal of Quality Technology, 42(2), 121–135.

Montgomery, D.C. (2001) Design and Analysis of Experiments, fifth edition, John Wiley & Sons, New York, NY.

Myers, R.H., Khuri, A.I. and Vining, G. (1992) Response surface alternatives to the Taguchi robust parameter design approach. The American Statistician, 46, 131–139.

Myers, R.H. and Montgomery, D. (2002) Response Surface Methodology: Process and Product Optimization Using Designed Experiments, John Wiley & Sons, New York, NY.

Nair, V.N. (1992) Taguchi's parameter design: a panel discussion. Technometrics, 34, 128–161.

Ng, S.H. (2010) A Bayesian model-averaging approach for multiple-response optimization. Journal of Quality Technology, 42, 52–68.

Peterson, J.J. (2004) A posterior predictive approach to multiple response surface optimization. Journal of Quality Technology, 36, 139–153.

Peterson, J.J., Cahya, S. and Del Castillo, E. (2002) A general approach to confidence regions for optimal factor levels of response surfaces. Biometrics, 58, 422–431.

Pignatiello, J.J. and Ramberg, J.S. (1985) Discussion of off-line quality control, parameter design, and the Taguchi method. Journal of Quality Technology, 17, 198–206.

Rajagopal, R. and Del Castillo, E. (2005) Model-robust process optimization using Bayesian model averaging. Technometrics, 47, 152–163.

Rajagopal, R., Del Castillo, E. and Peterson, J.J. (2005) Model and distribution-robust process optimization with noise factors. Journal of Quality Technology, 37, 210–222.

Sahni, N.S., Piepel, G.F. and Naes, T. (2009) Product and process improvement using mixture-process variable methods and robust optimization techniques. Journal of Quality Technology, 41(2), 181–197.

Shoemaker, A.C., Tsui, K. and Wu, C.F.J. (1991) Economical experimentation methods for robust design. Technometrics, 33, 415–427.

Taguchi, G. (1986) Introduction to Quality Engineering, UNIPUB/Kraus International, White Plains, NY.

Vining, G.G. and Myers, R.H. (1990) Combining Taguchi and response surface philosophies: a dual response approach. Journal of Quality Technology, 22, 38–45.

Wu, C.F.J. and Hamada, M. (2000) Experiments: Planning, Analysis, and Parameter Design Optimization, John Wiley & Sons, New York, NY.

    Biographies

Daniel W. Apley is an Associate Professor of Industrial Engineering and Management Sciences at Northwestern University, Evanston, IL. He obtained B.S., M.S., and Ph.D. degrees in Mechanical Engineering and an M.S. degree in Electrical Engineering from the University of Michigan. His research interests lie at the interface of engineering modeling, statistical analysis, and data mining, with particular emphasis on manufacturing variation reduction applications in which very large amounts of data are available. His research has been supported by numerous industries and government agencies. He received the NSF CAREER award in 2001, the IIE Transactions Best Paper Award in 2003, and the Wilcoxon Prize for the best practical application paper appearing in Technometrics in 2008. He currently serves as Editor-in-Chief for the Journal of Quality Technology and has served as Chair of the Quality, Statistics & Reliability Section of INFORMS, Director of the Manufacturing and Design Engineering Program at Northwestern, and Associate Editor for Technometrics.

Jeongbae Kim is the Director of the Service Innovation Team at Korea Telecom Headquarters. He obtained his Ph.D. in Industrial Engineering from Texas A&M University.
