Calibration of a Monte Carlo simulation model of disease spread in slaughter pig units


Computers and Electronics in Agriculture

25 (2000) 245–259

Calibration of a Monte Carlo simulation model of disease spread in slaughter pig units

Erik Jørgensen *

Biometry Research Unit, Danish Institute of Agricultural Sciences, P.O. Box 50, DK-8830 Tjele, Denmark

Abstract

The use of new resampling methods to improve the handling of stochastic simulation models is demonstrated. As an example, we use a Monte Carlo simulation model of disease spread within a slaughter pig herd. The model parameters reflect the disease spread and comprise, for example, infection risk given diseases, and the positioning of the animals. The setting of the prior distribution of the parameters using expert knowledge is complicated, because the expert knowledge is generally based on the resulting dynamics rather than the underlying parameters. The paper shows how the prior distribution of model parameters can be made consistent with the knowledge concerning model output, using methods such as importance sampling and Markov Chain Monte Carlo techniques. Based on these methods, different management strategies are compared. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Bayesian synthesis; Herd models; Importance sampling

www.elsevier.com/locate/compag

1. Introduction

Mathematical models of animal production systems, such as Monte Carlo simulation models, can be seen as representations of expert knowledge of the systems. This knowledge comprises model output, model parameters, and the structure and equations in the model. With the model, a diverse set of output variables can be predicted as a consequence of a set of decision rules. Examples of such models within animal production are Singh (1986); de Roo (1987); Sørensen et al. (1992); Jørgensen and Kristensen (1995). When these models are used for decision purposes, the use is often based on maximisation of expected utility or some related measure, such as expected income. Within this framework it is natural to consider the expert knowledge as a representation of the expert's subjective probability distribution. This concept can be generalised to any function of model output, and therefore to any use of the models. Consequently, the whole range of methods within Bayesian statistics becomes available, for example model testing, updating and calibration of model parameters to new observations, and estimation of the uncertainty in the model predictions. Until recently, this has not been possible due to the complexity of the calculations. However, advances within Bayesian statistics and improvements in calculation speed have removed at least some of these obstacles.

* Tel.: +45-89-991230; fax: +45-89-991819. E-mail address: [email protected] (E. Jørgensen)

0168-1699/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.

PII: S0168-1699(99)00072-1

Within the field of Markov Chain Monte Carlo (MCMC) simulation methods (see, e.g. Gilks et al., 1996), techniques for parameter estimation in highly structured stochastic systems have been developed with several successful applications. Because of the complex interactions between the elements (e.g. animals and confinement) in simulation models of animal production systems, they can only exploit these developments in special cases. It is often difficult or impossible to identify the necessary conditional independence structures that are exploited in MCMC simulation methods. This is mainly because decision making has to take herd capacity restrictions into account, thus producing dependencies between the different elements of the system. Instead, other methodological developments have to be used. The so-called Bayesian Synthesis (BS) approach described in Givens (1993), Raftery et al. (1995) and related work seems a promising starting point.

The aim of the present paper is to illustrate the potential of such methods, using an existing model of disease spread in slaughter pig production units as an example.

2. The simulation model

The simulation model used in the present context serves to illustrate the application of the Bayesian Synthesis method. Therefore, it is only described very briefly. The model is a simulation model of an animal production herd and can simulate the effect of different production strategies. In addition, it can handle disease spread within the herd. The model follows a long tradition within research in animal production systems; early examples are Singh (1986) and de Roo (1987). Such a model is often called a Monte Carlo simulation model because Monte Carlo simulation is the only feasible method for calculating the parameters of interest using the model. The present model was originally developed to study the information flow within pig production units, and the impact of different information systems and production management, as described in Jørgensen and Kristensen (1995). The model of the disease spread is an add-on to this simulation model, and utilises its basic elements. The addition of disease spread has been made by defining an additional model part, called a generic disease model (Jørgensen et al., 1995). The management part of the model is described in Jørgensen (1998b) and the growth model in Jørgensen (1998a).

As in other herd simulation models, the herd is described as consisting of the animals and their different biological states, as well as the housing system, the confinement. The rest of the production system also plays a major role, i.e. the observation, the updating and filtering of information, decision support systems, the decisions and the corresponding actions that are carried out. The model of the confinement is detailed and includes the position of each pen in the herd. Prior to the model calculations the initial state of the herd is specified.

Model calculations proceed using the next-event approach, i.e. the main updating algorithm consists of a queue of scheduled messages. A message (corresponding to an event) consists of message type (e.g. movement of animal, observation of animal, infection), message time, message receiver (e.g. animal or manager identification) and message sender. The messages are sorted according to message time. The first message is processed and results in updating of some of the elements in the model, and generation of new messages. For example, a message to a manager to move a specific animal will send messages to the previous pen that the animal has moved, to the new pen that the animal has been inserted, and to the animal that its position has changed. In addition, the system time is incremented to the message time and the calculations proceed with the next message in the queue. Each model element is updated at least daily (in system time).
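The next-event approach described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the model: the class names `Message` and `EventQueue`, the identifiers `pen_1`, `pen_2` and `pig_7`, and the toy `handler` are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    """A scheduled event; only (time, seq) take part in the ordering."""
    time: float
    seq: int
    kind: str = field(compare=False)
    receiver: str = field(compare=False)
    sender: str = field(compare=False)


class EventQueue:
    """Queue of scheduled messages, processed in order of message time."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal times
        self.now = 0.0                     # system time

    def schedule(self, time, kind, receiver, sender):
        msg = Message(time, next(self._counter), kind, receiver, sender)
        heapq.heappush(self._heap, msg)

    def run(self, handler, until):
        """Process messages up to `until`; the handler may return new messages."""
        log = []
        while self._heap and self._heap[0].time <= until:
            msg = heapq.heappop(self._heap)
            self.now = msg.time            # advance system time to message time
            log.append((msg.time, msg.kind, msg.receiver))
            for new_message in handler(msg):
                self.schedule(*new_message)
        return log


def handler(msg):
    # Hypothetical rule: a "move" request to the manager fans out into
    # notifications to the old pen, the new pen and the animal itself.
    if msg.kind == "move":
        return [
            (msg.time, "animal_left", "pen_1", msg.receiver),
            (msg.time, "animal_inserted", "pen_2", msg.receiver),
            (msg.time, "position_changed", "pig_7", msg.receiver),
        ]
    return []


queue = EventQueue()
queue.schedule(2.0, "observe", "pig_7", "manager")
queue.schedule(1.0, "move", "manager", "scheduler")
log = queue.run(handler, until=10.0)
```

A heap keeps the queue sorted by message time without re-sorting after every insertion; the sequence counter is an added design choice so that messages with equal times are processed in the order they were scheduled.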

2.1. The generic disease model

The basic model thus describes the movement and positioning of the animals in the herd based on management observations and decisions. Depending on the management strategy (Jørgensen, 1998b), this is well defined.

The additional disease model consists of two parts. One part describes the disease dynamics at the individual pig level; the other describes the spread of germ cells in the herd. The disease model of the individual is structured with 16 different disease states as illustrated in Fig. 1. Each state is a combination of four dichotomous sub-states, i.e. Infected±, Antibodies±, Acute symptoms± and Infectious±. (Note that only 10 of the 16 theoretical disease states actually occur. As an example, an animal has to be infected before it starts germ release.) Similarly, the number of possible transitions between states is limited, as shown in Fig. 1.
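One way to picture the 16 theoretical states is as a 4-bit code, one bit per dichotomous sub-state. The sketch below assumes that the bits follow the order in which the sub-states are listed, with Infected as the highest bit; this agrees with the state numbers mentioned in the text (0 for a non-infected pig, 8 for infected but not infectious, 9 for infected and infectious), but the positions of the other two bits are an assumption, and the names `DiseaseState` and `is_consistent` are hypothetical.

```python
from enum import IntFlag


class DiseaseState(IntFlag):
    """Four dichotomous sub-states packed into one integer (bit order assumed)."""
    INFECTIOUS = 1
    ACUTE_SYMPTOMS = 2
    ANTIBODIES = 4
    INFECTED = 8


def is_consistent(state):
    """Apply the single rule stated in the text: germ release requires infection."""
    s = DiseaseState(state)
    infectious = DiseaseState.INFECTIOUS in s
    infected = DiseaseState.INFECTED in s
    return infected or not infectious


consistent = [s for s in range(16) if is_consistent(s)]
```

This single rule leaves 12 of the 16 combinations. The text states that only 10 occur, so further exclusions must follow from Fig. 1; they are not reproduced here.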

The spread of the disease within the herd depends on the layout of the confinement. The confinement is defined as a hierarchy of departments, sections within departments, pens within sections and pigs within pens, as illustrated in Fig. 2. The risk of infection depends on the distance between the infectious pig and the pig at risk and whether they are in the same pen, the same section or in different sections.

Several parameters are used to describe these model parts (Jørgensen et al., 1995). In the present context only a subset of the parameters will be studied. The parameters can be summarised as follows. The first parameter (F1) describes the period between the infection of the pig and the subsequent release of germ cells from the pig. Another important parameter is the risk of infection (F2), i.e. if a pig is in contact with an infectious pig, what is the risk of becoming infected during a given period. Other parameters describe the spread of the disease, i.e. how much contact there is between neighbouring pens (F3), how important the airborne infection is compared to the contact infection (F4), and how much the airborne infection pressure decreases with distance (F5).

Based on these parameters the result of the model can be expressed in several output parameters, such as the number of pigs produced, daily gain and average slaughter weight. In this paper only the number of infected pigs, V, is used. These values correspond closely to the production traits that can be observed in pig units, and can easily be used for economic evaluation of the cost of the disease under different management strategies. As the term generic disease model implies, the underlying model does not depend on the actual disease in question. The parameter values in the model do, however, differ between diseases. Our initial modelling effort has been concentrated on the disease Atrophic Rhinitis (AR), partly because an attempt at modelling this disease has already been made (Turner et al., 1993).

Fig. 1. Disease states and state transitions in the generic disease model. The grey area signifies disease states.

Fig. 2. Layout of the confinement.

2.2. Comparison of pen partitioning types

The example used in this paper is based on our modelling efforts for AR. As the method shown has been implemented during this model building, the actual process has not been as straightforward as we present it.

An experiment was carried out in the Danish Applied Pig Research (DARP) scheme (Pedersen et al., 1995), where the effect of closed pen partitionings was studied in commercial slaughter pig units. Usually pigs are housed in pens with open partitionings, thus allowing contact between the pigs in neighbouring pens. The experiment studied the effect of the pen partitionings within each unit, i.e. both sections with closed partitionings and sections with open partitionings were present in the experimental herds.

This experiment was used as a starting point. The intention was to study the effect of pen partitionings on the initial disease spread in the herd, with the use of the model. Usually, such a question can be investigated only under very expensive experimental conditions. The mechanistic approach allowed us to model the effect of closed pen partitionings simply by setting the parameter F3 = 0 (relative contact infection risk in the neighbouring pen). A housing system consisting of a sectioned production system with ten pens and 16 pigs within each pen was defined. The production cycle consisted of keeping the growing pigs in the section for 105 days. The production management closely resembled usual sectioned management in Denmark. In each of the simulation runs a single newly infected pig was introduced (in state 8 in Fig. 1) to the other 159 non-infected pigs (in state 0 in Fig. 1). The output variables, Vij, the number of infected pigs, were in the range from 0 to 159; i denotes the type of pen partitioning, i.e. 1: open pen partitionings and 2: closed pen partitionings; j is the simulation run.

The initial project work started out with the traditional approach used when building simulation models, but the shortcomings of this approach led us to a redefinition of the concept behind our modelling. New techniques within the field of Bayesian statistics were adapted, mainly inspired by the work of Raftery et al. (1995) and described in detail in Givens (1993). The developments within MCMC have also been used (Gilks et al., 1996).

3. Method

To present the approach used, we will briefly give a formal definition of the Monte Carlo simulation method and then discuss some techniques that can be applied. An elaborate description of the applied techniques for resampling can be found in Jørgensen (1998c).

3.1. Elements of the Monte Carlo simulation method

Initially, we will present some elements of the Monte Carlo simulation method, mainly to introduce the notation followed in the paper. Details of the Monte Carlo simulation methods can be found in textbooks such as Fishmann (1996).

Essentially, the Monte Carlo simulation method is a method for evaluating an integral

$$C = E_p\{U(X)\} = \int U(x)\,p(x)\,dx \tag{1}$$

where Ep() is the expectation with respect to the probability density p and U() is some response function, e.g. a utility function. The evaluation involves generating random draws X = x(j) from the target distribution p and then estimating C by

$$\hat{C} = \frac{1}{k}\left\{U(x^{(1)}) + \dots + U(x^{(k)})\right\} \tag{2}$$

In our context, X = {U, F} is a vector consisting of decision parameters, U, and system parameters and state variables, F. The Monte Carlo method is thus a numeric method for evaluating the integral in Eq. (1). In addition, if the random draws x(j) are independent, we can easily obtain an estimate of the error of the approximation, using the Central Limit Theorem (see e.g. Fishmann, 1996, section 2.2).
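The estimator in Eq. (2) and its Central Limit Theorem error estimate can be illustrated with a small Python sketch. The response function and the sampling distribution below are toy assumptions chosen so that the exact answer (1/3) is known; the function name `monte_carlo` is hypothetical.

```python
import math
import random


def monte_carlo(utility, draw, k, seed=0):
    """Estimate E{U(X)} from k independent draws, with its standard error."""
    rng = random.Random(seed)
    values = [utility(draw(rng)) for _ in range(k)]
    mean = sum(values) / k
    variance = sum((v - mean) ** 2 for v in values) / (k - 1)
    # By the Central Limit Theorem the estimator is approximately normal
    # with standard deviation sqrt(variance / k) for independent draws.
    return mean, math.sqrt(variance / k)


# Toy case: X ~ Uniform(0, 1) and U(x) = x^2, so the exact integral is 1/3.
estimate, std_error = monte_carlo(lambda x: x * x, lambda rng: rng.random(), k=10_000)
```

The standard error shrinks with the square root of k, which is why costly simulation runs are worth reusing rather than discarding.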

Often, it is an advantage to reformulate the integral in Eq. (1) by splitting F into the so-called state of nature, F0, and parameters and state variables F·s = {F1s, F2s, …, FTs} that are calculated by the model. (The additional index denotes model step, e.g. model time.) A subset of F·s, V, is called the output of the model. This splitting of the parameter vector leads to a reformulation of Eq. (1)

$$C = E_{p_0}\left\{E_{p_{s|0}}\{U(X)\}\right\} = \int \left\{ \int U(x)\,\frac{p(x)}{p_0(F_0)}\,d\{U, F_{\cdot s}\} \right\} p_0(F_0)\,dF_0 \tag{3}$$

where Eps|0{U(X)} denotes the conditional expectation of U(X) for a given state of nature, F0.

The dimension of F0 will in general be fixed by the model structure, while the number of elements in F·s will vary with different decisions and different combinations of the other elements in F. Disregarding the problem of dimensionality, the integration with respect to F0 is well behaved and lends itself to techniques other than simple Monte Carlo simulation. In contrast, the integration with respect to F·s is of a complexity that is only feasible to solve using the Monte Carlo method. (Note that the dimension of F0 in such models is often in excess of 100, so even though it is well behaved, the evaluation of the integral is complicated.)

In the disease model, F0 = {F1, …, F5, …}. The first five parameters are described in the previous section. Several other parameters are included in the model, such as the parameters describing growth (e.g. expectation and variance in growth rate), layout of the confinement, weighing precision and reduction in growth rate as a result of infection. The number of infected pigs is one element of the output variables. F·s comprises e.g. position of each pig in the herd, live weight at each time step, time of infection of each pig and observed live weight of each pig, to name a few. All these variables may affect the output variable. The decision variables are e.g. closed and open pen partitionings. The rest of the decision variables are kept constant in the present context, but comprise e.g. length of section cycle, delivery strategy, etc.

The model can be seen as a representation of expert knowledge concerning the causal structure (U()), the parameter values in the state of nature (F0) and the output (V). Expert knowledge has contributed to the structure of the model, and based on experimental results prior distributions of values of the model parameters (F0) can be found. However, the expert knowledge is often based on the elements of V, or rather E(V | U0), where U0 is a subset of the possible decision combinations. (The extrapolation from knowledge based on U0 to decisions in general is typical for the modelling approach. The use of the mechanistic simulation models for this extrapolation is considered more robust than the use of simple empirical models.)

We thus have two sources of information concerning the output parameters of the model. The first source is the model calculations, i.e. based on expert knowledge concerning model structure and state of nature (the parameters F0). The second source is the expert knowledge concerning the actual (observable) values of output parameters. Raftery et al. (1995) introduce the term post-model distribution for this joint distribution, while the term pre-model distribution refers to the prior distribution of F0 before the information based on the output parameters has been used. In addition we have to distinguish between two types of knowledge. Expert knowledge is considered to refer to knowledge concerning the expected level of the output values (v = E(V)), while observational knowledge is considered as the full probability distribution of the output values, e.g. arising from observations of the system.

The problem the model developer faces is how to make this knowledge consistent, i.e. to specify the correct joint prior distribution p(F0, V | U). This procedure is often referred to as calibration of the model. The usual approach can be summarised as follows. The prior distribution of the parameters pa(F0) is specified. Then k model calculations are carried out using random drawings from this prior distribution, and resulting values of the (observable) output variables are calculated, i.e. the simulation runs calculate S(0) = {x(j)}. If the distribution of the output variables does not 'seem' likely, the prior distribution of the parameters is adjusted and another k simulation runs are carried out, i.e. S(1) is generated. This process is iterated with corresponding new S(i) until the model builder and the user of the model are satisfied with the validity of the model. In fact, with models of animal production systems, it is customary to use point estimates of the parameters in the state of nature, F0, and only include stochasticity in the parameters and state variables in F·s. This approach is dubious, especially when the model is used for decision support purposes, because the risk in the decision making is underestimated (see footnote 1).

The calibration is a series of refinements of the sampling strategies until a satisfactory result is obtained, i.e.

$$S^{(0)} \rightarrow S^{(1)} \rightarrow \cdots \rightarrow S^{(n-1)} \rightarrow S^{(n)} \tag{4}$$

To be able to evaluate the calibration objectively, we need to know the relation between subsequent iterations and the stopping rule, i.e. the criteria for comparison between S(n−1) and S(n) that lead to acceptance of the result.

The techniques presented in the following are a formal approach to these model building steps. They ensure that the process is coherent and can be documented. In addition, they allow for a reuse of the costly simulation runs.

3.2. The resampling approach

When the initial series of simulation runs has been made, the resulting data set S(0) is a random sample from the joint prior distribution, pa, used in the sampling process (e.g. the pre-model distribution). At each point we can estimate the corresponding density pa(x(j)), at least in principle, as will be described later on.

If we are interested in a sample from another probability distribution, pp (e.g. the post-model density), we do not need to discard S(0) but can reuse the elements. For each sample of simulation runs S(i) we can define the sampling density p (e.g. pa) that generated the sample and the target density that we wish to produce, p̃ (e.g. pp).

The basic method is to estimate the proportion between p̃(x(j)) and p(x(j)) (in some cases only up to a normalising constant) in each of the sampled points. If we are interested in e.g. estimating the expected utility based on this density we can use importance sampling (i.e. weigh each observation with the weight factor when the average is calculated). If we are interested in a new sample that follows the new density we can use acceptance–rejection sampling, i.e. select observations from the old data set with a probability depending on the weight factor. This process can be done several times using the SIR algorithm (Rubin, 1987). Using the SIR algorithm (Sampling/Importance Resampling) the individual observations in the data set are no longer independent, i.e. some observations will be replicated. The Metropolis–Hastings independence sampling can be used to draw samples from the density by keeping the next observation in the data set, if it is more likely (i.e. has a higher weight) than the current. Otherwise, the current observation is replicated with a probability depending on the ratio of the weight factors. The Metropolis–Hastings algorithms will also result in dependency between observations. See Liu (1996) for a short review and comparison between some of the techniques.

1 In deterministic models, not even this stochasticity is included.
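The three ways of reusing an existing sample described above (importance sampling, SIR and Metropolis–Hastings independence sampling) can be sketched as follows. This is a toy illustration under stated assumptions: the sampling density is taken to be N(0, 2²) and the target N(1, 1), both chosen only so that the target mean is known; the function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def importance_mean(values, weights):
    """Weighted average; weights need only be known up to a constant."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def sir(sample, weights, size, rng):
    """Sampling/Importance Resampling: draw with replacement, prob. ~ weight."""
    return rng.choices(sample, weights=weights, k=size)


def mh_independence(sample, weights, rng):
    """Walk through the data set; accept the next point with prob. min(1, w'/w)."""
    chain = []
    current, current_w = sample[0], weights[0]
    for candidate, candidate_w in zip(sample[1:], weights[1:]):
        if candidate_w >= current_w or rng.random() < candidate_w / current_w:
            current, current_w = candidate, candidate_w
        chain.append(current)  # a rejected step replicates the current point
    return chain


rng = random.Random(1)
sample = [rng.gauss(0.0, 2.0) for _ in range(20_000)]            # drawn from p
weights = [normal_pdf(x, 1.0, 1.0) / normal_pdf(x, 0.0, 2.0)     # target / sampling
           for x in sample]

is_mean = importance_mean(sample, weights)
resampled = sir(sample, weights, 2_000, rng)
chain = mh_independence(sample, weights, rng)
sir_mean = sum(resampled) / len(resampled)
chain_mean = sum(chain) / len(chain)
```

All three estimates should lie close to the target mean of 1. As noted in the text, the SIR sample and the Metropolis–Hastings chain both contain replicated observations, so their elements are not independent.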

The application of the method for a revised prior distribution of F0 is straightforward, because the prior density of the input parameters, pa(F0), will be known, as it is input to the model. The joint density can be expressed as pa(F0, V) = pa(F0) p'(V | F0), while the requested 'new' joint distribution is pp(F0, V) = pp(F0) p'(V | F0). The corresponding weight factor is

$$w \propto \frac{p_p(F_0)\,p'(V \mid F_0)}{p_a(F_0)\,p'(V \mid F_0)} = \frac{p_p(F_0)}{p_a(F_0)}$$

In the simple case with independent prior distribution of the parameters, pa(F0) and pp(F0) are simply the product of the density of the sampled values of each individual parameter. In this case, estimation of model sensitivity to parameter changes is an obvious possibility (see footnote 2). In Jørgensen (1998c) the use of the techniques is illustrated in detail.
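With independent priors the weight factor is a product of per-parameter density ratios, so a change of prior can be evaluated without new simulation runs. The sketch below is a toy illustration only: the two parameters, their uniform original priors, the revised density 2x on (0, 1) for the first parameter and the linear stand-in for the simulation model are all assumptions.

```python
import random


def prior_ratio_weight(params, old_pdfs, new_pdfs):
    """Product over parameters of new prior density / old prior density."""
    weight = 1.0
    for value, old_pdf, new_pdf in zip(params, old_pdfs, new_pdfs):
        weight *= new_pdf(value) / old_pdf(value)
    return weight


def uniform_pdf(x):
    return 1.0  # density of Uniform(0, 1) on its support


def revised_pdf(x):
    return 2.0 * x  # hypothetical revised prior for the first parameter


rng = random.Random(2)
runs = []
for _ in range(20_000):
    f0 = (rng.random(), rng.random())                  # draw from the old prior
    v = f0[0] + 0.5 * f0[1] + rng.gauss(0.0, 0.1)      # stand-in for a costly run
    runs.append((f0, v))

old_priors = [uniform_pdf, uniform_pdf]
new_priors = [revised_pdf, uniform_pdf]
weights = [prior_ratio_weight(f0, old_priors, new_priors) for f0, _ in runs]

original_mean = sum(v for _, v in runs) / len(runs)
revised_mean = sum(w * v for w, (_, v) in zip(weights, runs)) / sum(weights)
```

In this toy case the expected output moves from 0.75 under the original prior to about 11/12 under the revised one, which is the kind of sensitivity estimate the paragraph refers to.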

If we want to include knowledge concerning model output, the situation gets more complicated. The prior distribution pa only specifies the distribution of F0, while we need the joint distribution p(F0, V) from the model calculations in order to weigh the observations correctly with the post-model distribution as target density. The joint density needs to be estimated based on the model runs. In general, density estimation in a high dimensional parameter space is not tractable. But the special problems that are found within the herd simulation models will often make it possible to base the inferences on density estimation with low dimensionality; often univariate kernel density estimation (Scott, 1998) will suffice.

In the case of resampling from S(0) only the marginal distribution of the output parameters needs to be estimated. With observational knowledge it can be based directly on the output values. In the case of expert knowledge an intermediate step is needed in order to calculate the expected value of the output given F0, i.e. v ≈ f(F0), where f(F0) may be any function, e.g. an nth-order polynomial, which gives a function that corresponds to an nth-order Taylor expansion of the simulation model. The estimation of the coefficients in the polynomial can be made by e.g. least-squares minimisation based on the model

$$Y = f(F_0) + \varepsilon \tag{5}$$

where Y is equal to V or a transformation of V. For every observation in S(0) we can now calculate vi = f(F0i) and subsequently estimate p(v) using kernel density estimation. Note that Raftery et al. (1995) are only concerned with deterministic models. The modelling of v | F0 is thus trivial in their case.

2 Note that the so-called score function approach (Rubinstein and Shapiro, 1993) is another related choice for sensitivity estimation.

The method thus produces a sample from the post-model distribution based on the existing simulation runs in S(0), at least if f(F0) is an adequate approximation. This sample can be found without making additional costly simulation runs. This approach is illustrated in the following.
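The steps for the expert-knowledge case can be sketched as: fit f(F0) by least squares, predict the expected output for each run, estimate the pre-model density of these predictions with a kernel smoother, and weigh each run accordingly. The sketch below makes several assumptions that are not taken from the source: a single parameter, a first-order polynomial, a Gaussian kernel with a rule-of-thumb bandwidth, a normal expert density, and a weight equal to the expert density divided by the estimated pre-model density (the text does not spell out the exact combination rule).

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a first-order polynomial)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def gaussian_kde(points):
    """Univariate Gaussian kernel density estimate (rule-of-thumb bandwidth)."""
    n = len(points)
    h = 1.06 * statistics.stdev(points) * n ** (-0.2)
    norm = n * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points) / norm

    return density


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


rng = random.Random(3)
f0 = [rng.uniform(0.0, 1.0) for _ in range(500)]        # pre-model draws of F0
v = [100.0 * f + rng.gauss(0.0, 10.0) for f in f0]      # toy stochastic output

a, b = fit_line(f0, v)                                  # Eq. (5) with Y = V
v_hat = [a + b * f for f in f0]                         # expected output per run
pre_model = gaussian_kde(v_hat)                         # estimated p(v)

# Hypothetical expert opinion: expected output is about 30, give or take 5.
weights = [normal_pdf(vh, 30.0, 5.0) / pre_model(vh) for vh in v_hat]
post_mean_f0 = sum(w * f for w, f in zip(weights, f0)) / sum(weights)
```

In this toy case the weighted mean of F0 moves from about 0.5 to about 0.3, the value at which the fitted response matches the expert's expected output. The weights could equally be passed to a SIR step to produce a post-model sample.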

However, usually only a small fraction of the elements of S(0) will be used, because most of the sampling weights are close to zero. Therefore resampling approaches that can be used to generate new observations from the post-model distribution with a higher success rate are of interest. In Jørgensen (1998c), such methods, which can be used at least in the case of expert knowledge, are described. The methods are based on the Metropolis–Hastings random-walk algorithm (see e.g. Gilks et al., 1996, p. 9). Using these methods, it is possible to iteratively refine the sampling procedure as illustrated in Eq. (4) and to define an objective stopping rule based on the approximating function in Eq. (5).

4. Results and discussion of simulation study

4.1. Prior distributions of model parameters

The prior distribution of the parameters was quantified following initial discussions with domain experts. The parameters were transformed into something meaningful to the experts. In the model, the parameter F1 is actually the transition intensity between the states Infected, but not infectious and Infected and infectious (state 8 → 9 in Fig. 1). This was transformed into the median time from infection to infectious. Similarly, the daily infection rate with one infectious pig in the pen was transformed into the infection rate in 100 days, corresponding to the production period of the animal, and similarly for the other parameters. The resulting prior distributions are shown in Fig. 3a–e, as the pre-model distributions.

With respect to the resulting spread of the infection the experts seemed more confident, although they differed slightly. The prior distribution, which tries to capture their different opinions, is shown in Fig. 3f.

4.2. Simulation runs

Initially, 1727 simulation runs (see footnote 3) were performed, generating the data set S(0). The duration of each simulation run was approximately 100 s. Based on these 1727 simulations, a logistic quadratic response surface was fitted to the data to obtain the local approximation f(F0) ≈ Eps|0{V | F0} in Eq. (5). The expected number of infected animals was predicted for each of the 1727 simulation runs, and the density of the expected number was estimated using a kernel smoother and is shown in Fig. 3f (the model distribution). This pre-model density was combined with the expert opinion and resampling with the SIR algorithm was used. Ten times repeat sampling of the whole database left a total of 1375 or approximately 8% of the initial simulation runs per sampling cycle. Approximately 15% of the initial simulation runs are represented in this sample with an average representation of 5.4% of those included; 2.4% of the initial runs are included in every resampling cycle. Resulting kernel estimates of the parameter distributions were made and are presented in Fig. 3 as post-model distributions.

3 The number of simulations was the result of a fixed schedule for calculations instead of a fixed number.

Fig. 3. Prior distribution (pre- and post-model) of the parameter values in the model.

4.3. The revised distribution

As shown in Fig. 3f, the distribution of the expected number of infected animals was not very informative based on the pre-model distribution of the parameters. As a result, the resampling weights depended almost entirely on the expert's prior distribution of the output values. The distribution of parameter F2 (Fig. 3b) is much affected by the introduction of the expert knowledge concerning the output, while the other parameters seem relatively unaffected. Though the marginal distributions were modified only slightly, dependencies between the variables are introduced, as shown in Fig. 4. These dependencies are very sensible, e.g. a given number of infected pigs can be a result of either a high risk of being infected for each pig, or a high risk of getting infected from other pigs. The dependency between F2 and F4 is much lower.

Note that the experts' distribution of the output variable could have been fitted simply by changing the prior distribution for one of the input parameters. Probably using a prior distribution of parameter F2 resembling the post-model distribution in Fig. 3b would have sufficed. This would, however, have falsely maintained the independence between the parameter values.

Fig. 4. Post-model dependency between (a) F2 and F3, (b) F2 and F4.

Fig. 5. Comparison of pen partitioning type using post-model distribution. The vertical lines signify the mean number of infected pigs for each partitioning type.

4.4. Results of comparison of pen partitioning type

The effect of closed pen partitionings was to eliminate the contact spread between neighbouring pens, i.e. F3 = 0. Here the simulation model had to be run again with this value for F3. However, the other parameters should follow the post-model distribution. Again the SIR approach was used. The sampled observations were used as initial values for new simulation runs except that F3j = 0. This generated V2j. V1j was already in the data set. Thus a series of paired observations (V1j, V2j) was generated. The resulting distribution is shown in Fig. 5. In total 831 additional simulation runs were obtained. Because of the negative correlation between F2 and F3 as shown in Fig. 4a, V2 has a high variation. If this correlation had not been taken into account, e.g. if the calibration had been done simply by changing the distribution of F2, the result would have been a lower variation of V2, because the variation due to F3 would have been removed. Of course this would be against common sense: the highest precision would be expected where the expert knowledge had been used in calibration, i.e. V1.

5. Conclusion

The methods presented have removed some of the trial and error feeling from the model building. Currently, we are trying to obtain the subjective probability distributions from other domain experts and other diseases in order to broaden the application of the method.

The method seems an ideal framework to use when working with the complex models that are used within research in animal production systems. These models will often be very detailed because the researchers want to maintain an almost one-to-one correspondence between the real system and the model, although a much simpler model with only a few parameters could usually make predictions just as well, at least within a restricted problem/decision domain.


The reuse of the simulation runs makes it possible to study a wide range of questions, e.g. sensitivity analysis and the importance of different expert views. Givens (1993) lists several other possibilities. The method can also be applied to Markov Chain models, such as Jalvingh et al. (1992, 1993) and Saatkamp (1995), to include parameter uncertainty in the evaluation of strategies. In fact, the Leslie matrix growth model studied in Raftery et al. (1995) and the epidemiological model studied in Givens and Hughes (1995) are very similar to these models.

One obvious application of the method is to adapt a simulation model to the production level of an individual herd. A general database can be generated using the time-consuming simulation model runs. Using this database, the adaptation to different herds can be made simply by weighting the elements of the general database according to the level of each herd. This possibility could be explored for on-farm use of the simulation models or even for web-based solutions, where the storing and handling of large general databases should be without problems, and the response times should be adequate for on-line purposes. This may simplify the use of the detailed simulation model for practical applications, and remove the need for other methods of reducing calculation time such as long (e.g. weekly or monthly) time-steps, or only a few replicates for evaluation of each set of decision parameters.
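A minimal sketch of such a herd-specific reweighting is given below. All names are hypothetical: the database layout, the single `risk` parameter, the toy relation between risk and number of infected pigs, and the normal weighting around the observed herd level are assumptions made for illustration only.

```python
import math
import random


def build_database(n_runs, seed=0):
    """Generate the general database once (stands in for time-consuming runs)."""
    rng = random.Random(seed)
    database = []
    for _ in range(n_runs):
        risk = rng.uniform(0.0, 1.0)                   # hypothetical parameter
        infected = 100.0 * risk + rng.gauss(0.0, 5.0)  # hypothetical output
        database.append({"risk": risk, "infected": infected})
    return database


def herd_weights(database, herd_level, herd_sd):
    """Normalised weights: runs close to the herd's observed level count most."""
    raw = [
        math.exp(-0.5 * ((run["infected"] - herd_level) / herd_sd) ** 2)
        for run in database
    ]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean(database, weights, key):
    """Herd-specific estimate of a stored quantity, with no new simulation runs."""
    return sum(w * run[key] for run, w in zip(database, weights))


if __name__ == "__main__":
    db = build_database(5000)
    for level in (20.0, 80.0):
        w = herd_weights(db, herd_level=level, herd_sd=5.0)
        print(level, round(weighted_mean(db, w, "risk"), 2))
```

The database is built once, and each herd only requires a new set of weights, which is what would keep response times short in an on-line setting.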

With the current high activity in research in numerical methodology within the related fields, Bayesian statistics and analysis of highly structured stochastic systems, further improvements and potential applications should be expected as future spin-off.

Acknowledgements

The research was conducted within the framework of Dina, the Danish Informatics Network in the Agricultural Sciences and was part of a Research Project, Respiratory Disease in Pigs: Effects of the Production System on Health, Infection Pressure and Production Economy financed by the Danish Ministry of Food and Agriculture.

References

Fishman, G.S., 1996. Monte Carlo. Concepts, Algorithms and Applications. Springer, New York.

Gilks, W.R., Richardson, S., Spiegelhalter, D.J., 1996. Markov Chain Monte Carlo in Practice. Chapman and Hall, London.

Givens, G.H., 1993. A Bayesian framework and importance sampling methods for synthesizing multiple sources of evidence and uncertainty linked by a complex mechanistic model. Ph.D. dissertation, Department of Statistics, University of Washington, Seattle, WA.

Givens, G.H., Hughes, J.P., 1995. A method for determining uncertainty of predictions from deterministic epidemic models. In: Anderson, J.G., Katzper, M. (Eds.), Proceedings of Health Sciences, Physiological and Pharmacological Simulation Studies, Proceedings of the 1995 Western Multiconference. Society for Computer Simulation, San Diego, CA.


Jalvingh, A.W., Dijkhuizen, A.A., van Arendonk, J.A.M., Brascamp, E.W., 1992. An economic comparison of management strategies on reproduction and replacement in sow herds using a dynamic probabilistic model. Livestock Prod. Sci. 32, 331–350.

Jalvingh, A.W., van Arendonk, J.A.M., Dijkhuizen, A.A., Renkema, J.A., 1993. Dynamic probabilistic simulation of dairy herd management practices. 2. Comparison of strategies in order to change a herd’s calving pattern. Livestock Prod. Sci. 37, 133–152.

Jørgensen, E., 1998a. Stochastic modelling of pig production. Working Paper: Growth Models. Dina Notat, 73, pp. 1–23. Dina Foulum, P.O. Box 50, DK-8830 Tjele, Url: http://www.sp.dk/~ej/dinapig/manage.ps

Jørgensen, E., 1998b. Stochastic modelling of pig production. Working Paper: Management model. Revision no. 1.0. Dina Notat, 71, pp. 1–30. Dina Foulum, P.O. Box 50, DK-8830 Tjele, Url: http://www.jbs.agrsci.dk/~ejo/dinapig/growth.ps

Jørgensen, E., 1998c. Techniques for Reuse of Monte Carlo Simulation Runs. Dina Notat, 77, pp. 1–34. Dina Foulum, P.O. Box 50, DK-8830 Tjele, Url: http://www.jbs.agrsci.dk/~ejo/dinapig/montcar.ps

Jørgensen, E., Kristensen, A.R., 1995. An object oriented simulation model of a pig herd with emphasis on information flow. In: FACTs 95 March 7–9, 1995, Orlando, FL, Farm Animal Computer Technologies Conference, pp. 206–215.

Jørgensen, E., Kristensen, A.R., Vestergaard, E.-M., 1995. Modelling of Respiratory Disease. Working paper: Generic Disease Model. Dina Notat, 40, pp. 1–18. Dina Foulum, P.O. Box 50, DK-8830 Tjele, Url: ftp://ftp.dina.kvl.dk/pub/Dina-reports/notat40.ps

Liu, J.S., 1996. Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Stat. Comput. 6, 113–119.

Pedersen, B.K., Ruby, V., Jørgensen, E., 1995. Organisation and application of research and development in commercial pig herds: The Danish approach. In: Hennessy, D.P., Cranwell, P.D. (Eds.), Manipulating Pig Production V. Proceedings of the Fifth Biennial Conference of the Australasian Pig Science Association (APSA). Canberra, November 26–29, 1995.

Raftery, A.E., Givens, G.H., Zeh, J.E., 1995. Inference from a deterministic population dynamics model for bowhead whales. J. Am. Stat. Assoc. 90 (430), 402–430.

de Roo, G., 1987. A stochastic model to study breeding schemes in a small pig population. Agric. Syst. 25, 1–25.

Rubin, D.B., 1987. Comment on ‘The calculation of posterior distributions by data augmentation’. J. Am. Stat. Assoc. 82, 543–546.

Rubinstein, R.Y., Shapiro, A., 1993. Discrete Event Systems. Sensitivity Analysis and Stochastic Optimisation by the Score Function Method. Wiley, Chichester.

Saatkamp, H.W., 1995. Simulation studies on the potential role of national identification and recording systems in the control of Classical Swine Fever. Ph.D. thesis. Department of Farm Management, Agricultural University, P.O. Box 338, 6700 AH Wageningen, The Netherlands.

Scott, D.W., 1998. Density estimation. In: Armitage, P., Colton, T. (Eds.), Encyclopedia of Biostatistics. Wiley, Chichester, pp. 1134–1139.

Singh, D., 1986. Simulation of swine herd population dynamics. Agric. Syst. 22, 157–183.

Sørensen, J.T., Kristensen, E.S., Thysen, I., 1992. A stochastic model simulating the dairy herd on a PC. Agric. Syst. 39, 177–200.

Turner, L.W., Wathes, C.M., Audsley, E., 1993. Dynamic Probabilistic Modeling of Atrophic Rhinitis in Swine. Paper No. 93-4559. In: Proceedings 1993 International Winter Meeting of ASAE, Chicago, IL, pp. 1–19. ASAE, 2950 Niles Rd., St. Joseph, MI 49085-9659, USA.
