Kriging metamodeling for simulation

(1)

Tilburg University

Kriging metamodeling for simulation

van Beers, W.C.M.

Publication date:

2005

Document Version

Publisher's PDF, also known as Version of record

Link to publication in Tilburg University Research Portal

Citation for published version (APA):

van Beers, W. C. M. (2005). Kriging metamodeling for simulation. Tilburg University Press.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain

• You may freely distribute the URL identifying the publication in the public portal Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

(2)

53976

..

(3)

/3.-/-1

UNIVERSITEIT * 2]F *

.16----4

;AN TILBURG SIBLIOTHEEK

1 TILBURG

STELLINGEN

behorende bij het

proefschrift

Kriging

Metamodeling

for

Simulation

door Wim C.M.

Van

Beers

1. In_{'discrete-event' simulatie geeft}

_Kriging

betere_{voorspellingen dan} polynomiale regressiemodellen.

(dit proefschrift)

2.

Kriging

interpolatieisrobuusttenaanzien vandeklassieke modelveronderstelling datde _{respons een}constantevariantieheeft.

(dit proefschrift)

3.

Voor

tijdrovende simulaties

zijn

modelafhankelijke

sequentiele proefopzettenteprefereren boven klassieke proefopzetten.

(dit proefschrift)

4. Hoewelhetsimuleren vande_gemiddelde

_{wachttijden voor}

_{een gegeven}

bezettingsgraad van een M/M/1 model door middel vaneenniet-lineaire

stochastische

_{differentievergelijking uit principiele}

_{overwegingen de} voorkeurheeft, is hetvoor toetsdoeleindene

fficient

eneffectief om de analytisch verwachte wachttijdenteverhogenmetNHD-trekkingen.

(4)

(5)

KRIGING METAMODELING

FOR

SIMULATION

Proefschrift

ter verkrijging vandegraad vandoctor

aandeUniversiteitvan_Tilburg,

op gezag vande_{rector magnificus, prof. dr. F.A. van der Duyn}_Schouten,

in_{het openbaar}te_{verdedigen ten overstaan van}

een door het_{college voor promoties}_aangewezencommissie in de aula vandeUniversiteit

(6)

BIBLIOTHEEK TILBURG

(7)

Acknowledgment

This thesis istheresult of myresearchduring_{six years in the Operations Research} group oftheDepartment

of

InformationSystems andManagement intheFaculty of Economics and BusinessAdministrationat

_Tilburg

_{University. During that time I met} many peoplewho_{supported me} _{in various ways. They made my}_stay_productive

and-not in

theleast-pleasant. To some of them I_{owe special thanks.}

First of all, I

amextraordinarily grateful to mysupervisor, Professor Jack Kleijnen.Heguidedmethroughthefascinating world

of

randomsimulation and taught me to search for the

intuition

_{('wiedes') behind}_{the results. We had many}

valuable discussions, andheencouraged me to develop the ideas that we discussed in such a way thatthey couldbe_published_{in international journals. I}am_{grateful for the} manycriticalcommentshe

made-in

_{red, green, and blue. The}existence ofthis thesis is largely due to him.

Alsospecial thanks are due to themembers of myPh.D.committee: Professors H.Daniels, D._{den Hertog, T. Dhaene,}J._{Rooda, and H. Wynn}

_for

_their

carefulreading,thoughtfulsuggestions,andcorrections. Ifeel honored to have them in my committee.

Among alltheotherswho_supportedme_{during my}Ph.D._{study, I particularly} wishtothank ourdaughter,Miranda. Her typicalmental support was a real

encouragement to me.

Last butcertainlynot least, I am mostgrateful to my wife,Marianne. She patiently gave me all the time_{and space I needed to venture}_into_those_{things that I} foundfascinating._{There are no words to describe the}_emotional_and_{loving way she} supportedme.Undoubtedly,thebiggest thanks are for her!

(8)

1 Introduction

to

_Kriging

_Metamodeling

for

Simulation 1

1.1 _{Real systems and mathematical}

_{models 1}

1.2 Simulationtypes 2

1.3 Metamodels 3

1.4

_Kriging

₄ 1.5

Kriging

applications area 5

1.6 Summary

of

thesis 6

References 7

2 Kriging for

Interpolation

in

Random

Simulation 11

2.1 _{Introduction 11}

2.2 Kriging 14

2.2.1 History

of

Kriging 14

2.2.2 Basics

of

Kriging 14

2.2.3 Formalmodel

_{for Kriging 15}

2.3 Detrended

_{Kriging 19}

2.4 Two_{examples and}

five

_{metamodels 20}

2.4.1 Example I: M/M/1

hyperbola 23

2.4.2 Example

I[:

fourth-degreepolynomial 24 2.5 Nugget

_{effect 25}

2.6 Conclusionsandfuture

research 26

References 27

3 Robustness

_{of Kriging}

when

_{Interpolating in}

Random

Simulation with

Heterogeneous

Variances 29

3.1 _{Introduction 29}

3.2

_Kriging

_{Basics 31}

3.3 Randomsimulationwith_{heterogeneous variances ...}_{... 32}

3.4 Novel

_Kriging

method:

_{Studentization 35}

3.5 Numerical

examples 36

3.6 Conclusionsandfurther research 43

References 44

(9)

4

_{Application-driven}

SequentialDesigns

for

Simulation Experiments:

Kriging

Metamodeling 47

4.1 Introduction 47

4.2

Kriging basics . 50

4.3 Doeand

_Kriging

52 4.4 _{Application-driven}_sequential

design 53

4.4.1

Pilot

input

combinations 53

4.4.2 Candidate_{input combinations 55}

4.4.3 Cross-validation 55

4.4.4 Jackknifing 56

4.4.5 Sequentialization 57

4.4.6 Stopping rule 58

4.5 Two examples 59

4.5.1 Example I: ahyperbolicVOfunction 59 4.5.2 Example H:a_{fourth-order polynomial}

I/0

function ... 61

4.5.3 Estimated variograms: Gaussian versuslinear ... 63

4.6 Conclusions andfurtherresearch 63

References 64

5 Customized Sequential Designs

for

Random

Simulation Experiments:

Kriging

Metamodeling

and

Bootstrapping 67

5.1 Introduction 68

5.2

Kriging

Basics 70 5.3 DOEand

Kriging 73

5.4 Sequential DOE 75

5.5 Two examples 79

5.5.1 M/M/1 model 79

5.5.2 (s,S)inventory

model 89

5.6 Conclusionsandfuture

research 92

5.7 Appendix

92

References 94

(10)

6.2 Future

research 99

Samenvatting (Summary

in Dutch)

103

(11)

Chapter 1 Introduction

to

_Kriging

_{Metamodeling for}

Simulation

1.1 Real

_systems

and

mathematical

models

Many scientificdisciplinesusemathematical models(including simulation;see_{below) to}

describecomplicated real systems. The goal ofsuchmodelsisgetting more insight into the real system,to answer questions such as: What is theoutput's sensitivity tothe inputs; what is the optimal combination ofthe_input values?

To obtainsuch_{insight, analytical methods (e.}g.,differentialcalculus) often turn out to fail. In suchcases,_{numerical methods may}betried; i.e., experimentation with the model may answerthequestions about the realsystem.This experimentation often implies thatthemodel is converted intoa_{computer code (or computer program) that is run for}_a_number

_of

_different_input

combinations. Next, the resultinginput/output

(I/0)

behavior ofthe model shouldbe_analyzed.

We emphasize that theselection oftheinputcombinations shouldbeguided byscientific principles (e.g., changingoneinput at a time canbeproven tobeineffective

if

inputs interact, and inefficientelse). Theseprinciplesare_{investigated in mathematical statistics under the name}

Design OfExperiments (DOE). In numerical experiments (as opposed to experiments with the

realsystem)theseprinciplesneedadjustment;seeKleijnen et al. (2005)andChapters 4 and 5. We focus on computer expensive simulation experiments; i.e., experiments that require much computer time. Inthesesituations, DOE iscertainlyneeded.

(12)

analysis requiresformalmethods. Most popularareLeastSquares(LS)curvesfitted to the I/0

data.We, however, focus onanalternative method knownas_Kriging;seeSection 1.4. Next,weshall discuss the key terms inthetitle ofthis chapter andthisthesis.

1.2 Simulation

_types

A

simulation_{model might be}a_{physical model, e.g.,} ascalemodel ofa_{racing car in a wind}

tunnel. We, however,

limit

ourselvestomathematical models (also seethepreceding section).

Following Kleijnen (1974), we define simulationasexperimenting with a model over time.This

definitionindicates thatthevariabletimeplaysaspecial roleinsimulation. Indeed, Law and

Kelton (2000) also emphasize the role of time, by using the term dynamic simulation. Besides this simulation type, there is static simulation, in which time plays no role; an example is the Monte Carlo (MC)method.By definition, the MCmethodusesPseudo-Random Numbers (PRN);i.e., computergeneratednumbers between zero and one thatareindependent and

uniformly

distributed over this interval. We shall also useMCmodels in this dissertation (see Chapters 2,3, and 4).

Therearedeterministicandrandom simulationmodels. Ifthesimulationmodel contains no random components, the simulation modelis_{called deterministic. For example, such}

simulationmight solve a set

of

complicateddifferentialequations describingtheairflowaround air_{wings. Simpson et al. (2001) mention applications}

of

deterministic simulation invarious disciplines.Typically, in this type

of

simulation,aninputcombinationneedssimulation only once(repeatedcomputer runs with thesameinputcombination give exactly thesameoutput value, provided we do not make any changes inthecomputer software or hardware).

(13)

Introduction to Kriging Metamodeling for Simulation 3

simulation type. Note that we donotstudy dynamic,deterministic_{simulation models (often used} inComputerAided_{Engineering, CAE; see}for_{example De Geest et al. (1999)).}

Table1: Simulation typeswithappropriate dissertationchapters

Deterministic Random

Static _{Chapter 4} _{Chapters 2 and 3}

Dynamic Chapter 5

Finally,wediscussathird waytodistinguish various simulationtypes:continuousversus

discrete-event simulation. Continuous simulation experiments with models consisting of differential_equations(so statevariableschangecontinuously). Computercodes_approximate thesedifferentialequations by difference equations.Thesemodelsare_{usually deterministic (see}

the preceding discussion, summarized in Table 1).Obviously,random elements maybeadded to thedifferentialequations;forexample, econometric models may consist

of

difference equations plusadditivenoise.

Discrete-event simulationhas statevariables thatchangeinstantaneously,atpoints in time that are not necessarily equidistant;forexample, customers arrive ataserveratrandom_{points of} time.Anevent isthe change inthesystem'sstate;forexample, the number

of

waitingcustomers

increases.Randomsimulation_{includes discrete-event event simulation; see Van}Beers and

Kleijnen (2005).

Simulation-in its

_many

forms-is

_{applied in many}scientific_{disciplines. We focus on} thedisciplineknownas_{Operations Research/Management Science (OR/MS). In OR/MS,}

simulation isalsooftenapplied-,eventhoughsimulationisdescribedasmethod of last resort 'when allelse

fails...' in

thefamousChapter 21 inWagner (1975).

1.3 Metamodels

A metamodel is also calleda_response_su,face,_auxiliary_{model, emulator,}etc._Metamodeling

originatesfrom neuro-linguistics inthebeginning ofthetwentieth century;seeKorzybski and Meyers 1958. _{Kleijnen et al. (2005) define}ametamodel asan_{approximation of the true I/O}

(14)

simpler thantheunderlyingsimulation model;anexample isafirst-order polynomialregression (meta)model of the

I/0

function ofaqueueing simulation. A metamodeltreats thesimulation model asablack box; i.e., it uses the VO datawithoutknowledge of the way thesimulation model_{processes these}inputs to getthe_outputs.

A metamodel is a tool forthe_systematic_{experimentation}andanalysis ofasimulation model, togain insight (againseeSection 1.1).Kleijnen(1998)discusses the use

of

metamodeling for

(i)

sensitivity analysis

(ii) optimization

(iii) validation

andverification. We shall focusonsensitivityanalysis.

There are many types

of

metamodels;seeKleijnen (2005). Most populararelow-order polynomialregression models. However,Kriging isalso_{applied frequently in deterministic}

simulation; see Den HertogandStehouwer (2002),and_{Simpson et al. (1998).}Becauseother types

of

metamodels do not play a role in this thesis and we wish to avoid errors

of

omission, we refrain from listing(anddefining)thesetypesandgivingkey references. In thisthesis,wefocus

on Kriging,andcompare its performancewithclassical regression.

Atkinson (1989, ch.3) definesinterpolation as theselection of a function in such a way thatits graphpassesthroughafinite set

of

given data points.Krigingappliedtodeterministic simulationmeetsthat definition, as we shall see in the next section. Inpractice, the term interpolation is also used to meanafunctionthat approximates the VO data: forexample, a regression model may fit the I/0 data such that

it

minimizes the sum

of

squaredapproximation errors (LS criterion).

1.4 Kriging

In the195Os,theSouthAfrican miningengineer D.G. Krige (born in 1919, andstillalive)

devisedaninterpolation methodtodetermine true ore-bodies,based on samples.The basic idea is thatthesepredictionsareweighted averages of the observed outputs, where the weightsdepend

(15)

observed.The weightsare chosen so astominimizetheprediction variance, i.e.,theweights should provide a Best Linear UnbiasedEstimator (BLUE) ofthe_{output value for}a_{given input.}

Therefore,Kriging isalso_{called Optimal Interpolation.}

The dependence oftheinterpolationweights on the distances betweentheinputs was mathematically formalized bythe French mathematician Georges Matheron (1930-2000) in his monumental 'Traitdde_{gdostatistique appliqude' (1962). He introduced}afunction, which he calleda_{variogram, to describe the variance of the difference between two observations; also see}

Chapter 2.

SoKrigingoriginated in geostatistics to answer concrete questions in the goldmining industry:

Drilling

for

_ore-deep

underthe

_ground-is

_expensive,soefficient_{prediction methods}

are necessary.Lateron, _Krigingwas_{successfully introduced}into deterministic_{simulation by} Sacks et al. (1989). In thisthesiswe introduceKriginginterpolation into the area

of

random simulation.

Krigingprovides 'exact' interpolation,i.e.,predicted outputvalues_{at inputs already}

observed equalthe_{observed output}values. Such_{interpolation}isattractiveindeterministic simulation,and_Kriging_{is often applied in}CAD(mentionedinSection 1.2).In discrete-event

Simulation, however, Kriging hasjuststarted.

Above, we mentioned thatthevariogram isthecornerstonein Kriging. Hence,accurate estimation ofthevariogram,based on theobserved data,isessential. JournelandHuijbregts (1978,pp. 161-195) present various parametric variogram models. The values of itsparameters

are obtainedbyeither Weighted Least Squares (WLS)or Maximum LikelihoodEstimation (MLE);see_{Cressie (1993). In this}_thesis,_{we shall use}_{both methods: WLS in Chapters 2,3, and}

4, and MLE in_{Chapter 5.}

1.5 _Kriging

_{applications area}

Aftertheintroduction

of

Kriging intheminingindustry,

it

became very popularin other areas of geostatistics, suchas_{meteorology, oceanography, agriculture}andenvironmentstudies; see

(16)

Afterthepioneeringarticle

of

Sacks(1989), Kriging has alsobeenwidelyapplied in deterministicsimulationfor_engineering,aimed at thedesign

of

better computer chips, television

screens,aircraft,andautomobiles.

In this thesis we introduceKriging for simulation inOR/MS, covering both deterministic and random simulations(in differentchapters). We

limit

ourselves tosensitivityanalysis; we feel that sensitivityanalysisprecedesoptimization, sowe leaveKriging for optimizationasfutureresearch

(alsosee_{Chapter 6).}

1.6 Summary

of

thesis

Thisthesiscontains the full text of fourpapersthat eitherhavealreadybeenpublished or have

been submittedfor publication. It istheverbatim text, exceptforthreesentencesinChapter 2, indicated by footnotes. In this section we summarizeeach paper.

In Chapter 2, we introduceKriging interpolation forrandom simulation. We develop a novel type

of

Kriginginterpolation, and call

it

DetrendedKriging.We demonstrateKriging through two numerical examples:

(i) ahyperbole inspired by the single-server queueingsimulation withPoisson (Markov)arrivalandserviceprocesses(known as theM/M/1queueing model)

(ii) an

artificialmodel, namelyafourthdegree polynomial withmultiplemodes plus

additivenoise.

We test our novel method through cross-validation,andcompare it tolow-orderregression metamodelsfitted_{through Ordinary Least Squares (OLS). The main}conclusion ofthis chapter is thatKriginggives better predictions thantheseregression models. Furthermore, tests show that the Kriging'sso-called nugget effect equals the variance of the additive noise.

In Chapter 3, we droptheclassicKrigingassumption

of

outputswithconsmnt variances. In practice, this assumption is not realistic. Therefore we investigatethe consequences

of

Kriging

(17)

heterogeneity;i.e,Kriging isarobustmethod thatstilloutperformslow-order polynomial regression.

In Chapter 4, we propose a novel method toselect_{experimental designs}

for

_{Kriging. Our}

methodis_sequential,becausewe focuson_{expensive deterministic simulation.}Our_designdiffers from traditionaldesigns suchasLatinHypercubeSampling (LHS);seeMcKay,Beckman, and

Conover (1979), andalso_{Koehler and Owen (1996),}and _{Kleijnen et al. (2005). More}

specifically,our method accounts for the specific

I/0

functionimplied bytheunderlying simulationmodel.Our customized, tailor-made designsareconstructed throughcross-validation and jackknifing. We test our method through the two academic applications that we also used in Chapter 2. Our main conclusions are thatthenovel method simulatesrelativelymore inputs in the more interesting parts oftheunderlying

I/0

function, and

it

gives better predictions than

traditionalLHSdesigns.

In Chapter 5, we extendthemethod

of

Chapter 4 to random simulation, especially discrete-eventsimulation. However, customization is nowbasedon_{bootstrapping instead of}

cross-validation. We testour method through two discrete-event simulation models that are classic inOR/MS,namely theM/M/1 queueing model and an (s,S)inventorymanagement model. We again comparethe_{performance of our method}withclassicalLHS_designs. Itturns out thatour design indeed gives better results.

Finally, inChapter 6 we summarize the conclusions ofthe_{preceding chapters. We also} discussthe advantages andthedisadvantages

of

Kriging.Wefinish withpossible topicsforfuture research.

References

Atkinson, K.E. (1989), An introduction to numericalanalysis. John Wiley & Sons, Inc., New York

(18)

De Geest, J., T. Dhaene, N. Fache, and D. DeZutter(1999),Adaptive CAD-model building algorithm forgeneral planar microwave structures./EEETransactionsonMicrowave Theory and Techniques, 47, no. 9,pp. 1801-1809

Den Hertog, D., and H.P. Stehouwer (2002),Optimizing colorpicturetubesby high-cost non-linear programming. EuropeanJournal

_of

OperationalResearch, 140(2),197-211

Journel, A.G. and Huijbregts, C.J. (1978),MiningGeostatistics. Academic Press,

London

Kleijnen, J.P.C. (1974),Statisticaltechniquesinsimulation, partL MarcelDekker Inc., New York

Kleijnen, J.P.C. (1998), Experimental designfor sensitivityanalysis,optimization, and validation

of

simulation models. Chapter 6 in: Handbookofsintulation, edited byJ.Banks, _{Wiley, New York,}pp.173-223

Kleijnen, J.P.C. (2005), Invited review: Anoverview ofthedesignand_{analysis of}

simulation experiments for sensitivity analysis.European Journal of Operational Research tin press)

Kleijnen, J.P.C., S.M.Sanchez,T.W.Lucas andT.M._{Cioppa (2005),}A_{user's guide} tothebravenewworld o

f

designingsimulationexperiments. INFORMS Journal on_{Computing (accepted as State-of-the-Art Review)}

Koehler, J.R. and A.B. Owen (1996), Computer experiments. Handbook of statistics,

by S. Ghosh and C.R. Rao, vol. 13,pp.261-308

Korzybski, A. andR._{Meyers (1958),}Scienceand sanity:anintroduction to non-aristotelian systems and general semantics; fourth edition. International Non-Aristotelian LibraryPublishing Company,Lakeville

Law, A.M. and W.D. Kelton(2000), Simulation modeling and analysis;

third

edition. McGraw-Hill,Boston

Matheron, G. (1962), Traitd de gdostatistique appliqude. Memoires du Bureau de Recherches Geologiques et Minieres, Editions Technip, Paris, no.14: pp.

(19)

McKay, M.D.,R.J._{Beckman and W.J. Conover (1979),}

A

_comparison

of

three methodsforselecting values

of

input variables intheanalysis

of

output from a computercode. _{Technometrics, 21, no. 2,}_pp.239-245(reprinted in 2000: Technometrics, 42, no. 1, pp. 55-61

Sacks, J.,W.J. Welch, T.J. _{Mitchell and H.P. Wynn}_{(1989), Design}and_{analysis of}

computer experiments.Smtistica/_{Science, 4, no. 4, pp. 409-435}

Simpson, T.W.,J.J.Korte,T.M._{Mauery, and}F._{Mistree (1998),}_{Comparison of}

responsesurfaceand _Krigingmodels_{for multidisciplinary}_design optimization. AIAA Journal, vol. 1,pp.381-391

Simpson, T.W.,T.M. Mauery,J.J. Korte, andF.Mistree (2001), Kriging

metamodels

for

_{global approximation in simulation-based}_{multidisciplinary}

designoptimization. AIAAJournal, 39, no. 12, 2001,pp.2233-2241

Van Beers, W.C.M. andJ.P.C. _Kleijnen_{(2005), Customized sequential designs for}

randomsimulationexperiments:Krigingmetamodelingandbootstrapping. Submitted for publication

(20)

(21)

Chapter 2 Kriging for Interpolation

in

Random

Simulation

Abstract

Wheneversimulation_{requires much computer time,}_{interpolation}is needed.Simulationists use different interpolationtechniques (forexample, linear regression), but thispaperfocuses on

Kriging. Thistechniquewasoriginallydevelopedingeostatistics by D.G.Krige, and has recentlybeen_widely_appliedindeterministic simulation. Thispaper,however, focuses on randomorstochastic_{simulation. Essentially,}_Kriging_{gives more}_weightto_{'neighbouring'} observations. Thereareseveral types

of

Kriging;this paperdiscusses-besidesOrdinaryKriging -a novel_type,which _{'detrends' data through the use}of linearregression. Resultsarepresented fortwo examples

of

input/outputbehaviour of theunderlying_{random simulation model:} Ordinary and DetrendedKriginggive quite acceptable predictions;traditionallinear regression gives theworstresults.

2.1 Introduction

A primary goal of simulationis_{what if}or_{sensitivity analysis: What happens if inputs of the} simulation model change? Therefore simulationists runa_givensimulation

_program-or

computer

code-for (say)

ndifferentcombinations of theksimulationinputs. Weassume that

Paper by_{Van Beers, W.C.M. and}J.P.C._{Kleijnen, Journal Ofthe}OperationalResearchSociety, no. 54,2003, pp

(22)

theseinputsareeither parameters orquantitativeinputvariables ofthesimulationmodel. Typically, Krigingassumes thatthe number

of

values per input variableisquite

'big',

certainly exceeding two(twovalues are usedinsimulationexperimentsbased on 2k-Pdesigns).

Given this set ofninputcombinations,theanalysts run the simulation and observe the outputs. Note that most simulation models havemultipleoutputs, butinpracticetheseoutputs are analysedperoutput type.

The crucial question of this paper is: How to analyse thissimulation input/output (I/0)

data?_{Classic analysis}uses_{linear-regression (meta)models;}see_{Kleijnen (1998). A}metamodel is an approximation of the

I/0

transformationimplied bythe_{underlying simulation}program. Many other termsare_{popular in certain disciplines:}Response surface,compact model, emulator, etc. Suchametamodeltreats thesimulation model asablack box; that is, thesimulationmodel's VO is observed, andthe_{parameters of the metamodel}areestimated.Thisblack-box_{approach has the}

following

advantages anddisadvantages.

An advantage is thatthemetamodel canbeapplied to all types

of

simulationmodels, either deterministic or random, either in steady-state orintransientstate.Adisadvanmge is that it cannot take advantage ofthespecificstructure ofagiven simulation model, so it may take more computer time comparedwithtechniques suchasperturbation analysis andscorefunction.

Metamodelling can also helpin optimizationandvalidation ofthesimulationmodel. In this paper, however, we do not discussthesetwo topics, but refer tothereferences ofthispaper.

Further, ifthesimulation modelhashundreds

of

inputs,thenspecial 'screening' designs are

needed, discussedin Campolongo,Kleijnen,and Andres (2000). In ourexamples-but not in our methodologicaldiscussion-we limitthenumberofinputs totheminimum,namelya_single

input.

Whereas_{polynomial-regression metamodels have}been_{applied extensively}in discrete-event simulation (suchas_queueing_{simulation), Kriging}hashardlybeenappliedto random simulation: Asearch ofIAOR(International Abstracts

of

Operations Research) gave only two hits. However, in deterministicsimulation_{(applied in many engineering disciplines; see our}

(23)

Kriging for Interpolation in Random Simulation 13

In randomsimulation,however, thisKrigingproperty may not besodesirable, sincetheobserved (average)value is onlyan_{estimate of the true, expected simulation output.}_{Unfortunately,}

Krigingrequires extensive computation,so adequatesoftwareis needed.We discovered that for random simulation no software is available, so we developed our own software, in Mattab.

Note that several types

of

_{random simulation may}be_{distinguished:}

(i) Deterministic simulation with_{randomly sampled inputs. For example, in investment analysis} wecancompute the cash

flow

development over time throughaspreadsheet suchasExcel. Next, we sampletherandom values

of

inputs-such as the cashflow growth

rate-by

means

of

either MonteCarlo orLatinHypercube Sampling_(LHS)throughanadd-on such as @Riskor Crystal Ball; see Van Groenendaaland_{Kleijnen (1997)}

(ii)

Discrete-event simulation. For example, classic queueingsimulationis_{applied in logistics}

and telecommunications.

(iii)

Combined continuous/discrete-event simulation. For example,simulation

of

nuclearwaste

disposalrepresentsthe physicalandchemicalprocessesthroughdeterministicnon-linear difference equationsandmodelsthehuman interventionsasdiscreteevents (seeKleijnen and Helton, 1999).

Ourresearchcontributionconsists inthe_{development of}a_{novel (namely, detrended)}

Kriging type, andtheexploration of how wellthis Krigingtype performs compared with OrdinaryKrigingandtraditionalpolynomial-regression modeling. The main conclusion of our examples is:A perfectly_{specified detrending}function_givesbest_predictions;_{Ordinary Kriging is}

acceptable;theusuallinear regression givestheworstresults.

We organize the remainder of thispaperasfollows.First we sketch the history

of

Kriging

and_{its application in geology, metereology,}anddeterministic simulation. Then we describe the

(24)

2.2 Kriging

2.2.1 History

of Kriging

Kriging isan_{interpolation}_technique_originallydeveloped by D. G. Krige,aSouthAfrican miningengineer. Inthe 1950s hedevised this method to determine true ore-grades,based on

samples. Next,heimproved the methodincooperation with G. Matheron,aFrench

mathematician atthe'EcoledesMines'. At the same time,inmeteorology L. Gandin (in the formerSovietUnion)workedonsimilarideas,under thename_'optimuminterpolation' (see Cressie, 1993).

Nowadays,Kriging isalsoapplied to 1/0 data

of

deterministic simulationmodels; we

referagaintoSacks etal. _{(1989)'s pioneering article. Many}morepublications followed; for

example, Meckesheim et al. (2001) give35references. AlsoseeKoehler and Owen (1996), and

Jones,Schonlau,andWelch (1998).

2.2.2 Basics

_{of Kriging}

Kriging isanapproximation method that can give predictions

of

unknown values ofarandom function, random field, or random process. These predictionsarebest linear unbiased estimators, under theKrigingassumptionspresented in the next subsection.

Actually, these predictionsare_{weighted linear combinations of the observed values.}

'Kdging assumes that the closer the input data are, the more positively correlated the prediction

errors are._{Mathematically, this assumption is modeled through}asecond-orderstationary

(25)

thepredictionequals theobserved_{value. (In deterministic}simulationthis propertyiscertainly attractive, as wesaid_above.)

In Kriging, a crucial role is played bythe_variogram:

A

_{diagram of the variance of the}

difference between the measurements at two input locations; alsoseeFigure1.which has symbols explained in the next subsection. The assumption ofasecond-order stationary covarianceprocessimplies thatthevariogram isafunction ofthedistance (say)hbetween two locations. Moreover, the further apart two inputs are,thesmaller this dependence

is-until the

effectis_negligible.

2.2.3 Formal

model

_{for Kriging}

A random_processZ(•) can be described by {Z(s) :s E D} where D i s a fixed subset of Rd and

Z(s) isarandomfunctionat_{location s D; see Cressie (1993, p. 52).}

There are several types of Kriging, but we limit this subsection to_{Ordinary Kriging,} whichmakes the

_following

two_{assumptions (already mentioned above, but not yet formalized):}

(i) The model assumptionisthat the random process consists of a constant 11 and an error term O (s):

Z(s)=B+O (s) with seD,BER (1)

(ii)

Thepredictorassumption is thatthepredictor forthepoint so -denoted by

p(Z(so))-is

aweighted linearfunction of alltheobserved output data:

p(Z(So))=Elli,Z(st) with TIi,1,=1 (2)

To select the weights A, in (2), the_{criterion is minimal mean-squared prediction error (say) a} defined as

03 =E[{Z(so)-p(Z(so))}21 (3)

To minimize(3)given (2),the_{constraint X 1,1, =l} isadded to theobjective functionthrough

theLagrangianmultiplier m: Then wecanwritetheprediction error as

1

(26)

E[{Z(so)-63:.,A,z(s,)}2]-2m[Ili,1 -11. (4)

To minimize (4), weutilizethevariogram; alsosee_{Figure 1. By}definition,the_{variogram is}

2, (h) = var[Z(s + h) -

Z(s)],

where h =s, -si asexplained by the stationary covarianceprocess

assumption with h e Rd and i,j - 1,..., n. Obviously, we have

var[Z(s + h) - Z(s)] = 23(st - s,) = 22(h).The spacing h is also called the lag. Aftersometedious manipulations, (4) gives

- X l,T "=,A;.1, »,-st)+261 1,·1, Aso -s,)-2mC X liA,-1). (5)

Differentiating (5) withrespect to Ai , · · · , A„ and m.givestheoptimal ,11, . . . , A„ :

f 1-1'r-'TY_

11= 7+1

11-1 and m=-(1-1'r-ly)/(1'r-'1), (6)

ir-1 1 3

where y denotes thevector

of

(co)variances (y(so-st)....,y(so-s„))',

r

denotes the nx n matrixwhose (i,j)th

_{element is )(s,-s,), 1 -(1, ..., 1)'is}

thevectorofones; alsoseeCressie (1993, p. 122).

We emphasize thattheseoptimal Kriging weights 1 depend onthe specificpoint so that

is tobepredicted,whereaslinear-regression metamodelsusefixedestimatedparameters (say) B

Note further that some of the weights 1 may be negativez.

The optimal weights (6) givetheminimalmean-squaredpredictionerror: (3)becomes

(alsosee_{Cressie (1993, p. 122)}

Cr; = Il,ly(so-st)+m

(1'r-'7-1)2 (7)

= y'r-'y

1, r -1 1

However, in (6) and (7) 2 (h) isunknown. Theusualestimator \s

(27)

29(17) =

-LY

(Z(st)-Z(s,))2 (8)

N(hM - Nth) \

where_|N(h)| denotes thenumber

ofdistinct

_{pairs in N(11)={(s,.s,):st-sj=h; i, j =1,....n};}

see Matheron (1962). The estimator in (8)isunbiased, ifthe_{process Z(•) is indeed second-order}

stationary; see_{Cressie (1993, p. 71).}

Given (8)for differentIlhll values,thevariogramisestimated_{by fitting}a_{curve through}

the estimated values 2f (11). This curve displays the following importantcharacteristics(see

Figurel):

(i) Forlarge values

of

1|h||,thevariogram 2 f(h) approaches aconstant _C(0),called the

sit!'. 'Forthese large Ilhll values, all variances of the differences Z(s + h) - Z(s) are invariant with

respect to h.

To prove this property, we definethe_{covariogram C(h)}=_{Cov(Z(s), Z(s + h)).} Obviously, COV(Z(S),Z(S)) =Var(Z(s)).Then it is easy to derive

22 (h) = 2(((0) - C(h)). (9)

Because C(h) 1 0 as 11 h Ill' -, the variogram hastheupper

_{limit 2C(0).}

(ii) The interval of Ilhll on which the curve does increase (to the sill), is called therange

(say) r;that is, C(11) < E for 11 h 11> r + r,. We shall giveaspecific_{model in (10).}

(iii) Although₍₉₎impliesgamma 3 (0) =0,thefittedcurve does not alwayspassthrough zero: It may have a positive intercept-called_{the nugget}_{variance. This variance estimates noise.}

(28)

For example, in geostatistics this nugget effects means that when going back to the 'same' spot, a completely differentoutput (namely, a gold nugget)isobserved.

We add that in random simulation, the same input (say, the same traffic rate in queueing

simulation)givesdifferentoutputsbecausedifferent_{pseudo-random numbers are}used. Below we shallreturn to thisissue.

To fit

avariogram curve through theestimatesresulting from(8), analysts usually apply the exponential model

[co+ct(1-e-Alita) if h#0

Y(h) =4 (10)

[0 if h=0

where obviously co isthenugget, co + ct the sill, and athe range.However, other models are also fitted; for example,thelinear model

Y(h)=

4

[co+bllhll if h#0

(11)

[0 if h=0

where again co isthe_nugget;see_{Cressie (1993, p. 61).}_Actually,weshall apply (11) in our experiments, because it is the simplest model and yet gives acceptable results(forexample, it estimatesthenuggeteffect very well).

In deterministicsimulation.analysts use more general distance formulas than (8). For example, Sacks et al. (1989, p. 413)andJones et al. (1998, p. 5) use for the k-dimensional inputs

xi = (X,(„, ..., X'. k,)' and xj = (xjct,•..., x,(k,)' theweighted distanceformula

h(xi,xi)

=

_{E:.1 0: IX, g,-XM,}

i

₍₁₂₎ where 19, (with d:

2 0)

measuresthe importance oftheinput xg, and pg controls the

(29)

2.3 Detrended

_Kriging

OrdinaryKrigingwas_{defined by (1), where v E R was}theconstant mean of the random_process Z(•).This assumption, however,limitstheapplication

of

OrdinaryKrigingtorather simple models of_{the process Z(•). A more}_general_{assumption is that v is not}aconstant, but an unknown linear combination

of

known functions

_{{fo(s), -:f„(s)}, se D.}

Thisis

called Universal Kriging:see_{Huijbregts and Matheron (1971, p. 160) and also Cressie (1993, p. 151).}

Cressie (1993)discusses real(non-simulated) coal-ash data,and_Regniereand_{Sharov (1999)}

discusssimulated spatialandtemporal output data ofarandomsimulationmodelforecological

processes.

Now we introduceanovel type

of

Kriging that wecall DetrendedKriging.Detrended Krigingpre-processestheoriginal data, and then appliesOrdinaryKriging totheresulting data so wecan_{apply software}_{for Ordinary Kriging.}For Universal_Kriging,_{however, software is} available only

for

spatialandtemporal data, notfor simulation withanarbitrarynumber of inputs-to the best ofour knowledge3.

We assume that the_{process mean}

_*(s)

satisfiesthe_{decomposition}

#(S)= S (S)+71(S) (13)

where S(s) is a known signal function (see, however, the text below (14)) and 11(s) isawhite noiseprocessthat_{models the measurement error; that is, 11(s) is} normally identically and independentlydistributed with zero mean (NIID). So,we replace (1) by

ZCS) = SCS) + 11(S) + a(S). (14)

In practice,the_signal_{function S(s) in (14)}is_{unknown. Therefore we estimate S(s)}

through S(s), from the set

of

observed_{(noisy) I/O data {(st,Z(s, )). i =l, ..., n}.} Because of the

assumedwhite noise, weuseordinaryleastsquares (OLS) to obtaintheestimator S(s).

Next we applyOrdinary Kriging tothe_{detrended set <(st,Z(s, ) - S(st ) ) :i= 1, . . . . n . Our}

predictor forthe output

of

_{location so is the sum of this}OrdinaryKrigingprediction and the

3 Afterthis paperwasaccepted,theMatlabKrigingtoolboxof_{Lophaven, Nielsen,}andSpndergaardbecame

(30)

estimator S(so)

To test our new Detrended Kriging, we apply classiccross-validation:see_{Kleijnen and}

van Groenendaal (1992, p. 156).Cross-validation eliminates one

I/0

combination,

say (sk I Z (s k )), from to the

original data set {(st,Z(s,)): i =l,..., n},so

the remaining n -1 data

combinations are {(st,Z(s,)): i -1,..., k -1, k +1...., n}.This new

setgivesaprediction

p( Z(sk )) ·Thisprocess

of

eliminationandpredictionisrepeated for (say) c different

combinations (c C n) .Obviously, if we sorttheoriginal set such that thefirst c observations are deleted one at a time, then we get k =1,2, . . . ,c.

To summarizetheresultingpredictionaccuracy, we use the I, norm of the difference

I 2 \B

vector li p(Z(sk))-Z(sk) 11 (the 4 norm 11x11 is defined as Ek-ixk 11 ). In our experiments

we find that the l, and

L-

norms givesimilarconclusions.

Note that inKriging,allpredictionerrors may be zero at the 1/0 points thatareactually

usedto estimatetheKrigingmodel. Therefore weusecross-validation.

2.4 Two

_{examples and}

five

metamodels

Weareinterested intheapplication

of

Krigingtodiscrete-eventsimulationmodels, such as simulated queueing systems. As LawandKelton (2000»-thebestsellingtextbook on

simulation-states (on page 12),a_singleserver_{queueing system is quite representative of more}

complex, dynamic, stochastic simulation models. Forfurther simplification,we suppose that the

output ofinterest is the meanwaiting time in thesteady state, E(W).Thisoutput can be

(31)

S(i) and A(i)are_identicallyand_{independently}_{distributed (IID),}sosimulationis_{straightforward.}

The output E(W)isusually estimated through the simulationrun'saverage

iii _r, W(i)

6

with

OEb<n

(16)

Abn-b+1

where bdenotesthelength oftheinitialization(start-up, transient)phase(which may be zero), and n therun length. (InM/M/1 analysisandsimulationthrough renewal analysis, this initialization isnoissue;in practical simulations, however, it isa_{major problem; see Law and}

Kelton (2000,pp.496-552).) In other words,thedynamic simulation model generates the time series (15), but thisseries ischaracterized through the single statistic (16).

Actually, simulation is done_{for sensitivity}_{analysis (possibly}followed_by_{optimization).} Such an analysis aims at estimatingthe_{input/output (I/0)}_function_(say)

E(Z(s)) =S(s) (17)

where-following

_(14)-Z

denotes the(multiple)output and s the(multiple) input. In the M/M/1

example we have Z=W and s= 1/#

with arrival rate A and_{service rate v.}

In general, S(s) in (17)hasunknownshape andparameters. However, when studying the

performance ofa_{specific simulation methodology,}researchersoften use theM/M/1 simulation modelbecause the

I/0

function S(s) isthenknown-assuming thatthemethodology has

selected an_appropriateinitialization_{length b in (16)}_{(obviously, knowledge of S(s) may not be} used by the methodologyitself):

E(W) = 1 with

1<1.

(18)

BCM- A) B

Unfortunately,thelatter assumption is very questionable: it iswellknown that selecting an appropriate transient-phase length b and runlength n in (16)is

difficult.

Moreover, Krigingassumes_{that-in general-the}simulation observations Z have additive white noise:see_{(14). In the M/M/1 example, (14) gives (i) Z=W, (ii)}

S(s) = A/(v

(1-A)),(iii)

normality holds if W in (16) uses a_{sample size (n - b +1) such that an}

asymptotic central

limit

theorem holds,

(iv)

constant variances result

if

differentsimulatedtraffic

rates usedifferentand_{appropriate sample}

sizes-see

Kleijnen and VanGroenendaal (1995)-and (v) independenceresults ifno_{common pseudorandom numbers are}used._Altogether,

(32)

Hence, it is much moreefficientandeffectivetogenerateKriging testdatathrough sampling from (13) with S(s)=1/(51(1

-A))

instead of (15) and (16). Indeed,ourapproach requireslesscomputer time,andguarantees that thewhitenoise assumption holds,including the desiredvalue forthevariance ofthewhitenoise.The alternative using (15) and (16)would require very longruns.especially for hightrafficrates A/# T 1 this alternativerequires n T. .

In conclusion, to testthe_Krigingmethodology wegenerate datathroughastatic, random Monte Carlo model like(13) instead ofa_{dynamic stochastic simulation model such as (15)}

combined with (16). So,theMonteCarlotechnique is bothefficientandeffective.

Besides Example I, we study Example II representing simulationswith multiple local maxima, which are interesting when optimizing simulation outputs. Example I represents queueing simulations that show 'explosive' meanwaitingtimes as thetrafficrateapproaches the valueone._{Example II has no specific interpretation.}

We samplethewhite noise-term 11(s) in (14) throughtheMatlabfunctioncalled 'randn', whichgivesstandard

NIID

variates; that is, 11(s) has zero mean andunitvariance. We also experiment withalarger variance namely 25; this results in larger error terms, but not in other

conclusions.

To estimate possible values of the L2 norm (defined above), we use 100

rnacro-replications.Fromthesemacro-replications we estimate

_4's

median, 0.10 quantile Qo 1, and 0.90 quantile Q09.

In both examples we take n =21 equallyspaced_{input values: s, with i = l,..., 21. For}

cross-validation we select (ratherarbitrarily)

c=5

inputsvalues:We

eliminate i=2,8,9,1 5,1 6

respectively. We comparedthefollowing fivemetamodels. i) OrdinaryKriging

ii)

second-degree detrending: S(s) isasecond-degreepolynomial

iii)

perfectlyspecified detrendingfunction: S(s) isahyperbolainExample I, andafourth degreepolynomialinExample II

iv)

fifth-degreedetrending: S(s) isa_fifth-degree_{polynomial (overfitting)}

(33)

Note that wealsostudya_{perfectly specified detrending function (see}

_iii),

becausethisfunction providesautopian situation:This functiongives theminimum_{prediction error;}inpractice, this function_{is unknown (otherwise}simulationwould not beused).Furthermore,

it

helps us toverify the correctness ofour computer program.

2.4.1 Example I:

M/M/1 hyperbola

We take S(s) =

s/(1-s) on D=[0.01,0.99] c R:

this hyperbolic function may represent the

mean_steady-state_{waiting time for}atraffic rate s in anM/M/1 _queueing_system.This function gives Figure2,which also displaysan_{example of}the_noisy

_{output Z(s).}

The_inputlocations are

s, c <0.01,0.05,0.10,..., 0.95,0.99}. The cross-validationiscarried out at s = 0.05,0.35,0.40,0.70 and 0.75. 25 20 S(S)

15 1-Cs)

0 eliminated data 10

5

0 . 70-7%1' a:--*-::0::'$ 4 '4 0 0.2 0.4 0.6 0.8 S . -5

Figure 2: S(s) = s/(1 - s) andexample sample output Z(s)

(34)

Table 1:Prediction accuracy: estimated quantiles ofLedistribution forExample I (M/M/1 ) metamodel

L2 i ii iii iv V

00 i 1.2429 1.3622 1.1972 2.7411 17.593 median 1.8832 2.1522 1.8419 3.6678 18.17 00, 2.5698 2.925 2.5829 4.4677 18.652

2.4.2 Example

II:

fourth-degree

polynomial

We take thefollowingspecificpolynomial: S(s)=_{-0.0579s4 + 1.11 s' - 6.845}

sl

+14.1071 s+2

on D = [0,10] c R'.

Thispolynomial has two maxima:

A

local one andaglobal one; see Figure 3. We obtainoutput forthefollowing21 input locations st € 0,0.5,1,...,10}.We

cross-validate at s = 0.5,3.5,4,7 and 7.5. 15 10

-2

S(s)

a

\

. . . ... ..CO . .0 5 5 -:-I.A ...Z(s) 0 i i I eliminated data

-5 0 2 4 6 8 s 1 0

-10 -15

Figure 3: S(s)=_{-0.0579 s4 + 1.1 l}

s' -6.845

s2+14.107ts + 2 andexample sample output Z(s)

(35)

Table2:_{Example II ; also}seeTable 1

metamodel

4 i ii iii iv V

Qo.i 1.5976 1.515 1.2094 1.2173 5.5965 median 2.4713 2.41 1.8748 1.9117 6.0363 00.9 3.3226 3.246 2.6424 2.6959 6.5048

2.5 _{Nugget effect}

We also wish to better understandtherelationship betweenthe_{nugget effect in (11) and the}

variance of the noise 11(s) in (13). Therefore we performasimpleMonteCarloexperiment: We take Z(s) =1 0+11(s) where 11(s) is NIID with v =0 and 0 2= 1,4,9,16, and25respectively. We sample two macro-replicates, setting the seed

of

Matlab's

'randn'-rather arbitrary-to the

values 10 and 20. In the variousKrigingmetamodels, we fitthelinear variogram of (11); see Figure 4 (we display results for theseedvalue of 10 only; notethedifferentscales for the y-axis in thefourplots).

6 · 25

seed: 10

5 seed: 10_sigma: 1 20· _{sigma: 2}

4. 15- 3-• _{10- •} 2 . .. 1 • 5-0. · 0 100 50· seed:10 90 seed: 10 sigma: 3 80- Sigma: 4 40· 70-30- · 60 50· * 40 20- * . · • .. 30· • 10 + 20 10 0· · · 0· · · · · 02 46 8 10 0 2 46 8 10

(36)

The intercept in (11) estimatesthenugget effect; this intercept ispresentedfor different 02 values inTable3.Obviously,these resultsconfirmour conjecture: The nugget effect is the variance ofthenoise.

Table3:Estimated nugget effectsfor different whitenoise variances a 2 Q 2 seed10 seed 20 1 1.1 0.9

444

9 9.6 8.5 16 17.1 15.5 25 26.5 24.1

2.6 Conclusions

and

future

research

We assume that in practice the mean v oftheKrigingmetamodel (1) is notaconstant, but is a composition ofasignalfunctionandwhitenoise.Wepresentedresults fortwo examples of inpuUoutputbehavior oftheunderlying random simulation model:Ordinary Kriging and Detrended Kriginggivesquite acceptable predictions, whereastraditional linearregression gives theworstresults.

0LS predictsso_poorly,becauseOLSassumes thatthe

_fitting

errorsarewhitenoise,

whereas_Krigingallows errors thatarecorrelated; morespecifically,thecloserthe_{inputs are, the}

more positive correlation. Moreover, 0LS usesa_{single estimated}function forall_inputvalues,

whereasKrigingadaptsits predictor as the input changes. Note that OLS maybeattractive when looking foran

_{explanation-not}

a

prediction-of

thesimulation's

I/0

behavior;for_example, whichinputs are most important; doesthesimulation showabehavior thattheexperts find acceptable (alsosee_{Kleijnen, 1998)?}

0LS isalso_comparedwith(universal) Krigingby RegniereandSharov (1999).Their

(37)

Further, we found that tile nugget effectequals thenoise variance.

We restrictedour examples toasingle input. Therefore we gaveeach

weight in the

more general distance formula (12),thefixed value of one. In designoptimization,_however,

theseparameters are used to controltheimportance of theinput_{variable x ; see}forexample

Simpson et al. (2001, p. 8)and_{Jones et al. (1998, p. 5). In}future workwe_{shall investigate} multipleinputs.

Further, we shall relaxtheassumption

of

whitenoise:We shall investigatetheeffects of non-constant variances(whichoccur_{in queueing simulations), common random number}_usage (whichcreatescorrelations amongthesimulation outputs),andnon-normality(Kriging uses maximumlikelihoodestimators of the weights

_4,

whichassumes_{normality). Finally,}we shall applyKrigingto practical queueingand_{inventory simulations.}

Acknowledgment

We thank the two anonymousreferees

for

their most useful comments onanearlier version.

References

Campolongo, F., J.P.C. Kleijnen, and T. Andres (2000), Screening methods. In: Sensitivity Analysis, edited by A. Saltelli, K. Chan, and E.M. Scott, Wiley, Chichester (England), pp.

65-89

Cressie, N.A.C. (1993).StatisticsforSpatial Dam,Wiley, New York

Huijbregts, C.J.and_{Matheron, G. (1971). Universal}_{Kriging (An optimal}_methodfor_estimating and contouring in trend surface analysis). In Proceedings of Ninth International Symposium on Techniques for Decision-making in the Mineral Industry

Jones, D.R., M.Schonlau, and W.J. Welch (1998).Efficient global optimization

of

_expensive

black-box functions.Journal of Global Optimization.13. 455-492

(38)

Kleijnen, J.P.C. and W. vanGroenendaal (1995), Two-stageversussequential sample-size determination in regression analysis of simulation experiments.American Journal of Mathematical and Management Sciences, 15, nos. 1&2, 1995, PM 83-114

Kleijnen, J.P.C. (1998),Experimental designforsensitivity analysis,optimization,andvalidation of simulation models. In: Handbook of Simulation. edited by J. Banks, Wiley, New York, pp.

173-223

Kleijnen, J.P.C.andJ.Helton (1999), Statisticalanalyses

of

scatter plots toidentify important factors_{in large-scale simulations.} ReliabilityEngineering andSystemsSafety, 65, no. 2, pp.

147-197

Koehler, J.R. and A.B. Owen (1996), Computer experiments. In: Handbook OfSmtistics, Volume13,edited byS.GhoshandC.R.Rao, Elsevier, Amsterdam, pp.261_308

Law, A.M. andW.D.Kelton(2000), Simulation modeling and analysis,

third

edition, McGraw-Hill, Boston

Matheron, G. (1962) Traite de Geostatistique Appliquee, Memoires du Bureau de Recherches

GeologiquesetMinieres, No. 14,Editions Technip,Paris,(pp.57-59)

Meckesheimer, M., R.R. Barton,T.W._{Simpson, and A.J. Booker (2001),}Computationally inexpensive metamodelassessmentstrategies._{American /nstitute OfAeronautics and} AstronauticsJournal(submitted)

Rdgnitre, J. and A. Sharov (1999),Simulatingtemperature-dependent ecologicalprocesses at the

sub-continental scale: male gypsy moth flight phenology.International Journal of Biometereotogy, 42, 1999,pp. 146-152

Sacks,

J.,

Welch, W.J., Mitchell, T.J. & Wynn, H.P. (1989), Designand_analysis

of

_computer

experiments. Statistical Science, 4. no.4.pp.409-435

Simpson, T.W., Mauery, T.M., Korte, J.J., and Mistree, F.(2001).Kriging Metamodels for Global Approximation in Simulation-Based Multidisciplinary Design Optimization. American Institute of Aeronautics and Astronautics Journal (submitted)

(39)

Chapter 3 Robustness

_{of Kriging}

when

_{Interpolating in}

Random

Simulation

with

_{Heterogeneous}

Variances:

Some

_Experiments

Abstract

Thispaper investigates the use

of

Krigingin random simulation whenthesimulation output variances are not constant. Kriginggivesaresponsesurfaceor metamodel that can be used for interpolation.BecauseOrdinaryKrigingassumesconstant variances, thispaperalso applies DetrendedKrigingtoestimatea_{non-constant signal function, and}thenstandardizestheresidual noise throughtheheterogeneous variances estimatedfromreplicated simulationruns.Numerical examples, however,suggestthat OrdinaryKriging isarobust_{interpolation}method.

3.1 Introduction

Given a set

of

_{inpuVoutput (I/O)}data _{Kriging fits}a_{metamodel (also called}_response_surface,

emulator, auxiliary model, etc.). Kriging isan_{interpolation}method; that is, at the observed 1/O values_{its predictions}are_{exactly equal to}theobserved_outputvalues._{However, classic}least

(40)

squares

_(LS)-used

in

regression-typically fits

alow-degreepolynomial,which results in

non-zero

fitting

errors (or residuals) such that the sum

of

these squarederrorsisminimized.

Originally,theSouthAfrican miningengineer D.G.Krigemodeled realworld I/0 data to predictgold quantities;see_{Cressie (1993). Later on,}Krigingwasapplied todeterministic simulation 1/Odata(modelingtheperformance

of

computer chips, cars, etc.);seeSacks et al. (1989). More recently, the application

of

Krigingtorandom simulation(forqueuing problems, etc.)was_proposedbyBarton (1994), andwas_investigatedindetail by VanBeersandKleijnen (2003). The latter authors suggested that it maybebetter to replace classicOrdinary Kriging by theirnovel method called DetrendedKriging, whichcorrectsor'detrends' theoriginaloutput data throughanestimated signal function (see S(s) below). However, both Ordinary and Detrended_Krigingassume that the outputs have constant variances.

In practice, this constant variance assumptionisoften completelyunrealistic. For example,theso-calledM/M/1queue isanimportant buildingblock in discrete-eventsimulation; see LawandKelton (2000). For theM/M/1,Cohen (1969) provesanalytically thatthe

steady-state_{waiting time (say) Z for}atraffic load s with 0<s<1 has mean

E(Z(s))=-1 (1)

_1-s

and variance

\ 3(2 - s)

Var(Z(s))= (2)

(1 -s) 2

sothis varianceisdefinitelynotconstant whenthetrafficloadchanges.Therefore we now investigate the consequences

of

varianceheterogeneitywhen applying Kriging to the I/0 of random simulation.

Whereas linear-regression analysis

of

_{low-order polynomial}metamodels has been applied extensively in discrete-event simulation (such as queueing simulation), Kriging has hardly been applied to random simulation.

The mainconclusion of our (limited)experiments will be thatOrdinaryKrigingseems a

(41)

Robustness of Kriging when Interpolating in Random Simulation with 31

Heterogeneous Variances: Some Experiments

The remainder of this paperisorganizedasfollows.Section3.2summarizesthebasics of Kriging.Section3.3 discussessolutions fortheproblems

of

non-constantandexploding

variancesinrandomsimulation. Section3.4_proposesanovel_Krigingmethod, which estimates a non-constant signal function andthenon-constant variances. Section3.5 presentsnumerical examples, whichsuggestthat OrdinaryKriging isarobust prediction methodwhereasthenovel Krigingmethodis_{better only in utopian situations. Section}3.6_{gives conclusions}and_proposes

futureresearch_topics.

3.2 _Kriging

basics

Krigingassumesthe_followingmodel:

Z(s) = S(s) + '1(s) with 11(s)

-NID(0, 02 (s)) (3)

where S(s) is the so-calledsignal function and4(s) the additivenoise; we use the notation x - NID(a, b) when x is normally and independently distributed (NID) with mean a and variance b. Below, we shall discuss the classic assumption

of

white noise,which means that

02(s) in(3) reduces toaconstant (say) 0 2.

Note: The model in (3) maygeneratenegative output values,whereasM/M/1generates

non-negative valuesonly.Nevertheless, we use(3)because

it

_{provides better}control over our experiment and itisfaster thanM/M/1 simulation; see VanBeers andKleijnen (2003).

Ordinary Krigingassumesa_{stationary covariance process}

for

_{Z(s) in (3) (see Cressie}

1993).That assumption implies thatthe_{expected values E (Z(s))} areconstant, and the

covariances of Z(s, + h) and Z(si) depend only on the distance (or lag) h._{The predictor for the}

point so -denoted by p(Z(so))-is

aweightedlinearcombination of all the (say)nobserved output data:

(42)

To selectthe_{weights /1, in}(4), Krigingminimizes the mean-squared prediction error. The

technique uses the_variogram,defined as 23 (11) =var(Z(s + h)-Z(s)), see Figure l discussed below.The optimal weights turn out to be

" -t' +1 1 i,;, i, j r-1 (5)

where 7 denotesthe vector (7(so -s, ) . . . . , y(so

-s„))',r

denotes the n x n matrix whose (i,

j)*

elementis 2 (s, -sj),and 1 denotesavectorofones; alsoseeCressie (1993, p. 122). Note thattheweights A, in (5) vary withthepredictionpoint,whereaspolynomial-regression metamodels use thesameestimated metamodel forallthese_points.

3.3 Random simulation

with

heterogeneous variances

Inthispaper,weallowheterogeneous variances;hence, theKrigingassumption ofastationary covariance process does not hold. Therefore, we do not expecta_{smooth variogram. Furthermore,}

Van Beersand_Kleijnen(2003) found thatthe_{intercept of}thefitted_{linear variogram estimates} theso-callednuggetefect. The term 'nugget' originated in the Kriging of gold mining data:

when going back to the 'same' spot (lag h 1 0; see the text above equation 4),acompletely different output-namely, agold nugget-maybeobserved. In general,Kriginginterprets a nugget effect asanestimate

of

noise;for_{example, measurement noise in geostatistics. Random} simulation_{models produce noisy}

data-by

definition. Inthispaperwe shall investigate whether

the intercept ofthe variogramstillestimatesanugget effect in case

of

heterogeneous variances. Moreover, in simulation the variance may besolarge that simulation is actually impractical: as the traffic load approaches the value one, the varianceexplodes-as the M/M/1 illustrates; see (2). Therefore ChengandKleijnen(1999) recommend that

(i) in suchasituation we should simulate a load much lower than one,and_extrapolate;

(43)

Robustness of Kriging when Interpolating in Random Simulation with 33

Heterogeneous Variances: Some Experiments

Likewise,Asmussen (1992) recommends not tosimulate the M/M/1 foratraffic rate s > 0.9; he too suggests to extrapolate inthose cases.VanBeers and_Kleijnen_(2003),_{however, simulated}

loads up to

0.99-but

theyart(ticially keptthevariances constantandsmallovertheirwhole range

of

simulated values, 0.01 5 3 5 0.99.

Initially,we samplea_{single output value for each of}21 _{input (traffic}_rate)_{values, as}

follows. We use (3) where we insert (1) and (2),whichmodel the signal and the variance functions of the M/M/1. We do so for21 traffic rates s e *101,0.05,0.10,...,0.95,0.99}. We

ignorethevariance heterogeneity andthevariance explosion,and_{proceed as in Van Beers and}

Kleijnen (2003).We thengetestimated variogramslikeFigure 1.Thisfigure showsa

wildly

oscillatingvariogram; obviously, thelinearapproximation is poor. This poor

fit

gives an inaccurateKrigingmetamodel withtheweights _A, _{defined in (5).}

15000 10000 -21(h)

...

5000- , I ..-I-... 0 01 0.2 0.3 04 05 0.6 0 7 0 8 0. 1 h

Figure1: _{Estimated variogram for Z(s) = s/(1 - s) + 11(s) with 11(s) - NID ), 3(2 - s)/(1 - s)2 )}

Figure 1 illustrates that in random simulation we should ensure thatthesignaYnoise

ratio

is 'adequate'.Torealize suchasignal/noise ratio, we should-first of all-avoid variance explosion; that is, we should simulatea_{non-saturated trairic load: that is. a load much lower than}

(44)

Comparison withthevalues onthey-axis

of

Figure 1 shows thattherange ofthevariogram is

much smaller,sovarianceshaveindeedbeen decreased.

. .. 14- * 12- 10-.. 2,(h) ' 6. -,

...

0 0.1 02 03 04 05 06 07 0. 09 1 h

Figure2: _{Variogram after}elimination of s = 0.99 in Figure 1

Besidesavoidingvariance explosion, we may reducethemagnitudes oftheremaining variances through increasingthe_{number of}_replicates_{(say) m-whenever the noise of a single}

replicate is too big. We denotethenumber

of

identicallyandindependentlydistributed (IID) replicatesfor input value s by m(s). Thenoneoption istoselect_{these m(s) such that the}

variances ofthe average responsesbecomeaconstant (say) 02:

t- 1 Var(Z(s))

Var(Z(s))= , = 02, (6)

m(s)

see Kleijnen and Van Groenendaal (1995).

Note:Insteady-state

simulation-as

opposed toterminating simulation-we may make a single long runand_{partition that run}into_{subruns that play the role}of_{replicates; see Law and} Kelton (2000).

In practice, however,the_{experimental design}implied by (6) maybe_{impractical (see}

(45)

Robustness of Kriging when Interpolating in Random Simulation with 35 Heterogeneous Variances: Some Experiments

averages

_Z(s).

These averages are'the' observations attheinputs s towhich we fitaKriging metamodel,asfollows.

3.4 Novel

_Kriging

method:

Studentization

To solvethe_{variance heterogeneity problem, we}first_{apply Van}Beersand_{Kleijnen (2003) 's}

Detrended_Kriging,whichassumes

Z(s)= SCS)+11(s)+ a (s) (7)

where S(s) is the signal function, 11(s)- _{NID(9,02) is the white}_{noise model for measurement}

error, and b (s) denotes_fittingerror. However,unlikeVanBeers and_{Kleijnen (2003) we now}

allow 11(s) to haveheterogeneousvariancesdetermined bytheinput s; see (3).

Unbiased estimators

of

these

variances-not

_{depending on}a_{metamodel such as (7»are}

theclassic estimators

I ", 4,(s,) - 2(s, ,)'

a2 (st)= "' (8)

m(st)-1

where Z,(s,) denotes the

r

replicated observation on Z(s,) and

zcs,)= Ir:.)z.(st)

(9)

m(si)

denotes_{its sample}_average.

LikeVanBeers andKleijnen (2003), wemay estimatethesignalfunction _{S(s) through}

(46)

unequal variances, so the best linear unbiased estimator (BLUE) is given by weighted least

squares(WLS)-not ordinary least squares (OLS).

Next,we_applyOrdinaryKriging tothedetrendedandstandardized observations

Z (S)=Z(S) - S(S) (10)

0(s)/ fm

Because (10) resembles Student'stStatistiC, we callthetransformation in(10) 'Studentizing'. If we neglect the random character of S(S), then we conjecture thatthetransformed data

Z(s) in (10) haveaconstantmean_{(namely zero) and}a_{constant variance, namely}

_{IN(m -2).So}

2(s) then satisfies the Ordinary Kriging assumptions.

Finally,the predictor for so (say) p(Z(so)) follows from (10):

d(so)

p(Z(S )) = P(Z(sonx--+

SWA, (11)

qm

where p(Z(so)) isthe OrdinaryKriging predictor for _so based onthe transformed data in (10);

d (so) isgiven by (8) if so

-st;

otherwise, weusepiecewise linear interpolation between the variances at the two neighboring points thathavealreadybeenobserved. (This interpolation avoids negative estimated variances.)

3.5 Numerical

_examples

Asthe_{signal jiinction in (3)}wefirst_{select S(s) = s/(1 -}s), whichequals theM/M/1hyperbola in (1). FollowingVanBeers and_{Kleijnen (2003),}we_{experiment with five}_{detrending models,}