Citation for this paper:
Westermann, P., & Evins, R. (2021). Using Bayesian deep learning approaches for
uncertainty-aware building energy surrogate models. Energy and AI, 3, 1-13.
https://doi.org/10.1016/j.egyai.2020.100039.
UVicSPACE: Research & Learning Repository
Faculty of Engineering
Faculty Publications
Using Bayesian deep learning approaches for uncertainty-aware building energy
surrogate models
Paul Westermann & Ralph Evins
March 2021
© 2021 Paul Westermann & Ralph Evins. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License. https://creativecommons.org/licenses/by-nc-nd/4.0/
This article was originally published at:
https://doi.org/10.1016/j.egyai.2020.100039
Using Bayesian deep learning approaches for uncertainty-aware building energy surrogate models

Paul Westermann∗, Ralph Evins

Energy and Cities Group, Department of Civil Engineering, University of Victoria, Canada
Highlights

• Developing uncertainty-aware engineering surrogate models.
• Comparing deep Bayesian neural networks and Gaussian process models.
• Uncertainty estimates can identify and mitigate errors in surrogate models.
• A concept to hybridize engineering models and data-driven models.
Article info

Article history:
Received 6 October 2020
Received in revised form 17 November 2020
Accepted 7 December 2020

Keywords:
Surrogate modelling
Metamodel
Building performance simulation
Uncertainty
Bayesian deep learning
Gaussian Process
Bayesian neural network
Abstract

Fast machine learning-based surrogate models are trained to emulate slow, high-fidelity engineering simulation models to accelerate engineering design tasks. This introduces uncertainty as the surrogate is only an approximation of the original model.

Bayesian methods can quantify that uncertainty, and deep learning models exist that follow the Bayesian paradigm. These models, namely Bayesian neural networks and Gaussian process models, enable us to give predictions together with an estimate of the model's uncertainty. As a result we can derive uncertainty-aware surrogate models that can automatically identify unseen design samples that may cause large emulation errors. For these samples the high-fidelity model can be queried instead. This paper outlines how the Bayesian paradigm allows us to hybridize fast but approximate and slow but accurate models.

In this paper, we train two types of Bayesian models, dropout neural networks and stochastic variational Gaussian Process models, to emulate a complex high dimensional building energy performance simulation problem. The surrogate model processes 35 building design parameters (inputs) to estimate 12 annual building energy performance metrics (outputs). We benchmark both approaches, prove their accuracy to be competitive, and show that errors can be reduced by up to 30% when the 10% of samples with the highest uncertainty are transferred to the high-fidelity model.
1. Introduction
A wealth of concepts exists to explore the design of new and existing buildings to improve the building sector's large climate footprint [1]. Scaling them is challenging, as usually each building is designed individually, responding to the cultural context, climatic conditions, surrounding buildings and design preferences. This impedes the distribution of

Abbreviations: BDL, Bayesian deep learning; BNN, Bayesian neural network; SVGP, stochastic-variational Gaussian Process; DoE, design-of-experiment; ReLU, rectified linear unit.

∗ Corresponding author.
E-mail addresses: pwestermann@uvic.ca (P. Westermann), revins@uvic.ca (R. Evins).

centrally-derived design paradigms to the level of individual building projects.

Architects and engineers play a vital role in bridging the gap between high-level ideas and individual building projects. Often they use building performance simulation (BPS) tools to assess the energy and environmental performance of various design options and balance them against design preferences. The computational expense and associated
Fig. 1. Distribution of errors of a surrogate model. The plot shows the error of a surrogate model which emulates the simulation of the heating demand of an office building (see case study in Section 4). While the average absolute error AE and absolute percentage error APE are low (indicated by the red lines), large errors can occur. This study aims to identify the large errors using estimates of surrogate model uncertainty.
waiting time, however, prohibits exhaustive design space exploration and optimization. This has led researchers to train machine learning models on simulation input and output data to emulate building simulation models [2].

The computational speed of these so-called 'surrogate models' has been the basis for a range of innovations in the field of building simulation, for example, interactive early-stage design tools (e.g. ELSA [3], BuildingPathfinder [4], Net-Zero Navigator [5]), faster optimization algorithms [6], and detailed design sensitivity and uncertainty analysis [7,8]. A recent survey of building designers confirmed that those who received real-time feedback from a surrogate model arrived at higher performing building designs [9].
The growing use of surrogate models turns attention to the robustness of their accuracy. The accuracy of a surrogate model is measured by the error of the surrogate model in estimating the physics-based simulation results, which are considered the ground truth.¹ Studies have shown satisfactory average accuracy on test data [11], which can be influenced by the type and the complexity of inputs [12] and the selection of outputs [5].

Nonetheless, average errors computed on test data can be deceiving (see Fig. 1). Test data usually consists of design samples distributed uniformly in the design space and may not reflect the portion of the space the building designer is interested in. Large errors on specific building designs may occur (i.e. heteroscedasticity of the errors), affecting important design choices and potentially lowering the energy performance of the final building design.
Bayesian methods offer a framework to quantify the uncertainty stemming from the inadequacy of an approximate model (epistemic uncertainty), and recent developments in Bayesian deep learning (BDL)

1 Please note that the surrogate model accuracy does not reflect how well the underlying simulation model matches a real-world building. The reader is referred to [10] and many other studies that address the gap between the simulation model and the real building.

managed to integrate Bayesian concepts into large machine learning models [13,14]. As a result, BDL-based surrogate models can express for which inputs their estimates are uncertain. In our case, a Bayesian surrogate model produces a building performance estimate as a probability distribution, where the entropy or variance of that distribution allows us to quantify the uncertainty. The architect or building designer is therefore provided with a level of confidence in the performance results and thus can define uncertainty thresholds above which the high-fidelity model, here the BPS tool, is queried to guarantee high confidence results (see Fig. 2).
In this study, we explore two different Bayesian models, Bayesian neural networks [15] and stochastic variational Gaussian process models [16], to quantify epistemic uncertainty in surrogate models (see Section 2). Both models were chosen as they scale well to large surrogate modelling problems with many inputs and outputs, which requires training the models on large datasets. We benchmark the overall accuracy against non-Bayesian surrogate models, validate the quality of the uncertainty estimate, and quantify how a hybridization of fast but approximate and slow but accurate models reduces the error of a surrogate model while computational costs increase only slightly (see Section 5 ff.).
2. Background
2.1. Motivation for surrogate modelling

The core motivation to emulate a physics-based high-fidelity model is computational efficiency; simulation outputs can be estimated many orders of magnitude faster, effectively in real-time. This allows a holistic design space analysis which would be infeasible with a slow simulation model. Various applications of surrogate modelling are found in the building domain as well as other domains [18,19]:
• General design space exploration: The relationship between design parameters and performance is interactively explored to improve the user's understanding of the design problem [9,20]. This can happen on the single building level or on the urban level [21]. Often a parallel-coordinates plot is used to visualize the multi-dimensional problem space [5].
• Design optimization: The surrogate model is trained and queried to accelerate iterative optimization algorithms [22–24]. Adaptively training the surrogate model on new simulation samples collected at each optimization iteration can further increase optimization performance [6].
• Sensitivity analysis: The surrogate model is used to run the extensive sampling (thousands of simulation runs) required for global sensitivity analysis methods [7].
• Design uncertainty analysis: Several types of uncertainties exist during the building design process, caused by undetermined design parameters, uncertain contextual parameters (e.g. surrounding buildings, carbon factors, etc.), and vague design constraints [25]. This uncertainty is often quantified using Monte Carlo sampling methods, where samples from uncertain parameter distributions are drawn and simulated to quantify how that parameter uncertainty propagates to building performance uncertainty. With a surrogate model, these uncertainties can rapidly be calculated and updated throughout the design process [8].
• Simulation model calibration: An accurate calibration of a simulation model is required to assess retrofit design choices for an existing building. The calibration, i.e. the process of determining uncertain building parameters, often relies either on iterative optimization algorithms [26], or on Bayesian calibration of these uncertain parameters [27]. In both cases simulations are iteratively run to closely match simulation outputs with measured sensor data by adjusting the unknown parameters. One can use surrogate models to reduce the computational limitations of these approaches. Note that
Fig. 2. Uncertainty estimates to link a high-fidelity model and a surrogate model. The surrogate model provides both a performance estimate ŷ_surrogate and an uncertainty estimate σ̂_surrogate. If the uncertainty is large, a high-fidelity model (e.g. a building energy simulation) is queried to produce accurate estimates y_sim of an engineering design (e.g. a building). Please compare to [17], who introduced a similar concept.
simulation model calibration can be done both for a specific building [28] or multiple buildings [29]. The latter commonly requires an archetype model whose parameters are repeatedly calibrated using measurements of the considered buildings [30].
2.2. Surrogate model derivation

In surrogate modelling, we fit a machine learning model to a simulation dataset D = {x_n, y_n}_{n=1}^N = (X, Y) consisting of N samples, where the inputs x_n correspond to the simulation parameters and y_n to the real-valued outputs of the simulation run recorded for sample n [19].²

In the case of building energy surrogate models, the simulation parameters are the building design parameters (e.g. insulation value of the walls) and the outputs are the simulated building performance metrics like the aggregated annual energy consumption or greenhouse gas emissions [2]. Studies also exist with time series outputs, like hourly energy demand [21].

For deriving the surrogate model the modeller first needs to carefully specify the design problem, which includes choosing the free design parameters and the performance objectives as well as all other important contextual parameters (surrounding buildings, etc.). Then simulations are run to create the simulation dataset D. The idea is to gain maximum information about the design space (the collection of all possible parameter combinations) per simulation run. Tailored sampling schemes exist, called design-of-experiment methods [31], e.g. Latin-Hypercube-sampling, which distributes samples uniformly in the multidimensional input space. The number of samples must be specified (e.g. 10-1000 samples per parameter dimension [2]) and is adjusted if model accuracy on test samples is too low.
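To make the sampling step concrete, here is a minimal NumPy sketch of Latin-Hypercube-sampling over the unit cube. The 35-parameter, 10'000-run scale matches the case study in Section 4, while the parameter name and bounds (wall insulation thickness) are purely illustrative assumptions:

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Latin-Hypercube sample in the unit cube: one point per stratum per dimension."""
    # Stratify [0, 1) into n_samples equal bins and jitter within each bin.
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently per dimension to decouple the axes.
    for d in range(n_dims):
        u[:, d] = u[rng.permutation(n_samples), d]
    return u

rng = np.random.default_rng(42)
samples = latin_hypercube(10_000, 35, rng)

# Map the unit-cube samples to hypothetical parameter bounds, e.g. a wall
# insulation thickness between 0.05 m and 1.0 m for the first parameter.
lo, hi = 0.05, 1.0
wall_thickness = lo + samples[:, 0] * (hi - lo)
```

Each of the 10'000 strata in every dimension holds exactly one sample, which is what gives LHS its space-filling property compared to plain random sampling.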
2.3. Accuracy in surrogate modelling

The accuracy of a surrogate model is quantified by how well its building performance estimates match true, physics-based simulation outputs. We assume the simulation model as our ground-truth model, and disregard the mismatch between the simulation model and the real-world building when calculating the surrogate's accuracy throughout the paper.

Metrics like the coefficient of determination (R²), the mean absolute percentage error (MAPE), or the root-mean-squared error (RMSE) can be used to quantify accuracy [32]. Based on [5,11], accuracies of R² > 0.99 are feasible when estimating annually aggregated performance metrics, e.g. heating demand, but they can be significantly lower when more complex performance metrics are estimated.
As mentioned above, surrogate model accuracy is commonly reported as one metric, implying homoscedastic errors. This may not always hold, i.e. the errors may depend on the choice of inputs (heteroscedasticity). By using Bayesian deep learning [13], we aim to train surrogates that are aware of where in the design space, i.e. for which

2 Also categorical outputs can be considered, but practical examples are lacking in the building simulation literature.

building designs x ∈ X, the model is uncertain and may produce large errors.
2.4. Uncertainty in surrogate models

A mathematical function f of the simulation is not explicitly available. We use the surrogate model to find an estimate f̂ to approximate that function. The most important cause of uncertainty in surrogate modelling is how plausible the determined f̂ is (model uncertainty or epistemic uncertainty) [13]. For the most part, this uncertainty is caused by the training set D = (X, Y), which contains only a finite set of points within the space of possible simulation parameter combinations X (the design space) and associated building performance Y. Theoretically, epistemic uncertainty can be reduced to zero given more and more data [13].

We consider the problem of surrogate modelling as free of aleatoric uncertainty, which represents noise or other unknowns impacting the observations.³ Therefore, we only deal with epistemic uncertainty. We propose that quantifying this uncertainty can be a powerful aid in surrogate modelling as it acknowledges that we have to train our model with a limited number of simulation samples that represent a fraction of the design space, which makes the surrogate model uncertain. Bayesian modelling now allows us to reason under that uncertainty, while still benefiting from the advantages of surrogate modelling, i.e. the computational efficiency for large scale design space exploration.
2.4.1. Other sources of uncertainty in building performance simulation

The scope of this study is specifically set on estimating the uncertainty caused by training a surrogate model to emulate a simulation model (see Fig. 3). It does not consider or compute any other sources of uncertainty prevailing in building performance modelling, which may include uncertainty in design parameter and model specification, uncertainty in the properties of the final construction, and uncertainty stemming from assumptions of internal (e.g. occupant behaviour) and external (e.g. climate) conditions [25]. Where uncertainty in surrogate modelling is purely caused by the modelling process (epistemic), uncertainty in specifying a simulation model is aleatoric. For more insights on the uncertainties tackling the mismatch between the simulation model and the constructed building, the reader is referred to [34] instead.
3. Bayesian modelling for surrogate models

Bayesian probability theory offers us grounded tools to quantify model uncertainty [35].

To understand the core idea of Bayesian modelling, we consider a parametric model y = f(x, Θ), where x is the input, f is a space of possible models (see Fig. 4) and Θ is the set of model parameters

3 In the case of sensor data, this can correspond to sensor noise. Here, we consider simulation runs to be deterministic, i.e. the impact of numerical noise to be small. In the case of numerical building simulation, here EnergyPlus [33], this corresponds to the numerical noise of solving the thermodynamic-based differential equations.
Fig. 3. Uncertainty in surrogate modelling, and uncertainty in building performance simulation.

Fig. 4. Heating demand estimated with a Bayesian neural network, and the associated epistemic uncertainty. In particular, the uncertainty of the surrogate model is large when the building has a wall thickness wider than 1 m, which is wider than all samples contained in the training data (out-of-sample).
(for example, the weights in a neural network). Instead of finding a single Θ, in Bayesian modelling we search for a collection of Θ which likely has produced the output Y given X. In our case we search for a collection of surrogate models with different weights.

Bayes' theorem, as shown in Eq. (1), is applied to find a collection which likely has produced Y given X. Based on our prior knowledge on the distribution of the model weights p(Θ), combined with the likelihood function p(Y|X, Θ) = ∏_{n=1}^{N} p(y_n|x_n, Θ), which quantifies the probability that a specific model parameter set generated the observations (X, Y), the posterior of the model parameters can be computed:

p(Θ|Y, X) = p(Y|X, Θ) p(Θ) / p(Y|X)   (1)

where p(Y|X) is called the marginal likelihood. It represents the probability of the observed data given the model f with all possible model parameters. It is a scalar that normalizes the posterior. Given the posterior, we can now infer about future data in the form of a predictive distribution:

p(y*|x*, X, Y) = ∫ p(y*|x*, Θ) p(Θ|X, Y) dΘ   (2)

The mean and variance or entropy can be derived, where the latter two provide information on the uncertainty in the estimated values. In the building surrogate modelling setting, we predict an expected building performance, e.g. annual heating demand, and an associated uncertainty, given building design parameters, e.g. the thickness of the wall (see Fig. 4).
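The integral in Eq. (2) is rarely available in closed form; in practice it is approximated by Monte Carlo: draw parameter samples from the (approximate) posterior, predict once per draw, and summarize the resulting predictions. A minimal sketch for a hypothetical one-parameter surrogate y = θx, where the Gaussian posterior over θ is an assumed stand-in for what inference would produce:

```python
import random

random.seed(0)

# Hypothetical posterior over a single parameter theta, represented by
# samples (as variational inference or MCMC would provide them); the
# mean 2.0 and std 0.3 are illustrative assumptions.
theta_samples = [random.gauss(2.0, 0.3) for _ in range(5000)]

def f(x, theta):
    """A toy parametric surrogate y = f(x, theta)."""
    return theta * x

# Eq. (2) by Monte Carlo: one prediction per posterior draw.
x_star = 1.5
preds = [f(x_star, th) for th in theta_samples]

mean = sum(preds) / len(preds)                          # predictive mean
var = sum((p - mean) ** 2 for p in preds) / len(preds)  # epistemic variance
```

The spread of `preds` is exactly the model uncertainty the paper exploits: it grows where the posterior samples disagree.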
3.1. Variational inference

The true posterior of the weights p(Θ|Y, X), however, is commonly intractable. This is particularly the case in the big data regime when more complex models are required [16]. In the small data regime (below a few thousand samples) posterior inference with a standard Gaussian Process Bayesian model is feasible and was successfully applied for building surrogate models [28,36]. However, with increasing complexity, for example more inputs and outputs (e.g. [12]), standard GPs have major shortcomings:

• The model complexity is limited as it only consists of one layer, i.e. the outputs of the GP are not used as inputs to another GP. This prohibits modeling hierarchical structures and abstract information [14].
• Computational cost increases cubically (O(n³)) with the number of samples n. This prohibits increasing the size of the surrogate model training set to improve the model accuracy (for example, to train a complex, tailored kernel with many hyperparameters [35]).

Instead, recent advances in variational inference (VI) allow us to approximate the true posterior of Θ in big data problems [37]. We pick an approximate variational distribution over the (latent) model parameters q_ν(Θ) with its own variational parameters ν. Now we search for the ν that minimizes the divergence to the true posterior, which is quantified by the so-called Kullback-Leibler (KL) divergence. Thereby the marginalization, i.e. the integration required to calculate the true posterior, is turned into an optimization problem which is often easier to solve. The approximative distribution q can be used to form predictions about unseen samples.
3.1.1. Variational inference for training scalable surrogate models

Scalable variational inference methods have been developed both to do approximate inference with Bayesian neural networks (BNN) [13] and with Gaussian process models [38]. We picked one approach of each type (BNN, GP) that can be used "off-the-shelf", that is scalable to 10'000 and more training samples, and that has shown high performance in previous publications [16,17]. They are introduced in the following sections.

The interested reader is referred to [39] for an introduction to Bayesian deep learning approaches. Pearce et al. [40] provide a comparison of various BNN types; different Gaussian process model types which rely on variational inference are explained in [38].
3.2. Deep Bayesian neural networks

The concept of a Bayesian neural network (BNN) is an extension of standard network architectures (e.g. feed-forward neural network, convolutional neural network, or recurrent neural network) to follow the Bayesian modelling paradigm [41]. In a BNN we sample the neural network weights from a prior distribution rather than having a single fixed value as in normal neural networks, for example, from a Gaussian Θ ∼ N(0, I) [39]. Instead of optimising the network weights directly, we average over all possible weights, called marginalisation. Given the stochastic output of the BNN f_Θ(x), we receive a model likelihood p(y|f_Θ(x)). Based on the dataset D, Bayesian inference is used to compute the posterior over the weights p(Θ|X, Y). This posterior captures the set of all plausible model parameters. This distribution allows predictions on unseen data.

As mentioned above, the exact posterior is intractable, and different approximations exist [15,40]. In these approximate inference techniques, the posterior p(Θ|X, Y) is fitted with a simple distribution q(Θ). Here we consider the Dropout variational inference approach as it has shown great performance when benchmarked against other methods [15,17].
3.2.1. Dropout variational inference

Dropout variational inference is a variational inference approach, i.e. it allows us to find a q*_ν(Θ) that minimises the Kullback-Leibler divergence to the true model posterior, and it requires neither changing the architecture of common networks nor changing the optimisation algorithm used to train them [39]. The inference of the posterior is done by training a model which uses stochastic dropout on every neuron layer [42] (see Fig. 5). This stochastic dropout is also used to remove neurons when performing predictions. By repeating the predictions (stochastic forward passes), we create a distribution of outputs, which was shown to minimize the KL divergence [39].
This KL divergence objective is formally given in the following, where we approximate p(Θ|X, Y) with q(Θ) [13,39]:

L(Θ, p) = −(1/N) ∑_{i=1}^{N} log p(y_i | f̂_{Θ̂_i}(x_i)) + ((1 − p)/(2N)) ||θ||²   (3)

with N data points, dropout probability p, weight samples Θ̂_i ∼ q*_ν(Θ), and θ the set of the sample distribution's parameters to be optimised (weight matrices in the dropout case). Note that dropout is applied for each data point in the training set, which provides us with N samples of Θ̂_i.
When performing dropout variational inference, the T stochastic forward passes provide us with the epistemic uncertainty given by the variance Var(y):

Var(y) ≈ (1/T) ∑_{t=1}^{T} f̂_{Θ̂_t}(x)ᵀ f̂_{Θ̂_t}(x) − E(y)ᵀ E(y)   (4)

with predictions in this epistemic model done by approximating the predictive mean: E(y) ≈ (1/T) ∑_{t=1}^{T} f̂_{Θ̂_t}(x). Note that in this formulation we assumed no noise inherent in the data and therefore Var(y) is zero when we have no parameter uncertainty.
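The stochastic forward passes behind Eq. (4) can be sketched numerically. The toy network below uses random placeholder weights (not the paper's trained Keras surrogate) and keeps dropout active at prediction time, as MC-dropout requires:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "trained" network: 1 input, 64 ReLU hidden units, 1 output.
# The weights are illustrative placeholders.
W1, b1 = rng.normal(size=(1, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)), np.zeros(1)

def stochastic_forward(x, p=0.05):
    """One forward pass with dropout kept active at prediction time."""
    h = np.maximum(x @ W1 + b1, 0.0)       # ReLU hidden layer
    mask = rng.random(h.shape) > p         # drop each neuron with probability p
    h = h * mask / (1.0 - p)               # inverted-dropout scaling
    return h @ W2 + b2

x = np.array([[0.3]])
T = 500
preds = np.stack([stochastic_forward(x) for _ in range(T)])  # T stochastic passes

mean = preds.mean(axis=0)                    # E(y), Monte Carlo estimate
var = (preds ** 2).mean(axis=0) - mean ** 2  # Var(y) as in Eq. (4), no noise term
```

The 5% dropout rate mirrors the rate selected for the case study model in Section 4.3.1.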
3.3. Gaussian processes in the big data regime

Gaussian Process models are attractive for non-parametric Bayesian modelling [35]. They use a Gaussian Process prior for a stochastic, latent function f to describe the relationship between X and Y (see Fig. 5). The function values f(x) are assumed to be sampled from that Gaussian with zero mean and covariance matrix K, i.e. f ∼ N(0, K). The choice of covariance function impacts various aspects of the GP model and also determines which model parameters Θ are to be tuned. These model parameters are optimized when training the GP model.
However, given the above-mentioned limitations of standard Gaussian Process models (see Section 3.1), sparse GP approximations have been developed to handle large datasets by lowering the computational complexity to O(nm²) [38,43].⁴ They rely on the use of inducing variables (or pseudo-inputs), i.e. a reduced set of latent variables with size m ≪ n to represent the actual dataset D with n samples. The m inducing points are GP realisations u = f(z) at the inducing locations Z, which are in the same space as the observed inputs X (but not necessarily part of X). When training the SVGP, the locations of the inducing points Z and the covariance parameters Θ are optimally chosen to minimize the KL divergence. Important is that the locations Z are parameters that shape the variational approximate distribution q(f), rather than being part of the model parameters Θ, i.e. the covariance function with parameters Θ is calculated for the inducing locations Z.

In comparison to sparse GPs [43], stochastic variational GPs [16] allow mini-batch training, which further reduces computational complexity to O(n_batch m²). Since [16] and others, multi-layered deep Gaussian Process models have been developed, too, but are not considered in this study as our case study dataset is still of limited size and complexity [14,44]. However, our SVGP model may be regarded as a one-layered deep GP [45].
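The inducing-point idea can be illustrated in a few lines of NumPy. The Matern 3/2 kernel matches the covariance function chosen in Section 4.3.2, but the inducing locations Z and the variational mean u below are assumed values; a trained SVGP optimizes both:

```python
import numpy as np

def matern32(x1, x2, lengthscale=0.3, variance=1.0):
    """Matern 3/2 covariance between two sets of 1-D inputs."""
    d = np.abs(x1[:, None] - x2[None, :])
    s = np.sqrt(3.0) * d / lengthscale
    return variance * (1.0 + s) * np.exp(-s)

# Hypothetical inducing locations Z (m = 5) and variational mean u at
# those locations, standing in for the optimized quantities of an SVGP.
Z = np.linspace(0.0, 1.0, 5)
u = np.sin(2.0 * np.pi * Z)

# Sparse predictive mean at test points, m(x*) = K_{*Z} K_{ZZ}^{-1} u:
# the cost depends on m, not on the n training samples a full GP would use.
x_star = np.array([0.25, 0.75])
K_sZ = matern32(x_star, Z)
K_ZZ = matern32(Z, Z) + 1e-8 * np.eye(len(Z))  # jitter for numerical stability
mean = K_sZ @ np.linalg.solve(K_ZZ, u)
```

With m = 400 inducing points, as in the case study, the same linear algebra remains cheap even though the training set holds 10'000 samples.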
4. Case study: surrogate models for the design of net-zero energy buildings

4.1. Objective

We use a case study on a popular topic in the building domain, the design of buildings with net-zero energy demand, to train and assess the two Bayesian model types introduced above. It shall serve as an example showcasing the use of both model types for building surrogate modelling, but should not be considered as an exhaustive comparison of the two. For that purpose the reader is referred to other studies instead, e.g. [17,44].
4.2. Case study building

We emulate the simulation outcomes of one archetype building contained in the Net-Zero navigator project [5]. As part of the Net-Zero navigator project, building simulation surrogate models are hosted on a web-platform which allows users to receive the building energy consumption of archetype buildings given a large set of building design parameters in real time. So far the platform has relied on common deterministic neural network surrogates, whose building performance estimation accuracy was validated on separate building designs not contained in the

4 This blog post provides a summary of the history of sparse Gaussian Process models: https://www.prowler.io/blog/sparse-gps-approximate-the-posterior-not-the-model.

Fig. 5. Considered variational-inference approaches to turn existing surrogate modelling architectures into scalable Bayesian models [15,16].

training data. All the simulation runs for training and testing were collected using the well-known building performance assessment program EnergyPlus [46].

In this case study, we build a surrogate model of a medium office archetype building, where 35 design parameters are free to choose and the building energy performance is quantified by 12 separate performance metrics (see Fig. 6). The office architecture is based on work from the US DOE and Canmet-Energy, which derived commercial prototype building models. The development of the parameter set, the choice of performance metrics, and the software to generate the (parametric) simulation dataset, however, was done individually for that project, where the parameter ranges are directly based on requirements in the Canadian building sector [47]. The mechanical systems are parametrized to capture a wide variety of configurations, allowing direct manipulation of the air-side system (incl. heat recovery ventilation, various pump efficiencies) and plant equipment performance of various systems (heat pump, electric resistance heater, biogas furnace, natural gas furnace, air conditioning system). This allows us to explore a large HVAC system design space on a high level (incl. multi-system setups). All details on the building may be found in [5].
4.2.1. Dataset and transformations

We sample the design space using 10'000 simulation runs, where the individual parameter combinations in the dataset are picked using space-filling Latin-Hypercube-sampling (LHS) [31]. Similarly, we run an additional 3000 simulations and use them as a separate test set. The number of simulation runs required to fit an accurate surrogate model was previously studied in [5], where it was found that 10'000 runs are suitable for the considered building. Each building simulation run took approximately 2 min and 10 s using 1 CPU and 4 GB RAM, but varied depending on the parameter choices.

Prior to training, we standardized the uniformly distributed inputs with different ranges to be normally distributed with zero mean. Furthermore, we transformed the 12 output variables to also be close to a normal distribution. For this, an adaptive Box-Cox transformation was applied [48]. It adaptively finds transformation parameters to transform various kinds of distributions (here of the 12 different outputs) to normal distributions. This, in particular, increased the accuracy of the multi-output neural network compared to other transformations.
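In practice a library routine (e.g. scipy.stats.boxcox) performs this fit; the sketch below shows the underlying idea with a simple grid search over the transformation parameter λ, maximizing the Box-Cox profile log-likelihood. The right-skewed output data and the λ grid are illustrative assumptions, not the paper's adaptive procedure from [48]:

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox power transform for strictly positive y."""
    return np.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def fit_lambda(y, grid=np.round(np.linspace(-2.0, 2.0, 81), 2)):
    """Grid-search the lambda maximizing the Box-Cox profile log-likelihood."""
    n = len(y)
    log_y_sum = np.log(y).sum()
    def loglik(lam):
        z = boxcox(y, lam)
        return -0.5 * n * np.log(z.var()) + (lam - 1.0) * log_y_sum
    return max(grid, key=loglik)

# Hypothetical right-skewed output, e.g. an annual energy metric in kWh.
rng = np.random.default_rng(1)
y = rng.lognormal(mean=10.0, sigma=0.5, size=5000)

lam = fit_lambda(y)                 # near 0 for lognormal data (log transform)
z = boxcox(y, lam)
z_std = (z - z.mean()) / z.std()    # standardize before training
```

For lognormal data the selected λ sits near zero, i.e. the transform reduces to a log, which is the expected behaviour for strongly right-skewed energy outputs.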
4.3. Model architectures

In this section we provide details on the dropout Bayesian neural network and the stochastic variational Gaussian Process model we trained to emulate the simulation model of the case study building.
4.3.1. BNN model architecture and implementation

We implemented a dropout neural network using the Keras TensorFlow API [49,50] based on the work from Gal and Ghahramani [15]. Our network is a feed-forward neural network with 2 hidden layers of 512 neurons which are activated with a leaky rectified linear (ReLU) function. Training was done within 1200 epochs using a

Fig. 6. Overview of the case study building. The building design parameters correspond to the surrogate model inputs and the annual performance metrics to the surrogate model outputs.

batch size of 128 samples. A dropout rate of 5% was set. All mentioned parameters (n_layers ∈ [1, 2, 3], n_neurons ∈ [256, 512, 1024], dropout rate ∈ [5%, 10%, 20%]) were analysed in a 5-fold cross-validation. The model with the highest accuracy on the test set was picked. Furthermore, we analysed the impact of the dropout rate on the uncertainty quality (see Section 4.4), but no significant change in the performance was observed, which agrees with the observation from [15] that the uncertainty estimates of models that use different dropout rates converge with the training progress.
4.3.2. GP model architecture and implementation

We built a stochastic variational Gaussian Process model based on [16] using the GPy implementation [51]. The final model has a Matern32 covariance function with a fixed noise term (≈ 0.001% of the mean absolute value of the respective output) and it uses a Gaussian likelihood function. We applied one separate lengthscale per output for the covariance function. Our sparse Gaussian process model used 400 inducing points, which we initialized by randomly drawing from a uniform distribution. Training was performed on mini-batches of 100 samples using the Adadelta optimizer.

The covariance function was picked after running a 5-fold cross-validation (both squared-exponential and Matern32 kernels were considered). Although the observed dataset is deterministic, we considered a fixed noise level in the model (≈ 0.001% of the mean absolute value of the outputs) as it produced much more accurate models. This implies that the variance of the one-layered Gaussian process model in [16] is too small, and a deep Gaussian process may be a better choice for our problem.
4.4. Evaluation criteria

We evaluate the models with regard to multiple objectives: (i) the model accuracy, (ii) the uncertainty accuracy, and (iii) the effectiveness of uncertainty-estimate-based issue-raising.

4.4.1. R² score, MAPE and APE90 score to quantify prediction accuracy

Our error metrics cover two often-used metrics in the field, i.e. the R² [11] and the Mean Absolute Percentage Error (MAPE) [52]:

R²(Y, Ŷ) = 1 − ∑_{i=1}^{n} (y_i − ŷ_i)² / ∑_{i=1}^{n} (y_i − Ȳ)²   (5)

MAPE(Y, Ŷ) = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i| / y_i   (6)

where Ŷ corresponds to the matrix of predicted values and Y is the matrix of simulated building performance values. When the error term Y − Ŷ approaches zero, R² approaches one and MAPE goes to zero.

The given two error metrics provide insight into the overall performance of the models. However, they may disguise large errors which occur for few samples. Therefore, we added the APE90 error. It represents the 90th percentile of the absolute errors sorted by ascending magnitude and therefore allows us to estimate maximum model errors while accounting for possible occurrences of outliers.
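The three metrics translate directly into code; a short NumPy sketch with hypothetical simulated and predicted values (both percentage metrics are reported in %):

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination, Eq. (5)."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def mape(y, y_hat):
    """Mean absolute percentage error, Eq. (6), in %."""
    return np.mean(np.abs(y - y_hat) / y) * 100.0

def ape90(y, y_hat):
    """90th percentile of the absolute percentage errors (outlier-robust maximum)."""
    return np.percentile(np.abs(y - y_hat) / y * 100.0, 90)

# Hypothetical simulated vs. predicted annual heating demands.
y = np.array([100.0, 120.0, 90.0, 150.0, 110.0])
y_hat = np.array([102.0, 118.0, 95.0, 149.0, 111.0])
```

On this toy data R² is close to one while APE90 is noticeably larger than the MAPE, illustrating how the percentile metric exposes the tail of the error distribution.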
4.4.2. Accuracy of the uncertainty estimate

In a well-calibrated Bayesian model the uncertainty estimates capture the true data distribution; for example, a 95% posterior confidence interval also contains the true simulation outcome in 95% of the cases [53]. Quantifying the level of calibration is a well-known concept in classification [54] but has also been used for regression problems recently [53,55].
Formally, we say that the uncertainty estimates of the surrogate model are well-calibrated if

(1/N) ∑_{t=1}^{N} 1{y_t ≤ F_t^{-1}(p)} → p   for all p ∈ [0, 1]   (7)

where F_t is the cumulative density function targeting y_t and F_t^{-1}(p) = inf{y : p ≤ F_t(y)} is the quantile function. Here we consider each prediction as a standard, symmetric Gaussian distribution N(μ(X), σ(X)).⁵ The confidence intervals can be computed using the inverse cumulative density function. To assess the calibration quality, we count the fraction of observations in the test data falling in the prediction confidence intervals derived from the quantile function (see Fig. 8, left).

We show the level of calibration of the Bayesian models in Fig. 8 (left), where perfectly calibrated uncertainty estimates would be aligned with the diagonal. To quantitatively compare different calibration curves, one can also compute the absolute difference between the confidence curve and the diagonal, called the calibration error or the area under the curve (AUC) [55]. The problem of assessing the calibration quality based on the calibration plot is that it can suggest perfect quality with homoscedastic uncertainty estimates, i.e. constant uncertainty estimates for any input. Therefore, we also quantify the sharpness of the uncertainty estimates by calculating the overall variance in the uncertainty [53] (see Section 5).
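Eq. (7) translates directly into code: for each confidence level p, count how often the target falls below the p-quantile of its predictive Gaussian. The sketch below uses synthetic predictions that are well-calibrated by construction (the targets are drawn from the predictive distributions themselves), so the observed coverage should track the diagonal:

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian N(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def calibration_curve(y, mu, sigma, levels):
    """Observed coverage at each expected confidence level p, per Eq. (7)."""
    observed = []
    for p in levels:
        # y_t <= F_t^{-1}(p) is equivalent to F_t(y_t) <= p.
        frac = sum(normal_cdf(yt, m, s) <= p
                   for yt, m, s in zip(y, mu, sigma)) / len(y)
        observed.append(frac)
    return observed

# Hypothetical predictive means/stds and targets sampled from them,
# i.e. a perfectly calibrated model by construction.
random.seed(0)
mu = [random.uniform(50, 150) for _ in range(2000)]
sigma = [random.uniform(1, 10) for _ in range(2000)]
y = [random.gauss(m, s) for m, s in zip(mu, sigma)]

levels = [0.1, 0.5, 0.9]
obs = calibration_curve(y, mu, sigma, levels)  # each entry close to its level
```

A miscalibrated model would show systematic deviation of `obs` from `levels`, which is exactly what the calibration plot in Fig. 8 visualizes.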
4.4.3. Discard-rankingtoquantifytheeffectivenessofuncertainty estimatesforsurrogatemodelapplication
While havingaccurate uncertaintyestimates is the one thing,in buildingsurrogatemodellingwearemostly concernedwithwarning modelusers,whenthemodelisuncertainandrecommendtoratherrun asimulationinstead(seeFig.2).Therefore,wederivearankingofthe samplesinthetestsetbasedonthemagnitudeoftheiruncertainty.This providestwoconclusions.First,ifitstronglyoverlapswiththeactual surrogatemodelerrortheuncertaintyestimatesareaneffective het-eroscedasticwarningmechanism.Second,we canuse therankingto calculatehowmuchtheaverageerrorcanbereducedwhenreferring acertainpercentageofmostuncertainsamples(here10%or20%)to thehigh-fidelitysimulationprogramthanprocessingitwithasurrogate model.
Both aspects are addressed by plotting the mean error computed on discrete percentiles of the test data, where the test data is sorted by the magnitude of the uncertainty. We can compare that curve to the mean error computed using test data sorted by the magnitude of the computed error (oracle ranking). A large distance between the two curves tells us that the surrogate's uncertainty estimates are not helpful in predicting when it is inaccurate. Furthermore, by looking at the slope of the curve, we can see by how much the mean error can be reduced if we discard all samples with uncertainties above a certain threshold.
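The discard-ranking described above can be sketched as follows, assuming per-sample errors and uncertainty scores on the test set are already computed (the helper names are ours):

```python
import numpy as np

def discard_curve(errors, scores, fractions=(1.0, 0.9, 0.8)):
    """Mean error over the retained samples after discarding the samples
    ranked highest by `scores` (e.g. the predictive standard deviation)."""
    sorted_err = errors[np.argsort(scores)]  # ascending: most certain first
    n = len(errors)
    return {f: float(sorted_err[:int(round(f * n))].mean()) for f in fractions}

def oracle_curve(errors, fractions=(1.0, 0.9, 0.8)):
    """Rank by the true error itself: the lower bound that any
    uncertainty-based ranking can achieve."""
    return discard_curve(errors, errors, fractions)
```

A small gap between `discard_curve` and `oracle_curve` indicates that the uncertainty estimates are an effective warning mechanism.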
5. Results
In this section, we show the results of the case study where we derived uncertainty-aware surrogate models to replace building energy simulation models.
In the case study, we trained two different Bayesian machine learning models to provide epistemic uncertainty estimates, i.e. a deep Bayesian dropout neural network (here abbreviated BNN) and a stochastic variational Gaussian process (SVGP) model. We scrutinize the performance of both approaches by comparing their predictive accuracy, by comparing the quality of the uncertainty estimates, and by quantifying how effectively the uncertainty estimates allow us to identify possible surrogate prediction errors.

⁵ This is not necessarily true and possibly a recalibration step is required [53].
5.1. Model accuracy and uncertainty quality

5.1.1. Accuracy
We benchmark the accuracy of the two model types, dropout neural networks and SVGP models. The performance was quantified using three performance metrics as introduced above (see Section 4.4). Each model was trained five times to generate robust results. The results are shown in Fig. 7 and Table 1 in the Appendix; details on the model layout and training process can be found in Sections 4.3.1 and 4.3.2.
Both considered models reach an accuracy of R² > 0.97 on all the outputs when predicting the building performance of buildings contained in the test data. The BNN is more accurate with R² ⩾ 0.99 (also see Table 1). Mean percentage errors of MAPE < 13.2% for the GP model and MAPE < 9.82% for the BNN were found. The largest errors occur when estimating the energy demand provided by different heating sources (i.e. the different fuel types), and the air-side system energy demand. Small surrogate model errors are found for the other building performance targets like the photovoltaic (PV) generation, or the energy demand for interior lights and equipment.
To probe the robustness of the surrogate model estimates, we specifically look at the largest errors they produce. Therefore, we complement our analysis of the mean absolute percentage error with an analysis of the distribution of the absolute percentage errors observed for each sample in the test data. We extract the 90th percentile of the distribution as a proxy of the largest error found while ignoring outliers. We abbreviate this metric with APE90. APE90 errors are found reaching up to 22.3% (30.5%) for the BNN model (GP model), highlighting the demand for increasing the robustness.
5.1.2. Uncertainty calibration
When uncertainty estimates are perfectly calibrated, the derived confidence interval, e.g. the 90% confidence interval, contains the true outcome in the right number of cases, i.e. 90% of the time for the given example. This is illustrated in Fig. 8, where we counted how many times the true simulation outcome was contained in the estimated confidence interval. With a perfectly calibrated Bayesian model, the estimated confidence and the fraction of the test samples within that interval should perfectly align (dashed line). The region below the dashed line indicates an overly confident model (i.e. confidence bands are too narrow); the region above the dashed line means that the model is too careful, having overly large confidence bands.
We find that the BNN model is well-calibrated, while the GP model is overly confident (Fig. 8, left). The low quality of the uncertainty estimates of the GP model can also be seen on the right, where we display the distribution of all uncertainty estimates collected for predictions of the test data samples. The average magnitude of uncertainty in the GP model indicates its too high confidence, and the small width of the distribution indicates that the uncertainty estimates tend to be homoscedastic, i.e. a similar uncertainty is predicted independently of the model inputs. This width of the distribution is also called the sharpness of the uncertainty estimates (see Section 4.4). In case of the BNN, the sharpness is better and the uncertainty estimates depict a significant level of variance.

We can conclude that the uncertainty estimates of the BNN are well-calibrated and heteroscedastic.
5.1.3. Using uncertainty estimates to increase robustness
In this section we study how effective the epistemic uncertainty estimates are at predicting inaccuracies of the surrogate model.
The concept is as follows. We sort the uncertainty estimates on the test data by scale, where we assume that surrogate model estimates are more inaccurate when the model is uncertain. The samples with high uncertainty will be evaluated by the high-fidelity simulation program instead of the surrogate model (see Fig. 2). As a consequence, the surrogate model user, here a building designer, is provided with estimates produced by the surrogate model only when it has high confidence, and with actual simulation results when the surrogate model has low confidence. The number of samples processed by the computationally expensive simulation model should be traded off against the increase in runtime. Here, we handle this trade-off by defining an uncertainty threshold above which the simulation program is queried.

Fig. 7. Summary of results on the use of deep, uncertainty-aware surrogate models. The plot shows the accuracy, quantified using three different error metrics, of both Bayesian learning approaches for all twelve outputs considered in the case study. The figures also include performance metrics when we use the uncertainty estimates to identify error-prone samples in the test data (textured bars; for details see Section 5.1.3).

Fig. 8. Visualization of the quality of uncertainty estimates of the BNN and the SVGP. The quality is quantified by how well-calibrated and sharp the uncertainty estimates are. In both regards, the BNN outperforms the SVGP in this study.
We define this threshold as the 90th or 80th percentile of all uncertainties observed on our test dataset. The rationale behind that choice is that only 10% (or 20%) of all samples are transferred to the slow simulation program. Finding a suitable threshold is more difficult and should also be based on the preferences of the building designer.

Table 1
Results of the accuracy of the Bayesian models (R² unitless; MAPE and APE90 in %).

| Output [MWh/y] | R² BNN | R² SVGP | MAPE BNN | MAPE SVGP | APE90 BNN | APE90 SVGP |
|---|---|---|---|---|---|---|
| Pumps | 0.990 ± 0.001 | 0.983 ± 0.001 | 7.180 ± 0.180 | 8.530 ± 0.260 | 14.830 ± 0.510 | 17.950 ± 0.610 |
| Heating supply, Other | 0.990 ± 0.003 | 0.977 ± 0.001 | 9.820 ± 0.350 | 12.490 ± 0.430 | 22.300 ± 0.750 | 29.300 ± 1.480 |
| Fans | 0.991 ± 0.004 | 0.988 ± 0.001 | 8.630 ± 0.380 | 8.530 ± 0.250 | 18.120 ± 0.770 | 18.280 ± 0.540 |
| Heating supply, Elec. | 0.992 ± 0.001 | 0.986 ± 0.000 | 7.150 ± 0.290 | 8.670 ± 0.360 | 15.130 ± 0.290 | 18.260 ± 0.900 |
| Heating supply, Gas | 0.992 ± 0.002 | 0.973 ± 0.001 | 9.400 ± 0.380 | 13.230 ± 0.220 | 21.440 ± 0.620 | 30.480 ± 0.520 |
| Cooling supply, Elec. | 0.992 ± 0.002 | 0.998 ± 0.000 | 3.550 ± 0.200 | 2.820 ± 0.100 | 7.490 ± 0.560 | 5.820 ± 0.200 |
| Heating demand | 0.995 ± 0.001 | 0.996 ± 0.000 | 3.960 ± 0.330 | 3.710 ± 0.080 | 8.040 ± 0.710 | 7.800 ± 0.250 |
| Cooling demand | 0.997 ± 0.000 | 0.997 ± 0.000 | 2.440 ± 0.050 | 2.270 ± 0.060 | 4.980 ± 0.090 | 4.700 ± 0.110 |
| Interior lights | 0.998 ± 0.000 | 0.999 ± 0.000 | 2.410 ± 0.100 | 1.590 ± 0.080 | 5.050 ± 0.160 | 3.150 ± 0.270 |
| Interior equipment | 0.998 ± 0.000 | 0.998 ± 0.000 | 2.790 ± 0.100 | 1.410 ± 0.120 | 5.650 ± 0.200 | 2.600 ± 0.250 |
| Water heating, Gas | 0.999 ± 0.000 | 1.000 ± 0.000 | 1.220 ± 0.130 | 0.250 ± 0.070 | 2.590 ± 0.260 | 0.430 ± 0.090 |
| PV Generation | 0.999 ± 0.000 | 0.999 ± 0.001 | 3.030 ± 0.090 | 1.290 ± 0.090 | 6.040 ± 0.100 | 2.200 ± 0.150 |

Fig. 9. Recorded surrogate model error reduction after transferring uncertain samples to the high-fidelity simulation model. The data shows the error if either 100%, 90% or 80% of the building design samples are processed by the surrogate model and the rest processed by the high-fidelity model. In that way, the average error of samples processed by surrogate models can be decreased (here quantified by the 90th-percentile absolute percentage error).
In Fig. 9, the decrease in the error of the surrogate model predictions is illustrated for the three target variables covering the heat supply of different fuel sources. These targets produced the largest errors (see Section 5.1.1) and thus we focus on increasing the surrogate robustness particularly for them. Discarding the 10% of samples with the highest uncertainty on the test data, we can decrease the APE90 error in estimating the annual heating supply with a gas furnace from 21.44% to 16.66%.⁶ This is equivalent to a reduction of ≈ 22%.
The MAPE error on the other surrogate model outputs was reduced by 4% to 18%, and the APE90 by 5% to 25% (see Fig. 9). In particular, the significant reduction of the APE90 error proves the increase in the robustness of the surrogate model predictions.
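A minimal sketch of this uncertainty-based referral rule, with the threshold set to a percentile of the uncertainties observed on the test set (an illustrative fragment, not the released implementation):

```python
import numpy as np

def route_designs(sigma_test, sigma_new, keep_fraction=0.9):
    """Route each new design sample: use the surrogate if its predictive
    standard deviation lies below the `keep_fraction` percentile of the
    test-set uncertainties, otherwise refer it to the slow simulator."""
    threshold = np.quantile(sigma_test, keep_fraction)  # e.g. 90th percentile
    return np.where(np.asarray(sigma_new) <= threshold, "surrogate", "simulate")
```

With `keep_fraction=0.9`, roughly 10% of new samples are referred to the simulation program, provided they are distributed like the test set.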
6. Discussion
Surrogate models have been shown to help architects and building designers rapidly assess the energy performance of their designs [9]. However, being only approximative, concerns about the robustness of the surrogate model accuracy arise. A Bayesian approach to surrogate modelling allows us not only to provide a performance estimate but also to inform about the confidence of the approximating surrogate model and, potentially, to identify parts of the design space where the surrogate model may provide inaccurate results.

⁶ To calculate these errors, we exclude the 10% or 20% most uncertain samples from Eqs. 5 and 6. For example, the 16.66% error was computed on the 90% remaining samples in the test set.
This first analysis of the use of Bayesian surrogate models revealed essential properties of the robustness of surrogate models, and showed how Bayesian modelling can aid effective reasoning on the energy performance of buildings under the epistemic uncertainty of surrogates. The goal was to augment surrogates such that we can maintain the benefits of surrogate models while minimizing the risk associated with their uncertainty.
6.1. Lacking robustness of surrogate models
Surrogate model accuracy is often reported with error metrics like the R² or MAPE scores. They are important but can be deceiving. A high coefficient of explained variance (R²) or a low mean absolute percentage error (MAPE) may disguise that the surrogate produces quite large errors in certain fractions of the design space. For example, we found that the 90th-percentile absolute percentage error can be as high as 22.3% although an R² = 0.99 suggests very high performance (see Table 1). This motivates that measures to identify surrogate inaccuracies could indeed lessen the risk associated with surrogate modelling.
Fig. 10. Convergence of BNN estimates with an increasing number of Monte Carlo dropout samples. The plot shows BNN heating demand estimates and uncertainty estimates with an increasing number of MC samples (see case study in Section 4). Both approximately converge after conducting 30 random dropout runs, which takes around 0.8 s (without parallelization).
6.2. Bayesian learning to express surrogate confidence
Results on the quality of the uncertainty estimates of the dropout neural network validated that it can be used to effectively express confidence in its predictions; e.g. one can state that the heating demand for a building with a wall of 1 m thickness is between 220 MWh/year and 230 MWh/year with 90% confidence (see Fig. 4).
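For a Gaussian predictive distribution such an interval follows directly from μ and σ. A small sketch; the numeric values below are chosen only to reproduce the 220–230 MWh/year example, not read from the trained model:

```python
from scipy.stats import norm

def gaussian_interval(mu, sigma, confidence=0.90):
    """Two-sided confidence interval of a Gaussian predictive distribution."""
    z = norm.ppf(0.5 + confidence / 2.0)  # z ≈ 1.645 for a 90% interval
    return mu - z * sigma, mu + z * sigma

# Illustrative values: mu = 225 MWh/year and sigma ≈ 3.04 MWh/year
# give roughly the interval [220, 230] at 90% confidence.
low, high = gaussian_interval(225.0, 3.04)
```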
On the other hand, while being almost as accurate as the neural network model, we found that the stochastic variational Gaussian process model produces miscalibrated uncertainty estimates. Please note that this finding cannot be generalized, as methods exist to calibrate uncalibrated estimates [53], and in other studies deep Gaussian process models were found to produce a larger variance in the uncertainty estimates [44]. Nonetheless, the results on the SVGP models highlight that assessing the quality of Bayesian uncertainty estimates is important.
6.3. Practical issues of Bayesian surrogate models
We leveraged the uncertainty estimates of the BNN to raise warnings when the surrogate model is highly uncertain. By defining a threshold, here the 90th or 80th percentile of the uncertainty estimates on the test data, we could reduce the APE90 error by up to 40%. This is a significant first step towards the hybridization of fast, low-fidelity and slow, high-fidelity models.
Still, practical issues have to be solved. For example, the question arises of how to implement the routing between the surrogate model and high-fidelity model runs. Simulations could be carried out in the background while the user works with the uncertain surrogate model estimates as a start. In our case the results would be updated after 2 minutes and 10 seconds, which corresponds to the approximate runtime of one simulation.
Another issue is that the computational cost of evaluating a Bayesian model increases compared to a deterministic surrogate model. When using dropout BNNs, we perform Monte Carlo (MC) dropout, i.e. we repeatedly evaluate the BNN, whereas in each run the set of "dropped" neurons changes and therewith the outputs of the network change. Mean μ and standard deviation σ of the estimates converge with increasing numbers of MC evaluations, as shown in Fig. 10. We performed between 10 and 2000 MC evaluations and recorded the mean and the standard deviation of the resulting estimates. We consider both mean and standard deviation to have converged when they remain within a band of ±1% of the mean observed after 2000 MC dropout runs.

In the plot we visualize the convergence of the heating demand estimates for a single building design. The plot implies that it takes approximately 0.8 s, corresponding to 30 MC dropout runs, for both the mean and uncertainty estimates to converge. Without parallelization, this would mean that MC dropout sampling of a BNN is 30 times slower than the evaluation of a common feed-forward neural network, which would prevent interactive building design processes. However, the independent MC dropout runs can easily be parallelized over multiple cores. Please note that the convergence rate depends on the specific building design parameters (surrogate model inputs) and the considered building performance output (surrogate model outputs). A first heuristic check for various inputs and outputs indicated that estimates always converged within 100 or fewer MC dropout runs.
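The MC dropout procedure can be illustrated with a toy numpy network: the weights stay fixed, only the dropout masks change between forward passes, and mean and standard deviation are aggregated over the runs. This is a deliberately simplified stand-in for the trained Keras surrogate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network with fixed random weights (stand-in for the BNN).
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def mc_dropout_predict(x, n_runs=30, p_drop=0.1):
    """Keep dropout active at prediction time and aggregate repeated
    stochastic forward passes into a mean and an uncertainty estimate."""
    outputs = []
    for _ in range(n_runs):
        h = np.maximum(x @ W1 + b1, 0.0)      # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop  # freshly "dropped" neurons per run
        h = h * mask / (1.0 - p_drop)         # inverted dropout scaling
        outputs.append((h @ W2 + b2).item())
    return np.mean(outputs), np.std(outputs)  # prediction, epistemic uncertainty
```

The runs are independent, so they parallelize trivially; the convergence of both statistics can be tracked by increasing `n_runs`, as in Fig. 10.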
These and other questions have to be studied in more detail before integrating Bayesian surrogate models into software products for building designers.
6.4. Accuracy of the Bayesian model compared to a deterministic surrogate model
We can compare the results of this study to a non-Bayesian feed-forward neural network trained on the same dataset (see Table 2 in the Appendix). Details on the non-Bayesian network used can be found in [5]. It has a very similar layout to the dropout BNN (2 hidden layers with 512 neurons, leaky rectified linear unit activation function) and was trained using the same cost function and optimizer (1200 training epochs with the Adam optimizer).
The R², MAPE and APE90 scores of the deterministic model computed on the test data are better for most outputs when no uncertainty-based sample filtering is applied (see Table 2). However, when using uncertainty thresholds the Bayesian model produces lower MAPE and APE90 errors, suggesting that the BNN is a useful means to increase the robustness of surrogate models.⁷

⁷ Here, we used a uniformly distributed set of building design samples as our test data. However, this may not be representative of actual design processes. In future, a comparison of both neural network types (Bayesian surrogate model, non-Bayesian surrogate model) that takes architectural design preferences into account when choosing the test data should be considered.

Table 2
Comparison of the Bayesian dropout neural network (BNN) and the non-Bayesian deterministic neural network (ANN). The performance of the dropout neural network (BNN) is provided with and without the application of uncertainty-based thresholding (90%/80%).

(i) R²-score

| Output [MWh/y] | ANN | BNN | BNN 90% | BNN 80% |
|---|---|---|---|---|
| Pumps | 0.992 ± 0.000 | 0.990 ± 0.001 | 0.989 ± 0.001 | 0.989 ± 0.001 |
| Heating supply, Other | 0.995 ± 0.001 | 0.990 ± 0.003 | 0.989 ± 0.004 | 0.988 ± 0.004 |
| Fans | 0.994 ± 0.002 | 0.991 ± 0.004 | 0.990 ± 0.004 | 0.989 ± 0.004 |
| Heating supply, Elec. | 0.994 ± 0.000 | 0.992 ± 0.001 | 0.992 ± 0.001 | 0.992 ± 0.001 |
| Heating supply, Gas | 0.995 ± 0.001 | 0.992 ± 0.002 | 0.992 ± 0.002 | 0.991 ± 0.002 |
| Cooling supply, Elec. | 0.994 ± 0.001 | 0.992 ± 0.002 | 0.993 ± 0.001 | 0.992 ± 0.002 |
| Heating demand | 0.996 ± 0.000 | 0.995 ± 0.001 | 0.995 ± 0.001 | 0.993 ± 0.002 |
| Cooling demand | 0.997 ± 0.000 | 0.997 ± 0.000 | 0.996 ± 0.000 | 0.995 ± 0.000 |
| Interior lights | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.997 ± 0.000 | 0.997 ± 0.000 |
| Interior equipment | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.998 ± 0.000 | 0.997 ± 0.000 |
| Water heating, Gas | 1.000 ± 0.000 | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.998 ± 0.001 |
| PV Generation | 1.000 ± 0.000 | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.998 ± 0.000 |

(ii) MAPE [%]

| Output [MWh/y] | ANN | BNN | BNN 90% | BNN 80% |
|---|---|---|---|---|
| Pumps | 6.480 ± 0.170 | 7.180 ± 0.180 | 6.200 ± 0.130 | 5.850 ± 0.130 |
| Heating supply, Other | 8.550 ± 0.630 | 9.820 ± 0.350 | 8.380 ± 0.310 | 7.480 ± 0.410 |
| Fans | 7.610 ± 1.000 | 8.630 ± 0.380 | 7.300 ± 0.470 | 6.690 ± 0.540 |
| Heating supply, Elec. | 6.530 ± 0.370 | 7.150 ± 0.290 | 6.070 ± 0.270 | 5.670 ± 0.320 |
| Heating supply, Gas | 8.040 ± 0.220 | 9.400 ± 0.380 | 7.880 ± 0.370 | 7.190 ± 0.400 |
| Cooling supply, Elec. | 3.280 ± 0.260 | 3.550 ± 0.200 | 3.320 ± 0.200 | 3.150 ± 0.170 |
| Heating demand | 3.710 ± 0.290 | 3.960 ± 0.330 | 3.550 ± 0.370 | 3.410 ± 0.370 |
| Cooling demand | 2.240 ± 0.160 | 2.440 ± 0.050 | 2.310 ± 0.050 | 2.250 ± 0.060 |
| Interior lights | 1.830 ± 0.170 | 2.410 ± 0.100 | 2.290 ± 0.090 | 2.180 ± 0.070 |
| Interior equipment | 2.810 ± 0.390 | 2.790 ± 0.100 | 2.290 ± 0.080 | 2.130 ± 0.090 |
| Water heating, Gas | 0.660 ± 0.060 | 1.220 ± 0.130 | 1.110 ± 0.130 | 1.050 ± 0.120 |
| PV Generation | 1.650 ± 0.120 | 3.030 ± 0.090 | 1.900 ± 0.150 | 1.660 ± 0.180 |

(iii) APE90 [%]

| Output [MWh/y] | ANN | BNN | BNN 90% | BNN 80% |
|---|---|---|---|---|
| Pumps | 12.450 ± 0.530 | 14.830 ± 0.510 | 12.280 ± 0.310 | 11.480 ± 0.230 |
| Heating supply, Other | 20.400 ± 1.480 | 22.300 ± 0.750 | 17.160 ± 0.580 | 15.240 ± 0.610 |
| Fans | 15.810 ± 1.540 | 18.120 ± 0.770 | 14.950 ± 0.910 | 13.800 ± 1.050 |
| Heating supply, Elec. | 13.790 ± 0.810 | 15.130 ± 0.290 | 12.470 ± 0.490 | 11.670 ± 0.640 |
| Heating supply, Gas | 18.320 ± 0.640 | 21.440 ± 0.620 | 16.660 ± 0.610 | 14.970 ± 0.690 |
| Cooling supply, Elec. | 6.780 ± 0.560 | 7.490 ± 0.560 | 6.920 ± 0.460 | 6.540 ± 0.320 |
| Heating demand | 7.670 ± 0.550 | 8.040 ± 0.710 | 7.260 ± 0.740 | 6.940 ± 0.770 |
| Cooling demand | 4.620 ± 0.300 | 4.980 ± 0.090 | 4.710 ± 0.090 | 4.610 ± 0.090 |
| Interior lights | 3.840 ± 0.330 | 5.050 ± 0.160 | 4.790 ± 0.170 | 4.560 ± 0.170 |
| Interior equipment | 5.320 ± 0.960 | 5.650 ± 0.200 | 4.780 ± 0.200 | 4.450 ± 0.240 |
| Water heating, Gas | 1.340 ± 0.100 | 2.590 ± 0.260 | 2.350 ± 0.270 | 2.210 ± 0.250 |
| PV Generation | 2.460 ± 0.320 | 6.040 ± 0.100 | 4.120 ± 0.300 | 3.530 ± 0.350 |

7. Conclusion and outlook

In this study we proposed to augment and hybridize physics-based simulation software with Bayesian (deep) learning surrogate models. By quantifying the surrogate model (epistemic) uncertainty, the Bayesian paradigm acknowledges that surrogate models are approximations of original simulation models, and it offers a tool to effectively reason under that incurred uncertainty while exploiting the much faster runtime of surrogate models to produce engineering performance estimates.

In a case study we showcased the application of Bayesian surrogate models for the design of net-zero energy buildings. We found that dropout neural network models provided well-calibrated uncertainty estimates, which can be used to identify building design choices for which the surrogate model produces large errors. The latter enables us to refer those designs to the high-fidelity energy simulation tool to assure accurate estimates for the architect or building designer. That referral process significantly lowered the errors in comparison to a common deterministic surrogate model.
Although all findings are bound to the case study of a building simulation surrogate, the results motivate applying Bayesian learning to other fields where surrogate models are commonly used [19].
In future, we foresee that Bayesian models will allow us to hybridize data-driven surrogate models and high-fidelity simulation models [18]. This particularly requires studies on how hybrid models can work in practice in a surrogate model-based design process.
Apart from that, future research could make use of Bayesian surrogate models for generalizing surrogate models to cover more building design problems [12,56]. The Bayesian paradigm could help identify when the surrogate model is used for design problems it was not trained for. Finally, Bayesian learning forms a foundation for adaptively sampling simulation runs for which the surrogate model is particularly uncertain. This process, called active learning, will be explored in an upcoming study [57].
Code and Data availability
The entire source code of this work, the EnergyPlus description file (.idf) of the building template, and instructions on how to download the data used in this study are available in a GitLab repository.⁸
Declaration of Competing Interest
The authors wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Acknowledgement
This research was supported by grant funding from CANARIE via the BESOS project (CANARIE RS-327).
References
[1] Dulac J, Delmastro C, Abergel T. Tracking buildings. Tech. Rep. International Energy Agency; 2019. URL: https://www.iea.org/reports/tracking-buildings
[2] Westermann P, Evins R. Surrogate modelling for sustainable building design – a review. Energy Build 2019;198:170–86. doi: 10.1016/j.enbuild.2019.05.057 .
[3] Jusselme T . Data-driven method for low-carbon building design at early stages. EPF Lausanne; 2020. Ph.D. thesis .
[4] Open Technologies. The building pathfinder. URL: http://www.buildingpathfinder.com/
[5] Westermann P, Rulff D, Cant K, Faure G, Evins R. Net-zero navigator: a platform for interactive net-zero building design using surrogate modelling. URL: http://www.enerarxiv.org/page/thesis.html?id=1975
[6] Waibel C, Wortmann T, Evins R, Carmeliet J. Building energy optimization: an extensive benchmark of global search algorithms. Energy Build 2019;187:218–40.
[7] Rivalin L , Stabat P , Marchio D , Caciolo M , Hopquin F . A comparison of methods for uncertainty and sensitivity analysis applied to the energy performance of new commercial buildings. Energy Build 2018;166:489–504 .
[8] Hester J, Gregory J, Kirchain R. Sequential early-design guidance for residential single-family buildings using a probabilistic metamodel of energy consumption. Energy Build 2017;134:202–11. doi:10.1016/j.enbuild.2016.10.047.
[9] Brown NC. Design performance and designer preference in an interactive, data-driven conceptual building design scenario. Des Stud 2020.
[10] De Wilde P . The gap between predicted and measured energy performance of build- ings: a framework for investigation. Autom Constr 2014;41:40–9 .
[11] Østergård T, Jensen RL, Maagaard SE. A comparison of six metamodeling techniques applied to building performance simulations. Appl Energy 2018;211:89–103. doi:10.1016/j.apenergy.2017.10.102.
[12] Westermann P, Evins R. Using a deep temporal convolutional network as a building energy surrogate model that spans multiple climate zones. Appl Energy 2020;264:114715.
[13] Kendall A, Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in neural information processing systems; 2017. p. 5574–84.
[14] Damianou A , Lawrence N . Deep Gaussian processes. In: Artificial intelligence and statistics; 2013. p. 207–15 .
[15] Gal Y, Ghahramani Z. Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: International conference on machine learning; 2016. p. 1050–9.
[16] Hensman J , Fusi N , Lawrence ND . Gaussian processes for big data. In: Uncertainty in artificial intelligence. Citeseer; 2013. p. 282 .
[17] Filos A, Farquhar S, Gomez AN, Rudner TG, Kenton Z, Smith L, et al. A systematic comparison of Bayesian deep learning robustness in diabetic retinopathy tasks. arXiv preprint arXiv:1912.10481; 2019.
[18] Reichstein M , Camps-Valls G , Stevens B , Jung M , Denzler J , Carvalhais N , et al. Deep learning and process understanding for data-driven earth system science. Nature 2019;566(7743):195–204 .
[19] Wang GG, Shan S. Review of metamodeling techniques in support of engineering design optimization. J Mech Des 2007;129(4):370–80.
[20] Ritter F , Geyer P , Borrmann A . Simulation-based decision-making in early design stages. In: 32nd CIB W78 conference, Eindhoven, The Netherlands; 2015. p. 27–9 .
[21] Vazquez-Canteli J, Demir AD, Brown J, Nagy Z. Deep neural networks as surrogate models for urban energy simulations. Journal of Physics: Conference Series 2019;1343:012002. IOP Publishing.
[22] Prada A , Gasparella A , Baggio P . On the performance of meta-models in building design optimization. Appl Energy 2018;225:814–26 .
[23] Eisenhower B, O'Neill Z, Narayanan S, Fonoberov VA, Mezic I. A methodology for meta-model based optimization in building energy models. Energy Build 2012;47:292–301. doi:10.1016/j.enbuild.2011.12.001.
[24] Bre F , Roman N , Fachinotti VD . An efficient metamodel-based method to carry out multi-objective building performance optimizations. Energy Build 2020;206:109576 .
[25] Hopfe CJ , Hensen JL . Uncertainty analysis in building performance simulation for design support. Energy Build 2011;43(10):2798–805 .
[26] Coakley D, Raftery P, Keane M. A review of methods to match building energy simulation models to measured data. Renew Sustain Energy Rev 2014;37:123–41.
[27] Manfren M, Aste N, Moshksar R. Calibration and uncertainty analysis for computer models – a meta-model based approach for integrated building energy simulation. Appl Energy 2013;103:627–41. doi:10.1016/j.apenergy.2012.10.031.
[28] Heo Y , Choudhary R , Augenbroe G . Calibration of building energy models for retrofit analysis under uncertainty. Energy Build 2012;47:550–60 .
[29] Sokol J , Davila CC , Reinhart CF . Validation of a Bayesian-based method for defining residential archetypes in urban building energy models. Energy Build 2017;134:11–24 .
[30] Kristensen MH , Hedegaard RE , Petersen S . Hierarchical calibration of archetypes for urban building energy modeling. Energy Build 2018;175:219–34 .
[31] Garud SS , Karimi IA , Kraft M . Design of computer experiments: a review. Comput Chem Eng 2017;106:71–95 .
[32] Roman ND , Bre F , Fachinotti VD , Lamberts R . Application and characterization of metamodels based on artificial neural networks for building performance simulation: a systematic review. Energy Build 2020:109972 .
[33] Crawley DB, Lawrie LK, Winkelmann FC, Buhl WF, Huang YJ, Pedersen CO, et al. EnergyPlus: creating a new-generation building energy simulation program. Energy Build 2001;33(4):319–31.
[34] Tian W , Heo Y , De Wilde P , Li Z , Yan D , Park CS , et al. A review of uncertainty anal- ysis in building energy assessment. Renew Sustain Energy Rev 2018;93:285–301 .
[35] Rasmussen CE . Gaussian processes in machine learning. In: Advanced lectures on machine learning. Springer; 2004. p. 63–71 .
[36] Østergård T, Jensen RL, Maagaard SE. Building simulations supporting decision making in early design – a review. Renew Sustain Energy Rev 2016;61:187–201. URL: https://www.sciencedirect.com/science/article/pii/S136403211600280X
[37] Blei DM, Kucukelbir A, McAuliffe JD. Variational inference: a review for statisticians. J Am Stat Assoc 2017;112(518):859–77.
[38] Bauer M, van der Wilk M, Rasmussen CE. Understanding probabilistic sparse Gaussian process approximations. In: Advances in neural information processing systems; 2016. p. 1533–41.
[39] Gal Y . Uncertainty in deep learning. University of Cambridge 2016;1(3) .
[40] Pearce T, Zaki M, Brintrup A, Anastassacos N, Neely A. Uncertainty in neural networks: Bayesian ensembling. arXiv preprint arXiv:1810.05546.
[41] Neal RM . Bayesian learning for neural networks, 118. Springer Science & Business Media; 1995 .
[42] Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 2014;15(1):1929–58.
[43] Titsias M. Variational learning of inducing variables in sparse Gaussian processes. In: Artificial intelligence and statistics; 2009. p. 567–74.
[44] Salimbeni H, Deisenroth M. Doubly stochastic variational inference for deep Gaussian processes. In: Advances in neural information processing systems; 2017. p. 4588–99.
[45] Svendsen DH , Morales-Álvarez P , Ruescas AB , Molina R , Camps-Valls G . Deep Gaus- sian processes for biogeophysical parameter retrieval and model inversion. ISPRS J Photogramm Remote Sens 2020;166:68–81 .
[46] Crawley DB, Pedersen CO, Lawrie LK, Winkelmann FC. EnergyPlus: energy simulation program. ASHRAE J 2000;42(4):49.
[47] National Energy Code of Canada for Buildings 2017. National Research Council Canada; 2017. URL: https://nrc.canada.ca/en/certifications-evaluations-standards/codes-canada/codes-canada-publications/national-energy-code-canada-buildings-2017
[48] Box GE , Cox DR . An analysis of transformations. J R Stat Soc 1964;26(2):211–43 .
[49] Chollet F., et al. Keras. 2015.
[50] Abadi M , Barham P , Chen J , Chen Z , Davis A , Dean J , et al. Tensorflow: a system for large-scale machine learning.. In: OSDI, 16; 2016. p. 265–83 .
[51] GPy. GPy: a Gaussian process framework in Python. URL: http://github.com/SheffieldML/GPy; since 2012.
[52] Edwards RE, New J, Parker LE, Cui B, Dong J. Constructing large scale surrogate models from big data and artificial intelligence. Appl Energy 2017;202:685–99. doi:10.1016/j.apenergy.2017.05.155.
[53] Kuleshov V, Fenner N, Ermon S. Accurate uncertainties for deep learning using calibrated regression. In: International conference on machine learning; 2018. p. 2796–804.
[54] Platt J, et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classifiers 1999;10(3):61–74.
[55] Scalia G , Grambow CA , Pernici B , Li Y-P , Green WH . Evaluating scalable uncertainty estimation methods for deep learning based molecular property prediction. J Chem Inf Model 2020 .
[56] Geyer P , Singaravel S . Component-based building performance prediction using sys- tems engineering and machine learning. Appl Energy 2017;228:1439–53 .
[57] Westermann P, Evins R. Adaptive sampling for building simulation surrogate model derivation using the LOLA-Voronoi algorithm. In: International Building Performance Simulation Association (IBPSA), editor. Proceedings of the international building performance simulation association, 16; 2019. p. 1559–63. doi:10.26868/25222708.2019.211232.