A Conditional Generative adversarial Network for energy use in multiple buildings using scarce data

Academic year: 2021

Citation for this paper:

Baasch, G., Rousseau, G., & Evins, R. (2021). A Conditional Generative adversarial

Network for energy use in multiple buildings using scarce data. Energy and AI, 5, 1-14.

https://doi.org/10.1016/j.egyai.2021.100087.

UVicSPACE: Research & Learning Repository

_____________________________________________________________

Faculty of Engineering

Faculty Publications

_____________________________________________________________

A Conditional Generative adversarial Network for energy use in multiple buildings

using scarce data

Gaby Baasch, Guillaume Rousseau, Ralph Evins

September 2021

© 2021 Gaby Baasch et al. This is an open access article distributed under the terms of the

Creative Commons Attribution License.

https://creativecommons.org/licenses/by/4.0/

This article was originally published at:


Contents lists available at ScienceDirect

Energy and AI

journal homepage: www.elsevier.com/locate/egyai

Perspective

A Conditional Generative adversarial Network for energy use in multiple buildings using scarce data

Gaby Baasch, Guillaume Rousseau, Ralph Evins

Energy and Cities Group, Department of Civil Engineering, University of Victoria, Canada

Highlights

• Creating multivariate time series where each building represents a variable significantly eases strict data requirements for building GANs while maintaining per-building load characteristics.
• Conditioning on mean monthly outdoor temperature improves GAN performance.
• The GAN modelled the residential data with high fidelity but could not completely capture the complex temporal behaviour in the commercial data.
• Metrics that are most commonly used to evaluate GANs in the building domain do not sufficiently capture temporal behaviour.

Article info

Article history:
Received 30 January 2021
Received in revised form 20 April 2021
Accepted 8 May 2021
Available online 15 May 2021

Keywords:
Generative adversarial network
Building load profile
Machine learning
Data scarcity

Abstract

Building consumption data is integral to numerous applications including retrofit analysis, Smart Grid integration and optimization, and load forecasting. Still, due to technical limitations, privacy concerns and the proprietary nature of the industry, usable data is often unavailable for research and development. Generative adversarial networks (GANs), which generate synthetic instances that resemble those from an original training dataset, have been proposed to help address this issue. Previous studies use GANs to generate building sequence data, but the models are not typically designed for time series problems, they often require relatively large amounts of input data (at least 20,000 sequences) and it is unclear whether they correctly capture the temporal behaviour of the buildings. In this work we implement a conditional temporal GAN that addresses these issues, and we show that it exhibits state-of-the-art performance on small datasets. 22 different experiments that vary according to their data inputs are benchmarked using Jensen-Shannon divergence (JSD) and predictive forecasting validation error. Of these, the best performing is also evaluated using a curated set of metrics that extends those of previous work to include PCA, deep-learning based forecasting and measurements of trend and seasonality. Two case studies are included: one for residential and one for commercial buildings. The model achieves a JSD of 0.012 on the former data and 0.037 on the latter, using only 396 and 156 original load sequences, respectively.

1. Introduction

The information revolution carries the promise of innovation and high-impact industry disruption. Indeed, the potential for machine learning and big-data to assist building decarbonization is tremendous. The Smart Grid is optimized using detailed supply and demand information [1], high-resolution metered data supports city-wide retrofit strategies [2] and forecasting models trained on big-data are used for energy management [3]. In fact, a review by Hong et al. found over 9579 studies on machine learning in buildings. However, of the 153 studies that they selected for in-depth analysis none had been adopted broadly by the building industry [4]. This is largely because, in practice, building data collection is prevented by technical, regulatory and economic challenges. For example, smart meters that measure temporal electricity consumption are becoming increasingly pervasive (with over 1 billion devices installed as of 2020 [5]), but concerns over privacy prevent utility companies from disclosing this information [6,7].

Abbreviations: GAN, Generative Adversarial Network; JSD, Jensen-Shannon Divergence; KLD, Kullback-Leibler Divergence; PCA, Principal Component Analysis; MAE, Mean Absolute Error; MAPE, Mean Absolute Percentage Error; RMSE, Root Mean Squared Error; DTW, Dynamic Time Warping; MMD, Maximum Mean Discrepancy; PRD, Precision and Recall for Distributions; SSIM, Structural Similarity; TOVO, Trained on Original, Validated on Original; TGVO, Trained on Generated, Validated on Original.

Corresponding author.

E-mail addresses: gbaasch@uvic.ca (G. Baasch), guillaumer@uvic.ca (G. Rousseau), revins@uvic.ca (R. Evins).

The lack of building data availability has led to a keen research interest in its generation. One approach to generative modelling is to use detailed physical models to simulate temporal building behaviour [6]. In the emerging field of urban building energy modelling, for example, city-wide building consumption is simulated based on a smaller subset of building archetypes [8]. The availability of representative building reference models makes this approach desirable, but it is subject to modelling assumptions, the references may not closely represent the


Table 1
Summary of the GAN studies reviewed over the course of this work. The metrics column contains the key indicators that were used to evaluate the results for the given study. We do not guarantee that the review is exhaustive, but we believe this is still a good representation of existing work. Studies that used GANs in buildings for different applications than load profile generation (such as [19,20] and [21]) are excluded from this table.

Study (year) | Metrics | # input sequences | Period | Temporal generator?
--- | --- | --- | --- | ---
Data augmentation for forecasting | | | |
[3] (2019) | forecasting: MAE, MAPE, correlation | building 1: 280; building 2: 81 | daily | no
[15] (2019) | forecasting: MAPE, RMSE, DTW | (# unclear) | unclear | no
[16] (2020) | PCA, forecasting: MAPE, RMSE, MAE | 3 case studies (# unclear) | unclear | no
[17] (2020) | forecasting: MAPE, MAE | 2 case studies (# unclear) | unclear | yes (R-GAN)
General generation: use case unspecified (same as this work) | | | |
[18] (2018) | MMD, clustering, forecasting: MAPE | 36,500 (type unclear) | daily | no
[9] (2019) | JSD, RMSE, auto-correlation, mean and standard variance | 33,760 (residential) | weekly | no
[6] (2020) | KLD of 5 key parameters, mean and standard deviation | 56,957 (commercial) | daily | no
[10] (2020) | JSD, RMSE, PRD, SSIM, mean and standard variance | 20,000 (residential) | weekly | no

actual buildings, and there is an established performance gap between simulated and real buildings [6]. Data-driven methods offer a different approach that helps to overcome these limitations, but they typically either require detailed end-user data or they fail to model the full diversity, accuracy and complexity of the original [9,10]. Fortunately, generative adversarial networks (GANs) offer a promising alternative.

Initially introduced by Goodfellow et al. in 2014, a GAN consists of two neural networks that are in competition with one another: a generator that acts as a counterfeiter and a discriminator that acts as a detective [11]. They have been applied across various domains to produce realistic human faces [12], to compose music [13] and to create paintings [14] with almost uncanny fidelity. In buildings, GANs may offer the solution to the information shortage that prevents the wide-spread adoption of data-driven building decarbonization techniques, while also offering privacy guarantees.

1.1. Literature review

The buildings research community has acknowledged the potential of GANs and several papers have been published, but at the time of writing there has not yet been an overview of the state of the research. Table 1 therefore provides a succinct summary of the key contributions of the previous works that were reviewed for this study. The works in the table can be classified into two categories: those that focus on forecasting and those that are use-case agnostic. This work falls into the latter category so those will be the focus here. Several research gaps were identified. First, even though all of the reviewed works generated time-series sequences, none of the use-case agnostic studies used an underlying network architecture that directly modeled temporal dynamics, and none evaluated the temporal components of the generated data such as trend and seasonality. Second, the use-case agnostic works use large pre-existing datasets to train their generative models, but these are not always available in practice.¹ Finally, most works only test their approach on a single dataset so it is unclear how well they will generalize, especially between residential and commercial buildings.

¹ This highlights a more general circularity problem with GANs: as with other deep learning techniques, their training benefits from large data. For example, the initial GAN developed by Goodfellow et al. was trained on the MNIST dataset [22] which has 60,000 training examples.

1.2. Key contributions

In this work we address the aforementioned issues. We apply a recently developed time series GAN (TimeGAN) that achieves state-of-the-art performance for temporal generation [23], and add a novel extension so that it works in a conditional setting. We demonstrate that, based on the generated load distribution, the performance of our model is competitive with existing building GANs, even using a data size that is only 1–2% the size, on both residential and commercial buildings. We augment the metrics used by previous studies to include PCA and time-series specific evaluation to evaluate our results, and determine that the standard metrics that are currently used for evaluation in existing works miss important shortcomings of generative models.

The remainder of this paper is organized as follows: Section 2 presents the theoretical overview of GANs (2.1) and TimeGAN (2.2); Section 3 overviews the experimental methodology, including the GAN implementation (3.1), the residential and commercial case studies (3.2), the 22 different modelling experiments (3.3) and the metrics (3.4); Section 4 presents the results (4.1 for all 22 experiments, 4.2 for the residential case study and 4.3 for the commercial case study); Section 5 presents the discussion.

2. Conditional time series generative adversarial network

TimeGAN, developed by Yoon et al. in [23], is a logical extension of the original GAN architecture by Goodfellow et al., so this section of the paper will start by overviewing the latter (Fig. 1). Next, TimeGAN and C-TimeGAN (Fig. 2) will be described. For more information on the original TimeGAN implementation, the reader is referred to [23].²

2.1. Original GAN

A generic GAN consists of two neural networks: a generator ($g$) and a discriminator ($d$), which have learnable parameters $\theta_g$ and $\theta_d$, respectively. $g$ accepts random noise vectors $Z \in \mathcal{Z}$, where $\mathcal{Z}$ is a vector space of known distributions (e.g. Gaussian distributions), and maps them to a learned distribution $\hat{p}$. Its goal is to learn a density $\hat{p}(X)$ that best approximates the original distribution $p(X)$, for $X \in \mathcal{X}$. The generated samples are input into $d$, which also accepts samples from a training dataset $\{x_n\}_{n=1}^{N}$. $d$'s objective is to output the probability that example $x$ came from $p(X)$, as opposed to $\hat{p}(X)$, i.e. to distinguish which samples came from the original data and which were generated. $g$ and $d$ play a two-player minimax game with value function $U$:

\[ \min_{\theta_g} \max_{\theta_d} U = \mathbb{E}_{x \sim p}[\log d(x)] + \mathbb{E}_{z \sim \hat{p}}[\log(1 - d(g(z)))] \tag{1} \]

where $d$ aims to maximize the probability of assigning the correct label to the original and generated samples [$\log d(x)$], and $g$ aims to minimize the probability that $d$ correctly classifies samples [$\log(1 - d(g(z)))$].

² The TimeGAN formulation presented in [23] includes both static and temporal inputs, but the implementation provided by the authors does not yet include the static component. This is one of the reasons that we introduce C-TimeGAN.

Fig. 1. The architecture of the GAN described in Section 2.1. The input values $y$ and $\hat{y}$ make the GAN conditional, as described in Section 2.1.1.

Fig. 2. The architecture of the TimeGAN described in Section 2.2. The input values $y$ and $\hat{y}$ make the GAN conditional, as described in Section 2.2.1.

2.1.1. Conditional extension

A conditional extension to the original GAN formulation was proposed in [24]. $g$ and $d$ are conditioned on some extra information $y$, for $Y \in \mathcal{Y}$, by adding an additional input layer to the neural networks, resulting in the following extension to Eq. 1:

\[ \min_{\theta_g} \max_{\theta_d} U = \mathbb{E}_{x \sim p}[\log d(x|y)] + \mathbb{E}_{z \sim \hat{p}}[\log(1 - d(g(z|y)))] \tag{2} \]

2.2. TimeGAN

TimeGAN consists of the same adversarial component and loss ($U$) as the original GAN formulation, but with two major extensions: an autoencoding component and a supervised autoregressive (AR) learning objective. These are described below.

Autoencoding Component: An autoencoder is a dimensionality reduction technique where embedding ($e$) and recovery ($r$) functions learn stochastic mappings $p_e(h|x)$ and $p_r(\tilde{x}|h)$ [25]. $h$ is an instance of $H \in \mathcal{H}$, where $\mathcal{H}$ is the latent vector space corresponding to $\mathcal{X}$.³ Simply put, $e$ transforms the data into its latent code, and $r$ undoes the transformation. In TimeGAN, $e$ and $r$ are neural networks that are trained using the recovery loss function:

\[ R = \mathbb{E}_{x_{1:T} \sim p}\left[\sum_t \lVert x_t - r(e(x_t)) \rVert_2\right] \tag{3} \]

In TimeGAN, $g$ and $d$ operate in the latent space as follows: (1) $g$ transforms random noise vectors into latent codes and $e$ transforms training samples into latent codes, (2) $d$ classifies whether the latent codes come from the generated or original data, and (3) $r$ transforms data from the latent space back into its original form.
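As an illustration of the recovery loss in Eq. 3, the following is a minimal sketch in plain Python. It is not the authors' TensorFlow implementation: the function name `recovery_loss` and the toy `embed` and `recover` mappings are hypothetical stand-ins for the trained networks $e$ and $r$, and the expectation is approximated by an average over the supplied sequences.

```python
import math

def recovery_loss(sequences, embed, recover):
    """Average over sequences of sum_t ||x_t - r(e(x_t))||_2 (cf. Eq. 3)."""
    total = 0.0
    for seq in sequences:            # seq: list of timesteps
        for x_t in seq:              # x_t: list of feature values at time t
            x_rec = recover(embed(x_t))
            total += math.sqrt(sum((a - b) ** 2 for a, b in zip(x_t, x_rec)))
    return total / len(sequences)

# Hypothetical toy mappings: the "latent code" keeps only the first feature,
# and recovery pads the missing feature with zero.
embed = lambda x: x[:1]
recover = lambda h: h + [0.0]

data = [[[1.0, 0.0], [2.0, 3.0]], [[0.5, 4.0]]]
loss = recovery_loss(data, embed, recover)

# A lossless embedding/recovery pair reconstructs the data exactly.
perfect = recovery_loss(data, lambda x: x, lambda h: h)
```

In this toy case the information discarded by the embedding (the second feature) is exactly what the loss penalizes, which is the behaviour the embedding training phase is meant to minimize.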

AR Learning Objective: Generating temporal data can be a difficult learning problem for a GAN, especially if the input data is comprised of long sequences. To introduce temporal relationships directly into the learning architecture, TimeGAN uses a supervised loss function based on AR decomposition, so that the GAN can specifically learn the conditional temporal probabilities.⁴ For simplicity, this supervised loss is not included in Fig. 2, but it is used to train both $g$ and $e$.

³ In statistics, latent (or unobservable) variables are variables (typically low-dimensional) that retain important features of original, multidimensional data [26]. The motivation behind their inclusion in TimeGAN is that they retain important temporal dynamics that are otherwise lost in GAN training.

Fig. 3. The architecture of the TCN.

2.2.1. Conditional extension (C-TimeGAN)

In this work we suggest a novel extension to TimeGAN that is similar to the extension to the generic GAN, so $g$ and $d$ are conditioned on some additional information $y$, and the adversarial objective is the same as in Eq. 2. Networks $e$ and $r$ do not change between TimeGAN and C-TimeGAN.

3. Methodology

3.1. GAN implementation

Two variants of C-TimeGAN were implemented, a single channel and a multichannel model. The single channel model accepts inputs of shape [672x1], i.e. a single variate time series. The multichannel model instead accepts inputs of shape [672x12], where each of the 12 variables is a distinct building. For simplicity, this section will only report the network shapes from the multichannel 672 timestep models.

$g$, $d$, $e$ and $r$ are all implemented in TensorFlow⁵ using Temporal Convolutional Networks (TCN) that share the same architecture.⁶ Each TCN consists of 4 stacks of residual blocks, with each residual block containing 5 hidden layers and each layer using an increasing number of dilations. This architecture can be seen in Fig. 3.

Both non-conditional and conditional data inputs were used in this work (see Section 3.3 for details). Both architectures are described in Fig. 4. For the non-conditional case $g$, $d$, $e$ and $r$ all follow the same structure. Samples of shape [672x12] are passed through the input layer and into the TCN. The TCN's output is then fed to a dense layer to transform it into the final output. $g$, $e$ and $r$ all contain a sigmoid activation function in their output layer while $d$ does not.

For the conditional models, only $g$ and $d$ are modified. A second input layer is added to accept the conditional input; the conditional input is then passed into a dense layer that transforms it so that it can be concatenated to the original input. This concatenated input is then passed into the TCN in the same process described beforehand.

$g$, $d$, $e$ and $r$ are all trained using Adam optimization and a batch size of 128. Training is split up into 3 main phases: embedding training, supervised training and joint training. During the embedding training, $e$ and $r$ are trained together using the recovery loss for 50,000 iterations. For the supervised training, $g$ is trained exclusively using the supervised loss for 50,000 iterations. Lastly, the joint training phase consists of training $g$, $d$, $e$ and $r$ using their respective loss functions for another 50,000 iterations.

⁴ An AR model is one where the probability of observing a value at a specific time $t$ is conditional on values from the previous $t-1$ timesteps.

⁵ https://www.tensorflow.org/

⁶ The TimeGAN repository is available at https://github.com/jsyoon0823/TimeGAN

3.2. Residential and commercial datasets

Two datasets were used in this work: (1) a residential smart meter dataset and (2) a subset of education and government buildings from the Building Data Genome Project [27]. Each consists of 12 buildings with hourly electric load. For the former, the starting indexes are not aligned by timestep, and the (hourly) weather data was obtained from the Government of Canada historical weather service, for the VANCOUVER HARBOUR CS station between 2015 and 2019.⁷ For the latter, the starting indexes are aligned by timestamp, and the weather was already included. Other features of the two datasets are listed in Table 2 and sample sequences are plotted in Fig. 5.

3.2.1. Data preprocessing

For both datasets, each building was individually standardized to the range [0,1] using Min-Max normalization

\[ \hat{x} = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{4} \]

where $x$ is the vector of all load values from one of the buildings in the dataset. For the single channel models, the datasets were instead normalized over every building, therefore $x$ would represent the vector of all load values from every building in this case. Missing values were replaced by values sampled from that building's distribution. Data is then broken up into smaller sequences of 672 or 168 timesteps depending on the model being trained. To create more data for training, a sliding window with a lag of 1 timestep was also applied during the sequencing step.
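The normalization in Eq. 4 and the sequencing step can be sketched as follows. This is an illustrative plain-Python version, not the authors' preprocessing code: the function names are hypothetical, the handling of a constant series is an added assumption, and missing-value imputation is omitted.

```python
def min_max_normalize(values):
    """Scale one building's load values to [0, 1] (Eq. 4)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series maps to all zeros (not covered by Eq. 4).
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def make_sequences(series, length, lag=1):
    """Cut a series into windows of `length` timesteps.

    lag=1 gives the sliding window used to create more training data;
    lag=length gives the non-overlapping ("raw") sequences.
    """
    return [series[i:i + length]
            for i in range(0, len(series) - length + 1, lag)]

load = [2.0, 4.0, 6.0, 10.0, 8.0]            # toy hourly load, one building
norm = min_max_normalize(load)
sliding = make_sequences(norm, length=3)      # overlapping windows
raw = make_sequences(norm, length=3, lag=3)   # non-overlapping windows
```

For the models in this work `length` would be 168 or 672; the toy length of 3 is used only to keep the example small. The gap between the sliding-window and raw sequence counts in Table 2 comes from this difference in lag.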

3.3. Data input experiments

The residential dataset was used as the test case for a large array of different experiments whose parameters are summarized in Table 3. GANs have a high computational cost so only 12C with L and LM were also tested using the commercial data. Many of the residential experiments were not able to generate data that represented the original, but to save the time of other researchers we believe that it is important to publish negative results. Therefore, the results of all unsuccessful experiments will still be presented,⁸ but only the best performing will be evaluated in more detail.

The clustering (LC) case was included because all of [9,10], and [6] used clustering. It was done using a k-means model trained using the set of real load curves. A different k-means model was trained for each of the 168 and 672 timestep sets. In order to find the optimal number of clusters $k$, two common clustering metrics are used. The first is the Davies-Bouldin Index (DBI), which is used to quantify the cluster scatter and separation. The second is the Error Sum of Squares (SSE), which quantifies the difference between samples in each individual cluster. For both sets of load curves the number of clusters tested ranged from 2 to 18. The optimal number of clusters was 6 and 7 for the 672 and the 168 timestep set respectively.

3.4. Metrics

The metrics that are used in this study were chosen (1) to provide a numerical comparison with the results from previous works, and (2) to describe the distribution and temporal behaviour of the real vs. generated data. JSD was chosen as the key numerical metric because it has

⁷ https://climate.weather.gc.ca/historical_data/search_historic_data_e.html

⁸ For brevity, only the JSD and the forecaster errors described in Section 3.4.1 will be reported in the main article. Refer to Appendix A for all the distribution plots.

Fig. 4. The model architectures for the unconditional and conditional model. In the conditional case, $e$ and $r$ still use the unconditional architecture.

Table 2
The relevant features of the two datasets used to generate stochastic load profiles. The names specify whether the residential or commercial dataset was used, as well as the length of the input sequences. 168 is a weekly sequence and 672 is 28 days.

Name | Location | Timespan | Input sequences (sliding window) | Input sequences (raw) | Output sequences
--- | --- | --- | --- | --- | ---
res 168 | Vancouver | 3 years | 269,544 | 1608 | 1608
res 672 | Vancouver | 3 years | 263,496 | 396 | 396
comm 672 | London | 1 year | 97,056 | 156 | 156

Fig. 5. Example sequences that illustrate the types of features found in the residential and commercial data.

been used in comprehensive GAN studies that cover a multitude of different modelling approaches (see Table 1). Forecasting validation error is also applied to indicate the fidelity of the generated sequences, and to offer a sanity check on JSD. Of the distribution metrics, normalized load histograms, standard variance and mean load are well used in this domain, PCA was implemented by the creators of TimeGAN, and, to the best of our knowledge, we are the first to introduce the strength of the seasonality and trend. All of these metrics are described in the following sections.

3.4.1. Numerical scores

Jensen-Shannon Divergence (JSD): JSD was applied to quantify the difference between the original and generated load distributions, using a process similar to that in [10]. Before taking the JSD, the original and generated data were (1) log transformed, (2) normalized using Eq. 4 and (3) transformed into the probability distributions $P_o$ and $P_g$. For (3), each of the datasets was divided into $K$ segments (in this work $K = 100$), so that the range of the $k$th interval is $\left[\frac{k-1}{K}, \frac{k}{K}\right]$:

\[ P_o = \left[\frac{N_{o,1}}{N}, \frac{N_{o,2}}{N}, \ldots, \frac{N_{o,K}}{N}\right], \qquad P_g = \left[\frac{N_{g,1}}{N}, \frac{N_{g,2}}{N}, \ldots, \frac{N_{g,K}}{N}\right] \tag{5} \]

Fig. 6. The two validation schemes (TOVO and TGVO) used to evaluate the time series forecaster.

Table 3
The nomenclature for the data inputs for the experiments.

Channels | Label
--- | ---
single channel | 1C
12 channel | 12C

Load + conditional input | Label
--- | ---
load only | L
load + cluster label | LC
load + mean out temp. | LM
load + time stepped, hourly out temp. | LH
load + month label | LL
load + cluster label + mean out temp. | LCM

where $N_{o,k}$ and $N_{g,k}$ are the number of load values within the $k$th interval of the original or the generated data. $JSD(P_o || P_g)$ was then calculated by taking the square of the JS distance.⁹,¹⁰

Forecasting Validation Error (MAE): Time series forecasting can be used to test whether the generated data is a good representation of the original. The original and synthetic datasets are both broken up into sequences of 24 timesteps in length to create training and testing data for the forecaster. The goal of the forecaster is to take one of these sequences as input and output a prediction for the 25th timestep of that sequence. A simple single layer Temporal Convolutional Network (TCN) was used for all forecasters. Each forecaster was trained for 2000 iterations with a batch size of 128 and an Adam optimizer. To quantify how well the generated data represented the original, two cross-validation processes are used, and the validation scores between them are compared (see Fig. 6):

1. Trained on Original, Validated on Original (TOVO)
2. Trained on Generated, Validated on Original (TGVO)

A different TOVO forecaster was trained for the two datasets. For TGVO, a different model is trained on every experiment that is specified in Table 3. The Mean Absolute Error (MAE) is the validation metric, with a lower score signifying better performance. The goal is to achieve a TGVO score that is as close to the TOVO score as possible.
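The two validation schemes can be sketched as follows. This is only an illustration of the TOVO/TGVO procedure: the forecaster here is a deliberately trivial one-parameter model standing in for the single layer TCN, and the toy series, the 80/20 split and all function names are assumptions made for the example.

```python
def make_pairs(series, window=24):
    """Split a series into (24-step input, next-step target) pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def fit_forecaster(pairs):
    """Stand-in forecaster (NOT the TCN used in this work): predict
    a * (last input value), with the scalar a fitted by least squares."""
    num = sum(x[-1] * y for x, y in pairs)
    den = sum(x[-1] ** 2 for x, _ in pairs)
    return num / den if den else 0.0

def mae(a, pairs):
    """Mean absolute error of the stand-in forecaster on the given pairs."""
    return sum(abs(a * x[-1] - y) for x, y in pairs) / len(pairs)

# Hypothetical toy data: a daily sawtooth as the "original" series and a
# rescaled copy as the "generated" series.
original = [0.5 + 0.4 * (t % 24) / 23 for t in range(240)]
generated = [0.9 * v for v in original]

pairs_o = make_pairs(original)
pairs_g = make_pairs(generated)
split = int(0.8 * len(pairs_o))
validation = pairs_o[split:]             # both schemes validate on original data

tovo = mae(fit_forecaster(pairs_o[:split]), validation)
tgvo = mae(fit_forecaster(pairs_g[:split]), validation)
```

Because the toy generated series preserves the temporal structure of the original up to a scale factor, the stand-in forecaster learns the same coefficient from either source and the two scores coincide. A generator that distorted the temporal structure would push the TGVO score above the TOVO score.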

⁹ https://scipy.github.io/devdocs/generated/scipy.spatial.distance.jensenshannon.html. See [10] for the full equation.

¹⁰ The equation for JSD uses logarithms, and the bounds on the results change depending on the base of the logs. For log base 2 it is bounded by 0 and 1; for log base e it is bounded by 0 and ln(2). In both cases, a score of 0 means that the distributions were identical. In this work we use log base 2 because it is intuitive to think of a range between 0 and 1.

3.4.2. Distributions

All of the following distributions were calculated between the generated data and the original sequences that existed before applying the sliding window.

1. Normalized Load Histograms: Histograms of the log normalized loads are used to show how distributions of the original and generated data differ.¹¹

2. Principal Component Analysis (PCA): PCA is used to compress data into n-dimensional space while retaining as much of the original variability as possible [28]. Here, the log normalized data is compressed into 2 dimensions. As suggested by [23], this provides a qualitative assessment of the diversity of the original vs. the generated data.

3. Standard Variance and Mean Load: The mean and the standard variance are calculated for every sequence in the original and generated datasets and they are plotted against each other to compare their relationship.

4. Strength of Seasonality and Trend: Any time series sequence $y_t$ can be decomposed into its seasonal ($S_t$), trend ($T_t$) and residual ($R_t$) components.¹² The decomposition can be written as $y_t = T_t + S_t + R_t$. The variation of the seasonality and trend can then be measured relative to the variation in the residuals using

\[ F_T = \max\left(0,\ 1 - \frac{\mathrm{var}(R_t)}{\mathrm{var}(T_t + R_t)}\right), \qquad F_S = \max\left(0,\ 1 - \frac{\mathrm{var}(R_t)}{\mathrm{var}(S_t + R_t)}\right) \tag{6} \]

where $F_T$ and $F_S$ measure the strength of the trend and seasonality, respectively. For both, the range of possible values lies between 0 and 1, where 0 is the lowest possible strength [29].

Numerous methods can be applied to decompose a time series into its components. In this work STL decomposition is used [30].¹³ Compared with other methods, STL is more robust to outliers and it can handle many different types of seasonality [29]. Based on domain knowledge about the behaviour of buildings, a daily seasonality is specified in this work.
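Given the components of a decomposition, the strengths in Eq. 6 reduce to a few lines. The sketch below assumes the trend, seasonal and residual components have already been obtained (in this work, from STL); the function name `strength`, the zero-variance guard and the toy component values are illustrative assumptions.

```python
from statistics import pvariance

def strength(component, residual):
    """Strength of a trend or seasonal component relative to the residual (Eq. 6).

    Pass the trend component for F_T or the seasonal component for F_S.
    """
    combined = [c + r for c, r in zip(component, residual)]
    denom = pvariance(combined)
    if denom == 0:
        # Assumption: treat a constant combined series as having no strength.
        return 0.0
    return max(0.0, 1.0 - pvariance(residual) / denom)

# Hypothetical toy components (not the output of a real STL decomposition).
seasonal = [1.0, -1.0] * 4
residual = [0.5, 0.5, -0.5, -0.5] * 2
flat = [0.0] * 8

f_s = strength(seasonal, residual)   # seasonal variation dominates the residual
f_none = strength(flat, residual)    # no seasonal variation at all
f_pure = strength(seasonal, flat)    # no residual variation at all
```

The three toy cases span the range of the metric: a component with no variation scores 0, a component with no accompanying residual scores 1, and the mixed case falls in between.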

¹¹ The log was taken before regularization so that the histograms look more normal and are easier to interpret.

¹² According to [29], a seasonal pattern occurs when the time series is affected by seasonal factors such as the day of the week or the hour of the day, a trend pattern occurs when there is a long-term increase or decrease in the data, and the residual component is whatever is left over.

¹³ We use the STL implementation from the statsmodels Python library: https://www.statsmodels.org/stable/examples/notebooks/generated/stl_decomposition.html

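Table 4
JS divergences for all experiments (lower is better). For each dataset the best performing case is highlighted in bold. Refer to Table 3 for the nomenclature.

Dataset | Channels | L | LC | LM | LH | LL | LCM
--- | --- | --- | --- | --- | --- | --- | ---
res 168 | 1C | 0.309 | 0.424 | 0.995 | 0.491 | 0.323 | 0.314
res 168 | 12C | 0.077 | - | 0.099 | 0.145 | 0.201 | -
res 672 | 1C | 0.953 | 0.408 | 0.342 | 0.568 | 0.507 | 0.361
res 672 | 12C | 0.15 | - | 0.012 | 0.172 | 0.133 | -
comm 672 | 12C | - | - | 0.037 | - | - | -

Table 5
The validation errors (MAE) for the trained forecasters. Refer to Table 3 and Section 3.4.1, Forecasting Validation Error, for the nomenclature. The best performing cases (i.e., the TGVO scores that are closest to the TOVO scores, and the TGVO scores that are the lowest) for each training set are highlighted in bold.

Dataset | Channels | TGVO: L | TGVO: LC | TGVO: LM | TGVO: LH | TGVO: LL | TGVO: LCM | TOVO
--- | --- | --- | --- | --- | --- | --- | --- | ---
res 168 | 1C | 0.095 | 0.072 | 0.119 | 0.081 | 0.154 | 0.096 | 0.042
res 168 | 12C | 0.051 | - | 0.056 | 0.078 | 0.081 | - |
res 672 | 1C | 0.116 | 0.101 | 0.118 | 0.105 | 0.079 | 0.090 |
res 672 | 12C | 0.063 | - | 0.046 | 0.059 | 0.052 | - |
comm 672 | 12C | - | - | 0.166 | - | - | - | 0.041

4. Results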

The results for all 22 experiments are presented in Section 4.1. The sections that present the results for the residential and commercial case studies (4.2 and 4.3) will focus only on the best performing model (i.e. the 12C monthly model conditioned on mean temperature).

4.1. All experiments

4.1.1. JS Divergence

Table 4 displays the JSDs for all of the conducted experiments. The high JSDs for RES168_1C_LM and RES672_1C_L indicate that the models were not able to converge during training. These will not be considered in the rest of this paper.

Aside from the non-convergent cases, the JSDs on the residential data had a wide range of [0.012, 0.568], and the 12C model outperformed the 1C model for every data input case. Overall, the 1C baselines (i.e. L, LC) were not able to generate sequences that represented those in the original dataset. In many cases, the 168 model outperformed the 672 model, but the lowest JSD was from the latter. For RES168, the case with no conditioning had the lowest JSD for both the 1C and 12C case. For 12C, conditioning on monthly label also produced a relatively low JSD. For RES672, conditioning on mean outdoor temperature (LM) significantly outperformed the other experiments, with a JSD of 0.012, compared with the next lowest of 0.078.

4.1.2. Forecasting

The cross-validation results for TOVO and TGVO defined in Section 3.4.1 are presented in Table 5. Recall that if a GAN generates high-fidelity sequences then the validation error on the original data should be similar regardless of whether the forecaster was trained on generated or original data. On the other hand, if the GAN generates sequences that do not represent the originals, then the forecaster that is trained on the former will not perform well when validated on the latter. There is only one TOVO score for the residential data because the forecasters were trained with sequences of length 24 (see Section 3.4.1), so the length of the GAN input sequences was irrelevant.

The models that had the lowest JSDs also had the lowest forecasting errors and, as with the JSDs, the 12C model outperformed the 1C model for every type of data input. This shows that there is consistency between the two metrics and further implies that the 672 step, multi-channel model conditioned on mean monthly outdoor temperature was the best performing. This model will therefore be presented in more detail in the following sections. The distributions for all of the other residential experiments are available in Appendix A.¹⁴

4.2. Case study 1: Residential data

4.2.1. Residential distributions

Fig. 7 illustrates that, in addition to achieving a low JSD and TGVO error, RES672_12C_LM was able to model other important attributes in the time series data. The distributions along the x and y-axis in Fig. 7b show that the amount of variability captured by the principal components is similar between the original and the generated data. Qualitatively, the most noticeable difference was that the original data was more spread out along component 1 than the generated data, which had a higher peak of lower values. This indicates that the generated data contained less diversity than the original, which is not surprising. Component 1 had a mean and standard deviation of 0.0 (±2.8) for the original and -0.2 (±2.75) for the generated; component 2 had a mean and standard deviation of 0.0 (±0.92) for the original and -0.1 (±0.73) for the generated.

The model's ability to capture the relationship between the standard variances and the means of the sequences (Fig. 7c) was quite impressive, since it was able to model clusters of outliers, but the generated sequences tended to have lower variance than the original. This is similar to the observation from the PCA.

The distributions of $F_S$ and $F_T$ (Fig. 7d) exhibited reasonable overlap between the original and generated data, though less so than the other types of distributions discussed above. For $F_S$, the mean and standard deviation were 0.22 (±0.12) and 0.35 (±0.17) for the original and generated data, respectively. The GAN tended to model stronger seasonality than what was present in the original data. $F_T$ had a mean and standard deviation of 0.07 (±0.09) for the original and 0.15 (±0.19) for the generated data; both the mean and variability in $F_T$ were notably higher for the latter (by about 10% of the range of possible $F_T$ values).

4.2.2. Residential forecasting

The forecasting model that was trained and validated on the original residential data (TOVO) achieved a MAE of 0.042. The forecaster was trained on the normalized data, so the total range of values was [0,1]. Relative to this, an MAE of 0.042 is low (4.2% of the total range). The MAE of the forecaster that was trained on the generated and validated on the original (TGVO) was 0.046. Relative to the range of values in the data, the TGVO validation error only increased by 0.4% when compared to TOVO. This shows that the forecaster that was trained on generated data was useful for predicting on the original.

4.2.3. Residential examples

Fig. 8 displays two generated and two original sequences. There is no inherent relationship between the chosen examples; they were selected to highlight some of the interesting features in the original data that the generator was able to capture.

4.3. Case study 2: Commercial data

4.3.1. Commercial distributions

Fig. 9 displays the distribution plots for the commercial data. Visualizing the log normalized load histograms highlights the differences between the original and generated data that cannot be inferred from the JSD alone. Here we can see that the original data had certain bins with higher counts than the generated data (the highest count in the original was almost 1000 times higher than the generated), and that the generated data had a shape that looked more bi-modal than the original.

¹⁴ Over the course of the research, these were also evaluated to determine that the 672 step, 12C_LM model was indeed the best performing.

Fig. 7. The distribution results on the residential data.

Fig. 8. Example sequences from the original and generated residential data.

The mean and standard deviations from PCA (Fig. 9b) were 0.0 (±2.71) on the original and 0.07 (±1.44) on the generated data for component 1; and 0.0 (±1.66) and 3.08 (±1.5) for component 2. The difference in the means of the original and generated distributions for component 2 is notable: the generated data appears to have more samples with higher diversity.

The behaviour of the commercial generator in terms of the variances vs. means was similar to that of the residential, in that clusters of buildings with aggregate statistics were appropriately modelled. The overlap of the distributions on the x and y-axes in Fig. 9c shows that the shapes were almost perfectly modelled. For the time series components (Fig. 9d), however, the commercial model did not perform well. Specifically, it was unable to capture the strength of the trend in the original data.

4.3.2. Commercial forecasting

From Table 5, the TOVO score was 0.041 but the TGVO score was 0.166. The TOVO score was similar to the residential data, but the TGVO score was worse than all of the TGVO models, including the ones that did not converge. This poor performance provides further evidence that even though the JSD and mean and standard deviation appeared reasonable, the generated commercial data likely lacks fidelity, and it is not useful for forecasting.

4.3.3. Commercial examples

As demonstrated by the sample sequences in Fig. 5, the commercial data exhibited a different type of seasonality than the residential data. Specifically, the commercial buildings often had a weekly seasonality with high usage on the weekdays and low usage on the weekends. This is an expected pattern in commercial data. Fig. 10 shows that, even though the generator is able to produce weekday sequences with high fidelity, it is not able to model the weekly seasonality. This likely explains why the distribution metrics looked good but the forecasting validation error was high. This could also help to explain why the original data in the distribution plots had some values with much higher counts: the lower usage values on the weekends are not present in the generated sequences.

Fig. 9. The distribution results on the commercial data.

Fig. 10. Example sequences from the original and generated commercial data.

4.4. Per-building distributions

While Figs. 7a and 9a plot the regularized load distributions across the entire datasets for the residential and commercial cases, Fig. 11 shows the distributions for each building individually, in the form of box plots. These illustrate that the model captures each individual building, even for the commercial case where the buildings have large differences (note that, for visibility, the loads were logged in Fig. 11b).

5. Discussion

In this section we will discuss the performance of our model in relation to other works using metrics that are standard in this field of study (see Table 1). This approach to comparison provides valuable insight into the relative performance of our approach, but it does not account for the use of a different GAN architecture and dataset than previous studies. We acknowledge that a more direct analysis must be conducted to truly establish the performance. This leads to a general observation about this domain: there is a strong opportunity to introduce and enforce standard practices that will help to accelerate the research. Future work should compare and benchmark different approaches more robustly, including different approaches to conditioning.

Fig. 11. Per-building distributions.

5.1. Competitive performance with less data

This research successfully developed a load sequence generator that is competitive with existing works, but that can be applied to significantly smaller datasets. Unlike the other use-case agnostic studies in Table 1, we demonstrate our approach for both residential and commercial data. Our best performing model (i.e. the 672 step, multi-channel model, conditioned on mean monthly temperature) achieved a JSD of 0.012 on the residential case study and 0.037 on the commercial case study, using only 396 and 156 original input sequences (before applying the sliding window). Using log e instead of log 2, these values are 0.008 and 0.024. The best performing architecture (ACGAN) in the work by Wang et al., which is the most comprehensive study on building GANs to date, achieved a JSD of 0.0463 using 20,000 weekly sequences [10]. Both our residential and commercial models outperformed [10] using less than 2% of the data. Gu et al. also used ACGAN and achieved a JSD of 0.0045 using 33,760 sequences (log e). Our residential result (0.008) was close to theirs, using only 1% of the data. Our commercial result was slightly worse, but our dataset was 0.5% the size.15
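The comparison above depends on the logarithm base and on whether a divergence or a distance is reported. The sketch below illustrates both points with a histogram-based JSD. It assumes that K denotes the number of shared histogram bins; the function name `jsd` and the random samples are hypothetical and do not reproduce the values reported in this paper.

```python
import numpy as np

def jsd(p_samples, q_samples, k=100, base=2.0):
    """Jensen-Shannon divergence between two samples using K shared bins."""
    lo = min(np.min(p_samples), np.min(q_samples))
    hi = max(np.max(p_samples), np.max(q_samples))
    p, _ = np.histogram(p_samples, bins=k, range=(lo, hi))
    q, _ = np.histogram(q_samples, bins=k, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Bins where a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask])) / np.log(base)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=5000)  # hypothetical "original" loads
fake = rng.normal(0.2, 1.1, size=5000)  # hypothetical "generated" loads

jsd_bits = jsd(real, fake, base=2.0)
jsd_nats = jsd(real, fake, base=np.e)
js_distance = np.sqrt(jsd_bits)

print(jsd_bits, jsd_nats)
# Changing the log base only rescales the value: nats = bits * ln(2).
print(np.isclose(jsd_nats, jsd_bits * np.log(2)))
# A reported JS distance is squared to recover the divergence.
print(np.isclose(js_distance ** 2, jsd_bits))
```

With base 2 the divergence is bounded by 1, and the same quantity in natural log units is smaller by a factor of ln 2, which is why the base must be known before results from different papers are compared.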

In addition to JSD, [9] and [10] also plot the distributions of the real and generated means and standard variances. There is no numerical metric that summarizes these, but based on visual inspection we conclude that our approach performs just as well or better. The reader is invited to compare these plots between the papers. We do not compare our results with [18] and [6] because they use daily sequences, which is a much easier generation problem.16

It is worth noting that the computational resources required to run GANs are very high, so hyperparameter exploration for many experiments was unfeasible. It is possible that our results could have been improved had more optimal hyperparameters been selected; however, the original TimeGAN paper [23] suggests that it is fairly stable, so we do not expect this to significantly impact the results.

15 We had to make several assumptions in this comparison, since neither of the papers reports the log base or K value in Eq. 5. We assume that both papers use K = 100 to match our work. For [10] we assume they use log 2. Since we use log base 2 we are potentially giving their results an advantage over ours, because the JSD for log e returns smaller JSD in general. If they used log base 10 the comparison is fair, but if they used log e their JSDs will appear smaller. For [9] we first square their results because they report JS distance, not JS divergence. This results in a value that is 2 orders of magnitude lower than [10], even though they are using the same model architecture and dataset. Therefore, we assume that they are using log e.

16 Daily sequences are easier to generate, but they are not as useful.

5.2. Benefit of multichannel and conditional formulations

From the analysis above, we can conclude that our modelling approach was able to reduce data requirements without depreciation in performance. In this section we aim to describe the key contributions of the modelling approach that allowed for this result.

The poor performance of the 1C models (Table 4) shows that our approach did not achieve low JSD because of small data, but rather in spite of it. The primary insight that led to the improved performance was the creation of a multivariate time series, in which each variable represents a single building. Tables 4 and 5 show that, on the residential data, the 12C input data shape significantly improved model performance for every input case. Intuitively, treating a set of buildings in this way helps the GAN to find temporal associations between the buildings by identifying the difference between definite patterns and randomness. Additionally, the multi-channel approach circumvents the need for clustering, which depends on user intervention to determine the optimal number of clusters, and which may have different degrees of success on different datasets.17
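The difference between the single-channel and multi-channel input shapes can be sketched as follows. This is an illustration only: it assumes that the 12C label denotes twelve channels with one building per channel, and the series length, the function name `make_windows`, and the random loads are hypothetical.

```python
import numpy as np

def make_windows(series, window, stride=1):
    """Slice a (time, channels) array into overlapping (window, channels) samples."""
    n_steps = series.shape[0]
    starts = range(0, n_steps - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

rng = np.random.default_rng(0)
n_steps, n_buildings = 2000, 12  # hypothetical sizes for illustration
# One column per building: each building is a variable (channel).
loads = rng.random((n_steps, n_buildings))

multi = make_windows(loads, window=672)          # multi-channel samples
single = make_windows(loads[:, :1], window=672)  # single-building baseline

print(multi.shape)   # (1329, 672, 12)
print(single.shape)  # (1329, 672, 1)
```

Both variants yield the same number of training samples from the sliding window; the multi-channel variant simply exposes all buildings to the network at every time step, which is what allows it to learn associations between them.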

Another novel insight from this work is the conditioning on mean outdoor temperature. In the residential monthly, multichannel model, this conditioning resulted in a lower JSD and validation error than all other cases. None of the other conditioning cases outperformed the load-only baseline, which shows that the performance gains from using the weather are noteworthy, and not necessarily easy to replicate. Interestingly, however, the buildings in the original residential data had strong monthly correlations with outdoor temperature that were not retained in the generated data, despite the conditioning. Future work should consider investigating why these correlations were lost.

5.3. Issues with temporal correlations

For the commercial data, the JSD and the mean and standard deviation plots were competitive with other works, but the trend was not modeled, the forecasting validation error was high, and the distributions of the principal components had notable differences between the real and generated data. Fig. 10 provides an explanation for this behaviour: the daily sequences in the data were captured with high fidelity, but the weekly patterns were not. We tested this further by taking the Pearson correlation of the daily mean loads between all the buildings in the original dataset. In some cases, for instance buildings that had high weekday and low weekend usage, there was a strong correlation that was not retained in the generated data. This shows that the GAN is able to capture daily but not weekly seasonality, which should be addressed in future work. Further, this provides strong evidence that the standard set of metrics used for evaluation is not sufficient.
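The correlation test described above can be sketched as follows. The example assumes hourly data, so that 672 steps span four weeks, and uses synthetic stand-ins for the original and generated loads; the function name and all numeric values are hypothetical.

```python
import numpy as np

def daily_mean_correlation(loads, steps_per_day):
    """Pearson correlation between buildings, computed on daily mean loads.

    loads has shape (time, buildings); any incomplete final day is dropped.
    """
    n_days = loads.shape[0] // steps_per_day
    trimmed = loads[: n_days * steps_per_day]
    daily = trimmed.reshape(n_days, steps_per_day, -1).mean(axis=1)
    # rowvar=False treats each column (building) as a variable.
    return np.corrcoef(daily, rowvar=False)

rng = np.random.default_rng(0)
steps_per_day, n_days = 24, 28  # assumed hourly data over four weeks
weekday = np.tile([1, 1, 1, 1, 1, 0.3, 0.3], n_days // 7)  # weekly pattern
weekly = np.repeat(weekday, steps_per_day)[:, None]

# "Original": two buildings sharing a weekday/weekend pattern plus noise.
original = weekly + 0.05 * rng.normal(size=(n_days * steps_per_day, 2))
# "Generated": weekday-like level every day, so the weekly pattern is absent.
generated = 1.0 + 0.05 * rng.normal(size=(n_days * steps_per_day, 2))

corr_orig = daily_mean_correlation(original, steps_per_day)
corr_gen = daily_mean_correlation(generated, steps_per_day)
print(corr_orig[0, 1])
print(corr_gen[0, 1])
```

In this toy setting the shared weekday and weekend pattern drives a correlation close to 1 between the two original buildings, while the generated pair is correlated only by chance, which mirrors the loss of weekly structure discussed above.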

5.4. Generating data for multiple buildings

Fig. 11 shows that the relative magnitudes and spread of the loads were maintained for the buildings, even for the commercial data where there is a large difference in scale. The ability to retain these distributions is promising for practical applications such as storage and energy system design.

5.5. Limitations and future work

A limitation of our approach is that it generates sets of sequences, but there is no way to know what times the sequences are for, and since we used a rolling window to create more data for the training process, we do not know the start time of any generated sample. This also meant that we could not implement RMSE as a metric, since the daily peaks in the generated data happened at random times and so aggregated themselves out. An attempt to overcome this issue was to condition on hourly temperature (LH) and to use that as a proxy for time, but the results did not exhibit strong performance. Future work should explore other ways to overcome this shortcoming.

Importantly, the metrics used to evaluate building load GANs require more discussion. This is particularly apparent based on the evaluation of the behaviour on commercial data. In this paper we introduce PCA and seasonal and trend strength. Our analysis of these was still qualitative and should be quantified, but the results showed that they provide fundamentally valuable information that is otherwise missed. [10] also used PRD and SSIM as scores, but these are image-specific so they cannot be applied here. [18] suggest the use of MMD, which, like JSD, quantifies the difference between two probability distributions. [6] use the KLD of the 5 key parameters for analyzing electric load shape suggested by [31]. These were not applied in this work because they measure daily load profiles and it is less clear how they should be applied for sequences with weekly or monthly seasonality. Both these and MMD should be explored further in future work; a study dedicated to the analysis of different metrics is required.

6. Conclusions

GANs are receiving increasing attention for generating building load sequences, but they often depend on large amounts of data that are not always available, and the temporal nature of the generated data is not quantitatively evaluated. This study developed an approach that uses substantially smaller datasets than those of previous works and expanded the metrics used for analysis. It was found that a multi-channel TimeGAN, conditioned on mean monthly outdoor temperature, generates load sequences that are approximately 1 month in length with high fidelity on the residential case study, but that the commercial case had issues capturing weekly seasonality. In both cases, the numerical results are competitive with other GANs in the domain, even using only 1–2% of the data. Using a multivariate time series where each building represents a neural network input channel and conditioning on outdoor temperature are two novel insights that lead to strong generative performance on data. These are key insights that provide imperative information towards reducing data scarcity in the buildings domain.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This project was funded by a Canarie grant. Compute Canada provided the cloud resources used to train the networks. The lead author was funded via an NSERC British Columbia Graduate Scholarship.


References

[1] Goy S, Sancho-Tomás A. 4 - Load management in buildings. In: Eicker U, editor. Urban Energy Systems for Low-Carbon Cities. Academic Press; 2019. p. 137–79. ISBN 978-0-12-811553-4. doi: 10.1016/B978-0-12-811553-4.00004-4. URL http://www.sciencedirect.com/science/article/pii/B9780128115534000044

[2] Pasichnyi O, Levihn F, Shahrokni H, Wallin J, Kordas O. Data-driven strategic planning of building energy retrofitting: the case of Stockholm. J. Clean. Prod. 2019;233:546–60. doi: 10.1016/j.jclepro.2019.05.373. URL http://www.sciencedirect.com/science/article/pii/S0959652619319158

[3] Tian C, Li C, Zhang G, Lv Y. Data driven parallel prediction of building energy consumption using generative adversarial nets. Energy Build. 2019;186:230–43. doi: 10.1016/j.enbuild.2019.01.034. URL http://www.sciencedirect.com/science/article/pii/S0378778818322965

[4] Hong T, Wang Z, Luo X, Zhang W. State-of-the-art on research and applications of machine learning in the building life cycle. Energy Build. 2020;212:109831. doi: 10.1016/j.enbuild.2020.109831. URL http://www.sciencedirect.com/science/article/pii/S0378778819337879

[5] Scully P. Smart meter market report. Tech. Rep. IOT Analytics; 2019.

[6] Wang Z, Hong T. Generating realistic building electrical load profiles through the generative adversarial network (GAN). Energy Build. 2020;224:110299. doi: 10.1016/j.enbuild.2020.110299. URL http://www.sciencedirect.com/science/article/pii/S0378778820307234

[7] Hu J, Vasilakos AV. Energy big data analytics and security: challenges and opportunities. IEEE Trans. Smart Grid 2016;7(5):2423–36. doi: 10.1109/TSG.2016.2563461.

[8] Roth J, Martin A, Miller C, Jain RK. Syncity: using open data to create a synthetic city of hourly building energy estimates by integrating data-driven and physics-based methods. Appl. Energy 2020;280:115981. doi: 10.1016/j.apenergy.2020.115981. URL http://www.sciencedirect.com/science/article/pii/S0306261920314306

[9] Gu Y, Chen Q, Liu K, Xie L, Kang C. GAN-based model for residential load generation considering typical consumption patterns. In: 2019 IEEE Power Energy Society Innovative Smart Grid Technologies Conference (ISGT); 2019. p. 1–5. doi: 10.1109/ISGT.2019.8791575. ISSN: 2472-8152

[10] Wang Y, Chen Q, Kang C. Residential load data generation. In: Wang Y, Chen Q, Kang C, editors. Smart Meter Data Analytics: Electricity Consumer Behavior Modeling, Aggregation, and Forecasting. Springer; 2020. p. 99–135. ISBN 9789811526244. doi: 10.1007/978-981-15-2624-4_5.

[11] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, editors. Advances in Neural Information Processing Systems, 27. Curran Associates, Inc.; 2014. p. 2672–80. URL https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf

[12] Karras T, Laine S, Aila T. A style-based generator architecture for generative adversarial networks. arXiv:1812.04948.

[13] Dong H-W, Hsiao W-Y, Yang L-C, Yang Y-H. Musegan: multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. Proceedings of the AAAI Conference on Artificial Intelligence 2018;32(1). URL https://ojs.aaai.org/index.php/AAAI/article/view/11312

[14] Xue A. End-to-end Chinese landscape painting creation using generative adversarial networks. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; 2021. p. 3863–71. URL https://openaccess.thecvf.com/content/WACV2021/html/Xue_End-to-End_Chinese_Landscape_Painting_Creation_Using_Generative_Adversarial_Networks_WACV_2021_paper.html

[15] Pang Y, Zhou X, Xu D, Tan Z, Zhang M, Guo N, et al. Generative adversarial learning based commercial building electricity time series prediction. In: 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI); 2019. p. 1800–4. doi: 10.1109/ICTAI.2019.00271. ISSN: 2375-0197

[16] Moon J, Jung S, Park S, Hwang E. Conditional tabular GAN-based two-stage data generation scheme for short-term load forecasting. IEEE Access 2020;8:205327–39. doi: 10.1109/ACCESS.2020.3037063.

[17] Fekri MN, Ghosh AM, Grolinger K. Generating energy data for machine learning with recurrent generative adversarial networks. Energies 2020;13(1):130. doi: 10.3390/en13010130. URL http://www.mdpi.com/1996-1073/13/1/130

[18] Zhang C, Kuppannagari SR, Kannan R, Prasanna VK. Generative adversarial network for synthetic time series data generation in smart grids. In: 2018 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm); 2018. p. 1–6. doi: 10.1109/SmartGridComm.2018.8587464.

[19] Chokwitthaya C, Zhu Y, Mukhopadhyay S, Collier E. Augmenting building performance predictions during design using generative adversarial networks and immersive virtual environments. Autom. Constr. 2020;119:103350. doi: 10.1016/j.autcon.2020.103350. URL http://www.sciencedirect.com/science/article/pii/S0926580520309304

[20] Kababji SE, Srikantha P. A data-driven approach for generating synthetic load patterns and usage habits. IEEE Trans. Smart Grid 2020;11(6):4984–95. doi: 10.1109/TSG.2020.3007984.

[21] Chen Y, Wang Y, Kirschen D, Zhang B. Model-free renewable scenario generation using generative adversarial networks. IEEE Trans. Power Syst. 2018;33(3):3265–75. doi: 10.1109/TPWRS.2018.2794541.

[22] LeCun Y, Cortes C. MNIST handwritten digit database. 2010. URL http://yann.lecun.com/exdb/mnist/

[23] Yoon J, Jarrett D, van der Schaar M. Time-series generative adversarial networks. In: Wallach H, Larochelle H, Beygelzimer A, Alché-Buc Fd, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems, 32. Curran Associates, Inc.; 2019. p. 5508–18. URL https://proceedings.neurips.cc/paper/2019/file/c9efe5f26cd17ba6216bbe2a7d26d490-Paper.pdf

[24] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv:1411.1784.

[25] Goodfellow IJ, Bengio Y, Courville A. Deep learning. Cambridge, MA, USA: MIT Press; 2016. http://www.deeplearningbook.org

[26] Bartholomew D. Latent variable models and factor analysis: a unified approach. 3rd ed. Chichester: Wiley; 2011.

[27] Miller C, Meggers F. The building data genome project: an open, public data set from non-residential building electrical meters. Energy Procedia 2017;122:439–44. doi: 10.1016/j.egypro.2017.07.400. CISBAT 2017 International Conference Future Buildings & Districts - Energy Efficiency from Nano to Urban Scale. URL http://www.sciencedirect.com/science/article/pii/S1876610217330047

[28] Bryant FB, Yarnold PR. Principal-components analysis and exploratory and confirmatory factor analysis. In: Reading and understanding multivariate statistics. American Psychological Association; 1995. p. 99–136. ISBN 978-1-55798-273-5.

[29] Hyndman R, Athanasopoulos G. Forecasting: principles and practice. 2nd edition. Melbourne, Australia: OTexts; 2018. OTexts.com/fpp2

[30] Cleveland RB, Cleveland WS, McRae JE, Terpenning I. STL: a seasonal-trend decomposition procedure based on loess (with discussion). J. Off. Stat. 1990;6:3–73.

[31] Price P. Methods for analyzing electric load shape and its variability. Tech. Rep. Lawrence Berkeley National Lab (LBNL), Berkeley, CA (United States); 2010.
