
Unsupervised energy prediction in a Smart Grid context using reinforcement cross-building transfer learning

Elena Mocanu∗, Phuong H. Nguyen, Wil L. Kling¹, Madeleine Gibescu

Department of Electrical Engineering, Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands

Article history: Received 15 June 2015; Received in revised form 11 December 2015; Accepted 22 January 2016; Available online 16 February 2016

Keywords: Building energy prediction; Reinforcement learning; Transfer learning; Deep Belief Networks; Machine learning

Abstract

In a future Smart Grid context, increasing challenges in managing the stochastic local energy supply and demand are expected. This increases the need for more accurate energy prediction methods in order to support further complex decision-making processes. Although many methods aiming to predict the energy consumption exist, all of these require labelled data, such as historical or simulated data. Still, such datasets are not always available under the emerging Smart Grid transition and complex people behaviour. Our approach goes beyond the state-of-the-art energy prediction methods in that it does not require labelled data. Firstly, two reinforcement learning algorithms are investigated in order to model the building energy consumption. Secondly, as a main theoretical contribution, a Deep Belief Network (DBN) is incorporated into each of these algorithms, making them suitable for continuous states. Thirdly, the proposed methods yield a cross-building transfer that can target new behaviour of existing buildings (due to changes in their structure or installations), as well as completely new types of buildings. The methods are developed in the MATLAB® environment and tested on a real database recorded over seven years, with hourly resolution. Experimental results demonstrate that the energy prediction accuracy in terms of RMSE has been significantly improved in 91.42% of the cases after using a DBN for automatically extracting high-level features from the unlabelled data, compared to the equivalent methods without the DBN pre-processing.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Prediction of energy consumption as a function of time plays an essential role in the current transition to future energy systems. Within the new context of so-called Smart Grids, the energy consumption of buildings can be regarded as a nonlinear time series, depending on many complex factors. The variability introduced by the growing penetration of wind and solar generation sources only strengthens the role of accurate prediction methods [1]. Prediction forms an integral part of the efficient planning and operation of the whole Smart Grid.

On the one hand, advanced energy prediction methods should be easily expandable to various levels of data aggregation at all time scales [2]. On the other hand, they have to automatically adapt their decision strategies to the dynamic behavior of active consumers (e.g. new and smart(er) buildings) [3]. Applications of these new methods should facilitate the transition from the traditional single-tariff grid to time-of-use (TOU) and real-time pricing.

∗ Corresponding author. E-mail address: e.mocanu@tue.nl (E. Mocanu).
¹ Deceased March 14, 2015.

The effects will be felt by all players in the grid, from the transmission (TSO) and distribution system operators (DSO) to the end-user, including resource assessment and analysis of energy efficiency improvements, flexible demand response (DR), and other continuous projection on planning studies. The joint consideration of decisions regarding new renewable generation, TSO development, and demand-side management (DSM) programs in an integrated fashion requires demand forecasts. Consequently, these will require changes in the way the data are collected and analyzed [4].

Prior studies have shown that by using statistical methods, more recently those inspired by supervised machine learning techniques, such as Support Vector Machines [5,6], Artificial Neural Networks [7,8], autoregressive models [9], Conditional Random Fields [10], or Hidden Markov Models [11], one can improve the accuracy of energy prediction significantly. On the other hand, there are many methods based on physical principles, including a large number of building parameters, to calculate thermal dynamics and energy behavior at the building level. Moreover, to shape the evolution of future building systems, there are also some hybrid approaches which combine some of the above models to optimize predictive performance, such as [12–16]. Interested readers are referred to [9,14,17,18] for a more comprehensive discussion on the application of energy demand management.


Nomenclature

α        learning rate, α ∈ [0, 1]
γ        discount factor, γ ∈ (0, 1)
E[·]     expected value operator
A        the set of actions, a ∈ A
D        dataset
R        the reward function
S        the set of states, s ∈ S
T        transition probability matrix
h        vector collecting all the hidden units, h_j ∈ {0, 1}
v        vector collecting all visible units, v_i ∈ ℝ
W_vh     matrix of all weights connecting v and h
E        total energy function in the RBM model
k        the number of hidden layers
M        building energy consumption model
p, P     probability value/vector
Q        the quality matrix
t        time
Z        normalization function

Although they remain at the forefront of academic and applied research, all these methods require labeled data able to faithfully reproduce the energy consumption of buildings. In the remainder of this paper, we refer to the labeled data as the historical (known) data of the analyzed building. Usually, a lack of historical data can be compensated for by simulated data. Still, both historical and simulated data are employed in these forecasting methods in a non-adaptable way, without considering the future events or changes which can occur in the Smart Grid.

A stronger motivation for this paper is given by the not-too-well-exploited fact that sometimes there is no historical consumption data available for a particular building. From the machine learning perspective, this is a typical unsupervised learning problem. One of the most used methods of unsupervised learning, reinforcement learning (RL), was introduced in the power systems area to solve stochastic optimal control problems [19]. RL methods are used in a wide range of applications, such as system control [20], playing games, or, more recently, transfer learning [19,21]. The advantage of combining reinforcement learning and transfer learning approaches is straightforward. Hence, we want to transfer knowledge from a global to a local perspective, to encode the uncertainty of the building energy demand.

Owing to the curse of dimensionality, these methods fail in high dimensions. More recently, there has been a revival of interest in combining Deep Learning with reinforcement learning. Therein, Restricted Boltzmann Machines were proven to provide a value function estimation [22] or a policy estimation [23]. More than that, Mnih et al. [24] successfully combined deep neural networks and Q-learning to create a deep Q-network which learned control policies in a range of different environments.

In this paper, we comprehensively explore and extend two reinforcement learning (RL) methods to predict the energy consumption at the building level using unlabelled historical data, namely State-Action-Reward-State-Action (SARSA) [25] and Q-learning [26]. Because in their original form both methods cannot handle continuous state spaces well, this paper contributes theoretically by extending them with a Deep Belief Network [27] for continuous state estimation and automatic feature extraction in a unified framework. Our proposed RL methods are appropriate when we do not have historical or simulated data, but we want to estimate the impact of changes in the Smart Grid, such as the appearance of a building or several buildings in a certain area, or, more commonly, a change in energy consumption due to building renovation. In this paper, we have shown the applicability and efficiency of our proposed method in three different situations:

1. In the case of a new type of building being connected to the Smart Grid, thus transferring knowledge from a commercial building to a residential building. Specifically, in Section 6.2.1, four different types of residential buildings were analyzed.
2. In the case of a renovated building, thus transferring knowledge from a non-electric-heat building to a building with electric heating.
3. Additionally, we propose experiments to highlight the importance of external factors, such as price information, for the estimation of building energy consumption. In Section 6.2.2, transfer learning is applied from a building under a static tariff to a building with a time-of-use tariff.

To the best of our knowledge, this is the first time that energy prediction is performed without using any information about the building in question, such as historical data, energy price, physical parameters of the building, meteorological conditions, or information about user behavior.

The paper is organized as follows. In Section 2 we explain the rationale underlying our approach. Section 3 presents the mathematical modeling of the reinforcement learning approaches. Section 4 describes the novel method to estimate continuous states in reinforcement learning using Deep Belief Networks. The experimental setup and results are illustrated in Sections 5 and 6, respectively. The paper concludes with a discussion and future work.

2. Problem formulation

In this paper, we propose a method to solve the unsupervised energy prediction problem with cross-building transfer by using machine learning time series prediction techniques. In its most general form, the proposed reinforcement and transfer learning setup is depicted in Fig. 1. Given the unevenly distributed building energy values over time, special attention is first given to the question: How can a continuous state space be estimated? The idea is to find a lower-dimensional representation of the energy consumption data that preserves the pairwise distances as well as possible.

Fig. 1. The unsupervised learning setup, which extends reinforcement and transfer learning by including a Deep Belief Network for continuous state estimation.

More formally, the energy prediction using unlabeled data problem presented in this paper is divided into three different sub-problems, namely:

1. Continuous state estimation problem: Given a dataset, $D: \mathbb{R} \to S$, find a confined state-space representation $S_1$.
2. Reinforcement learning problem: Given a building model $M_1 = \langle S_1, A_1, T_\cdot(\cdot,\cdot), R_1 \rangle$, find an optimal policy $\pi_1^*$.
3. Transfer learning problem: Given a model $M_1 = \langle S_1, A_1, T_\cdot(\cdot,\cdot), R_1 \rangle$, a reasonable $\pi_1^*$, and a model $M_2 = \langle S_2, A_2, T_\cdot(\cdot,\cdot), R_2 \rangle$, find a good policy $\pi_2$.

The proposed solution is presented in Section 4, where a new method to estimate continuous states in reinforcement learning using Deep Belief Networks is detailed. This state estimation method is then integrated into the SARSA and Q-learning algorithms in order to improve the prediction accuracy.
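For concreteness, the model tuple used in sub-problems 2 and 3 could be held as a plain MATLAB struct; this is an illustrative sketch only, with all sizes and field names assumed, not the paper's implementation.

% Minimal sketch of the model tuple M = <S, A, T, R>; sizes are assumptions.
nStates  = 16;                                      % |S|, e.g. indices of DBN state codes
nActions = 4;                                       % |A|
M.S = 1:nStates;                                    % set of states
M.A = 1:nActions;                                   % set of actions
M.T = ones(nStates, nActions, nStates) / nStates;   % T(s,a,s'): uniform placeholder
M.R = zeros(nStates, nActions);                     % R(s,a): reward function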

3. Reinforcement learning

Reinforcement learning [28] is a field of machine learning inspired by psychology, which studies how artificial agents can perform actions in an environment to achieve a specific goal. Practically, the agent has to control a dynamic system by choosing actions in a sequential fashion. The dynamic system, also known as the environment, is characterized by states, its dynamics, and a function that describes the states' evolution given the actions chosen by the agent. After it executes an action, the agent moves to a new state, where it receives a reward (scalar value) which informs it how far it is from the goal (the final state). To achieve the goal, the agent has to learn a strategy for selecting actions, dubbed a policy in the literature, in such a way that the expected sum of the rewards is maximized over time. Besides that, a state of the system captures all the information required to predict the evolution of the system into the next state, given an agent action. Also, it is assumed that the agent can perceive the state of the environment without error, and that it makes its current decision based on this information. There are two different categories of RL algorithms: (i) online RL, which are interaction-based algorithms, such as Q-learning [26], SARSA [25] or Policy Gradient, and (ii) offline RL, like Least-Squares Policy Iteration or fitted Q-iteration. For a more comprehensive discussion of RL algorithms we refer to [29]. In the remainder of this paper we refer only to online RL.

An RL problem can be formalized using Markov decision processes (MDPs). MDPs are defined by a 4-tuple $\langle S, A, T_\cdot(\cdot,\cdot), R_\cdot(\cdot,\cdot)\rangle$, where $S$ is a set of states, $s \in S$; $A$ is a set of actions, $a \in A$; $T: S \times A \times S \to [0,1]$ is the transition function given by the probability that by choosing action $a$ in state $s$ at time $t$ the system arrives at state $s'$ at time $t+1$, such that $p_a(s, s') = p(s_{t+1} = s' \mid s_t = s, a_t = a)$; and $R: S \times A \times S \to \mathbb{R}$ is the reward function, where $R_a(s, s')$ is the immediate reward (or expected immediate reward) received by the agent after it performs the transition to state $s'$ from state $s$. An important property of MDPs is the Markov property, which assumes that state transitions depend only on the last state of the system and are independent of any previous environment states or agent actions, i.e. $p(s_{t+1} = s', r_{t+1} = r \mid s_t, a_t)$ for all $s'$, $r$, $s_t$, and $a_t$. MDP theory does not assume that $S$ or $A$ are finite, but the traditional algorithms make this assumption. In general, MDPs can be solved using linear or dynamic programming. The interested reader is referred to [30] for a more comprehensive discussion of MDPs. Furthermore, in the real world, the state transition probabilities $T_\cdot(\cdot,\cdot)$ and the rewards $R_\cdot(\cdot,\cdot)$ are unknown, and the state space $S$ or the action space $A$ might be continuous. Thus, RL represents a natural extension and generalization of MDPs for such situations, where the tasks are too large or too ill-defined to be solved using optimal control theory [25].

3.1. Q-learning

First, the Q-learning algorithm [26] is recommended as a standard solution in RL where the rules are often stochastic. The algorithm maintains a function which calculates the quality of a state-action combination, defined by $Q: S \times A \to \mathbb{R}$. Before learning has started, the $Q$ matrix returns an initial value. Then, each time the agent selects an action, it observes a reward and a new state, both of which may depend on the previous state and the selected action. The action-value function of a fixed policy $\pi$ with the value function $V^\pi: S \to \mathbb{R}$ is

$$Q^\pi(s,a) = r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, V^\pi(s'), \quad s \in S,\; a \in A \qquad (1)$$

The value of state-action pairs, $Q^\pi(s,a)$, represents the expected outcome when an agent starts from $s$, executes $a$, and then follows the policy $\pi$ afterwards, such that $V^\pi(x) = Q^\pi(x, \pi(x))$, with the corresponding Bellman equation

$$Q^*(s,a) = r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, \max_b Q^*(s',b) \qquad (2)$$

where the discount factor $\gamma \in [0,1]$ trades off the importance of rewards and $b$ is the maximizing action. Thus, the optimal values are obtained for all $s \in S$ as $V^*(s) = \max_a Q^*(s,a)$ and $\pi^*(s) = \arg\max_a Q^*(s,a)$. The value of state-action pairs is given by the formal expectation, $E$, of the expected total return $r_t$, such that $Q^\pi(s,a) = E_\pi(r_t \mid s_t = s, a_t = a)$. The off-policy Q-learning algorithm has the update rule defined by

$$Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + \alpha_t \left[ r_{t+1} + \gamma \max_a Q_t(s_{t+1},a) - Q_t(s_t,a_t) \right] \qquad (3)$$

where $r_{t+1}$ is the reward observed after performing $a_t$ in $s_t$, and where $\alpha_t(s,a)$, with all $\alpha \in [0,1]$, is the learning rate, which may be the same for all pairs. The Q-learning algorithm has problems with large numbers of continuous states and discrete actions. Usually, it needs function approximations, e.g. neural networks, to associate triplets like (state, action, Q-value). Exploration of one MDP can be done under the Markov assumption, taking into account just the current state and action, but because in the real world we have Partially Observable MDPs, we may obtain better results if an arbitrary number $k$ of history states and actions $(s_{t-k}, a_{t-k}, \ldots, s_{t-1}, a_{t-1})$ is considered [31], to clearly identify a triplet $\langle S_t, A_t, Q_t \rangle$ at time $t$.

3.2. SARSA

An interesting variation of Q-learning is the State-Action-Reward-State-Action (SARSA) algorithm [25], which aims at using Q-learning as part of a Policy Iteration mechanism. The major difference between SARSA and Q-learning is that in SARSA the maximum reward for the next state is not necessarily used for updating the Q-values. Therefore, the core of the SARSA algorithm is a simple value iteration update. The information required for the update is a tuple $(s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1})$, and the update is defined by

$$Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + \alpha_t \left[ r_{t+1} + \gamma Q_t(s_{t+1},a_{t+1}) - Q_t(s_t,a_t) \right] \qquad (4)$$

where $r_{t+1}$ is the reward and $\alpha_t(s,a)$ is the learning rate. In practice, Q-learning and SARSA are the same if we use a greedy policy (i.e. the agent always chooses the best action), but they differ when the ε-greedy policy is used, which favors more random exploration.

In traditional reinforcement learning algorithms, only MDPs with finite states and actions are considered. However, building energy consumption can take nearly arbitrary real values, resulting in a very large number of states in the MDP. Since energy consumption can be seen as a time series problem, a prior discretization of the state space is not very useful. So, we try to find algorithms that work well with large (or continuous) state spaces, as shown next.
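To make the two update rules concrete, the following minimal MATLAB sketch applies the off-policy update of Eq. (3), with the on-policy variant of Eq. (4) noted in a comment; the environment call nextStateReward and all sizes are hypothetical stand-ins, not the paper's implementation.

% Minimal sketch of tabular Q-learning (Eq. (3)) with an epsilon-greedy policy.
alpha = 0.4; gamma = 0.99; epsilon = 0.1;    % learning rate, discount, exploration
nStates = 16; nActions = 4;                  % illustrative sizes
Q = zeros(nStates, nActions);                % quality matrix, initialized arbitrarily
s = 1;
for step = 1:1000
    if rand < epsilon                        % epsilon-greedy action selection
        a = randi(nActions);
    else
        [~, a] = max(Q(s, :));
    end
    [sNext, r] = nextStateReward(s, a);      % hypothetical environment call
    % Q-learning (off-policy), Eq. (3): bootstrap on the greedy next action
    Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(sNext, :)) - Q(s, a));
    % SARSA (on-policy), Eq. (4), would instead sample aNext from the policy and
    % use: Q(s,a) = Q(s,a) + alpha*(r + gamma*Q(sNext,aNext) - Q(s,a));
    s = sNext;
end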

4. States estimation via Deep Belief Networks

Deep architectures [27] have shown very good results in different applications, such as non-linear dimensionality reduction [32], image recognition [33], video sequences, and motion-capture data [34]. For a comprehensive analysis of dimensionality reduction and deep architectures, the reader is referred to [35]. Overall, Deep Belief Networks (DBNs) can be a way to naturally decompose the problem into sub-problems associated with different levels of abstraction.

4.1. Restricted Boltzmann Machines

DBNs are composed of several Restricted Boltzmann Machines (RBMs) stacked on top of each other [36]. An RBM is a stochastic recurrent neural network that consists of a layer of visible units, $v$, and a layer of binary hidden units, $h$. The total energy of the joint configuration of the visible and hidden units $(v,h)$ is given by:

$$E(v,h) = -\sum_{i,j} v_i h_j W_{ij} - \sum_i v_i a_i - \sum_j h_j b_j \qquad (5)$$

where $i$ represents the indices of the visible layer, $j$ those of the hidden layer, and $W_{ij}$ denotes the weight connection between the $i$th visible and $j$th hidden unit. Further, $v_i$ and $h_j$ denote the states of the $i$th visible and $j$th hidden unit, respectively, and $a_i$ and $b_j$ represent the biases of the visible and hidden layers. The first term, $\sum_{i,j} v_i h_j W_{ij}$, represents the energy between the hidden and visible units with their associated weights. The second, $\sum_i v_i a_i$, represents the energy in the visible layer, while the third term represents the energy in the hidden layer. The RBM defines a joint probability over the hidden and visible layers, $p(v,h)$:

$$p(v,h) = \frac{e^{-E(v,h)}}{Z} \qquad (6)$$

where $Z$ is the partition function, obtained by summing the energy over all possible $(v,h)$ configurations, $Z = \sum_{v,h} e^{-E(v,h)}$. To determine the probability of a data point represented by a state $v$, the marginal probability is used, summing out the states of the hidden layer, such that $p(v) = \sum_h p(v,h)$.

The above equation can be used for any given input to calculate the probability of either the visible or the hidden configuration being activated. These values are further used to perform inference in order to determine the conditional probabilities in the model. To maximize the likelihood of the model, the gradient of the log-likelihood with respect to the weights must be calculated. The gradient of the first term, after some algebraic manipulations, can be written as

$$\frac{\partial \log\left(\sum_h \exp(-E(v,h))\right)}{\partial W_{ij}} = v_i \cdot p(h_j = 1 \mid v) \qquad (7)$$

However, computing the gradient of the second term is intractable.

Fig. 2. A general Deep Belief Network structure with three hidden layers. The top two layers have undirected connections and form an associative memory, where • denotes binary neurons and ◦ represents real values.

The inference of the hidden and visible layers in the RBM can be done according to the following formulas:

$$p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big) \qquad (8)$$

$$p(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j h_j W_{ij}\Big) \qquad (9)$$

where $\sigma(\cdot)$ represents the sigmoid function. Moreover, to learn an RBM we can use the following learning rule, which performs stochastic steepest ascent in the log probability of the training data [37]:

$$\frac{\partial \log(p(v,h))}{\partial W_{ij}} = \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_\infty \qquad (10)$$

where $\langle \cdot \rangle_0$ denotes the expectation under the data distribution ($p_0$), and $\langle \cdot \rangle_\infty$ denotes the expectation under the model distribution.

4.2. Deep Belief Networks

Overall, a Deep Belief Network [27] is given by an arbitrary number of RBMs stacked on top of each other. This yields a combination of a partially directed and a partially undirected graphical model. Therein, the joint distribution between the visible layer $x$ (input vector) and the $l$ hidden layers $h^k$ is defined as follows (with $h^0 = x$):

$$p(x, h^1, \ldots, h^l) = \left(\prod_{k=0}^{l-2} P(h^k \mid h^{k+1})\right) P(h^{l-1}, h^l) \qquad (11)$$

where $P(h^k \mid h^{k+1})$ is a conditional distribution for the visible units conditioned on the hidden units of the RBM at level $k$, and $P(h^{l-1}, h^l)$ is the visible-hidden joint distribution in the top-level RBM. An example of a DBN with three hidden layers (i.e. $h^1$, $h^2$, and $h^3$) is depicted in Fig. 2.

The top-level RBM in a DBN acts as a complementary prior for the bottom-level directed sigmoid likelihood function. A DBN can be trained in a greedy unsupervised way, by training each of its RBMs separately, in a bottom-to-top fashion, using the hidden layer as the input layer for the next RBM [38]. Furthermore, the DBN can be used to project the initial states acquired from the environment to another state space with binary values, by fixing the initial states in the bottom layer of the model and inferring the top hidden layer from them. In the end, the top hidden layer can be directly incorporated into the SARSA or Q-learning algorithms, as described in Algorithm 1.


Algorithm 1. RL extension including a DBN for state estimation.

1:  %% DBN for state estimation
2:  Initialize DBN
3:  Initialize training set X with the states
4:  for each RBM k in DBN
5:      repeat (training epoch)
6:          for each training instance x ∈ X
7:              Set RBM_k^visible = x
8:              Run Markov chain in RBM_k
9:              Get statistics for RBM_k
10:             Update weights for RBM_k
11:         end for
12:     until convergence
13:     for each training instance x ∈ X
14:         Set RBM_k^visible = x
15:         Infer RBM_k^hidden
16:         Replace x in X with RBM_k^hidden
17:     end for
18: end for
19: %% Use the last computed X as states for RL(·)
20: %% RL(1): SARSA algorithm
21: Initialize Q(s,a) arbitrarily, where s ∈ X
22: repeat (for each episode)
23:     Initialize s
24:     Choose a from s using the policy derived from Q
25:     repeat (for each step of the episode)
26:         Take action a, observe r, s′
27:         Choose a′ from s′ using the policy derived from Q
28:         Q(s,a) ← Q(s,a) + α[r + γQ(s′,a′) − Q(s,a)]
29:         s ← s′
30:         a ← a′
31:     until s is terminal
32: until Q optimal
33: %% RL(2): Q-learning algorithm
34: Initialize Q(s,a) arbitrarily, where s ∈ X
35: repeat (for each episode)
36:     Initialize s
37:     repeat (for each step of the episode)
38:         Choose a from s using the policy derived from Q
39:         Take action a, observe r, s′
40:         Q(s,a) ← Q(s,a) + α[r + γ max_{a′} Q(s′,a′) − Q(s,a)]
41:         s ← s′
42:     until s is terminal
43: until Q optimal

Now that we have considered the problem of state estimation and incorporated all three sub-problems into a unified approach, we turn to the experimental validation.

5. Dataset characteristics

The proposed solution is experimentally evaluated using a dataset recorded over seven years, more exactly between January 6, 2007 and January 31, 2014. The load profiles, covering different residential and commercial buildings, are made available by Baltimore Gas and Electric Company [39]. For every type of building analyzed, the available historical load data in kWh represent an average building profile per hour. Overall, there are five different building profiles, as presented in Table 1.

For a more comprehensive view of the datasets used in this paper, Fig. 3 shows the hourly evolution of the electrical energy consumption for the General Service (G) dataset, covering Commercial, Industrial & Lighting, and for a residential building with non-electric heat (R), over different time horizons.

Table 1
Building types in datasets.

Residential
  R          Residential (non-electric heat)
  R (ToU)    Residential Time-of-Use (non-electric heat)
  RH         Residential (electric heat)
  RH (ToU)   Residential Time-of-Use (electric heat)
Commercial
  G          General Service (<60 kW), Commercial, Industrial & Lighting

Fig. 3. Electrical energy consumption for a Commercial, Industrial & Lighting (G) dataset and for a residential building with non-electric heat (R) over different time horizons.

Moreover, some general characteristics of the entire dataset are graphically depicted in Fig. 4. In all experiments the data were separated into training and testing datasets. More precisely, the data collected from 1 June 2007 until 1 January 2013 (2041 days) were used in the learning phase, and the remaining data, from 1 January 2013 to 31 January 2014 (396 days), were used to evaluate the performance of the methods. The metrics used to assess the quality of the predictions of the different buildings' energy consumption are described next.
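For illustration, the chronological split described above could be written as follows in MATLAB; the table layout (columns Time and kWh) is an assumption, and the load values here are placeholders.

% Minimal sketch of the chronological train/test split, hourly resolution.
t    = (datetime(2007,1,6):hours(1):datetime(2014,1,31))';
data = table(t, rand(numel(t),1), 'VariableNames', {'Time','kWh'});
trainIdx = data.Time >= datetime(2007,6,1) & data.Time <  datetime(2013,1,1);
testIdx  = data.Time >= datetime(2013,1,1) & data.Time <= datetime(2014,1,31);
trainSet = data(trainIdx, :);   % 2041 days used in the learning phase
testSet  = data(testIdx, :);    % 396 days used to evaluate the methods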

5.1. Metrics for prediction assessment

As we mentioned earlier, the goal is to achieve good generalization by making accurate predictions for new building energy consumption data. Firstly, some quantitative insight into the generalization performance of our approach is obtained using the root mean square error, defined by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \hat{v}_i\right)^2}$$

where $N$ represents the number of multi-step predictions within a specified time horizon, $v_i$ represents the real value at time step $i$, and $\hat{v}_i$ represents the model-estimated value at the same time step. Then, the Pearson product-moment correlation coefficient ($R$) gives insight into the degree of linear dependence between the real and the predicted values. Hence

$$R(u,v) = \frac{E[(u - \bar{u})(v - \bar{v})]}{\sigma_u \sigma_v}$$

where $E[\cdot]$ is the expected value operator and $\sigma_u$ and $\sigma_v$ are the standard deviations of $u$ and $v$.

Fig. 4. General characteristics of all datasets: a box-plot with the exact values for the mean and standard deviation encoded in it.


Fig. 5. (Left) The RMSE values observed for different RBM configurations in the DBN architecture, with varying numbers of hidden neurons, as a function of training epochs. (Right) Performance metrics for the chosen RBM configuration with 10 hidden neurons.

The correlation coefficient may take on any value within the range [−1, 1]. The sign of the correlation coefficient defines the direction of the relationship, either positive or negative. Finally, we perform the Kolmogorov–Smirnov test [40] in order to gain insight into the statistical significance of our results. The Kolmogorov–Smirnov test has the advantage of making no assumption about the distribution of the data. This elaborate statistical test is not a metric typically used in the analysis of prediction accuracy, but it is imposed by the fact that the learning and testing procedures use different building types. Hence, exceeding the statistical significance level, p < 0.05, is expected and validates that the data come from different probability distribution functions.
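The three metrics are straightforward to compute; a minimal MATLAB sketch, with placeholder vectors and using corr and kstest2 from the Statistics Toolbox, is:

% Minimal sketch of the three assessment metrics from Section 5.1.
v    = rand(100, 1);                   % placeholder real values
vHat = v + 0.1 * randn(100, 1);        % placeholder model estimates
rmse = sqrt(mean((v - vHat).^2));      % root mean square error
R    = corr(v, vHat);                  % Pearson product-moment correlation
[h, pValue] = kstest2(v, vHat);        % two-sample Kolmogorov-Smirnov test, p < 0.05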

6. Empirical results

To assess the performance of our extended reinforcement and transfer learning approaches presented in Section 4, we have designed five different scenarios. These are selected to cover various multi-step predictions at different resolutions, and are summarized in Table 2. Before going into a deeper analysis of the numerical results, we first present some details of the implementation.

6.1. Implementation details

The implementation has been done in two parts. Firstly, a DBN is implemented; secondly, the RL algorithms use the DBN in their implementations for continuous state estimation, as shown next.

6.1.1. Continuous states estimation using DBN

We implemented the DBN in MATLAB® from scratch using the mathematical details described in Section 4. In order to obtain a good prediction, we carefully investigated the choice of the optimal number of hidden units in our DBN configuration with respect to the RMSE evolution; see Fig. 5.

Table 2
Summary of the experiments.

             Notation   Time horizon   Resolution
Scenario 1   S1         1 h            1 h average
Scenario 2   S2         1 day          1 h average
Scenario 3   S3         1 week         1 h average
Scenario 4   S4         1 month        1 h average
Scenario 5   S5         1 year         1 week average

Thus, the number of hidden neurons was set to 10 and the learning rate to 10⁻³. The momentum was set to 0.5 and the weight decay to 0.0002. We trained the model for 20 epochs but, as can be seen in Fig. 5 (right), the model converged after approximately 4 epochs. More details about the optimal choice of the parameters can be found in [41].
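Collected in one place, the reported DBN hyperparameters amount to the following; the struct layout itself is illustrative, not the paper's code.

% Hyperparameters from Section 6.1.1, gathered in a struct (layout assumed).
dbnOpts.nHidden     = 10;       % hidden neurons per RBM
dbnOpts.learnRate   = 1e-3;     % learning rate
dbnOpts.momentum    = 0.5;
dbnOpts.weightDecay = 0.0002;
dbnOpts.nEpochs     = 20;       % converged after ~4 epochs (Fig. 5, right)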

6.1.2. SARSA and Q-learning

We implemented SARSA and Q-learning in MATLAB® using the mathematical details described in Section 3. In both cases the learning rate was set to 0.4 and the discount factor to 0.99. Both parameters have a direct influence on the performance of the two algorithms.

The choice of these parameters was made after a thorough examination of the RMSE outcome, as shown for example in Fig. 6. Overall, the learning rate determines to what extent the newly acquired information overrides the old information, and the discount factor determines the importance of future rewards; for example, γ = 0 will make the agent "opportunistic" by only considering current rewards, while a discount factor approaching 1 will make it strive for a long-term high reward.

6.2. Numerical results

In this section, we test and illustrate the two unsupervised learning approaches using the dataset described in Section 5. Different scenarios, as summarized in Table 2, have been created to assess the performance of the proposed models.

Fig. 6. Analysis of the RMSE values obtained for different α values in the exploration step, for different scenarios. This involves the prediction of the G dataset (Commercial, Industrial & Lighting consumption, General Service (<60 kW)).


Table 3
Using the Commercial, General Service (G) (<60 kW) dataset to predict residential energy consumption (R, R(ToU), RH and RH(ToU) values) using SARSA, Q-learning, SARSA with DBN extension and Q-learning with DBN extension.

Method           G                      R                      R(ToU)                 RH                     RH(ToU)
                 RMSE  R      p-Val     RMSE  R      p-Val     RMSE  R      p-Val     RMSE  R      p-Val     RMSE  R      p-Val
S1
SARSA            0.18  0.90   5.6e−10   0.02  0.99   2.8e−23   0.10  0.93   3.9e−12   0.36  0.91   1.2e−10   0.42  0.88   4.2e−09
Q-learning       0.22  0.86   2.2e−08   0.02  0.99   2.8e−23   0.04  0.99   2.5e−21   0.34  0.92   3.3e−11   0.34  0.92   3.3e−11
SARSA+DBN        0.04  0.01   1.7e−05   0.02  0.99   3.4e−20   0.06  0.98   4.7e−14   0.04  0.99   1.2e−23   0.04  0.01   7.8e−26
Q-learning+DBN   0.01  0.99   6.9e−38   0.03  0.97   7.1e−16   0.09  0.94   2.1e−12   0.04  0.99   1.1e−23   0.02  0.99   5.3e−33
S2
SARSA            0.65  0.36   0.0097    0.75  0.52   1.3e−04   0.47  0.42   0.0029    1.23  0.55   4.3e−05   1.20  0.65   3.5e−07
Q-learning       1.09  0.17   0.2460    0.98  0.30   0.0358    0.40  0.81   1.4e−12   1.28  0.52   1.4e−04   1.55  0.41   0.0038
SARSA+DBN        0.38  0.65   1.3e−05   0.37  0.98   0.39      0.37  0.79   0.0014    0.46  0.81   0.0023    0.47  0.67   2.6e−05
Q-learning+DBN   0.33  0.84   6.2e−14   0.37  0.08   0.5539    0.29  0.74   1.7e−09   0.41  0.50   2.3e−04   0.66  0.26   0.0671
S3
SARSA            1.27  0.12   0.0877    1.73  0.21   0.0026    1.36  0.27   1.1e−04   1.59  0.29   2.6e−05   1.33  0.26   2.3e−04
Q-learning       1.39  0.09   0.2052    1.10  0.24   7.6e−04   0.83  0.23   0.0012    1.47  0.25   3.1e−04   1.61  0.06   0.3589
SARSA+DBN        0.69  0.90   1.4e−05   1.31  0.99   0.88      0.55  0.98   0.1623    1.33  0.99   0.7896    1.18  0.99   0.5244
Q-learning+DBN   0.62  0.38   4.8e−08   0.98  0.11   0.0978    0.58  0.30   2.1e−05   1.26  0.12   0.0950    1.30  0.03   0.5932
S4
SARSA            1.55  0.09   0.0128    3.70  −0.25  1.3e−11   2.39  0.07   0.0361    2.05  0.07   0.0397    1.89  0.16   1.0e−05
Q-learning       1.41  0.15   6.1e−05   1.24  0.07   0.0404    1.14  −0.04  0.2000    1.67  0.08   0.0309    1.71  0.02   0.4621
SARSA+DBN        1.14  0.98   2.2e−04   1.45  0.99   0.29      1.17  0.98   0.0025    1.33  0.99   −8.51e−5  1.21  0.99   0.1347
Q-learning+DBN   0.98  0.34   2.5e−20   1.40  0.01   0.8960    0.87  −0.13  5.2e−04   1.52  0.11   0.0022    1.55  0.17   3.2e−06
S5
SARSA            1.01  −0.08  0.5419    2.61  −0.15  0.2484    2.04  −0.20  0.1298    2.16  −0.31  0.0197    1.95  −0.29  0.0276
Q-learning       0.72  0.30   0.0208    2.28  −0.20  0.1334    1.81  −0.20  0.1267    1.83  −0.33  0.0125    1.59  −0.08  0.5542
SARSA+DBN        0.05  0.65   1.4e−08   0.08  0.66   5.3e−09   0.10  0.74   6.4e−12   0.11  0.89   6.02e−22  0.24  0.48   8.7e−05
Q-learning+DBN   0.03  0.02   0.8245    0.02  0.37   0.0031    0.02  0.37   0.0028    0.03  0.22   0.0873    0.03  0.06   0.6315


Fig. 7. Overview of the errors obtained, where (a) using the G dataset we predict the R, R(ToU), RH and RH(ToU) values; (b) using R we predict RH; and (c) using R(ToU) we predict RH(ToU). Four methods are used: SARSA, Q-learning, SARSA with DBN extension and Q-learning with DBN extension.

6.2.1. Commercial to residential transfer

In this set of experiments, we use the Commercial, Industrial & Lighting data to train the DBN model. Furthermore, we use the trained DBN model to predict four different types of unseen residential building consumption, such as residential with and without electric heat, and residential electric consumption with TOU pricing, as shown in Table 3 and Fig. 7. The analysis of the different types of residential buildings advances the insight into the generalization capabilities of our proposed method and studies its robustness by testing the behaviour on different probability distributions (see Fig. 4).

6.2.2. Residential to residential transfer

In these experiments we learn one type of residential building energy demand profile and transfer it to another type of residential building with different characteristics. More exactly, to train the learning algorithm we used (i) a residential building profile without electric heat (R), and (ii) a residential building with electric heat (RH). The prediction results of these two building models can be seen in Tables 4 and 5.

In Tables 3–5, the RMSE values show good agreement between the real values and the model-estimated values. In addition, the confidence in our results is formally determined not just by the RMSE values, but also by the correlation coefficient and the number of steps predicted into the future.

Table 4
Prediction of residential building with electric heat consumption using data collected from a residential building with non-electric heat.

Methods               RMSE   R      p-Value
Scenario 1
SARSA                 0.42   0.88   4.2e−09
Q-learning            0.44   0.87   1.1e−08
SARSA with DBN        0.42   0.88   5.8e−09
Q-learning with DBN   0.03   0.99   7.2e−27
Scenario 2
SARSA                 2.15   −0.18  0.2175
Q-learning            1.93   −0.10  0.4802
SARSA with DBN        1.25   0.61   3.7e−06
Q-learning with DBN   0.5    0.64   9.2e−07
Scenario 3
SARSA                 2.63   −0.27  8.6e−05
Q-learning            2.57   −0.18  0.0094
SARSA with DBN        2.67   0.13   0.06
Q-learning with DBN   0.69   0.09   0.1863
Scenario 4
SARSA                 2.23   0.04   0.2504
Q-learning            2.14   0.11   0.0015
SARSA with DBN        1.97   −0.09  0.01
Q-learning with DBN   0.71   −0.10  0.0072
Scenario 5
SARSA                 0.74   0.62   2.8e−07
Q-learning            0.57   0.62   2.1e−07
SARSA with DBN        0.03   0.43   4.8e−04
Q-learning with DBN   0.02   0.51   0.0259

For example, if there is just one step ahead, such as in Scenario 1, then the Pearson correlation coefficient needs to be very close to 1 or −1 in order to be considered statistically significant. However, in the case of Scenarios 3 and 4, where the prediction is made over 168 and 672 future steps, a coefficient close to 0 can still be considered highly significant. More discussion about the robustness of the correlation coefficient can be found in [42]. Still, the inaccuracy was reflected in a negative correlation coefficient in 24% of the experiments when we used the simple form of the SARSA and Q-learning methods. By contrast, our two improved approaches, SARSA with DBN extension and Q-learning with DBN extension, show a negative correlation in just 4% of the cases. Overall, the Kolmogorov–Smirnov test in most cases confirms that the data do indeed come from different distributions. This is partially due to the unique characteristics of this dataset, given by the presence of a highly non-linear profile shape and large outlier values, as seen in Fig. 4. All of these observations give a strong argument for employing a more comprehensive examination of the distributions used in the transfer learning. Nevertheless, the results presented in Tables 3–5 demonstrate that the energy prediction accuracies in terms of RMSE significantly improve in 91.42% of the cases after using a DBN for automatically computing high-level features from the unlabelled data, as compared to the situation when the counterpart RL methods are used without any DBN extension.

Table 5
Prediction of residential building consumption with electric heat using data collected from a residential building with non-electric heat, both with ToU pricing.

Methods               RMSE   R      p-Value
Scenario 1
SARSA                 0.50   0.83   1.8e−07
Q-learning            0.16   0.99   1.7e−25
SARSA with DBN        0.28   0.94   6.1e−13
Q-learning with DBN   0.24   0.99   2.0e−21
Scenario 2
SARSA                 1.69   0.33   0.0200
Q-learning            0.91   0.83   3.0e−13
SARSA with DBN        1.42   0.55   4.09e−05
Q-learning with DBN   1.18   0.77   1.0e−10
Scenario 3
SARSA                 2.69   −0.11  0.1205
Q-learning            1.65   0.17   0.0031
SARSA with DBN        1.98   0.27   1.2e−04
Q-learning with DBN   1.55   0.21   0.0167
Scenario 4
SARSA                 2.45   −0.01  0.9477
Q-learning            1.62   0.17   3.7e−06
SARSA with DBN        2.38   0.24   4.7e−11
Q-learning with DBN   1.60   0.28   3.3e−14
Scenario 5
SARSA                 0.67   0.19   0.0014
Q-learning            0.41   0.47   2.0e−04
SARSA with DBN        0.03   0.34   0.006
Q-learning with DBN   0.02   0.42   6.4e−04



Notably, the proposed approach is also suitable when we do have access to historical data. In the scope of this argument, the results obtained in the first column of Table 3 are expected to be equivalent to the results obtained with supervised learning methods, such as ANN- or SVM-like approaches. Nevertheless, for the long-term forecasting of building energy consumption (Scenario 5), the RMSE obtained using the Q-learning algorithm with the DBN extension is more than 90% better than that of Q-learning without the DBN extension in all experiments. For example, in Table 4 the RMSE is 0.02 if we use Q-learning with DBN versus 0.57 for Q-learning without DBN, yielding a 96.5% improvement in RMSE.
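For clarity, the quoted 96.5% follows directly from the two RMSE values:

$$\frac{0.57 - 0.02}{0.57} \approx 0.965 = 96.5\%$$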

7. Discussion and conclusion

In this paper, a new paradigm for building energy prediction has been introduced, which does not require historical data from the specific building under scrutiny. In a unified approach, we can successfully learn a building model, including a generalization of the state space domain, and then transfer it across other buildings. The contribution is two-fold. First, we present a Deep Belief Network for automatic feature extraction, and second, we extend two standard reinforcement learning algorithms, namely the State-Action-Reward-State-Action (SARSA) algorithm and the Q-learning algorithm, to perform knowledge transfer between domains (building models) by incorporating the states estimated with the Deep Belief Network. The novel proposed machine learning methods for energy prediction are evaluated over different time horizons with different time resolutions using real data. Notably, it can be observed that as the prediction horizon increases, the SARSA and Q-learning extensions that include a DBN for state estimation appear more robust, and their prediction error is approximately 20 times lower than that of their unextended versions. The strength of this method is given by the DBN's generalization capabilities over the underlying state space for a new building and its robustness to variations in the state representation. A forthcoming deeper investigation can be done at different Smart Grid levels in order to help the transition to the future energy system.

Acknowledgements

This research has been funded by NL Enterprise Agency RVO.nl under the TKI Switch2SmartGrids project of the Dutch Top Sector Energy.

References

[1] L. Yang, H. Yan, J.C. Lam, Thermal comfort and building energy consumption implications – a review, Appl. Energy 115 (2014) 164–173.
[2] E.A. Bakirtzis, C.K. Simoglou, P.N. Biskas, D.P. Labridis, A.G. Bakirtzis, Comparison of advanced power system operations models for large-scale renewable integration, Electr. Power Syst. Res. 128 (2015) 90–99.
[3] A. Costa, M.M. Keane, J.I. Torrens, E. Corry, Building operation and energy performance: monitoring, analysis and optimisation toolkit, Appl. Energy 101 (2013) 310–316.
[4] M. Simoes, R. Roche, E. Kyriakides, S. Suryanarayanan, B. Blunier, K. McBee, P. Nguyen, P. Ribeiro, A. Miraoui, A comparison of smart grid technologies and progresses in Europe and the U.S., IEEE Trans. Ind. Appl. 48 (4) (2012) 1154–1162.
[5] X. Li, D. Gong, L. Li, C. Sun, Next day load forecasting using SVM, in: J. Wang, X.-F. Liao, Z. Yi (Eds.), Advances in Neural Networks – ISNN 2005, Vol. 3498 of Lecture Notes in Computer Science, 2005.
[6] W.-C. Hong, Electric load forecasting by support vector model, Appl. Math. Model. 33 (5) (2009) 2444–2454.
[7] S. Wong, K.K. Wan, T.N. Lam, Artificial neural networks for energy analysis of office buildings with daylighting, Appl. Energy 87 (2) (2010) 551–557.
[8] S.A. Kalogirou, Artificial neural networks in energy applications in buildings, Int. J. Low-Carbon Technol. 1 (3) (2006) 201–216.
[9] T. Mestekemper, G. Kauermann, M.S. Smith, A comparison of periodic autoregressive and dynamic factor models in intraday energy demand forecasting, Int. J. Forecast. 29 (1) (2013) 1–12.
[10] M. Wytock, J.Z. Kolter, Large-scale probabilistic forecasting in energy systems using sparse Gaussian conditional random fields, in: Proceedings of the 52nd IEEE Conference on Decision and Control, CDC 2013, December 10–13, 2013, Firenze, Italy, 2013, pp. 1019–1024.
[11] E. Mocanu, P.H. Nguyen, M. Gibescu, W. Kling, Comparison of machine learning methods for estimating energy consumption in buildings, in: Proceedings of the 13th International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), July 10–13, Durham, UK, 2014.
[12] M. Aydinalp-Koksal, V.I. Ugursal, Comparison of neural network, conditional demand analysis, and engineering approaches for modeling end-use energy consumption in the residential sector, Appl. Energy 85 (4) (2008) 271–296.
[13] L. Xuemei, D. Lixing, L. Jinhu, X. Gang, L. Jibin, A novel hybrid approach of KPCA and SVM for building cooling load prediction, in: Knowledge Discovery and Data Mining, Third International Conference on WKDD '10, 2010.
[14] L. Suganthi, A.A. Samuel, Energy models for demand forecasting – a review, Renew. Sustain. Energy Rev. 16 (2) (2012) 1223–1240.
[15] M. Krarti, Energy Audit of Building Systems: An Engineering Approach, Mechanical and Aerospace Engineering Series, 2nd ed., Taylor & Francis, 2012.
[16] A.I. Dounis, Artificial intelligence for energy conservation in buildings, Adv. Build. Energy Res. 4 (1) (2010) 267–299.
[17] A. Foucquier, S. Robert, F. Suard, L. Stéphan, A. Jay, State of the art in building modelling and energy performances prediction: a review, Renew. Sustain. Energy Rev. 23 (2013) 272–288.
[18] H.-X. Zhao, F. Magoulès, A review on the prediction of building energy consumption, Renew. Sustain. Energy Rev. 16 (6) (2012) 3586–3592.
[19] D. Ernst, M. Glavic, F. Capitanescu, L. Wehenkel, Reinforcement learning versus model predictive control: a comparison on a power system problem, IEEE Trans. Syst. Man Cybern. B: Cybern. 39 (2) (2009) 517–529.
[20] R. Crites, A. Barto, Improving elevator performance using reinforcement learning, in: Advances in Neural Information Processing Systems, vol. 8, MIT Press, 1996, pp. 1017–1023.
[21] H. Ammar, D. Mocanu, M. Taylor, K. Driessens, K. Tuyls, G. Weiss, Automatically mapped transfer between reinforcement learning tasks via three-way restricted Boltzmann machines, in: Machine Learning and Knowledge Discovery in Databases, Vol. 8189 of Lecture Notes in Computer Science, 2013, pp. 449–464.
[22] B. Sallans, G.E. Hinton, Reinforcement learning with factored states and actions, J. Mach. Learn. Res. 5 (2004) 1063–1088.
[23] N. Heess, D. Silver, Y.W. Teh, Actor-critic reinforcement learning with energy-based policies, in: JMLR Workshop and Conference Proceedings: EWRL 2012, 2012.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, D. Hassabis, Human-level control through deep reinforcement learning, Nature 518 (7540) (2015) 529–533.
[25] R.S. Sutton, A.G. Barto, Introduction to Reinforcement Learning, 1st ed., MIT Press, Cambridge, MA, USA, 1998.
[26] C.J.C.H. Watkins, P. Dayan, Technical note: Q-learning, Mach. Learn. 8 (3–4) (1992) 279–292.
[27] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127; also published as a book, Now Publishers, 2009.
[28] M. Wiering, M. van Otterlo, Reinforcement Learning: State-of-the-Art, Springer, 2012.
[29] L. Busoniu, D. Ernst, B. De Schutter, R. Babuska, Approximate reinforcement learning: an overview, in: 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2011, pp. 1–8.
[30] M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed., John Wiley & Sons, Inc., New York, NY, USA, 1994.
[31] M. Castronovo, F. Maes, R. Fonteneau, D. Ernst, Learning exploration/exploitation strategies for single trajectory reinforcement learning, in: EWRL, Vol. 24 of JMLR Proceedings, 2012, pp. 1–10.
[32] G. Hinton, R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507.
[33] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554.
[34] G.W. Taylor, G.E. Hinton, S.T. Roweis, Two distributed-state models for generating high-dimensional time series, J. Mach. Learn. Res. 12 (2011) 1025–1068.
[35] L.J. van der Maaten, E.O. Postma, H.J. van den Herik, Dimensionality reduction: a comparative review, J. Mach. Learn. Res. 10 (2009) 66–71.
[36] G.E. Hinton, S. Osindero, A fast learning algorithm for deep belief nets, Neural Comput. 18 (2006).
[37] G.E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Comput. 14 (8) (2002) 1771–1800.
[38] R. Salakhutdinov, Learning deep Boltzmann machines using adaptive MCMC, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 2010, pp. 943–950.
[39] Baltimore Gas and Electric Company. http://www.supplier.bge.com (last visited 17.10.15).
[40] F.J. Massey, The Kolmogorov–Smirnov test for goodness of fit, J. Am. Stat. Assoc. 46 (253) (1951) 68–78.
[41] G.E. Hinton, A practical guide to training restricted Boltzmann machines, in: Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, 2nd ed., Springer, 2012, pp. 599–619.
[42] S.J. Devlin, R. Gnanadesikan, J.R. Kettenring, Robust estimation and outlier detection with correlation coefficients, Biometrika 62 (3) (1975) 531–545.
