To be precise, it considers the combination of a photovoltaic system and a heat pump which stores the produced electricity in the form of heat

(1)

Master’s Thesis Operations Research

Scheduling a heatpump under stochastic power supply and demand

A.F. Konst, (s2157187) January 15, 2016

Abstract

The Paris Agreement of December 12, 2015¹, states that countries must stimulate the use of renewable energy. However, problems arise if this renewable energy is added to the current grid, since the renewable energy is often highly volatile and uncertain.

For this reason so-called smart grids are introduced. The idea of smart grids is to shift demand during period with high demand to periods of low demand. Smart grids rely on the law of large numbers for which these systems are only applicable on a society level. Therefore, smart grids are not implemented that easily. This paper considers a smart solution for the efficient use of renewable energy on a household level. To be precise, it considers the combination of a photovoltaic system and a heat pump which stores the produced electricity in the form of heat. The aim is to schedule the heat pump, i.e., to determine when to switch the heat pump on or off, in such a way that it uses less electricity from the grid and more renewable energy from the photovoltaic system. The proposed algorithm, the simulation algorithm, takes the future into account and therefore deals with the stochasticity in both demand and supply for energy. It relies heavily on the so-called approximated dynamic programming (ADP) algorithm, that schedules the heat pump under the assumption that future supply and demand for energy are known. The ADP algorithm gives solutions that are slightly worse than the optimal solution in a deterministic setting and the simulation algorithm outperforms a greedy algorithm saving a household e239.46 on the energy bill for the months June to December of one year.

1“Framework Convention on Climate Change”. United Nations FCCC Int. United Nations. 12 December 2015.

(2)

Master’s Thesis Operations Research Supervisors

dr. W. Romeijnders University of Groningen University of Groningen drs. M.J. Konsman TNO

W.E. Wijbrandi, MSc TNO

(3)

1. Introduction

The world’s stock of fossil fuels is decreasing rapidly and the use of these fuels causes pollution that negatively influences the earth and its inhabitants. This problem motivates researchers to develop more eco-friendly sources of power, the so-called renewable energy resources. Examples of renewables are wind turbines for wind power, the hydropumps for hydropower and photovoltaic systems for solar power.

Many of the renewable energy resources depend on weather conditions and as a result they have a volatile power output that causes problems in the grid. In the grid, demand must always equal supply. Whenever supply exceeds demand, the grid will be overloaded and will shut down.

Vice versa, it is forbidden by law for an energy supplier to not meet demand. A solution to the volatility in power output is to store surplus energy and use this energy later on. However, storing energy is inefficient and expensive. Another idea is to adjust the demand instead of the supply. This is the idea behind the so-called smart grids: create flexibility by shifting demand for devices such as electric vehicles and washing machines to a point later in time. These smart grids rely on the law of large numbers, i.e., the more households that are involved, the more stable demand for electricity and the less probability of extreme values. However, integrating such a system in society takes time. Therefore, there is room and a need for a smart system along side the upcoming smart grid system that works on a household level.

To decrease the volatility in the grid caused by all renewable energy produced by households, the German government has introduced a subsidy that is rewarded to citizens who directly consume energy produced by themselves. Furthermore, the price for selling residual self-produced energy to the grid is lower than the price for purchasing electricity from the grid such that storing energy on the grid is not beneficial. Hence, a single household could lower their energy bill by inaugurating a renewable device and by optimizing the way they use their electric devices.

Since storing the energy on the grid is not beneficial, one could look for other methods to store energy. The most energy consuming demand in a household is the demand for heat. Heat, in contrast to electricity, can be stored efficiently and can be used to store energy. In contrast to the smart grid, the flexibility does not lie in shifting demand, but in storing the energy in the form of heat which can be consumed later on.

To create heat, a so-called heat pump (HP) is used. A heat pump is an electric device that extracts heat from outside air into a water heating circuit. In general, a heat pump has an efficiency between 300% to 400%. An efficiency of 300% means that for each electric Joule of input, the heat pump creates three Joules of heat as output. This phenomenon is captured in the coefficient of performance (COP). A heat pump has a minimum period of time it must be on to run at its full potential (i.e., to reach the maximum level of COP) and a minimum period of time it must be off to cool down properly and avoid wear.

The problem discussed in this paper considers a household that has installed both a solar

(4)

photovoltaic system (PV) as well as an air-to-water heat pump (HP). The aim is to efficiently schedule the use of the heat pump, i.e., to determine when to switch the HP on or off, by using the least possible amount of electricity from the grid whilst ensuring demand for heat is satisfied.

This paper dicusses five different algorithms. First, a greedy approach based on the Power- Matcher of TNO, which turns the heatpump on as soon as solar energy is available. However, it does not take into account that the heatpump must be on for a minimum period of time.

Therefore, situations arise where the heat pump is on even if there is no solar energy available.

Hence, the future supply of solar energy must be taken into account. The main algorithm, introduced in this paper and called the simulation algorithm, will schedule the heat pump while taking the uncertain future into account. The simulation algorithm is based on an algorithm that assumes all future data is known and will therefore be referred to as the deterministic algorithm or the Approximated Dynamic Programming algorithm (ADP). The simulation algorithm in the stochastic setting will be compared to a Markov Decision Process (MDP) based algorithm and the greedy algorithm. The underlying ADP algorithm will be compared to an optimal algorithm in a deterministic setting and the greedy algorithm as well, since the greedy algorithm is a real time algorithm and therefore suitable for both the stochastic and deterministic setting.

First, a literature review will be provided in Section 2. In Section 3, the problem will be defined followed by a description of the deterministic algorithm or ADP algorithm in Section 4.

In Section 5, the simulation algorithm will be defined and in Section 6 results of both the ADP algorithm and the simulation algorithm will be discussed. Section 7 concludes the paper.

2. Literature review

Cao, Hasan, and Siren (2014) introduce the combination of a heat pump and renewable energy, and state that research is needed to get this system to its full potential. However, they introduce this system in an office building, which has a demand distribution that differs from a residential building. The research done on a residential level is limited to the paper of Vialetto, Noro, and Rokni (2015). However, they make use of solid oxide fuel cells instead of a PV system.

Oxide fuel cells make use of natural gas and are therefore not dependent on weather conditions.

Furthermore, the purpose of our paper is to make use of renewable energy in an efficient way for which this paper of Vialetto et al. (2015) cannot be used. Franco and Fantozzi (2016) did investigate the perspective of a mutual interaction of a PV system and a heat pump by an experimental analysis. In their conclusion they state:

“The promotion of self-consumption strategies and in general the use of heat pumps in connection with PV plants represents surely the new frontier for the future development of systems based on renewable energy.”

To the best of our knowledge, there are no papers about the decision making algorithm of a problem with a heat pump and a renewable energy source dependent on the weather. In the literature about smart grids, it is stated by Bakker (2011) that smart grid solutions can roughly

(5)

be divided into two groups: agent based market systems, such as the PowerMatcher by TNO researcher Kok (2013), and discrete mathematical optimizations, such as Triana by Bakker (2011).

The idea of PowerMatcher is to let every household place a bid to a fictional market. An agent collects all the bids, observes the supply in renewable energy and returns a price for which demand equals supply. All households with a bid lower than the price returned by the agent will not consume electricity, and all others do. The strength of the PowerMatcher lies in its scalability and its simplicity. It makes use of the fact that the demand for electricity for a whole country does not contain huge outliers, is therefore balanced and can be better forecasted, i.e., the PowerMatcher relies on the law of large numbers. The demand of a single household does have these huge outliers and differences in demand from one minute to the other could be very large. This outlier of a single household will, in return, not be noticable on a bigger scale.

If the PowerMatcher would operate in a single household with a HP such as our problem, it will only real time look at the demand for heat disregarding demand of the future. Furthermore, it will turn the HP on when there is solar energy available disregarding the possibility that this supply of energy could only be a peak and a cloud could appear in the next minute leading to the use of electricity from the grid and that is not the idea of an eco-friendly smart system. The PowerMatcher only operates real time and this causes problems on the scale of a household, where demand and supply are very unpredictable. The algorithm introduced in this paper is compared to the performance of the PowerMatcher. The decisions the PowerMatcher makes on a household level, will be referred to as the greedy algorithm.

Triana does take future demand and supply for energy into account by the use of dynamic programming. It sends, based on a forecast of supply, temporary prices for energy for the next time period to all households in the Triana system. Households optimize based on these prices their consumption and will return this to Triana. Triana again sends, based on forecasts of supply and the information from the households, temporary prices for energy for the next time period to all households and this will continue until demand equals supply at every point in time for the next time period. Although Triana does take the future demand and supply into account, it relies on the law of large numbers and is therefore not directly applicable to our problem.

A smart grid system that does take the future into account and in fact only works on a smaller scale is the system discussed by Str¨ohle, Gerding, de Weerdt, Stein, and Robu (2014). The problem that is discussed makes a schedule for different devices which are currently demanding electricity. This schedule is based on scenarios, and devices that are frequently scheduled in the beginning of these different scenarios will be turned on and others will have to wait. For the scheduling of the deterministic scenarios, Str¨ohle et al. (2014) use a greedy algorithm to fasten the actual algorithm. Note that for the different scenarios the problem is deterministic and the scheduling algorithm will be referred to as a deterministic algorithm. This idea of scenarios will be used in this paper, referred to as the simulation algorithm, and the deterministic algorithm will be replaced by a dynamic programming as introduced by Bellman (1957). The choice for

(6)

dynamic programming is based on the structure of the problem, since the problem contains decision epochs and the number of decision epochs can be easily limited.

3. Problem Formulation

We introduce a household, having a heat pump (HP), a buffer for warm water, and a solar photovoltaic system (PV). Time is considered to be discretized and at each point in time, t, a decision has to be made on whether to turn the HP on or off. At time t, the household has a stochastic demand for warm water in the form of tapwater and central heating denoted by T W_t and CH_t, respectively. Furthermore, besides the HP, there are other electric devices that demand electricity; the amount of demanded electricity is denoted by the random variable E_t. The supply of renewable energy from the PV is stochastic as well and will be denoted as P V_t for each time period t.

The HP uses a buffer to store heat water. The temperature in this buffer is limited by a maximum temperature T_max and a minimum temperature T_min that are expressed in Joules.

The HP cannot be turned off when it has been on for less than µ_on time units and cannot be turned on when it has been off for less than µ_offtime units. When the HP is on, it requires W amount of electric power. The coefficient of performance (COP) resembles the ratio between the amount of heat in Joules provided to the power W required and is denoted as β. Furthermore, after the HP has been on for µ_on time units, indicated by the indicator variable y_t, the HP runs with higher efficiency for which the COP increases by λ × 100%. Let x_t be a binary variable that is 1 if the HP is on in period t and 0 otherwise. The temperature of the buffer at time t + 1 is defined as

T_t+1= (1 − c)T_t− T W_t− CH_t+ (1 + λy_t) · β · W · x_t, (1) with

y_t=









 1 if

t−1

X

i=t−µ_on

x_i = µ_on, 0 otherwise,

and 0 ≤ c ≤ 1 is the cooling parameter denoting the adjustment of the temperature in the buffer to the temperature of its environment. The cooling relationship is assumed to be linear since the buffer is well isolated. The simulation algorithm, to be defined in Section 5, will determine the values of x_t for t ∈ {1, . . . , H} with H denoting the fixed time horizon.

To calculate the associated costs per time unit, we introduce three different prices for electricity as are currently active in Germany. The price of purchasing electricity from the grid is denoted as p_dand the price for selling electricity to the grid is p_s. Remarkably, p_s< p_dfor which self-produced electricity is worth less than electricity produced by energy companies. By setting p_slower than p_dthe German government stimulates the direct usage of self-produced electricity rather than storing the electricity on the grid. To stimulate this even further, a subsidy, p_u,

(7)

is introduced for every amount of self-produced electricity that is consumed immediately. The prices p_u, p_s and p_d are per Joule × time unit. Another cost associated to the problem is the costs of wear and tear by turning the HP on and off. Since the number of times the HP is turned on is equal to the number of times the HP is turned off, w ≥ 0 is a fixed wear costs added when the HP is turned on. The profit at time t equals,

Π_t(x_t) = p_umin {P V_t, E_t+ W · x_t} + p_s[P V_t− E_t− W · x_t]⁺

− p_d[P V_t− E_t− W · x_t]⁻+ x_t(1 − x_t−1)w, (2) with p_u, p_s, p_d> 0 and p_s< p_d. The first term in (2) describes the profit associated with the self-produced electricity that is immediately used, i.e., the sum of the electricity used by other devices E_tand the power used by the HP restricted by the total amount of self-produced energy that is available at time t. The second term describes the amount of self-produced energy that is not consumed. The third term constitute the costs in the case that the self-produced energy is not enough to cover demand for which this term constitutes the shortage that will be purchased from the grid. Lastly, whenever x_t = 1 and x_t−1 = 0, i.e., when the HP is turned on, a fixed wear costs is included.

4. Deterministic Algorithm

For the deterministic version of the problem, it is assumed that all stochastic variables T W_t, CH_t, E_tand P V_tare known in advance. For each moment in time, a decision has to be made on whether to turn the HP on or off, i.e., determine the value of x_tfor all t, while taking constraints on the buffer and the HP into account as described in Section 3.

4.1. Approximated Dynamic Programming

The restrictions on the buffer (the temperature should be between T_min and T_max) and on the HP (at least µ_on time units on and at least µ_off time units off) limit the number of possible solutions in. To fully exploit this advantage, we make use of a Dynamic programming. To speed up the algorihm, not all possible solutions will be taken into account for which our heuristic will be denoted as the Approximated Dynamic Programming (ADP) algorithm. Dynamic programming is a method that divides the problem into subproblems. Our problem is divided into decision epochs that are moments in time for which the algorithm must determine to put the HP on or off. The heuristic selection of which decision epochs to evaluate and which not are further explained in Section 4.1.3.

The decision epochs are visualized in Figure 2. Each circle denotes a moment in time t ∈ {1, . . . , H} and therefore a decision epoch since the value of x_t must be determined in that point. Let each decision epoch have information about the current state of the problem. A state is a unique characterization of all that is important in a state of the problem that is modelled. A state in our problem is defined by (T, t) ∈ S which is a pair of the temperature in the buffer, T , and the point in time t in the state space S.

(8)

Due to the constraints, in many states (T, t) there is only one feasible value for x_tand therefore no decision needs to be made. Let S¹ and S⁰ be sets of all the states for which x_tmust be equal to 1 or 0, respectively. The extended definition of these sets can be found in Section 4.1.6. The ADP algorithm will, to decrease the number of decision epochs, only consider decision epochs with values for t that satisfy (3), i.e., states the proposed algorithm will encounter are such that in the previous period the HP was on and the constraint that the HP must be on for at least µ_on time units is satisfied. This decreases the number of decision epochs rapidly since it is assumed that the HP is off for the greater part of the day.

t−1

X

i=t−µon

x_i ≥ µ_on (3)

4.1.1. Intuition

Figure 2 serves to give an intuition on what the ADP algorithm does. Assume that the ADP algorithm is at the decision epoch with t = t₀ and T = T₀. Moreover, assume that the current state (T₀, t₀) /∈ S¹∪ S⁰ and that (3) holds.

Figure 2: Possible scenario of a subproblem of the ADP algorithm with µ_on = 4 and µ_off= 3. Each circle indicates a decision epoch at which the HP is on or off. It is assumed that the HP has been on the previous period and, if left off entirely, the red dotted line indicates the point where the temperature in the buffer would be below the minimum temperature, denoted as the critical point or τ_max. The point τ_min is the point where the HP is able to be turned on taking into account µ_off. If the HP is left on in the next period, the algorithm repeats itself at t₁. Otherwise, the HP will be turned off and turned on again at τ ∈ {τ_min, . . . , τ_max}, which will be the point with the most consecutive surplus solar energy indicated by the blue rectangle. Hence, the algorithm will be called upon again at either t₁ or t₁₂.

At t₀the ADP has two choices: either leave the HP on (x_t₁ = 1) or to turn the HP off (x_t₁ = 0).

In the first case, the next decision epoch the ADP encounters is at t₁. Since (3) holds for (T₀, t₀) and x_t₁ = 1 in this case, (3) holds for this decision epoch as well, i.e. the constraint that the HP must be on for at least µ_on time units is already satisfied and does not restrict the decision on x_t

1. In the second case, the HP is turned off. Since (3) should always hold, the ADP algorithm

(9)

determines the next point in time the HP will be turned on again. This point will be denoted by τ . For all points in time between t₀ and τ − 1, the HP will be off and for all point in time between τ and τ + µ_on− 1 the HP has no other choice than to be on, since the HP must be on for at least µ_on time units. Therefore the next decision epoch will be at τ + µ_on , with again (3) satisfied. At each new decision epoch the algorithm will again choose between leaving the HP on one more time unit or turning it off. Therefore the ADP algorithm can be seen as making jumps through time moving from one decision epoch to the other while always making sure (3) holds. Of course, with every jump, the temperature in the buffer must be updated according to (1).

4.1.2. Value function

To distinguish between the two cases of either moving to t + 1 or to τ + µ_on, the value function introduced by Bellman (1957) is used, which is given by

V (t) =











Π_t(1) + V (t + 1), if x_t= 1,

τ −1

X

i=t

Π_i(0) +

τ +µ_on−1

X

j=τ

Π_j(1) + V (τ + µ_on), if x_t= 0, (4)

where Π_t(x_t) is defined in (2). Hence the value function consists of observed profit when taking a certain action and future profit. The value of x_t which gives the highest value of V (t) will be chosen. Note that V (t + 1) and V (τ + µ_on) can only be obtained by recursively calling the algorithm. The last value, which is associated with the termination of the algorithm, will be discussed in Section 4.1.5.

4.1.3. Determination of τ

The point τ is the point where the ADP algorithm will turn the HP on again after x_t= 0. To determine its value, first a feasible set of points in time will be created to make sure that τ does not cause any constraint violation. The first feasible value for τ is τ_min = τ + µ_off since the HP must be off for at least µ_off time units. Furthermore, τ is also bound by a maximum value, denoted by τ_max, which is based on a constraint that the temperature in the buffer must be above T_min. When the HP is left off, the temperature in the buffer will decrease over time due to demand and cooling and will therefore at some point drop below the minimum temperature. Let τ_maxbe referred to as the critical point. The critical point is the last point in time for which leaving the HP off from t to τ_max will not cause T to drop below T_min. Hence the HP must have been on at least once before τ_max to cover for demand and not violate the minimum temperature. In Figure 2 the critical point is visualized as the red dotted line. The calculation of τ_max is stated in (5) where T_t is known at the current time t. The parameter φ in (5) will initially be zero and, only in the case of a considerable large amount of demand (i.e., T W_τ

max+1+ CH_τ

max+1 > β · W ), will τ_max be updated with the value of φ. The parameter φ indicates how many time units the critical point τ_max should lie earlier in time in the case that the amount of demand after τ_max is so great that the buffer must contain more heath on forehand. This will be further explained in Section 4.1.4. The parameter φ is initially zero to

(10)

calculate τ_maxafter which φ is calculated such as described in section 4.1.4 and finally τ_maxwill be updated with this value of φ.

τ_max= max

j=t,...,Hj : T_j > T_min − [φ]⁺, (5) with

T_j+1= (1 − c)T_j− T W_j− CH_j, ∀j ∈ {t, . . . , H}.

However, not all values between τ_minand τ_maxare feasible values for τ due to the constraint that the temperature is not allowed to exceed T_max. Let M ⊂ {τ_min, . . . , τ_max} contain values for τ which would violate the maximum temperature restriction, i.e. putting it on at i ∈ M would cause one of the values j ∈ {i, . . . , i + µ_on} to have a temperature T_j > T_max. The temperature in the buffer is decreasing over time for which only values of τ in the beginning of the interval are to be found in M. Hence M eliminates lower values of τ and therefore actually creates a new τ_min. The point τ can only be chosen from {τ_min, . . . , τ_max}\M.

Dynamic programming is an approach were all possible decisions are evaluated according to the value function in (4). This means that a true dynamic program would evaluate the value functions belonging to all τ ∈ {τ_min, . . . , τ_max}\M which are all feasible points to turn the HP on again. However, to speed the algorithm up, τ is heuristically selected beforehand from {τ_min, . . . , τ_max}\M. The τ that is chosen does not necessarily have to be the optimal τ , i.e., τ is not necessarily the point which gives the greatest value according to the value function.

The point τ is chosen based on the available amount of renewable energy. That is, τ is selected such that the HP will be on during the period of length µ_on in {τ_min, . . . , τ_max} which has the greatest amount of surplus solar energy when the electricity of other devices, E_t, is substracted.

In this way, in the short run the HP uses as much solar energy as possible. This calculation of τ equals,

τ = arg max_i







i+µon−1

X

j=i

[P V_j − E_j]⁺ ; i ∈ {τ_min, . . . , τ_max}\M







. (6)

Now if the HP is decided to be off for the first period, x_t, . . . , x_{τ −1}= 0 and x_τ, . . . , x_{τ +µ}

on−1 = 1.

The next decision epoch will be at τ + µ_on for which, given the values for x_t, . . . , x_{τ +µ}_on−1, (3) holds. To make things clear, take a look again at Figure 2. The values above the time units represent the surplus solar energy available, which is P V_t− E_t for t ∈ {t₀, . . . , t₁₆}. The maximum sum of surplus solar energy of four consecutive time units is indicated by the blue rectangle. Hence τ will be at t₉ and the next decision epoch at t₁₂ which is τ + µ_on.

4.1.4. Considerable large amount of demand

Whenever T W_τ_max₊₁+ CH_τ_max₊₁ > β · W , i.e., whenever the demand after the critical point τ_max is too much to be covered by the HP in one time period, the critical point τ_max must be shifted backwards in time. Let ¯φ be the number of consecutive periods with a demand higher than β · W and let φ be the number of periods the HP needs to be on to cover the demand in {τ_max+ 1, . . . , τ_max+ ¯φ}. Of course, there could be usable temperature left in the buffer at τ_max

(11)

i.e. T_max− T_min, where T_maxis calculated by (1) with x_t, . . . , x_τ_max = 0. This heat T_max− T_min can be substracted from the demand in {τ_max+ 1, . . . , τ_max+ ¯φ}. The number of time periods needed to cover this demand is φ and calculated as given in (7).

φ =





Pτ_max+ ¯φ

k=τ_max+1(T W_t+ CH_t) − [T_max− T_min] β · W





− ¯φ, (7)

where the last ¯φ indicates that the HP could be on in the period {τ_max+ 1, .., τ_max+ ¯φ} as well.

This value of φ will be used to update the value of τ_max in (5), which was initially calculated by using φ = 0.

4.1.5. Termination

The termination of the algorithm depends on the critical point τ_max, described in Section 4.1.3. The termination of the algorithm follows the rule that for each V (t) with τ_max > H, all (x_t, . . . , x_H) will be equal to 0 and V (t) =PH

i=tΠ_i(0). Hence if the critical point τ_max lies beyond the time horizon H, the HP will be off until and including H.

4.1.6. Special cases

The sets S¹ and S⁰ contain states for which the HP must be turned on and respectively turned off. States that are in S¹ violate one of the following constraints:

1. The HP has been on for less than µ_on time units, i.e., (3).

2. The temperature in the buffer will be below T_minwhen leaving the HP off, i.e., (1 − c)T − T W_t− CH_t< T_min.

3. The temperature in the buffer will be below T_min when turning the HP off for at least µ_off time units, i.e., the critical point τ_max lies before τ_min

In the ADP algorithm, the states that violate the first contraints will never be encountered, since the next decision epoch is either t + 1 or τ + µ_on with both (3) satisfied. States that are in S⁰ violate one of the following constraints:

1. The HP has been off for less than µ_off time units, i.e.,

t−1

X

i=t−µoff

> 0.

2. The temperature in the buffer will be above T_maxwhen leaving the HP on, i.e., (1 − c)T − T W_t− CH_t+ β · (1 + λ) · W > T_max.

Again, since the ADP algorithm chooses τ_min = t + µ_off, this state, for which x_t must be zero, will never be encountered. The ADP algorithm is listed in Algorithm 1 making use of Algorithm 2 and 3 in Appendix B for which the initial state (T, t) must be given by the user. However, the temperature is still not guarenteed to be always above T_min. For example, whenever the demand is greater than β · W for such a long period that even a full tank is empty. In these rare cases, the buffer can make use of an emergency gas supply to save what can be saved.

(12)

4.2. Optimal Algorithm

The ADP does not necessarily give the optimal solution since not all possible values for τ are evaluated as is discussed in Section 4.1.3. Therefore, to find the optimal solution, one replaces Algorithm 3 in Algorithm 1 by Algorithm 4 in Appendix B. The optimal algorithm will compare the value of the value function of all possible values τ ∈ {τ_min, . . . , τ_max}\M. With this algorithm, the quality of the solutions of the ADP will be tested.

4.3. Greedy Algorithm

The greedy algorithm is based on the behaviour of the PowerMatcher, introduced by TNO researcher Kok (2013) and described in Section 2. In general, when the constraints are not violated, the heatpomp will be turned on immediately when there is a proportion ρ of the power needed available in solar energy, i.e., P V_t−1− E_t−1> ρ · W . The algorithm can be found in Algorithm 5 in Appendix B.

5. Stochastic Approach

In practice, the values of T W_t, CH_t, E_t and P V_t are unknown before time t. So, in constrast to the deterministic algorithm, these parameters are considered stochastic. To make a decision on x_t, some methods of dealing with this stochasticity are listed by Spall (2003): machine (reinforcement) learning, simulation-based optimization, Markov decision processes and stochastic programming (e.g. multi-stage recourse models). To choose the appropriate approach one must take into account that the algorithm should have a low computational complexity since it must be placed inside a household. Furthermore, it would be beneficial if the algorithm could process updates in forecasts without rerunning the entire program. In Section 5.1 the stochastic approaches are briefly summarized. Readers who are familiar with these approaches can continue reading at Section 5.2.

5.1. Evaluation of Approaches

A Markov Decision Process (MDP), as described by Wiering and Van Otterlo (2012), is defined by a number of states, a set of actions, a transition matrix, and a reward function which will be thouroughly explained in Section 5.3. Though an MDP is able to optimize expected value and is therefore for some stochastic problems an optimal approach, the number of states is computationally limited and all states are independent of previous states, i.e., it only matters which state the problem is and not how it came there. In our problem, the state space is continuous since T is continuous and the states are dependent e.g. when a shower has been taken earlier that day, the probability that another shower is taken is very small. Therefore, to make the MDP approach computationally feasible, information about the problem should be omitted. As a consequence it is not known if the MDP optimizes the expected value of the actual problem.

Furthermore, if new forecasts on solar energy are available, the algorithm must entirely start over.

(13)

Reinforcement learning, as described by Wiering and Van Otterlo (2012), is a special case of a Markov Decision Process in which the actions are evaluated after its effect has been observed. However, the effect of an action is in our problem known, wherefore this method does not seem suitable.

Klein Haneveld and Van der Vlerk (2012) discuss the so-called recourse models which are, in contrast to an MDP, time dependent which means that one knows previous states and can therefore make a suitable decision. Moreover, it evaluates the expected value of an action and makes therefore a good decision under uncertainty. However, the number of states grows expo- nentially with the number of stages for which the time complexity is much worse than an MDP.

Simulation is a method in which a number of scenarios are given that are deterministically solved and evaluated. Though the method does not automatically optimize the expected value, the application by Str¨ohle et al. (2014), as already stated in Section 2, gives promising results and is computationally simple. Therefore, we choose this approach to deal with the stochasticity. The quality of the solutions are tested against the deterministic algorithm to evaluate the costs associated with the stochasticity and against an MDP algorithm.

5.2. Simulation

To deal with the stochasticity of the problem, the method stated in Str¨ohle et al. (2014), will be used and referred to as the simulation algorithm and the quality of the solutions compared to a Markov Decision Process which is expected to give better solution, though it has a higher computational complexity. At each point in time t, the simulation algorithm takes a number of randomly chosen scenarios for which the ADP will schedule the HP. The simulation algorithm will evaluate all solutions for the different scenarios to make a decision upon x_t. Firstly, the explanation on how the scenarios are created is given which is followed by a description of the simulation algorithm.

5.2.1. Scenarios

A scenario, denoted by U , is defined as U = T W_{t,...,H}, CH_{t,...,H}, E_{t,...,H}, P V_{t,...,H} and gives therefore values for the stochastic variables in the time range from t to H. The values of these stochastic variables will be based on actual data. Let K = {1, . . . , 365} represent days of the year for which data is available. A scenario U , based on data from two days k_d, k_s ∈ K, is denoted by

U^k =

T W_{t,...,H}^k^d , CH_{t,...,H}^k^d , E_{t,...,H}^k^d , P V_{t,...,H}^k^s

(8) with T W_{t,...,H}^k^d , CH_{t,...,H}^k^d , E_{t,...,H}^k^d and P V_{t,...,H}^k^s being the observations on days k_dand k_sin range t to H of T W{t,...,H}, CH{t,...,H}, E{t,...,H} and P V{t,...,H}, respectively. Assume the current day is k₀. To include seasonal effects, k_dwill be randomly drawn from K_D = {k₀− m, . . . , k₀+ m}, m ∈ N and ks will be randomly drawn from K_s = {k₀− n, . . . , k₀+ n}, n ∈ N. Hence, for day k₀, a scenario U^k is build by randomly selecting k_d ∈ K_D ⊆ K and k_s ∈ K_S ⊆ K.

To create more scenarios, one draws N_d days from K_d and N_s days from K_s and all possible

(14)

combinations of these draws will be listed in the set K with k = (k_d, k_s) ∈ K creating a total of N_d· N_s scenarios. Note that K_D and K_S only depend on k₀ and not on the current point in time t. Hence, observations about demand and supply earlier that day are not taken into account.

5.2.2. Simulation algorithm

At each point in time t of day k₀, N_d days will be randomly chosen from K_d and N_s days will be randomly chosen from K_screating the set K for which a scenario U^kwith k = (k_d, k_s) ∈ K is created. For each scenario U^k, the ADP algorithm as described in Algorithm 1 will give values for x_t, . . . , x_t+H. If in γ% of the scenarios, x_t= 0, the HP will be turned off or left off, else the HP will be turned on or left on.

There are states (T, t) in which there is no choice for x_t since constraints will or have been violated. Let these states again be in the set S¹ and S⁰ for respectively x_t = 1 and x_t = 0.

States that are in S¹ violate at least one of following constraints:

1. The HP has been on for less than µ_on time units, i.e.,

t−1

X

i=t−µ_on

x_i < µ_on.

2. The current temperature is below T_min.

Since the problem contains stochasticity, it could happen that there is a great amount of demand whilst there was no or barely demand expected and accounted for. Therefore, it could happen that the temperature in the buffer is lower than the minimum allowed temperature for which the HP must be turned on immediately. States that are in S⁰ violate at least one of the following constraints:

1. The HP has been off for less than µ_off time units, i.e.,

t−1

X

i=t−µ_off

x_i> 0.

2. The current temperature is above T_max.

Again, since the problem contains stochasticity, it could happen that there is no or barely demand at all whilst a greater amount was expected and accounted for. Therefore, it could happen that the temperature in the buffer exceeds the maximum allowed temperature for which the HP must be turned off immediately.

The simulation algorithm can be found in Algorithm 6 in Appendix B. It describes how the value of x_tis calculated. To determine all values for x₁, . . . , x_H that constitute a solution, one has to iteratively call the simulation algorithm and with each iteration update (T, t).

5.3. Markov Decision Process

A Markov Decision Process is defined as a discrete stochastic dynamic program and is used when decisions have to be made under uncertainty which will influence future decisions and

(15)

rewards. These decisions have to be made at certain points in time. At such a decision epoch, the system is in a certain state. Moving from one state to the other is dependent on the decision made and a transition probability. Each state has its own reward functions, including current rewards and future rewards, regarding a certain decision or what we call actions. Hence, a Markov Decision Process contains a state space S, an action space A, a transition matrix M and a reward function R. Each element will be thoroughly discussed in Sections 5.3.1 to 5.3.4.

5.3.1. States

A state is a unique characterization of all that is important in a state of the problem that is modelled, denoted by s ∈ S. The state space S is a finite set of states {s¹, . . . , s^N} with N ∈ N being the number of different states in which the problem could appear. In our problem the elements that define a state are the temperature of the buffer, T , the number of time periods the HP has been on, N_on, or off, N_off, and the current point in time, t. For simplicity, the temperature of the buffer and the time periods are discretized for which the formal definition of a state is given by

s = {T, N_on, N_off, t}, (9)

with

T ∈ {T_min, . . . , T_max}, N_on ∈ {0, . . . , µ_on}, N_off∈ {0, . . . , µ_off}, t ∈ {1, . . . , H}.

Note that N_on > 0 ⇔ N_off= 0 and N_off> 0 ⇔ N_on= 0 since the HP cannot be both a strictly positive number of periods on and a strictly positive number of periods off at the same time.

5.3.2. Actions

The future state of the system can be influenced by actions. The set of actions A is defined as the finite set {a¹, . . . , a^L} where L ∈ N is the number of actions. For each state s ∈ S there is a subset A(s) ⊆ A with all feasible actions since not all actions can be done in every state (for example, the HP has not been on for at least µ_on time units, so the action must be “on”

for this period). The action space in our problem is the finite set A ∈ {0, 1} indicating either putting the HP off (a = 0) or on (a = 1).

5.3.3. Transition matrix

When taking action a ∈ A in a state s ∈ S, the system makes a transition to a new state s⁰ ∈ S, based on a probability distribution over the set of possible transitions. Hence, the transition matrix M is defined over: S × A × S → [0, 1], i.e., it gives the probability of moving from state s by taking action a to state s⁰. Note that 0 ≤ M (s, a, s⁰) ≤ 1 andP

s⁰∈SM (s, a, s⁰) = 1, for all

(16)

feasible combinations of states s and action a. Note that the probabilities are not influenced by states previous to state s or previous actions, i.e., the Markov property holds given by

P (s_t+1|s_t, a_t, s_t−1, a_t−1. . .) = P (s_t+1|s_t, a_t) = M (s_t, a_t, s_t+1).

For our problem, the probabilities are classified into two cases:

M (s, a, s⁰) ≥ 0 If t_s+ 1 = t_s⁰ and, N_on

s0 = 0, if a = 0 N_on

s0 > 0, if a = 1 N_off

s0 > 0, if a = 0 N_off

s0 = 0, if a = 1 T_s≥ T_s⁰, if a = 0 M (s, a, s⁰) = 0 otherwise.

For the first case, if all conditions hold, the transition probability is equal to the probability of having a T_s− T_s⁰ amount of demand if the action is “off”. Note that this probability could be zero if this size of demand is never observed. If the action is “on”, the probability is equal to having an T_s− T_s⁰+ β · W · (1 + λ [N_on− µ_on+ 1]⁺) amount of demand, where the last maximum value is equal to zero except when N_on= µ_on. Hence the probabilities are:

M (s, 0, s⁰) = P T_s− T_s⁰ = CH(t_s) + T W_t

s , M (s, 1, s⁰) = P

T_s− T_s⁰ = CH(t_s) + T W_t_s− β · W · (1 + λ [N_on− µ_on+ 1]⁺)

. Note that, since T is discretized, both CH_t

s and T W_t

s are discretized as well such that the probability will not be zero. Hence, there is a possibility that the amount of demand is too small to make a significant change in T and T will therefore not change. Accordingly, the cooling parameter is disregarded.

5.3.4. Reward Function

The Reward Function is defined as R : S × A → R, and therefore specifies a reward for doing some action in a certain state. The reward function is used to value a state for which later on, an algorithm can favour between states to come to an optimal strategy. In our problem, the reward functions are stated as follows based on (2):

R_on(s) = p_umin n

F_t^{P V}_s , F_t^E_s + W o

+ p_s h

F_t^{P V}_s − F_t^E

s − Wi+

− p_dh

s − Wi−

R_off(s) = p_umin n

F_t^{P V}_s , F_t^E_s o

+ p_s h

s

i+

− p_dh

s

i−

(17)

with F_t^{P V}_s denoting the forecast at time t_s in terms of expected value of the available solar energy and F_t^E_s the expected value of the demand for electricity for other devices at time t_s. The state space, the action space, the transition matrix and the reward function all define a Markov Decision Process, or with a formal definition cited from Wiering and Van Otterlo (2012).

Definition. A Markov decision process is a tuple hS, A, M, Ri in which S is a finite set of states, A a finite set of actions, M a transition function defined as M : S × A × S → [0, 1] and R a reward function defined as R : S × A → R.

5.3.5. Algorithm

One of the most efficient algorithm to calculate an optimal policy for a Markov decision process is the value iteration algorithm introduced by Bellman (1957). It is based on dynamic programming and makes use of the so-called value function V (s), with the optimal policy satisfying:

V^∗(s) = max

a∈A





 X

s⁰∈S

M (s, a, s⁰) R(s, a, s⁰) + V^∗(s⁰)







. (10)

Equation (10) gives the expected reward of undertaking a certain action. Let i ∈ N be an index for the number of iterations and let (11) be the updating rule for the value function.

V_i+1(s) = max

a∈A





 X

s⁰∈S

M (s, a, s⁰) R(s, a, s⁰) + V_i(s⁰)







. (11)

Starting with a value function V₀ over all states, by using (11), in the limit on the number of iterations, lim_i→∞V_i(s) = V^∗(s). In general, the algorithm will stop when ||V_i+1(s)−V_i(s)|| < .

The value iteration algorithm is illustrated in Algorithm 7 in Appendix B. Based on the final V (s), at s one chooses the action that maximizes the expected value, see (10), which will lead to an optimal policy for all states. In our problem, states will be updated according to (1) and rule t = t + 1. Then, the state defined in the MDP which comes closest to the current state, will be chosen and an action taken according to its policy.

6. Numerical Results

The performance of both the deterministic ADP algorithm and the stochastic simulation algorithm will be evaluated on instances that are created based on available data. The data consist of per minute observations on P V_t, E_t, T W_tand CH_tin the months June to December which consists of 214 days. The time t is discretized is steps of 5 minutes.

An instance is dependent on the time horizon H, i.e., the number of decision epochs, and is created by splitting the 214 days of data into periods of length H. Let H_ADP be the number