Saving energy in buildings using an Artificial Neural Network for outlier detection


Jonas Lodewegen 10203745

Bachelor thesis Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie University of Amsterdam

Faculty of Science Science Park 904 1098 XH Amsterdam

Supervisor

dhr. dr. M.W. (Maarten) van Someren Institute of Informatics (IVI)

Faculty of Science University of Amsterdam

Science Park 904 1098 XH Amsterdam


Summary

This project, based on previous work by De Nadai (2013), applies an artificial neural network (ANN) to predict the gas consumption of buildings of the Hogeschool van Amsterdam. An outlier detection method based on Chebyshev’s theorem is used to detect anomalies in this consumption. Furthermore, a combined outlier detection is proposed to distinguish between causes of anomalies. In addition, an attempt is made to create a generic predictive model that parametrises buildings based on their characteristics. The aim of this second part of the project is to predict the gas consumption of buildings the ANN was not trained on. In accordance with previous work (described in section 2), the first part of this project led to promising results. Possible implications of the (combined) outlier detection are described and have to be validated with the help of HVAC system experts. The results of the attempt to build a generic model, on the other hand, were unsatisfactory, and suggestions for future work are listed.


1. Introduction

The need for energy in the world is increasing rapidly due to a continuous growth of energy consumption. This need has grown by 49% between 1984 and 2004 and the consumption is still growing. It is expected that the need of developing countries (Southeast Asia, South Africa) will exceed the consumption of developed countries (Western Europe) in 2020. Moreover, China will double its need within 20 years (Pérez-Lombard et al., 2008).

This increasing demand for energy raises multiple concerns: countries question whether they can keep meeting the growing needs of their economies, energy sources (mainly fossil sources) are being exhausted, and growing CO2 emissions cause climate change. For these reasons, the European Union aims to reduce CO2 emissions by 20% from 1990 levels by 2020 and to improve the EU’s energy efficiency (European Commission, 2009).

The total global energy consumers can roughly be divided into three categories: industry, transport and ‘other’ (other consumers). Consumption by buildings accounts for almost the whole third category and thereby domestic and non-domestic buildings together account for 20-40% of the total energy consumption (U.S. Energy Information Administration, 2015). In the EU, this share was 37% with an average growing rate of 1.9% in 2004. In 2012 the ‘other’ category was responsible for 43.4% of the world’s total consumption of natural gas (International Energy Agency, 2014).

It is estimated that Heating, Ventilating and Air Conditioning (HVAC) systems use about 50% of a building’s total energy consumption in the EU, which is about one fifth of the total energy consumption (Pérez-Lombard et al., 2008). However, there is insufficient data to support these values, and therefore it can only be said that HVAC systems in residential and non-residential (e.g. offices) buildings account for a significant part of the world’s total energy usage. Schein et al. (2006) state that their proposed ‘fault detection and diagnostics’, a technique that intends to benefit building owners by reducing energy consumption, has an energy-saving potential of 10-40% of an HVAC system’s total energy consumption. Moreover, since the European Commission wants to reach its 2020 targets to use energy more efficiently and reduce CO2 emissions, it is worth investigating methods to reduce the share of HVAC systems in the world’s total energy usage. Previous research has indicated that Artificial Neural Networks (ANNs) are capable of predicting the energy consumption of a building, and has given suggestions for improving the models and data that were used (see section 2). If the predictions by ANNs were used to detect anomalies in the consumption of an HVAC system, this could help experts in configuring these systems and thereby in reducing energy consumption.

De Nadai (2013) used an ANN to predict the gas consumption of a building. The input variables for training the ANN were limited to gas consumption, weather variables (temperature and so on) and an approximation of the usage of the building. Besides that, Seasonal-Trend Decomposition (STD) based on Loess was used, due to the gas consumption being highly seasonal. Most of the data De Nadai (2013) used is nonlinear. Nevertheless, a hybrid approach was proposed (an ANN combined with ARIMA) to help the ANN to train and forecast with the use of linear data. (More on STD can be found in section 3.2.2.)


and to detect anomalies in this consumption. The approach is based on De Nadai (2013); however, the features generated by applying ARIMA are left out of consideration. Combined outlier detection is proposed in order to differentiate outliers by their causes. In addition, this project examines whether ANNs are suitable for constructing a generic model in which buildings are parametrised, so that it can be used to predict the consumption of buildings it was not trained on.

This report consists of a literature review where relevant previous research is displayed. Secondly, the methods used for training and testing the ANN are described, followed by the obtained results. The assumptions that were made, the model that was used and the usability of the results are discussed in the following part. Since it was not possible to collect all required data and results of this project have to be further investigated, some suggestions for future work are listed. Finally, the outcomes are concluded and some implications for HVAC system experts are described.


2. Literature Review

2.1. Factors for energy use in buildings

When constructing a model to predict the energy consumption of a building, it is important to know what factors contribute to the energy need of buildings, since those factors have to be represented in the model. There are many factors that influence the energy consumption of a building. Those factors can be divided into three groups: physical environmental factors, designing parameters and human thermal discomfort (De Nadai, 2013).

• Physical environmental factors include the outside temperature, wind speed and sun irradiation. Amongst those, outside temperature and wind speed have a significant influence, since most energy is used for heating buildings, and heat gain and loss depend on outside temperature and windiness. Irradiation, sun hours and humidity can also have some influence. These factors can be represented in a dataset, for which the data is available from weather stations.

• Designing parameters of the building consist of transparency, building materials and orientation of the building. The amount of transparency influences the amount of energy the building gains during sunny hours. The insulating properties of a building depend strongly on the materials used. Orientation determines, for instance, how the wind streams along the building. These factors can be found in the building plans and on land maps.

• Human thermal discomfort is the least well-defined group of factors: it is complicated to measure and differs for every individual inside the building. The way humans experience the comfort of the environment is influenced by, for instance, sun irradiation, outside temperature and humidity. Due to the individualistic and subjective nature of human thermal discomfort, it is not possible to represent it in a dataset. Therefore, these factors are not taken into consideration in this project.

2.2. Artificial Neural Networks

Artificial Neural Networks (ANNs) are mathematical models of the brain’s activity, which consists primarily of electrochemical activity in networks of brain cells called neurons (Russell and Norvig, 2010). ANNs consist of artificial neurons (also called nodes or units) that are connected by directed links. Each link has a weight that determines the strength and sign of the connection between the nodes. Simple ANNs consist of only one or more input nodes (the input layer) and one or more output nodes (the output layer). So-called multilayer ANNs also have layers in between, the hidden layers. All units of a layer are connected to every unit of the next layer. The weights of these links are learned (automatically adjusted) by presenting training data to the ANN. More background information about ANNs can be found in Russell and Norvig (2010).

This project uses a three-layer perceptron with one input node per feature, a varying number of nodes in the hidden layer and one output node for the predicted gas consumption. Figure 1 displays this model.


Figure 1: The multilayer perceptron with n input nodes I1 to In (the features), one hidden layer with 150 nodes (H1 to H150) and one output node O1 (the predicted gas consumption) that was used for some experiments in this project.
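As an illustration of the architecture described above, the following minimal sketch computes one forward pass through a network with n inputs, 150 hidden nodes and one output node. It is not the Pylearn2 model used in this project: the sigmoid activation, the random initial weights and the number of features (8) are assumptions made only for the example.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """Weighted sum of the inputs plus bias for every unit in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(features, hidden, output):
    """Input layer -> sigmoid hidden layer (assumed activation) -> linear output node."""
    activations = [1.0 / (1.0 + math.exp(-z)) for z in dense(features, *hidden)]
    return dense(activations, *output)[0]


rng = random.Random(0)
n_features = 8                                # hypothetical number of input features
hidden = init_layer(n_features, 150, rng)     # 150 hidden nodes, as in figure 1
output = init_layer(150, 1, rng)              # one output node
prediction = forward([0.5] * n_features, hidden, output)
```

Training would adjust the weights so that the output approaches the measured gas consumption; the sketch only shows how a prediction is computed from given weights.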

2.3. Predicting energy consumption with ANNs

Predicting energy consumption is not a new field of work. For example, different methods have been deployed at local gas distributors (which serve households) that notify the pipeline company on a daily basis about the amount of gas they expect to sell. The local distributors get penalized if the estimation error exceeds certain limits (Brown and Matin, 1995).

Even more important is a good estimation of the energy need of a whole country. In the 1970s there were two major energy/oil crises due to political unrest in the Middle East. During those crises energy prices rose drastically, leading countries to start building energy reserves. Since storing energy is expensive as well, countries can benefit from good estimations of their energy needs.

Brown and Matin (1995) used an ANN to predict the energy need of small regions in the United States. They state that one of the biggest sources of errors in the forecasts is incorrect mathematical models that are not capable of representing non-linear data (for example, the weather). ANNs, on the other hand, perform well on non-linear data and may work better. Brown and Matin (1995) argue for and construct many features, for instance the expected heating degree and windiness of a window of days around the day for which the energy need is being forecast. After being trained on a training set with data from 97 days, the ANN performed better than a linear regression model and the human experts who are responsible for making the estimations.


Ekici and Aksoy (2009) applied energy prediction with ANNs to the scope of one building. The aim of the project was to predict the heating energy requirements of a building, which is the amount of energy needed to heat the building. For predicting the heating energy requirements, multiple artificial designing parameters were used as features: the transparency ratio, form factor and orientation of the building, the optical and thermo-physical properties of the used materials, and the distance to other buildings. The results of an average accuracy of 94.8-98.5% indicate that the three-layered ANNs that were used can predict the amount of energy that is needed to heat a building. Therefore, ANNs may assist building designers in creating energy efficient buildings. In the second part of this project, where an approach to build a generic predicting model is proposed, features suggested by Ekici and Aksoy (2009) are used.

There are many software systems available that model the energy usage of a building. For instance, EnergyPlus is a complex, robust building simulation system that can predict the energy need based on information on the materials used, geometric data, characteristics of the HVAC system and weather data. Neto and Fiorelli (2008) compared the performance of this software system to that of a neural network. They started with only the external temperature as input parameter for three different ANNs: one that was trained on and worked with only weekdays, one for weekend days and one for all days. The last one gave a higher error rate, regardless of the number of hidden layers and their number of neurons.

Adding more weather features to the ANN and comparing the results indicated that the outside temperature is by far the most important weather feature, since the error rate improved only slightly when humidity and solar radiation were added.

The results obtained by Neto and Fiorelli (2008) showed that the ANNs performed a little better than the specialised software system EnergyPlus. The researchers expect that adding more features can improve the performance of both systems and can help make the HVAC systems of buildings more energy efficient.


3. Method and Approach

This section describes the approach that was taken to train an Artificial Neural Network to predict the gas consumption of a building, along with the implemented method for outlier detection. The approach is based on previous work by De Nadai (2013). Secondly, it describes how the generic model for predicting the gas consumption of new buildings (without training) was constructed.

3.1. Available data

In the literature review part of this project, it was stated that the factors for energy need in buildings can be divided into three categories: physical environmental factors, designing parameters of the building, and human thermal discomfort. Because of the individualistic and subjective nature of human thermal discomfort, the last group of factors is left out of consideration in this project.

3.1.1. Physical environmental factors

When it comes to the input features for the ANN, weather variables are the obvious starting point. Outside temperature is the most influential, followed by wind speed. Other weather variables that can improve the performance of the ANN are solar irradiation and humidity. These influence the heat gain and loss of a building, but even more so human thermal discomfort: how the atmosphere ‘feels’ to people present in the building. The data for these variables come from Schiphol airport and are available online.

In addition, data from the HVAC systems of the buildings of the Hogeschool van Amsterdam was used. That data was provided by an engineering company called Ebatech and is not publicly accessible. It contains the gas, electricity and water consumption of the buildings and can be used to train the ANN to predict the gas consumption.

3.1.2. Designing parameters

The surface area of the buildings is made available by the Hogeschool van Amsterdam. Since the area correlates with the volume of the building and the volume influences the amount of energy needed to heat the building, this can be a useful feature for the ANN.

3.1.3. Use of the building

Human factors are left out of consideration in this project. However, the HVAC systems are programmed based on the use of the building. In other words, the employment of the HVAC systems differs if there are (supposed to be) people inside the building. Therefore, some way of representing the use is needed.

In order to do so, the day of the week was added to the data: Monday is encoded as 0, Tuesday as 1, and so on. With the help of this feature, the ANN can learn to distinguish weekdays from weekend days.


Buildings used for education are typically open on weekdays and closed on Saturdays and Sundays. However, this approach does not take holidays into account, and since the Hogeschool van Amsterdam sometimes gives its students a whole week off, this might hamper the functioning of the ANN.

To solve this possible cause of errors, a dataset containing all holidays of the past years was used. All holidays were processed as being a weekend day.
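The encoding described above can be sketched as follows. The holiday dates and the choice to map a holiday to the value of a Sunday are assumptions for illustration; the text only states that holidays were processed as weekend days.

```python
from datetime import date

# Hypothetical holiday list; the project used a dataset of holidays of past years.
HOLIDAYS = {date(2013, 4, 30), date(2013, 5, 9)}


def day_feature(day, holidays=HOLIDAYS):
    """Return 0 for Monday up to 6 for Sunday; holidays are mapped to a weekend day."""
    if day in holidays:
        return 6  # assumption: a holiday is encoded like a Sunday
    return day.weekday()


def is_closed(day, holidays=HOLIDAYS):
    """True when the building is assumed to be closed (weekend or holiday)."""
    return day_feature(day, holidays) >= 5
```

For example, `day_feature(date(2013, 5, 1))` gives 2 (a Wednesday), while the holiday `date(2013, 4, 30)` gives 6 although it falls on a Tuesday.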

Figure 2: One week of data (Monday-Sunday). The gas consumption clearly shows a day/night cycle and the difference between weekdays and weekend days. This plot displays that as the temperature increases during the week, the gas consumption decreases. (Plot not reproduced; horizontal axis: date, tick labels from Jan 02 2008 to Jan 06 2008; vertical axes: gas [m3] and temperature.)

3.2. Predicting gas consumption for one building and (combined) outlier detection

3.2.1. Feature engineering

After collecting and cleaning (interpolation for missing rows and removal of duplicate data) all the data, extra features were generated to assist the ANN in predicting the gas consumption.

Figure 3: The same week as displayed in figure 2, with the gas consumption and the radiation of the sun. (Plot not reproduced; horizontal axis: date, tick labels from Jan 02 2008 to Jan 06 2008; vertical axes: gas [m3] and radiation of the sun.)

One example is the temperature of the past hours. These types of features can support the ANN, since buildings hold energy and therefore warmth, and losing the warmth gained during a couple of hot days can take longer than just one day. Another example is the set of time-series variables: it is argued that applying the sine function to those variables can lead to useful features. The following categories of features were generated in order to assist the ANN:

• Rolling mean, maximum and sum of gas consumption: the average gas consumption of the previous 15 days, the previous hour and the previous two hours, since buildings can hold warmth (and cold) for a longer period than the scale on which the predictions are performed (one hour). Likewise, the sum of the 5 previous hours and the maximum consumption of the previous hours are added.

• The maximum temperature and gas consumption of the current hour.

• The next day: for instance, it is assumed that the HVAC system is configured to start heating the building on Sunday evening, since the building will be used on Monday.

• Sine and cosine of time variables: the feature representing the hour of the day ranges from 0 to 23. After the 23rd hour, the next day starts at hour 0, which is a numerical difference of 23 although the actual difference is only 1 hour. Adding the sine and cosine of the hour variable can support the ANN when training on the hour variable.
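Two of the feature categories listed above can be sketched in a few lines. The helper names and the hourly readings are hypothetical, and the window sizes are shortened for readability; the sketch only illustrates the idea of rolling statistics over previous hours and of a cyclic encoding of the hour.

```python
import math


def cyclic_hour(hour):
    """Map hour 0..23 onto the unit circle so that hours 23 and 0 become neighbours."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


def rolling(values, window, func):
    """Apply func to the `window` values preceding each position (None without enough history)."""
    out = []
    for i in range(len(values)):
        out.append(func(values[i - window:i]) if i >= window else None)
    return out


def mean(xs):
    return sum(xs) / len(xs)


gas = [10.0, 12.0, 11.0, 15.0, 14.0, 9.0, 8.0]  # hypothetical hourly readings in m3
mean_prev_2h = rolling(gas, 2, mean)   # average of the two previous hours
sum_prev_5h = rolling(gas, 5, sum)     # sum of the five previous hours
max_prev_5h = rolling(gas, 5, max)     # maximum of the five previous hours
```

With this encoding the distance between hour 23 and hour 0 equals the distance between hour 0 and hour 1, which is the property the sine and cosine features are meant to provide.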

A list of all features that were generated can be found in appendix A. The features that were used by De Nadai can be found there as well.

Since the available data is highly seasonal, STD is used, like De Nadai (2013). However, the hybrid approach is omitted. In the following section the results will be compared to the results obtained by De Nadai’s study.


3.2.2. Seasonal-Trend Decomposition

Seasonal-Trend Decomposition (STD) is a statistical method to decompose a time series into components that each have a certain characteristic. Those components can be used to reconstruct the original time series by addition or multiplication (Wikipedia, 2014). Cleveland et al. (1990) suggest a method for STD based on Loess, which decomposes a time series into trend, seasonal and remainder components. The remainder component describes how each data point diverges from the seasonal and trend components and could be a useful feature for the ANN. Adding the remainder to the seasonal and trend components yields, as said before, the original time series (data).

u(t) = s(t) + T(t) + r(t)

Gas usage (u) at time t equals the sum of the seasonal (s), trend (T) and remainder (r) values at time t.
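The additive identity above can be illustrated with a deliberately naive decomposition. This sketch uses a moving-average trend and per-position seasonal means; it is not the Loess-based procedure of Cleveland et al. (1990), and the series and period are invented for the example.

```python
def decompose(series, period):
    """Naive additive decomposition u = s + T + r (moving-average trend, not Loess)."""
    n = len(series)
    half = period // 2
    # Trend: moving average over roughly one period, truncated at the edges.
    trend = []
    for t in range(n):
        window = series[max(0, t - half):min(n, t + half + 1)]
        trend.append(sum(window) / len(window))
    # Seasonal: mean detrended value for each position within the cycle.
    detrended = [u - tr for u, tr in zip(series, trend)]
    cycle = []
    for k in range(period):
        vals = detrended[k::period]
        cycle.append(sum(vals) / len(vals))
    seasonal = [cycle[t % period] for t in range(n)]
    # Remainder: whatever the seasonal and trend components do not explain.
    remainder = [u - s - tr for u, s, tr in zip(series, seasonal, trend)]
    return seasonal, trend, remainder


usage = [5, 9, 4, 6, 10, 5, 7, 11, 6, 8, 12, 7]  # hypothetical readings with period 3
s, T, r = decompose(usage, period=3)
```

By construction, s(t) + T(t) + r(t) reproduces the original value at every t, and the remainder r can be passed to the ANN as an extra feature.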

3.2.3. Training the model and predicting the gas consumption

The first goal of the project was to predict the gas consumption of one of the HvA buildings. Two buildings seemed to have the most consistent and complete, and thereby usable, data: 740-NTH (Nicolaes Tulphuis, Tafelbergweg 51, Amsterdam) and 761-KMH (Koetsier Montaignehuis, Mauritskade 11, Amsterdam). (See appendix C for the building locations.) Features for these buildings were generated as described before and the dataset was fed to an ANN implemented in Pylearn2, a Python library for machine learning that is still being developed. Pylearn2’s multilayer perceptron was used with stochastic gradient descent as learning algorithm (learning rate 0.02). Pylearn2 is configured through a YAML file. An example of the configuration file used for these predictions can be found in appendix B. The results of these first predictions can be found in table 1 and in figure 4.

Building   Epochs   RMSE    MAE
740-NTH    150      15.58   12.01
761-KMH    150      3.25    2.12

Table 1: Initial prediction results of building 740-NTH and 761-KMH, shown are the root mean square error (RMSE) and mean absolute error (MAE).
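For reference, the two error measures reported in the table can be computed as in the sketch below. The sample values are invented and unrelated to the buildings in this project.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 12.0, 8.0, 15.0]       # hypothetical measured consumption (m3)
predicted = [11.0, 10.0, 8.0, 18.0]    # hypothetical predictions (m3)
example_rmse = rmse(actual, predicted)
example_mae = mae(actual, predicted)
```

Because the RMSE squares the errors before averaging, it is never smaller than the MAE and penalises large individual errors more heavily.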

3.2.4. Outlier detection

Figure 4: The predicted and actual gas consumption of two buildings: (a) building 761-KMH and (b) building 740-NTH. Results of one week are displayed. (Plot not reproduced; horizontal axes labelled Days (a) and Hours (b), tick labels from 01 May 2013 to 05; vertical axes: gas (m3); series: actual and predicted.)

For the detection of outliers, a method using Chebyshev’s theorem, described by Amidan et al. (2005), was implemented. It assumes that the dataset consists of independent measurements and that there are relatively few outliers. The distribution of the data does not need to be known, since Chebyshev’s theorem bounds the share of data points that fall more than k standard deviations from the mean (at most 1/k² for any distribution). Using a chi-squared distribution and the theorem that states that almost all observations fall within the range [µ − 3σ, µ + 3σ], the outlying data points were detected.
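The thresholding idea behind this method can be sketched as follows. This is a simplified single-pass version and not necessarily the exact procedure of Amidan et al. (2005) or of this project: the residual definition (actual minus predicted), the probability p = 0.05 and the data are assumptions for the example.

```python
import math
import statistics


def chebyshev_outliers(actual, predicted, p=0.05):
    """Flag points whose residual lies more than k standard deviations from the mean residual.

    Chebyshev's inequality states that at most 1/k**2 of any distribution lies
    that far from the mean, so k = 1/sqrt(p) keeps that share at or below p.
    """
    residuals = [a - q for a, q in zip(actual, predicted)]
    mu = statistics.mean(residuals)
    sigma = statistics.pstdev(residuals)
    k = 1.0 / math.sqrt(p)
    return [abs(r - mu) > k * sigma for r in residuals]


# Hypothetical data: 40 small prediction errors and one manually inserted outlier.
predicted = [50.0] * 41
actual = [50.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(40)] + [80.0]
flags = chebyshev_outliers(actual, predicted)
```

In this example only the last point is flagged. Note that a single extreme value also inflates the standard deviation, which is one reason to require that outliers are relatively rare.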

The results of applying the outlier detection on the data with predicted and actual gas consumption can be found in figure 5.

3.2.5. Combined outlier detection

In the literature review part of this report, the many factors that influence the gas consumption of a building are described. Not all of those factors can be represented in the ANN, and therefore outliers may always occur, for varying reasons. In this project, it is argued that outliers that occur at multiple buildings at the same moment in time have a different kind of cause than outliers that occur at only one building. For instance, if the weather differs from the predictions, this will influence the realised gas consumption and could lead to a detected outlier at that moment in time. However, this does not indicate that the HVAC systems of the buildings are configured incorrectly. On the other hand, an outlier that occurs at only one building could indicate that the configuration is not optimal.

Hence, the usability of the outlier detection for HVAC system experts can benefit from the detection of outliers occurring at multiple buildings. For building 740-NTH and 761-KMH a plot of this method can be found in figure 6.
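A minimal sketch of this combination step is given below, assuming the per-building outlier detection has already produced a list of timestamps for each building. The function name and the timestamps are hypothetical.

```python
def combine_outliers(outliers_by_building):
    """Split outlier timestamps into those shared by all buildings and those unique to one."""
    sets = {name: set(times) for name, times in outliers_by_building.items()}
    shared = set.intersection(*sets.values())
    single = {}
    for name, times in sets.items():
        # Union of the outlier timestamps of all other buildings.
        others = set().union(*(s for other, s in sets.items() if other != name))
        single[name] = times - others
    return shared, single


# Hypothetical detection results for two buildings.
outliers = {
    "740-NTH": ["2013-05-01 06:00", "2013-05-02 14:00", "2013-05-03 06:00"],
    "761-KMH": ["2013-05-02 14:00", "2013-05-04 07:00"],
}
shared, single = combine_outliers(outliers)
```

Timestamps in `shared` point to a cause common to the buildings (for instance the weather), while timestamps in `single` are the candidates for a building-specific configuration problem.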

3.3. Building a generic model

Figure 5: Detected outliers of building 761-KMH. Values of the outliers <= 0 indicate that no outliers were detected. (Plot not reproduced; horizontal axis: days, tick labels from 01 May 2013 to 05; vertical axis: gas (m3); series: actual, predicted and outliers.)

The previous parts of this section describe a method that predicts gas consumption based on historical data of buildings. In addition, an attempt was made to create a generic predictive model that works with building properties in order to predict the gas consumption. It was assumed that this would lead to a model that can predict the consumption of a new building without historical data, using only building characteristics and weather variables. The buildings were parametrised by the following features:

• Year of construction: in general, new buildings are better insulated than older buildings. Adding this year was an attempt to approximate the insulation thickness, which was not available.

• Perimeter: the perimeter of a building is the length of the outside walls. Multiplying it by the height gives the area of the outside walls.

• Area of outside walls: the area in contact with the surroundings of the building and thereby the amount of space that is influenced by the weather (minus the roof). It is the multiplication of the perimeter and the average height. The latter was approximated, since the precise heights were not available.

• Floor space: the area (one storey) of the building.

• Gross floor area: The total floor area contained within the building measured to the external face of the external walls. This is the sum of all storeys.

• Projection on the south and east: attempt to represent the orientation of the building along with an approximation of its form factor (see section 3.3.1).
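The derived features in this list follow from simple arithmetic, as the sketch below shows. All numbers are invented, the function names are hypothetical, and the assumption that every storey has the same footprint is made only to keep the example short.

```python
def outside_wall_area(perimeter_m, avg_height_m):
    """Approximate the outside wall area (m2) as perimeter times average height."""
    return perimeter_m * avg_height_m


def building_features(year, perimeter_m, avg_height_m, floor_space_m2, storeys):
    """Collect the building parameters into one feature dictionary."""
    return {
        "year_of_construction": year,
        "perimeter": perimeter_m,
        "wall_area": outside_wall_area(perimeter_m, avg_height_m),
        "floor_space": floor_space_m2,
        # Assumption: every storey has the same footprint.
        "gross_floor_area": floor_space_m2 * storeys,
    }


example = building_features(1980, 120.0, 15.0, 800.0, 4)  # invented values
```

Here a perimeter of 120 m and an average height of 15 m give a wall area of 1800 m2.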

The previously used buildings (740-NTH and 761-KMH) were used together with Hogeschool van Amsterdam (HvA) buildings 556-JWS and 729-DMH to train the generic model. (See appendix C for building locations.) The features described above were partially made available by the HvA and partially constructed with the help of the BAG-register (public registration of addresses and buildings).

Figure 6: Combined outlier detection of buildings 740-NTH and 761-KMH. Values of the outliers <= 0 indicate that no outliers were detected.

3.3.1. Orientation of the building

Ekici and Aksoy (2009) used the orientation of a building as one of the factors to successfully predict the heating energy requirements of a building. The orientation of a building is one of the factors that determine the influence of the weather. For instance, if the radiation of the sun is high (which is usually the case in the morning and in the middle of the day), then a large, south-oriented wall gains more heat than a west-oriented wall. Since the method proposed by Ekici and Aksoy (2009) only works for rectangular buildings, two different methods for representing the orientation were tried:

1. The length of a wall times its angle in degrees, where a wall facing south has an angle of 180 degrees, a wall facing east or west 90 degrees and a north-facing wall 0 degrees. However, this method is not sufficient for representing the orientation, since the sum over all walls equals the perimeter times 90 and is thereby orientation invariant.

2. The projection of the building on the south axis. It is argued that this is a usable approximation of the heat gain by radiation, since the sun radiation is at its highest in the middle of the day. For this method, a land map was used and the lengths of the projections were calculated based on the lengths and angles of the walls. The east projection was added as well, since two axes can represent the form factor of a building more adequately. (See figure 7.)

Figure 7: The south and east projection features of building 761-KMH
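One possible way to compute such projections from wall lengths and angles is sketched below. The interpretation is an assumption: the south projection is taken to be the width of the footprint as seen from the south (its extent along the east-west axis), and the east projection its extent along the north-south axis. The wall list is a hypothetical rectangle, not one of the HvA buildings.

```python
import math


def footprint(walls):
    """Trace the outline from (length, heading) pairs; heading is in degrees clockwise from north."""
    x, y = 0.0, 0.0
    points = [(x, y)]
    for length, heading in walls:
        x += length * math.sin(math.radians(heading))  # east component
        y += length * math.cos(math.radians(heading))  # north component
        points.append((x, y))
    return points


def projections(walls):
    """Return (south, east): footprint width seen from the south and from the east."""
    xs, ys = zip(*footprint(walls))
    return max(xs) - min(xs), max(ys) - min(ys)


# Hypothetical 30 m x 10 m rectangle with its long side running east-west.
rectangle = [(30.0, 90.0), (10.0, 0.0), (30.0, 270.0), (10.0, 180.0)]
south, east = projections(rectangle)

# The same rectangle rotated by 90 degrees: the two projections swap.
rotated = [(30.0, 0.0), (10.0, 270.0), (30.0, 180.0), (10.0, 90.0)]
south_rot, east_rot = projections(rotated)
```

Unlike the first method, these two values change when the building is rotated, so they carry information about the orientation.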

3.3.2. Training the model

For the generic model, the same ANN was used as for the separate buildings. It was trained on a dataset containing the weather variables and historical gas consumption of the buildings combined with the building characteristics listed in section 3.3. The testing was initially done on a set with building characteristics and weather variables of the same buildings. Pylearn2’s multilayer perceptron was used with different termination criteria (in terms of the amount of epochs) and varying amounts of units in the hidden layer. The results of these experiments are described in the following section (4).


4. Results

The experiments described in the previous section can be divided into two categories: the experiments for predicting gas consumption of the buildings and detecting outliers; and the experiments for building a generic model.

For the first part, a large number of features was created, for which STD was used. These features formed the input for Pylearn2’s multilayer perceptron.

4.1. Predicting the gas consumption of one building

The prediction of the gas consumption of one building based on historical data has led to promising results. After training for 150 epochs with 150 neurons in the hidden layer, the measured root mean square error was 15.58 for building 740-NTH and 3.25 for building 761-KMH, where the real gas consumption ranges from 0 to 133 and from 0 to 52 cubic meters of gas, respectively. The results of more experiments on building 761-KMH can be found in table 2.

Epochs   Neurons in hidden layer   RMSE   MAE
50       50                        3.36   2.31
50       100                       3.07   1.83
50       150                       3.26   2.12
100      50                        3.36   2.29
100      100                       3.04   1.82   *
100      150                       3.24   2.13
150      50                        3.36   2.31
150      100                       3.08   1.83
150      150                       3.25   2.11

Table 2: The results of prediction experiments for building 761-KMH. The best results (lowest RMSE and MAE) are marked with an asterisk.

The results obtained for building 761-KMH show that the optimal amount of epochs is approximately 100. The higher error scores that result from longer training may be caused by overfitting, although the scores are only slightly higher.

The approach taken by De Nadai (2013), on which this first part of this project was based, obtained slightly better results with the ANN: a mean absolute error of 9.52 for building 740-NTH. The hybrid approach in which ARIMA was used to create features for the ANN, resulted in a MAE score of 7.33. The results of the hybrid approach for building 761-KMH, where this project obtained the best results, are unknown.

4.1.1. Gas consumption anomaly detection for one building

For detecting the outliers, and thereby anomalies in the building’s gas consumption, Chebyshev’s theorem was used as described in section 3.2.4. The results of applying this method to building 761-KMH can be found in figure 5. This method for outlier detection is capable of detecting data points where the actual and predicted gas consumption differ by more than the threshold (calculated based on Chebyshev’s theorem and thereby on the distribution of the data). Consequently, outliers that were inserted manually in order to test the outlier detection were detected as expected.

Both buildings sometimes show multiple outliers in a row, meaning that the outlier detection detects an outlier for consecutive hours, interspersed with individual outliers. The individual outliers are usually detected early in the morning. The most probable reason for this is the fact that the HVAC system then starts heating the building for the day. (This pattern is visible in figure 5.) Although the outliers that were found may indicate misconfiguration of the HVAC systems, not all types of anomalies can be found by this method. The next section of this report elaborates on different types of anomalies.

4.1.2. Anomaly detection for multiple buildings

In this project, it is argued that, when comparing multiple buildings, outliers that occur at all buildings have a different origin than outliers that occur at only one building. For instance, if the weather data (measured at the station) is different from the actual weather (that influences the consumption of the buildings), this could induce anomalies at all buildings. On the other hand, if the HVAC system of one building is configured incorrectly, it is likely that an outlier will be detected only at that building. Therefore, a method for combined outlier detection was implemented to distinguish between single outliers and overall outliers. Figure 6 shows the outliers of buildings 740-NTH and 761-KMH and the moments when anomalies were detected in both buildings at the same time.

With the threshold used for the outlier detection, outliers do not often occur at both buildings at the same time. If this method were applied in practice, the threshold in the outlier detection could be lowered in order to retrieve more outliers per building and thereby possibly more outliers occurring at the same time.

4.2. Building a generic prediction model

In the second part of this project an attempt was made to create a generic model to predict gas consumption based on building characteristics. These characteristics, described in section 3.3, were used as input for Pylearn2’s multilayer perceptron. Initial experiments with a small dataset without normalised features gave no useful results, since the predicted values were the same for every data point. Manual manipulation of the data, along with normalisation, led to the first results. Nevertheless, an experiment with the real data and normalised features resulted in the following error rates:

epochs   RMSE    MAE
30       42.98   34.32
60       75.80   65.35
90       73.98   64.72

Table 3: RMSE and MAE of three experiments for the generic model
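For reference, the two error measures reported in the table can be computed as in the following sketch. The function names and the toy values are illustrative assumptions; the definitions themselves are the standard ones for root mean squared error and mean absolute error.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalises large deviations more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: the average size of the deviation."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Toy values, not data from the experiments.
actual = [3.0, 5.0, 7.0]
predicted = [1.0, 5.0, 10.0]
errors = (rmse(actual, predicted), mae(actual, predicted))
```

The RMSE is never smaller than the MAE on the same data, which is consistent with the values in the table; a large gap between the two would point to a few very large errors.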


5. Discussion

The findings of predicting the gas consumption of one building after training the multilayer perceptron on historical data show promising results. As found in previous research enumerated in section 2, ANNs have proven to be an effective model for the prediction of energy consumption of buildings.

The prediction for building 761-KMH resulted in a lower error rate than the prediction for 740-NTH. The reason for this could be that building 761-KMH is an older building (built in 1914, whereas 740-NTH was built in 1993) that probably has poorer and thinner insulation. Because of this, its consumption responds more strongly and more quickly to the weather, and the weather is among the inputs on which the ANN is trained.

In the previous section it was mentioned that the outlier detection detects anomalies and that the detection method can help HVAC system experts in making the systems more energy efficient. On the other hand, not all possible forms of anomalies are detected. The method based on Chebyshev’s theorem is suitable for detecting data points where the real consumption differs by more than a certain threshold from the predicted consumption. However, if the system was misconfigured for a long period of time, then the ANN was trained on this misconfiguration and is thereby not able to detect these suboptimal settings of the HVAC system.

Besides that, for the combined outlier detection, it was assumed that an anomaly that occurs at only one building at a certain moment indicates a faulty configuration of the HVAC system. Nevertheless, if the building is open on a weekend day and the system is not adjusted accordingly, anomalies could also be detected, since the day of the week is one of the features on which the ANN was trained.

An HVAC system expert has to investigate the cause of detected outliers and the implications of the combined outlier detection in order to analyse the practical use of the constructed methods and to validate the implications described in this project.

5.1. The generic model

The aim of the second part of this project was to parametrise the buildings based on their characteristics. Secondly, an ANN had to be trained in order to predict the gas consumption of new buildings after adding only the characteristics of the new building.

The results obtained by the experiments on the generic model were inadequate for application to buildings on which the ANN was not trained. One possible reason for this inadequacy is the set of features that were used to train the ANN. Because some of the features that were found helpful in previous research were unavailable, some approximations had to be made. Features that could have supported the functioning of the model, but were not available, are listed below:

• Volume of the building: the volume of the building determines the amount of space that has to be heated by the HVAC system. It can be argued that the gross floor space is a rough approximation of the volume.

• Surroundings of the building: both the speed and degree of influence of the weather on the gas consumption of the building are affected by the surroundings of the building. Large buildings on the south side can block the sun for multiple hours per day and elongated buildings on the west side can typically prevent the wind from streaming along the building. These kinds of factors are hard to represent in an ANN; however, they may have extensive influence on the heat gain and loss of a building. Moreover, surrounding buildings also emit heat.

• Insulation thickness and material of the building. Ekici and Aksoy (2009) acquired these features for their predictions with an ANN with positive results. The types and thickness of the insulation affect the impact of the weather on the conditions inside the building.

• Ratio of glass to wall. The proportion of the outer walls that is constructed of glass (windows) has influence on the insulation properties of the building. Besides that, the effects of solar irradiance on parts of the outside wall with or without glass are different.

• A sun position algorithm: calculating the solar irradiance at a certain time, and thereby the effect of this on a building, requires a method to determine the position of the sun. Aydogan (2013) proposes a model to simulate and forecast solar irradiance, a 3D model to analyse how sunlight affects a room over time and, finally, a method to calculate the total energy gain in a single room. Extending this proposed method to an entire building could support the ANN in predicting the influence of solar irradiance.

• A better method to represent the usage of the building. Weekdays and weekend days were distinguished in the first part of this project. Besides that, a dataset with holidays was used in order to take the holidays into consideration. However, a more precise representation of the usage could result in better predictions. For example, if there is an open house, this could lead to faulty predictions and an outlier that was not caused by a misconfiguration of the HVAC system.
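To make the sun position item in the list above more concrete, the following sketch computes the elevation of the sun above the horizon from the latitude, the day of the year and the local solar time. It uses the textbook approximation for the solar declination (Cooper's equation) and is not the method of Aydogan (2013); the function names and the latitude used for Amsterdam (about 52.37° N) are assumptions made for illustration.

```python
import math


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (Cooper's equation)."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def solar_elevation_deg(latitude_deg, day_of_year, solar_hour):
    """Elevation of the sun above the horizon in degrees.

    solar_hour is local solar time, where 12.0 is solar noon.
    A negative result means the sun is below the horizon.
    """
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # 15 degrees per hour
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp against rounding errors before taking the arcsine.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))


# Elevation at solar noon around the spring equinox (day 81) in Amsterdam.
AMSTERDAM_LATITUDE = 52.37  # assumed value, degrees north
noon_elevation = solar_elevation_deg(AMSTERDAM_LATITUDE, 81, 12.0)
```

Such an elevation (together with the azimuth, which is omitted here) could serve as an additional input feature, but whether it improves the predictions would have to be investigated.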

Modelling the sun position, representing insulation material and finding a way to take the surroundings of a certain building into consideration could influence the prediction of the gas consumption, but need further investigation in order to be supportive to the ANN. Besides that, if the system described in this project has to be put into practice by HVAC system experts, it has to be transformed into an application. Implications for HVAC system experts are described in section 6.1.


6. Conclusion

From the results obtained by the predictions of the gas consumption of a single building it can be concluded that ANNs (multilayer perceptrons) form a suitable model for predicting the energy need of a building. This conclusion is in accordance with the outcomes of previous research (see section 2).

The outlier detection that was implemented is capable of detecting anomalies in the gas consumption. In the previous sections it was discussed that, although not all types of anomalies may be detected, this could help optimise the configuration of HVAC systems.

Furthermore, the combined outlier detection that was proposed and implemented distinguishes between single and multiple occurrences of anomalies and thereby could help identify the cause of the anomaly. This cause is expected to be internal (e.g. a misconfiguration of the HVAC system) if the outliers occur at one building and external if they occur at multiple buildings simultaneously. Although the predictions of gas consumption for one building work appropriately and the outlier detection could help HVAC system experts make their systems more efficient, it has to be concluded that the results of experimenting with the more challenging problem of building a generic model were unsatisfactory. Future work has to demonstrate whether the suggestions discussed in the previous section, for instance the addition of more descriptive building characteristics like the glass to wall ratio, can improve the results of this generic model.

6.1. Implications for HVAC System experts

If the detection of outliers is extended to the different types of anomalies discussed in section 5 and the method for detecting outliers is converted into an application, it could support HVAC system experts in their work. After collecting and inserting data into this application, the system could identify possible misconfigurations. Experts who control multiple buildings can benefit from the anomaly detection even more, since this application could point in the direction of the cause of the problem.

Furthermore, if future research succeeds in building a generic prediction model, the application can be applied to new buildings without the need to collect historical data.


A. Generated features

The following features were used as input for the ANN. Features marked with an asterisk were used by De Nadai (2013) for his experiments as well:

• Gas [m³]: the feature that has to be predicted after training *
• Gas 1 hour before *
• Gas 2 hours before
• Gas peak 5 hours before *
• Gas peak 1 day before *
• Gas sum 1 day before *
• Gas sum 5 hours before
• Gas mean 15 days before
• Weekday: day of the week, Monday = 0, Tuesday = 1 ... *
• Next day: the next day of the week, Monday = 0, Tuesday = 1 ... *
• Sine of the weekday
• Cosine of the weekday
• Sine of the next day
• Cosine of the next day
• Hour of the day *
• Day of the year *
• Month *
• Year
• Sine of the day of the year
• Cosine of the day of the year
• Sine of the month
• Sine of the year
• Cosine of the year
• Electricity [kWh] *
• Electricity peak of previous 5 hours *
• Residuals (from STD) of gas consumption on the scale of a year *
• Residuals (from STD) of gas consumption on the scale of a day *
• Outside temperature *
• Humidity *
• Irradiation
• Wind speed
• Temperature peak of previous 5 hours *
• Temperature difference compared to previous hour *

Because of the hybrid approach, De Nadai (2013) used features besides those listed with an asterisk.
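As an illustration of how some of the features listed above could be generated, the sketch below derives the sine and cosine of a periodic value and a few of the lagged gas features. The exact formulas used in this project are not given in this appendix, so the encoding on the unit circle, the interpretation of "peak" as the maximum over the window, and all names are assumptions.

```python
import math


def cyclical(value, period):
    """Map a periodic value onto the unit circle as (sine, cosine),
    so that the last and the first value of a cycle end up close together."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lag_features(gas, t):
    """Lag-based features for hour index t of an hourly gas series.

    Requires t >= 5 so that the five preceding hours exist.
    """
    window = gas[t - 5:t]  # the five hours before hour t
    return {
        "gas_1h_before": gas[t - 1],
        "gas_2h_before": gas[t - 2],
        "gas_peak_5h_before": max(window),  # "peak" read as the maximum
        "gas_sum_5h_before": sum(window),
    }


# Weekday encoding with Monday = 0, as in the feature list.
monday = cyclical(0, 7)
sunday = cyclical(6, 7)

# Toy hourly series, not data from the buildings.
features = lag_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], t=6)
```

With the raw weekday number, Sunday (6) and Monday (0) are six units apart; on the unit circle they are neighbours, which is the motivation for adding the sine and cosine features.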


B. Pylearn2 yaml configuration file

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.csv_dataset.CSVDataset {
        path: 'train_761KMH_41f.csv',
        task: 'regression'
    },
    model: !obj:pylearn2.models.mlp.MLP {
        layers: [
            !obj:pylearn2.models.mlp.RectifiedLinear {
                layer_name: 'h0',
                dim: 150,
                sparse_init: 1,
                use_bias: True
            },
            !obj:pylearn2.models.mlp.Linear {
                layer_name: 'y',
                sparse_init: 1,
                dim: 1,
            }
        ],
        nvis: 40,
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: 10,
        learning_rate: .002,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .05
        },
        monitoring_dataset: {
            'train' : *train,
            'valid' : !obj:pylearn2.datasets.csv_dataset.CSVDataset {
                path: 'valid_761KMH_41f.csv',
                task: 'regression'
            },
        },
        termination_criterion: !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 150
        }
    },
    extensions: [
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
            channel_name: "valid_objective",
            save_path: "Trained_Model_761KMH_41f.pkl"
        }
    ]
}
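The network architecture defined in this configuration (40 inputs, one hidden layer of 150 rectified linear units and a single linear output) can be sketched in plain Python as follows. This is only an illustration of the forward pass and does not use Pylearn2: the helper names are hypothetical, the uniform weight initialisation is an assumption that differs from the `sparse_init` setting in the configuration, and no training is performed.

```python
import random


def relu(x):
    """Rectified linear activation: negative inputs are set to zero."""
    return x if x > 0.0 else 0.0


def init_layer(n_in, n_out, rng, scale=0.1):
    """Return (weights, biases) with small random weights and zero biases.
    Assumed initialisation, not Pylearn2's sparse_init."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """Forward pass: inputs -> rectified linear hidden layer -> linear output."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return sum(w * hi for w, hi in zip(w_o[0], h)) + b_o[0]


rng = random.Random(0)
hidden = init_layer(40, 150, rng)   # nvis: 40, dim: 150
output = init_layer(150, 1, rng)    # dim: 1 (predicted gas consumption)
prediction = forward([0.5] * 40, hidden, output)
```

The single output corresponds to the predicted gas consumption for one hour; in the actual experiments the weights are learned by stochastic gradient descent with momentum, as specified in the configuration.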


C. Building locations

Figure 8: The four HvA buildings and the location of the weather station

Building   Address
761-KMH    Mauritskade 11, Amsterdam
740-NTH    Tafelbergweg 51, Amsterdam
729-DMH    Dr. Meurerlaan 8, Amsterdam
556-JWS    James Wattstraat 79, Amsterdam

Table 4: The addresses of the four HvA buildings


D. Bibliography

Amidan, B. G., Ferryman, T. A., and Cooley, S. K. (2005). Data outlier detection using the Chebyshev theorem. In Aerospace Conference, 2005 IEEE, pages 3814–3819. IEEE.

Aydogan (2013). Local weather forecasting: an elaboration on solar irradiance.

Brown, R. H. and Matin, I. (1995). Development of artificial neural network models to predict daily gas consumption. In Industrial Electronics, Control, and Instrumentation, 1995., Proceedings of the 1995 IEEE IECON 21st International Conference on, volume 2, pages 1389–1394. IEEE.

Cleveland, R. B., Cleveland, W. S., McRae, J. E., and Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1):3–73.

De Nadai, V. S. (2013). Detecting short-term anomalies in gas consumption using ARIMA and robust artificial neural networks.

Ekici, B. B. and Aksoy, U. T. (2009). Prediction of building energy consumption by using artificial neural networks. Advances in Engineering Software, 40(5):356–362.

European Commission (2009). The 2020 climate and energy package. http://ec.europa.eu/clima/policies/package/index_en.htm. Accessed: 2015-04-19.

International Energy Agency (2014). Key World Energy Statistics. International Energy Agency.

Neto, A. H. and Fiorelli, F. A. S. (2008). Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption. Energy and Buildings, 40(12):2169–2176.

Pérez-Lombard, L., Ortiz, J., and Pout, C. (2008). A review on buildings energy consumption information. Energy and Buildings, 40(3):394–398.

Russell, S. and Norvig, P. (2010). Artificial Intelligence: A Modern Approach. Prentice Hall series in artificial intelligence. Prentice Hall.

Schein, J., Bushby, S. T., Castro, N. S., and House, J. M. (2006). A rule-based fault detection method for air handling units. Energy and Buildings, 38(12):1485–1492.

U.S. Energy Information Administration (2015). Annual Energy Outlook 2015. U.S. Energy Information Administration.

Wikipedia (2014). Decomposition of time series. https://en.wikipedia.org/wiki/Decomposition_of_time_series. Accessed: 2015-06-03.
