A PREDICTION ROUTE WITH EFFICIENCY OF SECURITY AS DESTINATION

Discovering influences of variables to improve the effectivity and efficiency in the private security branch

Darby Wesselink, student number 11587083
University of Amsterdam, Faculty of Science
Thesis Master Information Studies: Business Information Systems
Final version: 2018/07/05
Supervisor: Loek Stolwijk
Examiner: Dick Heinhuis

Abstract. The private security branch is a highly competitive industry; for this reason, information is often unavailable. The aim of this research is to find relationships between independent variables and (1) effectivity and (2) efficiency, which can provide important insights. Business experts’ statements have been translated into hypotheses, which have been tested with correlation tests. The datasets have been extracted from internal and external sources. The soft modeling approach SmartPLS has been used to discover influences of independent variables on effectivity and efficiency. Four different prediction methods have been executed to predict the number of notifications per week. Polynomial curve fitting has the best predictive ability, where the actual number of notifications has been used to retrain the algorithm.

Keywords. Private security branch, Efficiency, Effectivity, SmartPLS, Polynomial curve fitting

Introduction

The Dutch government attempts to minimize the number of robberies and burglaries, and tries to increase the probability of detection and demands higher penalties. It invests in prevention and better collaboration between the police, municipalities, and entrepreneurs. According to Centraal Bureau voor de Statistiek (CBS) (CBS, 2018), there were around 428,000 registered burglaries in the Netherlands in 2017. On the other hand, individuals and organizations can decide to hire a company within the private security branch for securing people, valuables, buildings, goods, and processes, and for risk prevention.

This research focuses on the investigation whether it is possible to make predictions to improve the efficiency and effectivity of a private security company. The aim of this research is to find relationships between variables which can provide important insights. Those potential relationships will be investigated by using a multiple regression model and path analysis. The multiple regression model investigates the relationship between two or more independent variables and one dependent variable. The path analysis (SmartPLS) can explain the model of the relationship between measurements and latent factors, aimed at explaining the variance in the endogenous factors (Loek, 2017). Finding relationships between variables will give an understanding of which of the independent variables have the most influence on the efficiency and effectivity. This research can provide valuable insights and contributes to innovation within the private security branch. The scientific relevance is that the focus of this research is on efficiency and effectivity by taking influences into account, instead of efficiency and effectivity based on historical data and contract hours of security guards.

The overall problem is that scientific research on the security branch mainly focuses on investigating criminology and the satisfaction of citizens. There is a lack of investigation into (1) improving efficiency and effectivity within the private security branch, and (2) related independent variables which might have influence. The first identified gap in the literature is that information about the private security branch is often unavailable: “The industry is highly competitive and often reluctant to disclose sensitive business information to inquisitive researchers” (van Steden & Nalla, 2010). The second identified gap is that investigation into making predictions within the security branch is lacking.

The case study is carried out on behalf of X Netherlands, part of one of the largest private security companies in the world. X is active in more than 100 countries, has around 610,000 employees (X, 2018), and focuses especially on cash, safety, and security solutions.

Based on discussions with business experts within X Netherlands, the aim is to explore possible ways to improve the efficiency and effectivity of the deployment. This can result in a more predictive method of deployment and may create more appropriate solutions for the clients. Efficiency and effectivity gains can still be achieved in the current processes. Actions, events, or unexpected incidents may influence the process of deployment. This causes, among others: (1) incorrect registrations, (2) force majeure, and (3) ambiguities. Firstly, incorrect registrations can occur through system errors or human (re)actions. Secondly, force majeure can arise through conditions like weather, delays, events, or traffic jams. As a practical (processing) example: each security notification enters the control room, and the timestamps from the moment a notification enters until the moment the notification has been finished are logged. An effectivity problem occurs when a security guard gets stuck in a traffic jam. An efficiency problem occurs when a security guard arrives at the location and it turns out to be a false alarm. Thirdly, it is unknown whether there are explainable reasons for deviations or trends in the average number of incoming notifications; in this context, what is or can be the reason for an increased or decreased number of notifications. For example, business experts from X suggest that there are more incidents during a thunderstorm. Others are convinced that there are more notifications when it is sunny, because many windows remain open.

These security solutions might be subject to improvement by making predictions. During the discussions, the business experts from X indicated that producing predictions to improve efficiency and effectivity is not yet in scope within their projects.

Given that prediction methods might improve the effectivity and efficiency, the following main research question has been formulated: Which independent variables have the most influence on the effectivity or efficiency, and is it possible to use these variables to make predictions to improve the efficiency and effectivity within the private security branch?

To assist in achieving the goal of the research, the following sub-questions are proposed:

1. What is the input for SmartPLS to discover relations between the efficiency or effectivity and independent variables?
2. What kind of relation, through SmartPLS, is there between the efficiency or effectivity and the selected independent variables?
3. Why could the investigated independent variables be used as a predictor to improve the efficiency and effectivity within the private security branch?
4. How can the predictions be made to improve the efficiency within the private security branch?

For SmartPLS, thirty-two hypotheses have been posed based on the business experts’ opinions and observations during the interviews. For ease of reading, the hypotheses are described in section 3.2.

Based on existing literature, section 1 presents a review of how SmartPLS can indicate possible relationships and which prediction methods can be used. Section 2 illustrates the steps in the research process and the methods used to answer the research question, and section 3 discusses the analysis and results. The last part describes the conclusions, the advice for X, and the results of the research that might be input for future research.

1. Literature review

1.1. SmartPLS

Structural Equation Modeling (SEM) is a multivariate data analysis method which can help to describe, understand, and predict elements of reality (Loek, 2017). A more detailed description of SEM can be found in appendix 7.1. Partial Least Squares (PLS), also called variance-based SEM, is a soft modeling approach to SEM (Wong, 2013). Soft modeling here means modeling when the “hard” assumptions cannot be met. PLS focuses not on explanation but on exploratory research.

“SmartPLS is the most prevalent implementation as a path model” (Garson, 2016). It can manage multicollinearity and several independent variables. Multicollinearity is a phenomenon in statistics in which one predictor variable can be linearly predicted from the other variables with a substantial degree of accuracy. According to Grewal, Cote & Baumgartner (2004), multicollinearity addresses one of the various application problems, because of “high correlations among the latent exogenous constructs”. According to Garson (2016), PLS is the statistical model that is most suitable when the research purpose is potential prediction or exploratory modeling. “On the response side, PLS can relate the set of independent variables to multiple dependent (response) variables. On the predictor side, PLS can handle many independent variables, even when predictors display multicollinearity” (Garson, 2016). It can be implemented as a regression model as well as a path model, where the regression model attempts to predict one or more dependents from a set of independents (Garson, 2016). The path model handles causal paths relating predictors to each other and relating the predictors to the response variables. It is recommended to execute PLS path modeling in a premature stage of theoretical development, with the aim to test and validate exploratory models. A more detailed description of SmartPLS can be found in appendix 7.2.

1.2. Efficiency

“Efficiency is the relation between (1) the accuracy and completeness with which users achieve certain goals and (2) the resources expended in achieving them” (Frøkjær, Hertzum & Hornbæk, 2000). The indicators of efficiency consist of task completion time and learning time (Frøkjær et al., 2000). Measuring the efficiency will indicate whether the security company uses the minimum quantity of inputs to produce a given quantity of outputs, or maximizes the output quantity given a certain quantity of inputs (Fethi & Pasiouras, 2010).

1.3. Data mining

“Data mining is the search for valuable information in large volumes of data” (Weiss & Indurkhya, 1998). Weiss and Indurkhya (1998) state that big data is a characteristic feature of data mining. According to Gandomi and Haider (2015), there are different big data analytical techniques depending on whether the data is structured or unstructured. Predictive analytics is one of these big data analytical techniques and is primarily based on statistical methods. Based on historical or dynamic data, predictive analytics predicts outcomes. “In practice, predictive analytics can be applied to almost all disciplines – from predicting the failure of jet engines based on the stream of data from several thousand sensors, to predicting customers’ next moves based on what they buy, when they buy, and even what they say on social media” (Gandomi et al., 2015). Predictive analytics attempts to discover patterns and capture relationships in the data, and can be categorized into two different groups: regression techniques and machine learning (ML) techniques. First, a regression technique is capable of discovering linear relationships. A learning scheme such as a regression method deals primarily with ratio scales, because it calculates the “distance” between two instances on the basis of the values of their attributes (Witten, Frank, Hall & Pal, 2016). The task of regression predictive modeling is to approximate a mapping function f, where variables (X) are the input and the output is a continuous variable (y) (Brownlee, 2017). Linear regression is one of the regression algorithms; its aim is “to capture the interdependencies between outcome variable(s) and explanatory variables and exploit them to make predictions” (Gandomi et al., 2015). Second, “ML enables the acquisition of structural descriptions of examples” (Witten et al., 2016), which can be used for predictions, explanations, and to gain understanding.
The definition of ML is a philosophical question. Witten et al. (2016) state that learning implies thinking and purpose; without purpose, it is merely training. According to Sebastiani (2002), ML is the search for algorithms in which reasoning from externally supplied instances is used to generate general hypotheses, which then make predictions about future instances. Previous research from Freitag (2000) has shown

that ML is a rich source of ideas for algorithms which can be trained to perform information extraction.
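As an illustration of the predictive-analytics approach described above, the sketch below fits a polynomial curve (the method the abstract reports as most accurate) to a series of weekly notification counts. The counts and the chosen degree are invented for illustration, since the real X data is confidential.

```python
import numpy as np

# Invented weekly notification counts (the real data is confidential).
weeks = np.arange(1, 11)
notifications = np.array([52, 48, 55, 60, 58, 63, 61, 66, 70, 68])

# Fit a second-degree polynomial to the history, then use the fitted
# curve to predict the number of notifications for the next week.
coefficients = np.polyfit(weeks, notifications, deg=2)
model = np.poly1d(coefficients)
prediction = float(model(11))
```

Retraining, as described in the abstract, then amounts to re-running `np.polyfit` each week with the newly observed count appended to the history.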

2. Research Process and Methodology

2.1. Information & sources

This research is explorative and based on previous investigations, since the purpose is to achieve a preliminary understanding of possible relations between (1) efficiency and independent variables, and (2) effectivity and independent variables. The insights for the research are derived from interviews with business experts from X, combined with relevant literature and observations from data. The following features are needed to answer the research question: (1) independent variables, (2) a dependent variable, and (3) datasets. These features are the input for the methods and techniques used.

2.2. Method & Techniques

This research follows a quantitative approach and a deductive process.

Parts of Mean Time Between Failure (MTBF) have been applied; the explanation can be found in appendix 7.3. Although MTBF is often used for lifecycle predictions of machines, it can also be used for this research. Firstly, the elapsed time between the inherent failures or disruptions of incoming notifications could have an impact on the deployment. Secondly, the elapsed time is unspecified: it is unknown where the bottlenecks are located in the process of handling an incident, which also means that it is unknown which part of the process takes the most time. By applying parts of MTBF, the task completion time can be measured and the bottleneck within the process can be indicated. The results of MTBF and the independent variables were the input for SmartPLS.
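The MTBF-style timings can be sketched as differences between consecutive process timestamps. The timestamps below are invented for illustration; the real logs are confidential.

```python
from datetime import datetime

# Invented timestamps for a single notification.
start       = datetime(2018, 3, 1, 2, 10, 0)   # alarm registers movement + heat
detected    = datetime(2018, 3, 1, 2, 11, 30)  # notification enters the control room
picked_up   = datetime(2018, 3, 1, 2, 14, 0)   # a security guard picks it up
at_location = datetime(2018, 3, 1, 2, 49, 0)   # the guard arrives on site
finished    = datetime(2018, 3, 1, 3, 5, 0)    # notification finished, guard leaves

def minutes(a, b):
    return (b - a).total_seconds() / 60

system_time  = minutes(start, detected)         # time until detection
control_time = minutes(detected, picked_up)     # time spent in the control room
driving_time = minutes(picked_up, at_location)  # exceeds the 30-minute KPI here
time_on_site = minutes(at_location, finished)
```

Averaging these per-task durations per day yields the kind of values summarized in table 1, and the largest average points to the bottleneck in the process.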

There are numerous advantages which emphasize the use of SmartPLS. Firstly, it includes the ability to model multiple dependent and multiple independent variables. Secondly, SmartPLS can handle multicollinearity between the independent variables. Thirdly, it is robust to data noise and missing data. Lastly, an advantage of SmartPLS is “creating independent latent variables directly on the basis of cross-products involving the response variable(s), making for stronger predictions” (Garson, 2016). SmartPLS has been used to test the proposed model, which led to observations and findings. It explains relationships between the measurements and the latent factors of the model. A reflective model has been used. The reliability of the measurements and the validity of the model have been checked using the PLS-SEM model evaluation method, which is shown in figure 1.

Figure 1. PLS-SEM model evaluation method (Adapted from Sarstedt et al., 2014)

Four different methods have been executed in order to determine which of these has the best predictive ability. The approach and the methods used are described further in section 3.4.

2.3. Resources & Data collection

The resources and data collection are conducted in four steps, which are explained in this section.

2.3.1. First step – conducting the interviews

The first step was conducting the interviews to derive insights for this research. The face-to-face interviews took place at the respondents’ work offices. Of the 9 candidates, 7 agreed to be interviewed. A summary table in appendix 7.4 contains the respondents’ titles, organizations, and backgrounds. The respondents have been selected via non-probability sampling, based on acquaintance and trustworthiness. The types of non-probability sampling used are convenience sampling and snowball sampling.

The interviews were semi-structured in order to ask the business experts a prepared collection of questions (see appendix 7.5), which aims to give exactly the same context of questioning (Bryman, 2016). Recording the answers reduced error due to variation in asking the questions, and resulted in greater accuracy and ease of processing the respondents’ answers. Transcriptions have been made from the sound recordings with the approval of the respondents; they can be found in appendices 7.6 to 7.12.

2.3.2. Second step – collecting the data

The second step was collecting the data. This research focuses on incoming notifications which arrive at the control room in Amsterdam. With the input of the business experts from X, the variables and required datasets for the analysis have been collected. The input data for SmartPLS is derived from internal databases from X, CBS, Rijkswaterstaat (RWS), and Koninklijk Nederlands Meteorologisch Instituut (KNMI). The dataset from X consists of sixty columns related to the data of incidents, such as the type of activity, date, time, location, message, employee, and client. The data has been extracted with SQL queries; an example can be found in appendix 7.13. The data from RWS is retrieved from https://nis.rijkswaterstaat.nl/SASPortal/main.do. The data from KNMI is retrieved from http://projects.knmi.nl/klimatologie/daggegevens/selectie.cgi and http://projects.knmi.nl/klimatologie/uurgegevens/selectie.cgi. The measurements of the different indicators have been extracted per day from station 240 Schiphol, with longitude 4.790 and latitude 52.318. Each dataset has been generated from March 2017 to April 2018, which is the most accurate and available period; these months also cover the four seasons. The first dataset consists of 119,290 records retrieved from the patrol & response database and contains timestamps for the different tasks in the process of handling a notification. The second dataset consists of more than 1,000,000 records retrieved from the control room database.

2.3.3. Third step – filtering the dataset

The next step was filtering the dataset; only the burglary notifications are relevant for this research. In addition, columns have been selected based on their relevance, such as timestamps, locations, and the suspected reason of the notification. During this step, the averages of several task timestamps have been calculated per day. The input for SmartPLS has a sample size of 365 per variable. SmartPLS has been downloaded from the website www.smartpls.com. The University of Amsterdam (UvA) provided a license code for the professional package, which enables the user to add unlimited data sets. Pre-processing the datasets was not necessary, because the data needed to be raw as input for SmartPLS. The data has been converted into .csv file format and imported into SmartPLS.

2.3.4. Fourth step – missing data

After filtering the dataset, the data has been investigated for reliability. Some missing data has been tracked down through different comparisons and calculations. Missing data is a term which can be defined as a statistical condition characterized by an incomplete data matrix (Newman, 2014). Analyzing the internal dataset showed that manual registrations are sometimes performed incorrectly. This has been described in the literature as wave nonresponse, a theory in which the same individuals are measured at two or more times (Graham, 2012). In this case, security guards were missing an entire wave of measurement. It can be stated that 52.3% of the driving time data was unreliable: driving times under sixty seconds are extremely exceptional according to X. The reason for this significant percentage can be explained but cannot be substantiated.

The solution for this problem was using the zip codes which were available in the dataset. Comparing the zip codes with an Excel script indicated the distances in kilometers between the starting point and the arrival point. Comparing the driving times with these distances indicated whether the driving time data could be correct or not. Incorrect values influence the driving time, which then is not representative when it comes to the average driving time per day, and were thus removed from the dataset.
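The plausibility check can be sketched as follows. The records, the 60-second threshold mentioned above, and the maximum-speed bound are illustrative assumptions; the original check was done with an Excel script.

```python
# Invented records: (driving time in seconds, zip-code distance in km).
records = [(45, 8.2), (900, 7.5), (30, 0.3), (1500, 12.0), (20, 9.9)]

MIN_SECONDS = 60       # X considers driving times under 60 s extremely exceptional
MAX_SPEED_KMH = 130.0  # an implied speed above this is physically implausible

def plausible(seconds, km):
    """Return True when the driving time is consistent with the distance."""
    if seconds < MIN_SECONDS and km > 0.5:
        return False  # far too short for the distance covered
    implied_speed_kmh = km / (seconds / 3600)
    return implied_speed_kmh <= MAX_SPEED_KMH

cleaned = [record for record in records if plausible(*record)]
```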

3. Analysis and results

The preliminary findings concerning this research are based on the following analyses: (1) MTBF, (2) correlation tests, (3) SmartPLS, and (4) prediction methods. The results of each analysis provided information and input for the next analysis.

3.1. MTBF

Parts of MTBF (figure 2) have been applied to gain insights into the process of handling an incident and the effectivity of the different tasks. The start of an incident is the timestamp at which an alarm system registers a combination of movement and heat. The time of detection is the timestamp at which the incident arrives at the control room. The intervention time starts from the moment an incident is picked up by a security guard. At location is the time at which a security guard has arrived at the destination. The incident is handled when the security guard has finished the notification and leaves the location.

The formulas can be found in appendix 7.14. The results are expressed in minutes and summarized in table 1.

Figure 2. MTBF

Table 1. Results MTBF per month

Month         Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar
System Time    x   x   x   x   x   x   x   x   x   x   x   x
Control Time   x   x   x   x   x   x   x   x   x   x   x   x
Driving Time   x   x   x   x   x   x   x   x   x   x   x   x
Time on Site   x   x   x   x   x   x   x   x   x   x   x   x
The Key Performance Indicator (KPI) for system time is based on national rules for the private security branch. Burglary notifications have been typed as “no direct danger to human lives” by the CCV Certificatie Particuliere Alarmcentrales (2017), which states that 98.5% of these alarm notifications must be taken within 240 seconds. Based on the data, the KPI is not met every month; however, this is out of scope for this research. KPIs for control time are unavailable. The KPI for driving time is a maximum of thirty minutes per incident. The table above shows that the average driving time per month did not meet this KPI. This can be indicated as a bottleneck within the process, which can have several reasons that cannot be explained with this analysis. The KPI for time on site differs per client because of different contract agreements, which makes it difficult to draw conclusions based on this data.

The information in table 1 is useful and indicates the average time for handling an incident. The shorter the process time, the more incidents or false notifications a security guard can handle.

3.2. Correlation tests

The reasons why certain variables have been used for investigating potential relations are as follows. Firstly, the variables have available datasets which can be downloaded via reliable websites. Secondly, the datasets consist of data at least per day. The sample size is leading; it was necessary to eliminate variables with datasets based on monthly or yearly figures. In addition, historical data of X was unavailable due to system replacements. Variables such as safety level, traffic intensity, and bankruptcies have therefore been eliminated. The indicators for effectivity and efficiency are as follows:

effectivity = 1 − (driving time − KPI driving time) / driving time

efficiency = (notifications − false notifications) / notifications

The values are standardized on a range from 0 to 1. A value closer to 1 represents positive effective or efficient performance, and a value closer to 0 indicates strongly negative performance.
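A minimal sketch of the two indicators, with invented daily values and the results clipped to the 0 to 1 range described above:

```python
def effectivity(driving_time, kpi_driving_time=30.0):
    # effectivity = 1 - (driving time - KPI driving time) / driving time
    value = 1 - (driving_time - kpi_driving_time) / driving_time
    return max(0.0, min(1.0, value))

def efficiency(notifications, false_notifications):
    # efficiency = (notifications - false notifications) / notifications
    value = (notifications - false_notifications) / notifications
    return max(0.0, min(1.0, value))

day_effectivity = effectivity(40.0)  # driving above the 30-minute KPI lowers effectivity
day_efficiency = efficiency(50, 35)  # many false alarms lower efficiency
```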

For SmartPLS, thirty-two hypotheses have been posed based on the business experts’ opinions and observations. The measurements of the variables used in the correlation tests are described in appendix 7.15.

H1: There is no correlation between thunderstorms and the number of notifications.
H2: There is no correlation between the number of doors/windows not closed and moving goods.
H3: There is no correlation between wind speed and the number of notifications.
H4: There is no correlation between the number of notifications and the driving time.
H5: There is no correlation between system failures and the number of notifications.
H6: There is no correlation between movement, temperature, and the number of notifications.
H7: There is no correlation between storms and the number of notifications.
H8: There is no correlation between the number of doors/windows not closed and the number of notifications.
H9: There is no correlation between the number of traffic jams and the driving time.
H10: There is no correlation between rainfall, temperature, and the driving time.
H11: There is no correlation between rainfall, thunderstorms, temperature, and power failures.
H12: There is no correlation between power failures and system failures.
H13: There is no correlation between national holidays and the driving time.
H14: There is no correlation between national holidays and the number of notifications.
H15: There is no correlation between holidays and the driving time.
H16: There is no correlation between holidays and the number of notifications.
H17: There is no correlation between events and the driving time.
H18: There is no correlation between events and the number of notifications.
H19: There is no correlation between temperature and the number of doors/windows not closed.
H20: There is no correlation between temperature and the number of notifications.
H21: There is no correlation between weather conditions and efficiency.
H22: There is no correlation between weather conditions and effectivity.
H23: There is no correlation between the number of traffic jams and effectivity.
H24: There is no correlation between events and efficiency.
H25: There is no correlation between events and effectivity.
H26: There is no correlation between holidays and efficiency.
H27: There is no correlation between holidays and effectivity.
H28: There is no correlation between the number of notifications and efficiency.
H29: There is no correlation between the number of notifications and effectivity.
H30: There is no correlation between the driving time and effectivity.
H31: There is no correlation between national holidays and efficiency.
H32: There is no correlation between national holidays and effectivity.

These hypotheses have been tested based on the correlation and significance results. The first step was calculating the significance: “The relations between factors need to be significant” (Loek, 2017). For this reason, bootstrapping has been used to determine the significance (p-values). If the significance is below 0.05, the relation can be considered significant and the hypothesis will be rejected. The second step was calculating the effect size, in SmartPLS the correlation coefficient r. The following rule of thumb is used: (1) none to barely any correlation (0.00 < r < 0.30), (2) low correlation (0.30 < r < 0.50), (3) moderate correlation (0.50 < r < 0.70), (4) high correlation (0.70 < r < 0.90), and (5) very high correlation (0.90 < r < 1.00) (Hair et al., 2016).
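The two-step procedure (significance first, effect size second) can be sketched with a plain Pearson correlation. SciPy’s `pearsonr` stands in for SmartPLS’s bootstrapping here, and the data is synthetic, constructed to be correlated:

```python
import numpy as np
from scipy import stats

# Synthetic daily data standing in for the confidential X datasets.
rng = np.random.default_rng(42)
doors_open = rng.poisson(5, size=365)
notifications = 2 * doors_open + rng.normal(0, 3, size=365)

# Step 1: significance. The null hypothesis of "no correlation"
# is rejected when p < 0.05.
r, p = stats.pearsonr(doors_open, notifications)
significant = p < 0.05

# Step 2: effect size, using the rule of thumb from Hair et al. (2016).
thresholds = [(0.30, "none to barely"), (0.50, "low"), (0.70, "moderate"),
              (0.90, "high"), (1.01, "very high")]
label = next(name for bound, name in thresholds if abs(r) < bound)
```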

The results can be seen in appendix 7.16. The following results can be stated: (1) hypothesis 12 is significant with a low effect size, and (2) hypotheses 8 and 28 are significant with barely any correlation. Hypotheses 8 and 28 are the input for the efficiency SmartPLS model. Hypothesis 12 is not related to hypotheses 8 and 28, and is thus not added to the model.

3.3. SmartPLS

The above two hypotheses have been worked out by building the structural model. Building the measurement model has been done by linking the indicators to the latent variables. After building the structural and measurement models, the path modeling procedure has been started, using the PLS Algorithm for the calculations. SmartPLS contains three stages (Sarstedt et al., 2017). The first stage is the iterative estimation of the latent variable scores, which is repeated until convergence has been reached. The second stage estimates the outer weights or loadings and the path coefficients. During the third stage, the location parameters are estimated. After the calculations, the variables have been checked for reliability and validity.

3.3.1. Effectivity

The original plan was to build the structural and measurement model for effectivity. However, the hypotheses related to effectivity showed that the results are not significant, or that the effect size was too low. This means that the variance in effectivity cannot be explained by the variables used and might be subject to further research.

3.3.2. Efficiency

Figure 3 illustrates the structural and measurement models. The blue circles represent the structural model; the relation between each construct and its indicator represents the measurement model.

As described before, the measurement model has been evaluated with the PLS-SEM model evaluation stages. The reliability of the indicators, expressed in outer loading numbers, has been checked; an outer loading of 0.70 or higher is preferred for the reliability value of an indicator. The composite reliability shows the internal consistency reliability, which should be 0.7 or higher, except for exploratory research (0.6 or higher). The Average Variance Extracted (AVE) shows the convergent validity, which should be 0.5 or higher. After stage 1.1 of the evaluation criteria for reflective models, none of the indicators has been eliminated. The results of stage 1.1 can be found in appendix 7.17, which includes the discriminant validity.

Formatively measured constructs are not included in this model. The second stage was the evaluation criteria for the structural model. The inner Variance Inflation Factor (VIF) values have been checked, because the exogenous factors related to the same endogenous factor should not correlate. The requirement for the inner VIF is to be smaller than 5 (VIF < 5); the results can be seen in appendix 7.18. The path coefficients are the results on the arrows and describe how significant the effect of variable X on variable Y is; the closer to +1.0 or -1.0, the stronger the relation. The results of the path coefficients indicate the following relations: (1) from doors/windows not closed to number of notifications (0.524), and (2) from number of notifications to efficiency (-0.446).

Bootstrapping has been used to determine the p-values of the loadings. “If p < 0.05 the loadings/relations can be considered significant” (Loek, 2017). The results of bootstrapping indicate the p-values of the loadings, which are as follows: (1) from number of doors/windows not closed to number of notifications (0.000), and (2) from number of notifications to efficiency (0.000). These p-values indicate that the path coefficients are significant and should not be removed from the model.

The effect sizes have been compared in order to check whether the relations between the factors are relevant (see table 2). Based on the total effects: (1) small effect (<0.2), (2) medium effect (0.2 to 0.8), and (3) big effect (>0.8). The relations between the factors are relevant, because the relations have a medium effect.

The results of the SmartPLS analysis are expressed in statistical terms. “The R² is a measure of the variance explained in each of the endogenous constructs” and is thus a measure of the model’s predictive accuracy (Sarstedt et al., 2014). The rule of thumb is: substantial (0.75), medium (0.5), and small (0.25). The results of SmartPLS show that:

 19.9% (0.199) of the variance in efficiency is explained by the model with three variables;

 27.4% (0.274) of the variance in number of incidents is explained by the number of doors/windows not closed;

 An increase of one standard deviation in the number of doors/windows not closed increases the number of notifications (0.524);

 An increase of one standard deviation in the number of notifications decreases efficiency (-0.446).

These results indicate that the explained variance of the endogenous constructs is small, which is considered weak.

There was no reason to eliminate a connection, because the effect sizes (f^2) were above 0.02, as can be seen in table 3. The connection between the number of notifications and efficiency can be considered a medium effect; the connection between the number of doors/windows not closed and the number of notifications can be considered a large effect.
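As a sketch of how such an effect size is computed, f^2 compares the endogenous construct's R^2 with and without the predictor in question (the R^2 values below are illustrative, not the thesis results):

```python
def effect_size_f2(r2_included, r2_excluded):
    """Cohen's f^2 = (R2_included - R2_excluded) / (1 - R2_included).
    Rule of thumb: 0.02 small, 0.15 medium, 0.35 large."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Illustrative: dropping a predictor lowers R^2 from 0.199 to 0.05.
print(round(effect_size_f2(0.199, 0.05), 3))  # 0.186
```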

Another criterion of predictive relevance has been applied: the Q^2 value has been evaluated using the blindfolding procedure. "Blindfolding is a sample re-use technique, which systematically deletes data points and provides a prognosis of their original values" (SmartPLS, 2018). The predictive accuracy of the path model is acceptable for a specific construct if Q^2 > 0. "The smaller the difference between predicted and original values the greater the Q^2 and thus the model's predictive accuracy" (Sarstedt et al., 2014). The blindfolding procedure produced the following results: (1) 0.193 for efficiency and (2) 0.268 for the number of notifications. These results indicate that the predictive accuracy of the efficiency path model is acceptable.

The outcome of the PLS-SEM model evaluation method resulted in the following model, which is illustrated in figure 4.

Table 2. Comparing effect sizes (total effects)

        EF        NN
NN     -0.446
NC     -0.233     0.524

Table 3. Effect size (f^2)

        EF
NN      0.248

Figure 4. SmartPLS model Efficiency

3.4. Predictions

The results of SmartPLS could be used as input for several ML techniques, such as neural networks or multivariate regression models. The original plan was to use the latent variables and observations as input. However, another approach was required because the tested variables did not fully explain the number of notifications. Based on this knowledge, the following four methods were applied to predict the number of notifications for the next week: (1) using the actual value from the previous week, (2) using the actual value from last year, (3) using the tangent, and (4) polynomial curve fitting. These methods do not require the tested variables.

The first method is to use the actual value of the previous week to predict the number of notifications for the next week. The following formula was used for this method:

y(x_n) = y(x_{n-1})

The second method uses the actual value from last year's corresponding week, by applying the following formula:

y(x_n) = y(x_{n-52})

The third method uses the actual number of notifications from the previous week, plus the difference between the previous week (x - 1) and the week before that (x - 2). This line can be extended to the next data point, in this case the next week. The following formula was used for this method:

y(x_n) = y(x_{n-1}) + (y(x_{n-1}) - y(x_{n-2}))
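The three baselines above can be sketched directly on a list of weekly counts (the series below is illustrative, not the thesis data; `n` is the index of the week being predicted):

```python
def predict_previous_week(y, n):
    """Method 1: y(x_n) = y(x_{n-1})."""
    return y[n - 1]

def predict_last_year(y, n):
    """Method 2: y(x_n) = y(x_{n-52}), the same week one year earlier."""
    return y[n - 52]

def predict_tangent(y, n):
    """Method 3: y(x_n) = y(x_{n-1}) + (y(x_{n-1}) - y(x_{n-2}))."""
    return y[n - 1] + (y[n - 1] - y[n - 2])

weekly = [300, 310, 305, 320]            # illustrative notification counts
print(predict_previous_week(weekly, 4))  # 320
print(predict_tangent(weekly, 4))        # 320 + (320 - 305) = 335
```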

The fourth method is polynomial curve fitting. The number of notifications can be seen as a curve over time. By discovering the underlying function, it is possible to make predictions by extrapolation. Polynomial functions can approximate many different functions; therefore, it makes sense to estimate the unknown underlying function by a polynomial function, given by (adapted from Bishop, 2006):

y(x_n, w) = w_0 + w_1 x_n + w_2 x_n^2 + ... + w_M x_n^M = sum_{j=0}^{M} w_j x_n^j

In these functions, y represents the number of notifications per week and x_n corresponds to the week number. Fifty-five data points were used to calculate the polynomial coefficients w that minimize the squared error function. To control over-fitting, a regularization term was added to this function. K-fold cross-validation was used to determine the regularization coefficient and the polynomial order M.
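A minimal sketch of this fitting step, assuming NumPy and illustrative data (the thesis' actual script is in appendix 7.19 and may differ); the regularized least-squares solution is computed in closed form:

```python
import numpy as np

def fit_polynomial_ridge(x, t, degree, lam):
    """Fit y(x, w) = sum_j w_j x^j by minimizing the regularized squared error."""
    phi = np.vander(x, degree + 1, increasing=True)  # design matrix x^0..x^degree
    a = phi.T @ phi + lam * np.eye(degree + 1)       # (Phi^T Phi + lam I) w = Phi^T t
    return np.linalg.solve(a, phi.T @ t)

def predict(w, x):
    """Evaluate the fitted polynomial at x (scalar or array)."""
    xs = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(xs, len(w), increasing=True) @ w

# Illustrative weekly notification counts for weeks 1..10 (not the thesis data).
x = np.arange(1, 11, dtype=float)
t = np.array([280, 285, 290, 300, 295, 310, 305, 315, 320, 318], dtype=float)
w = fit_polynomial_ridge(x, t, degree=3, lam=1e-4)
next_week = predict(w, 11.0)[0]  # extrapolate one week ahead
```

In the thesis, k-fold cross-validation selects the order and the regularization coefficient; here both are fixed for brevity.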

The Python script for polynomial curve fitting can be found in appendix 7.19. The algorithm was run with seed 2 and thirteen folds, which means that the data was split into thirteen training and validation sets. The algorithm has a polynomial degree of seven with a regularization term of 0.0001. This resulted in the following formula to describe the number of notifications:

y(x, w) = 279.5 + 2.10x + 0.028x^2 + 3.59*10^-4 x^3 - 7.71*10^-5 x^4 - 1.49*10^-6 x^5 + 3.85*10^-8 x^6

This formula plotted over time is shown in figure 5. The blue points illustrate the training set data points; the red line is the fitted polynomial function. Extrapolating the function to the next week predicts the number of notifications for that week. After the prediction for the next week had been made, the actual number of notifications was used to retrain the algorithm.

The Root Mean Square Error (RMSE) was used to indicate the predictive ability of the four methods and is calculated with the following formula (adapted from Bishop, 2006):

RMSE = sqrt( (1/N) * sum_{n=1}^{N} (y(x_n) - t_n)^2 )

Table 4. RMSE results per method

Prediction week  Target  Method 1  Method 2  Method 3  Method 4
2018-04-02       x       x         x         x         x
2018-04-09       x       x         x         x         x
2018-04-16       x       x         x         x         x
2018-04-23       x       x         x         x         x
2018-04-30       x       x         x         x         x
2018-05-07       x       x         x         x         x
2018-05-14       x       x         x         x         x
2018-05-21       x       x         x         x         x
RMSE                     x         x         x         x

These relatively high RMSE values can be explained by the variance in the number of notifications. It has been concluded that the fourth method has the best predictive ability when the algorithm is retrained with the actual number of notifications. The predictive ability decreases as the RMSE increases.
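The RMSE comparison can be sketched as follows (targets and predictions are illustrative; the thesis values are anonymized):

```python
import math

def rmse(predictions, targets):
    """Root Mean Square Error: sqrt of the mean squared prediction error."""
    n = len(targets)
    return math.sqrt(sum((y - t) ** 2 for y, t in zip(predictions, targets)) / n)

targets = [300, 310, 305, 320]  # actual weekly notification counts (made up)
preds = [298, 315, 300, 318]    # one method's predictions (made up)
print(round(rmse(preds, targets), 2))  # 3.81
```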

A next step could be to use a Bayesian linear regression model, which uses this variance to determine the certainty of the predictions.

4. Conclusions and Advice

4.1. Conclusions

In this paper, the potential relationships between the selected independent variables and efficiency have been examined. Applying parts of MTBF provided insights into the task completion time of a notification.

By using the PLS-SEM model evaluation method, it can be concluded that almost every statement is invalid. The results related to effectivity are not as expected and, due to the lack of significance and the low effect sizes, it is not easy to draw conclusions; this might be input for further research. 27.4% of the variance in the number of notifications is explained by the number of open doors/windows, and 19.9% of the variance in efficiency is explained by the model with three variables. The results of the blindfolding procedures indicate that the models' predictive accuracy is acceptable, which might help improve efficiency within the private security branch. However, these results require further research, because roughly 80% of the variance in efficiency is explained by other variables.

Within this research, predictions of the number of notifications per week have been made by applying four different methods. Method 4, polynomial curve fitting, has the best predictive ability. This method has been executed by writing a Python script that fits a polynomial function, which is subsequently extrapolated. The algorithm has been retrained and tested using the actual number of notifications, which resulted in an RMSE of x.

4.2. Advice

The results of this research might be of value for the organization in which this research has been conducted, namely X.

The first finding of this research is that the business experts have their own opinions and experiences, which they use to steer their processes, efficiency, and effectivity. Based on the data, the business experts' statements have been investigated by executing correlation tests. The identified lack of significance and the low effect sizes between variables might be used for further research. The results of SmartPLS show that efficiency will decrease if the number of notifications increases.

The second finding is that X collects a lot of data via databases and Excel sheets. By converting more data into information, it might be able to gain more insights and improve its performance, focusing more on the what and why questions, which can be extended with how questions. These how questions might be answered with data science (data mining). This research is the beginning of an innovative route, as it raises many more questions than it can discuss. The short-term advice is to first collect the metadata from the available sources within X Netherlands. Second, extract the data, transform it, and load it into a central data warehouse. Third, a Business Intelligence (BI) tool can be used to analyze the data and might give answers to the how questions. This allows improvement of efficiency and effectivity, where managers can respond better to demand and answer specific issues. It can also limit manual actions, which can improve the reliability of the data. After this, the next step is implementing data mining, where further research is needed to tackle questions such as: what relevant predictions could be, how false notifications can be detected, and how the number of false notifications per day can be reduced. This can minimize risk, maximize safety, and contribute to both company and society: "a prediction route with efficiency of security as destination".

5. Discussion

During the research, several setbacks appeared and had to be dealt with. As a result of time and resource constraints, full data saturation has not been reached.

The respondents of the interviews had many personal interests, so bias can be present. However, the interviews are replicable. In order to avoid probing and failing to reflect 'true' variation, the respondents were provided with a list of possible answers when the questions got more difficult. The semi closed-ended questions and the suggestions were identical for all the interviews; flashcards have been used to display the suggestions.

The chosen methods are considered appropriate for this research for the following reasons: (1) the limited availability of information about the private security branch, (2) the availability of the described data sources, and (3) the assertions of X business experts had not previously been examined on the basis of historical data. Time, historical events, and cohort effects are taken into account. The results may become outdated over time, which would require an update. By describing the approach and all the methods, it is possible to replicate this research.

6. Future research

After investigating the available literature, it has been noticed that research and information within the private security branch are often difficult to obtain.

Future research for X can be supported by a full MTBF analysis to gain more insights into the effectivity. The data for MTBF is available within the internal sources of X. The method can be used to indicate, for example: (1) which moments are the busiest at the control room, (2) the average time between two notifications, and (3) whether the KPIs have been achieved on an hourly basis. Results from MTBF can be used for predictions, indicating after how long a subsequent notification will occur.

In this case, polynomial curve fitting has an RMSE of x. This could be further minimized by using more training data and possibly by exploring an error function that is less sensitive to outliers. A Bayesian linear regression model can be used to determine the uncertainty of the predictions. This might be input for further research.

This research shows that there are relationships between independent variables and effectivity and efficiency, which might improve the process of deployment within the private security branch in the future. However, this research has focused on the city of Amsterdam. Without time limitations, this research could be done per region. If a full year of historical data were available, it would be interesting to calculate and investigate the relationships between the variables described in the interview questions. The SmartPLS analysis can also be applied per season, as the potential relations can be seasonal; this can be done with a Multigroup Analysis (MGA).

This research can lead to further research within the private security branch on the following questions, with new technologies in mind. Is the influence of the independent variables on efficiency seasonal? Can predictions eventually replace certain deployment actions within the security branch? Can predictions in the private security branch be used to improve usability by combining measures of efficiency, effectiveness, and user satisfaction?

The report has been anonymized on the basis of competitive aspects, for this reason some of the results have not been presented. For further questions, I can be reached at the following e-mail address: darbywesselink@hotmail.com

References

Bishop, C. M. (2006). Pattern recognition and machine learning. Information Science and Statistics. Springer.

Brownlee, J. (2017, December). Difference Between Classification and Regression in Machine Learning. Retrieved April 26, 2018 from https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/

Bryman, A. (2016). Social research methods. Oxford university press.

CBS. (2018). Geregistreerde diefstallen; diefstallen en verdachten, regio. Retrieved April 12, 2018 from http://statline.cbs.nl/StatWeb/publication/?VW=T&DM=SLNL&PA=83651NED&LA=NL

Centrum voor Criminaliteitspreventie en Veiligheid. (2017). CCV-certificatieschema, Particuliere Alarmcentrales (PAC). Retrieved June 4, 2018 from X.

Fethi, M. D., & Pasiouras, F. (2010). Assessing bank efficiency and performance with operational research and artificial intelligence techniques: A survey. European journal of operational research, 204(2), 189-198.

Freitag, D. (2000). Machine learning for information extraction in informal domains. Machine learning, 39(2-3), 169-202.

Frøkjær, E., Hertzum, M., & Hornbæk, K. (2000, April). Measuring usability: are effectiveness, efficiency, and satisfaction really correlated?. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems (pp. 345-352). ACM.

Gravetter, F. J., & Wallnau, L. B. (2011). Essentials of Statistics for the Behavioral Sciences, 7th edition, 2-634.

X Nederland. (2018). Over X. Retrieved April 4, 2018 from http://www.X.nl/nl-nl/over-X

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Garson, G. D. (2016). Partial Least Squares Regression and Structural Equation Models. Asheboro: Statistical Associates. Retrieved from https://www.smartpls.com/documentation/learn-pls-sem-and-smartpls/ebook-on-pls-sem

Graham, J. W. (2012). Missing data: analysis and design. New York, NY: Springer, 3-323.

Grewal, R., Cote, J. A., & Baumgartner, H. (2004). Multicollinearity and measurement error in structural equation models: Implications for theory testing. Marketing Science, 23(4), 519-529.

Gudergan, S. P., Ringle, C. M., Wende, S., & Will, A. (2008). Confirmatory tetrad analysis in PLS path modeling. Journal of Business Research, 61(12), 1238-1249.

Hair Jr, J. F., Hult, G. T. M., Ringle, C., & Sarstedt, M. (2016). A primer on partial least squares structural equation modeling (PLS-SEM). Sage Publications.

Henseler, J., Ringle, C. M., & Sarstedt, M. (2015). A new criterion for assessing discriminant validity in variance-based structural equation modeling. Journal of the Academy of Marketing Science, 43(1), 115-135.


Newman, D. A. (2014). Missing data: Five practical guidelines. Organizational Research Methods, 17(4), 372-411.

Sarstedt, M., Ringle, C. M., Smith, D., Reams, R., & Hair Jr, J. F. (2014). Partial least squares structural equation modeling (PLS-SEM): A useful tool for family business researchers. Journal of Family Business Strategy, 5(1), 105-115.

Sarstedt, M., Ringle, C. M., & Hair, J. F. (2017). Partial least squares structural equation modeling. In Handbook of Market Research (pp. 1-40). Springer International Publishing.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM computing surveys (CSUR), 34(1), 1-47.

SmartPLS. (2018). PLS Algorithm. Retrieved April 10, 2018 from https://www.smartpls.com/documentation/algorithms-and-techniques/pls

SmartPLS. (2018). Blindfolding. Retrieved June 11, 2018 from https://www.smartpls.com/documentation/algorithms-and-techniques/blindfolding

Stolwijk, A. M. (2017, November). Structural Equation Modelling. Retrieved April 4, 2018 from Loek Stolwijk.

Torell, W., & Avelar, V. (2004). Mean time between failure: Explanation and standards. white paper, 78.

Weiss, S. M., & Indurkhya, N. (1998). Predictive data mining: a practical guide. Morgan Kaufmann.

Witten, I. H., Frank, E., Hall, M. A., & Pal, C. J. (2016). Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann.

Wong, K. K. K. (2013). Partial least squares structural equation modeling (PLS-SEM) techniques using SmartPLS. Marketing Bulletin, 24(1), 1-32.

van Steden, R., & Nalla, M. K. (2010). Citizen satisfaction with private security guards in the Netherlands: Perceptions of an ambiguous occupation. European Journal of Criminology, 7(3), 214-234.

7. Appendix

7.1. SEM

SEM is a second-generation tool that combines factor analysis and multiple regression to investigate the direct and indirect effects of the chosen independent and dependent variables. SEM consists of two different approaches: (1) covariance based and (2) variance based. The covariance-based approach focuses on the relationship between factors by minimizing the difference between prediction and reality. The variance-based modeling approach employs linear composites of observed variables as proxies for the latent variables, with the aim of estimating the model relationships (Henseler, Ringle & Sarstedt, 2015). The variance-based modeling approach uses the standardized z-scores of measurements of small samples, which have no distributional requirements (Stolwijk, 2017). The z-score identifies the exact location of each X value in a distribution. The z-score can be above the mean (positive) or below the mean (negative). "The numerical value of the z-score specifies the distance from the mean by counting the number of standard deviations between X and μ" (Gravetter & Wallnau, 2011).
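As a sketch, the standardization described above, z = (X - μ)/σ, can be written as:

```python
import statistics

def z_scores(values):
    """Standardize a sample: z = (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

print(z_scores([2, 4, 6]))  # approximately [-1.22, 0.0, 1.22]
```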

7.2. SmartPLS

PLS-SEM has two types of measurements, which allow reflective and formative computations (Gudergan, Ringle, Wende & Will, 2008), as illustrated in figure 6.

A reflective model uses a variety of techniques: (1) scale construction, (2) factor analysis, (3) measurement assessment, and (4) classical test theory (Gudergan et al., 2008). The measurement path arrows run from the latent variable to the measured indicator variables. A formative model is utilized when an explanatory combination of the indicator variables underlies the latent construct (Gudergan et al., 2008); the path arrows run from the indicators to the latent variable.

"SmartPLS is the most prevalent implementation as a path model" (Garson, 2016), building on the PLS approach developed by Wold (SmartPLS, 2018). It is a user-friendly modeling package for PLS analysis, supported by a team located in Hamburg, Germany. The input for SmartPLS needs to be raw data instead of standardized data: standardization is applied automatically, and all indicator weights and latent variable scores are continually standardized. SmartPLS supports .csv and .txt text-type data files for importing data. The output from SmartPLS can be exported to Excel, HTML, or R.

7.3. MTBF

MTBF can be used as a method and procedure for lifecycle predictions. It has often been used in "the design of mission critical facilities that house IT and telecommunications equipment" (Torell & Avelar, 2004) and to simplify complexity. Downtime of systems can have a negative impact on the market value of the business, so minimizing downtime minutes is crucial. A second crucial point is the reliability of the physical infrastructure that supports that network environment. Research has shown that the target of business reliability may not be achievable without a solid understanding of MTBF (Torell & Avelar, 2004).

7.4. Respondents

The table contains the respondents' title, organization, and background.

Respondent  Organization and Title  Background
R.1         x                       x
R.2         x                       x
R.3         x                       x
R.4         x                       x
R.5         x                       x
R.6         x                       x
R.7         x                       x

7.5. Interview questions

Introduction

1. Would you like to introduce yourself?
2. What are your day-to-day activities?
3. What is your role and what are you responsible for?

Efficiency

4. What do you understand by efficiency?
5. What do you understand by effective?
6. Is improving efficiency and/or effectivity part of your work?
7. In what way do you take efficiency and/or effectivity into account?
8. In what way do you measure efficiency and/or effectivity?
9. In what way have you tried to improve efficiency and/or effectivity?

Introduction of the research.

Research and variables

10. Is it clear to you what I am going to investigate?
11. Where are the aspects that apply to or are of interest to you?
12. Which (independent) variables could, in your opinion, influence efficiency and/or effectivity?
13. Which indicator has, in your opinion, more influence on efficiency and/or effectivity? (Using the flashcards)

Indicator 1                       Indicator 2
Current weather                   Weather forecast
Holidays                          Bird migration
Traffic jams                      Traffic intensity
Crowds at Schiphol                Flight delays
Events                            TV ratings
Unemployment                      Bankruptcies
Number of police notifications    Number of fire brigade notifications
Security level                    Threat level
Passenger kilometers              Personal mobility
System failures                   Power failures
Strikes by flight crew            Strikes by ground staff

Predictions

14. What do you expect from 'predictions' with regard to the deployment of employees?
15. In what way can the predictions best be presented to you?
16. In what way would you like the minimum and maximum staffing levels to be expressed?

7.6. Transcription interview respondent 1

Removed

7.7. Transcription interview respondent 2

Removed

7.8. Transcription interview respondent 3

Removed

7.9. Transcription interview respondent 4

Removed

7.10. Transcription interview respondent 5

Removed

7.11. Transcription interview respondent 6

Removed

7.12. Transcription interview respondent 7

Removed

7.13. SQL Query

/****** Script for SelectTopNRows command from SSMS ******/
SELECT
    CASE WHEN message IN ('Driving Time - Object', 'Time on Site - Object',
                          'Driving Time - Incident', 'Time on Site - Incident')
         THEN LEFT(message, 12) ELSE '' END AS Type,
    MONTH(wijk.Tijd) AS Month,
    DATEPART(wk, wijk.Tijd) AS Week,
    wijk.Weekdag,
    LEFT(x.BookingObjectZip, 4) AS PC4,
    LEFT(x.BookingObjectZip, 3) AS PC3,
    LEFT(x.BookingObjectZip, 2) AS PC2,
    LEFT(x.BookingObjectZip, 1) AS PC1,
    wijk.DuurActiviteit / 60 AS DuurMin,
    wijk.GeplandeDuur / 60 AS PlanDuurMin,
    *
FROM [BI_STAGING_SECURE].[dbo].[Wijkanalyse] AS wijk
LEFT JOIN (
    SELECT DISTINCT BookingObjectGenesisContract, BookingObjectZip,
                    BookingObjectHsnr, BookingObjectStreet, BookingObjectCity
    FROM [BI_STAGING_SECURE].[dbo].[STG_Type0]
    WHERE BookingObjectGenesisContract <> 0
) AS x ON x.BookingObjectGenesisContract = wijk.Client
WHERE wijk.Tijd BETWEEN '2017-01-03' AND '2017-01-09'
  AND BookingObjectGenesisContract <> 0
  AND Personeelsnummer <> 0
  AND LEFT(x.BookingObjectZip, 2) IN ('10', '11')
ORDER BY wijk.Tijd, Dienst, Client

7.14. Formulas MTBF analysis

System Time  = Σ(detection − start incident) / number of notifications
Control Time = Σ(intervention − detection) / number of notifications
Driving Time = Σ(at location − intervention) / number of notifications
Time on Site = Σ(incident handled − at location) / number of notifications

7.15. Variables

Nr.  Variable                            Indicator  Explanation
1.   Number of power failures            PF1        Number of power failures per day
2.   Number of system failures           SF1        Number of system failures per day
3.   Number of doors/windows not closed  NC1        Number of doors/windows not closed
4.   Moving goods                        MG1        Number of noticed moving goods at the
5.   Temperature                         TE1        Measured temperature in degrees Celsius
6.   Holidays                            HO1        Holidays (0 = none, 1 = holiday for primary and secondary schools)
7.   Events                              EV1        Number of public events per day in Amsterdam
8.   Thunderstorms                       TH1        Thunderstorm (0 = none, 1 = measured thunderstorm)
9.   Number of notifications             NN1        Number of notifications per day
10.  Weather conditions                  WC1        Mean daily cloud cover (in octants, 9 = sky invisible)
                                         WC2        Daily rainfall amount in mm
                                         WC3        Daily mean temperature in degrees Celsius
                                         WC4        Daily mean wind speed
                                         WC5        Thunderstorm (0 = none, 1 = measured thunderstorm)
                                         WC6        Snowfall (0 = none, 1 = measured snowfall)
11.  Efficiency                          EF1        (Number of notifications − unrecordable notifications) / number of notifications
12.  National holidays                   NH1        National holidays in the Netherlands (0 = none, 1 = national holiday)
13.  Driving time                        DT1        Daily mean driving time to location of notification
14.  Wind speed                          WS1        Daily mean wind speed
15.  Traffic jams                        TJ1        Number of traffic jams from Amsterdam
                                         TJ2        Number of traffic jams to Amsterdam
16.  Rainfall                            RA1        Daily rainfall amount in mm
17.  Effectivity                         EF2        (1 − (driving time − KPI driving time)) / driving time

7.16. Hypothesis testing

Hypothesis  Significance  Decision  Effect size  Result
H1          0.000         Rejected  0.031        No correlation
H2          0.006         Rejected  0.027        No correlation
H3          0.363         Accepted  -            -
H4          0.002         Rejected  0.032        No correlation
H5          0.038         Rejected  0.008        No correlation
H6          0.000         Rejected  0.139        No correlation
H7          0.000         Rejected  -            -
H8          0.000         Rejected  0.274        Barely any correlation
H9          0.000         Rejected  0.042        No correlation
H10         0.103         Accepted  -            -
H11         0.401         Accepted  -            -
H12         0.001         Rejected  0.414        Low correlation
H13         0.392         Accepted  -            -
H14         0.279         Accepted  -            -
H15         0.630         Accepted  -            -
H16         0.036         Rejected  0.011        No correlation
H17         0.725         Accepted  -            -
H18         0.000         Rejected  0.040        No correlation
H19         0.036         Rejected  0.014        No correlation
H20         0.000         Rejected  0.105        No correlation
H21         0.211         Accepted  -            -
H22         0.318         Accepted  -            -
H23         0.200         Accepted  -            -
H24         0.096         Accepted  -            -
H25         0.133         Accepted  -            -
H26         0.269         Accepted  -            -
H27         0.303         Accepted  -            -
H28         0.000         Rejected  0.199        Barely any correlation
H29         0.024         Rejected  0.012        No correlation
H30         0.088         Accepted  -            -
H31         0.233         Accepted  -            -
H32         0.878         Accepted  -            -

7.17. Results stage 1.1 from PLS-SEM evaluation method

Latent Variable                      Indicator  Indicator Reliability  Composite Reliability  AVE
Number of doors/windows not closed   NC1        1.000                  1.000                  1.000
Number of notifications              NN1        1.000                  1.000                  1.000
Efficiency                           EF1        1.000                  1.000                  1.000

Discriminant Validity

        EF       NN      NC
EF      1.000
NN     -0.446    1.000
NC     -0.283    0.524   1.000

7.18. Inner VIF Values

        EF
EF      1.000
NC      1.000
NN      1.000

7.19. Python script
