
Monitoring systems with container technology

The discovery of monitoring container technology for a multinational company.

Ashwin Kahmann

10633030

12-04-2019

Abstract: Cloud computing and container technology can be beneficial for enterprises. Monitoring enterprise data when using these technologies requires a move from static monitoring to dynamic monitoring. This move brings certain complications, which motivate the following research question: How would it be possible to monitor a cloud and container technology for a multinational company?

A design science methodology is applied in combination with the framework of van Lacum and the framework of Johannesson & Perjons. The theoretical findings have been applied to an empirical subject in order to test them. The empirical subject is an enterprise called KPMG. This paper shows how enterprise data can be monitored when it is stored within containers in the cloud, how the monitored data can be transformed into visually interpretable information, and how this monitoring process can be automated. The combination of these findings has optimized the decision-making process of the empirical subject's management team.


Table of contents

1.1 Introduction - Motive
1.2 Introduction - Goal
2.1 Method - Theoretical Framework
2.1.1 Explicate Problem
Empirical research
2.1.2 Define Requirements
Phase 1 – Infrastructure
Phase 2 – Interpretation
Phase 3 – Adjustment
Potential Extra Phase
2.2 Method - Experimental Procedure
2.2.1 Design & Develop Artefact
Phase 1 – Infrastructure
Phase 2 – Interpretation
Phase 3 – Adjustment
2.3 Method - Data Collection
2.3.1 Demonstrate Artefact
Phase 1 – Infrastructure
Phase 2 – Interpretation
Phase 3 – Adjustment
3.1 Results – Explanation of findings, tables and figures
3.1.1 Evaluate Artefact
Phase 1 – Infrastructure
Phase 2 – Interpretation
Phase 3 – Adjustment
3.2 Results - Temporary conclusion
Phase 1 – Infrastructure
Phase 2 – Interpretation
Phase 3 – Adjustment
Extra Phase – Management Optimization
4.1 Discussion – Counter arguments
Counter argument 1
Counter argument 2
Counter argument 3
4.3 Discussion – Implications
Implication 1
Implication 2
Implication 3
References
Appendices
Appendix 1: Elaboration of terminology
Data-driven decisions
Microarchitecture
Serverless architecture
Cloud Computing
Virtual Machines
Docker Containers
Kubernetes cluster architecture
Azure Architecture
Azure Containers Service - Engine
Azure Resource Manager template
Azure DevOps Services
Agile & SCRUM
The Value of Information Visualization
Appendix 2: uva3-kubernetes-deployment Files
Appendix 3: Manual – Deploying a new Kubernetes Cluster
3.1 Log in to the Azure CLI
3.2 Deploying the cluster
3.3 Post-deployment actions
3.4 Connect to the cluster
3.5 Grant access to the Kubernetes Dashboard by adding a user to the cluster admin group
Appendix 4: Automation.sh Code
Appendix 5: DevOps Backlog-Items

1.1 Introduction - Motive

Non-data-driven organizations base their decisions on the intuition, experience and expertise of their employees. However, organizations can also base their decisions on data, correlations and empirical evidence. Those are called data-driven organizations. Data is the foundation of these organizations and can be collected through data monitoring systems. These systems are designed to collect, visualize and analyze data in order to fit a purpose. In the case of data-driven organizations, the ultimate goal of data monitoring is to support the decision-making process.1

Over the last couple of years, enterprises have been moving towards cloud computing. This is especially true for multinational enterprises (“Cloud Computing: Well-Known Companies Who Have Moved to the Cloud,” 2013). An understanding of why this is happening can be acquired by looking at the benefits which cloud computing offers. Described briefly, cloud computing can provide enterprises with an increase in flexibility, scalability on demand, multiple storage possibilities, pay-per-use, availability, accessibility, and stronger security features (“Benefits of cloud computing,” 2019).2

Container technology, which builds on the kernel isolation of containers, is a form of operating-system-level virtualization that allows programs and applications to run in isolation from other processes. Some benefits of using containers are portability, resource efficiency, simplicity, and scalability. Major cloud computing providers offer container technology as a service.3

The motive of this paper comes from the fact that companies which are data-driven, work with cloud computing and use container technology are experiencing complications with their monitoring systems. Monitoring systems provide data which can be used for data-driven decision making. Meanwhile, monitoring platforms are changing in order to adapt to changing cloud-based applications and dynamic infrastructure. This, however, does not happen without complications. Traditional monitoring focuses on service status, CPU and memory usage, and tracking log events. The transition to the cloud has shifted the focus away from these traditional monitoring components; cloud computing can even become completely serverless.4 Instead, the focus has been set on application performance, metrics and dependencies. Through a cloud provider it is possible to use specialized services for databases, querying, storage and more. However, traditional monitoring systems are not built to cover the aforementioned services. Therefore, a specialized cloud monitoring solution is needed instead.

1 An explanation of data-driven decision making is located in Appendix 1: Data-driven decisions.
2 An explanation of cloud computing is located in Appendix 1: Cloud Computing.
3 An explanation of the kernel isolation of containers is located in Appendix 1: Virtual Machines and Appendix 1: Docker Containers.


When data-driven companies are not able to base their decisions on data, a problematic dilemma occurs. Solving this dilemma is the motive behind the main research question of this paper: How would it be possible to monitor a cloud and container technology for a multinational company?

(Marston, Li, Bandyopadhyay, Zhang, & Ghalsasi, 2011; Kshetri, 2013; Rittinghouse & Ransome, 2016; Ishrat, Saxena, & Alamgir, 2012).


1.2 Introduction - Goal

The goal of this paper is to find an answer to the main research question. To establish a complete answer, a profound understanding of each aspect is required. Furthermore, theoretical as well as empirical research is conducted to answer the main research question. Hereby, a solution must be found for a real-life case, which aids in finding an answer to the main research question.

A design science methodology is applied to research this issue (Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007).

The Comprehensive Argumentative Structure framework of van Lacum has been used as the structural backbone of this paper. The framework depicts the rhetorical moves of a research cycle and their relations. The framework is displayed in figure 1.0.

Figure 1.0: The Comprehensive Argumentative Structure of van Lacum (2013)

The Comprehensive Argumentative Structure of van Lacum can be divided into four aspects. The first aspect focuses on the purpose of the paper; it is titled Introduction and contains the Motive and the Goal.

The second aspect explores the theoretical and empirical side of this research. This aspect is called Method and contains the Experimental procedure and the Data collection.


During the Method aspect, another method framework is used as a guideline to guarantee the quality and completeness of the empirical research. This framework is called the Method Framework for Design Science Research, displayed in figure 1.1. It consists of five main activities, which are ordered: ‘Explicate problem’, ‘Define requirements’, ‘Design and develop artefact’, ‘Demonstrate artefact’ and ‘Evaluate artefact’ (Johannesson & Perjons, 2014). During the research, each activity is explained, substantiated and applied. This paper adds a part called “Theoretical framework” to the Method, in order to cover the ‘Explicate problem’ and ‘Define requirements’ activities of the Method Framework for Design Science Research.

In the Results, which is the third aspect of van Lacum, the findings are described and evaluated in Tables and figures, Explanation of findings and Temporary conclusion.

Finally, the last aspect of the framework discusses and concludes the outcome of this research. Hereby, the Counter-arguments, Final conclusion and Implications form the Discussion aspect (van Lacum, 2013).

In the appendix, terminology, techniques and architectures are explained in order to establish a certain knowledge level. Graphs, research results and code are also displayed there, for substantiation and clarification.

In order to find an objective and data-driven answer to the main research question, theoretical research alone is not sufficient. Empirical research must be conducted as well, in order to test the theoretical findings.

For the empirical research an enterprise is required, preferably a multinational enterprise, as stated in the main research question. The reason is that a multinational enterprise conducts business within a broad geographical range; by moving towards cloud technology, such an enterprise can therefore achieve a greater advantage than an enterprise which is not internationally located. The maximal advantage can be achieved in the areas of accessibility, efficiency and availability. This enterprise must apply a data-driven culture and at least manage some form of data-driven decision making.

Furthermore, this enterprise must use cloud computing or show the desire to make a transition towards cloud technology. Finally, the enterprise must currently apply a monitoring system against which an alternative monitoring system can be compared.

As the main research question of this paper is based on complications which enterprises face, a profound answer to these complications can only be found and proven by solving a similar complication in a real-life enterprise. By finding a solution for the empirical case in combination with the theoretical frameworks and literature, an understanding of and answer to the main research question can be found.


2.1 Method - Theoretical Framework

2.1.1 Explicate Problem

A static monitoring system can perfectly monitor the data of one single system. Even multiple systems can be monitored by a static monitoring system. However, complications occur when, instead of multiple separate systems, a microservices architecture must be monitored.5 A microservices architecture contains multiple systems, each with a number of processes, and these processes cannot be monitored by a static monitoring system. When these microservices are combined with cloud computing, whose architecture is designed to be flexible and scalable, a dynamic environment is established which cannot be monitored by a static monitoring system. In addition, static monitoring systems can encounter network issues when connecting to the data to be monitored, due to dynamic path variables. A solution for this dynamic environment problem could be a dynamic monitoring system. Multiple dynamic monitoring systems have been developed over the past few years, each with its strengths and weaknesses. No single dynamic monitoring system has emerged as the most popular among enterprises or as clearly superior to the other monitoring systems.

As cloud computing has only recently become able to fully sustain complete enterprises, not many enterprises have made a complete transition to a serverless architecture. Now that cloud computing has become suitable, a transition towards the cloud has been occurring within enterprises.6 Enterprises which work with cloud computing can profit from multiple benefits, especially multinational enterprises, as they conduct business within a broad geographical range. Therefore, by moving towards cloud technology they can achieve a greater advantage than an enterprise which is not internationally located. Cloud computing follows a microservices architectural way of working. Therefore, monitoring data in the cloud is not realizable with a static monitoring system. When enterprises make a transition to the cloud, they encounter the dilemma of moving from a static monitoring system to a dynamic monitoring system. Cloud computing is not the only architectural way of working which can provide a microservices structure; container technology is also heavily built on the idea of a microservices architecture.7 Container technology depends on cloud computing; therefore, when using container technology, cloud computing is required. Enterprises which are interested in a microservices architecture can implement both cloud computing and container technology in their environment. With these developments and transitions to cloud computing and container technology, the monitoring complications became a serious issue for enterprises.

5 An explanation of the microservices architecture is located in Appendix 1: Microarchitecture.
6 An explanation of cloud computing is located in Appendix 1: Cloud Computing.
7 An explanation of container technology is located in Appendix 1: Docker Containers and Appendix 1: Kubernetes cluster architecture.


These new complications for enterprises have led to the development of the main research question: How would it be possible to monitor a cloud and container technology for a multinational company?

Empirical research

To solve the main research question, a method has to be designed which describes the theoretical answer. However, to substantiate this theoretical answer, empirical research must be conducted. In order to do so, an empirical subject needs to be chosen. This empirical subject needs to satisfy certain requirements: it must be an enterprise that has a data-driven culture and has made the move towards cloud technology and towards containerized technology. It also has to be a multinational enterprise.

After conducting social and online research into companies which are working with cloud technology, several enterprises have been considered, such as Nederlandse Spoorwegen (NS), Ernst & Young and KPMG. NS has made the move towards cloud computing by using one of the main cloud providers and has also implemented containerized technology. However, this enterprise is not a multinational company and therefore does not satisfy all requirements.

Two other enterprises, Ernst & Young and KPMG, have also been found. Both satisfy all the requirements and have been approached in order to conduct the empirical research. Both enterprises showed interest after being approached. KPMG proved to be more time-efficient, corresponding faster and requiring fewer documents and tests than Ernst & Young. For these reasons, KPMG has been chosen as the empirical subject for this research.

KPMG, which stands for Klynveld Peat Marwick Goerdeler, is a professional services company. It is one of the Big Four auditors, along with Deloitte, Ernst & Young, and PricewaterhouseCoopers. KPMG operates as a global network of independent member firms whose clients include business corporations, governments, public sector agencies and non-profit organizations. It offers audit, tax and advisory services, working closely with clients and helping them to mitigate risks and grasp opportunities. One of the business support departments is the Information Technology department. This department of KPMG Netherlands is officially called KPMG Information Technology Services Netherlands, which from now on is referred to as ITS NL (“KPMG International,” 2019).

Multiple firms and departments of KPMG started using cloud technology over the last couple of years. They have made the move towards cloud computing, with the goal of a serverless architecture.


This means that all their resources run solely in the cloud, without tangible servers of their own. Currently, KPMG is using a hybrid cloud solution, which means that it is being provided with both cloud and non-cloud servers.

KPMG has chosen one of the leading cloud providers, Microsoft Azure, as its cloud computing provider. Microsoft Azure is a cloud computing service created by Microsoft for building, testing, deploying, and managing applications and services through a global network of Microsoft-managed data centers. It provides all the cloud technology services and supports many different programming languages, tools and frameworks, including both Microsoft-specific and third-party software and systems (“Microsoft Azure Cloud Computing Platform & Services,” 2019).

The reason that KPMG uses Microsoft Azure instead of other cloud computing providers is simply that KPMG has a partnership with Microsoft. Therefore, the decision to choose Azure is the most profitable one for KPMG. The details of this partnership are out of the scope of this research (“KPMG and Microsoft,“ 2019).

Before KPMG implemented cloud computing into its enterprise, it already monitored its own data. This was done using a monitoring tool called System Center Operations Manager, which from now on is referred to as SCOM. It is a cross-platform data center monitoring system for operating systems and hypervisors, and it represents the static method of monitoring. KPMG has been using SCOM since its creation in 2007 and is still using it today.

On the one hand, KPMG used SCOM for its multiple advantages, especially in the area of operations management. On the other hand, when it comes to monitoring, a multitude of issues occurred, especially with the default capabilities of SCOM. ITS NL experienced the following complications.

SCOM, as the name states, is an operations manager. A monitoring tool is only part of an operations manager: it provides certain operations which result in data, which in turn is used to manage. SCOM can monitor data which is used to manage operations. This combination of various possibilities causes issues for ITS NL. For instance, reactive monitoring and service tracking are not included in the service, while high-availability groups and data recovery are. An accurate performance threshold is hard to establish. These issues stem from the philosophy of SCOM being an operations manager rather than a monitoring tool.

Another complication is the reporting capabilities. SCOM reports are basic and limited. Combining multiple reports is not possible, nor is making forecasts based on multiple statistics or managing capacity planning.


Besides reporting, there are also issues with the alert management process. It contains multiple complications caused by filtering, rules, internal politics, management issues, auto-resolves and more.

ITS NL solved these complications by enriching its in-house knowledge of SCOM. A Subject Matter Expert on SCOM, from now on referred to as SME, has been hired. The SME is able to turn the default capabilities of SCOM into a complete operational system that satisfies all the needs and wants of ITS NL. This is done by building packages and implementing services from third parties, such as Jalasoft Xian, SolarWinds, Derdack and more. Most of the issues were solved and SCOM remained useful until KPMG decided to implement cloud computing. This move towards the cloud created new complications with monitoring.

One of these complications is that SCOM is not always able to monitor KPMG applications and products. SCOM cannot monitor applications and products outside the KPMG network; it can only monitor applications and products that are running in the internal network of KPMG or reachable through a KPMG VPN connection. From the moment an application or product switches over from a KPMG network to an external network, SCOM is no longer able to monitor it.

SOFY, a KPMG application, is a perfect example of this. When clients use SOFY outside of KPMG, which is always the case, ITS NL is no longer able to monitor it. Therefore, SOFY manages its own monitoring instead of ITS NL.

Another problem is that SCOM is not compatible with cloud technology. It is not possible to monitor data in the cloud, such as Azure. As KPMG has a future plan wherein most resources are based on cloud architecture, this complication is problematic.

Lastly, besides SCOM being incompatible with cloud computing, it is also unable to monitor data in clusters. As KPMG is currently moving its data into a cluster such as Kubernetes, nothing could be monitored once the migration is completed.

All the above-mentioned issues are complicated to solve. The SME has been working together with Microsoft to develop a management pack in order to make SCOM compatible with Azure according to the ITS NL standards. This would solve the complication of monitoring Azure resources with SCOM. However, it would not solve all the monitoring complications with cloud computing, as this management pack is solely able to fix complications between SCOM and Azure. When SCOM needs to monitor data in a database which is not in Azure, a new management pack needs to be developed. A real-life example is that ITS NL is going to work with an application called iManage, which runs on Azure. The data of iManage will not be stored in Azure but in a database called MongoDB. SCOM would be able to monitor iManage in Azure but not the data in the database. A new MongoDB management pack would need to be developed.


This is neither cost- nor time-efficient. Also, the cluster complication of SCOM would not be addressed by this management pack solution.

KPMG is a data-driven enterprise; not being able to monitor its own data is not acceptable. Therefore, the management has decided that besides SCOM, another monitoring tool must be used which can meet their needs (“Azure Management – Monitoring,” 2018; “Comparing OMS/Log Analytics and SCOM,” 2018; Welch, 2018; Piot, 2013; Tulloch, 2016).

2.1.2 Define Requirements

In this research, a problem with monitoring data in combination with cloud and container technology has been established. This problem also occurs in the empirical case. The problem of the empirical case will be solved in order to find an answer to the main research question. To solve the problem, the requirements of the solution must first be defined. In the empirical case, KPMG, and more specifically the management of ITS NL, defines the necessary requirements according to its values in combination with the literature. The purpose of monitoring data, for the management, is to use the data as a basis for their decisions. This data-driven process is called the decision-making process (Brynjolfsson, Hitt, & Kim, 2011; Ikemoto, & Marsh, 2007). The management needs to be able to monitor, understand and adjust its data input in order to use the decision-making process. The specific management requirements, in combination with the literature, are divided into three Phases: Infrastructure, Interpretation and Adjustment. Below, these three Phases are defined and substantiated. In figure 2.0, the three Phases are displayed together with their connection to the decision-making process.

Figure 2.0: Architectural design of the decision-making process, consisting of three main Phases.

Phase 1 – Infrastructure

To start off, ITS NL needs to be able to monitor its data in combination with its cloud computing and container technology. Monitoring must be done with a monitoring system. This system needs to satisfy certain requirements to be considered as a potential solution for this research.


(Sun, Yang, Yi, & Kong, 2017; “7 Requirements for Monitoring Cloud Apps and Infrastructure | New Relic,” 2018; Fickas & Feather, 1995).

These requirements are as follows:
• Strength of the query language;
• Structure of the data model;
• Alerting option;
• Notification option;
• Ease of integration;
• Ability to monitor metric data;
• Ability to monitor log data;
• Storage capacity;
• Availability.

The ability to monitor data in the new dynamic environment is defined as the first and foremost requirement of the empirical case. This requirement is satisfied only when data in a container can actually be monitored.

Phase 2 – Interpretation

However, monitoring data has no added value if the data cannot be understood. Therefore, the monitored data must be presented in an interpretable way. The visualization of the data in an interpretable way is the second requirement. Interpretation and visualization are open to multiple interpretations. In order to guarantee the quality of this requirement, it is guided by the theoretical source called The Value of Information Visualization. “The primary goal of Visual Analytics is the analysis of vast amount of data to identify and visually distill the most valuable and relevant information. The visual representation should reveal structural patterns and relevant data properties for easy perception by the analyst. A number of key requirements need to be addressed by advanced Visual Analytics solutions.” (Kerren, Stasko, Fekete & North, 2008). In this paper these key requirements are used to find a visual system. The five key requirements are: Scalability with Data Volumes and Data Dimensionality; Quality of Data and Graphical Representation; Visual Representation and Level of Detail; User Interfaces, Interaction Styles and Metaphors; and Display Devices. These requirements have to be satisfied in order to satisfy the second requirement of ITS NL.8

Phase 3 – Adjustment

The last Phase of the empirical case depends on the attainability of the two above-mentioned Phases. Having a containerized data cluster in the cloud from which data can be monitored and visualized can be beneficial to an enterprise.

8 A description of these five requirements can be found in the Appendix 1: The Value of Information Visualization.


However, if the cost of creating this containerized data cluster in the cloud is high, the technology can become unprofitable. The cost of resources in the current way of working depends on multiple factors. The factors concerning ITS are the amount of time it takes to implement certain changes, the complexity of the implementation and the sensitivity to errors while implementing.

This process can be better explained by a real-life example. KPMG has an asset called SOFY Solutions. When a data cluster of SOFY needs multiple web apps that must be external-facing, a request can be made to ITS for an extra data cluster for demilitarized-zone purposes. When the management concludes that this request requires a large amount of time and manual labor and is prone to errors, the whole process can become unprofitable. This could be a reason for the management team to deny the request or to research alternatives. The threshold of the costs is dynamic and based on each request individually. Therefore, no fixed limit can be set that defines whether the costs of the resources are high or low. The management is interested in a way to optimize these factors in comparison to the traditional way of working. This phase is called the Adjustment Phase.

Potential Extra Phase

As the three Phases together make up the decision-making process, it can be assumed that by optimizing these three Phases, the decision-making process is simultaneously optimized. The empirical subject has stated that it maintains a data-driven culture. The ITS management uses the decision-making process for the team it is managing. If the decision-making process is optimized and the empirical subject has a data-driven culture, it can be assumed that by optimizing the decision-making process the management is optimized. However, this assumption can only be proven if the requirements of all three Phases are met. Therefore, this paper first focuses on the requirements of the three Phases and, only if they are met, measures the optimization of the management with an extra Phase.


2.2 Method - Experimental Procedure

2.2.1 Design & Develop Artefact

Phase 1 – Infrastructure

The first requirement of this empirical case is clear: monitor data in a dynamic environment with cloud and containerized technology. In order to do so, a proper monitoring system needs to be acquired. The requirements mentioned in Define Requirements, Phase 1 – Infrastructure, are used as criteria to identify potential monitoring systems.

After desk research, multiple monitoring systems turn out to meet all requirements. When a monitoring system checks all the requirements, that system is considered a potential solution for the empirical research. The potential monitoring systems are listed below:
• Prometheus;
• Azure Monitor/Log Analytics;
• InfluxDB;
• OpenTSDB;
• Graphite;
• Nagios.

Out of the list of potential monitoring systems, one system has to be chosen. All the systems check the monitoring requirements. However, when the systems are compared to each other, a difference in quality can be found. For example, all the systems have a data model, so all of them check the requirement “data model”. However, when the data models are compared between systems, it shows that some systems have a high-quality data model and others do not.

When comparing the systems, a distinction can be made for each requirement. All potential systems are compared to one another for every requirement. When a system has the highest quality for a certain requirement in comparison to the rest, that system gets a score. If a system checks the requirement but does not have the highest quality in comparison with the other systems, it does not get a score.

For example, the Prometheus data model stores its data as time series, with metric names as identifiers and key-value pairs as labels. The samples consist of millisecond-precision timestamps with float64 values. The design of this data model is of high technical quality because it works well for storing time series. InfluxDB and OpenTSDB have a similar model, whereas Azure, Graphite and Nagios do not. Therefore, Prometheus, InfluxDB and OpenTSDB get a score for the requirement “data model” and the other systems do not. The binary scores of each requirement and system are listed in Table 2.0.


Table 2.0: Requirement scores monitoring systems

When looking at Table 2.0, it can be seen that Prometheus scores highest on the requirements. As Prometheus receives the most scores, it is theoretically the best solution for ITS NL. Azure comes second with only one score less than Prometheus, and InfluxDB comes third with only two points less. It can be stated that these systems are similar due to the minor differences in scores. When comparing Prometheus to Azure, it can be found that Prometheus has implemented and adjusted updates which have yet to be announced for Azure.

Prometheus has a stronger query language, data model and metrics logging. Azure, on the other hand, contains higher-quality data logging and storing. Prometheus, in comparison with InfluxDB, shows a similar data model, a stronger query language and better integration with other environments. From now on, this research focuses on the Prometheus monitoring system and how to integrate it into the ITS NL environment. This system is used for satisfying the Phases of the empirical case. (“Overview | Prometheus,” 2019; “Querying basics | Prometheus,” 2019; “Azure Management – Monitoring,” 2018; “Graphite Overview | Graphite.readthedocs,” 2019; “InfluxData Product Overview | InfluxData,” 2019; “IT Management with Nagios | Nagios,” 2019; “How does OpenTSDB work? | OpenTSDB,” 2019).

What is Prometheus

After the profound comparison shown above, Prometheus has been chosen as the monitoring tool to be applied in this research.

Prometheus is an open-source systems monitoring and alerting toolkit. In 2012 a company called SoundCloud started to lean more towards cloud computing. They wanted a microservices architecture with multiple dynamic services and instances. They experienced a multitude of limitations with their monitoring set-up at that time, which was based upon StatsD and Graphite. What they wanted was a multi-dimensional data model, a flexible query language, scalability, operational simplicity and more. Although all these features existed in various systems, a combination of them did not exist. Soon after, the development of Prometheus began. Nowadays, Prometheus contains the following notable features, components and architecture style.


As for features, Prometheus contains a multi-dimensional data model with time series data identified by metric name and key/value pairs, and a flexible query language to leverage this dimensionality. There is no reliance on distributed storage; single server nodes are autonomous. Time series collection happens via a pull model over HTTP, and pushing time series is supported via an intermediary gateway. Targets are discovered via service discovery or static configuration. Multiple modes of graphing and dashboarding are supported.
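As an illustration of the pull model, service discovery and static configuration described above, a minimal scrape configuration could look as follows. This is a hedged sketch written from Bash, for consistency with the automation code in Appendix 4; the file name, scrape interval and discovery role are illustrative and are not taken from the KPMG configuration.

    # Minimal sketch of a Prometheus scrape configuration (illustrative values only).
    cat > prometheus.yml <<'EOF'
    global:
      scrape_interval: 15s           # how often targets are pulled over HTTP
    scrape_configs:
      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:       # targets discovered via Kubernetes service discovery
          - role: node
      - job_name: 'static-example'
        static_configs:              # alternatively, targets can be listed statically
          - targets: ['localhost:9100']
    EOF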

As for components, the Prometheus main server scrapes and stores time series data. Client libraries exist for instrumenting application code, and a push gateway supports short-lived jobs. Special-purpose exporters are available for services like HAProxy, StatsD, Graphite, etc. An Alertmanager handles alerts, and various support tools complete the ecosystem.

As for the architectural design, the Prometheus server scrapes metrics from Prometheus targets by pulling from instrumented jobs. Through a Pushgateway, a direct or intermediary scrape for short-lived jobs can be used. All scraped samples are stored locally. Rules are applied on the data to aggregate and record new time series; this can also be done from existing data, and alerts can be generated. To visualize the data, an API can be integrated with Prometheus. The alerts and visualization are discussed later in this research.
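To make the multi-dimensional data model and query language concrete, the sketch below queries the Prometheus HTTP API for a metric selected by its labels. The metric name, label values and port are illustrative assumptions, not values from the KPMG deployment.

    # Hypothetical instant query against a local Prometheus server: select a counter
    # by metric name and key/value labels, then compute a 5-minute rate over it.
    curl -s 'http://localhost:9090/api/v1/query' \
      --data-urlencode 'query=rate(container_cpu_usage_seconds_total{namespace="default",pod=~"web-.*"}[5m])'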

Figure 2.1 illustrates the architecture of Prometheus and some of its ecosystem components (Volz & Rabenstein, 2015; “Overview | Prometheus,” 2019).


After observing what Prometheus specifically contains and ascertaining the motivation behind its development, one can conclude that, from a theoretical point of view, Prometheus could be a solution for the desires of the three Phases of the ITS NL management.

Phase 2 – Interpretation

The second phase of this empirical case requires an interpretable way of visualizing data. The requirements for an interpretable way of visualizing are defined by the theoretical source called The Value of Information Visualization. In this theory there are five requirements.9 In this empirical research, an artifact needs to be found which satisfies the requirements of the literature.

Prometheus has a default dashboard called the Prometheus dashboard. This dashboard provides a basic visualization interface. It is not designed as a reporting tool or visualization system, but as a test interface for the monitoring expert. As the Prometheus dashboard is not designed as a visualization system, the possibility exists that this tool does not meet the requirements of the literature, in which case the research needs an alternative tool. This alternative tool is only tested if the Prometheus dashboard does not meet the Visual Analytics requirements.

There are multiple visualization and reporting systems on the market. These systems have benefits and disadvantages. However, the important aspects of a system for this empirical research are its integration with Prometheus and its fulfillment of the five requirements of the literature.

Prometheus offers an integrated visualization system by default, besides the Prometheus dashboard. This system is integrated but not working by default; in order to be used, it needs to be installed. It offers data visualization and export possibilities. This data visualization tool is called Grafana. The fact that Grafana is integrated with Prometheus by default has been a motivation for this empirical research to test this visualization system as a first potential solution to the five requirements of the theory, before researching other alternatives.

Grafana is an expression browser that allows expressions and querying of a data source for monitoring and debugging purposes. It is an open-source, general-purpose dashboard and graph composer, which runs as a web application. Results can be displayed in tables and graphs within a dashboard. This dashboard contains multiple visual building blocks, each with various styling and formatting possibilities. Through several additional Grafana options like the Time Picker,


Templating, Annotations and Sharing, the dashboard has the potential to become more dynamic, interactive, precise and connected. It supports Graphite, InfluxDB and Prometheus as of Grafana 2.5.0 (“Grafana - The open platform for analytics and monitoring,” 2019).
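As a concrete illustration of how Grafana can be connected to Prometheus, the sketch below registers Prometheus as a Grafana data source through Grafana's HTTP API. The host names, port numbers and credentials are illustrative assumptions and are not taken from the ITS NL environment; the configuration actually used is in Appendix 2.

    # Hypothetical example: add Prometheus as a data source to a locally running Grafana.
    curl -s -X POST http://admin:admin@localhost:3000/api/datasources \
      -H 'Content-Type: application/json' \
      -d '{"name": "Prometheus", "type": "prometheus", "url": "http://localhost:9090", "access": "proxy"}'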

Phase 3 – Adjustment

As stated in Define Requirements, Phase 3 – Adjustment contains certain factors that must be optimized. In the current way of working, a data source is created manually. With Azure, a Kubernetes cluster is created using the ACS-Engine.10 The creation of this data cluster takes a certain amount of resources: it involves manual labor hours, it is error-prone and it involves complex exercises which require deep Kubernetes architectural knowledge. To optimize the current situation, the best practice would be to remove the manual aspect, meaning the automation of the whole creation and deployment process. There are no files available online which automate this process. Therefore, the automation must be built completely from scratch in this empirical research. The programming language chosen for this automation process is Bash, as this language is suited for executing a series of predefined commands (“Bash - GNU Project - Free Software Foundation,” 2017).

10 Explanation about Kubernetes and ACS-Engine is located in the Appendix 1: Kubernetes cluster architecture and Appendix 1: Azure Containers Service – Engine.


2.3 Method - Data Collection

2.3.1 Demonstrate Artefact

Phase 1 – Infrastructure

Now that it has been established that Prometheus is theoretically the best option to satisfy the requirements of the management team, the design of this artefact must be demonstrated. For the next part of the research, an empirical case study is performed whereby the artefact is installed, configured, applied and tested. The practical steps are listed below.

Step 1.1 Creation of Azure account and resource group

As described in Design and Develop Artefact above, an Azure account is needed. For this research an Azure account has been created with the account name akahmann@kpmgnl.onmicrosoft.com. This account has been created under the tenant of ITS NL and certain rights have been granted by the administrator. These rights give the account access to the subscription of ITS NL called KPMG ITS NL DevOps Services. Within this subscription a production resource group is created called kpmgnl-uva-p-rg. All the operations mentioned above are done manually, through the Azure Portal. For any commands that need to be run, the Azure CLI command shell and the Bash language are used. In figure 2.2 below, the Azure portal with the account and resource group is displayed.
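As an illustration of these manual portal steps, the same subscription selection and resource group creation could also be performed from the Azure CLI. This is a hedged sketch: the subscription and resource-group names are taken from the text above, while the location is an illustrative assumption.

    # Sign in and select the ITS NL subscription mentioned above.
    az login
    az account set --subscription "KPMG ITS NL DevOps Services"
    # Create the production resource group (location is an illustrative assumption).
    az group create --name kpmgnl-uva-p-rg --location westeurope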

Figure 2.2: Azure portal, with the akahmann@kpmgnl.onmicrosoft.com account and access to five subscriptions.

Step 1.2 Creating a Kubernetes Cluster

Within this resource group, certain features can be installed. In this research, the infrastructure of a Kubernetes cluster is chosen. Therefore, all the features that are implemented in the resource group belong to the Kubernetes cluster. In this research the cluster contains


Nodes, Pods, Availability Sets, Load Balancers, Databases, VMs, Disks, Networks, Web Apps, Native Apps and a Key Vault. An ACS-Engine is used for the deployment. In a JSON file all the specifications are set in order to run the ACS-Engine. This creates an ARM template11 (“Azure Resource Manager overview | Docs.microsoft,” 2019) together with apimodel.json, azuredeployment.json, azuredeployment.parameters.json and certificate access files like the kubeconfig and apiserver certificate files.12
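A hedged sketch of this generate-and-deploy flow is given below. The input file name and output paths are illustrative (the default ACS-Engine output names may differ from the file names mentioned above), and the actual files used are listed in Appendix 2.

    # Generate ARM templates and certificates from the cluster definition (illustrative name).
    acs-engine generate kubernetes-cluster.json
    # Deploy the generated ARM template into the resource group created earlier.
    az group deployment create \
      --resource-group kpmgnl-uva-p-rg \
      --template-file _output/kubernetes-cluster/azuredeploy.json \
      --parameters _output/kubernetes-cluster/azuredeploy.parameters.json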

After a successful ACS-Engine deployment, a Kubernetes cluster has been created. The resource group of this cluster, containing all the features, is displayed below in figure 2.3.

11 An explanation of the ARM template is located in Appendix 1: Azure Resource Manager template.
12 All the files can be found in Appendix 2: uva3-kubernetes-deployment Files.


Figure 2.3: Azure resource group called kpmgnl-uva-p-rg on subscription KPMG ITS NL DevOps Services, with all necessary Kubernetes cluster resources.

Step 1.3 Integrate Prometheus into the Kubernetes Cluster

After a Kubernetes cluster has been established, it must be monitored. As decided in Design and Develop Artefact, Prometheus is used as the monitoring tool for this research. Prometheus automatically generates monitoring target configurations for Kubernetes. This monitoring system collects metrics from Kubernetes services, nodes and orchestration statuses. It operates via a node exporter (for classical host-related metrics like network and CPU), kube-state-metrics (for orchestration and cluster-level metrics like pod metrics and deployments) and kube-system metrics (from internal components like the scheduler and kubelet). Prometheus can be installed and integrated through a yaml file.13
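A hedged sketch of such an installation step is shown below; the manifest file name and namespace are illustrative assumptions, while the actual manifests are listed in Appendix 2.

    # Apply the Prometheus deployment manifest to the cluster (file name is illustrative).
    kubectl apply -f prometheus-deployment.yaml
    # Verify that the Prometheus pod has started (namespace is an assumption).
    kubectl get pods -n monitoring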

After Prometheus is integrated within the Kubernetes cluster, it is possible for the Azure account to query metrics and to see alerts and targets. This can be done through the Azure Portal. When a Kubernetes port is forwarded, the Prometheus dashboard can be opened on localhost.14 The dashboard is displayed in figures 2.4 and 2.5.
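A minimal sketch of such a port-forward, assuming the Prometheus service is exposed inside the cluster under the name prometheus-server, is:

    # Forward local port 9090 to the Prometheus service, so the dashboard is reachable
    # at http://localhost:9090 as shown in figure 2.4 (service name is an assumption).
    kubectl port-forward svc/prometheus-server 9090:9090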

Figure 2.4 Prometheus can connect and run in the browser on localhost port 9090.

Figure 2.5 Prometheus is able to run on a non-KPMG network in the browser.

13 All the files can be found in the Appendix 2: uva3-kubernetes-deployment Files.

14 A list of details about the specific configuration settings of Prometheus can be found in the Appendix 2: uva3-kubernetes-deployment Files.


Phase 2 – Interpretation

Step 2.1 Testing visualization system

As shown in the Infrastructure Phase, Prometheus is integrated with Azure and Kubernetes. After the integration, a dashboard is displayed. The Prometheus dashboard is a default dashboard which is automatically integrated with Prometheus and does not require any installation steps.

When the Prometheus dashboard is compared to the Visual Analytics requirements (Kerren, Stasko, Fekete & North, 2008), it shows that the Prometheus dashboard is not a suitable visualization system. The first requirement, Scalability with Data Volumes and Data Dimensionality, could not be met by the Prometheus dashboard.15 This requirement states that multiple data sources need to be visualized by the visualization system. However, the Prometheus dashboard can solely display data from Prometheus and no other data sources. By not meeting the first Visual Analytics requirement, it can be concluded that the Prometheus dashboard is not able to satisfy the Interpretation Phase.

An alternative visualization system is required. As stated in Design & Develop Artefact above, Grafana is chosen as the alternative visualization system of this research. To test this visualization system, the integration process must be completed: Grafana must be installed and configured with the data source.16 When Prometheus is already up and running, it can simply be connected with Grafana through a port-forward command, after which the Grafana dashboard runs in a browser on localhost.17 Figure 2.6 shows an example of the resulting Grafana dashboard.
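A minimal sketch of such a port-forward command, assuming Grafana is exposed as a service named grafana on its default port 3000, is:

    # Forward local port 3000 to the Grafana service so the dashboard is reachable
    # at http://localhost:3000 (service name and port are assumptions).
    kubectl port-forward svc/grafana 3000:3000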

15 Description of the Visual Analytic requirements is located in the Appendix 1: The Value of Information Visualization

16 The commands for installation and configurations can be found in the Appendix 2: uva3-kubernetes-deployment Files.

17 A list of details about the specifications of the configuration code and files, can be found in the Appendix 2: uva3-kubernetes-deployment Files.


Figure 2.6: Grafana is able to run in the browser and is connected to Prometheus.

After Grafana is running, it has to be tested against the literature in order to determine whether it meets the requirements. Below, each requirement of the literature is listed, explored and compared to Grafana in detail.

Scalability with Data Volumes and Data Dimensionality

This requirement states that the Visual Analytics system needs to be able to scale with the range of different data types, data sources and levels of quality. It also needs to be able to be implemented with different interactive systems.

Grafana supports different storage back-ends and it is possible to combine multiple data sources into a single dashboard. However, each panel in the dashboard is tied to a specific data source. The supported data sources are Graphite, Elasticsearch, CloudWatch, InfluxDB, OpenTSDB, Prometheus, MySQL, Postgres and MSSQL. From Grafana 3.0, additional plugins are available to support even more data sources.

Grafana supports high availability through stateless sessions. A session data store can be chosen, using Postgres and a load balancer. Multiple Grafana servers can run at the same time, and the load balancer sends users to certain servers according to the availability of those servers. Users do not have to register on each server when being assigned; therefore, the user experience is not affected when different servers are assigned to the user. This in turn helps with adding or removing servers without the user noticing. An infrastructure design can be seen in figure 2.7 below.


Figure 2.7 Grafana Loadbalancer infrastructure.

Quality of Data and Graphical Representation

This requirement states that the data quality and the confidence in the data algorithm need to be appropriately represented. This is necessary because limitations, errors or uncertainty of the chosen algorithm can lead to misleading results.

Grafana has been innovating its data visualization possibilities in order to minimize limitations. It helps the user with guiding tools that decrease the chance of making errors or creating misleading results.

Grafana has constructed a feature called Split and Compare. As the name states, it allows the user to visualize multiple graphs and tables side-by-side on one page, to compare related data. This feature helps the user to understand or spot anomalies when comparing data. The Loki feature, which automatically sets labels for each log stream, helps the user with meta-data tagging.

Deduping consists of algorithms which help de-duplicate lines for fewer errors. The deduping algorithms are “exact”, “numbers” and “signature”. Each algorithm has its own level of matching and deleting data fields.

Visual Representation and Level of Detail

This requirement states that data patterns and relationships need to be visualized on several levels of detail, with the appropriate level of data visual abstraction.

Grafana contains a multitude of customizable visual representation possibilities. One of these is the Panel menu, through which it is possible to adjust and edit certain configuration settings. The Graph Panel gives the option to edit the different types of graphs, legend options, thresholds, time regions and time ranges.

In the Singlestat Panel, the configuration settings, coloring, spark lines, gauge, mapping and troubleshooting options can be adjusted.


Furthermore, there are the Heatmap Panel, Alert list Panel, Dashboard list Panel and Text Panel. The Explore option helps the user to find and focus on the data of interest.

User Interfaces, Interaction Styles and Metaphors

This requirement states that the user interface must not be overly technical and complex, which could distract the analyst. Also, feedback must be provided.

The Grafana dashboard also features its own list of data representation options. Variables, Annotations, Folders, Playlists, Search, Sharing, Time Range, Export & Import, Scripted Dashboards, Dashboard Version History and the JSON Model can all be adjusted. These features help in designing an interface which can be as simple or as complex as the user wishes.

Display Devices

This requirement states that the Visual Analytic system needs to be able to support corresponding display devices.

The Grafana visual analytics system can be displayed locally on the device. This means that every machine can display the system regardless of the operating system; even a virtual machine running in the cloud works. Besides locally displaying the visual analytics system, it is also possible to display data through a URL. Every device with an internet connection is able to display data through the URL.

In addition, Grafana offers multiple feedback options. These include a contact form, online forums, blogs, GitHub and Twitter.

Step 2.2 Alertmanager

One of the aforementioned complications of working with SCOM is the alert management process. It contains multiple complications caused by filtering, rules, internal politics, management issues, auto-resolves and more. These complications can be resolved with another default built-in alerting system called the Prometheus Alertmanager. In the Prometheus servers, the alerts are configured in PromQL and are subsequently handled by the Alertmanager. It does the grouping, de-duplicating, silencing and inhibition of alerts, and routes them to the corresponding email, PagerDuty or OpsGenie receiver. Therefore, the Alertmanager can solve the alert issues of SCOM by being more customizable, controllable and communicative, while providing new possibilities.
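To illustrate what a PromQL-based alert looks like, the sketch below writes a single alerting rule to a file. The rule name, expression and threshold are illustrative and do not come from the ITS NL configuration; the Alertmanager configuration actually used is in Appendix 2.

    # Minimal sketch of one Prometheus alerting rule (illustrative values only).
    cat > alert-rules.yaml <<'EOF'
    groups:
      - name: example-alerts
        rules:
          - alert: HighPodRestartRate
            expr: rate(kube_pod_container_status_restarts_total[5m]) > 0.1
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "Pods are restarting unusually often"
    EOF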


Since the Alertmanager is installed with Prometheus, it needs to be configured. The dashboard runs in the browser on localhost. From there it is possible to create and install all the options which the Alertmanager offers.18 An example of the result is shown below.

Figure 2.8 Prometheus Alertmanager dashboard is able to run in the browser on localhost.

Although the Alertmanager can improve the manageability of the Prometheus system, its effects are experienced by the management team only indirectly, because they do not work with the Alertmanager directly. They receive clearer information because the alerting system will have alerted the monitoring expert of inconsistencies, whereby these alerts are resolved within the Prometheus system before the final data result is reported to the management in Grafana. The effects of the Alertmanager are added to the effects of using Grafana. Therefore, Prometheus and the Alertmanager are not seen as individual improvements.

Phase 3 – Adjustment

To truly optimize the Adjustment Phase, a script must be created that automates the creation and deployment of the data cluster. The reason is that in order to create or change a data cluster, a new cluster is deployed. When building a cluster, the steps taken are ordered and repeated for each deployment. Therefore, these steps can be automated. The only manual steps which need to be taken before the deployment are signing in under the correct subscription in Azure and

18 The configuration code of the Alertmanager can be found in the Appendix 2: uva3-kubernetes-deployment Files.


entering a resource-group name. Also, the right settings for any changes must be configured before deployment. After the script has finished, a complete, up-and-running cluster is deployed.19 This script runs all manual steps automatically from the moment the resource group is created. These steps are similar to the Infrastructure Phase, where all steps are done manually through the Azure portal. Each step of the automation script is explained below.

Step 1: A decision can be made whether an environment (meaning test or production), resource-group, master-subnet, nodes-subnet or vnet will be created. As in the Infrastructure Phase, a resource group must be created first in order to be able to deploy a Kubernetes cluster. Therefore, the option group must be chosen with a corresponding resource-group name. Resource-group names have certain standards which must be followed. A resource group called kpmgnl-uva3-p-rg is created as a result of this step.

Step 2: The resource-group name is used to set a corresponding subscription path.

Step 3: A Key-vault is created for the cluster. This is an application which can store important information: cryptographic keys, secrets, certificates and tokens. These stored items can be used to access the cluster. The Key-vault can also be used for the authentication of applications.

Step 4: In this step a Web-API is created. A Web-API is an application programming interface for the web which focuses on communication. This API is registered in Azure Active Directory (AAD). AAD is Microsoft's cloud-based identity management service. It enables Azure users to sign in and provides access to specific resources.

Step 5: A Native-application is deployed in the AAD. This Native-application calls the Web-API from step 4 on behalf of the user. Steps 4 and 5 combined constitute the creation of a service principal for the cluster, meaning the authentication of the cluster.

Step 6: Multiple Manifest files are created. A Manifest file is a file which contains metadata.

Step 7: An automatically generated password is stored in the Key-vault which has been created in step 3.

Step 8: A rule is set which states that every key that is created, must be stored in the Key-vault.


Step 9: An SSH public key is generated. This key is needed if the cluster needs to be accessed externally.

Step 10: An SSH private key is generated. This key can be used if the cluster needs to be accessed internally.

Step 11: The Azure Container Service Engine (ACS-Engine) is downloaded. This tool helps manage the data cluster. Also, Azure Resource Manager templates (ARM templates) are downloaded. These templates contain specific predefined code for the Kubernetes integration with Azure.

Step 12: The ACS-Engine generates certificates, which are stored in the Key-vault.

Step 13: The ACS-Engine generates keys, which are stored in the Key-vault.

Step 14: All the certificates stored in the Key-vault are referenced in an apimodel file.

Step 15: The deployment is executed. All the previous steps are necessary in order for this deployment to complete successfully.

Step 16: The apimodels are restored.

Step 17: This final step performs the multi-factor authentication for the user, meaning the login.

After the successful completion of this automated script, a Kubernetes cluster on Azure has been established. A manual has been written which explains step by step how to use this automation script.20
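To give an impression of what such an automation script looks like, the sketch below strings a few of the steps above together in Bash. It is a hedged skeleton only: the real Automation.sh is listed in Appendix 4, and the command names, flags and file paths shown here are illustrative assumptions rather than the actual ITS NL code.

    #!/usr/bin/env bash
    # Illustrative skeleton of an automated cluster deployment (not the actual Automation.sh).
    set -euo pipefail

    RG_NAME="$1"                                   # resource-group name, e.g. kpmgnl-uva3-p-rg (step 1)
    az group create --name "$RG_NAME" --location westeurope
    az keyvault create --name "${RG_NAME}-kv" --resource-group "$RG_NAME"     # step 3: Key-vault
    ssh-keygen -t rsa -b 4096 -f ./cluster_id_rsa -N ""                       # steps 9 and 10: SSH keys
    acs-engine generate apimodel.json                                         # steps 11-13: templates, certificates, keys
    az group deployment create --resource-group "$RG_NAME" \
      --template-file _output/cluster/azuredeploy.json \
      --parameters _output/cluster/azuredeploy.parameters.json                # step 15: deployment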


3.1 Results – Explanation of findings, tables and figures

3.1.1 Evaluate Artefact

Phase 1 – Infrastructure

From the demonstration of the artefact in the empirical study, it is proven that it is possible to perform a successful and complete Kubernetes cluster deployment on Azure when connected to the KPMG network and subscriptions. It is also possible to connect the Kubernetes cluster with the Prometheus monitoring tool. When looking at the results, a few findings stand out.

The first finding shows that Prometheus is by default compatible with cloud technology. This is demonstrated by the integration of Prometheus with a Kubernetes cluster running on Azure.

The second finding demonstrates that Prometheus is a monitoring tool and not an operation manager. This is shown by its proactive monitoring, service tracking and data recovery capabilities.

It has also been found that Prometheus is not limited to a specific network. Therefore, Prometheus is always able to monitor a data source independently of the KPMG network, as is proven by running it on a guest network, which is displayed in figure 2.5.

When looking at the query possibilities, it is evident that Prometheus expands the ability to query the desired information with an expression language. This is due to the additional option of using PromQL as a query language ("Querying basics | Prometheus", 2019).
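
As an example of this expanded query capability, a PromQL expression can be sent directly to the Prometheus HTTP API; the host, port and metric name below are assumptions and depend on the actual cluster configuration.

# Query the per-container CPU usage rate over the last five minutes via the Prometheus HTTP API
curl -G 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=rate(container_cpu_usage_seconds_total{namespace="default"}[5m])'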

The last finding shows that Azure Docker container management is optimized by the implementation of Kubernetes. With a Kubernetes cluster, the Kubernetes orchestrator manages its own cluster automatically. This means that scaling, application management and restarting containers no longer have to be done manually.
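
A small sketch of this automatic management, assuming a hypothetical deployment called my-app running in the cluster:

# Scale the deployment to five replicas; Kubernetes schedules the extra containers itself
kubectl scale deployment my-app --replicas=5
# Failed containers are restarted automatically; their status can be followed live
kubectl get pods --watch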

If similar management actions had to be performed in the traditional way, a manual alert trigger would have to be set in SCOM. When this alert goes off, the data source must be altered manually. However, this manual approach does not meet the requirements of the three phases of management.

When the findings on the use of the Prometheus monitoring tool are compared with SCOM, it shows that Prometheus improves the monitoring capabilities in certain aspects.


SCOM cannot be integrated with Kubernetes, in contrast to Prometheus, which is built for this purpose. While SCOM is used as both a monitoring tool and an operation manager, Prometheus focuses solely on being a monitoring tool, which creates additional benefits. SCOM is not able to monitor outside the boundaries of the internal KPMG network, whereas Prometheus encounters no such limitation.

SCOM is able to use the default query options, which Prometheus supports as well; in addition, Prometheus offers a query language of its own, which increases the strength of the query possibilities. All findings combined show that Prometheus is able to meet the requirement of the Infrastructure Phase.

Phase 2 – Interpretation

In the demonstration of the artefact, the artefact is compared against the five requirements of the Visual Analytic. As the dashboard of Prometheus could not meet these requirements, an alternative solution was found. This system is called Grafana and has been compared to the five requirements of the Visual Analytic as well. Below, the findings are summarized for each requirement individually.

The Scalability with Data Volumes and Data Dimensionality requirement states that the Visual Analytic system needs to be able to scale with a range of different data types, data sources and levels of quality. It also needs to be able to be integrated with different interactive systems.

Grafana has met this requirement by supporting different storage backends and multiple data sources. In addition to the data source support, multiple plugins are available to support specific aspects of these data sources. Thanks to its stateless architecture, Grafana guarantees high availability, regardless of the number of services assigned.
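
For illustration, a data source such as Prometheus could be added to Grafana through its HTTP API; the host names and credentials below are placeholders, not values from the empirical study.

# Register Prometheus as a Grafana data source via the Grafana HTTP API (credentials are placeholders)
curl -X POST 'http://admin:admin@localhost:3000/api/datasources' \
    -H 'Content-Type: application/json' \
    -d '{"name":"Prometheus","type":"prometheus","url":"http://prometheus:9090","access":"proxy"}'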

The Quality of Data and Graphical Representation requirement states that the data quality and data algorithm confidence need to be appropriately represented. This is necessary because limitations, errors or uncertainty of the chosen algorithm can lead to misleading results. Grafana has met this requirement with tools such as Split and Compare, Loki and Deduping. These tools help the user with comparing data, understanding data, spotting anomalies, automatic meta-data tagging, and deduplicating and matching data.

The Visual Representation and Level of Detail requirement states that data patterns and relationships need to be visualized on several levels of detail, with the appropriate level of data visual abstraction.


Grafana has met this requirement through multiple Panel types, such as the Graph Panel, Singlestat Panel, Table Panel, Heatmap Panel, Alert list Panel, Dashboard list Panel and Text Panel. Each of these Panels helps the user to visually explore numerous levels of detail on certain aspects of the data.

The User Interfaces, and Interaction Styles and Metaphors requirement states that the Visual Analytic system must not be overly technical and complex, which could potentially distract the analyst.

Grafana has met this requirement by allowing the level of complexity to be adjusted in every aspect of the dashboard. Each user can customize the system to a complexity level according to their own level of expertise.

The Display Devices requirement states that the Visual Analytic system needs to be able to support corresponding display devices.

Grafana has met this requirement by being able to run locally, in the cloud or through a URL. Therefore, every operating system is able to display Grafana, with or without an internet connection.

Besides the requirements from the literature, there are some additional findings.

It has been found that by using Prometheus as the monitoring system, it is possible to monitor the data source in real time. Every 30 seconds the metrics are scraped and displayed on the Prometheus dashboard as well as on the Grafana dashboard. Thus, there appears to be no difference in the pulled-in data between the Prometheus dashboard and Grafana.
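
A minimal sketch of how such a 30-second scrape interval could be configured in Prometheus, written here as a shell heredoc; the job name and service-discovery role are assumptions.

# Write a minimal Prometheus configuration with a 30-second scrape interval (values are illustrative)
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
EOF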

The empirical research findings show that the graphs and tables of Grafana are interactive, giving additional information when hovering over certain positions on the graphs.

It has also been found that Alertmanager can work in combination with Grafana. For example, Alertmanager can place emphasis on certain data points in Grafana.

All findings combined show that Grafana is able to meet the requirement of the Interpretation Phase.

Phase 3 – Adjustment

Now that an automation script has been developed, the amount of resources needed to create a data cluster or make changes has been significantly reduced in comparison to the traditional way of working. Through the automation script, running the commands takes seconds, while manual commands take minutes. All the commands combined take the automated process 20 minutes, while manually they take an hour.


The automation script has all the commands predefined, meaning that all commands run in the right order, are directed to the right location paths and are guaranteed to be free of typing errors. Manual commands are always prone to errors to some degree, for instance not being able to find the right file locations, typing errors, or not giving the right commands in the right order. Because all the steps are executed automatically instead of manually, the probability of such errors has been eliminated.

The automated script can be started with one simple command line. Due to the simplicity of the command, it can be run by anyone; an expert is no longer needed to create a cluster. Performing all the commands manually is not as simple and, because of its complexity, requires a highly skilled engineer.
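
A purely hypothetical invocation is shown below; the actual script name and flags are defined in the uva3-kubernetes-deployment repository and may differ.

# Hypothetical one-line invocation of the automation script (name and flags are assumptions)
./deploy-cluster.sh --environment p --resource-group kpmgnl-uva3-p-rg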

The results of the changes within the adjustment phase show that all the requirements of the desired state have been met. The new automated process optimizes the current state compared to the traditional manner.


3.2 Results - Temporarily conclusion

Phase 1 – Infrastructure

According to the findings mentioned in the Evaluate Artefact, Prometheus resolves and optimizes many of the problems which occur while using SCOM. This is an improvement over the traditional way of working. Prometheus is a monitoring system that satisfies the requirements of having a structured data model, a strong query language, an alerting option, a notification option, integration capability, metric and logging data options, storage capability and, finally, availability. Besides satisfying the requirements of the monitoring system, it also satisfies the requirement of the Infrastructure Phase of the management team. This first Phase states that the monitoring system needs to be able to monitor a dynamic environment. Kubernetes, as stated before, contributes to this dynamic environment on the Azure cloud. Prometheus has proven able to integrate with and monitor Kubernetes on Azure. Therefore, it can be stated that one of the three decision making process requirements has been met. In figure 3.0, the Design and Develop Artefact can be seen in the Method Framework for Design Science Research, with the solution of Phase 1.

Figure 3.0: Method Framework for Design Science Research, Infrastructure

Phase 2 – Interpretation

All five requirements of the Value of Information Visualization are met by Grafana. These requirements are Scalability with Data Volumes and Data Dimensionality, Quality of Data and Graphical Representation, Visual Representation and Level of Detail, User Interfaces, and Interaction Styles and Metaphors, and Display Devices.


By meeting all these requirements, it can be stated that the requirements of Phase 2 of the decision making process are met. In figure 3.1, the Design and Develop Artefact can be seen in the Method Framework for Design Science Research, with the solutions of Phases 1 and 2.

Figure 3.1: Method Framework for Design Science Research, Interpretation

Phase 3 – Adjustment

Due to the development of the automation script, the manual aspect has been removed. Therefore, the time to create and deploy a data cluster has been reduced substantially and the probability of an error has been eliminated. The level of knowledge required to create a cluster has decreased. The findings show that the desired state of the Adjustment Phase, the third and last Phase of the decision making process, has been met.

By optimizing this Phase, cluster creation requests have a higher probability of being accepted in comparison to the traditional way of working, especially now that the costs have decreased due to the automation script. In figure 3.2, the Design and Develop Artefact can be seen in the Method Framework for Design Science Research, with the solutions of all the Phases.


Figure 3.2: Method Framework for Design Science Research, Adjustment

Extra Phase – Management Optimization

When analyzing the three Phases of the decision making process, it can be stated that every single requirement of each Phase has been fulfilled, by at least meeting the desired state and in some cases surpassing it. Therefore, it can be concluded that each of the three Phases has been optimized according to the criteria of the management team.

As the three Phases together make up the decision making process, it can be assumed that by optimizing these three Phases, the decision making process has simultaneously been optimized. The empirical subject has stated that it maintains a data-driven culture. The ITS management uses the decision making process to make decisions for the team it manages. If the decision making process is optimized and the empirical subject has a data-driven culture, it can be assumed that by optimizing the decision making process the management is optimized as well. However, this assumption still has to be proven.

A tangible result of the management could be certain tasks which are defined for the team the management manages. The ITS management uses the decision making process to make decisions for this team. This team is called the Core Team of ITS NL and it is a DevOps team, which stands for development and operations. This team researches, designs and builds innovative IT projects. The team works Agile.21 By working Agile, the team uses a Scrum framework which contains so-called Sprints. These Sprints are time periods in which certain tasks must be finished. These tasks are called Backlog-Items and are predefined by the
