Towards Risk Analysis and Threat Handling in Digital Market Places


Bachelor Informatica

Towards Risk Analysis and Threat Handling in Digital Market Places

Wouter Loeve

June 17, 2019

Supervisor(s): Lu Zhang MSc and Dr. Paola Grosso (UvA)

Informatica, Universiteit van Amsterdam


Abstract

In our digital economy, data is becoming an increasingly important way of doing business. However, there are security concerns that come with this process, such as data privacy and confidentiality. Digital Market Places (DMPs) are created to facilitate secure and trustworthy data sharing. They aim to enable parties to collaborate without losing control over their data. In order to achieve this, we present two systems that can be used to secure container units used in DMPs. The first system is the Threat Analysis System, which identifies threats or risks from a DMP scenario and associates them with countermeasures and metrics. These countermeasures and metrics can then be used in the second system we propose, the Threat Handling System. This handling system is able to monitor different metrics to detect malicious behaviour during computation. We implemented a demo scenario in Docker and Kubernetes. In this proof of concept scenario for our system, we were able to monitor multiple metrics: CPU usage, system calls, network traffic and memory usage. We were then able to use these metrics to detect two malicious algorithms we designed to break the policies described in our scenario.


Contents

1 Introduction
  1.1 Context
  1.2 Research Questions
  1.3 Thesis Outline

2 Containers, Orchestration and Digital Market Places Background
  2.1 Characteristics and the reasons for using Containers
  2.2 The internal workings of Docker
    2.2.1 The Docker Engine
    2.2.2 Namespaces
    2.2.3 Control Groups (cgroups)
    2.2.4 Capability Restrictions
    2.2.5 Kernel Hardening
  2.3 Container Orchestration
    2.3.1 Abstractions
    2.3.2 Minikube
  2.4 Digital Market Places (DMPs)

3 Risk Estimation
  3.1 Quantitative and Qualitative Risk Analysis
    3.1.1 Quantitative Risk Analysis
    3.1.2 Qualitative Risk Analysis
  3.2 Threat types
    3.2.1 Ordering in Components
    3.2.2 Ordering by Stages
  3.3 Scenario Definition
  3.4 Threat model for the Scenario

4 Threat Analysis System and Threat Handling System
  4.1 Threat Analysis System
    4.1.1 Threat Identification
    4.1.2 Threat Association
    4.1.3 Evaluate countermeasures and metrics
  4.2 Threat Handling System
    4.2.1 Outline
    4.2.2 Defining and detecting suspicious behaviour

5 Scenario & Monitoring Implementation
  5.1 Scenario implementation
  5.2 Monitoring Tool Implementation
    5.2.1 Monitoring System calls

6 Threat Handling System Validation
  6.1 Method
    6.1.1 Malicious Algorithms
    6.1.2 Test Setup
  6.2 Malicious Algorithm 1 results
    6.2.1 Call Analysis
    6.2.2 CPU and Memory Usage Analysis
    6.2.3 Network Analysis
  6.3 Malicious Algorithm 2 results
    6.3.1 Call Analysis
    6.3.2 CPU, Network and Memory Analysis
  6.4 Discussion

7 Related Work
  7.1 Container Security and DMPs
  7.2 Intrusion detection systems


CHAPTER 1

Introduction

1.1 Context

For many businesses, digital collaboration is an increasingly important facet [16]. Some organisations have data that is useful to other organisations. However, a data provider that simply hands over its data risks losing control over it.

This is where Digital Market Places (DMPs) come in. A DMP is a membership organisation supporting a common goal: enabling data sharing to increase the value and competitiveness of (AI/ML) algorithms [14]. DMPs give data providers and consumers opportunities to share, compute on and monetise data. The challenge is to allow collaboration among organisations that normally compete with each other.

In order to facilitate secure data sharing and collaborative computation, the involved parties first form a digital agreement based on their trust relationships, as illustrated in Figure 2.3. This gives the owners of the data some control over who has access to which data and for what purpose the data is used. To ensure that these digital agreements are complied with on an operational level, state-of-the-art security mechanisms and monitoring techniques are put into place.

1.2 Research Questions

In order to know which security mechanisms and monitoring techniques have to be used, risks have to be estimated, preferably in a generalised way so that these techniques can be applied to a wide variety of scenarios. Policy enforcement techniques can then be applied based on this risk estimation. In this context, enforcement of a policy means that it cannot be breached; ergo, enforcement implies prevention of the associated risks. We hypothesise that not all policies can be enforced directly. In that case, it might still be possible to detect policy breaches. This detection is relevant because parties in DMPs may be held liable for any policy breach, and, since DMPs are membership organisations, there may be further consequences. In this thesis, the following research questions will be addressed:

1. How can we establish a generalised risk estimation method for digital market place scenarios?

• What security technologies can be adopted to prevent breaches of the policy during execution?

2. How can we construct a system which handles the prevention and detection of threats in real time?


1.3 Thesis Outline

First, this thesis will outline the current state of the art in container security in Chapter 2, in order to find security flaws inherent to using containers (specifically Docker) in DMPs. In order to prevent and monitor risks, a risk estimation will be made in Chapter 3, to get an overview of possible risks and threats for a specific scenario. In the same chapter, a scenario will be defined in order to assess whether this approach is viable. This specific analysis will be generalised in Chapter 4, so that a system which estimates risks and corresponding countermeasures given an arbitrary scenario can be proposed.

This approach will then be extended to propose a real-time handling and enforcement system, which is featured in Section 4.2. This system is able to respond to anomalies by preventing certain threats from occurring or further escalating. When that is not feasible, the algorithm will be flagged as possibly malicious so that the involved parties can investigate and take action. The anomalies can be detected by analysing different metrics. The proposal for this system and the anomaly detection metrics will be validated by implementing the aforementioned scenario together with malicious algorithms in Chapters 5 and 6, respectively. This implementation will be done using Kubernetes container orchestration, which is able to direct Docker containers. Metric results will be collected from the malicious algorithms and will then be analysed in order to provide a proof of concept for this Threat Handling System.


CHAPTER 2

Containers, Orchestration and Digital

Market Places Background

Containers provide us with the means to deliver and consume data and exchange algorithms in Digital Market Places. This is why this chapter will give general information about what containers are, how they work internally, and what security features they use. This is done in order to assess threats during our risk estimation in Chapter 3. Furthermore, this chapter will give more in-depth information about what Digital Market Places are.

2.1 Characteristics and the reasons for using Containers

Containers are a virtualisation technology which proves to be a lightweight alternative to Virtual Machines (VMs) [23]. A container can be seen as a software environment in which one can install an application or component (also called a microservice).

VMs and containers both aim to provide isolation, which in turn contributes to security. In the case of VMs, this isolation comes at a cost: each VM has its own copy of the OS, libraries, dedicated resources and applications. Containers run closer to the OS, and all containers share the same Linux host kernel. This results in better performance compared to using separate VMs due to the reduced overhead [4]. The lesser degree of isolation means that, in principle, using containers should be less secure. In a VM, an application communicates with the VM kernel instead of the host kernel. This means a malicious application in a VM can only attack the host kernel by first getting through the hypervisor and then the VM kernel. On a container platform, the container communicates with the host kernel itself, just like a regular application. An example of a container program is Docker, a well-known and often-used container program in the DevOps (Development and Operations) developer practice. A schematic representation of the Docker ecosystem can be found in Figure 2.1.

Since Docker is lightweight, it is possible to run multiple containers on the same system. This can be utilised in the DMP infrastructure. By using containers, the same computing space can be utilised by multiple running computation jobs since containers provide a level of isolation. Furthermore, we can easily exchange algorithms and data between clusters or parties by using containers in Digital Market Places.


Figure 2.1: A schematic overview of Docker [6] [17]

2.2 The internal workings of Docker

Docker uses images to instantiate containers. These images describe what the container should do and can be created by the user or pulled from a registry. Images can be built on top of a base image, which is usually derived from a lightweight Linux distribution [6]. Docker images are separated into layers, each layer applying a copy-on-write policy on top of the previous one. Docker images are platform independent. In practice, images are often pulled from public registries such as Docker Hub [6]. Pulled images can also be reconfigured or even changed entirely to fit a specific use case.

2.2.1 The Docker Engine

Docker containers are managed, monitored and executed by the Docker daemon (also called the Docker Engine) [3]. The engine also controls a container's level of isolation, which is implemented using Linux kernel features, namely cgroups, capability restrictions, SELinux and AppArmor profiles, and namespaces. These are explained in more detail in the following sections. The Docker engine can also spawn shells inside running containers, change iptables rules and create network interfaces [6]. To do all of this, the Docker engine runs with root permissions and UID 0.

2.2.2 Namespaces

As mentioned before, Docker uses the Linux namespaces feature to create an isolated environment and virtualise system resources for a group of processes, i.e. a container. This means that modifications to namespaced system resources are kept within the associated namespaces [9]. The Linux kernel provides several types of namespaces [17]; these are briefly described in Table 2.1.

Table 2.1: Namespace types in the Linux kernel.

Type     Used for
PID      Processes
IPC      Inter-process communication
NET      Network resources
MNT      File system mountpoints
UTS      Hostname and domain isolation
USER     Separate view of users and groups
CGROUP   Virtualisation of the process's cgroup view
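On a Linux host, a process's namespace membership can be inspected directly under /proc, which is one way to check what isolation a container actually has. The following is a small illustrative sketch (not part of the thesis' tooling); it assumes a Linux system where /proc is mounted:

```python
import os

def process_namespaces(pid="self"):
    """List the namespaces of a process as exposed under /proc.

    Each entry in /proc/<pid>/ns is a symlink such as 'pid:[4026531836]';
    two processes are in the same namespace exactly when these
    identifiers are equal.
    """
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}
```

Comparing `process_namespaces()` against `process_namespaces(1)` shows which namespaces a process shares with PID 1 (reading another process's entries may require privileges); inside a container, the PID and MNT identifiers typically differ from the host's.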


If a (malicious) user were able to gain access to the Docker Engine or to specify Docker images themselves, it would be possible to access host files [10]. This can be done by using Docker volumes, as these allow files or directories to be mirrored from the host system into the container. To counteract this, it is possible to configure user namespaces. User namespaces can remap user permissions, making it harder to perform privilege escalation via the file system. User namespaces are not enabled by default in Docker at the time of writing [12] because they are not suited for all Docker applications. This is due to known limitations [1] and the fact that this setting requires some extra configuration to make sure a container keeps working.

2.2.3 Control Groups (cgroups)

Control groups or cgroups provide accounting and optional limitation of resource usage. This means that cgroups can limit the resources used by specific containers [10]. The catch is that this feature needs to be configured: unless configured otherwise, containers inherit the limits imposed on the Docker Engine by the Linux kernel itself. Properly configured limits would, in theory, prevent Denial of Service (DoS) attacks from ever happening. The problem with this approach is that a proper resource limit needs to be defined beforehand, which is especially hard in the context of Digital Market Places, which often feature High-Performance Computing programs. Namespaces and cgroups are created by the Docker Engine when a container is spawned.
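Since a workable limit has to be chosen per job, a DMP-side monitor could compare measured usage against the configured limits. A minimal sketch, with purely illustrative resource names and numbers (not taken from the thesis):

```python
def exceeding_limits(usage, limits):
    """Return resources whose measured usage exceeds the configured
    cgroup-style limit. A limit of None mirrors Docker's default of
    simply inheriting the engine's limits, i.e. effectively unlimited."""
    return sorted(r for r, used in usage.items()
                  if limits.get(r) is not None and used > limits[r])

# Illustrative numbers only: a compute job bursting past its memory cap.
usage = {"memory_bytes": 600 * 2**20, "cpu_percent": 85.0}
limits = {"memory_bytes": 512 * 2**20, "cpu_percent": None}
violations = exceeding_limits(usage, limits)  # ["memory_bytes"]
```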

2.2.4 Capability Restrictions

Capabilities are another Linux kernel feature that is important in a Docker environment. They can be used to assign privileges to processes [11]. Out of the box, containers run with a reduced set of capabilities. Some applications need more privileges than others; as a best practice, it is advised to give a container as few privileges as possible. According to [11], the minimal set can be identified by trial and error using a test suite, and the requirements of compute jobs usually remain constant.
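The trial-and-error procedure from [11] can be sketched as a greedy loop; the `passes_tests` hook is a hypothetical stand-in for running the job's test suite in a container granted only the given capabilities:

```python
def minimise_capabilities(all_caps, passes_tests):
    """Greedy trial-and-error reduction of a container's capability set:
    try dropping each capability in turn and keep the drop whenever the
    job's test suite still passes with the remaining set."""
    caps = set(all_caps)
    for cap in sorted(all_caps):
        trial = caps - {cap}
        if passes_tests(trial):
            caps = trial
    return caps
```

For example, for a job that only needs to bind a privileged port, starting from `{"NET_BIND_SERVICE", "CHOWN", "SYS_ADMIN"}` the loop would converge on `{"NET_BIND_SERVICE"}`. Because compute-job requirements usually remain constant, this only has to be run once per image.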

2.2.5 Kernel Hardening

It is possible to use Docker in combination with several means of kernel hardening. One means of kernel hardening is using the Linux Security Modules (LSMs) [23], [3]. These modules are developed to support various security models. Examples of LSMs are: AppArmor, SELinux and TOMOYO.

These LSMs often implement Mandatory Access Control (MAC) or a similar system. MAC provides an additional layer of permission checking after the standard Discretionary Access Control (DAC) check is performed. Files, directories, processes and system objects have labels in Linux. The system administrator can then write rules, so-called policies, to control access to these resources. The kernel manages and enforces the access controls, meaning that even containers with root access can be prevented from accessing resources.

Docker uses two types of policy enforcement. Type Enforcement protects the host from container processes [3]. The default behaviour of this type enforcement is that containers can only access resources that reside in a container. With Type Enforcement alone, however, it is possible to access resources in another container, because they all have the container type. To solve this, Multi-Category Security (MCS) enforcement is used. This policy enforcement technique works by giving distinct labels to each container and its corresponding processes [3]. The kernel only allows processes to access processes with the same label; ergo, compromised processes are unable to access processes outside their container.
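The MCS rule can be illustrated with a toy label check. The label format follows the SELinux convention `user:role:type:level:categories`, but the check itself is a deliberate simplification of real SELinux semantics, written here only to show the idea:

```python
def mcs_allows(source_label, target_label):
    """Toy Multi-Category Security check: both processes must carry the
    shared container type, and their category sets must be identical,
    so a compromised process cannot reach another container's processes."""
    def parse(label):
        # e.g. "system_u:system_r:container_t:s0:c42,c123"
        parts = label.split(":")
        cats = frozenset(parts[4].split(",")) if len(parts) > 4 else frozenset()
        return parts[2], cats

    s_type, s_cats = parse(source_label)
    t_type, t_cats = parse(target_label)
    return s_type == t_type == "container_t" and s_cats == t_cats
```

Two processes of the same container share the category pair and may interact; a process labelled with another container's categories is denied.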

Besides LSMs there are also kernel security patch sets like grsecurity and PaX [3]. grsecurity is a set of kernel patches which aims to provide protection against malicious programs that cause buffer overflows by modifying memory [11].

Another interesting mechanism is SECure COMPuting (Usually abbreviated as SECCOMP) with filters. These filters can restrict access to certain kernel functions from a container.

[1] User namespaces limitations: https://docs.docker.com/engine/security/userns-remap/#user-namespace-known-restrictions


2.3 Container Orchestration

Docker provides the basis for data sharing in Digital Market Places. With Docker, we can easily start up container workloads saved in pre-defined images. Because digital market place computation can involve high-performance computing, it is possible that we will have multiple machines running the same Docker container. In order to facilitate such behaviour, Kubernetes can be used to manage these containerised workloads and services [13].

2.3.1 Abstractions

Kubernetes is not a monolithic program [13] but a collection of different components, which can be instantiated in different places. The first of these is the master node: the starting point of a Kubernetes cluster acts as the master node. Usually, this node is not used as a cluster node; ergo, containers are not run on it. The other components run on the nodes themselves, as illustrated in Figure 2.2.

Figure 2.2: Overview of Kubernetes (source: https://Kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked/)

Containers are not executed on the node directly. They are controlled with the kubelet component and are also contained by another abstraction layer called a pod. A pod consists of one or more containers that are run on the same host.

2.3.2 Minikube

Minikube is a tool that enables developers to run Kubernetes locally [22]. Minikube does this by simulating a single-node environment in a VM on a local machine. Recall that in a 'normal Kubernetes environment' only a single pod runs on one host. Minikube essentially scales Kubernetes down (nodes become pods) so developers can test their applications without setting up an entire network of servers to create a cluster. The implementation of the scenario and the corresponding validation of the monitoring metrics was done entirely in Minikube.

2.4 Digital Market Places (DMPs)

As described in the introduction, Digital Market Places are membership organisations in which it is possible to share data with different parties [14]. For this purpose, DMPs can provide a distributed computing platform. Specific details about the movement and usage of a party's data are carefully documented in an agreement. This agreement applies to both data suppliers and algorithm providers. Figure 2.3 gives insight into the high-level workings of a DMP.

Figure 2.3: A high level view on DMPs [14]

First, an agreement is set into place. This agreement contains information about how computation is performed and with which data. It also specifies what the algorithm provider is allowed to do with both the data and the result, in order to provide a secure environment. This is especially important if the parties involved are competitors. This agreement, along with the parties' wishes, contributes to the decision on which archetype or scenario is used during computation. The archetype describes the flow of the data and algorithm in concrete networking terms.

DMP parties can be divided into two types, data sharing parties and algorithm provider parties. Usually, an algorithm container image is sent to the DMP and can then be run on the data that is also sent over to the DMP from the data providing party.


CHAPTER 3

Risk Estimation

In this chapter, we first introduce categories for our threats. Then we define the scenario our risk assessment is based on. Finally, we present a table of threats based on our scenario and established literature.

3.1 Quantitative and Qualitative Risk Analysis

3.1.1 Quantitative Risk Analysis

Statistical estimation forms the basis of quantitative models [18]. Quantitative risk analysis assigns monetary values to components, which are then used to calculate which components carry relatively higher risks. The problem with this approach is that it is sometimes hard to estimate the value of components. This is especially true for those holding data, as data can be more meaningful, and thus more valuable, to specific parties. This value can have multiple aspects, for example the cost to build, protect or recover the asset. The weights of these different aspects are hard to establish precisely and can vary between use scenarios.
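The monetary-value approach is commonly expressed with the standard single-loss / annual-loss expectancy formulas from the risk literature; these formulas are general background, not taken from [18] verbatim, and the numbers below are purely illustrative:

```python
def single_loss_expectancy(asset_value, exposure_factor):
    """SLE: the monetary loss of one incident, i.e. the asset's value
    multiplied by the fraction of the asset lost in that incident."""
    return asset_value * exposure_factor

def annual_loss_expectancy(asset_value, exposure_factor, annual_rate):
    """ALE = SLE x annualised rate of occurrence; ranking components by
    ALE identifies the ones carrying relatively higher risk."""
    return single_loss_expectancy(asset_value, exposure_factor) * annual_rate

# Illustrative: a 100k dataset, 40% lost per breach, 0.5 breaches/year.
ale = annual_loss_expectancy(100_000, 0.4, 0.5)  # 20000.0
```

The difficulty the text describes shows up here directly: `asset_value` and `exposure_factor` are exactly the quantities that are hard to estimate for data assets.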

3.1.2 Qualitative Risk Analysis

A qualitative risk analysis categorises risks into different levels [18]. The most straightforward approach is to divide risks by impact into low, medium and high categories. Qualitative risk analysis is also scenario based and is usually performed through risk assessment matrices and questionnaires [18]. Examples of scenario information that a qualitative risk analysis is based on are the value of the asset, the possible threats, the vulnerabilities that enable those threats, and the controls: actions that can be taken by a security team when something goes wrong. This dependence on data makes qualitative risk analysis hard to carry out in practice. Another aspect of qualitative risk analysis is that it is subjective: the analysis is based on the experience and judgement of the professional who performs it.
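Such a risk assessment matrix can be sketched in a few lines; the 3x3 additive mapping below is our own illustration of the idea, not a matrix prescribed by [18]:

```python
LEVELS = ("low", "medium", "high")

def qualitative_risk(likelihood, impact):
    """Combine qualitative likelihood and impact ratings (each one of
    'low'/'medium'/'high') into an overall risk category via a simple
    additive risk matrix."""
    score = LEVELS.index(likelihood) + LEVELS.index(impact)
    return "high" if score >= 3 else "medium" if score == 2 else "low"
```

The subjectivity the text mentions is visible here as well: both input ratings and the shape of the matrix are judgement calls of the assessor.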

3.2 Threat types

A main aspect of our work has been the exploration of threats for a specific scenario. The resulting structured list will be important when exploring monitoring options in a later chapter. In this section, two distinct ways of classifying threats are explored; these will be used in our risk analysis to organise the threats.

3.2.1 Ordering in Components

The first set of categories can be seen as a general container ecosystem view derived from [23]. This organisation categorises possible threats into four different component levels of the system.


Figure 3.1 provides a schematic overview to illustrate the general ideas behind each of the levels.

a. Protecting the container

The first category is protecting the container (manager) from the application that is running inside a container. As mentioned before, some applications require more rights than others. By abusing these rights, it may be possible to take control of the host system, which in turn may result in the compromise of other containers running on the same system.

b. Inter-container protection

This category is a level higher than the previous one. It may be possible to compromise a container from another one or leak data from one to the other. For this category, it is important to note that the containers do not have to run on the same host, as it may be possible to attack other containers via the network.

c. Protecting the host (and the applications inside it) from containers

The third category consists of protecting the host from the containers. A compromised host may in turn pose threats to the other containers run on it.

d. Protecting containers from the host

This final category can partly be seen as an extension of the previous one, but is not so by definition, as the host can be compromised by means other than containers. It may even be possible that the host was malicious to begin with. In our data sharing application, we dismiss this last possibility because we assume that the third party that runs the container is trustworthy. In other words, this category is out of the scope of this research.


Figure 3.1: Schematic representation of the componental ordering

3.2.2 Ordering by Stages

The second set of categories can be described as a set of threats ordered by the different execution stages they can occur in. This ordering is specific to the data sharing models used by DMPs.

I. Data in Storage

This means that the data is safely stored and only authorised parties have access to the data. Data should preferably be stored encrypted so that even if the data is compromised, it cannot be used.

II. Data during distribution

During the distribution from storage to a computing cluster, the data cannot be read, copied or seen at all. To accommodate this, the connection should be secure and eavesdropping should be prevented. Finally, only the authorised parties receive the data.


III. Security during computation

During computation, only authorised algorithms can work on the authorised data. This can be (partly) facilitated by encrypting the data and only providing access keys to authorised algorithm containers.

IV. Data usage policy

Data usage should conform to the agreed-upon policy; any deviation is classified as malicious behaviour. No copies should be made of private data, and data may not be redistributed, in order to give a party control over their data. It is difficult to guarantee absolute safety of the data when computation is performed at the algorithm-providing party.

3.3 Scenario Definition

Our scenario concerns data aggregation from multiple distributed locations. A containerised algorithm runs on a platform and operates on data from multiple sources with permission from the owners. The idea is to make sure that the algorithm can only access and compute on data from authorised databases and cannot leak it to other parties. A schematic representation can be found in Figure 3.2. The scenario's policy is defined below.


Figure 3.2: A schematic representation of our scenario: data objects are copied to the algorithm, where computation takes place. The data originates from the data location pods, numbered one to three. Data object 1 from location 1 and data object 1 from location 2 are authorised to be sent to the algorithm pod for computation.

Policy definition

1. The algorithm provider is not allowed to leak the data or the result to any other parties.

2. The algorithm is authorised to access and use Data object 1 from location 1.

3. The algorithm is not allowed to access data object 1 from location 3.

4. The result must only be available to the algorithm provider.

5. The data objects are directly transferred from the data provider to the algorithm provider.

6. The data providers must provide the qualified data to the algorithm based on the made agreement.


7. The algorithm provider can only use the data for the specific algorithm defined in the agreement.

8. Once the data has arrived at the algorithm provider, it may only be used in computation and must be deleted afterwards.
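Policy items 2 and 3 amount to a whitelist of (location, data object) pairs, matching the authorised pairs shown in Figure 3.2. A toy check, using hypothetical identifiers of our own, could look like:

```python
# Hypothetical identifiers: per the scenario, only data object 1 from
# locations 1 and 2 may reach the algorithm pod.
AUTHORISED = {("location1", "object1"), ("location2", "object1")}

def request_allowed(location, data_object):
    """Check a data request against the agreed policy; anything not
    explicitly authorised is treated as a breach to be flagged."""
    return (location, data_object) in AUTHORISED
```

The default-deny shape matters here: a request for an object that the agreement never mentions is rejected rather than silently permitted.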

3.4 Threat model for the Scenario

We can now identify a list of possible threats, which can be found in Table 3.1. As mentioned before, threats that come from compromised or unsafe hosts are out of the scope of this research and are therefore not listed. The list also assumes that Docker itself has no out-of-the-box vulnerabilities and uses the default security configuration. It is further assumed that the images are retrieved via secure and trusted means, i.e. that image registries are not spoofed.

Next, we make the threat table more concrete with examples specific to the scenario described in Section 3.3.

1. In the process of making images, errors can be made, which may result in security vulnerabilities.

2. Privilege configuration errors can result in an insecure environment. In the scenario, this might mean that the algorithm can get access to unauthorised data by abusing the privileges it has been given. For example, by opening up a network connection to unauthorised locations.

3. Images can be made or gathered in two specific ways. They can either be pulled from an image registry or made by the user themselves. When pulling images from a registry, it might be possible for the creator of the images to have hidden a back-door in the image. Using this, it is possible to extract sensitive data from the pods like data objects.

4. Data objects are moved over the network. This opens up possibilities for man-in-the-middle attacks [23]. In our scenario, this might mean that an unauthorised data provider or location in Figure 3.2 pretends to be an authorised data provider. It could intercept the data transmission to the algorithm's location, modify it and relay the modified data to the algorithm. Alternatively, it could send the algorithm false data while pretending to be the authorised party.

5. Using the same method as described above, it might be possible to access a data object as an unauthorised receiver.

6. According to the policy, the algorithm-providing party is not allowed to redistribute any data. The algorithm is allowed network access in order for the necessary data objects to be moved to the algorithm’s location for computation. This network access could be abused by the algorithm by sending the data from a specific location (say, location 1) to another location (location 2 or 3).

7. A malicious algorithm might be able to request data objects that it’s not allowed to utilise. This is especially important in a situation where there are more algorithms running on the same cluster. The existence of this threat depends on the implementation of the DMP. When using encryption, authentication tokens or other means of limiting access to unauthorised algorithms, this threat can be prevented.
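The defense listed for threat 4 in Table 3.1, hash verification, can be sketched with a standard digest: the data provider announces the digest over a trusted channel and the receiver recomputes it on arrival. The function names are ours, not the thesis':

```python
import hashlib

def publish_digest(data: bytes) -> str:
    """Digest the data provider would announce out of band."""
    return hashlib.sha256(data).hexdigest()

def transfer_intact(received: bytes, announced_digest: str) -> bool:
    """Recompute the digest over the received bytes; any in-transit
    modification by a man-in-the-middle changes the hash and is detected."""
    return hashlib.sha256(received).hexdigest() == announced_digest
```

Note that this detects modification (threat 4) but not eavesdropping (threat 5), which is why the table pairs it with encryption and secure channels.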


Table 3.1: Threat model of the scenario defined in Section 3.3.

1. Image vulnerabilities
   Comments: errors in the images themselves, not made maliciously (e.g. security bugs).
   Defense mechanism: vulnerability scanning using existing frameworks (e.g. [24]).
   Mitigation level: Detect. Stage: III. Component: a.

2. Privilege configuration errors
   Comments: Docker containers should run with as few privileges as possible so that only the specified data objects are accessible; the root user in a container is the same as the host root by default.
   Defense mechanism: user namespaces, capability restriction.
   Mitigation level: Prevent. Stage: III. Component: a.

3. Untrusted images
   Comments: malicious images might contain a preinstalled backdoor.
   Defense mechanism: only use verified trusted images by using a verified trusted image registry.
   Mitigation level: Prevent. Stage: III. Components: a, b.

4. Data modification during transmission
   Comments: can possibly be caused by man-in-the-middle attacks; a data provider could pretend to be another data provider and send false or modified data.
   Defense mechanism: hash verification.
   Mitigation level: Detect. Stage: II. Component: b.

5. Data object is accessed by unauthorised receivers
   Comments: unauthorised parties can possibly use man-in-the-middle attacks to get hold of the data.
   Defense mechanism: block unauthorised communication, VPN, secure communication protocols, encryption, namespaces.
   Mitigation level: Prevent. Stage: II. Component: b.

6. Authorised party redistributes the data
   Comments: the algorithm provider can possibly leak the data to other unauthorised parties.
   Defense mechanism: network monitoring.
   Mitigation level: Detect. Stage: IV. Component: NA.

7. Unauthorised algorithm runs on the dataset
   Comments: an algorithm from the same provider can possibly gain access to the data object that is used by the authorised party.
   Defense mechanism: verify hash of the algorithm container, system access control, monitoring metrics during execution (e.g. CPU usage).
   Mitigation level: Prevent. Stage: III. Component: b.


CHAPTER 4

Threat Analysis System and Threat

Handling System

In this chapter, designs for the Threat Analysis System (TAS) and Threat Handling System (THS) will be introduced. Our risk estimation method is generalised to create the Threat Analysis System. This analysis system will analyse scenarios before data exchange and present countermeasures and metrics which can be used to prevent and detect threats, respectively. The Threat Handling System can be used to detect and prevent threats in real time. This system will be validated in Chapter 6.

4.1 Threat Analysis System

This section proposes the design of a generic system which can identify threats and generate metrics and countermeasures to combat these threats. This Threat Analysis System (TAS) is meant to be deployed before the computation starts, allowing a DMP to put appropriate prevention techniques and monitoring metrics in place beforehand. A high-level overview of the TAS can be found in Figure 4.1.

Figure 4.1: High-level overview of the Threat Analysis System. A scenario, together with the DMP's lists of possible threats, metrics and countermeasures, feeds the threat identification and association steps; the resulting mapping of threats to countermeasures and metrics is then evaluated to decide which countermeasures will be implemented and which metrics will be monitored.


4.1.1 Threat Identification

The system we propose here is based on the method used in Chapter 3, which contains a risk estimation corresponding to a specific scenario. The TAS is a generalisation of the method used there. In a DMP, an archetype which covers the parties' trust level and other needs has to be chosen. This archetype can then be converted to a scenario by filling in implementation details. The DMP will have a list of possible threats which are implementation independent. The TAS will analyse this scenario to extract the threats which are applicable to the specific scenario:

f_identification(scenario, threats_DMP) = threats_scenario

In this formula, the identification function, f_identification, corresponds to the threat identification block from Figure 4.1. It outputs a subset of threats from the DMP list, and the output is specific to the given scenario.
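As a minimal sketch, the identification step could be implemented as a filter over the DMP's implementation-independent threat list. The field names (`stage`, `component`) and the sample threats below are illustrative, loosely following Table 3.1; the real TAS would match on whatever attributes the DMP records per threat.

```python
# Hypothetical sketch of f_identification: filter the DMP's
# implementation-independent threat list down to those matching a scenario.
def f_identification(scenario, threats_dmp):
    """Return the subset of DMP threats applicable to the given scenario."""
    return [t for t in threats_dmp
            if t["stage"] in scenario["stages"]
            and (t["component"] is None or t["component"] in scenario["components"])]

# Illustrative threat list, loosely following Table 3.1.
threats_dmp = [
    {"name": "Image vulnerabilities", "stage": "III", "component": "a"},
    {"name": "Untrusted images", "stage": "III", "component": "b"},
    {"name": "Data modified in transit", "stage": "II", "component": "b"},
    {"name": "Authorized party redistributes data", "stage": "IV", "component": None},
]

# A scenario that only involves stages II and III and component b.
scenario = {"stages": {"II", "III"}, "components": {"b"}}
applicable = f_identification(scenario, threats_dmp)
```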

4.1.2 Threat Association

The threats corresponding to the given scenario are then fed into another block. This block takes two inputs, the metrics and the countermeasures supported by the DMP, so that they can be used during computation. The block resembles a function which associates the specific threats with possible metrics and countermeasures supported by the DMP. The function corresponds to the following formula:

f_association(threats_scenario, CounterMeasures_DMP, Metrics_DMP) =

\[
\begin{pmatrix} T_0 \\ T_1 \\ \vdots \\ T_n \end{pmatrix}
\quad
\begin{pmatrix}
CM_{0,0} & CM_{1,0} & \cdots & CM_{N,0} \\
CM_{0,1} & CM_{1,1} & \cdots & CM_{N,1} \\
\vdots & \vdots & \ddots & \vdots \\
CM_{0,N} & CM_{1,N} & \cdots & CM_{N,N}
\end{pmatrix}
\quad
\begin{pmatrix}
M_{0,0} & M_{1,0} & \cdots & M_{N,0} \\
M_{0,1} & M_{1,1} & \cdots & M_{N,1} \\
\vdots & \vdots & \ddots & \vdots \\
M_{0,N} & M_{1,N} & \cdots & M_{N,N}
\end{pmatrix}
\]

The output of this function is a mapping between the specific threats and the corresponding countermeasures and metrics, respectively. T_y represents an identified threat for the specific scenario. CM_{x,y} denotes the x-th countermeasure that can be taken to prevent the associated threat T_y from occurring. M_{x,y} is the x-th metric that can be monitored in order to detect threat T_y. The first matrix corresponds to the threats for the specific scenario. The second matrix contains the associated countermeasures for each threat. The third matrix consists of the metrics associated with the threats.
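In code, the same association can be represented more simply as a dictionary from each scenario threat to its countermeasure and metric lists, looked up in DMP-maintained tables. The threat names and lookup tables below are illustrative stand-ins, echoing Table 3.1:

```python
# Hypothetical sketch of f_association: map each scenario threat to the
# DMP-supported countermeasures (prevention) and metrics (detection).
def f_association(threats_scenario, countermeasures_dmp, metrics_dmp):
    """Return {threat: (countermeasures, metrics)} for the scenario threats."""
    return {t: (countermeasures_dmp.get(t, []), metrics_dmp.get(t, []))
            for t in threats_scenario}

# Illustrative DMP lookup tables, echoing Table 3.1.
countermeasures_dmp = {
    "Unauthorised algorithm runs on the dataset":
        ["verify image hash", "system access control"],
    "Data object accessed by unauthorised receivers":
        ["block unauthorised communication"],
}
metrics_dmp = {
    "Unauthorised algorithm runs on the dataset": ["CPU usage", "system calls"],
    "Authorized party redistributes the data": ["network traffic"],
}

mapping = f_association(
    ["Unauthorised algorithm runs on the dataset",
     "Authorized party redistributes the data"],
    countermeasures_dmp, metrics_dmp)
```

Note that a threat with no supported countermeasure (such as redistribution by an authorized party) ends up with an empty countermeasure list, leaving only its metrics for detection.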

4.1.3 Evaluate countermeasures and metrics

Now that we have all possible countermeasures and metrics for each threat, we need to choose which ones the DMP should utilise. It is possible that a single countermeasure solves multiple threats; in that case, it is more efficient to implement that one countermeasure covering multiple threats. When there is a countermeasure for a specific threat, it does not make sense to also monitor the metrics associated with it, unless that countermeasure only prevents the threat partially. Using this information together with possible preferences from the DMP party, we can create a list of countermeasures and metrics that will be implemented and measured.
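One way to realise this evaluation step is a greedy set-cover heuristic: repeatedly pick the countermeasure that prevents the most still-uncovered threats, then fall back to metric monitoring for whatever remains. This is only a sketch of one possible selection policy, not the thesis's prescribed algorithm, and it ignores DMP-party preferences and partial prevention:

```python
def select(mapping):
    """Greedily pick countermeasures covering the most threats; monitor the rest.

    mapping: threat -> (list of countermeasures, list of metrics).
    Returns (chosen countermeasures, metrics still needing monitoring)."""
    uncovered = set(mapping)
    chosen_cms = []
    while True:
        # Count how many still-uncovered threats each countermeasure prevents.
        coverage = {}
        for t in uncovered:
            for cm in mapping[t][0]:
                coverage.setdefault(cm, set()).add(t)
        if not coverage:
            break
        best = max(coverage, key=lambda cm: len(coverage[cm]))
        chosen_cms.append(best)
        uncovered -= coverage[best]
    # Threats without a chosen countermeasure fall back to metric monitoring.
    metrics = sorted({m for t in uncovered for m in mapping[t][1]})
    return chosen_cms, metrics

# Illustrative mapping: "namespaces" covers two threats, T3 has no countermeasure.
mapping = {
    "T1": (["namespaces"], ["CPU usage"]),
    "T2": (["namespaces", "VPN"], ["network traffic"]),
    "T3": ([], ["network traffic", "system calls"]),
}
cms, metrics = select(mapping)
```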


4.2 Threat Handling System

At the beginning of this chapter, we described a Threat Analysis System (TAS) which is able to set up countermeasures and monitoring metrics before computation, based upon a scenario and lists of known threats, metrics and countermeasures. Here we propose another system which aims to detect and prevent threats while they are occurring.

4.2.1 Outline

Our Threat Analysis System can only find out which threats correspond to which countermeasures and metrics before computation. The DMP can then implement these countermeasures. The metrics, which can be used to monitor the threats that cannot be prevented, cannot be used immediately. First, we need to retrieve the metrics associated with the parts of the system that require monitoring. We can then monitor these metrics for suspicious behaviour during the execution of the algorithm. We will present the details of detecting suspicious behaviour in Section 4.2.2. This is the heart of the proposed Threat Handling System (THS): to monitor for suspicious behaviour and take action accordingly. The preferred action is to prevent the threat from escalating further in real time, where that is possible for the threat in question.

In Figure 4.2, a flowchart containing a high-level outline of this system is presented. First, the results of the Threat Analysis System are utilised such that the countermeasures are initialised and the metrics corresponding to threats that cannot be prevented can be measured. Then the computation can start. The monitoring system then watches each metric. If any threshold has been reached, prevention or flagging procedures are called, based on whether real-time prevention of this threat is possible. Finally, based on the digital agreement, the program is either terminated or allowed to continue.
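A single pass of this execution-phase loop might look like the following sketch. All names are hypothetical, the deviation values are stand-ins for whatever profile comparison the DMP uses, and the actions are placeholders for real prevention/flagging handlers:

```python
def handle_metrics(samples, thresholds, preventable, terminate_on_flag=True):
    """One pass of a THS execution-phase loop (hypothetical sketch).

    samples: metric -> current deviation from the normal profile.
    thresholds: metric -> allowed deviation.
    preventable: metrics whose associated threats can be stopped in real time.
    terminate_on_flag: what the digital agreement says to do on a flag.
    Returns a list of (action, metric) stand-ins for real handlers."""
    actions = []
    for metric, value in samples.items():
        if value <= thresholds[metric]:
            continue                              # below threshold: continue
        if metric in preventable:
            actions.append(("prevent", metric))   # activate prevention mechanism
        elif terminate_on_flag:
            actions.append(("terminate", metric))  # agreement says stop & invalidate
        else:
            actions.append(("flag", metric))       # agreement says flag & continue
    return actions

# Network deviation exceeds its threshold and its threat is preventable.
actions = handle_metrics(
    samples={"cpu": 0.3, "network": 5.0},
    thresholds={"cpu": 1.0, "network": 2.0},
    preventable={"network"})
```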


Figure 4.2: High-level flowchart describing the proposed workings of the Threat Handling System

4.2.2 Defining and detecting suspicious behaviour

For this THS to work, it is necessary that with the metrics we measure, it can be determined whether or not an algorithm is malicious. In order to do that, we need to define suspicious behaviour. In our case, we define suspicious behaviour as evidence in our metrics that an algorithm is committing a malicious act, i.e., a policy-breaching act. There are different types of metrics that can be used to detect anomalies. This research focuses on system calls and CPU, memory and network usage.

We hypothesise in this paper that it is possible to detect anomalies using our metrics that may indicate malicious behaviour. We can construct a profile of the (type of) algorithm in question [8]. This profile shows us measurements of the metrics in a 'normal' situation, in which no malicious elements are present. We can then compare our running or candidate algorithm against this profile. Based on established literature, there are three main methods to create such a profile. The first is running a non-malicious version of the program and capturing the normal values for the metrics we use. These values can then be compared to those of the running or candidate algorithm during computation. Literature suggests that this can be done by examining sample runs of the program or a similar type of program [21]. It is possible that the program used to create the profile is itself malicious. Another problem of creating a policy by training is that training does not cover all possible behaviours of a program, which means that the resulting policy can be overly restrictive. The advantage is that this process can be automated. The second method is to write a profile by hand. The main downside of this method is that, because it is not automated, it is difficult to estimate the exact values of the metrics. This method can be used when the system does not rely on value-based metrics such as the frequencies of the system calls that are used. An example of a non-value-based metric is monitoring based on the types of system calls used.

The third method is static analysis. In this method, the program is analysed without being run. This method can be automated like the first method. It is said [21] that this method is not often used compared to the other methods.

In this research, we use the training method to create our normal profile. However, the THS that we outline can use any kind of method to create a normal profile. We establish a threshold which indicates how much a metric of the candidate algorithm may deviate from the profile. If the threshold is reached and there is no way to prevent the malicious behaviour in real time, the algorithm is flagged as malicious. If there is a way to prevent the malicious behaviour in real time, it will be utilised. In both cases, the parties involved receive an alert together with the monitoring data, indicating that something might be wrong.

We can compare the normal algorithm and the candidate algorithm by calculating the difference between the two. Then, based on a threshold, we can determine whether the algorithm is malicious or not. Take, for example, CPU usage as our metric. The action we take then follows this formula:

\[
\begin{cases}
\text{Flag} & \text{if } CPU_{N,t} - CPU_{C,t} > Threshold \\
\text{Continue} & \text{otherwise}
\end{cases}
\]

in which CPU_{N,t} is the CPU usage value for the normal algorithm at timestamp t, and CPU_{C,t} that for the candidate algorithm.
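Assuming the profile and candidate samples are aligned per timestep, the per-timestamp comparison can be sketched as below. This sketch uses the absolute deviation (flagging in either direction), a small generalisation of the one-sided comparison; function and variable names are illustrative:

```python
def flag_timesteps(cpu_normal, cpu_candidate, threshold):
    """Return the timestamps where the candidate's CPU usage deviates from
    the normal profile by more than the threshold.

    cpu_normal / cpu_candidate: lists of CPU-usage samples, aligned per timestep."""
    return [t for t, (n, c) in enumerate(zip(cpu_normal, cpu_candidate))
            if abs(n - c) > threshold]

# Illustrative samples: the candidate spikes at timestep 2.
flags = flag_timesteps([5.0, 6.0, 5.5, 5.0], [5.2, 6.1, 14.0, 5.1], threshold=2.0)
```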

We validate this method by implementing our described scenario from Section 3.3 and conducting experiments in Chapter 6. These experiments focus on the presence of a malicious algorithm which aims to break policies. We then show that we can flag algorithms for possible malicious behaviour with these metrics.

This method only works directly for metrics which have a value for a set of discrete time steps. An example of a metric which does not is system calls. Monitoring the calls made to the Linux kernel could give us insight into what exactly the program is doing. This metric will also be tested in Chapter 6. There are multiple ways in which looking at system calls can be useful. One way is to look at the frequency histogram. The histogram can be expressed as a vector which holds a frequency value for each system call. We hypothesise here that some threats can be detected by analysing the frequency of specific system calls.
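The frequency-histogram comparison can be sketched as follows: count each call name in the two traces and report the calls whose frequency difference exceeds a threshold. The traces and threshold below are illustrative:

```python
from collections import Counter

def syscall_deviation(normal_calls, candidate_calls):
    """Per-syscall frequency difference between two traces (lists of call names)."""
    normal, cand = Counter(normal_calls), Counter(candidate_calls)
    return {name: cand[name] - normal[name]
            for name in set(normal) | set(cand)}

def suspicious_calls(normal_calls, candidate_calls, threshold):
    """Names of system calls whose frequency deviates beyond the threshold."""
    dev = syscall_deviation(normal_calls, candidate_calls)
    return sorted(name for name, d in dev.items() if abs(d) > threshold)

# Illustrative traces: the candidate makes far more write calls than normal,
# reminiscent of malicious algorithm 1 in Chapter 6.
normal = ["read"] * 10 + ["write"] * 2 + ["stat"] * 5
candidate = ["read"] * 10 + ["write"] * 40 + ["stat"] * 6
flagged = suspicious_calls(normal, candidate, threshold=10)
```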

Another way of looking at system calls is to look at their arguments. This might reveal traffic to a specific IP address of an unauthorised location. This data can be used in possible prevention mechanisms, for example, by blocking traffic to that IP address. Another possibility is restricting certain functions of the container using cgroups [10].


CHAPTER 5

Scenario & Monitoring Implementation

This chapter features the scenario implementation and the implementation of the monitoring tools used to validate the workings of our Threat Handling System in Chapter 6.

5.1 Scenario implementation

This section describes the implementation of the scenario from Section 3.3. The implementation features two types of pods: a database and an algorithm. The first pod type consists of a single Python container and a text file saved in the image itself. This container acts as a data transmitter which sends the txt file using low-level Python sockets [1]. As illustrated in Figure 3.2, two of these database pods exist, each containing a different txt file. The first pod has the book Moby Dick [2], which is sent over to the algorithm container. The second has the complete works of William Shakespeare [3], which is not authorised to be used in the algorithm container.

The other pod type embodies the algorithm block from Figure 3.2. This pod consists of two containers running Python, the first of which can receive data from the network via the aforementioned Python sockets. This pod acts as a server to which the database pods can connect. The receiver appends the incoming text to a txt file. This txt file acts as a medium through which the data is made available to the other container. In this other container, the actual computation takes place; it will henceforth be called the algorithm container. The algorithm container implements a simple word frequency algorithm. The word frequency computation starts when the receiver container has received the entire file. The receiver indicates that all the data has been received by creating another file in the same directory. The result of the algorithm is saved in another txt file.
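The transmitter/receiver roles described above can be sketched with the standard `socket` module. This is a stripped-down, self-contained demo on localhost: the real containers additionally write the received text and the done-marker file to the shared volume, and the hostnames/ports come from the Kubernetes configuration:

```python
import socket
import threading

def receive_all(srv, chunks):
    """Receiver side: accept one connection and collect everything sent,
    as the data receiver container does before writing the txt file."""
    conn, _ = srv.accept()
    while True:
        data = conn.recv(4096)
        if not data:
            break
        chunks.append(data)
    conn.close()
    srv.close()

def transmit(host, port, payload):
    """Transmitter side: connect and send the whole payload, as a database
    pod does with its txt file."""
    with socket.create_connection((host, port)) as s:
        s.sendall(payload)

# Demo on localhost with an OS-assigned port; in the scenario the receiver
# runs in the algorithm pod and the transmitters in the database pods.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

chunks = []
t = threading.Thread(target=receive_all, args=(srv, chunks))
t.start()
transmit("127.0.0.1", port, b"Call me Ishmael.")
t.join()
received = b"".join(chunks)
```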

The txt files containing the data are saved in a Kubernetes volume. Kubernetes volumes are filesystems that can be mounted into containers. These filesystems are often used to prevent saved files from being lost when the container crashes. There is a wide variety of filesystem types that can be used in Kubernetes volumes. Some cater to a specific use case, such as native support for specific applications (e.g. git, elasticsearch). Others sport special features such as being able to save files directly to RAM, which can be useful for high-performance computing applications. The type of volume used in the scenario is the emptyDir volume. It is first created when a pod is assigned to a node and exists as long as that pod is running on that node.

The scenario makes use of two emptyDir volumes. One is used to save the data received from the database pods, and the other is used to save the result txt file. The first of the two is mounted to both the receiver container and the algorithm container, where the computation takes place. The result volume is only mounted to the algorithm pod.

[1] Python Sockets: https://docs.python.org/3/library/socket.html
[2] Moby Dick: http://www.gutenberg.org/cache/epub/2489/pg2489.txt
[3] William Shakespeare: https://www.gutenberg.org/files/100/100-0.txt



Figure 5.1: An overview of the implementation of the scenario. The scenario consists of three pods. The two database pods each house a txt file which is sent over to the third Pod, the one which houses the algorithm. Data is received using the aforementioned socket connection by the data receiver container. It is then saved in a txt file so that the container that runs the computation can access it. After computation, the result is saved to another file.

5.2 Monitoring Tool Implementation

As mentioned in Section 2.3.2, Minikube is used as the testing environment of this research. The advantage of using Minikube is that it does not require multiple servers to run our proof-of-concept application. The drawback is that most tools used in production for monitoring (e.g. Sysdig, Falco) do not work on Minikube. Minikube runs in a VM, which means that conventional methods of getting information from the Kubernetes APIs are not directly possible from the host system. Installing these inside the virtual machine's operating system is difficult due to the fact that Minikube is implemented using Tiny Core Linux, a lightweight Linux distribution that is meant to reside entirely in RAM. This made it necessary to look for alternatives in the monitoring space.

5.2.1 Monitoring System calls

There are two options to monitor system calls of the pods running inside Minikube. The first option is to run the monitoring software inside the VM's operating system. This is difficult, as the container we want to monitor is not directly available to us since Kubernetes runs it inside a pod, which means a handle to said pod is not easy to retrieve. As mentioned before, Minikube runs on Tiny Core Linux, which does not have all features found in a normal Linux distribution. This makes it even more difficult to run traditional system monitoring tools in the VM's operating system.

The second option is to run the system monitoring tools inside a container. This has some important negative side effects. Many system monitoring tools have to be run with extra capabilities in order to run correctly on an isolated system such as a container. There are again two options: run the monitoring tools inside the container in which the algorithm runs, or run them in a separate container. The latter is preferable from a security perspective: since that container would not be provided by the user, it does not create a direct security problem. The problem with running monitoring tools in a separate container is that the difficulty of obtaining a handle to the container persists. This is why we run the monitoring tools inside the container that runs the algorithm, since it is not important for this proof of concept to create a fully secure environment. In a real-world scenario, other monitoring tools that can interface with Kubernetes can be used, the use of which is excluded here due to the usage of Minikube.

strace

The actual tool used to monitor the system calls of the container is strace [26]. strace works like many debuggers by using the ptrace Linux kernel feature [20]. strace can be attached to programs using their PIDs, or the program can be run with strace directly. As we want to monitor the entire run-time of the algorithm, the program is run by strace directly. This is also the reason that we cannot run strace in a separate container. It is possible to retrieve the PID of the program running inside the algorithm container and then attach strace to it; however, this means that the system calls that occur at the very start of the program are not logged. As these calls may indicate malicious behaviour, it is desirable to capture all of the system calls made by the program.

Running the Python script in our container with strace yields, for every system call made during the Python script's lifetime, a time-stamp, the function that was called, the arguments the function was called with, and the return value.
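Turning that output into the fields above is a small parsing exercise. The sketch below assumes timestamped output (as produced by strace's `-tt` option) and uses a deliberately simplified regular expression: it does not cover every strace line form (signal deliveries, calls marked `<unfinished ...>`, resumed calls):

```python
import re

# Matches e.g.: 12:34:56.789012 write(4, "hello", 5) = 5
# Simplified: ignores signal lines and calls strace marks <unfinished ...>.
LINE = re.compile(r'^(\d{2}:\d{2}:\d{2}\.\d+)\s+(\w+)\((.*)\)\s+=\s+(-?\w+)')

def parse_strace_line(line):
    """Split one timestamped strace line into its four fields, or None."""
    m = LINE.match(line)
    if m is None:
        return None
    ts, name, args, ret = m.groups()
    return {"time": ts, "call": name, "args": args, "ret": ret}

rec = parse_strace_line('10:02:31.452101 write(4, "hello", 5) = 5')
```

Records parsed this way feed both the frequency histograms of Chapter 6 and any argument-based analysis.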

5.2.2 Monitoring container statistics

Since normal monitoring tools that interface with Kubernetes are not supported on Minikube, we can look for ways to bypass Kubernetes entirely and go a layer deeper. A viable alternative can be found in Docker Stats [7]. Docker Stats is able to give CPU usage, memory usage, network IO, Block IO and the number of PIDs at a container level, even when the containers are managed by Kubernetes. CPU and memory usage are reported as percentages of the host's CPU and memory, respectively. Network IO gives the amount of data the container has sent. Block IO gives the data read from and written to block devices on the host. Lastly, PIDs gives the number of processes or threads used by the container. Normally, Docker Stats gives all of these statistics on a per-container basis; however, Docker Stats reported the exact same network usage for all containers within the same pod. This means that the network IO was effectively given on a per-pod basis.

The downside of using Docker Stats is that it has no time-stamp functionality. Docker Stats can generate an uninterrupted stream of statistics, meaning that it gives the statistics for each container specified in the optional container-name argument, or for all of them if that argument is not specified. It is also possible to turn off streaming and only retrieve the first result. In order to plot the development of the different statistics, Docker Stats is called in a loop, without the stream functionality, in quick succession, and a time-stamp is generated for each call. This way, the interval between the statistics does not matter, since the time-stamp indicates the timing of the retrieved statistics.
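The polling loop might be sketched as follows. The `--no-stream` and `--format` flags are standard Docker Stats options; the parsing assumes exactly the format template shown, and `poll_stats` will of course only run where Docker is available:

```python
import datetime
import subprocess

# Go-template for `docker stats --format`; field names are Docker's own.
FORMAT = "{{.Name}},{{.CPUPerc}},{{.MemPerc}},{{.NetIO}}"

def parse_stats_line(line, timestamp):
    """Attach our own timestamp to one line of `docker stats` output,
    since Docker Stats itself reports no timing information."""
    name, cpu, mem, netio = line.split(",")
    return {"time": timestamp, "name": name,
            "cpu_pct": float(cpu.rstrip("%")),
            "mem_pct": float(mem.rstrip("%")),
            "net_io": netio}

def poll_stats(samples=3):
    """Call `docker stats --no-stream` repeatedly, timestamping each batch."""
    records = []
    for _ in range(samples):
        ts = datetime.datetime.now().isoformat()
        out = subprocess.run(
            ["docker", "stats", "--no-stream", "--format", FORMAT],
            capture_output=True, text=True, check=True).stdout
        records += [parse_stats_line(l, ts) for l in out.splitlines() if l]
    return records

# Parsing demo on a sample line (no Docker daemon needed).
sample = parse_stats_line("algorithm,12.34%,0.08%,1.2MB / 648kB",
                          "2019-06-17T12:00:00")
```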


CHAPTER 6

Threat Handling System Validation

This chapter aims to assess whether it is possible to implement the Threat Handling System using the aforementioned metrics. This is done by first implementing malicious algorithms. Then we will analyse the resulting measurements corresponding to the CPU, memory, network and system call metrics to assess whether a system will be able to distinguish normal from malicious behaviour based on these metrics.

6.1 Method

6.1.1 Malicious Algorithms

As described in Section 2.4, the algorithm sent by the algorithm provider has to conform to the digital agreement signed by the two parties. Based upon the policy of the scenario from Section 3.3 and the risk estimation from Section 3.1, we can create malicious algorithms. For this scenario, two malicious algorithms have been implemented. Both were designed to retain a close similarity to the original, normal algorithm.

Malicious Algorithm 1: Saving the data as the result

In order to get access to the data, the algorithm provider may create an algorithm in which the received data is saved instead of just the result, as stipulated in the policy. In this malicious algorithm, the data received from the databases is saved as the result of the computation. This result is visible to the algorithm provider, so it is able to distribute the data. This algorithm is malicious since it breaches policy clause number 8 directly. In the risk analysis, this corresponds to threat number 5: data object is accessed by unauthorised receivers.

Malicious Algorithm 2: Distributing the result to another party

Instead of first saving the result and then retrieving it, it is also possible for the algorithm provider to provide an algorithm which immediately sends over the data to an unauthorised receiver. In this malicious algorithm, the data is sent to pod number 1 from Figure 5.1.

6.1.2 Test Setup

All experiments were conducted on a Linux machine running Ubuntu 18.04.2 LTS. The machine is outfitted with 62 GB of system memory and an Intel Xeon E5-1620 v4 CPU which has 4 cores clocked at 3.5 GHz and 8 threads. Minikube is configured to run with 3 cores and 4096 MB memory. The version of Minikube that is used for testing is v1.0.0. All containers run Python 3.7 from the Docker Hub repository. The containers running strace also use Ubuntu version 18.04. Virtualbox version 6.0.8r130520 is used as Minikube’s VM-driver.


6.2 Malicious Algorithm 1 results

6.2.1 Call Analysis

Figure 6.1: Full System call frequencies of the non-malicious algorithm and malicious algorithm 1: Save data as result. The system calls were sorted alphabetically.

The system call histogram features the same set of calls for both algorithms. This means that for this malicious algorithm, there is no single call whose mere appearance indicates malicious behaviour. The top 15 system call frequencies from Figure 6.1 are selected and then sorted alphabetically, in order to see whether we can draw conclusions from a subset of system call frequencies and thereby filter out the noise.


Figure 6.2: Top 15 system call frequencies of the non-malicious algorithm and malicious algorithm 1: Save data as result. The system calls were sorted alphabetically after the top 15 were selected.

In Figure 6.2, we show the top 15 system calls in terms of frequency, sorted alphabetically. The top 15 consist of the same system calls for both the normal and the first malicious algorithm; only their relative ordering differs. This frequency histogram shows the similarity between the two algorithms. There are 5 system calls which differ significantly between the two algorithms and might, therefore, indicate malicious behaviour:

• mmap
• munmap
• select
• stat
• write

Of the 5 system calls which could indicate malicious behaviour, stat and select can be associated with a variable part of the program. Firstly, the algorithm container has to wait until the entire file has been transferred to the receiver. It does so by checking whether the file created by the receiver container exists, which indicates that the receiver has in fact received the entirety of the data. stat is called to give statistics about a file [15]; in other words, it can give information about the existence of such a file. select allows a program to check whether a file descriptor has new data [15]; select is called in order to receive messages from a socket. We can therefore conclude that the frequency of stat and select depends on how fast the data is sent over. This rate can vary due to other system load or environment events. The usage of these calls cannot be attributed to any malicious behaviour.

mmap and munmap are system calls which are used to map and unmap files into memory, respectively [15]. For these two calls, it might be possible to associate them or their frequency with malicious behaviour in other scenarios. Mapping large regions of memory may give an indication of what actually happens in the algorithm.

The write system call is used to write data to a file descriptor [15]. This could mean that this file descriptor is, in fact, a regular file. If that is the case, then this system call may indicate malicious behaviour, which in this case is writing the data to a file.

6.2.2 CPU and Memory Usage Analysis

Figure 6.3: CPU usage as percentage of VM assigned CPU’s over execution time

Figure 6.4: Memory usage as percentage of VM assigned memory over execution time

The CPU and memory usage of the two algorithms, shown in Figure 6.3 and Figure 6.4, respectively, are relatively similar: there are no significant differences between the two algorithms. The differences that do exist are on a very small scale and do not indicate suspicious behaviour. The execution time of the first malicious algorithm is also longer than that of the normal one.

6.2.3 Network Analysis

The outgoing network traffic (Figure 6.5) shows a slight increase for the malicious algorithm. Since Docker Stats measures all network traffic, this increase can possibly be attributed to the Python sockets using a TCP connection.

Figure 6.5: Outgoing network traffic in bytes over execution time


6.3 Malicious Algorithm 2 results

6.3.1 Call Analysis

Figure 6.6: Full System call frequencies of the non-malicious algorithm and malicious algorithm 2: Distribute data to another party. The system calls were sorted alphabetically.

This graph shows that the sendto system call is called relatively often by the second malicious algorithm, but not by the normal algorithm or the first malicious one. This system call is used to send messages over sockets [15] and can therefore be used to flag this algorithm as malicious.


Figure 6.7: Non-outlier System call frequencies of the non-malicious algorithm and malicious algorithm 2: Distribute data to another party. The system calls were sorted alphabetically. The sendto system call was removed from this graph.


Figure 6.8: Non-outlier top 15 System call frequencies of the non-malicious algorithm and mali-cious algorithm 2: Distribute data to another party. The system calls were sorted alphabetically after the top 15 system calls were selected. The sendto system call was removed from this graph in order to observe the other frequencies.

For malicious algorithm 2, the top 15 system calls again correspond to the same system calls made by the non-malicious algorithm, with the exception of the sendto system call, which is again not shown in Figure 6.8. Judging from the top 15 system call results of the two algorithms, it might be possible to determine whether an algorithm is malicious by only looking at the top X system calls, in which we chose X = 15 arbitrarily.

6.3.2 CPU, Network and Memory Analysis

Figure 6.9: CPU usage as percentage of VM assigned CPU’s over execution time

Figure 6.10: Memory usage as percentage of VM assigned memory over execution time


Figure 6.11: Cumulative sent network data in bytes over execution time

The CPU usage (Figure 6.9) and the outgoing network traffic (Figure 6.11) show a large peak just before the end of the program. This timing corresponds to a large part of the sendto system calls. The memory usage can be found in Figure 6.10; there is no significant difference between the memory usage of the normal algorithm and that of the second malicious algorithm.

6.4 Discussion

In this chapter, we have shown that we can detect suspicious behaviour of our two malicious algorithms with real-time metrics. This is done by applying the procedure described in Section 4.2 to system calls and CPU, memory and network usage.

Our first malicious algorithm, which saves the entire data as the result of the computation, was detected by looking at the frequency of the write system call.

Our second malicious algorithm sends the entire data over to another pod. This was detected by the frequency of the sendto system call, as well as by large spikes in CPU and network usage. When the THS detects these anomalies, it could prevent the threat from escalating further by analysing the arguments of the system calls. It is then able to extract the IP address of the pod to which data is not supposed to be sent, and block this IP address. This can, for example, be done by utilising the namespace feature in Docker.
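Extracting such a destination address from the traced arguments could be sketched as below. The regex assumes strace's usual rendering of IPv4 sockaddr arguments (`sin_addr=inet_addr("…")`); the actual blocking step is left out, since the mechanism (e.g. a firewall rule or Docker network configuration) is deployment-specific. The trace lines and allow-list are illustrative:

```python
import re

# strace renders IPv4 destinations as: sin_addr=inet_addr("10.1.0.7")
ADDR = re.compile(r'sin_addr=inet_addr\("([\d.]+)"\)')

def extract_destination(strace_line):
    """Pull the destination IPv4 address out of a connect/sendto argument
    list, assuming strace's usual sockaddr rendering."""
    m = ADDR.search(strace_line)
    return m.group(1) if m else None

def unauthorised_destinations(lines, allowed):
    """Destinations seen in the trace that are not on the allow-list; these
    are the candidates for blocking."""
    seen = {extract_destination(l) for l in lines}
    return sorted(ip for ip in seen if ip and ip not in allowed)

# Illustrative trace: one authorised and one unauthorised destination.
trace = [
    'connect(3, {sa_family=AF_INET, sin_port=htons(9000), sin_addr=inet_addr("10.1.0.7")}, 16) = 0',
    'connect(4, {sa_family=AF_INET, sin_port=htons(9000), sin_addr=inet_addr("10.1.0.9")}, 16) = 0',
]
bad = unauthorised_destinations(trace, allowed={"10.1.0.7"})
```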

These results provide a proof-of-concept for the essential parts of the Threat Handling System, meaning that it can be integrated and used in a DMP in real time.


CHAPTER 7

Related Work

7.1 Container Security and DMPs

Sari Sultan et al. [23] provide an overview of the current work in container security. To the best of our knowledge, this is the first work that provides such an extensive generic security overview of containers since Docker popularised them in 2014. They introduce four threat classes (called use-cases in the original paper) which cover the security requirements within the container-host threat landscape. We cover and use this classification, together with their list of threats, in Section 3.2.1 to create our threat analysis. In this thesis, we introduce another classification based around DMP infrastructure, which we call the ordering in stages. This study sparked the idea of having a TAS analyse threats to find out which are applicable to a specific scenario. Based on this study, we also determined whether the threats were preventable or detectable, which in turn led us to propose a THS that can detect threats in real time.

S. Cisneros-Cabrera et al. [5] present a gap analysis of existing DMP platforms. They establish, among other things, the lack of (automated) risk evaluation and monitoring capabilities in current DMPs. The TAS and THS we propose fill these gaps by providing a systematic method to analyse, evaluate and monitor threats.

M. Almorsy et al. [2] provide a framework similar to our TAS and THS by proposing risk identification, assessment, monitoring and the undertaking of security actions based on monitoring. The difference is that our system is built on the container ecosystem and is developed specifically for DMPs, while their framework targets general cloud services. This means that our threats not only concern containers (or our specific framework) but also stem from breaches of the policies agreed upon by the DMP parties. Their paper proposes a risk identification process; however, it does not detail what such a process would look like or whether it can be automated at all.

7.2

Intrusion detection systems

The following works outline alternative methods for intrusion detection based on system calls. M.A. Hiltunen et al. [21] present a way to use authentication in system calls. A barrier to the use of a monitoring system in DMPs is the pro-activeness required of the different parties when reacting to alarms from monitoring systems. This is solved by rejecting, at call time, system calls that are not allowed by the policy. Such a policy can be created in the way(s) outlined in Section 4.2.2. It would be interesting to see whether a similar method can be used in our THS to prevent unwanted system calls.
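
A minimal, purely illustrative sketch of such call-time policy enforcement is shown below. In a real deployment this decision would live in a kernel-level mechanism such as a Docker seccomp profile; the rule contents here are hypothetical.

```python
# Hypothetical policy: per system call, a predicate over its arguments
# deciding whether the call is permitted. These rules are invented
# examples, not rules from a real DMP agreement.
POLICY = {
    "open":   lambda args: not args.get("path", "").startswith("/data/raw"),
    "sendto": lambda args: args.get("dest_ip") in {"10.1.0.5"},
}

def allow_syscall(name, args):
    """Default-allow calls without a rule; otherwise apply the rule."""
    rule = POLICY.get(name)
    return True if rule is None else rule(args)

print(allow_syscall("sendto", {"dest_ip": "10.1.0.77"}))  # False: blocked
print(allow_syscall("read", {}))                          # True: no rule
```

Rejecting the call outright, rather than merely logging it, removes the need for parties to react to alarms after the fact.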

D. Mutz et al. [19] introduce a technique that uses multiple pieces of information, such as system call arguments, in a Bayesian classifier to determine whether an event in a stream of system calls is part of an attack.

A. S. Abed et al. [1] employ a frequency-based bag-of-system-calls technique for anomaly detection in Linux containers. A classifier using a sliding-window technique is trained and tested to detect anomalies. In our research, we propose a threshold-based technique for detecting anomalies, using both system calls and other metrics that are not compatible with the way they train their classifier. It would nevertheless be interesting to see whether their method can also be used to monitor system calls in our THS.
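
The core of such a frequency-based sliding-window approach can be sketched as follows. The window size, normal profile and threshold are invented for illustration and are not values from the cited paper.

```python
from collections import Counter

WINDOW = 5
# Assumed frequency profile of a benign window of 5 system calls.
NORMAL_PROFILE = Counter({"read": 3, "write": 1, "close": 1})

def window_distance(window_calls):
    """L1 distance between a window's call frequencies and the profile."""
    freq = Counter(window_calls)
    names = set(freq) | set(NORMAL_PROFILE)
    return sum(abs(freq[n] - NORMAL_PROFILE[n]) for n in names)

def anomalous_windows(trace, threshold=4):
    """Indices of sliding windows whose distance exceeds the threshold."""
    hits = []
    for i in range(len(trace) - WINDOW + 1):
        if window_distance(trace[i:i + WINDOW]) > threshold:
            hits.append(i)
    return hits

trace = ["read", "read", "write", "read", "close",      # benign behaviour
         "write", "write", "write", "write", "write"]   # burst of writes
print(anomalous_windows(trace))  # → [4, 5]
```

The burst of writes, like the one produced by our first malicious algorithm, pushes the later windows over the threshold.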

S. Srinivasan et al. [25] propose an intrusion detection system based on n-grams of system calls. This sequence-based method, like our research, focuses on anomaly detection in containers, and it builds on the work of A. S. Abed et al. [1]. It would be interesting to see whether this method can also be applied to monitoring system calls in our THS.
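
A minimal illustration of the n-gram idea: record the set of system call n-grams seen during normal runs, then flag any n-gram in a new trace that was never observed. The choice of n = 3 and the traces are assumptions for this sketch.

```python
def ngrams(trace, n=3):
    """All length-n subsequences of a system call trace, as a set."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

# Assumed trace of a benign run used to build the known-behaviour model.
normal = ["open", "read", "write", "close", "open", "read", "write", "close"]
KNOWN = ngrams(normal)

def novel_ngrams(trace, n=3):
    """N-grams in a new trace that never occurred during normal runs."""
    return ngrams(trace, n) - KNOWN

suspect = ["open", "read", "sendto", "close"]
print(novel_ngrams(suspect))  # both 3-grams containing sendto are novel
```

Unlike the frequency-based approach, this captures the *order* of calls, so an unusual sequence is flagged even if the overall call counts look normal.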


CHAPTER 8

Conclusion

Digital Market Places are membership organisations that facilitate digital collaboration between businesses. This collaboration involves signing a digital agreement in which parties express what may happen to the data they share, creating a safe environment for competitors to collaborate. This research proposes systems that provide security in this environment by identifying threats beforehand and reacting to threats in real time. This is done through the following research questions.

1. How can we establish a generalised risk estimation method for digital market place scenarios?

• What security technologies can be adopted to prevent breaches of the policy during execution?

2. How can we construct a system which handles the prevention and detection of threats in real time?

• What metrics can we monitor to detect breaches of policy during execution?

The first research question was answered by creating a risk estimation specialised to a defined scenario. For this risk estimation, we established how Docker works from both an architecture and a security perspective, based on established literature. The method we used in our risk estimation was then generalised into a system for risk estimation in DMPs. This Threat Analysis System (TAS) works by first identifying the threats for the scenario that is going to run. This can be done by analysing the scenario together with threat information provided by the DMP itself. These threats can then be associated with metrics and countermeasures supported by the DMP. In case multiple threats are associated with the same countermeasure or metric, the most efficient countermeasure or metric should be chosen. These can then be implemented during the initialisation phase of the scenario.
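
The matching step of the TAS could be sketched as below: threats identified for a scenario are associated with the countermeasures the DMP supports, preferring a countermeasure that covers several threats at once. All threat IDs, countermeasure names and costs are hypothetical.

```python
# Hypothetical catalogue of countermeasures a DMP might support, each with
# the set of threat IDs it covers and an assumed deployment cost.
COUNTERMEASURES = {
    "seccomp_profile": {"covers": {"T1", "T2"}, "cost": 2},
    "network_policy":  {"covers": {"T2"},       "cost": 1},
    "readonly_fs":     {"covers": {"T3"},       "cost": 1},
}

def select_countermeasures(threats):
    """Greedy selection: cheapest cost per newly covered threat first."""
    remaining, chosen = set(threats), []
    while remaining:
        name, info = min(
            ((n, i) for n, i in COUNTERMEASURES.items()
             if i["covers"] & remaining),
            key=lambda kv: kv[1]["cost"] / len(kv[1]["covers"] & remaining),
        )
        chosen.append(name)
        remaining -= info["covers"]
    return chosen

print(select_countermeasures({"T1", "T2", "T3"}))
# → ['seccomp_profile', 'readonly_fs']
```

The seccomp profile wins over the network policy here because it covers two threats for a cost of two, making it the more efficient shared countermeasure.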

For the second research question, a Threat Handling System (THS) was proposed. This system is able to detect and then react to threats as they happen. We proposed a method of detecting malicious behaviour, which we subsequently validated. This method consists of creating a profile of what a normal algorithm looks like. During the execution of the algorithm, we measure the values of metrics, which we then compare to the established profile. This is done by establishing a threshold whose crossing is most likely caused by malicious behaviour. In Chapter 6 we provide a proof of concept for this method, in which we implement the scenario used in the TAS as a demo in Kubernetes and Docker. To do this, we created two malicious algorithms on which we tested four different metrics: system calls and CPU, memory and network usage. Using these metrics, we were able to identify the behaviour of the malicious algorithms. This method can be extended to work in real time.
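
The profile-and-threshold method can be sketched as follows, assuming per-metric samples from a benign run and a simple mean-plus-three-standard-deviations threshold; the metric values and the 3-sigma rule are invented for this example.

```python
from statistics import mean, stdev

def build_profile(samples):
    """Profile of a normal run for one metric: mean and a 3-sigma threshold."""
    m, s = mean(samples), stdev(samples)
    return {"mean": m, "threshold": m + 3 * s}

def is_anomalous(profile, value):
    """A live sample crossing the threshold is treated as suspicious."""
    return value > profile["threshold"]

# Assumed CPU usage (%) sampled during a benign run of the algorithm.
normal_cpu = [12.0, 14.5, 13.2, 12.8, 13.9]
cpu_profile = build_profile(normal_cpu)

print(is_anomalous(cpu_profile, 13.0))  # within the profile
print(is_anomalous(cpu_profile, 95.0))  # spike such as a bulk data transfer
```

The same profile/threshold pair would be built per metric (system call frequencies, memory, network) and checked against each new sample as it arrives.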

The THS we outlined will use this method to detect whether an algorithm is malicious. If the threshold is reached and there is no way to prevent the malicious behaviour in real time, the algorithm is flagged as malicious. If there is a way to prevent the malicious behaviour in real time, it will be utilised.

Digital Market Places and similar projects are still on the rise, so much future work remains. Following the line of this research, the TAS and THS can be implemented in full and incorporated into a DMP framework. With such an implementation, more research can be done into finding other ways of detecting specific behaviour or anomalies in the execution of programs in DMPs, and perhaps even into generalising these methods towards cloud services. Preferably this can then be tested with a full Kubernetes cluster, or even a different container orchestration model such as Singularity, which is more aimed towards High-Performance Computing.
