
Bachelor Informatica

Protocol containers for data transfers between DTNs

Pim Paardekooper

June 22, 2020

Supervisor(s): dr. R.S. Cushing

Informatica, Universiteit van Amsterdam


Abstract

In a world where data sets are growing larger and more distributed, building a network of servers between the facilities where data is stored, dedicated to wide area data transfer, can make that data much more accessible. Such servers already exist and are called Data Transfer Nodes (DTNs). Research has already been done on how to tune these nodes to utilize their resources better and increase their throughput.

Yet how to program these DTN networks is still an open area of research, and it is what we focus on here.

Currently, users are largely forced to use whatever software happens to run on these DTNs. We try to gain better programmability through containerization, to obtain the flexibility to change protocols, fairness algorithms and application-specific software such as caching on the DTNs more dynamically. We quantify how effective this is in different ways. For the protocols, we investigate the overhead of dynamically switching protocols in real time. For fairness algorithms, we examine whether we can control bandwidth usage on a per-container basis and whether we can prioritize certain container traffic. Lastly, we demonstrate a caching application use-case for programming the underlying infrastructure.

From the results we conclude that the overhead of using containers is minimal. We also saw that fairness can be maintained and traffic can be prioritized without sacrificing bandwidth utilization.


Contents

1 Introduction
1.1 Petascale DTN project
1.2 Data access for LIGO
1.3 Ethical aspects
1.4 Research questions
2 Theoretical background
2.1 Science DMZ
2.2 Containers, Singularity
2.2.1 Singularity over Docker
2.2.2 Singularity security
2.2.3 Performance consideration
2.3 Application protocols
2.3.1 UDT
2.3.2 FDT
2.3.3 GridFTP
2.4 Linux traffic control
2.4.1 HTB qdisc
3 Implementations
3.1 Setup
3.2 Application protocol containers
3.3 Fairness
3.3.1 Redis queue
3.3.2 Bandwidth splitting
3.4 Web service
3.4.1 Setting up containers
3.4.2 Deploying workers
3.5 Caching
4 Experiments
4.1 Evaluation of the implementation
4.1.1 Experiment setup
4.1.2 Results
4.1.3 Discussion
4.2 Switching protocols
4.2.1 Experiment setup
4.2.2 Results
4.2.3 Discussion
4.3 Fairness
4.3.1 Priority
4.3.2 Coexisting application protocols
4.3.3 Experimentation setup
4.3.4 Results
4.3.5 Discussion
4.4 Setup caching
4.4.1 Experiment setup
4.4.2 Results
4.4.3 Discussion
5 Conclusions
5.1 Future work


CHAPTER 1

Introduction

Research data for projects such as cosmology and climate research are very large, in the multi-petabyte range. To distribute those data sets, dedicated servers for wide area data transfer are needed. Data Transfer Nodes (DTNs) are such dedicated servers. DTNs are typically PC-based Linux servers with high-throughput networking and storage. They have access to local storage or are directly mounted to a high-speed parallel filesystem, and they run software tools designed for high-speed data transfer to remote systems [1]. They are a key component of a Science DMZ [2]. A Science DMZ is the part of a network near a campus or laboratory that is optimized for high-performance scientific applications. The need for this arose when the traffic through ESnet, a high-performance network that carries science traffic for the U.S. Department of Energy, increased several orders of magnitude over the years and is now in the tens of petabytes range. The Science DMZ is ESnet's solution to this problem; Science DMZs are therefore mainly a US approach to science grids. The DTN is where data transfer applications run and is therefore the place where the user interacts with a Science DMZ. All Science DMZs are powered by DTNs and can form a science-driven, data-centric "freeway system" or data "highway". The Pacific Research Platform is a program that tries to develop such a "highway" [3].

1.1 Petascale DTN project

That such DTNs need to be properly configured was shown in the Petascale DTN project, where, by changing the configuration of existing DTNs and adding new nodes, the throughput went from an average of 5 GB/s to 30-50 GB/s. Moving one petabyte went from taking a week to 48 hours [4]. This has a huge impact on the research being done.

The Petascale DTN project made use of Globus Online to send data between the DTNs. Still, this is only one of many applications out there. In this research we look at how to set up data transfer applications, by deploying application protocol containers, for the whole extent of the transfer. We use containers because DTNs are interconnected and only work if they run the same software; setting up many DTNs without containers is infeasible. Moreover, the network becomes much more programmable with containers, which is something we want to show as well.

These DTNs are a shared resource, so we also have to look at how to maintain fairness with multiple application protocols and multiple containers running on the DTN server.

1.2 Data access for LIGO

A great example of a data transfer application that has been used is StashCache for the Laser Interferometer Gravitational-Wave Observatory (LIGO), which conducted a three-month observing campaign. Those observations delivered the first direct detection of gravitational waves, with the use of the Open Science Grid (OSG), which provides a fabric of services for achieving Distributed High Throughput Computing (DHTC) across dozens of computational facilities.

PyCBC is a software program to analyze gravitational-wave data. It analyzed the LIGO data in two-week blocks requiring hundreds of thousands of individual HTCondor batch jobs, each lasting at least an hour. At first this was done at one facility holding a full copy of the LIGO data, but with the amount of data there is nowadays that is no longer feasible, so the data was streamed from a central location instead. This increased the number of sites LIGO could utilize.

To improve throughput it uses caches, with the StashCache infrastructure. Using DTNs, caches can be set up on the DTN nodes, so being able to set up custom application protocols on the DTNs can help make them work even better [5]. By being able to set up these caches with containers, the DTN becomes more programmable, which is something we want to show.

1.3 Ethical aspects

Our project tries to make DTN networks more programmable. This gives research facilities a better way to set up their own DTN node or nodes and join a larger network, which in turn improves the research being done at those facilities. Programmability also has the benefit of enabling things such as prioritization of certain traffic, which is something we will look at. Prioritization can help prioritize the research that is most important to certain facilities. At this time in particular, being able to prioritize COVID-19 research would be something that people want.

1.4 Research questions

Our research focuses on the programmability of DTNs. Because DTNs are general-purpose, Linux-based computers, they can be programmed with custom software, for example by system administrators to deploy services or by application developers to deploy custom application-specific software such as caching or overlay networks. We will deploy these kinds of software with the help of containers.

Our main research questions are:

• How to deploy custom application-level protocols on DTN networks using containers?
• How to manage the usage of shared resources, namely bandwidth, by containers?


CHAPTER 2

Theoretical background

A DTN, as we said, is typically a PC-based Linux server with high-throughput networking and storage that is configured for wide area data transfer. DTNs are often deployed in a Science DMZ [1].

2.1 Science DMZ

Figure 2.1: Science DMZ concept [6]

The motivation for a Science DMZ is to split wide area data traffic from general-purpose computing (see figure 2.1). General-purpose networks are not dedicated to data-intensive applications and can reduce the throughput for those applications significantly. It is therefore necessary to separate general-purpose computing from the high-performance science network, which can then focus on making the best use of the network. For example, TCP reduces its sending rate significantly when it encounters packet loss or delay, and as the distance between hosts increases, packet loss and delay increase. Being able to tune this for the high-performance science network, but not for general-purpose computing, can give a big improvement in performance. The future plan for Science DMZs is a big interconnected mesh, where everything is connected through Science DMZs (see figure 2.2). For this to become a reality, improving the programmability of DTNs with containers will help.


Figure 2.2: A bunch of science DMZs [6]

2.2 Containers, Singularity

A container is a standard unit of software that packages up code and all its dependencies, so applications run quickly and reliably and can be moved from one computing environment to another. A container is a lightweight alternative to a virtual machine: instead of virtualizing the hardware stack, containers virtualize at the operating system level, with multiple containers running directly atop the OS kernel [7]. By sharing the OS kernel, containers are more lightweight, start much faster and use a fraction of the memory compared to booting an entire OS. There are several containerization tools out there; the most used is Docker. However, Docker is not right for what we are trying to do.

2.2.1 Singularity over Docker

Docker can be used to develop reproducible containers on local machines. It improves collaboration on code or applications by not having to deal with different software versions or broken dependencies. But for DTNs it might not be ideal, because DTNs are, as we already mentioned, a shared resource. A major reason why Docker is not used on a shared resource, such as HPC systems or DTNs, is security. Each Docker container that runs on a machine has to be spawned from a root-owned Docker daemon, and since the user can interact with the Docker daemon, it is possible to coerce the daemon process into granting the user escalated privileges. In a non-shared system you know exactly what software is being run in the containers, in which case you can prevent this; but in a shared system, people can deploy whatever software they want [8]. This is one of the reasons Singularity was created; it also arose, for example, from the need for containers that could be used for scientific, application-driven workloads. Docker is not ideal here because it aims to fully isolate the application, including from underlying hardware such as GPUs and NICs, which affects the performance of applications that depend on such hardware.

2.2.2 Singularity security

The Singularity runtime enforces a unique security model that makes it appropriate for untrusted users to run untrusted containers safely on multi-tenant resources. This is because the Singularity runtime dynamically writes UID and GID information to the appropriate files within the container, so the user is the same inside as outside the container [9]. On top of that, the container's file system is mounted using the nosuid option, and processes are spawned with the PR_NO_NEW_PRIVS flag, to prevent users from escalating privileges.


2.2.3 Performance consideration

The performance of Singularity containers for HPC has already been researched, and no significant performance loss is noticeable when using Singularity containers on HPC [10] [11]. Sometimes Singularity containers even perform slightly better, because they can use newer software versions without breaking compatibility [12].

2.3 Application protocols

Application protocols are used in the application layer. The application layer provides services for an application program to ensure effective communication with other application programs on the network. Different protocols are used for different kinds of communication. For DTNs you want to use application protocols that are good at sending large amounts of data. We have chosen to look at GridFTP, FDT and UDT because they are open source and easy to install. Moreover, a study on their performance differences has already been done: GridFTP is the fastest, FDT has a problem with small buffer sizes and UDT has an implementation issue [13]. This implementation issue is something we will keep in mind in our experiments.

The reason to use different application protocols on a DTN is application-specific. By being able to use different application protocols we want to show that a DTN can be made programmable with containers, so that a DTN can be used for different purposes without manually having to change the DTN's configuration.

2.3.1 UDT

UDT is a UDP-based, connection-oriented data transfer protocol. It is a unicast transport protocol, so all data and control packets are sent between two UDP ports (see figure 2.3) [14]. On top of UDP it builds its own congestion control, called Decreasing Increases AIMD (DAIMD).

Figure 2.3: UDT protocol

2.3.2 FDT

FDT is written in Java and is based on an asynchronous, flexible multithreaded system that uses the capabilities of the Java NIO libraries. It can stream a list of files continuously, using a managed pool of buffers, through one or more TCP sockets, and it can make use of multiple TCP streams in parallel [15].


2.3.3 GridFTP

GridFTP is an extension of the File Transfer Protocol (FTP) for grid computing, that is, computing with widely distributed computer resources. FTP was chosen for its widespread use, and GridFTP is therefore built on top of it. It improves bandwidth use by making use of simultaneous TCP streams. GridFTP integrates with the Grid Security Infrastructure, which provides authentication and encryption for file transfers. It is therefore more difficult to install with containers than UDT and FDT, mainly because the certificates need to be available to the container run by a user [16]. We therefore decided to focus only on FDT and UDT.

2.4 Linux traffic control

When deploying our custom application protocol containers we want to be able to allocate bandwidth to these containers. This helps with maintaining fairness and makes the DTN more programmable. In the Linux kernel you can configure traffic control with the tc command. When the kernel needs to send a packet out of an interface, the packet is enqueued to the qdisc configured for that interface. Qdisc stands for queuing discipline; it represents the scheduling applied to a queue. By default this is a basic FIFO queue, and the qdisc is responsible for handing packets to the device driver. There are classless qdiscs that treat all packets the same, and classful qdiscs that give different treatments to different classes [17].

2.4.1 HTB qdisc

To split the bandwidth we use a classful qdisc, the Hierarchical Token Bucket (HTB) qdisc. The HTB qdisc can have multiple classes, and each class can be seen as a token bucket that receives tokens at a certain rate and can increase its token rate up to a ceiling rate by borrowing tokens from classes that did not use theirs. A queue can send a packet if it has enough tokens to send one; otherwise it has to wait for more tokens [18].

A packet gets enqueued in a class by a filter, which tells tc which class to use. tc can use a mark to classify a packet. This mark can be set with the iptables command. We will use this to mark the traffic of the different workers.
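As an illustration only (not code from the thesis), the following minimal Python sketch shows how such an HTB hierarchy could be created by invoking tc; the interface name, rates and class IDs are hypothetical and the commands need root privileges.

```python
import subprocess

IFACE = "eth0"  # hypothetical interface name


def run(cmd):
    """Run a traffic-control command, raising an error if it fails."""
    subprocess.run(cmd.split(), check=True)


# Replace the default qdisc with an HTB root qdisc; traffic that matches
# no filter falls back to class 1:20.
run(f"tc qdisc add dev {IFACE} root handle 1: htb default 20")

# A parent class holding the total bandwidth available to the workers.
run(f"tc class add dev {IFACE} parent 1: classid 1:1 htb rate 800mbit")

# Two worker classes: a guaranteed rate plus a ceiling they may borrow up to
# when the other class leaves tokens unused.
run(f"tc class add dev {IFACE} parent 1:1 classid 1:10 htb rate 500mbit ceil 800mbit")
run(f"tc class add dev {IFACE} parent 1:1 classid 1:20 htb rate 300mbit ceil 800mbit")
```

The filter that steers marked packets into these classes is sketched in section 3.3.2.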


CHAPTER 3

Implementations

Our main goal is to be able to deploy custom application protocols with the use of containers. In our implementation we do not use real DTNs; we use VMs with Ubuntu installed. This simulates the DTNs well, because DTNs are Linux servers. In this chapter we show how we implemented the containers, explain the sif files and explain the design choices. Then we show how we tried to maintain fairness on the DTN. Lastly, we show our imitation of StashCache as a use-case application.

3.1 Setup

For our demo we make use of Docker containers as well as Singularity containers. Docker should not be run on a shared resource, but we use it because there is still no way to install Singularity inside Singularity. We therefore cannot easily package our deployment logic onto the DTNs with Singularity, because that would require Singularity to deploy Singularity containers. Docker is thus only used to make the deployment of our application onto the VMs easier. The user interacts with the DTN through a Flask web service. This web service takes a request to send a file, sets up the custom application containers, sends the file and then breaks down the containers.

The web service itself uses three Docker containers: a container with an nginx reverse proxy that receives HTTP requests and passes them to the web service through the WSGI HTTP server gunicorn, a container with the Flask web service holding all the logic to deploy the custom application containers, and lastly a Redis server container, which is used to complete client requests asynchronously and to maintain fairness on the DTNs (see figure 3.1).

3.2 Application protocol containers

The Singularity containers that contain the application protocols are very basic. They download all the necessary software and then have a client and a server app section that deploy the client and the server. The container runs the protocol in a subprocess and echoes back the PID under which the server or client runs. This PID can then be used to figure out which random port the server has chosen. For UDT the server needs to be non-blocking and you need to know its output, which is done by writing its output to a file.
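As an illustration of how such a container app could be launched from the deployment logic, here is a minimal Python sketch; the image name, app name, log-file path and the exact format of the echoed PID and port are assumptions that only mirror the mechanism described above, not the thesis's actual code.

```python
import subprocess


def start_protocol_app(image="udt.sif", app="server", logfile="/tmp/udt-server.log"):
    """Start the 'server' or 'client' app of a protocol container.

    The app section is assumed to launch the protocol in a subprocess and
    echo that subprocess's PID on stdout, while the protocol's own output
    (including the randomly chosen port) goes to a log file.
    """
    result = subprocess.run(
        ["singularity", "run", "--app", app, image, logfile],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())  # PID echoed back by the app section


def read_server_port(logfile="/tmp/udt-server.log"):
    """Parse the server's chosen port from its log output (line format assumed)."""
    with open(logfile) as f:
        for line in f:
            if "port" in line.lower():
                return int(line.strip().rsplit(" ", 1)[-1])
    return None
```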

How FDT behaves can be changed by passing it parameters, which can be given to the container as arguments to the deploy function or through environment variables. In the experiments we did not change any parameters, because these only tell the protocol how much parallelism to use or change buffer sizes. This does not interfere with our experiments, because by splitting the bandwidth we can ensure that a protocol does not take more than it is given. We do not focus on fairness in the system resources being used, such as memory or CPU; otherwise those variables would be a problem for maintaining fairness.


Figure 3.1: Two DTNs that show the setup of the flask application and all containers. Every square is its own container. The rounded squares are part of the containers they are connected to.

3.3 Fairness

For maintaining fairness we define two mechanisms: one uses queues and the other splits the bandwidth. With queues we can give priority to certain requests, and with bandwidth splitting we can prevent applications from fighting for bandwidth. We implement the queue mechanism with a Redis task queue, and we split the bandwidth with the Linux traffic control command tc.

3.3.1 Redis queue

A Redis queue holds tasks to be executed; in our case those tasks are requests to send a file from one DTN to other DTNs. The tasks in the queue are taken up by workers. A worker listens on a queue: when it is idle, it takes a task from the front of the queue and executes it. A worker can also listen on multiple queues, in which case it takes the next task from the queue with the highest priority. So to prioritize certain requests you can have multiple queues and enqueue those requests in a higher priority queue. If you let all workers listen on all queues, a lower priority queue will not be served as long as there are tasks in higher priority queues. You can also distribute workers over different queues so that certain requests get more system resources.

Queuing strategies have been researched in great detail over the past decades. Finding the optimal strategy for a given scenario is a topic in itself; here we only touch on it lightly and focus on the programmability aspects instead. This means that we show that certain goals a queuing strategy might have can be accomplished with our implementation using a Redis task queue, but we do not optimize for those goals.
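The sketch below shows how such a priority setup could look. The thesis only says "Redis task queue", so the use of the Python rq library, the queue names and the send_file task are assumptions made for illustration.

```python
from redis import Redis
from rq import Queue, Worker

redis_conn = Redis()

# Two queues; any worker listening on both serves "high" before "low".
high = Queue("high", connection=redis_conn)
low = Queue("low", connection=redis_conn)


def send_file(src, dst, path):
    """Placeholder for the task that deploys the protocol containers and sends a file."""
    ...


def run_worker():
    # Queues are given in priority order: "low" is only drained when "high" is empty.
    Worker([high, low], connection=redis_conn).work()


if __name__ == "__main__":
    # A prioritized and a normal transfer request; each enqueue returns a job
    # whose id can be handed back to the client to poll for completion.
    high.enqueue(send_file, "dtn1", "dtn2", "/data/urgent.dat")
    low.enqueue(send_file, "dtn1", "dtn2", "/data/bulk.dat")
```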

3.3.2 Bandwidth splitting

Another way to split resources over the tasks is to give each worker a portion of the bandwidth. We do this by creating a class that defines how much guaranteed bandwidth a worker gets and a ceiling bandwidth, using the tc command, which configures traffic control for the Linux kernel.

Every packet carrying a certain mark then gets a bandwidth portion. The marking is done on the source port through which the application protocol sends its data. Packets can be marked with iptables, which holds a set of rules for filtering packets and executing actions on them, such as marking. We mark each worker's traffic by using the worker id as the mark on the port the data goes through. This way we can change the bandwidth for each worker separately.
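As a sketch of the marking step, assuming HTB classes like the ones in the section 2.4.1 sketch already exist; the interface name, the source port, and the convention that the worker id doubles as both the fwmark and the class minor number are all hypothetical.

```python
import subprocess

IFACE = "eth0"  # hypothetical interface name


def run(cmd):
    subprocess.run(cmd.split(), check=True)


def attach_worker(worker_id, source_port):
    """Mark the worker's outgoing traffic and steer it into its HTB class.

    The worker id is used as the fwmark and as the minor number of the HTB
    class (1:<worker_id>), mirroring the per-worker split described above.
    """
    # iptables sets a firewall mark on packets leaving from the protocol's source port.
    run(f"iptables -t mangle -A OUTPUT -p tcp --sport {source_port} "
        f"-j MARK --set-mark {worker_id}")
    # tc's fw filter maps that mark onto the worker's HTB class.
    run(f"tc filter add dev {IFACE} parent 1: protocol ip "
        f"handle {worker_id} fw flowid 1:{worker_id}")


# Example: worker 10 sends through port 9000 and lands in class 1:10.
attach_worker(10, 9000)
```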

To calculate how much bandwidth each mark gets, we have to know the total beforehand. In our case we used iperf, a tool for performing network throughput measurements, to measure the available bandwidth. However, UDT has an implementation error, as was stated in the research that compared GridFTP, FDT and UDT [13]; it had something to do with recovery from packet loss. We also came across this in our experiments and suspect that it is probably related to the congestion control mechanisms.

When we gave workers a bandwidth much lower than the maximum bandwidth of the system, the throughput stayed the same. Yet whenever a worker got more bandwidth than the system could give, the overlimits counter of each worker's qdisc class went up. Overlimits indicates a shortage of tokens on the qdisc class, which means that packets could not be sent. Because of this the transfer speed would hang at 1 Kb/s, even when removing some workers. This means that the packet rate is not adjusted to packet loss or delay, which is why the transfer speed hangs at 1 Kb/s.

We therefore chose to lower the total bandwidth given to the workers, so that we could show that bandwidth splitting can work if the application protocols have good congestion control.

3.4 Web service

We implemented a web service in Flask that makes use of Singularity python to deploy our custom application protocol containers. We have a route that enqueues the task corresponding to a request in a Redis queue. The task belongs to a custom application protocol, and the queue is chosen based on a predefined filter or can be specified in the request. The route enqueues the task and sends back a job key, with which the client can check whether the job is done. These are the two routes that should only be accessible by the client.
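A minimal sketch of what such an enqueue route and a status route could look like; the endpoint paths, request fields and the send_file task are hypothetical, and it again assumes the rq library on top of Redis.

```python
from flask import Flask, jsonify, request
from redis import Redis
from rq import Queue
from rq.job import Job

app = Flask(__name__)
redis_conn = Redis()


def send_file(protocol, src, dst, path):
    """Placeholder for the task that deploys the containers and performs the transfer."""
    ...


@app.route("/transfer", methods=["POST"])
def transfer():
    req = request.get_json()
    # The request may name a queue explicitly; otherwise fall back to a default.
    queue_name = req.get("queue", "default")
    job = Queue(queue_name, connection=redis_conn).enqueue(
        send_file, req["protocol"], req["src"], req["dst"], req["path"])
    # The job key lets the client poll for completion.
    return jsonify({"job_key": job.get_id()}), 202


@app.route("/status/<job_key>")
def status(job_key):
    job = Job.fetch(job_key, connection=redis_conn)
    return jsonify({"finished": job.is_finished})
```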

3.4.1 Setting up containers

Figure 3.2: Data transfer with Singularity containers

We also have routes that take care of setting up the containers on both DTNs. This depends on the implementation of the application protocol: FDT, for example, opens a server to send data to, while UDT opens a server to send data from. With HTTP requests the DTNs are asked to open the server container and the client container. If the sif file for the container is not on the DTN, it is pulled from the Singularity hub. After the file has been sent, the containers are broken down and the task is set to completed (see figure 3.2). A possibility would also be to not break down the application protocol containers immediately, but to wait for more requests that use them. This has not been implemented, as we do not want to research how fast we can make our implementation, but rather whether we can change the infrastructure for each client. Still, this gives overhead that could be prevented by a different implementation.
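The per-request container lifecycle on one DTN could be sketched as follows; the image name, hub URI and the simple terminate-based teardown are assumptions for illustration, not the thesis's implementation.

```python
import os
import subprocess


def ensure_image(image, hub_uri):
    """Pull the sif file from the Singularity hub only if it is not yet on the DTN."""
    if not os.path.exists(image):
        subprocess.run(["singularity", "pull", image, hub_uri], check=True)


def handle_transfer(image="fdt.sif", hub_uri="shub://example/protocols:fdt",
                    app="server", args=()):
    """Deploy the protocol container for one request and break it down afterwards."""
    ensure_image(image, hub_uri)
    proc = subprocess.Popen(["singularity", "run", "--app", app, image, *args])
    try:
        proc.wait()               # the container lives only for this transfer
    finally:
        if proc.poll() is None:
            proc.terminate()      # break the container down once the file is sent
```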

3.4.2 Deploying workers

The workers are deployed by calling a Python CLI command. This script can deploy workers on the default queue, or it can be given a configuration file that specifies how many workers listen on each queue and how much bandwidth these workers get. To enqueue tasks in the right queue, the queue name can be specified in the request.
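The thesis does not give the configuration format, so the following is only a hypothetical example of what such a worker configuration and deployment loop could look like; the bandwidth fields would feed the tc setup sketched in section 3.3.2.

```python
from multiprocessing import Process
from redis import Redis
from rq import Queue, Worker

# Hypothetical configuration: per queue, the number of workers and the
# guaranteed/ceiling bandwidth for each worker's HTB class.
WORKER_CONFIG = {
    "high": {"workers": 8, "rate": "100mbit", "ceil": "400mbit"},
    "low":  {"workers": 2, "rate": "50mbit",  "ceil": "400mbit"},
}


def start_worker(queue_name):
    conn = Redis()
    Worker([Queue(queue_name, connection=conn)], connection=conn).work()


def deploy_workers(config):
    """Spawn one process per worker; bandwidth classes would be created alongside."""
    procs = []
    for queue_name, spec in config.items():
        for _ in range(spec["workers"]):
            p = Process(target=start_worker, args=(queue_name,))
            p.start()
            procs.append(p)
    return procs


if __name__ == "__main__":
    deploy_workers(WORKER_CONFIG)
```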

3.5 Caching

To show a use case of an application developer deploying custom application-specific software, we implemented simple caching software. It deploys a database container that stores the location of every file and the number of times it has been requested. If it detects that a file is often requested by two DTNs that are close to each other, it deploys a cache on one of them. This imitates the StashCache setup from the introduction in order to reduce latency. Figure 3.3 shows how a request for a file can be redirected; a sketch of the decision logic is given after the figure.

Figure 3.3: Shows how a file that is being requested can be redirected to come from a cache. I1 holds a database with file locations. When a cache is deployed, requests for files that are on the cache are redirected to the DTN holding the cache.
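A minimal sketch of the cache-deployment decision described above; the request threshold, the are_close helper, the deploy_cache placeholder and the record layout are assumptions, since the thesis does not specify them.

```python
from collections import defaultdict

REQUEST_THRESHOLD = 5  # hypothetical: consider a cache after this many requests


def deploy_cache(dtn, filename):
    """Placeholder: allocate space on the chosen DTN and copy the file to it."""
    ...


class FileCatalog:
    """Imitates the database container: file locations and request counters."""

    def __init__(self, are_close):
        self.are_close = are_close           # assumed helper: are two DTNs near each other?
        self.location = {}                   # filename -> DTN currently serving it
        self.requesters = defaultdict(list)  # filename -> DTNs that requested it

    def add_file(self, filename, source_dtn):
        self.location[filename] = source_dtn

    def resolve(self, filename, requesting_dtn):
        """Record the request and return the DTN it should be served from."""
        history = self.requesters[filename]
        history.append(requesting_dtn)

        # Deploy a cache when the file is requested often by nearby DTNs,
        # then redirect subsequent requests to that cache.
        nearby = [d for d in history if self.are_close(d, requesting_dtn)]
        if (len(history) >= REQUEST_THRESHOLD and len(nearby) >= 2
                and self.location[filename] != requesting_dtn):
            deploy_cache(requesting_dtn, filename)
            self.location[filename] = requesting_dtn

        return self.location[filename]
```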


CHAPTER 4

Experiments

4.1 Evaluation of the implementation

To evaluate our system we set up two experiments. Experiment 1 is intended to evaluate the effect of our implementation on the quality of service, and Experiment 2 is intended to show the effect on throughput. To test our implementation we want to look at what effects the number of workers, the bandwidth splitting implementation and the protocol have on the quality of service and throughput. We do this by sending requests between two DTNs and measuring the usage of bandwidth, the workers' execution time and the total execution time. More workers mean that more requests are handled at the same time, but also that more system resources are used.

To show how these factors influence the quality of service, we show how long the average job takes to complete and how long all jobs take to complete. This tells us whether more workers reduce the overall time and whether a single job takes longer or shorter with more workers. The effect of the chosen protocol can be shown by the time it takes to send the same amount of data and the rate at which it sends this data; the number of workers can also affect this.

The effect on throughput uses the same factors, but now we measure the throughput of the whole system. The throughput for different numbers of workers shows how workers utilize their portion of the bandwidth and whether parallelism degrades the throughput.

4.1.1 Experiment setup

For this experiment we use one Redis queue and send one type of request, UDT or FDT. We fill the queue with 20 requests and then measure how the throughput changes while these requests are being handled. Every request sends a file of 1 GB from one DTN (in our case a VM) to another DTN. We also measure how long all 20 requests take to complete. For the workers we can additionally get the number of jobs each worker has done and the total running time of the worker, from which we can calculate the average job time.


4.1.2 Results

Figure 4.1: Shows how more workers bring down the total time it takes for 20 requests to be finished. Each request asks a worker to send a 1 GB file with a given application protocol, UDT or FDT. The workers can work in parallel to handle more requests at a time, which reduces the total time; the compromise is that this takes more resources and increases the average time a single request takes to complete. After 4 workers the curve flattens out and more parallelism no longer changes anything, which could be because the available bandwidth limit has been reached.

Figure 4.2: Shows how the throughput of sending 20 files of 1 GB changes with the application protocol, the number of workers (requests in parallel), and with or without dividing the bandwidth. From this figure we can see that FDT is faster than UDT, so the choice of protocol can have drastic effects on performance. The FDT curve looks very erratic. This can be explained by the fact that FDT takes longer to start up, and during that time the throughput is 0, which means the throughput drops every time FDT handles a new batch of requests. For UDT, more workers mean more bandwidth being utilized, and when the bandwidth is split the throughput is slightly higher. Only slightly, because UDT does not seem to take advantage of all the bandwidth it has been given.


4.1.3 Discussion

In figure 4.1 you can see that more workers improve the total time for all jobs to be done, while with fewer workers a single job is finished faster. You can also see that FDT is much faster than UDT, but that is due to UDT's implementation error, as we already noted in the bandwidth splitting part of the implementation chapter.

Splitting the bandwidth, compared to not splitting it, has a positive effect on the jobs being done, and with UDT the effect is stronger. This is probably because of the congestion control: UDT has strong congestion control, which means that on loss or delay the sending rate is decreased drastically. When it is given a reserved portion of the bandwidth it does not have to fight for bandwidth, which reduces loss and delay. With FDT the difference is smaller, because its congestion control takes better advantage of the bandwidth it is given. So more workers mean more work being done, but every single job is done more slowly, and bandwidth splitting has a positive effect on the execution time.

In figure 4.2 you see a strongly varying throughput for FDT and a more constant bandwidth usage for UDT. FDT has a larger startup time, because it needs to set up small buffers to send the data, so when all workers finish at the same time there is a startup period in which no bandwidth is being used. FDT also shows very large peaks when the bandwidth is not split, but those peaks do not last long: they reach 57-95 MB/s (456-760 Mbps) and a file is only 1 GB, so a peak can only last about 2 seconds.

We see that fewer workers use less bandwidth with UDT; this is again the implementation error where it does not utilize its given bandwidth well. It therefore looks as if the bandwidth usage increases proportionally with the number of workers. But you can see that with more workers UDT utilizes the bandwidth better and is therefore done earlier.

With FDT you also see that more workers take less time, but the decrease in time is most significant from one to two workers. With FDT, one worker also gets almost the same bandwidth as the other numbers of workers.

Thus splitting the bandwidth works for FDT in the sense that it utilizes the given portion, and for UDT it works because it gets a little more bandwidth and therefore the jobs are finished faster.

4.2 Switching protocols

A natural effect of our programmable infrastructure is the ability to switch protocols just in time. Here we measure the overhead incurred by this dynamic switching. We do this by showing how the overhead of using containers influences the transfer time. As we stated in the theoretical background, in the "Performance consideration" part of "Containers, Singularity", research on Singularity shows that it gives little overhead. We check whether that is also the case in our implementation. Increasing the file size increases the time a protocol uses the same container, so we also look at the file size for which the overhead of Singularity becomes negligible.

4.2.1 Experiment setup

In this experiment we therefore increase the file size and measure how long it takes for all containers to be deployed, how long it takes for the application protocol inside the container to be ready to send the file, and the total time it takes to send the file from one DTN to another.


4.2.2 Results

Figure 4.3: When a file is sent between two DTNs, custom application protocol containers need to be deployed on both of them. These figures show how much time it takes for those protocols to be set up and what percentage of the total transfer time is spent on the actual sending. UDT has almost no setup, because it is a simple protocol (see table 4.1). The FDT overhead is large when the file is small, but this matters less when the file gets larger and the sending time is longer. Protocol setup is the time it takes for the containers to be deployed and protocol startup the time it takes for the application protocol to be ready to send.

UDT
protocol setup (s)   protocol startup (s)   sending time (s)   file size (GB)
0.28                 0.16                   39.68              1
0.36                 0.26                   112.90             3
0.38                 0.20                   191.25             5
0.51                 0.18                   261.24             7
0.27                 0.24                   326.22             9

FDT
protocol setup (s)   protocol startup (s)   sending time (s)   file size (GB)
1.10                 1.64                   1.71               1
1.24                 1.59                   8.87               3
1.09                 1.53                   20.46              5
1.25                 1.55                   24.17              7
1.11                 1.50                   31.24              9

Table 4.1: UDT and FDT custom application protocol container deployment times. Protocol setup is the time it takes for the containers to be deployed and protocol startup the time it takes for the application protocol to be ready to send.


4.2.3 Discussion

In figure 4.3 it can be seen that FDT has an overhead-to-file-size ratio that decreases as the file size increases. The overhead of UDT seems negligible and much lower than that of FDT. It is indeed smaller than FDT's, as is visible in table 4.1, but it seems much lower than it is because the sending time of UDT is so much larger than FDT's. The setup of FDT is larger because FDT needs to set up a pool of mapped buffers between the two hosts, while UDT is a unicast transport protocol and only needs to set up two ports to begin sending the data. The overhead for FDT is only about one second, and when a 9 GB file is sent it takes only 8% of the whole transfer. Still, this can add up to a large amount if many small files are sent. In our implementation, however, we can group files, which keeps the total overhead at about one second.

4.3 Fairness

An interesting use-case of DTN programmability is application prioritization on the network. For example, in an emergency situation such as a flooding simulation, critical containers that move data from sensors and compute infrastructures could be prioritized on the network to get more resources [19]. As we said in the implementation chapter, we can implement fairness by means of Redis queues, Redis workers and bandwidth splitting. What we want to show from the experimentation on fairness is that multiple application protocols do not degrade the quality of service, and whether we can prioritize certain traffic.

In this thesis we only queue the sending side of a request; there is no queue on the receiving side of a DTN. This means that if two DTNs send data to one DTN, they use the same link, which can flood the system.

4.3.1 Priority

To give prioritization on the network we have implemented a Redis task queue. With a Redis task queue we can give a queue more workers, or give a worker for that queue a bigger slice of the network bandwidth. The two possibilities give different benefits: the first lets more tasks of the prioritized traffic be worked on at a time and thus increases the number of jobs being done per time unit; the second decreases the time it takes for a single job to be done.

We want to show that we can prioritize traffic, but that when there is currently no prioritized traffic being sent, those resources go back to the other traffic. We also want to show that giving a queue more workers increases the number of jobs being done per time unit. Lastly, we want to show that we can make the job time of prioritized traffic go down.

4.3.2 Coexisting application protocols

We also want to know whether multiple application protocols can coexist: when two protocols are deployed at the same time, is the bandwidth still split correctly and can we still maintain fairness?

4.3.3 Experimentation setup

First we have two queues, where one is prioritized over the other by giving it more workers. We then issue 100 requests for the lower priority queue and 50 for the higher priority queue, and check whether the resources go back to the lower priority queue afterwards.

Second, we split the bandwidth equally over the two queues, give the higher priority queue one worker with all of its bandwidth and give the lower priority queue more workers that share its bandwidth, and check whether the higher priority queue then has a faster job time.


Lastly, we check whether UDT and FDT can coexist by first deploying only FDT and then only UDT, each handling 20 requests to send a 1 GB file. After that, we have two queues, one for FDT and one for UDT, and give each queue one worker with equal bandwidth.


4.3.4 Results

Figure 4.4: Shows a low priority queue with 100 requests and a high priority queue with 50 requests, where the high priority queue has 8 workers and the low priority queue 2 workers, each with the same amount of bandwidth allocated. It can be seen that the high priority queue drains faster and that, once it is finished, the low priority queue drains just as fast. So we can give priority to a queue and later give the resources back, while the throughput stays the same the whole time.

Figure 4.5: Shows two queues that each have 50% of the bandwidth; the high priority queue has 1 worker with all of its bandwidth and the low priority queue has its bandwidth split over 5 workers. The worker in the high priority queue has a much lower average job time and the throughput stays the same throughout the experiment. So we can improve the average job time of prioritized traffic by giving a worker more bandwidth.


Figure 4.6: Experiment for coexisting protocols. First only one queue is deployed with 20 UDT requests to send a 1 GB file, then the same is done with 20 FDT requests. After that there are two queues, one for UDT and one for FDT, with 10 requests each. The individual application protocols' throughput can be recognized in the figure with both protocols, and one can clearly see the interference pattern.

4.3.5 Discussion

In figure 4.4 you can see that the higher priority queue drains its queue faster than the low priority queue, because it has more workers. After the last job in the high priority queue is done, the low priority queue drains faster, and the bandwidth usage stays the same over the whole experiment. From this we conclude that we can have prioritized traffic and that the resources go back to the low priority traffic when there is no more high priority traffic.

In figure 4.5, when the high priority traffic gets a worker with more bandwidth, its average job time is much lower. The figure also shows that 5 workers with the same total amount of bandwidth as one worker finish the work faster, which is something we already saw before. Comparing figures 4.4 and 4.5 shows that with different traffic policies the bandwidth usage stays the same. So we can lower the average job time of certain traffic by giving it more bandwidth.

When protocols coexist they only use the bandwidth they have been given, as can be seen in figure 4.6. In the last subfigure you can recognize the two individual protocols. FDT gets half of the bandwidth it had in the second subfigure and therefore only reaches half its throughput in the third subfigure. So two application protocols can coexist.


4.4 Setup caching

Caching is application-specific software that can be deployed on the DTN. The programmability of the network means that we can shift some of the application logic into the network. This is a different approach from the classic paradigm of fitting the application to the infrastructure; here the paradigm is fitting the infrastructure to the application. By deploying a cache on a DTN near DTNs that frequently request the same file from a DTN that is further away, we can reduce latency. By showing that our implementation can do this, we show that DTNs are programmable in this way.

4.4.1 Experiment setup

In this caching experiment we request a file from a source DTN with two other DTNs. The source DTN holds a database with the file locations and the number of requests. When the number of requests rises above a threshold, the source deploys a cache for the last requester. This asks that DTN to allocate space and sends the requested file to it. All subsequent requests for that file are then redirected to the cache (as explained in figure 3.3). We then measure the incoming and outgoing bandwidth for all DTNs and show how it changes when the cache is deployed.

Figure 4.7: Caching experiment setup, showing how a cache is deployed onto a DTN. When a DTN, S1, gets a request to send a file, it asks I1, a database with file objects, where the file is. If the file is requested more than a certain threshold by multiple DTNs that are close to each other, a cache is deployed on one of them. The cache deployment time is then the time from the moment you want to deploy a cache until the cache holds all the files you want it to hold.


4.4.2 Results

Figure 4.8: Shows a cache being deployed on DTN 2 and the source redirecting the data. The cache is only deployed after a couple of requests, at time 210 s. When the cache is deployed, the files that need to be on the cache are sent from the source to DTN 2; the spike of that data being sent is visible at time 220 s. After that, DTN 2 has the data in its cache and serves it both to itself and to DTN 1. It therefore sends and receives data on one link, which explains the throughput degradation.

4.4.3 Discussion

In the results in figure 4.8 you see that at first the source is the only one with the file. After the cache is deployed, all requests are redirected from S1 to DTN 2. We have not implemented a way to copy from the cache to the local DTN itself, although that could easily be done; for now, DTN 2 sends data to DTN 1 after the cache is deployed and also sends to and receives the data from itself. For incoming data we have not implemented a queuing system, so the link gets twice the amount of data and the sending rate is drastically reduced.

For now all DTNs are close to each other, but when they are far apart this could reduce the overall load on the whole DTN network. What we can conclude is that we can set up caching application software on our implementation.


CHAPTER 5

Conclusions

In our research we focused on the programmability of DTNs, to make it easy to deploy services and application-specific software. Our main research questions were:

How to deploy custom application-level protocols on DTN networks using containers? We have seen in the results that programming the DTN with Singularity gives little overhead, depending on the Singularity container that is used. This overhead on the data transfer itself can be overcome by sending larger files, which is the purpose of a DTN. Singularity is the best-suited containerization program to use on a shared resource. We can already deploy multiple different application protocol Singularity containers at the same time without sacrificing throughput.

How to manage the usage of shared resources, namely bandwidth, by containers? With the use of a Redis queue it is shown that certain traffic can be prioritized over other traffic. We used two mechanisms for fairness control: first the Redis queues and their workers, and second bandwidth splitting. The results showed that this can give certain traffic faster transfer times, or let it handle more transfers at the same time than other traffic on the network, all while maintaining the same bandwidth utilization.

How to deploy custom application logic, such as caching, into the network? For this we have shown that it is possible to deploy custom application software, namely caching, which strengthens the point that DTNs can be programmed with custom application containers.

So DTNs are programmable with containers with little performance overhead, fairness can be maintained and certain traffic can be prioritized, and we showed that we can deploy application logic onto the network.

5.1 Future work

In this area there is still a lot to be done. Our thesis only queued the outgoing data, and this can flood a link if multiple DTNs send to one link, which we saw in the caching experiment. This could be addressed by sharing queues between DTNs or by having a queue on a per-link basis. Also, not having to rely on Docker to get certain applications onto the DTNs would make it practical for real use; for this, there needs to be a way for Singularity containers to be deployed dynamically. In the study of DTNs themselves, work is still being done on configuring the parameters to make them perform optimally, one example being smart DTNs [20]. Using such smart DTNs to configure how to optimally set up the containers and prioritize traffic would be a good extension.

Then there is also the security aspect of a programmable infrastructure. On a DTN network it is not trivial to know who can do what, since DTNs are part of different administrative domains. How to allow foreign containers to run on a system could be something to research, for example by implementing a federated credential management system.


Bibliography

[1] "Faster data transfer." https://fasterdata.es.net/science-dmz/DTN/.

[2] J. Crichigno, E. Bou-Harb, and N. Ghani, "A comprehensive tutorial on Science DMZ," IEEE Communications Surveys & Tutorials, vol. PP, 09 2018.

[3] L. Smarr, C. Crittenden, T. DeFanti, J. Graham, D. Mishin, R. Moore, P. Papadopoulos, and F. Würthwein, "The Pacific Research Platform: Making high-speed networking a reality for the scientist," in Proceedings of the Practice and Experience on Advanced Research Computing, PEARC '18, (New York, NY, USA), Association for Computing Machinery, 2018.

[4] L. Vu and J. Bashor, "ESnet's petascale DTN project speeds up data transfers between leading HPC centers," Dec 2017.

[5] D. Weitzel, B. Bockelman, D. A. Brown, P. Couvares, F. Würthwein, and E. F. Hernandez, "Data access for LIGO on the OSG," CoRR, vol. abs/1705.06202, 2017.

[6] E. Dart, "Petascale data architectures for portals and computing centers," Jun 2018.

[7] "What are containers and their benefits." https://cloud.google.com/containers.

[8] G. M. Kurtzer, V. Sochat, and M. W. Bauer, "Singularity: Scientific containers for mobility of compute," PLOS ONE, vol. 12, pp. 1-20, 05 2017.

[9] Sylabs, release 3.5 ed., Jan 2020.

[10] F. Han and N. Dandapanthula, "Containerizing HPC applications with Singularity," 2018.

[11] E. Le and D. Paz, "Performance analysis of applications using Singularity container on SDSC Comet," pp. 1-4, 07 2017.

[12] G. Hu, Y. Zhang, and W. Chen, "Exploring the performance of Singularity for high performance computing scenarios," in 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 2587-2593, 2019.

[13] S. Yu, N. Brownlee, and A. Mahanti, "Characterizing performance and fairness of big data transfer protocols on long-haul networks," in 2015 IEEE 40th Conference on Local Computer Networks (LCN), pp. 213-216, 2015.

[14] Y. Gu and R. Grossman, "Using UDP for reliable data transfer over high bandwidth-delay product networks," p. 112, 2000.

[15] "Fast data transfer (FDT)." http://monalisa.cern.ch/FDT/.

[16] E. Yildirim, J. Kim, and T. Kosar, "How GridFTP pipelining, parallelism and concurrency work: A guide for optimizing large dataset transfers," in 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 506-515, 2012.

[17] J. Vehent, "The Linux kernel: Traffic control, shaping and QoS." https://www.cnblogs.com/zengkefu/p/5635100.html, Feb 2016.

[18] B. Hubert, "Linux advanced routing & traffic control HOWTO." https://www.tldp.org/HOWTO/Adv-Routing-HOWTO/index.html, Jul 2002.

[19] V. Krzhizhanovskaya, G. Shirshov, N. Melnikova, R. Belleman, F. Rusadi, B. Broekhuijsen, B. Gouldby, J. Lhomme, B. Balis, M. Bubak, A. Pyayt, I. Mokhov, A. Ozhigin, B. Lang, and R. Meijer, "Flood early warning system: design, implementation and computational modules," Procedia Computer Science, vol. 4, pp. 106-115, 2011. Proceedings of the International Conference on Computational Science, ICCS 2011.

[20] Z. Liu, R. Kettimuthu, I. Foster, and P. Beckman, "Toward a smart data transfer node," Future Generation Computer Systems, vol. 89, 06 2018.
