
Bachelor Informatica

On the Feasibility of Software-Defined Networking in the Square Kilometre Array Science Data Processor

Damiaan Twelker

June 8, 2016

Supervisors: P.C. Broekema (ASTRON)

Daniel Romão (University of Amsterdam)

Informatica - Universiteit van Amsterdam


Abstract

A Software-Defined Network architecture is an appealing candidate for the network of the Science Data Processor component of the planned Square Kilometre Array radio telescope. Correlator nodes send their traffic to virtual destinations, and the network's software component assigns physical SDP entities to the virtual addresses. An SDN architecture allows for quick and dynamic resource allocation. In this research we investigate the consequences of an SDN architecture in a small-scale test setup, consisting of one simulated correlator node and three simulated SDP receiving nodes. We identify OpenFlow features that lack full support in a current-day switch, introduce workarounds, and discuss their application to the SDP. We measure the impact of traffic duplication and redirection, and find that traffic redirection takes effect relatively quickly without significant packet loss, whereas traffic duplication has serious consequences for packet loss at the original destination for the hardware we investigated.


Contents

1 Introduction
  1.1 Research Questions
  1.2 Previous Work
2 Background
  2.1 Radio Telescopes
    2.1.1 The Low-Frequency Array
    2.1.2 The Square Kilometre Array
  2.2 Software-Defined Networking
    2.2.1 OpenFlow
    2.2.2 Floodlight
3 Methodology
  3.1 Network topology
    3.1.1 Switch
    3.1.2 Controller
  3.2 Traffic redirection
    3.2.1 Latency
    3.2.2 Packet loss
  3.3 Traffic duplication
    3.3.1 Ordering
    3.3.2 Packet loss
4 Implementation
  4.1 Toolkit
  4.2 Generating Experiments
  4.3 Flows
5 Results
  5.1 Traffic redirection
    5.1.1 Latency
    5.1.2 Packet loss
  5.2 Traffic duplication
    5.2.1 Ordering
    5.2.2 Packet loss
6 Conclusions
7 Discussion


CHAPTER 1

Introduction

The Square Kilometre Array (SKA) is a planned global radio telescope. The first phase of the project, SKA1, encompasses the construction of two separate telescopes: SKA1-Low in Western Australia, and SKA1-Mid in South Africa. Both locations will share the same high-level system design. However, computational capabilities may differ [1].

An integral part of the SKA1 system on each location is the Science Data Processor (SDP). Astronomical data from antenna stations is correlated in the Central Signal Processor (CSP), before it is reduced to scientific output (e.g. images) in the SDP. See Figure 1.1.

Figure 1.1: A high-level overview of SKA1 system components [1]. Remote stations feed the Central Signal Processor (CSP), which connects through an SDN to the Science Data Processor (SDP); scientific products end up in the Science Archive.

It is proposed that the SDP be divided into independent components, and that a Software-Defined Network (SDN) architecture distributes data over the components [1]. Correlator nodes send data to a virtual destination, while the network's software component dynamically assigns physical SDP components to the virtual destinations. Dynamic resource allocation leads to more flexibility, maintainability, and scalability. Changing the destination of an incoming data stream is quickly achieved by having the network's software component assign a new physical node to the same virtual destination. Problems in LOFAR, a radio telescope of similar scale that features a traditional networking architecture, have provided an incentive to look into an SDN approach for the SKA [2].

We hypothesise that two features to be implemented by the SDN are particularly relevant: traffic redirection and traffic duplication. Traffic redirection allows us to move incoming traffic from one receiving node to another, e.g. in case of maintenance or a change in the desired observation. Traffic duplication can be used to route traffic to multiple SDP nodes that independently run different algorithms on the same data stream.

The nature of the data flowing between correlator and SDP imposes several requirements on the software-defined network: data loss, out-of-order delivery, and latency shall be reduced to an absolute minimum. Loss of observational data is allowed up to a certain degree, but impacts the quality of the scientific output. The longer the network takes to redirect a packet away from its original destination, the higher the load on the buffering capacity of the network. The same is true for packets that arrive out of order, requiring effort from SDP components to reorder the incoming data. The data ingress rate at the SDPs is unprecedented: 5.2 Terabit per second for the SKA1-Mid and 4.7 Terabit per second for the SKA1-Low [1]. Thus there is a serious risk of overflowing buffers in the switches, even with the slightest latency and out-of-order count, resulting in loss of observational data. A low latency is desirable in case of a transient astronomical event as well: resources need to be quickly allocated to capture data from the short-lived extraterrestrial occurrence.

1.1 Research Questions

To determine the feasibility of SDN in the SKA, we formulate the following research question: What SDN feature set is essential for SDN in the SDP to be useful, and how well are these features supported in current SDN hardware?

Four sub-questions will guide our efforts:

1. What is the delay of traffic redirection?
2. What is the packet loss incurred due to traffic redirection?
3. What is the impact of traffic duplication on packet ordering at the original destination?
4. What is the impact of traffic duplication on packet loss at the original destination?

1.2 Previous Work

Relevant research by Vandenvenne et al. in 2013 documents some of the hardware limitations of the switch we will be using in our experiments [3]. Certain OpenFlow features are implemented in software as opposed to hardware, incurring a significant performance hit. We have found a way to refrain from using software-implemented OpenFlow features and limit ourselves to those directly supported by the switch's ASIC, without compromising the relevance and quality of the experiments (see Chapter 3).


CHAPTER 2

Background

The scale of our proposed work is reasonably small. The network topology of our test setup features no more than four start/end nodes (see Chapter 3), and only a few very specific metrics such as latency and packet loss will be considered. In this chapter, we explain why our small-scale work is relevant and how it fits into the bigger picture of a planned multi-million dollar radio telescope.

2.1 Radio Telescopes

The universe emits electromagnetic radiation, as first discovered by Jansky in 1933 [4]. Some of it lies in the visible part of the electromagnetic spectrum (e.g. a blinking star in the sky), while the rest is not immediately visible. It is exactly those electromagnetic waves that radio telescopes help us visualise.

Electromagnetic waves reveal properties of the emitting object that may not be apparent from visible light. Examples include temperature (Planck's law), brightness, and molecular composition (HI 21 cm line) [5]. An electromagnetic wave describes oscillations in the electric field. A radio telescope utilises antennas to measure those oscillations: the incoming electromagnetic waves displace electrons in an electric circuit. The induced voltage is the superposition of all electromagnetic waves that reached the antenna. A receiver commences further processing of the signal by amplifying it, since it is too weak for immediate processing. The signal does not purely describe the data we intend to obtain, but is obscured by noise. Noise is introduced by the surroundings of the antenna (e.g. lightning) and by the individual components of the antenna and receiver circuits (Nyquist noise) [6, p105]. A way to expose periodicity and reduce the influence of noise is to multiply the signal with a delayed version of itself; this is called autocorrelation [7, p130]. Multiplication with signals from other receivers is called correlation, which shows how the signals from individual stations relate to each other. A Fourier transform is used to expose the individual electric field oscillations [8, p52].

The receiver's processing steps can either be performed on the analog signal in hardware (using electrical circuits), or in software on a digitised version. Digitisation requires sampling. The sampling rate is derived using the Nyquist-Shannon sampling theorem [9], which describes how many samples are required per unit time to not lose precision.
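For reference, the criterion in its standard textbook form (not specific to this work) is that a signal containing no frequency components above $B$ hertz is completely determined by samples taken at a rate

$$
f_s \geq 2B,
$$

so the digitiser must take at least $2B$ samples per second to preserve the information in the band.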

A phased array combines data from multiple antenna stations to identify point sources in the sky. An electromagnetic wave from a source reaches station A at a different point in time than it reaches station B. The time difference, together with the distance between both stations, can be used to calculate the coordinates of the source in the sky. The process that combines signals from multiple antenna stations such that the effective signal increases in the desired direction and is suppressed in other directions is called beamforming [10].

Figure 2.1: A phased array [11]. The black squares represent antenna stations; their signals, subject to a physical delay, are given an additional artificial delay and summed by the beamformer into a single output. The artificial delay allows for beamforming: combination of signals from different parts of the sky.

The interferometer or aperture array's purpose is to remediate noise: it aligns the output from multiple stations and multiplies the resulting signals to reduce the effect of noise at individual stations.

2.1.1 The Low-Frequency Array

The Low-Frequency Array (LOFAR) is a deployed multi-station radio telescope with interferometer and phased array capabilities. Most of LOFAR's antenna stations are concentrated on Dutch soil [12]. Remote antenna stations can be found in the UK, France, Germany and Sweden. Digitised data is delivered over a Wide-Area Network (WAN) to a processing facility in the Dutch city of Groningen, where correlation takes place. The ingress rate at the correlator is approximately 0.5 Tbit/s [13].

Each antenna station houses a multitude of antennas operating in the low-frequency band, as well as antennas operating in a higher frequency range. The analog antenna output is digitised in the station cabinet. Before the data leaves for the processing facility in Groningen, an initial reduction is made using a digital filter and beamformer, see [12]. Computing resources attach a static UDP/IP header to the packets leaving the station, providing both MAC and IP destination addresses. ARP is not implemented to resolve MAC addresses due to a limitation in resources on the sending side. The receiving side operates in a similarly static fashion. The switch guarding the networking resources at the processing facility in Groningen maintains a static address table for all receiving nodes. As a consequence, the network is static and labor-intensive to maintain, and the possibility of misconfiguration arises [2].

2.1.2 The Square Kilometre Array

The antenna technology deployed in the SKA differs between the Australian and South African locations. Factors contributing to the choice of one technology over the other include the assigned science case and the cost [14]. The SKA1-Low telescope in Western Australia is set to feature an aperture array focused on the lower frequencies. SKA1-Mid in South Africa intercepts higher frequency electromagnetic waves and expands on dish antennas that are currently part of the MeerKAT telescope.

Figure 2.2: The data flow through the SDP. CSP traffic enters through switches and ingest nodes into near real-time and batch processing within a Compute Island, backed by high, medium, and long term storage; results flow to the science archive and its mirror, and on to regional science centres over the WAN. The X represents a switch within the Compute Island. The big switch guarding the Compute Islands assigns Compute Islands to CSP data streams. Image appeared in [1], used with permission.

Originally another array, the SKA1-Survey, was planned for the Australian site, but it has since been postponed [15].

The total collecting area of the SKA is up to one million square meters, or one square kilometre, hence the name [14]. The collecting area is a measure of sensitivity, and the SKA will be more sensitive than any other radio telescope that exists today [16]. Large distances between antenna stations allow for identification of point sources further away from Earth.

At a central location, each SKA1 system features a Central Signal Processor (CSP). The digitised signal from antenna stations is transported to the CSP, where channelisation, beamforming and correlation occur. The CSP produces visibilities: cross-products for each station pair [1]. The Science Data Processor (SDP) subsequently calibrates the data and turns it into images.

Science Data Processor

The SDP is to be made up of multiple independent computing nodes, so-called Compute Islands (CIs). An unprecedented amount of data is distributed over the Compute Island nodes each second: 5.2 Terabit per second for the SKA1-Mid and 4.7 Terabit per second for the SKA1-Low [1]. UDP traffic from the CSP travels several hundred kilometres before it reaches the SDP. While the CSP is located relatively close to the antenna stations in the desert, the SDP’s site is envisioned to be in Perth for the SKA1-Low, and in Cape Town for the SKA1-Mid.

Compute Islands are, in essence, collections of computing nodes. Tasks of a Compute Island include reordering data received out of order, buffering data for batch processing, performing the actual data reduction, storing the obtained scientific products, and sending them off to a science archive. The idea is that correlator nodes in the CSP send data to virtual nodes. The SDN's software component assigns Compute Islands to virtual destinations, causing a feed from a CSP node to reach one or more CIs. This hypothetically allows for a fast and dynamic allocation of CI nodes to process CSP feeds. The datapath through the SDP is pictured in Figure 2.2.

2.2 Software-Defined Networking

Software-defined networking refers to a networking concept that abstracts a switch's forwarding functionality into dedicated remote-controlled software. In order to make a meaningful comparison with 'normal' networks, we will first revisit some networking fundamentals.

All the nuts and bolts that enable communication between two computers over a certain distance are spread out over five layers [17, p50]. The transport layer, the network layer, and the link layer are most relevant to our research.

Figure 2.3: The five network layers [17, p50]: the application layer (layer 5), transport layer (4), network layer (3), link layer (2), and physical layer (1). This simplified model is based on the OSI model, which distinguishes seven layers.

A brief summary of the topics associated with each layer follows.

The transport layer caters to application processes running on different hosts. It is home to the TCP and UDP protocols. An application at the sending side hands the transport layer a chunk of bytes and specifies a destination IP address and port, as well as the protocol to be used. The bytes are subsequently packaged and delivered to the receiving end. Both TCP and UDP are associated with a set of expectations. UDP is a connectionless protocol that does not provide any guarantees on the quality of its service. Packets might be delivered out of order, or they might not arrive at all, and the sender has no means of finding out. Only the integrity of individual packets, called datagrams, is guaranteed. TCP, on the other hand, is a two-way protocol that guarantees reliable data transfer, including in-order delivery. The reliability of TCP comes at the cost of a higher latency, i.e. longer transmission times.
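As a minimal illustration of the UDP model described above (the loopback address and port are arbitrary, and both sockets live in one process purely for brevity):

    import socket

    # Receiver: bind a port and wait; each recvfrom() returns exactly one datagram.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 9999))

    # Sender: hand the transport layer a chunk of bytes plus a destination. UDP gives no
    # guarantee that the datagram arrives, arrives only once, or arrives in order.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"one datagram of payload", ("127.0.0.1", 9999))

    payload, sender_address = receiver.recvfrom(2048)
    print(payload)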

The network layer is responsible for determining the route a packet will take through the internet. Routers’ forwarding tables are populated by routing algorithms. A router indexes its forwarding table with an IP address to obtain the output interface that constitutes the shortest path to the destination host. Routers operate on IP packets: encapsulations of transport-layer packets. Depending on the limitations of the underlying links, a transport layer packet might have to be fragmented into multiple IP packets. The maximum payload of a link-layer frame is called the maximum transmission unit or MTU. Ethernet for example places an upper bound of 1500 bytes on the frame payload size. The protocol stack on the receiving side will reassemble the fragmented packets based on information in the packet headers. Ethernet frames that exceed the 1500 byte maximum are called jumbo frames.

Switches belong to the link layer's domain. A specialized hardware component called the ASIC (application-specific integrated circuit) provides switching functionality at high speeds - much faster than could be accomplished in software by a CPU. The switch's MAC address table specifies which output interface leads to which MAC address. It could initially be empty, or configured by hand. If the switch receives a packet with a source MAC address that is not yet in its table, it adds a new entry mapping that MAC address to the receiving interface. If the switch receives a packet with a destination MAC address that is not in its table, it will output the packet on all ports ('flooding' the local network) or drop it. ARP traffic is a valuable source of information for a switch's MAC address table. The Address Resolution Protocol (ARP) is used to find out which MAC address belongs to which IP address. An ARP request packet has a destination MAC address of FF:FF:FF:FF:FF:FF, the MAC broadcast address, which causes all hosts on the subnet to receive and process the message. Only the host that matches the requested IP address responds with an ARP reply packet, providing its MAC address to the host that initiated the ARP request. Note that a host usually has multiple interfaces, and each interface has its own MAC and IP address. If a host receives a frame whose destination MAC address does not match the host's own MAC address, the host's link layer rejects the frame: it is not propagated to upper layers but discarded.

The idea behind Software-Defined Networking is to take control over the forwarding behaviour of switches. Packet headers can be modified on request to facilitate traffic redirection and duplication. A remote controller application has fine-grained control over the fate of each packet (including the possibility of dropping it).

2.2.1 OpenFlow

Several protocols have appeared that implement the principles of SDN, such as OpenFlow and ForCES [18]. OpenFlow is currently the most common implementation of SDN, perhaps fueled by rapid adoption on the switch manufacturers' end. The origins of the protocol trace back to the need for a large-scale practical network experimentation environment [19]. Aside from hardware vendors building OpenFlow-enabled switches, OpenFlow is also supported by the publicly available switching software Open vSwitch. Open vSwitch is responsible for the switching behaviour in the hardware switch we will be using: it runs as a process within our hardware switch's Linux-based operating system. Although Open vSwitch was designed to run on hosts to facilitate sharing of networking resources between different VMs [20], its sole purpose in our case is to provide the switch's forwarding functionality. It is certainly more economical from the vendor's perspective to bundle the switch with software that is developed in the open and supports a wide range of protocols, as opposed to implementing the nitty-gritty details of these protocols from scratch.

The major components of an OpenFlow-enabled switch are displayed in Figure 2.4. The OpenFlow protocol itself is an application-layer protocol that enables communication between the switch and a remote application called a controller. Messages can be one of three kinds: controller-to-switch, asynchronous, and symmetric [21]. Controller-to-switch messages allow the controller to modify or inspect the switch's state. Asynchronous messages are occasionally dispatched by the switch to inform the controller of state changes. Symmetric messages can be initiated by either end, e.g. to verify the other end's presence. The OpenFlow protocol runs on top of TCP. The specification does not enforce encryption of the communication channel between controller and switch (e.g. using TLS) [21], but encrypting it reduces the risk of man-in-the-middle (MITM) attacks [22] that could potentially give an attacker control of the network.

The responsibility for forwarding decisions is placed with the controller. A forwarding rule or 'flow' is uniquely identified by its match fields and priority. A packet is said to 'match' a flow when the packet's header fields conform to the values defined by the match fields. In case of a match, the instructions in the flow's instruction set are executed. Matching is not confined to one specific layer; example header fields include the source IP address (layer 3) and the source port (layer 4). The flow's priority enables differentiation amongst multiple matching flows. In case multiple matching flows have the same highest priority, behaviour is undefined [21].

Flows can be installed in two different ways: proactively and reactively.

Figure 2.4: The major components of an OpenFlow-enabled switch: a flow table and a secure channel over which the OpenFlow protocol (optionally secured with SSL/TLS) connects the switch to the controller.

In the proactive case, the controller instigates a controller-to-switch message that describes a flow change (addition, deletion, or modification), causing the switch to modify its flow table: an ordered collection of flows. A switch can have multiple flow tables. In the case of reactive flow setup, the switch pushes an asynchronous packet-in message to the controller on receipt of a packet that does not match any existing flow. Based on information from the incoming packet's header, the controller can decide to respond with a new flow that matches the packet. The switch adds the new flow to its flow table and re-evaluates the fate of the incoming packet.

Before any processing takes place, an incoming packet is assigned an empty action set. As it travels through the switch, matching flows either deposit actions into the packet's action set or choose to apply actions immediately; see Figure 2.5. In the latter case, a flow defines the Apply-Actions instruction, supplying an ordered action list. The actions in the action list are executed in the order specified. The Apply-Actions instruction is optional; not all OpenFlow implementations support it [21]. The Write-Actions instruction adds the specified actions to a packet's action set. After all flow tables have been traversed, the actions present in the packet's action set are executed in a predefined order. Sample actions include set and output. The set action enables modification of packet header fields; the output action directs the packet to an output port. We will be relying on the set action to facilitate packet redirection: modification of the destination MAC and IP address is required to successfully redirect a packet without touching the sender. The output action is always performed last. Multiple output actions may be supplied to output a packet multiple times. Note that duplication to different hosts will again require modification of the MAC and IP destination fields.

In the context of the SDP, the virtual destinations that CSP nodes direct their traffic at are non-existent IP addresses. An OpenFlow controller is responsible for setting up the mappings from virtual IP to physical IP address on the switch. Suppose a packet with virtual destination address IP_V is received by the SDP entry switch pictured in Figure 2.2. The switch consults its flow table, modifies the packet's destination IP to the physical address IP_P, modifies the destination MAC address to the corresponding MAC_P, and forwards the packet. If a hybrid OpenFlow switch were used, we could opt for traditional forwarding, where we rely on the switch's MAC address table to determine the output port corresponding to MAC_P. In that case we would have to make sure that the switch's MAC address table is properly populated in advance, e.g. by implementing ARP on the sending side. Otherwise the first packet for a destination MAC address Y will always cause a table miss, resulting in the switch forwarding the packet to all ports or dropping it. The other possibility is to explicitly define the output port in a flow. The choice of one approach over the other is constrained by whether or not the ability exists to properly populate the switch's MAC address table in advance, since flooding and loss must be avoided in our high-throughput environment. The switch used in our research is a hybrid switch, supporting both the operation of a normal switch and OpenFlow. A flow requests normal switching behaviour (referring to how a normal switch would match an output interface to a MAC address) by supplying an output action with a port argument of 'normal' (e.g. output:normal as opposed to output:27).

Figure 2.5: The path of a packet through an OpenFlow switch [21]: the packet arrives at an ingress port with an empty action set, traverses flow tables 0 up to n, and finally its accumulated action set is executed. Flows have access to the packet's header fields, the ingress port, and metadata exchanged from one flow table to the next.


For the redirection experiments we compared the performance of normal forwarding behaviour with explicit forwarding. The results were so contrasting that we decided not to look into the performance of normal forwarding for the duplication experiment (see Chapter 5).

With the planned experiments in mind, the Apply-Actions instruction seems the ideal candidate to guide our packet duplication efforts. The Write-Actions instruction is not suitable, because the output action is always performed last within an action set. Suppose we want to duplicate a packet headed for host X to hosts Y and Z. The packet would undergo the following processing steps: output to X, rewrite the IP/MAC destination to that of Y, output to Y, rewrite the IP/MAC destination to that of Z, output to Z. If these actions were added to an action set in this order, they would be reordered such that all IP/MAC address modification occurs before the output actions, rendering the modification actions useless. The Apply-Actions instruction, on the other hand, applies the actions in the supplied action list immediately and in the given order, and this is true even if the list includes output actions: "If the action list contains an output action, a copy of the packet is forwarded in its current state to the desired port." [21, p22]. If our switch does not support the Apply-Actions instruction (we found out it doesn't), OpenFlow offers another useful feature: groups and action buckets. 'Group' in this section refers to the OpenFlow all group type; for a description of the other group types available, please see [21, p17]. An OpenFlow group forms a packet processing pipeline composed of one or more action buckets. An action bucket contains a set of actions that are executed in the same order as actions in an action set. The Write-Actions instruction of a flow could include a group action that forwards a packet to a specific group. On arrival at the group, the packet is cloned for each bucket and the bucket's actions are applied [21, p17]. See Figure 2.6.
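To make the ordering problem concrete, the toy model below (plain Python, not OpenFlow code; the names and the packet representation are ours) contrasts executing the duplication steps as an ordered action list with the reordered action-set behaviour described above. The real OpenFlow action set is more restrictive still (it holds at most one action of each type), but the simplified view suffices to show why the rewrites are lost.

    # Toy model of the duplication steps described in the text:
    # output to B, rewrite MAC to C, output to C, rewrite MAC to D, output to D.
    actions = [("output", "B"), ("set_dst_mac", "C"), ("output", "C"),
               ("set_dst_mac", "D"), ("output", "D")]

    def apply_actions(packet, actions):
        """Apply-Actions semantics: execute the list in order, copying the packet on each output."""
        sent = []
        for kind, arg in actions:
            if kind == "set_dst_mac":
                packet = dict(packet, dst_mac=arg)
            elif kind == "output":
                sent.append((arg, packet["dst_mac"]))   # the copy leaves in its current state
        return sent

    def write_actions_reordered(packet, actions):
        """Action-set view used in the text: rewrites are applied first, outputs run last,
        so every emitted copy carries whatever MAC address was written last."""
        for kind, arg in actions:
            if kind == "set_dst_mac":
                packet = dict(packet, dst_mac=arg)
        return [(arg, packet["dst_mac"]) for kind, arg in actions if kind == "output"]

    pkt = {"dst_mac": "B"}
    print(apply_actions(pkt, actions))            # [('B', 'B'), ('C', 'C'), ('D', 'D')]
    print(write_actions_reordered(pkt, actions))  # [('B', 'D'), ('C', 'D'), ('D', 'D')]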

2.2.2 Floodlight

Floodlight is an open-source Java implementation of an OpenFlow controller. Floodlight introduces yet another abstraction: it exposes interaction with the switch through an HTTP API called the Static Flow Pusher API. Existing experience with Floodlight (through the UvA SNE Group) determined our choice for Floodlight, as well as the fact that it exposes an HTTP API. Other controllers, like Ryu [24], are available, but an exploration of the features and performance of various network controllers is outside the scope of this project.
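As an illustration of how a flow can be installed through this API, the sketch below (Python, standard library only) POSTs a JSON flow description to the controller. The endpoint path, the switch DPID, and the JSON field and action names are assumptions made for illustration; the authoritative syntax is the Floodlight Static Flow Pusher documentation, and the flows actually used in this work appear in Chapter 4.

    import json
    import urllib.request

    # Assumed Floodlight v1.x Static Flow Pusher endpoint on the controller host C4.
    PUSH_URL = "http://192.168.100.14:8080/wm/staticflowpusher/json"

    # Hypothetical flow mirroring Figure 4.1: match IPv4 traffic from 10.0.0.1 destined
    # for B's MAC address and output it on port 26. Field names are illustrative.
    flow = {
        "switch": "00:00:00:00:00:00:00:01",   # placeholder switch DPID
        "name": "flow-a-to-b",
        "priority": "65534",
        "eth_type": "0x0800",
        "ipv4_src": "10.0.0.1",
        "eth_dst": "<MAC B>",
        "active": "true",
        "actions": "output=26",
    }

    request = urllib.request.Request(
        PUSH_URL,
        data=json.dumps(flow).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(request).read().decode())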

Figure 2.6: Schematic overview of an OpenFlow group [23]. A group of type ALL consists of an ID, counters, and a list of action buckets, each holding its own actions. The ID uniquely identifies the group; the counters serve statistics purposes and are updated when packets get processed.


CHAPTER 3

Methodology

To examine the OpenFlow performance metrics, we built a small-scale SDP network featuring a hybrid OpenFlow switch, a host that could be seen as a correlator node, and several more receiving hosts that could be seen as Compute Island entities. UDP traffic is generated by the correlator node and forwarded to the simulated SDP nodes by the OpenFlow switch.

3.1 Network topology

A schematic overview of the network topology used throughout all experiments is given in Figure 3.1. Host A functions as the sending host, and, depending on the experiment, one or more of hosts B, C, and D receive packets from A. Host C4 runs the OpenFlow controller. Hosts are interconnected through 1 Gbps CAT 5E copper cables. Host A could be seen as a correlator node, while B, C, and D are Compute Island nodes. Host C1 is the host interacting with the Floodlight Static Flow Pusher API. The equipment used is part of the OpenFlow testbed of the SNE group's OpenLab environment [25]. All hosts are virtual machines.

Figure 3.1: The network topology used throughout the experiments. Host A (10.0.0.1) is the correlator node; hosts B, C, and D (each 10.0.0.2) are Compute Island nodes; all connect to the switch. C4 (192.168.100.14) runs the controller and C1 interacts with the Floodlight API. Addresses 192.168.200.21-24 and 192.168.200.14 also appear in the figure.


3.1.1 Switch

The hardware switch is a Pica8 P-3290. It runs Open vSwitch version 2.0.90 and supports OpenFlow versions 1.0 to 1.4. Hosts A-D are connected to switch ports 25-28. The switch is a hybrid OpenFlow switch: it supports both OpenFlow and normal layer 2 switching.

3.1.2 Controller

The controller node (C4 in Figure 3.1) runs Floodlight v1.2. Because Floodlight v1.2 has only experimental support for OpenFlow 1.4, and since the OpenFlow features we need are available in OpenFlow 1.3 as well, we force the controller not to advertise support for OpenFlow 1.4 by modifying the Floodlight configuration, resulting in OpenFlow 1.3 being negotiated and used between the switch and the controller. In the remainder of this chapter we occasionally refer to "the Floodlight client"; this is the application developed for our purposes that interacts with the Static Flow Pusher API exposed by the Floodlight controller to install flows on the switch.

3.2 Traffic redirection

As mentioned earlier, we are interested in the consequences of redirecting actively flowing traffic to a new destination. We will look at the time it takes for the first packet to arrive at the new destination, and the number of packets that are lost (if any) while setting up the path to the new destination.

Rerouting a packet to a new destination without tampering with the packet's source in principle requires a flow table entry on the switch that modifies both the destination IP and MAC addresses. As depicted in Figure 3.1, all receiving hosts have been assigned the same IP address (10.0.0.2). This is because of a major limitation of the switch: it performs IP address modification in software instead of in hardware, resulting in major packet loss. Fortunately, it does support MAC address modification in hardware, as concluded by [3]. To work around the IP modification limitation we therefore assign all receiving hosts the same IP address, eliminating the need to modify the destination IP address. Destination MAC address modification, which is performed in hardware, is then sufficient to reroute a packet to a different destination host.

3.2.1 Latency

Figure 3.2: A close-up of the setup used for measuring redirection latency. UDP traffic originates at node A and arrives at node B. C is our Floodlight API client; it talks HTTP to the Floodlight controller on C4, which in turn controls the switch over OpenFlow. The dashed arrow points at the destination of traffic from A after redirection.


We define latency as the time it takes from the moment the HTTP request with the new redirection flow leaves the Floodlight client until the first packet arrives at the new destination host. To measure the latency without the added complexity of synchronising clocks across multiple hosts, the Floodlight client runs on the machine that acts as the traffic's new destination. Traffic flows for 2 seconds from A to B, then the redirection flow is installed on the switch. The experiment stops as soon as the first packet arrives at C. C is the new destination and hosts a Floodlight client at the same time. See Figure 3.2.

Let $t_s$ be the moment the Floodlight client initiates the HTTP request carrying the static flow to route traffic from A to C, and $t_f$ the moment the first UDP packet from A arrives at C. Then the latency is defined as $t_l = t_f - t_s$. Following the path the initial HTTP API request takes, we can identify the presumably major components that make up the latency $t_l$, as shown in Figure 3.3. At each stop, representing a node on the network, internal processing takes place, which has not been accounted for in the figure. First there is the time it takes until the HTTP request reaches the Floodlight server. The server will do some internal processing, such as parsing the JSON body of the request and determining the action to take, after which it sends an OpenFlow message to the switch. The switch will then process the OpenFlow message and add a new entry to its flow table. The new flow won't be triggered until the next packet from A arrives. Once a packet does arrive from A, the switch will route the packet to the interface connected to C. When the first packet reaches C, the traffic has been rerouted successfully.
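The measurement itself boils down to two timestamps around a blocking HTTP call and a blocking UDP receive. The sketch below is illustrative only: push_redirection_flow() stands in for the Static Flow Pusher request sketched in Section 2.2.2, and the UDP port is a placeholder.

    import socket
    import time

    UDP_PORT = 4321                      # placeholder port the redirected stream is sent to

    def push_redirection_flow():
        """Placeholder for the HTTP POST that installs the redirection flow (Figure 4.2)."""
        pass

    # This script runs on C, the new destination, so no clock synchronisation is needed.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", UDP_PORT))

    t_s = time.time()                    # moment the redirection request leaves the client
    push_redirection_flow()              # blocking HTTP request to the controller
    sock.recvfrom(2048)                  # block until the first redirected packet arrives
    t_f = time.time()

    print("t_l = %.1f ms" % ((t_f - t_s) * 1000.0))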

Figure 3.3: Major latency components (not to scale): the HTTP request from the client on C to the Floodlight controller, the OpenFlow message from the controller to the switch (Open vSwitch), an idle period until the next UDP packet from A arrives at the switch, and finally the UDP delivery to C, spanning the interval from $t_s$ to $t_f$.

3.2.2 Packet loss

Figure 3.4: Setup used to measure the packet loss incurred when redirecting traffic from A, destined for B, to C. $\{a_0, \ldots, a_k\}$ is the sequence of packets leaving the application layer at A (note that at that point this sequence is still in order), $\{b_j\}$ the set of packets arriving at B (yielding $I_B$), and $\{c_l\}$ the set of packets arriving at C (yielding $I_C$).

Since there is no time component involved in measuring packet loss, we do not need to run the Floodlight client on host C again. Instead, the client will run on C1 while C remains the destination of rerouted traffic. The traffic again originates from A and is targeted at B. In Figure 3.4 we provide a schematic overview of the setup.

If we attach an incrementing identifier to each packet leaving A, we could compare the largest identifier received at B with the smallest identifier received at C to calculate the number of packets lost in between. It is not sufficient to simply take the last identifier received at B and compare this value with the first identifier received at C due to the nature of the UDP protocol: packets could arrive out-of-order. We let traffic flow for 2 seconds between A and B, then set up the redirection flow and let traffic flow for another 1 second.

The IP protocol already provides us with an identifier field, incremented for each packet leaving a host. The identifier header field is 16 bits wide and is intended to be used for fragmentation and defragmentation purposes. Work has been done to probe whether or not it is feasible to use the identification field for other purposes, such as identifying packet loss, duplication, and arrival order [26]. However, RFC 6864 explicitly forbids usage of the IP identification header for non-fragmentation purposes [27]. We will therefore not use this field and resort to an alternative identifier value that is carried in the application layer.

If a single 64-bit unsigned integer is used for the packet identifier, integer overflow will occur after $2^{64} - 1$ packets. For each run of the experiment we let traffic flow for 2 seconds from A to B, and for 1 second from A to C. It would require a throughput of at least

$$
\frac{1}{1 \times 10^{12}} \cdot \frac{1}{3} \cdot 1296 \cdot (2^{64} - 1) \approx 8.0 \times 10^{9} \ \text{TB/s}
$$

to run into integer overflow within this 3-second window. Since we are performing tests at a much lower throughput (15 MB/s), integer overflow will not occur more than once during a single test run. It is therefore safe to use a 64-bit unsigned integer to hold the packet identifiers, keeping in mind that, when comparing identifier values at B and C, integer overflow might have occurred once, given that traffic is continuously generated at A across all runs.

Given a sequence of packets $a_0, a_1, \ldots, a_{k-1}, a_k$ with $k \in \mathbb{N}$, we formally define the identifier $I(a_k)$ of packet $a_k$ as:

$$
I(a_k) = \begin{cases}
1 & \text{if } k = 0 \ \lor\ I(a_{k-1}) + 1 = 2^{64} - 1 \\
I(a_{k-1}) + 1 & \text{otherwise}
\end{cases}
$$

Let $b_0, b_1, \ldots, b_{j-1}, b_j$ be the sequence of packets that arrive at B, where $j \in \mathbb{N}$, $0 \le j \le k$. Then we define $I_B$, the highest sequence number received at B over a period of time, as

$$
I_B = f(I(b_j), I_B) =
\begin{cases}
I(b_j) & \text{if } j = 0 \\[4pt]
\begin{cases} I(b_j) & \text{if } I(b_j) - I_B < \varepsilon \\ I_B & \text{otherwise} \end{cases}
  & \text{if } j > 0 \text{ and } I(b_j) > I_B \\[4pt]
\begin{cases} I(b_j) & \text{if } I_B - I(b_j) > \varepsilon \\ I_B & \text{otherwise} \end{cases}
  & \text{if } j > 0 \text{ and } I(b_j) < I_B
\end{cases}
$$

where $\varepsilon = 10000$ forms a threshold for out-of-order packets. This can be simplified to:

$$
I_B =
\begin{cases}
I(b_j) & \text{if } j = 0 \\[4pt]
\begin{cases} I(b_j) & \text{if } h(I(b_j), I_B) \cdot |I(b_j) - I_B| < h(I(b_j), I_B) \cdot \varepsilon \\ I_B & \text{otherwise} \end{cases}
  & \text{if } j > 0
\end{cases}
$$

where $h(x, y) = \operatorname{sgn}(x - y)$.

Similarly, we define the sequence of packets that arrive at C as $c_0, c_1, \ldots, c_{l-1}, c_l$, where $l \in \mathbb{N}$, $0 \le l \le k$. Then $I_C$, the lowest sequence number received at C over a period of time, is defined as:

$$
I_C = g(I(c_l), I_C) =
\begin{cases}
I(c_l) & \text{if } l = 0 \\[4pt]
\begin{cases} I(c_l) & \text{if } I_C - I(c_l) < \varepsilon \ \land\ h(I_C, I(c_l)) = 1 \\ I_C & \text{otherwise} \end{cases}
  & \text{if } l > 0
\end{cases}
$$

Note that $g(I(c_l), I_C)$ is only valid for $I(c_l) < (I_B - \varepsilon + C + 2^{64} - 1) \bmod (2^{64} - 1)$. The upper bound on the running time of measuring $I_C$ is not very relevant in our low-throughput setup, and we only measure $I_C$ for 1 second, as mentioned earlier. The number of lost packets is $C = (I_C - I_B + 2^{64} - 1) \bmod (2^{64} - 1) - 1$. Note that $C$ has to be calculated in an environment that supports arbitrarily long integers, such as Python, since 64 bits are not sufficient when $2^{64} - 1$ is added to another integer.
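The bookkeeping above translates almost directly into code. The sketch below is ours (the real analyser is written in C and the helper names are invented); it uses the same threshold $\varepsilon$ and relies on Python's arbitrary-precision integers for the final modular subtraction.

    MOD = 2**64 - 1        # identifiers wrap after 2^64 - 1 packets
    EPSILON = 10000        # out-of-order threshold from Section 3.2.2

    def update_highest(i_b, new):
        """Update I_B, the highest identifier seen at B, tolerating reordering and wrap-around."""
        if i_b is None:
            return new
        if new > i_b:
            return new if new - i_b < EPSILON else i_b      # normal forward progress
        return new if i_b - new > EPSILON else i_b          # far smaller value: wrapped around

    def update_lowest(i_c, new):
        """Update I_C, the lowest identifier seen at C."""
        if i_c is None:
            return new
        return new if (i_c > new and i_c - new < EPSILON) else i_c

    def packets_lost(i_b, i_c):
        """Number of packets lost between the last one seen at B and the first one seen at C."""
        return (i_c - i_b + MOD) % MOD - 1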

3.3 Traffic duplication

Another major area of interest is the impact of duplication flows. We will look at a single flow that takes traffic from host A, outputs it to the original destination (B), and duplicates each packet to both C and D. Duplication to hosts other than the original destination requires destination MAC address modification.

One way we could achieve this is through an OpenFlow group. Each bucket in the group receives a different copy of the incoming packet. The first bucket simply outputs the packet to B, the second modifies the destination MAC address to the MAC of C and outputs to C, and the third modifies the destination MAC address to the MAC of D and outputs to D. Unfortunately, in practice this did not work. It seems as if the switch applies only one modification to all packets directed at the group before they are transmitted: all packets arriving at B, C, and D had the same, seemingly randomly chosen, destination MAC address (either B's, C's, or D's). To work around this limitation we do not use groups and instead install a flow that modifies the destination MAC address of the packets from A to the broadcast address FF:FF:FF:FF:FF:FF. The flow subsequently outputs the packet three times: once on the interface leading to B, once on the interface leading to C, and once on the interface leading to D. Because the packet carries the MAC broadcast address when arriving at the receiving hosts, instead of B's, C's, or D's own MAC address, it won't be rejected by the receiving network stacks. Note that we now need to use explicit forwarding, as normal forwarding would emit the packet to all ports on the switch.

3.3.1 Ordering

Sequence numbers are attached to the packets leaving A in the same manner as in Section 3.2.2. The Floodlight client runs on C1. We are interested in the number of packets that are received out of order at B while a duplication flow is active. The resulting measurements will be compared against measurements obtained while a single flow forwarding traffic from A to B is active. As opposed to previous experiments, the generator is not continuously running across individual measurements. The out-of-order count at B is obtained in one go with the packet loss count. Section 3.3.2 explains why obtaining the packet loss count requires the generator to be restarted between individual measurements. A restart means that sequence numbers start counting from 1 again, thus making the overflow check from previous sections obsolete.

At B over a period of 2 seconds we compare the previously received sequence number to the current one. If the current sequence number is smaller, the out-of-order-counter is incremented by one. Formally:


$$
S_B = m(S_B, I(b_j), I(b_{j+1})) = \begin{cases}
S_B + 1 & \text{if } I(b_j) > I(b_{j+1}) \\
S_B & \text{otherwise}
\end{cases}
$$

The likelihood of the sequence number overflowing over a period of 2 seconds is zero, as discussed earlier. $S_B$ is initialised to 0.
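As a sketch (again ours; the actual counting happens inside the C analyser), the counter amounts to:

    def count_out_of_order(identifiers):
        """Count packets whose identifier is smaller than that of the previously received packet."""
        s_b = 0
        prev = None
        for ident in identifiers:
            if prev is not None and ident < prev:
                s_b += 1
            prev = ident
        return s_b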

3.3.2 Packet loss

The most accurate way of measuring packet loss would be to subtract the number of packets received by B from the number of packets sent by A over the same period of time. In the experiments from Section 3.2, the generator program was never interrupted, which would make it extremely hard to accurately determine how many packets were sent over a certain period of time. Initially, our method of counting the number of lost packets relied solely on the traffic reaching B: we kept track of the lowest and highest sequence number reaching B over 2 seconds. These values form a lower and upper bound, respectively (taking into account at most one overflow), of the full range of sequence numbers that would have reached B in case of zero loss. We subtract the number of packets that actually reached B from this theoretical maximum to obtain the number of packets lost. It is not hard to see that this method is not fool-proof: if the only packets that reach B are those with sequence numbers $a$ and $a + 1$, while all packets with sequence numbers ranging from $a + 2$ up to $a + 2 + \varepsilon$ did not make it, the method would yield a packet loss of 0%, whereas the actual loss could be significantly larger (depending on the value of $\varepsilon$).

Therefore we introduce the alternative setup pictured in Figure 3.5. The generator is no longer reckless in nature, but instead diligently controlled by a tiny Python process. The Python process exposes two functions through an HTTP API: startup and termination of the generator. Both are implemented by execution of bash commands. On receiving a termination HTTP request, the Python process sends the generator the SIGUSR1 signal. Upon receiving the SIGUSR1 signal and after having finalised a packet transmission (we don’t want to interrupt the generator if it is in the middle of a transmission), the generator writes the total number of packets sent since startup to a memory region shared between itself and the Python process, and exits gracefully. The Python process waits until the generator has exited, reads the value from shared memory and serves the result in the body of the HTTP response. Meanwhile host B keeps track of the number of packets received, and after detecting traffic starvation (caused by termination of the generator) outputs the resulting count to a file. The generator runs for a period of 2 seconds. Inter-process communication seems a cumbersome approach, but it does ensure consistency: we can reuse the packet generation logic used throughout all experiments, without porting it to Python or writing an HTTP server in C.
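A minimal sketch of such a control process is given below. It is not the thesis implementation: the port, the URL paths, the command line, and the use of Python's multiprocessing.shared_memory (as a stand-in for the POSIX shared-memory routines that the real setup reaches through Python's C API, see Section 4.2) are all assumptions made purely for illustration.

    import signal
    import struct
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from multiprocessing import shared_memory

    GENERATOR_CMD = ["./generator", "--dest", "10.0.0.2"]    # placeholder command line
    SHM_NAME = "gen_packet_count"                            # placeholder shared-memory name

    class ControlHandler(BaseHTTPRequestHandler):
        generator = None
        shm = None

        def do_POST(self):
            if self.path == "/start":
                # Shared region the generator is assumed to publish its packet count into.
                ControlHandler.shm = shared_memory.SharedMemory(SHM_NAME, create=True, size=8)
                ControlHandler.generator = subprocess.Popen(GENERATOR_CMD)
                self._reply(200, b"started")
            elif self.path == "/stop":
                # Ask the generator to finish its current packet, write its count, and exit.
                ControlHandler.generator.send_signal(signal.SIGUSR1)
                ControlHandler.generator.wait()
                (count,) = struct.unpack("<Q", bytes(ControlHandler.shm.buf[:8]))
                ControlHandler.shm.close()
                ControlHandler.shm.unlink()
                self._reply(200, str(count).encode())
            else:
                self._reply(404, b"unknown endpoint")

        def _reply(self, code, body):
            self.send_response(code)
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8000), ControlHandler).serve_forever()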


Figure 3.5: Schematic overview of the setup used to measure packet loss at B while a duplication flow is active on the switch. The generator (10.0.0.1) and the Python control server (192.168.200.21) on host A communicate through a shared memory region; start and stop requests arrive over HTTP from C1, the controller on C4 installs flows on the switch over OpenFlow, and the generator's UDP traffic reaches B and is duplicated to C and D.


CHAPTER 4

Implementation

There are two components to our implementation: a suite of C programs generating and analysing traffic, and a Python program interacting with the Floodlight API.

4.1 Toolkit

The Netherlands Institute for Radio Astronomy (ASTRON) has produced a suite of C programs to simulate and analyse data streams from a LOFAR antenna station. We will be using a modified version of two of these programs in our experiments: the generator program and the analyser program.

An explanation of the specific workings of these programs is out of scope for this project, since they are heavily modelled around the functionality of LOFAR. Parameters allow for tuning of the mimicked output, such as the number of subbands, beamlets, and samples per packet. The interested reader is referred to [28] for a detailed explanation of these concepts. Together the parameters determine the size of the packets transmitted by the generator program. Their individual values are not relevant to us - the resulting packet payload is.

The reasoning behind using the LOFAR software to model an SKA data stream is two-fold. In the first place, we have extensive experience with the programs, due to ASTRON's involvement in this project. Additionally, the generator provides a data stream with characteristics similar to the stream the SDP nodes will receive in practice. It generates an uninterrupted stream of UDP packets. Across all experiments, we have chosen to use UDP packets with a payload of 1296 bytes. Packets are delivered to the network at a rate of $195312.5 / 16 \approx 12207$ packets per second: 195312.5 is the number of samples per second, and each packet carries 16 samples. These are program defaults, and in the context of this project they are more or less arbitrary values that work well. This puts a decent amount of traffic on the network and allows us to run multiple streams concurrently without exceeding the capacity of the underlying links (1 Gbps). Line rate (± 115 MB/s) is approximated with seven concurrent streams. In the deployed SKA, the rate at which Compute Island nodes will receive data is much higher than this. This project, however, does not focus on approximating the SDP scenario as closely as possible; we are merely interested in exploring the required OpenFlow functionality, and 1 Gbit traffic therefore still yields valuable results. The size of the payload has been chosen such that it stays below the MTU of the links (1500 bytes). A payload larger than 1500 bytes minus the IP and UDP headers would require fragmentation of the packets, introducing a performance hit and obscuring the performance metrics we intend to obtain. The deployed SDP's network infrastructure will host packets larger than 1500 bytes and thus will be equipped with support for jumbo frames to prevent fragmentation.
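As a rough sanity check on these numbers (counting payload bytes only and ignoring protocol headers), the per-stream bandwidth and the aggregate for seven streams work out to approximately

$$
12207 \ \tfrac{\text{packets}}{\text{s}} \times 1296 \ \text{B} \approx 15.8 \ \text{MB/s}, \qquad 7 \times 15.8 \ \text{MB/s} \approx 111 \ \text{MB/s} \approx 0.89 \ \text{Gbit/s},
$$

which matches the quoted per-stream rate of about 15 MB/s and the approximation of the 1 Gbps line rate with seven concurrent streams.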


A sequence number is attached to each packet; Section 3.2.2 explains how they are generated. The analyser program provides us with an easy hook to operate on incoming packets. It opens a socket and waits for packets delivered by the generator program. Depending on the experiment, the analyser program keeps track of certain sequence numbers or counts the number of packets received. Additionally, the analyser has been extended to include a means of detecting traffic starvation on the listening socket: if the elapsed time between two subsequent packet receptions equals or exceeds 1.8 seconds, the pending metrics (e.g. the count of received packets) are appended to a file and reset before the new packet is analysed. More on the reasoning behind this follows in Section 4.2.

4.2 Generating Experiments

Experiments are initiated and terminated by a Python program. In that sense the program 'controls' the experiments and could be labelled a controller, but this terminology is easily confused with the SDN network controller. In earlier sections it has been referred to as the Floodlight client, since its feature set includes interacting with the Floodlight Static Flow Pusher API. In this section we will call it the 'timer'. The program's main purpose is to time experiments and modify traffic flows on the switch. Depending on the experiment, it has other tasks as well. In Section 3.2.1, the timer acts as the new destination of redirected traffic. It has the additional task of registering the time between the moment it sends the HTTP request with the redirection flow to the Floodlight Static Flow Pusher API and the moment it receives the first redirected UDP packet. The result is appended to a file.

There are some tasks that the timer has in common across all experiments. At the start of each experiment, a (blocking) HTTP request is made to the Static Flow Pusher API to clear all flows on the switch, since previous flows left active on the switch could interfere with our experiment. Secondly, we introduce the concept of experiment 'runs'. An experiment, e.g. the one described in Section 3.2.2, is performed multiple times in successive runs. A single run is composed of multiple steps: the first steps configure the environment, a subsequent step installs the flow of interest on the switch through the Static Flow Pusher API, and finally the metric of interest is measured. For each experiment we have obtained 1000 results to reduce any bias of individual runs. Note that 1000 results does not necessarily correspond to 1000 runs. In Sections 3.2.2 and 3.3, at the end of each run, right after clearing all flows on the switch, the timer introduces an idle period of 2 seconds. During this period the switch discards all arriving traffic, and traffic 'starves' at the receiving node. Once the next run is entered and traffic reaches the receiver again, the receiver notices that the elapsed time between the reception of the first packet of the new run and the last packet of the previous run exceeds 2 seconds, causing it to flush the collected metrics from the previous run to disk. In that case 1001 runs are required to obtain 1000 results. An example of 'configuring the environment' is installing a flow that connects A to B in the case of the redirection experiment (traffic that flows from A to B before being redirected to C makes for a more realistic scenario). For the two redirection experiments in Section 3.2, the generator program runs uninterrupted across all runs. For the two duplication experiments in Section 3.3, the timer has been assigned the additional task of launching and terminating the generator program at the start and end of each run, respectively. The exact implementation of the timer for each experiment is available upon request.
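The run structure described above could be expressed roughly as follows. The helper names and flow placeholders are ours, standing in for calls into the Static Flow Pusher API, and the timings mirror the redirection packet-loss experiment:

    import time

    # Placeholder hooks into the Static Flow Pusher API (see the sketch in Section 2.2.2).
    def clear_flows():
        pass

    def push_flow(flow):
        pass

    BASE_FLOW_A_TO_B = {}        # would hold the flow of Figure 4.1
    REDIRECT_FLOW_B_TO_C = {}    # would hold the flow of Figure 4.2

    N_RUNS = 1001                # 1001 runs yield 1000 usable results (see the text above)

    def redirection_run():
        clear_flows()                     # remove flows left over from earlier experiments
        push_flow(BASE_FLOW_A_TO_B)       # configure the environment: traffic flows from A to B
        time.sleep(2)                     # let traffic flow undisturbed for 2 seconds
        push_flow(REDIRECT_FLOW_B_TO_C)   # install the flow of interest
        time.sleep(1)                     # give the receiver 1 second of redirected traffic
        clear_flows()
        time.sleep(2)                     # idle period: traffic starves, receiver flushes metrics

    if __name__ == "__main__":
        for _ in range(N_RUNS):
            redirection_run()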

For the experiment in Section 3.3.2, inter-process communication is required. The simple Python server running on the same host as the generator needs to obtain, from the generator process, a counter holding the number of packets sent (see Figure 3.5). To facilitate this we resort to the POSIX Realtime Extension, which allows us to read from and write to a memory region shared between processes. Both the Python server and the generator program call into the original C library that implements the Realtime Extension, to ensure consistent behaviour. Calling into C code from within a Python program is achieved by using Python's C API.


4.3 Flows

The different flows used throughout the experiments are presented here, formulated in the syntax the Open vSwitch command-line tool 'ovs-ofctl' expects. Note that for the redirection experiment the 'actions' argument varies between 'output:<port number>' and 'normal'. The flow connecting A to B in Section 3.2 is provided in Figure 4.1.

priority=65534,dl_type=0x0800,nw_src=10.0.0.1,dl_dst=<MAC B>,actions=output:26

Figure 4.1: The flow connecting A to B.

The comma-separated list up to the 'actions' keyword filters the traffic the flow is applied to. A dl_type of 0x0800 matches all IPv4 traffic, nw_src matches the source IP address, and dl_dst the destination MAC address. The actual MAC addresses have been omitted.

The redirection flow is pictured in Figure 4.2, the duplication flow in Figure 4.3. In the experiments of Sections 3.2 and 3.3, the redirection and duplication flows, respectively, are installed while the ordinary flow (Figure 4.1) is still active; the higher priority of the new flow ensures it takes precedence over the ordinary one.

priority=65535,dl_type=0x0800,nw_src=10.0.0.1,dl_dst=<MAC B>,actions=set_field:<MAC C>->eth_dst,output:27

Figure 4.2: The flow redirecting traffic from A destined for B to C.

priority=65535,dl_type=0x0800,nw_src=10.0.0.1,dl_dst=<MAC B>,actions=set_field:FF:FF:FF:FF:FF:FF->eth_dst,output:26,output:27,output:28

Figure 4.3: The flow forwarding traffic from A to B, C and D.


CHAPTER 5

Results

5.1 Traffic redirection

5.1.1 Latency

In Figure 5.1a we present the latency as measured in two different configurations: 1000 measurements were taken with the 'normal' output port specified in both flows, and another 1000 measurements were taken with explicit output port actions specified in both flows. The variable n denotes the number of results obtained in each configuration (1000 in each of the aforementioned cases).

The latency incurred with normal forwarding behaviour is highly variable, ranging from 12.9 milliseconds up to 225.6 ms (Figure 5.1a). Explicit forwarding comes with a more predictable latency: 76% of the results are around 6.8 milliseconds, and the entire range covers 6.8 to 46 ms. This range hardly changes when we increase the throughput to 7 concurrent streams: the results are between 7.2 ms and 41.8 ms, but a bit more spread out, with the largest bin (58%) starting at 8.7 ms (between 8.7 ms and 10.3 ms) (Figure 5.1b).

A closer look at the switch during normal forwarding showed that CPU usage consistently hovered around 99%. A restart of the switch brought CPU usage back to normal, but also inadvertently caused our flows to be deleted. Re-adding the flow resulted in CPU usage peaking again, which indicates that the flows themselves introduce the high CPU usage. On further inspection, it turned out that providing the 'normal' output port leads to increased demand on the CPU; an event called 'poll_fd_wait' drained CPU resources.

5.1.2 Packet loss

This time we start off by specifying explicit output port actions instead of requesting normal forwarding behaviour. The results are shown in Figure 5.2a. Loss in most cases is concentrated in the 0-10 packet range, but there are significant outliers as well, with a maximum of 55 lost packets. At a rate of ±12207 packets per second this is 0.4%. Figure 5.2b shows the loss for each stream in the case of 7 concurrent streams. Each color represents the packet loss histogram for one of the seven streams; the solid color, turquoise, is at the intersection of all seven histograms. An approximately equal number of packets is lost by each individual stream. The absolute maximum for a single stream increases to 74 lost packets ($74 / 12207 \cdot 100 \approx 0.6\%$).

The methodology from Section 3.2.2 did not work for the flows that have their explicit output port replaced with the 'normal' output port. It turned out that for all measurements, $I_C < I_B$ by an amount ranging from 53 to 2624.


Figure 5.1: Redirection latency for n = 1000 and different configurations. A single bin covers 1.5 milliseconds. (a) Normal forwarding (grey) vs explicit forwarding (yellow). (b) Explicit output port action, 7 concurrent streams.

This suggests that after requesting the traffic redirection flow to be installed, some packets are still being forwarded to B, whilst others are buffered, have their header modified, and are subsequently transmitted over the interface that leads to C. To verify this assumption, we run a small-scale one-time experiment that records all sequence numbers received at B and C, and plot the values over time. The exact timestamps have been omitted, and the space between the last sequence number at B and the first at C in Figure 5.3a is not to scale. We can see that our assumption is incorrect. Out of all packets arriving at C (the purple dots), just one lies completely outside the sequence. This is a duplicated packet: the raw data shows that a packet with the same sequence number as the outlier was received at B earlier. Another interesting conclusion can be drawn from the graph: packet loss is much larger than in Figure 5.2a.

As we saw earlier, I_C < I_B in all 1000 cases for n = 1000. Therefore, there has to be at least one such outlier every time the experiment is run. The experiment was run a second time for n = 1000. This time, not the lowest value of I at C but the second lowest was assigned to I_C. In four instances I_C < I_B, while in all other instances we obtained the expected result of I_C > I_B. Interestingly, the number of duplicated packets does not seem to be predictable, judging from the 4 cases where there was more than 1 duplicated packet. Loss ranges from 21 all the way up to 2558 packets (Figure 5.3b). Once again, normal forwarding performs worse than explicit forwarding (Figure 5.2a).
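The quantities above can be derived from the recorded sequence numbers with a few set operations. The sketch below is a minimal illustration, assuming one sequence number per line in the capture files and a known number of transmitted packets; it is not the exact analysis script from our toolkit.

    # Hedged sketch: derive loss, duplicates and the switch-over point from the
    # sequence numbers recorded at B and C during one redirection experiment.
    # The input format (one integer sequence number per line) is an assumption.

    def load(path):
        with open(path) as f:
            return [int(line) for line in f if line.strip()]

    def analyse(seq_b, seq_c, total_sent):
        i_b = max(seq_b)                        # I_B: highest sequence number seen at B
        i_c = min(seq_c)                        # I_C: lowest sequence number seen at C
        duplicates = set(seq_b) & set(seq_c)    # packets delivered to both hosts
        received = len(set(seq_b) | set(seq_c))
        return {
            "I_B": i_b,
            "I_C": i_c,
            "duplicates": len(duplicates),
            "loss": total_sent - received,
            "clean_cut": i_c > i_b,             # expected outcome: I_C > I_B
        }

    if __name__ == "__main__":
        # total_sent is a placeholder here; in practice it is reported by the sender.
        print(analyse(load("seq_b.txt"), load("seq_c.txt"), total_sent=24414))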


(a) Single stream.

(b) Seven concurrent streams. Each color represents the histogram for one of the seven streams.
Figure 5.2: Packet loss during redirection for a single data stream vs 7 concurrent streams, n = 1000 and explicit forwarding. A single bin covers 1 packet.


(a) A close-up of sequence numbers received at B (blue) and C (purple). The distance between the arrival time of the last packet at B and the arrival time of the first packet received at C is not to scale.

(b) n = 996. In 4 cases, more than one duplication occurred. Bin width is 10 packets.
Figure 5.3: Packet loss during redirection for normal forwarding.


5.2 Traffic duplication

5.2.1 Ordering

In order to draw a meaningful conclusion from the measurements at B, we inspected two different scenarios. In the first scenario, the only flow active on the switch is a basic flow that simply connects A to B. We compare the results obtained with a flow that not only connects A to B, but also duplicates traffic to C and D. The former scenario acts as a reference frame; due to the unreliable nature of UDP, such a reference frame helps put the obtained results into perspective. Both cases are displayed in Figure 5.4.
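For illustration, the out-of-order percentage plotted for B can be computed from the received sequence numbers along the following lines. Counting every packet whose sequence number is lower than the highest value seen so far is an assumption made for this sketch; the metric in our toolkit may be defined slightly differently.

    # Hedged sketch: count packets that arrive with a sequence number lower than
    # the highest one seen so far, and express this as a percentage of all
    # received packets. This definition of "out of order" is an assumption.

    def out_of_order_percentage(sequence_numbers):
        highest = -1
        out_of_order = 0
        for seq in sequence_numbers:
            if seq < highest:
                out_of_order += 1
            else:
                highest = seq
        if not sequence_numbers:
            return 0.0
        return 100.0 * out_of_order / len(sequence_numbers)

    # Example: one late packet out of five received.
    print(out_of_order_percentage([1, 2, 4, 3, 5]))   # prints 20.0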

Next, we set up the same cases, but with an increase in traffic. Instead of initiating a single 15 MB/s stream from A, we launch 7 concurrent streams of equal bandwidth, causing the total throughput to approximate line rate. The results are displayed in Figure 5.5.

5.2.2 Packet loss

In a similar fashion to Section 5.2.1, we compare a reference frame against measurements obtained with the duplication flow active. See Figure 5.6. The same experiments are again conducted with 7 streams, as pictured in Figure 5.7. The results are remarkable. While we have not seen a significant impact of the duplication flow on the ordering or loss at B for a single stream, with 7 concurrent streams each stream incurs around 5-6% packet loss at B. There is no immediate explanation for the outliers between 40 and 50%. Out of the 7000 individual stream measurements (7 streams times 1000 runs), packet loss at B for a single 15 MB/s stream was greater than or equal to 20% in 63 cases. In all these cases the number of sent packets is the expected number, whereas the number of received packets is lower.

The sudden performance drop going from 1 to 7 streams could hypothetically be explained by the switch hitting a capacity limit in the ASIC's pipeline processing. To verify whether this is the case, we return to the single stream scenario and redo the experiments with gradually more streams. Results are displayed in Figures 5.8 to 5.12. Clearly, there is no sudden increase in packet loss that would indicate a hardware limitation. Instead, packet loss gradually increases with higher throughput. Starting at 6 concurrent streams, there is not a single case anymore where packet loss at B is zero, whereas in the reference frame obtained with 7 streams (Figure 5.7a) there are still cases of zero packet loss.
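A rough sketch of how such a gradually increasing load could be generated is given below: each stream sends sequence-numbered UDP datagrams at approximately 15 MB/s. The destination address, ports, packet size and pacing are placeholders, and the actual traffic generator used in our experiments may differ.

    # Hedged sketch: launch N concurrent ~15 MB/s UDP streams of sequence-numbered
    # packets towards B. Addresses, ports, packet size and pacing are placeholders.
    import socket
    import struct
    import threading
    import time

    DEST = ("10.0.0.2", 5000)   # placeholder address of host B
    PAYLOAD_SIZE = 1229         # bytes; roughly 15 MB/s at ~12200 packets per second
    RATE_PPS = 12200
    DURATION = 2.0              # seconds of traffic per measurement

    def stream(stream_id):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        interval = 1.0 / RATE_PPS
        deadline = time.time() + DURATION
        seq = 0
        while time.time() < deadline:
            # 4-byte stream id + 4-byte sequence number, padded to PAYLOAD_SIZE.
            payload = struct.pack("!II", stream_id, seq).ljust(PAYLOAD_SIZE, b"\0")
            sock.sendto(payload, (DEST[0], DEST[1] + stream_id))
            seq += 1
            time.sleep(interval)   # coarse pacing; a real generator would be stricter

    def run(n_streams):
        threads = [threading.Thread(target=stream, args=(i,)) for i in range(n_streams)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    if __name__ == "__main__":
        run(7)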


(a) Flow connecting A to B.

(b) Flow duplicating to C and D.

Figure 5.4: Percentage of packets that arrived out of order at B for n = 1000, with different flows (using explicit output port actions) active. Bin width is 0.001% (approximately 25 packets for 2 s of traffic).


(a) Flow connecting A to B.

(b) Flow duplicating to C and D.

Figure 5.5: Percentage of packets that arrived out of order at B for n = 1000, with different flows (using explicit output port actions) active. Each color represents the histogram of one of 7 concurrent streams of traffic. Bin width is 0.001%.


(a) Flow connecting A to B.

(b) Flow duplicating to C and D.

Figure 5.6: Packet loss at B for n = 1000, with different flows (using explicit output port actions) active. Bin width is 0.05% (approximately 1250 packets).


(a) Flow connecting A to B.

(b) Flow duplicating to C and D.

Figure 5.7: Packet loss at B with different flows (using explicit output port actions) active. Each color represents the histogram of one of 7 concurrent streams of traffic; each histogram has 1000 data points.


Figure 5.8: 2 concurrent streams


Figure 5.10: 4 concurrent streams


CHAPTER 6

Conclusions

In this work we have examined the performance of traffic redirection and duplication achieved using OpenFlow and the Pica8 P-3290 switch. We started off comparing normal forwarding with explicit forwarding, but soon stopped pursuing normal forwarding due to the significant difference in performance: normal forwarding consumes considerably more CPU resources on the switch than explicit forwarding, resulting in significantly worse performance.

It takes at most 42 ms for a redirection flow to take effect at approximately line rate. Note that this includes overhead on the switch's part (e.g. the OpenFlow message exchange), overhead on Floodlight's part, and the physical link latency from source to destination for an individual packet. Whether the traffic consists of a single 15 MB/s stream or approaches line rate does not make a significant difference to this latency. Loss, on the other hand, is approximately equal for each of the seven streams (at most 74 packets per stream) compared to a single data stream, suggesting that total loss scales roughly linearly with throughput.

The impact of traffic duplication at the original destination is negligible in terms of packets arriving out of order. Packet loss, on the other hand, grows towards 10%, with a maximum of 48%.

Table 6.1 summarises the OpenFlow features examined across all experiments, including those that lack full support in our setup. Destination MAC address modification (the mod_dl_dst action) is fully supported in hardware, and has been essential to accommodate packet redirection. Destination IP address modification (the mod_nw_dst action) is not supported in hardware. The switch instead forwards these packets from the ASIC to the CPU, where the modification is performed in software, introducing a major performance hit. Support at line rate is essential for successful application in the SDP. Lastly, neither the Apply-Actions instruction nor groups are fully supported, forcing us to resort to the broadcast address workaround. The Apply-Actions instruction always applied output actions last (essentially giving us Write-Actions behaviour), despite the OpenFlow specification stating that the actions shall be performed in the order specified. Groups behave in a similar manner: there were no signs of each bucket operating on its own copy of the packet; instead, all modifications appeared to be applied to the same packet before the output actions were performed. Either the Apply-Actions instruction or groups are essential to apply different modifications to a packet and output the packet in between those modifications.

Feature                                  Support
Destination MAC address modification     Supported in hardware.
Destination IP address modification      Supported in software, not in hardware.
Apply-Actions instruction                Not fully supported.
Groups                                   Not fully supported.

Table 6.1: Support for the examined OpenFlow features on the switch used in our test setup.


CHAPTER 7

Discussion

The key part of this research involved uncovering the OpenFlow functionality required for the SDP, and the level of support for these features in current hardware. It is now clear what functionality is required, and that the switching equipment needs to support this functionality in its ASIC; otherwise only a fraction of the possible throughput is achieved.

We have seen that there is a significant difference in performance between explicit forwarding and normal switching behaviour for the hybrid OpenFlow switch used. Which behaviour to implement depends on the resources of the correlator nodes. For normal forwarding, special care has to be taken to ensure the switch learns MAC addresses it has not seen before, in order to prevent flooding, e.g. by implementing ARP on the sending side or by pre-configuring the switch's MAC address table. Pre-configuration would again introduce a static component. Explicit forwarding, however, performs better according to our results, and is therefore the recommended approach.
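As an illustration of the first option, a sending host could announce its own IP/MAC binding with a gratuitous ARP before the data stream starts, so that a switch in normal forwarding mode learns the address and does not flood. The sketch below uses Scapy; the addresses and interface name are placeholders rather than values from our setup.

    # Hedged sketch: announce the sender's own IP/MAC binding with a gratuitous
    # ARP reply, so that a learning switch knows the MAC before data flows.
    # Addresses and interface name are placeholders.
    from scapy.all import ARP, Ether, sendp

    MY_IP = "10.0.0.1"
    MY_MAC = "00:aa:bb:cc:dd:ee"
    IFACE = "eth0"

    gratuitous_arp = Ether(src=MY_MAC, dst="ff:ff:ff:ff:ff:ff") / ARP(
        op=2,                              # ARP reply
        hwsrc=MY_MAC, psrc=MY_IP,
        hwdst="ff:ff:ff:ff:ff:ff", pdst=MY_IP,
    )
    sendp(gratuitous_arp, iface=IFACE, verbose=False)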

The trick with the broadcast address sufficed as a workaround in our setup for duplication to different hosts, and could potentially work in the SDP if explicit forwarding is used (normal forwarding would result in flooding). This is an especially interesting option if rewriting the destination IP/MAC addresses once turns out to be considerably faster than rewriting them multiple times. We therefore recommend comparing both approaches in a setup with a switch that supports groups or the Apply-Actions instruction. On a higher level, we consider it useful to expand experimentation to state-of-the-art switching equipment that does support all required OpenFlow features natively. At this point we do not see any incentive for assigning all Compute Islands the same IP address, as we expect there to be switches on the market that support IP destination modification in hardware. It might, however, be a viable option if both IP and MAC modification are significantly slower than destination MAC address modification alone.

Aside from a more powerful switch, we recommend extending the test environment to a more realistic scenario. The SDP will employ IPv6 rather than IPv4, as well as jumbo frames. A setup with higher throughput and jumbo frame support might reveal unexpected complications, and the same may hold for IPv6 packets, which have a larger header.

Finally, the switch lacks clear documentation on which OpenFlow features are supported and which are not. It is therefore recommended that any switching equipment considered for the SDP is thoroughly examined for usability first, to prevent surprises at a later stage.


