

Bachelor Informatica

Performance Analysis of Virtualized Video Streaming Service

Mark Croes

Student number: 10609059

June 8, 2017

Supervisors: Dirk Griffioen (Unified Streaming), Rufael Mukeria (Unified Streaming) & Arie Taal (UvA)

Informatica, Universiteit van Amsterdam


Abstract

Video streaming has been on the rise for years and is responsible for the majority of all consumer Internet traffic. This means video streaming services have to scale their infrastructure to meet the resource demands of an increasing user base. Due to its flexibility, virtualized video streaming in the cloud has emerged as a solution. However, this solution impacts the performance of video streaming. This paper aims to analyse the performance impact of virtualizing video streaming by looking at throughput and latency. For this purpose, a dedicated experimental testbed is created in which a load is generated against a video streaming service. The service is run on two types of virtualization, a container and a virtual machine, and on a bare metal system without any virtualization, to see how it performs under different circumstances. The container proves to reach performance similar to the bare metal system, whilst the virtual machine produces lower throughput and higher latency. Additionally, this paper studies the performance of several virtual instances of Amazon's cloud service, Amazon Web Services. Instances with more hardware resources were able to produce higher throughput with lower latency.


Contents

1 Introduction
2 Technical background
  2.1 Video Streaming Technology
    2.1.1 Adobe HTTP Dynamic Streaming
    2.1.2 Microsoft Smooth Streaming
    2.1.3 HTTP Live Streaming
    2.1.4 MPEG-DASH
    2.1.5 Video Streaming Performance
  2.2 Virtualization
    2.2.1 Hypervisor-based Virtualization
    2.2.2 Operating-system-level Virtualization
    2.2.3 Virtual Machines vs Containers
    2.2.4 Cloud Computing
3 Related Work
4 Implementation of Experimental Testbed
  4.1 Video Streaming Software
  4.2 Virtual Machine
  4.3 Container
  4.4 Load Generator
  4.5 Cloud setup
5 Dedicated Experimental Results
  5.1 Saturation
  5.2 Virtualization Experiment
    5.2.1 Bare Metal
    5.2.2 Virtual Machine
    5.2.3 Container
  5.3 Bit-rate Experiment
    5.3.1 Low Bit-rate
    5.3.2 High Bit-rate
6 Cloud Experimental Results
  6.1 Saturation
  6.2 Cloud Instances
  6.3 Adaptive Bit-rate Streaming Format
    6.3.1 MPEG-DASH
    6.3.2 HLS
  6.4 Bit-rate
    6.4.1 Low Bit-rate
    6.4.2 High Bit-rate
7 Conclusions
  7.1 Discussion
    7.1.1 Suggestions for Findings
    7.1.2 Future Work


CHAPTER 1

Introduction

Video streaming is immensely popular and is expected to continue to rise in the coming years. Video streaming already accounted for 70% of all consumer Internet traffic in 2015 and is estimated to rise to 82% in 2020 [6]. Popular streaming sites such as Netflix and Twitch are amongst the top 50 most visited websites in the world. Considering the immense popularity of streaming and its expected growth, optimizing the performance of video streaming services can be very valuable. Video streaming is a service which requires a lot of resources in terms of storage, bandwidth and, in case the data files need to be transcoded, computation. Additionally, these resources are requested unevenly throughout the day. During peak hours a large number of users send requests to these streaming services whilst expecting a minimum level of streaming performance. Therefore, streaming services have to develop their infrastructure in such a way that its resources can serve users during peak hours. However, this means that during off-peak hours many of these resources will be redundant and thus wasted.

As a sophisticated solution to deal with the large scale of these resources, video streaming services make use of a cloud infrastructure. The attraction of cloud computing is that it provides these services with flexible resource allocation. Not only is it useful on a day-by-day basis in managing the peak hours, but it also allows services to more easily expand their infrastructure as their business and the industry grow. This is one of the reasons Netflix has turned to Amazon's cloud services, AWS, as a platform for their streaming service [4]. As mentioned earlier, as the industry continues to scale, video streaming services will want to scale horizontally or vertically. AWS provides an Infrastructure as a Service model which in turn ensures Netflix can focus on their software. Rather than continuing to work on scaling their infrastructure to meet the demand of an increasing user base, companies can now turn to virtualization and cloud services for help.

This research will study virtualized video streaming: what it is and how it is accomplished. It will also provide additional insight into virtualization overhead and its effect on streaming behavior. Firstly, this paper will give a technical background of video streaming and of different virtualization techniques such as containers and virtual machines. Furthermore, existing literature about this subject matter will be studied and briefly summarized. Afterwards, several experiments will be conducted, both in a dedicated experimental setup and in the cloud. Key performance indicators for large scale video streaming services are throughput and latency, which will be the center of these experiments. The experiments in the dedicated setup will analyse the performance of different virtualization techniques and the performance of different streaming protocols combined with different virtualization techniques. The other dedicated experiment will study video streams at different bit-rates.

Additionally, experiments in Amazon's cloud computing service will be set up. Amazon Web Services offers several types of instances, which this research will analyse in terms of performance and price per throughput. Other experiments in the cloud will analyse the behavior of various streaming protocols in high and low bit-rate streams. This research is useful for operators looking to deploy virtualized video either on their own premises or in the cloud with high performance and cost efficiency.


CHAPTER 2

Technical background

2.1 Video Streaming Technology

The product of a video streaming service has to meet certain minimum requirements. Users do not want long loading or buffering times, yet they still want videos in the highest quality. To meet these demands for many users at the same time, a streaming service needs high throughput to deliver all this data. Not only do these users have different devices with different capabilities, but their network connection changes continuously. In recent years, Adaptive Bit-rate Streaming (ABS) has been introduced as a solution to this heterogeneity.

Adaptive Bit-rate Streaming partitions the video into fragments. Each fragment, which is typically between 2 and 10 seconds long, can be requested at different encoding bit-rates by the video player [2]. As figure 2.1 shows, the client dynamically alters the bit-rate if there are any changes to its bandwidth and the viewer does not receive the video fast enough. By lowering the bit-rate, the streaming service can deliver the same video without much buffering, but in a lower quality. However, this means that the video needs to be encoded in different resolutions and at different bit-rates, which requires more resources. With a large number of concurrent video channels, this cost can increase significantly. A minimal sketch of such client-side rate selection follows figure 2.1.

Figure 2.1: Adaptive Bit-rate Streaming: the bit-rate changes depending on the available bandwidth (source: https://en.wikipedia.org/wiki/Adaptive_bitrate_streaming).
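To make this adaptation logic concrete, the following is a minimal sketch of a client-side rate-selection heuristic: the player measures recent download throughput and picks the highest bit-rate it can sustain with some headroom. The ladder values and the safety factor are illustrative assumptions, not taken from any particular player.

```python
# Minimal sketch of a client-side adaptive bit-rate heuristic (illustrative,
# not the algorithm of any specific player). The ladder values are examples.
BITRATE_LADDER_KBPS = [64, 128, 400, 800, 1989, 2997]

def pick_bitrate(measured_throughput_kbps: float, safety_factor: float = 0.8) -> int:
    """Return the highest ladder bit-rate the connection can sustain.

    The safety factor leaves headroom so a small throughput drop does not
    immediately empty the playback buffer.
    """
    budget = measured_throughput_kbps * safety_factor
    viable = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return viable[-1] if viable else BITRATE_LADDER_KBPS[0]

# Example: after a segment downloads at ~1.5 Mbit/s, the client steps down
# to the 800 kbit rung rather than risking a stall at 1989 kbit.
print(pick_bitrate(1500))  # -> 800
```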

Since video streaming services today largely operate over the Internet, the premier streaming technique is HTTP streaming, which uses the TCP/IP protocol. The most used ABS formats are Adobe HTTP Dynamic Streaming (HDS), Microsoft Smooth Streaming (MSS), HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP, also known as MPEG-DASH.


Amongst these formats, MPEG-DASH and HTTP Live Streaming are the most popular. Other important aspects of Adaptive Bit-rate Streaming are Digital Rights Management, which encrypts the content, and Closed Caption Formats, used for displaying text on media. Both of these, however, are outside the scope of this paper.

2.1.1 Adobe HTTP Dynamic Streaming

Adobe uses both HTTP Dynamic Streaming and the Real-Time Messaging Protocol to support adaptive bit-rate streaming. HDS uses video content in the Flash MP4 video file format and extends it with an additional standards-based MP4 fragment format. It supports H.264 video encoding and AAC or MP3 audio encoding.

2.1.2 Microsoft Smooth Streaming

Microsoft Smooth Streaming’s format specification, Protected Interoperable File Format (PIFF), is based on the ISO base media file format. These PIFF segments are usually 2 seconds long and are indexed in an XML-formatted Manifest. MSS supports VC-1 advanced and H.264 video encoding.

2.1.3 HTTP Live Streaming

HTTP Live Streaming is the media streaming protocol designed by Apple. It can send both audio and video over HTTP from a web server. HLS is used for playback on iOS-based devices and desktop computers. It supports both live streaming and video on demand. HLS consists of three parts: the server component, the distribution component and the client software [9]. The server component uses a hardware encoder which encapsulates an H.264 encoded video file and an AAC, MP3, AC-3 or EC-3 encoded audio file into an MPEG-2 Transport Stream. A segmenter then divides this stream into a series of media files, also creating an index file in the M3U8 format which lists the media files. The distribution component responds to requests from clients and delivers the index file and media files. The client then reads the index and requests the media files in order, so the video can be displayed seamlessly. A sketch of parsing such an index file follows.
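To illustrate the index-file mechanism, the sketch below parses a minimal, hand-written M3U8 media playlist and lists the segment URIs in playback order. Real playlists carry more tags, and the segment names here are invented.

```python
# Parse a minimal HLS media playlist (M3U8) and list segments in order.
# The playlist below is a hand-written example, not Apple's full format.
PLAYLIST = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:9.7,
seg0.ts
#EXTINF:9.7,
seg1.ts
#EXTINF:5.2,
seg2.ts
#EXT-X-ENDLIST
"""

def parse_media_playlist(text: str) -> list[tuple[float, str]]:
    """Return (duration_seconds, uri) pairs in playback order."""
    segments, duration = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # Duration tag precedes the segment URI it describes.
            duration = float(line[len("#EXTINF:"):].rstrip(","))
        elif line and not line.startswith("#"):
            segments.append((duration, line))
    return segments

for dur, uri in parse_media_playlist(PLAYLIST):
    print(f"request {uri} ({dur:.1f}s)")
```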

2.1.4 MPEG-DASH

In November 2011, Dynamic Adaptive Streaming over HTTP (DASH) became the first international standard for adaptive HTTP streaming. It was developed as a universal solution to provide interoperability between servers and devices [15]. Similarly to other formats, MPEG-DASH stores a manifest, the Media Presentation Description (MPD), which describes the content and the media segments, on an HTTP server. The operation of an MPEG-DASH client is displayed in figure 2.2: first the client requests the manifest file, which is then parsed so the client receives all the available information about the content, such as bit-rates and segment length. The segments are then obtained through HTTP requests.

Figure 2.2: MPEG-DASH standard for multimedia streaming [15].


Since MPEG-DASH was designed as a universal standard, it supports both segment-container formats: the ISO base media file format (used by HDS and MSS) and MPEG-2 TS (used by HLS). As a result, it is codec agnostic, which means the content can be encoded with any coding format. The sketch below illustrates how a client extracts the available bit-rates from an MPD.
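As a minimal illustration of the manifest step, the sketch below parses a hand-written MPD with Python's standard XML library and extracts the available representations. The MPD content is an invented example; only the namespace URI is the standard one.

```python
# Extract the available bit-rates from a minimal (hand-written) MPEG-DASH MPD.
import xml.etree.ElementTree as ET

MPD = """\
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="lo" bandwidth="64000" width="320" height="180"/>
      <Representation id="mid" bandwidth="1989000" width="1280" height="720"/>
      <Representation id="hi" bandwidth="2997000" width="1920" height="1080"/>
    </AdaptationSet>
  </Period>
</MPD>
"""

NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}
root = ET.fromstring(MPD)
# Each Representation is one encoding of the same content; the client picks
# among them based on its measured bandwidth.
for rep in root.findall(".//dash:Representation", NS):
    print(rep.get("id"), int(rep.get("bandwidth")), "bit/s")
```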

2.1.5 Video Streaming Performance

ABS technology has helped improve the performance of video streaming. In order to judge these improvements, certain performance metrics have to be selected; these metrics can then be analyzed to draw conclusions about performance. According to Neotys [5], the 5 most important metrics that directly influence viewer experience are:

1. Bit Rate

Bit rate is equal to the number of bits of video transmitted over a period of time. The number of bits determines the quality of the video for the same resolution. As more bits are sent, more information is transferred and the player can display a video of higher quality. The bit rate is closely related to the throughput.

2. Buffer Fill

The buffer fill is the time between the user's video request and the start of the video. After the viewer requests a video, the playback buffer has to be filled before the video can start playing. It has to be kept to a minimum as viewers do not want to wait too long. One way ABS has improved performance is by requesting fragments at the lowest bit-rate to fill this buffer as quickly as possible.

3. Lag Length

The lag length is the accumulated time the video is halted during its playtime. When the playback rate of the video is higher than the throughput, the ABS buffer will be emptied by the time the client has downloaded the next segments. As a result the viewer must wait until the buffer is refilled.

4. Play Length

Data is streamed over a period of time; the hours and minutes of video streamed make up the play length. This is an important metric because it tells the video streaming service how much of its content is streamed, which is important for its scaling and infrastructure.

5. Lag Ratio

The lag ratio is the ratio of waiting time to watching time. It indicates how many minutes a user has waited for every minute of watching. Ideally, this ratio is as low as possible, because that would mean the video plays smoothly.

In complex media systems all of these metrics are related to each other, so ultimately all of these indicators relate to throughput and latency. Throughput and latency are good ways for large video streaming services to measure both the quality and the scale of their service. A small computation of the lag metrics is sketched below.
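As a concrete illustration of the lag metrics, the sketch below computes lag length and lag ratio from a hypothetical playback event log; the event format is invented for the example.

```python
# Compute lag length and lag ratio from a hypothetical playback event log.
# Each event is (seconds_spent, "playing" | "stalled"); the format is invented
# for illustration and does not come from any particular player.
events = [(2.0, "stalled"),   # initial buffer fill
          (60.0, "playing"),
          (1.5, "stalled"),   # mid-stream rebuffering
          (120.0, "playing")]

lag_length = sum(t for t, state in events if state == "stalled")
play_length = sum(t for t, state in events if state == "playing")
lag_ratio = lag_length / play_length  # waiting time per second watched

print(f"lag length: {lag_length:.1f}s, lag ratio: {lag_ratio:.3f}")
# -> lag length: 3.5s, lag ratio: 0.019
```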

2.2 Virtualization

In virtualization a guest operating system runs on top of a host. It is an additional layer of abstraction on top of the host's hardware or operating system. However, running a guest operating system brings overhead. As a sophisticated alternative, containers have been developed. Virtualization where a guest operating system runs on a host is known as hypervisor-based virtualization, whereas containers are used in operating-system-level virtualization. This section gives a technical analysis of both types of virtualization.


2.2.1 Hypervisor-based Virtualization

In full virtualization a virtual machine is created to emulate a computer system. The virtual machine separates the software from the hardware, which allows the host to run a different operating system on the virtual machine. For example, a Windows computer can host a virtual machine running a Linux operating system. A virtual machine has its own virtual hardware and guest OS, which can be used to run applications and software. Hypervisor-based virtualization makes use of hypervisors to control the virtual machines. The hypervisor is the layer on top of the hardware and host operating system. It communicates with the host and controls the resource allocation of the virtual machines running on the host. There are two types of hypervisors: type 1 and type 2. Type 1 hypervisors run directly on the host's system hardware, whilst type 2 hypervisors run on top of a host operating system. Because type 1 hypervisors run directly on the hardware they lack the overhead of a host operating system, which is beneficial for performance.

Virtual machines run independently and separately. The host can use CPU pinning to dedicate one or more CPUs to a virtual machine instance. In CPU pinning a process or thread is bound to a CPU to ensure that the process only executes on the designated CPU(s). Virtual machines also offer isolation in terms of memory space, disk and operating system. However, virtual machines can still share a network interface with the host machine, leading to a possible bottleneck. Even though there is a possibility that an exception in the guest corrupts the host operating system, it is likely that failures and security breaches only affect the virtual machine where they occur, due to the level of isolation. Virtual machines can easily be moved and reallocated between servers, which is ideal in the case of fickle resource demand. Popular hypervisors are, for example, the Xen hypervisor and the Kernel-based Virtual Machine (KVM).

Figure 2.3: A diagram with a virtual machine running on top of a type 2 hypervisor (source: https://docs.docker.com/get-started/#virtual-machine-diagram).

Xen

Xen virtualization is based on paravirtualization [1]. Paravirtualization is hypervisor-based virtualization where a VM is created that is similar but not identical to the underlying physical hardware. It is the alternative to full virtualization. In paravirtualization the guest operating systems on the virtual machines require some modification, and the guest OS knows it is executing on a VM. In Xen virtualization there is one most privileged virtual machine, Domain 0, which alone can access the hypervisor (virtual machine monitor). Through Domain 0 the hypervisor can be managed and additional virtual machines can be launched. The technology makes use of 4 privilege levels. The hypervisor executes at level 0 and the guest operating system at level 1. The applications run in the virtual machines execute at level 3, resulting in a level of isolation between the applications and the guest OS.

KVM

The kernel-based virtual machine is a technology that turns the Linux kernel into a hypervisor. KVM gives the Linux kernel native virtualization capabilities [22]. It creates a third execution mode, the guest mode, which has its own kernel and handles memory management. It uses the QEMU software for hardware emulation and CPU emulation for user-level processes.

2.2.2 Operating-system-level Virtualization

With operating-system-level virtualization, multiple virtual instances are created. However, in contrast to hypervisor-based virtualization, operating-system-level virtualization does not run its own guest operating system or kernel. As a result, the instances can only run the same operating system as the host. Containers do not emulate hardware, but provide isolation and resource management, making efficient use of the available resources. A number of virtual machines will exhaust the host's resources more quickly than the same number of containers; as a result, containers are more suitable for scalability. Since they are not emulating a computer system, they are instantiated very quickly and many can be run on a single machine.

Some operating-system-level virtualization techniques make use of the chroot mechanism. Chroot, change root, is a Linux command which changes the root directory of a process and its children to a different directory. Containers can use this chroot command to change the root directory of processes and share filesystems. However, this decreases the isolation between containers and with the host environment. Most virtualization techniques make use of cgroups and/or Linux namespaces. Cgroups, control groups, are a Linux kernel feature used for resource management. They can manage and limit the resources for a group of processes. Cgroups can control which groups have higher priority and receive more CPU time. They can also freeze, checkpoint and restart groups of processes, very similar to the snapshot of a virtual machine. Linux namespaces are often used along with cgroups, since they provide the isolation and virtualization for groups of processes. There are 6 namespace types and each isolates a different resource: process IDs, mounts for filesystems, the network stack, interprocess communication, UTS and user IDs. Together these namespaces provide an isolation layer which creates the illusion that a container is a system of its own. The most common operating-system-level virtualization technologies are LXC, OpenVZ and Docker. Both kernel features can be inspected from user space, as sketched below.
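The sketch below, which assumes a Linux host, lists the namespaces the current process belongs to and prints its cgroup membership; it only reads the /proc filesystem and modifies nothing.

```python
# Inspect the namespaces and cgroup membership of the current process.
# Linux only: this reads the /proc filesystem, nothing is modified.
import os

# Each entry in /proc/self/ns is one namespace type (pid, mnt, net, ipc,
# uts, user, ...); containers get their own copies of these.
for name in sorted(os.listdir("/proc/self/ns")):
    target = os.readlink(f"/proc/self/ns/{name}")
    print(f"namespace {name:6s} -> {target}")

# /proc/self/cgroup shows which control groups meter this process's
# CPU, memory and I/O usage.
with open("/proc/self/cgroup") as f:
    print(f.read())
```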

LXC

LXC is a virtualization technique that makes use of the Linux kernel namespaces for isolating resources. These namespaces cover process and user IDs, mounting points for the file system and a network namespace for outward communication. The isolated resources are managed by cgroups, which also handle process control. The combination of these creates containers: isolated environments in which applications can be run.

OpenVZ

Similar to LXC, OpenVZ uses kernel namespaces to make sure every container has its own subset of isolated resources. It has four resource management components: User Beancounters (UBC), fair CPU scheduling, disk quotas and I/O scheduling [22]. Each of these components handles a certain resource or resource group. Together, they divide the CPU and memory usage over the containers, handle the priority scheduling of processes and schedule the I/O bandwidth.

Docker

Docker is an open source software project which automates the deployment of applications in software containers [3]. Like LXC, it uses the Linux features cgroups and namespaces to isolate resources. However, it extends the LXC-like functionality with image management and Union File System capability. Docker does not try to emulate a machine, but focuses more on the developer. As a result, it offers features such as deployment portability and repeatability. Docker containers are created from base images by executing commands manually or using Dockerfiles. A Dockerfile is a script consisting of commands which are performed on a base image to automatically create an image. Each of these commands forms a new layer, and together the layers form a union file system. In the end, each layer describes how to recreate an action. Similarly to virtual machine snapshots, Docker can also store and retrieve the state of a container. A sketch of this build-and-run workflow follows figure 2.4.

Figure 2.4: A Docker container diagram (source: https://docs.docker.com/get-started/#virtual-machine-diagram).
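As a sketch of this workflow, the snippet below builds an image from a directory containing a Dockerfile and starts a container from it, using the Docker SDK for Python. The image tag, build directory and port mapping are placeholders; this is not the thesis's actual Origin image.

```python
# Build an image from a Dockerfile and run a container from it, using the
# Docker SDK for Python (pip install docker). Tag, path and port are
# placeholders, not the thesis's actual Origin image.
import docker

client = docker.from_env()

# Each Dockerfile instruction becomes one image layer; unchanged layers
# are reused from cache on subsequent builds.
image, build_log = client.images.build(path="./origin", tag="origin-demo:latest")

# Run the container, publishing container port 80 on host port 8080.
container = client.containers.run(
    "origin-demo:latest", detach=True, ports={"80/tcp": 8080})
print(container.short_id, container.status)
```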

Type of Virtualization   Example
Hypervisor type 1        Xen, Kernel-based Virtual Machine (KVM)
Hypervisor type 2        VMware, VirtualBox
Container                Docker, LXC, OpenVZ

Table 2.1: Examples of different types of virtualization.

2.2.3 Virtual Machines vs Containers

Virtual machines have had a place in computer science for a long time and are still very prevalent in virtualization. Their biggest advantage is that they allow users to run multiple operating systems on a single machine at the same time. They provide full isolation, which means different pieces of software can be run concurrently and safely, as any fault will not have an impact on other virtual machines or the host. The virtual machine, however, does bring overhead. It requires a significant amount of booting time, since every time the virtual machine is (re)started the entire OS needs to be booted as well. Because of these additional overhead costs a virtual machine will not deliver native performance. Also, communication with the outside network has to go through two operating systems, which lengthens the network stack. This degrades overall network performance, decreasing throughput and increasing latency.

More recently, however, containerization technologies have resurged and gained popularity. Containers provide a lightweight solution. A container runs on the host operating system, which means it does not need to boot an entire OS every time it is started. Containers share many resources and are, as a result, a much smaller deployment than an entire virtual machine. Because of this, the number of containers that can be run on a physical host is a lot higher than the number of virtual machines. They are also easier to migrate and to start and restart. Since containers make use of many existing Linux kernel features, their performance is near native. One of these important kernel features is the namespace for the network stack. Containers use the host's existing kernel network stack, which should not introduce extra network overhead. The downside, however, is that containers provide significantly less isolation, which means that containers are less secure. In deployments where security and isolation are key issues, virtual machines will be the better choice. The other advantage virtual machines bring is the ability to run multiple operating systems on a single host, where containers can only run the host operating system. However, since isolation and security are not a primary goal for a video streaming service, it is expected that containers will be the better alternative to virtual machines.

2.2.4 Cloud Computing

Cloud computing has emerged as a paradigm to deploy and host services over the Internet [23]. In cloud computing, servers are pooled in data centers to provide computing resources and utilities. It is a convenient way to provide resources such as CPU, storage and bandwidth to users at any time. These services are usually provided in three different models: Software as a Service (SaaS), Platform as a Service (PaaS), or Infrastructure as a Service (IaaS) [21]. Cloud computing is an excellent solution for industries where resource demands are very dynamic. In video streaming, user behavior is unpredictable and fickle, which means resource demand can vary from low to high. With cloud computing, resources can be leased and released by consumers, which offers them great flexibility. Rather than creating their own infrastructure and wasting resources during off-peak hours, video streaming services can use cloud computing to reduce the overprovisioning of resources.

Cloud computing uses virtualization to run multiple virtual machines or containers on the servers. As discussed earlier, virtualization eases the migration of resources and means multiple processes can run concurrently on the same server. In recent years cloud services have become very popular, and companies such as Google, Amazon and Microsoft have all built large data centers to provide their services. Each of their services focuses on IaaS and offers computing and storage resources amongst a wide variety of products. An advantage of cloud computing is that the available virtual machines, or instances, have different underlying hardware. As a result, consumers can choose how many virtual CPUs (vCPUs) and how much memory their virtual instance needs, depending on its use.


CHAPTER 3

Related Work

With the rising popularity of virtualized streaming services, interest from the academic world has also emerged. Current literature focuses on analytical models to determine the performance of these services. Akhshabi et al. [2] introduce a new queuing network model to characterize viewing behaviors, and also introduce an algorithm to dynamically configure cloud resources. However, their work does not discuss streaming performance and focuses on the cloud. Virtualization does not only happen in the cloud, but can be run on any machine. Ma et al. [11] research MPEG-DASH streaming in the cloud. However, their work focuses on a dynamic scheduling algorithm in the cloud, rather than on the influence of virtualization (or the cloud) on the performance of the stream in general.

The performance of virtual machines is analysed by Quetier et al. [13]. The research studies several important metrics, including network performance. The article extensively compares several virtual machines, but it does not compare different virtualization techniques. Additionally, the network performance analysis is limited to the response time to an HTTP request. Although the research is more extensive than other literature, the analysis of a video streaming service requires more research into overall throughput and latency over multiple connections.

In the paper by Soltesz et al. [16], containers are discussed as an alternative to hypervisors and virtual machines. The research is not focused on video streaming, however. Although it analyses the differences in performance, the performance of a video streaming service is different from a single machine's performance. Rather than performance metrics about I/O and CPU, metrics about throughput and latency are more important. Sukaridhoto et al. [18] look into these metrics. Their paper tests two different virtualization techniques with a few benchmarks. However, the paper is from 2009, and in the world of virtualization this means the software used is already outdated. Moreover, the benchmark does not generate an increasing load, which means the analysis of performance under certain loads is not as extensive as it could have been.

Another similar study was done by Padala et al. [12], which compares OpenVZ containers with Xen virtual machines and a base Linux setup. The paper looks into throughput and response time, the two main metrics this paper discusses. The results are promising and show a significant overhead for the Xen virtual machines, whilst OpenVZ's containers show performance similar to that of the base Linux system. However, this research provides insight into the performance of virtualization techniques on Adaptive Bit-rate streaming services, whereas Padala et al. used a benchmark simulating an auction website.


CHAPTER 4

Implementation of Experimental Testbed

The goal of this research is to analyse the performance of virtualized video streaming. Virtualization brings some overhead, which impacts performance. Unlike other studies, this research will not focus on one virtualization technique or optimization, but on the performance of virtualized video streaming as a whole. Figure 4.1 shows the configuration of the experimental testbed used to conduct the experiments. The virtualization layer will be either a container or a virtual machine. There will also be experiments on bare metal, without virtualization. This testbed will be deployed in a dedicated environment and in a cloud environment.

Figure 4.1: Experimental testbed configuration.

4.1 Video Streaming Software

This research will use the Unified Origin software as its streaming software. Unified Origin is software created by the company Unified Streaming. It is a software plug-in for webservers such as Apache, Nginx, Microsoft IIS and Lighttpd [17]. It allows a webserver to package one input format into multiple output formats, such as HLS and MPEG-DASH. Installing Unified Origin for this research effectively means launching an Apache webserver and storing the Origin software on the (virtualized) machine, along with a video file with multiple bit-rate encodings and a video player. The Origin software then runs on top of the webserver and packages the video into several formats.

This software combines premier webservers such as Apache with popular ABS formats such as MPEG-DASH and HLS, making it an excellent choice for conducting experiments on video streaming. By packaging the videos dynamically, it eliminates the need for static storage by content distributors. The focus of this research will be on the MPEG-DASH and HLS formats as they are the most used ones. Furthermore, HLS employs larger segments and uses MPEG-2 TS, a format known for larger overhead costs [14], which means it is expected to produce different results from MPEG-DASH.

4.2 Virtual Machine

The virtual machine is set up with VirtualBox, a type 2 hypervisor [19]. VirtualBox can be used to set up multiple virtual machines to run on a physical machine. For this research an Ubuntu 16.04 virtual machine will be deployed, on which the Unified Origin software will be run.

4.3 Container

For the other virtualization technique, a Docker container will be implemented. Docker is the world's leading software container platform and can be used to run applications side-by-side in isolated containers [8]. A Docker image will be created which can be run on machines to automatically deploy applications. These Docker images are created through a Dockerfile and are essentially lightweight packages of software. In these Dockerfiles, commands are run to install the necessary packages for the application. Because all necessary packages are installed automatically, the Docker container image can be deployed on any machine, as long as it supports the same operating system type. For this research a Docker container is created with an image of the Unified Streaming Origin software.

4.4 Load Generator

A load generator is a tool which can be used to generate load for a performance test. It sends requests to another system or server, thereby creating a load. Depending on this load the host system's performance will change, and in order to adequately judge these changes the tool will also monitor certain performance metrics. For this research the load generation tool Tensor will be used. Tensor is a tool created by Abe Wiersma at Unified Streaming [20].

Tensor is based on the Netflix Vector open-source monitoring framework. It generates a load using the wrk benchmarking tool. Wrk is an HTTP benchmarking tool which can generate a significant load from a single multi-core CPU [10]. The Tensor web interface is hosted on an Apache web server and is used to display multiple graphs with the performance metrics. Wrk requires a URL, a duration and a number of concurrent connections. In the Tensor web interface a URL is requested which has to point to a video manifest file, in one of the Adaptive Bit-rate streaming formats supported by the Unified Streaming Origin software. To imitate a realistic video streaming session, Tensor integrates Unified Capture with a Lua script to parse the manifest file and send requests for the video segments stored on the server. Through the Tensor web interface the wrk load generation can be configured: there are options to choose a certain number of simultaneous connections and a selection of bit-rates at which the video is pulled from Unified Origin. The load generation is performed by creating multiple threads that perform HTTP requests, and the number of HTTP connections is increased every 5 seconds. The web interface also offers an option to repeat a certain load multiple times before increasing the number of connections. By periodically incrementing the number of HTTP connections, the tool can be used to analyze under which load the streaming setup performs adequately and at which point it collapses. This stepped-load pattern is sketched below.
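Tensor itself is not reproduced here, but the stepped-load pattern it drives can be sketched directly with wrk: the loop below raises the connection count each round. The manifest URL and the Lua script path are placeholders, and wrk must be installed separately.

```python
# Sketch of a stepped load: run wrk repeatedly, raising the number of HTTP
# connections each round (as Tensor does). Requires wrk on the PATH; the
# URL and Lua script are placeholders, not Tensor's actual internals.
import subprocess

URL = "http://origin.example/video/manifest.mpd"  # placeholder manifest URL
SCRIPT = "segments.lua"                           # placeholder request script

for connections in range(1, 8):                   # 1..7 connections
    cmd = ["wrk",
           "-t", "1",                  # one load-generating thread
           "-c", str(connections),     # concurrent HTTP connections
           "-d", "25s",                # duration of this step
           "-s", SCRIPT,               # Lua script that requests segments
           "--latency",                # print latency distribution
           URL]
    print(f"--- {connections} connection(s) ---")
    subprocess.run(cmd, check=True)
```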

The Tensor tool measures 5 metrics: baseline ping, baseline throughput, wrk throughput, wrk segments and wrk segment latency. The baseline ping and throughput are established at the start and give information about the network connection. Wrk then produces the load and periodically increments the HTTP connections. The wrk throughput indicates how many megabytes are sent each second, whilst the wrk segments metric tells how many video segments are being sent each second; it also reports the number of errors per second, which are segments that are completely dropped. Lastly, the wrk segment latency displays the mean, minimum and maximum latency for each segment, as well as the standard deviation of the latency. Additionally, Tensor measures 4 host system metrics: host network throughput, host CPU utilization, host disk IOPS and host memory utilization. Tensor generates these metrics with Performance Co-Pilot (PCP), which is a system performance and analysis framework [7]. PCP collects data from sources such as the kernel or the database. Tensor uses this to generate information about the host's system and possible overhead in the measured areas.

4.5 Cloud setup

In addition to the dedicated experiments, a test setup will also be deployed in the cloud. The goal of the experiments in the cloud is to give insight into the advantages cloud computing provides. By deploying the same setup on instances with different underlying hardware, the benefits of cloud services can be examined. The possibility cloud services offer to choose instances with different hardware, at differing costs, increases the flexibility of virtualization. This experiment aims to provide insight into the performance gain of instances with more vCPUs and more memory.

The open-source tool Packer is used to create the virtual machine environment. It creates a machine image which can be deployed in a virtual machine to install the necessary software. Amazon Web Services provides a wide range of virtual machines with different hardware configurations. This makes it possible to analyze the virtual machine's performance in different circumstances. In this research two Amazon Machine Images (AMIs) will be created and deployed on the Amazon Elastic Compute Cloud (EC2) service. One of these instances will run the Unified Origin software and the other will run the Tensor load generator. Both instances will run on EC2 in order to avoid losing performance to network connectivity issues. By running both instances in the same EC2 location the connection is more stable, so the performance results are more accurate. A sketch of this deployment step follows.
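A sketch of the deployment step using boto3, the AWS SDK for Python: both instances are launched from prebuilt AMIs into the same availability zone so the load-generator traffic stays inside EC2. The AMI IDs are placeholders for the Packer-built images.

```python
# Launch the Origin and Tensor instances in the same availability zone so
# the load-generator traffic stays inside EC2 (boto3 is the AWS SDK for
# Python; the AMI IDs below are placeholders for the Packer-built images).
import boto3

ec2 = boto3.resource("ec2", region_name="eu-west-1")  # Ireland, as in chapter 6

def launch(ami_id: str, instance_type: str):
    (instance,) = ec2.create_instances(
        ImageId=ami_id,
        InstanceType=instance_type,
        MinCount=1, MaxCount=1,
        Placement={"AvailabilityZone": "eu-west-1a"})
    return instance

origin = launch("ami-origin-placeholder", "m3.xlarge")   # streaming server
tensor = launch("ami-tensor-placeholder", "c3.8xlarge")  # load generator
print(origin.id, tensor.id)
```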


CHAPTER 5

Dedicated Experimental Results

The first set of experiments is conducted on a dedicated setup. The dedicated setup follows the configuration from figure 4.1: one machine running the Tensor load generator and another running a video service with the Unified Origin software. As shown in figure 5.1, these machines are connected to each other only through a crossover cable, to eliminate any interference from the outside. By creating an isolated network between the two machines, differences in the performance of the virtualized video service can only be attributed to differences between the experiments. There will be two sets of experiments: a virtualization technique comparison and a bit-rate comparison.

Figure 5.1: Setup connecting two dedicated machines with a crossover cable.

                  Left Machine                        Right Machine
Hosting software  Tensor on Apache 2.4.7              Unified Streaming Package 1.7.28 on Apache 2.4.7
OS                Ubuntu 16.10                        Ubuntu 14.04.5 LTS
CPU               4 cores @ 2.6 GHz (Intel Core i7)   8 cores @ 2.40 GHz (Intel Core i7-3630QM)
RAM               16 GB                               5.8 GB
Network card      Realtek USB GbE                     Qualcomm Atheros AR8161 PCI-E


5.1 Saturation

Wrk aims to maximize throughput by sending as many requests as possible. Maximizing the throughput means the connection reaches a saturation point where the throughput will not increase anymore. These experiments increase the number of HTTP connections until the point of saturation is reached. Each HTTP connection count is repeated 5 times before it is incremented. Figure 5.2 shows the increase in throughput for each additional HTTP connection until it stagnates at the saturation point. The resulting throughput and matching latency are used for the performance analysis of the different experiments. A sketch of how the saturation point can be read off such a curve follows figure 5.2.

Figure 5.2: Example of increasing throughput reaching saturation point.
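The saturation point can be read off such a curve mechanically: take the per-step throughput samples and report the first step after which throughput stops growing by more than a small tolerance. The sample values and the tolerance below are invented for illustration.

```python
# Detect the saturation point in a stepped-load run: the first step after
# which throughput stops growing by more than `tolerance`. The sample
# values below are invented for illustration.
def saturation_point(throughput_mbps: list[float], tolerance: float = 0.05) -> int:
    """Return the 1-based step index where throughput saturates."""
    for i in range(1, len(throughput_mbps)):
        prev, cur = throughput_mbps[i - 1], throughput_mbps[i]
        if cur < prev * (1.0 + tolerance):   # no meaningful growth any more
            return i
    return len(throughput_mbps)              # never saturated in this run

samples = [30.0, 58.0, 82.0, 101.0, 109.0, 110.5, 110.0]  # MB/s per step
print(saturation_point(samples))  # -> 5
```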

5.2 Virtualization Experiment

The experiment is carried out on the dedicated setup with both an MPEG-DASH formatted video stream and an HLS formatted video stream. It starts with one HTTP connection requesting segments at all bit-rates from Unified Origin. Every 25 seconds the number of HTTP connections increases. At 7 HTTP connections the experiment finishes, as steady throughput has been reached and the connection is saturated. In order to get reliable, accurate results this experiment is repeated 4 times. To be sure that the saturated throughput has been reached, the experiments on a virtual machine are run up to 13 HTTP connections, as that connection proved to be less stable.

Additionally, this experiment will also conduct research into the difference between the MPEG-DASH and HLS formats under different virtualization techniques. This can be used to test the hypothesis that HLS produces more overhead with its MPEG-2 TS container mentioned in section 4.1.


5.2.1 Bare Metal

MPEG-DASH

Figure 5.3: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 5.4: Latency in ms (HTTP connections increase every 25s until 7 connections).

HLS

Figure 5.5: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 5.6: Latency in ms (HTTP connections increase every 25s until 7 connections).

5.2.2 Virtual Machine

MPEG-DASH

Figure 5.7: Throughput in MB/s (HTTP connections increase every 25s until 13 connections).

Figure 5.8: Latency in ms (HTTP connections increase every 25s until 13 connections).


HLS

Figure 5.9: Throughput in MB/s (HTTP connections increase every 25s until 13 connections).

Figure 5.10: Latency in ms (HTTP connections increase every 25s until 13 connections).

5.2.3 Container

MPEG-DASH

Figure 5.11: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 5.12: Latency in ms (HTTP connections increase every 25s until 7 connections).

HLS

Figure 5.13: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 5.14: Latency in ms (HTTP connections increase every 25s until 7 connections).


Virtualization Technique   ABS Format    Saturated Throughput   Average Latency   Peak Latency
Bare Metal                 MPEG-DASH     ∼ 110 MB/s             ∼ 9 ms            ∼ 16 ms
                           HLS           ∼ 114 MB/s             ∼ 13 ms           ∼ 23 ms
Virtual Machine            MPEG-DASH     ∼ 90 MB/s              ∼ 27 ms           ∼ 120 ms
                           HLS           ∼ 105 MB/s             ∼ 27 ms           ∼ 46 ms
Container                  MPEG-DASH     ∼ 110 MB/s             ∼ 11 ms           ∼ 16 ms
                           HLS           ∼ 114 MB/s             ∼ 14 ms           ∼ 22 ms

Table 5.2: Dedicated setup: saturated throughput and latency per virtualization technique and ABS format.

5.3 Bit-rate Experiment

This setup is similar to the setup for the virtualization comparison, except that this time Unified Origin is run only in a Docker container. The experiment continues until 10 HTTP connections are open, since at that point throughput does not increase anymore and the connection is saturated. The bit-rate comparison is done with the MPEG-DASH format by requesting the two lowest bit-rates, 64 and 128 kbit, and the two highest bit-rates, 1989 and 2997 kbit. Tensor works by requesting segments at all selected bit-rates, which is not an identical imitation of adaptive bit-rate streaming. The goal of the experiment is to provide insight into stream behavior at different bit-rates.

5.3.1 Low Bit-rate

Figure 5.15: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 5.16: Latency in ms (HTTP connections increase every 25s until 7 connections).

5.3.2 High Bit-rate

Figure 5.17: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 5.18: Latency in ms (HTTP connections increase every 25s until 7 connections).


CHAPTER 6

Cloud Experimental Results

6.1 Saturation

Similar to the dedicated experiments, saturation will also occur in the cloud results, due to the nature of wrk. However, since the cloud runs in an online environment, the connection between the load generator and the streaming service is less stable. Figure 6.1 shows that despite this less stable connection the throughput still reaches a saturation point, after which it does not increase further for an extended period. There are exceptional peaks and valleys, but the throughput always reverts back to the saturation point.

Figure 6.1: The throughput of the cloud instance increases until the saturation point.

6.2 Cloud Instances

For the cloud experiments, the virtual instances with the software will be deployed in the Amazon Elastic Compute Cloud (EC2) in Ireland. Amazon's virtual instances run on type 1 hypervisors, which theoretically should achieve better results than type 2 hypervisors such as the VirtualBox environment. The two instance types that will be used are the M3 and the C3. According to Amazon, the M3 instances are for general purpose use, whilst the C3 instances are compute optimized. Within these instance types there are instance options with increasing numbers of vCPUs and increasing memory. Similarly to the bit-rate comparison experiment, this experiment runs until 10 open HTTP connections and uses the MPEG-DASH format. Since Tensor creates threads and uses a lot of computational power for its load generation, the Tensor tool will be installed on the largest compute optimized instance, c3.8xlarge.


Instance Type       Model        vCPU   Memory (GiB)   SSD Storage (GB)
General Purpose     m3.medium    1      3.75           1 x 4
                    m3.large     2      7.5            1 x 32
                    m3.xlarge    4      15             2 x 40
                    m3.2xlarge   8      30             2 x 80
Compute Optimized   c3.large     2      3.75           2 x 16
                    c3.xlarge    4      7.5            2 x 40
                    c3.2xlarge   8      15             2 x 80
                    c3.4xlarge   16     30             2 x 160
                    c3.8xlarge   32     60             2 x 320

Table 6.1: Amazon Web Services instances.

M3.medium

Figure 6.2: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.3: Latency in ms (HTTP connections increase every 25s until 10 connections).


M3.large

Figure 6.5: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.6: Latency in ms (HTTP connections increase every 25s until 10 connections).

Figure 6.7: CPU utilization (HTTP connections increase every 25s until 10 connections).

M3.xlarge

Figure 6.8: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.9: Latency in ms (HTTP connections increase every 25s until 10 connections).


M3.2xlarge

Figure 6.11: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.12: Latency in ms (HTTP connections increase every 25s until 10 connections).

Figure 6.13: CPU utilization (HTTP connections increase every 25s until 10 connections).

C3.large

Figure 6.14: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.15: Latency in ms (HTTP connections increase every 25s until 10 connections).


C3.xlarge

Figure 6.17: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.18: Latency in ms (HTTP connections increase every 25s until 10 connections).

Figure 6.19: CPU utilization (HTTP connections increase every 25s until 10 connections).

C3.2xlarge

Figure 6.20: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.21: Latency in ms (HTTP connections increase every 25s until 10 connections).


C3.4xlarge

Figure 6.23: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.24: Latency in ms (HTTP connections increase every 25s until 10 connections).

Figure 6.25: CPU utilization (HTTP connections increase every 25s until 10 connections).

C3.8xlarge

Figure 6.26: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.27: Latency in ms (HTTP connections increase every 25s until 10 connections).


Instance Type       Model        Steady State Throughput   Mean Latency Start   Mean Latency End   Mean Latency Peak   Steady State CPU Utilization
General Purpose     m3.medium    ∼ 36 MB/s                 ∼ 9 ms               ∼ 56 ms            ∼ 240 ms            ∼ 18%
                    m3.large     ∼ 84 MB/s                 ∼ 17 ms              ∼ 27 ms            ∼ 204 ms            ∼ 19%
                    m3.xlarge    ∼ 121 MB/s                ∼ 7 ms               ∼ 19 ms            ∼ 234 ms            ∼ 12%
                    m3.2xlarge   ∼ 121 MB/s                ∼ 3 ms               ∼ 18 ms            ∼ 231 ms            ∼ 5%
Compute Optimized   c3.large     ∼ 60 MB/s                 ∼ 4 ms               ∼ 35 ms            ∼ 320 ms            ∼ 16%
                    c3.xlarge    ∼ 85 MB/s                 ∼ 2 ms               ∼ 57 ms            ∼ 260 ms            ∼ 8%
                    c3.2xlarge   ∼ 122 MB/s                ∼ 1.5 ms             ∼ 19 ms            ∼ 231 ms            ∼ 5%
                    c3.4xlarge   ∼ 243 MB/s                ∼ 1.5 ms             ∼ 10 ms            ∼ 39 ms             ∼ 5%
                    c3.8xlarge   ∼ 610 MB/s                ∼ 1.5 ms             ∼ 3.5 ms           ∼ 25.5 ms           ∼ 5%

Table 6.2: Amazon Web Services instances performance for 1-10 HTTP connections.

Video streaming services pay Amazon a fee per hour for each virtual instance. Combining the throughput of each instance in table 6.2 with Amazon's hourly instance prices, the price per throughput can be calculated.

Instance Type       Model        Cost per Throughput
General Purpose     m3.medium    2.03 $/GB/s
                    m3.large     1.74 $/GB/s
                    m3.xlarge    2.42 $/GB/s
                    m3.2xlarge   4.83 $/GB/s
Compute Optimized   c3.large     2.00 $/GB/s
                    c3.xlarge    2.81 $/GB/s
                    c3.2xlarge   3.92 $/GB/s
                    c3.4xlarge   3.93 $/GB/s
                    c3.8xlarge   3.13 $/GB/s

Table 6.3: Amazon Web Services instances price per throughput.
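The calculation behind table 6.3 is a simple division: hourly instance price over saturated throughput. The sketch below reproduces the m3.large figure; the hourly price used is an assumed value for illustration, as the thesis's price source is a footnote not reproduced here.

```python
# Price per throughput = hourly instance price / saturated throughput.
# The hourly price below is an assumed value for illustration; the thesis
# takes its prices from an AWS footnote not reproduced here.
def cost_per_throughput(price_per_hour_usd: float, throughput_mb_s: float) -> float:
    """Dollars per (GB/s) of sustained throughput, per hour of operation."""
    throughput_gb_s = throughput_mb_s / 1000.0
    return price_per_hour_usd / throughput_gb_s

# m3.large: ~84 MB/s steady state (table 6.2), assumed ~$0.146/hour.
print(f"{cost_per_throughput(0.146, 84.0):.2f} $/GB/s")  # -> 1.74 $/GB/s
```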

6.3 Adaptive Bit-rate Streaming Format

This experiment is similar to the dedicated experiment, but looks into the differences between the HLS and MPEG-DASH protocols in the cloud. The results can illustrate differences between the dedicated and cloud setups, as well as provide additional insight into HLS and MPEG-DASH performance. For this experiment Tensor will still be run on the c3.8xlarge instance, and the m3.xlarge instance model will be used for the video streaming, since the results in the previous section show that it performs very well considering its price. The only cheaper options in terms of dollars per GB/s of throughput have inferior hardware and achieve significantly lower throughput.


6.3.1 MPEG-DASH

Figure 6.29: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 6.30: Latency in ms (HTTP connections increase every 25s until 7 connections).

Figure 6.31: CPU utilization (HTTP connections increase every 25s until 7 connections).

6.3.2 HLS

Figure 6.32: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 6.33: Latency in ms (HTTP connections increase every 25s until 7 connections).


6.4 Bit-rate

The bit-rate experiment will also be executed in the cloud. This can help explain the peaks occurring at the start of the cloud experiments and reveal any differences with the dedicated experiment. Since HLS and MPEG-DASH use different bit-rates, and by extension different segment sizes, the experiments will be conducted with both formats. For MPEG-DASH the lowest bit-rates are 64 and 128 kbit, and the two highest bit-rates are 1989 and 2997 kbit. For HLS the lowest bit-rates are 68 and 134 kbit and the highest bit-rates are 2245 and 3313 kbit. Due to its cost-performance efficiency, the m3.xlarge instance will once again be used alongside the c3.8xlarge Tensor instance.

6.4.1 Low Bit-rate

MPEG-DASH

Figure 6.35: Throughput in MB/s (HTTP connections increase every 25s until 7 connections).

Figure 6.36: Latency in ms (HTTP connections increase every 25s until 7 connections).


HLS

Figure 6.38: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.39: Latency in ms (HTTP connections increase every 25s until 10 connections).

Figure 6.40: CPU utilization (HTTP connections increase every 25s until 10 connections).

6.4.2 High Bit-rate

MPEG-DASH

Figure 6.41: Throughput in MB/s (1-10 connections).

Figure 6.42: Latency in ms (1-10 connections).


HLS

Figure 6.44: Throughput in MB/s (HTTP connections increase every 25s until 10 connections).

Figure 6.45: Latency in ms (HTTP connections increase every 25s until 10 connections).


CHAPTER 7

Conclusions

The research in this paper can be divided into two parts: the impact of different virtualization techniques and the possibilities cloud computing offers. Within the former part this paper also looks at how the MPEG-DASH and HLS video formats perform, as they are the most used formats. The results in table 5.2 show nearly identical performance between the containerized setup and the bare metal setup. However, there is a notable difference in performance for the virtual machine setup. For MPEG-DASH the virtual machine's throughput is 20 MB/s lower, and for HLS this difference is 9 MB/s. In both cases the average and peak latency are higher than in the other setups. There is also a difference in behavior between the MPEG-DASH stream and the HLS stream in virtual machines: for MPEG-DASH the connection becomes less stable as the number of HTTP connections increases, whereas HLS maintains its steady throughput.

The results of the bit-rate experiments reveal a significant difference in start-up latency between high bit-rate and low bit-rate streams. At 1 HTTP connection, a high bit-rate stream shows a mean latency peak of 363 milliseconds, whereas the low bit-rate stream shows no peak at all and its latency only gradually increases as the number of HTTP connections rises. Table 6.2 shows the influence of different cloud instances on the performance. Within one instance type, an increase in vCPUs, memory and SSD storage leads to an improvement in throughput, latency and CPU utilization. The C3 instances show lower latency and lower CPU utilization than the M3 instances, but similar throughput for instances with similar resources. There are two C3 instances with more resources than any of the M3 instances; both have significantly higher throughput with lower latency and similar CPU utilization. However, the prices of these compute optimized instances are higher. Table 6.3 shows that the general purpose M3 instances on average offer a lower cost per throughput for similar instances. Judging by the experiments, the m3.large instance is the most cost effective instance.

7.1 Discussion

7.1.1 Suggestions for Findings

The purpose of this research was to explore the influence of different virtualization techniques on the performance of a video streaming service. Based on the results, there is a performance overhead for hypervisor-based virtualization. This matches the earlier research of Padala et al. As expected, the Docker container achieved near-native performance and brought no overhead at all in this research's setup. Taking this into account, containers seem like an excellent virtualization solution, since a large number of containers can be run on one physical server and they hardly add any overhead to a native system.


Both the dedicated virtualization experiments and the cloud experiments show a high latency peak at the start of the experiment. At the start of a video stream, adaptive bit-rate streaming technology first requests low bit-rate video to fill the buffer as quickly as possible. After the buffer is filled and the client can start playing the video, it will start requesting higher bit-rate segments. However, for its load generation Tensor requests segments at all bit-rates. Although this simulates multiple clients with different bandwidths requesting different bit-rates, it does not perfectly simulate the ABS technology. To test this hypothesis behind the early latency peak, a bit-rate comparison experiment was conducted. The results show a clear peak at the start of the high bit-rate experiment, as opposed to the low bit-rate experiment where this peak is completely absent. This supports the hypothesis that the early peaks are created by requesting high bit-rate segments at the start of an HTTP connection. HTTP requests are executed over the TCP/IP protocol, which uses a sliding window: only a limited amount of data can be in flight at the start of a connection. A high bit-rate means more data and thus larger segments, and since the connection cannot carry all this data at the start, a high latency peak results. A toy model of this effect is sketched below.
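A toy model of the effect: under TCP slow start the amount of data in flight roughly doubles each round trip, so the number of round trips needed to deliver the first segment grows with segment size, producing a latency peak on fresh connections for high bit-rate streams. All numbers below are illustrative assumptions, not measurements from the experiments.

```python
# Toy model of TCP slow start: the congestion window roughly doubles each
# round trip, so a larger first segment needs more round trips. All numbers
# are illustrative, not measurements from the experiments.
def round_trips(segment_bytes: int, init_cwnd_segments: int = 10,
                mss_bytes: int = 1460) -> int:
    """Round trips to deliver one segment under idealized slow start."""
    cwnd = init_cwnd_segments * mss_bytes   # bytes deliverable in the first RTT
    rtts = 0
    delivered = 0
    while delivered < segment_bytes:
        delivered += cwnd
        cwnd *= 2            # exponential growth phase doubles the window
        rtts += 1
    return rtts

low = 64_000 // 8 * 4        # ~4 s segment at 64 kbit/s  -> 32 KB
high = 2_997_000 // 8 * 4    # ~4 s segment at 2997 kbit/s -> ~1.5 MB
print(round_trips(low), round_trips(high))  # -> 2 7
```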

The cloud computing experiments show the flexibility and improvements cloud computing services can offer. This research examined both the specialized instances Amazon Web Services offers and their general purpose instances. Since video streaming can demand a lot of computing resources, optimized hardware can be beneficial. The compute optimized C3 instances show an improvement in both CPU utilization and latency in comparison to the general purpose M3 instances. Additionally, AWS offers virtual instances with multiple vCPUs and large SSD storage. As the results show, this increase in resources can lead to a large increase in overall throughput.

The bit-rate comparison in the cloud shows different results than the dedicated experiments. Where in the dedicated experiments the low bit-rate stream shows no peaks at the start, the cloud experiments show peaks at the start of both high and low bit-rate streams. Since the network connection, and therefore the latency, is overall less stable in the cloud, it is difficult to pinpoint an explanation for this phenomenon. Regarding differences between the MPEG-DASH and HLS experiments, the cloud performs similarly to the dedicated setup in the sense that HLS has a higher throughput. The only exception is the bit-rate comparison. Here the results show a significant difference between MPEG-DASH and HLS, as the MPEG-DASH setup collapses at the low bit-rate. This can be explained by the smaller segments used by MPEG-DASH and wrk's method of sending many requests to maximize throughput. For a low bit-rate stream with small segments, wrk sends out many HTTP requests for segments, which overloads the connection and causes it to collapse. HLS uses larger segments, thereby avoiding the collapse and maintaining a stable throughput. This can also explain MPEG-DASH's throughput drop witnessed in figure 5.7: the connection with the virtual machine can be of lower quality and unable to handle the high number of segment requests sent for an MPEG-DASH stream.

7.1.2 Future Work

This research did not look into the scalability or isolation of virtualization. These two characteristics are often cited as the two biggest advantages of virtualization, but they were not a focus of this research.

This research showed that containers are a sophisticated virtualization solution. Since containers share kernel resources, future research into neighboring effects could shed more light on a container's influence on performance. Another advantage of virtual machines and containers is that they can be taken down during off-peak hours to minimize resource waste and cost. However, this means that during peak hours they also have to be booted up. Measuring exactly how much time is lost during this process, and how it could be optimized, is another suggestion for future research.
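A starting point for the boot-time suggestion could look like the hypothetical sketch below, which times how long a Docker container takes from docker run to the completion of a trivial command. It assumes a local Docker daemon with the alpine image available; a virtual machine could be timed the same way around its start command.

# Hypothetical sketch: average container start-up time over several runs.
import subprocess, time

def container_boot_time(runs=5):
    samples = []
    for _ in range(runs):
        start = time.monotonic()
        subprocess.run(["docker", "run", "--rm", "alpine", "true"],
                       check=True, capture_output=True)
        samples.append(time.monotonic() - start)
    return sum(samples) / len(samples)

print("mean container start-up time:", container_boot_time(), "s")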


In this research, the dedicated experiments were executed with only one virtual instance running on the machine. Future work could look into running multiple virtual instances on the same physical server, as that increases the overall system load, which can impact the performance of these virtual instances.
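Such an experiment could, for example, start several copies of the same streaming container on one host, as in the hypothetical sketch below; the image name 'streaming-origin' is a placeholder.

# Hypothetical sketch: run four copies of a streaming container on one host,
# mapped to consecutive ports, so neighboring effects can be measured.
import subprocess

IMAGE = "streaming-origin"   # placeholder image name
for i in range(4):
    subprocess.run(["docker", "run", "-d",
                    "-p", f"{8080 + i}:80",
                    "--name", f"origin-{i}", IMAGE],
                   check=True)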

In terms of cloud experimentation, there is a lot of room for additional work. Amazon has servers in multiple locations, offering different types of instances. Network connectivity differs per location, which impacts the overall throughput and latency. As mentioned previously, video streaming services experience peak hours during which a lot of resources are requested. This also applies to cloud services, resulting in a higher load on the connection to the cloud. Additional research into EC2 connection throughput and latency at relatively busy and relatively quiet periods of the day could therefore be useful for video streaming services.


Bibliography

[1] Tim Abels, Puneet Dhawan, and Balasubramanian Chandrasekaran. An overview of xen virtualization. Dell Power Solutions, 8:109–111, 2005.

[2] Saamer Akhshabi, Ali C. Begen, and Constantine Dovrolis. An experimental evaluation of rate-adaptation algorithms in adaptive streaming over http. In Proceedings of the Second Annual ACM Conference on Multimedia Systems, MMSys ’11, pages 157–168, New York, NY, USA, 2011. ACM.

[3] David Bernstein. Containers and cloud: From lxc to docker to kubernetes. IEEE Cloud Computing, 1(3):81–84, 2014.

[4] Netflix Technology Blog. Four reasons we choose amazon’s cloud as our computing platform. http://techblog.netflix.com/2010/12/four-reasons-we-choose-amazons-cloud-as.html, Dec 2010. Last visited on 08-06-2017.

[5] Tim Hinds Blog. Top 5 metrics for streaming video performance. http://www.neotys.com/blog/top-5-metrics-for-streaming-video-performance, Mar 2014. Last visited on 08-06-2017.

[6] Cisco. White paper: Cisco vni forecast and methodology, 2015-2020. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/complete-white-paper-c11-481360.html, Aug 2016. Last visited on 03-06-2017.

[7] Performance Co-Pilot. http://pcp.io/docs/pcpintro.html. Last visited on 03-06-2017.

[8] Docker. https://www.docker.com/what-docker. Last visited on 28-05-2017.

[9] Andrew Fecheyr-Lippens. A review of http live streaming. Internet Citation, pages 1–37, 2010.

[10] Will Glozer. Wrk modern http benchmarking tool. https://github.com/wg/wrk, 2015. Last visited on 07-06-2017.

[11] He Ma, Beomjoo Seo, and Roger Zimmermann. Dynamic scheduling on video transcoding for mpeg dash in the cloud environment. In Proceedings of the 5th ACM Multimedia Systems Conference, MMSys ’14, pages 283–294, New York, NY, USA, 2014. ACM.

[12] Pradeep Padala, Xiaoyun Zhu, Zhikui Wang, Sharad Singhal, Kang G Shin, et al. Performance evaluation of virtualization technologies for server consolidation. HP Labs Tech. Report, 2007.

[13] Benjamin Quétier, Vincent Neri, and Franck Cappello. Scalability comparison of four host virtualization tools. Journal of Grid Computing, 5(1):83–98, 2007.

[14] Haakon Riiser, Pål Halvorsen, Carsten Griwodz, and Dag Johansen. Low overhead container format for adaptive streaming. In Proceedings of the First Annual ACM SIGMM Conference on Multimedia Systems, MMSys ’10, pages 193–198, New York, NY, USA, 2010. ACM.


[15] Iraj Sodagar. The mpeg-dash standard for multimedia streaming over the internet. IEEE MultiMedia, 18(4):62–67, 2011.

[16] Stephen Soltesz, Herbert Pötzl, Marc E. Fiuczynski, Andy Bavier, and Larry Peterson. Container-based operating system virtualization: A scalable, high-performance alternative to hypervisors. SIGOPS Oper. Syst. Rev., 41(3):275–287, March 2007.

[17] Unified Streaming. http://www.unified-streaming.com/products/unified-origin. Last visited on 03-06-2017.

[18] Sritrusta Sukaridhoto, Nobuo Funabiki, Toru Nakanishi, and Dadet Pramadihanto. A comparative study of open source softwares for virtualization with streaming server applications. In Consumer Electronics, 2009. ISCE’09. IEEE 13th International Symposium on, pages 577–581. IEEE, 2009.

[19] Virtualbox. https://www.virtualbox.org/wiki/VirtualBox. Last visited on 08-06-2017.

[20] Abe Wiersma. Determining meaningful metrics for abr http video delivery. 2016. Last visited on 08-06-2017.

[21] Yu Wu, Chuan Wu, Bo Li, Xuanjia Qiu, and Francis CM Lau. Cloudmedia: When cloud on demand meets video on demand. In Distributed Computing Systems (ICDCS), 2011 31st International Conference on, pages 268–277. IEEE, 2011.

[22] Miguel G Xavier, Marcelo V Neves, Fabio D Rossi, Tiago C Ferreto, Timoteo Lange, and Cesar AF De Rose. Performance evaluation of container-based virtualization for high performance computing environments. In Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on, pages 233–240. IEEE, 2013.

[23] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and research challenges. Journal of Internet Services and Applications, 1(1):7–18, 2010.
