Extending ManyMan with additional back-ends for big.LITTLE and Parallella


Bachelor Informatica

Extending ManyMan with additional

back-ends for big.LITTLE and

Parallella

Floris Turkenburg

June 17, 2015

Supervisor(s): Roy Bakker (CSA, UvA)

Informatica
Universiteit van Amsterdam


Abstract

As the demand for high-performance, yet power-efficient, processors keeps increasing, new many-core systems and architectures are rapidly being developed. For research purposes, it is important to understand and compare the capabilities of these systems, which generally requires tools for testing and monitoring. Ideally, a single tool can be used on all systems, instead of a separate tool for every system. With this in mind, ManyMan, an interactive visualization and dynamic management tool for many-core systems, has been extended for use on a big.LITTLE system and the Parallella-16 system.

The ManyMan version developed for the big.LITTLE has proven to be easily modifiable for use on regular Linux systems, such as laptops. The Epiphany chip on the Parallella does not allow easy monitoring or management; accordingly, there is still some room for improvement in the Parallella ManyMan.

With this expansion, ManyMan has taken its first steps towards broader use across many-core systems.


Contents

1 Introduction
   1.1 Outline
2 Related work
   2.1 Gpfmon
   2.2 EnergyMonitor
3 Hardware
   3.1 big.LITTLE
      3.1.1 The board
      3.1.2 Power sensors
   3.2 Parallella
      3.2.1 The board
      3.2.2 The Epiphany chip
4 Software
   4.1 Kivy
   4.2 The big.LITTLE kernel
   4.3 Epiphany SDK
5 Implementation
   5.1 Back-end
      5.1.1 Monitoring
      5.1.2 Task creation
      5.1.3 Task interaction
      5.1.4 big.LITTLE specific back-end
      5.1.5 Parallella specific back-end
   5.2 Front-end
      5.2.1 The main view
      5.2.2 The detailed views
6 Evaluation
7 Conclusions


CHAPTER 1

Introduction

With the ever-continuing demand for more processing power, and the desire to push computers to their limits, researchers have always been looking for ways to improve computers and their performance. One way to achieve increased performance is to increase the number of transistors on a chip, realized by technological progress in shrinking the transistor size. Following Moore's Law, which states that the number of transistors on a single chip doubles every 18 months, this way of increasing computing performance has been the main trend for years. However, about ten years ago, researchers and manufacturers encountered a different barrier. The tiny transistors are harder to regulate in terms of power and are thus becoming less power-efficient. Increasing the clock frequency of the processor, another long-used means to improve performance, also requires more power to be supplied. More power means more heat, and that heat cannot be drained sufficiently from the dense chips, as cooling technology does not improve fast enough [23].

In order to continue improving performance, researchers and chip manufacturers have turned to multi- and many-core systems. These systems contain many smaller cores on a chip instead of a small number of big cores. Besides the increase in performance, these systems also provide improvements in terms of power consumption. Different types of cores can be integrated on one chip to match the needs of different usage models, idle cores can be powered down to save power, and load can be balanced better to distribute heat across the chip, improving reliability and reducing power leakage [18, 14].

With new many-core systems being developed frequently, it is important to be able to get an understanding of the system and how it works. However, despite all these new many-core systems being developed, there is a lack of tools available for users to visualize and monitor them. Therefore, ManyMan has been created by Jimi van der Woning in 2012 [24]. ManyMan is a tool that offers interactive visualization and dynamic management of many-core systems, initially developed for the 48-core Intel Single-chip Cloud Computer (SCC) [19]. It provides the user with information such as CPU usage and memory usage on chip, core and process level. Not only does ManyMan combine all this information into one tool, it also gives the user the ability to manage the many-core system, for example by starting tasks on specified cores and by frequency scaling. These properties make ManyMan a tool that makes the underlying hardware more accessible to the user, and of great use in the process of understanding and testing the capabilities of the many-core system. To improve the user's experience, ManyMan has been optimized for use on a multi-touch device, while it can still be controlled using a mouse and keyboard.

In the field of research and education, it is often relevant to discover and compare the capabilities of multiple types of many-core systems or boards. Facilitated by the current many-core trend, more and more systems become available. If every board or system provides its own tool for managing and monitoring the many-core, researchers and students not only need to gain an understanding of the many-core system itself, but also need to learn to work with each tool in order to do so. This costs time and effort best spent otherwise, and having to work with many different tools can be a nuisance. Therefore, having one general tool that can be used for different many-core systems is desired. One tool to manage them all: a 'many many-core manager'. Setting out to develop such a tool, the goal of this project is to explore the possibilities of extending ManyMan to support additional many-core systems, by developing ManyMan software for two of them. The many-core systems in question are the big.LITTLE [16] and the Parallella [11], which are both available at the CSA group of the University of Amsterdam for this project.

For this project, the following items are of importance:

• How to retrieve the details and real-time information from the many-core systems.

• Processing of this information for use in the visualization.

• Controlling the many-core system remotely through the front-end.

• Modification of the ManyMan front-end to properly represent the targeted many-core system.

This thesis describes the software that has been developed to extend ManyMan for use on the big.LITTLE and Parallella. The many-core systems are briefly described, and future improvements of ManyMan are proposed. In the scope of this thesis, "the big.LITTLE" or "the big.LITTLE system" refers to the ODROID-XU3 board/system (see section 3.1) used for this project, unless specified otherwise.

1.1 Outline

In chapter 2, some related work is discussed. The hardware used for this project is described in chapter 3, followed by the relevant software tools in chapter 4. The back-ends and front-ends that have been made for the big.LITTLE and Parallella are described in chapter 5 and evaluated in chapter 6. Finally, the conclusions of the project are discussed in chapter 7, ending with suggestions for future work.


CHAPTER 2

Related work

Despite the rapid development of many-core systems and architectures, monitoring, visualization and management tools for these systems are still scarce. Van der Woning already discussed several available tools, and their advantages and disadvantages, in [24]. The following sections add a few tools to the list.

2.1 Gpfmon

Gpfmon [5] is a graphical front-end to pfmon, a performance monitoring tool originally developed for the Linux 2.6 kernel by Hewlett-Packard and CERN. Gpfmon provides a convenient and user-friendly way to launch pfmon/perfmon2 [15] monitoring sessions, benefiting both less advanced users and advanced users requiring visualization capabilities. Not only does the tool relieve users from writing 250-character-long command lines, it also provides visual aid in event selection, plotting, and the management and comparison of projects and monitoring sessions. With this tool, one can get a visual representation of the collected information about the performance of a system or application, such as stall cycles, TLB misses and memory access latency, which is retrieved from the Performance Monitoring Unit (PMU) by perfmon2/pfmon. Gpfmon also supports remote monitoring sessions via SSH, lifting the burden of running the GUI from the monitored machine, and enabling the monitoring of machines on a less robust network, which for example might not support X-forwarding [20]. Some features of gpfmon are shown in figure 2.1.

However, the information that this tool provides is too extensive and too detailed for ManyMan purposes. Furthermore, it does not allow (easy) task management such as task migration, and simultaneous system-wide and per-task/thread monitoring is not possible.


Figure 2.2: The ODROID-XU3 EnergyMonitor tool

2.2 EnergyMonitor

The EnergyMonitor is a tool provided by Hardkernel [6] to monitor the power consumption of the ODROID-XU3 board. This tool monitors the voltage, watts and amperes for the big cores (A15), the LITTLE cores (A7), the GPU and the DRAM separately (see figure 2.2). The values for the power consumption are obtained from the power sensors which are integrated on the board. Besides the power statistics, the tool also displays the current CPU/GPU frequencies per processor and the temperatures for the big cores and the GPU. This tool has been developed for easy access to the power statistics of the board, aiding developers and users in their process of debugging for power consumption on the ODROID-XU3 board. As this tool is limited to solely displaying the statistics and does not offer any interaction, it does not meet the requirements of ManyMan, and has not been used as is. It has however been a reference point when implementing the retrieval of power consumption in the ManyMan back-end for the big.LITTLE.


CHAPTER 3

Hardware

For this project, the existing ManyMan tool is extended to support two additional many-core architectures: the big.LITTLE [16] and the Parallella [11], discussed in sections 3.1 and 3.2 respectively.

3.1 big.LITTLE

In 2011, ARM announced their big.LITTLE technology, one of their answers to the demand for more performance combined with better power efficiency in mobile devices. The big.LITTLE architecture pairs big processors for maximum compute performance with LITTLE processors for maximum power efficiency. The two types are fully coherent and share the same instruction set architecture (ISA). This allows the same instructions or program to be executed on both processor types in a consistent manner, facilitating easy task migration between the big and LITTLE cores.

3.1.1 The board

The specific board used for this project is the ODROID-XU3, displayed in figures 3.1 and 3.2. The main component of this board is the Samsung Exynos 5422 Application Processor. This System-on-Chip contains a Cortex-A15 2.0GHz quad-core CPU with 2MB L2-Cache (the big cores) and a Cortex-A7 1.4GHz quad-core CPU with 512KB L2-Cache (the LITTLE cores). Also on this chip are the ARM Mali-T628 MP6 600MHz GPU and 2GB of LPDDR3 RAM running at 933MHz. An overview is shown in figure 3.3.

The board includes 5 USB Host ports (4x USB2.0, 1x USB3.0), a 10/100 Ethernet port, MicroSD and eMMC connectors for storage and boot, and a Micro HDMI connector. A 5V/4A DC adapter must be used to power the board. The board came with a fan which is mounted over the Exynos 5422 for cooling and can be controlled through the PWM cooling fan connector.


Figure 3.2: The ODROID-XU3 board (topview)


3.1.2 Power sensors

Integrated on the board are four current and power monitors, placed on the I2C buses to the A15 quad core, the A7 quad core, the GPU and the DRAM. These monitors are INA231 Current/Power monitors from Texas Instruments [7], requiring a power supply of 2.7V to 5.5V for operation. Each monitor can report the current (in amperes), power (in watts) and voltage (in volts) on buses that vary from 0V to 28V. With a maximum gain error of 0.5%, the INA231 offers high measurement accuracy. The INA231 is specified to operate in temperatures ranging from -40°C to +125°C, a range that is realistic for the scope of this project (and in most other cases).

3.2 Parallella

In 2008, Andreas Olofsson, a processor developer and designer, founded Adapteva with the mission to create an easily programmable general-purpose floating-point processor that would be more than 10 times as energy-efficient as legacy CPU architectures. The new architecture had to be scalable to thousands of cores, easily programmable in ANSI-C, have a high raw performance (2 GFLOPS/core), be implementable by a small team of engineers, and reach an energy efficiency of 50 GFLOPS/W. A year and a half later, in 2009, Olofsson announced the Epiphany many-core architecture. This architecture "was a clean slate design based on a bare-bones floating-point RISC instruction set architecture (ISA) and a packet based mesh Network-On-Chip (NOC) for effectively connecting together thousands of individual processors on a single chip" [22], targeting a power consumption of 1W per 50 GFLOPS of performance.

In May 2011, Adapteva introduced their first Epiphany-based product, a 16-core 32 GFLOPS chip (E16G301). A few months later, in August 2011, Adapteva released a 64-core design (the E64G401) which could achieve 50 GFLOPS/W (and even 70 GFLOPS/W without IO).

Being a new company without an established community, Adapteva did not gain much of a position on the market with their Epiphany chips, despite them being the most energy-efficient floating-point processors available. In order to establish a community around the Epiphany and to finance further development, Adapteva started a Kickstarter project [12] in September 2012, named "Parallella". Within a month, Adapteva had raised close to 1M USD from almost 5,000 project backers. In June 2014, all of the Parallella boards promised to the backers were delivered.

Figure 3.4: The Parallella-16 Zynq 7020

3.2.1 The board

The Parallella-16 board is a fully open-source credit-card sized computer containing a 16-core Epiphany E16G301 coprocessor, a Xilinx Zynq 7010/7020, and 1 GB of RAM (see figures 3.4 and 3.5). The Xilinx Zynq is a System-on-Chip (SoC) containing two ARM Cortex-A9 processor cores and FPGA logic. A Gigabit-Ethernet port, USB port and MicroHDMI port are present on the board, along with a MicroSD card slot from which the board is booted. To power the board, either a 5V DC barrel connector or MicroUSB can be used.

Figure 3.5: Parallella-16 board top view

On the back/bottom of the board, four expansion connectors provide access to the power supply, the I2C, UART, GPIO and JTAG interfaces, and the Epiphany eLink interface. The eLink interface is used to exchange data between the Epiphany coprocessor and the ARM cores, and is implemented in the FPGA logic on the Zynq. A block of 32 MiB of memory is shared between the ARM cores and the Epiphany by default. The Epiphany system is shown in more detail in figure 3.6.


3.2.2 The Epiphany chip

The Epiphany E16G301 is a System-on-Chip containing 16 superscalar floating-point RISC CPUs (eCores). Each eCore is capable of two floating-point operations and one integer calculation per clock cycle. The CPU is efficiently programmable in C/C++ and has a general-purpose instruction set, specialized for compute-intensive applications. The Software Development Kit that is provided to program for the Epiphany is described in section 4.3. The memory architecture is a flat and unprotected memory map, providing up to 1MB of local memory for each core. Each core can access its own local memory, other cores' memories, and shared off-chip DRAM. The local memory per core is comprised of four separate banks to support simultaneous instruction fetching, data fetching, and multicore communication.

Communication in the Epiphany chip is supported by the eMesh Network-on-Chip (NoC), which consists of three independent 2D scalable mesh networks: one for off-chip write transactions, one for on-chip write transactions, and one for read requests.


CHAPTER 4

Software

In order to extend the ManyMan tool, some additional software was required. First, and most importantly, a newer version of the Kivy framework was used for the front-end; this is discussed in section 4.1. Second, section 4.2 describes the changes that have been made in the ODROID-XU3 kernel to allow for frequency scaling. Finally, to write programs for the Epiphany coprocessor, Adapteva provides the Epiphany SDK (eSDK), which is discussed in section 4.3.

4.1 Kivy

The ManyMan front-end is written entirely in Python and uses the Kivy framework [8] to build up the application. Kivy is an open-source project and provides good support for multi-touch purposes. It is highly portable and runs not only on standard operating systems such as Linux, Windows and Mac OS, but also on mobile operating systems like Android and iOS, without needing changes to the source code of the application. For these reasons, Kivy is a suitable candidate for the ManyMan tool.

The original ManyMan supported Kivy version 1.2.0; now, three years later, Kivy has reached version 1.9.0, which has been used in the scope of this project. The upgrade to this newer version of Kivy required a few changes in the front-end due to name changes and deprecated properties. For example, the image name popup-background has been changed to modalview-background, and updates to the text-input widget together with the introduction of FocusBehavior made the original author's self-made extension of this widget obsolete, and even unusable. Besides these few changes that concern the ManyMan tool directly, the most important changes in Kivy are internal: many new features have been added to the framework and bugs have been fixed to provide a better user experience. For a detailed overview of the changes made to Kivy, one can consult its website [21]. As mentioned before, Kivy is an open-source project, and as such, it also provides a platform for users to submit their own classes, improvements and extensions to the framework. This contributes greatly to the development of Kivy and its aim to provide users with the best experience and broad possibilities.

4.2 The big.LITTLE kernel

The big.LITTLE system used in this project runs the Linux kernel made for the ODROID-XU3 by Hardkernel [10]. The default kernel configuration, however, does not support manual CPU-frequency scaling, which is required for the ManyMan tool. Therefore, a new kernel has been built in which the userspace cpufreq governor is enabled, allowing the CPU frequencies to be set manually.


Figure 4.1: The eSDK framework (from [2])

4.3 Epiphany SDK

Adapteva has released a fully open-source software development kit (SDK) to facilitate writing parallelized C code for the Epiphany chip: the eSDK [2]. The eSDK framework is shown in figure 4.1. Some of its key components are the optimized ANSI-C compiler, a multi-core debugger, communication and hardware utility libraries, and an Eclipse IDE. Each core in the Epiphany chip runs a separate program, which is built and loaded onto the eCore by a host processor (the ARM-A9 on the Parallella). The host processor can access the eCores through the Epiphany Host Library (eHAL). This library provides methods for loading programs onto the eCores, starting the programs, resetting cores, and passing messages to communicate with the eCores. Utilities on the Epiphany cores are provided by a standard C environment and the eLib API.

The basic steps to run a program on the Epiphany are:

1. The host program initializes a workgroup by specifying the number of rows and columns and the position of the start node in the group.
2. The host resets the nodes and loads the device-side executable onto the eCores.
3. The host signals all the eCores to start execution.
4. The host communicates with the eCores either through shared memory or through the local memory of a core.
5. When the execution is complete, the host is signalled and reads the results either from the eCore's local memory or from shared memory [26].


CHAPTER 5

Implementation

ManyMan consists of two separate layers: the front-end, which is responsible for the visualization and the interaction with the user and typically runs on a touch-supported device, and the back-end, which runs on the many-core system (or the system that controls the many-core device) and handles the execution of tasks and the retrieval of the required data. For this project, as the title suggests, two additional back-ends have been created: one for the big.LITTLE system and one for the Parallella-16. These back-ends are described in section 5.1, starting with the general part of the back-ends followed by the system-specific features. Additionally, as the two many-core systems differ in characteristics and available information, both from one another and from the Intel SCC, the front-end has been modified to support these characteristics. The two resulting front-ends are described in section 5.2. For more detailed information, one can consult Van der Woning's articles [24, 25] and the source code [9].

5.1 Back-end

ManyMan is open source and published under the GNU General Public License [4]; thus, the source code was available for this project. This eliminated the need to create the back-ends (as well as the front-ends) from scratch, and as a result, the basis of the software remains similar to the original ManyMan. The ManyMan structure is complex but well designed and implemented, and as such, it takes some dedication to fully understand the program. Once that understanding is reached, however, it is a pleasant and well-structured piece of software to work with.

The back-end functions as the server in the ManyMan software. It initializes the basic information about the many-core system, such as the number of cores and the core groups (in terms of frequency and/or voltage level). This is loaded from a default settings dictionary in the code, but can also be updated by providing a settings file when starting the back-end. A TCP server thread is started to accept an incoming connection from the front-end and handle communication.
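As an illustration of this startup flow, the sketch below mimics how such a back-end could merge user-supplied settings over built-in defaults and then serve one front-end connection from a TCP server thread. All names here (DEFAULT_SETTINGS, serve_once, the JSON message layout) are illustrative assumptions, not ManyMan's actual code.

```python
import json
import socket
import threading

# Hypothetical defaults, analogous to the back-end's settings dictionary.
DEFAULT_SETTINGS = {
    "chip_name": "big.LITTLE",
    "cores": 8,
    "core_groups": [
        {"name": "LITTLE", "cores": [0, 1, 2, 3], "freq_range_mhz": [200, 1400]},
        {"name": "big", "cores": [4, 5, 6, 7], "freq_range_mhz": [200, 2000]},
    ],
    "port": 11111,
}

def load_settings(overrides=None):
    """Merge a user-supplied settings dict over the defaults."""
    settings = dict(DEFAULT_SETTINGS)
    if overrides:
        settings.update(overrides)
    return settings

def serve_once(settings, ready):
    """Accept a single front-end connection and send the chip info as JSON."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", settings["port"]))
    srv.listen(1)
    ready.set()  # signal that the server is listening
    conn, _ = srv.accept()
    conn.sendall(json.dumps({"type": "chip_info", "chip": settings}).encode())
    conn.close()
    srv.close()

if __name__ == "__main__":
    settings = load_settings({"port": 12345})
    ready = threading.Event()
    t = threading.Thread(target=serve_once, args=(settings, ready))
    t.start()
    ready.wait()
    with socket.create_connection(("127.0.0.1", 12345)) as c:
        msg = json.loads(c.recv(65536).decode())
    t.join()
    print(msg["chip"]["chip_name"])  # -> big.LITTLE
```

The real back-end keeps the connection open and exchanges many message types; this sketch only shows the handshake shape.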

5.1.1 Monitoring

One of the most important parts of ManyMan is the monitoring of the many-core systems. The back-end needs to retrieve information about running processes, the payload of the cores, memory usage, etc., and send this to the front-end so it can be visualized. This information is retrieved through several commands and/or programs, described in the following paragraphs.

In order to retrieve the CPU usage of the individual CPU cores, the mpstat command is used for each core. This command outputs the CPU usage of the specified core with an interval of 1 second. The format of the output, however, can differ between systems; for example, it depends on the locale whether the timestamp is followed by an AM/PM indication. Currently, the back-end works in both cases, where AM/PM is and is not indicated. Originally, the top command was used to retrieve the payload per core. This was suitable for the Intel SCC, since top would only obtain the information for the core it was run on. On the big.LITTLE and the Parallella, however, this command retrieves the payload information of the system as a whole, covering all CPUs, making top no longer suitable for retrieving individual core payload. For this reason, top has been replaced by mpstat. Also, unlike top, mpstat produces little output beyond what is needed.
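The per-core sampling described above can be sketched with a small parser. The function names below are my own, not ManyMan's; the sketch handles both output formats by relying on the idle percentage always being the last column, so usage = 100 - idle:

```python
import subprocess

def parse_mpstat_line(line):
    """Extract CPU usage (%) from one mpstat sample line.

    Works whether or not the timestamp is followed by an AM/PM column,
    because the idle percentage is the last column either way.
    """
    fields = line.split()
    if not fields or fields[0] in ("Linux", "Average:"):
        return None
    try:
        idle = float(fields[-1].replace(",", "."))  # some locales print commas
    except ValueError:
        return None  # header line ending in "%idle"
    return round(100.0 - idle, 2)

def core_usage(core_id):
    """Take one 1-second mpstat sample for a single core (needs sysstat)."""
    out = subprocess.check_output(
        ["mpstat", "-P", str(core_id), "1", "1"], text=True)
    for line in out.splitlines():
        usage = parse_mpstat_line(line)
        if usage is not None:
            return usage
    return None
```

In the back-end proper, one such mpstat process runs continuously per core rather than being re-launched for every sample.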

The memory usage of each core is determined by the collective memory usage of the tasks running on that core. Note that knowing the memory usage per core might not be directly useful, since the big.LITTLE and Parallella are shared-memory systems (for the ARM cores), but it could help to identify the core where a memory-intensive task is running. In order to prevent each core from parsing the same data in search of its individual tasks, a top command is run for each task, specified with the task's Process ID (PID). This causes top to only output the process information for the given PID. This output is then filtered on the PID using grep, resulting in output that contains only the lines with the process information. This saves the trouble and time of parsing the additional (irrelevant) lines in the back-end. As an example, top produces about 6 lines before the line containing the process information; as every line is read individually, 6 (useless) iterations would otherwise have to be done before the required information is found.
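The per-task top-plus-grep pipeline could look roughly like this. It is a sketch, not ManyMan's code, and it assumes top's default batch-mode column layout (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ CMD):

```python
import subprocess

def parse_res_kib(field):
    """Convert top's RES column to KiB; top prints plain KiB by default
    but switches to m/g suffixes for large values."""
    suffixes = {"m": 1024, "g": 1024 * 1024}
    if field[-1].lower() in suffixes:
        return float(field[:-1]) * suffixes[field[-1].lower()]
    return float(field)

def task_memory_kib(pid):
    """One batch-mode top sample for a single PID, filtered with grep.

    Piping through grep keeps only the process line, so the back-end
    never has to read past top's ~6 header lines.
    """
    top = subprocess.Popen(["top", "-b", "-n", "1", "-p", str(pid)],
                           stdout=subprocess.PIPE, text=True)
    grep = subprocess.run(["grep", "^ *%d " % pid], stdin=top.stdout,
                          stdout=subprocess.PIPE, text=True)
    top.stdout.close()
    line = grep.stdout.strip()
    if not line:
        return None  # task already gone
    return parse_res_kib(line.split()[5])
```

The back-end would keep one such top running per task and sum the RES values of the tasks assigned to a core.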

Note that the memory usage determined per core only covers the tasks on that core that have been started through ManyMan. There is no simple way to retrieve the total memory usage per core: one would have to retrieve the memory usage of every process running on the system and determine to which core(s) it belongs. This would burden the system unnecessarily (it would have to be done once every second), while per-core memory usage is often not of great importance and even meaningless on shared-memory systems.

5.1.2 Task creation

In order to start a new task, a Python subprocess is opened which first checks whether the given task is an existing executable on the system. If so, the Parent Process ID (PPID) is printed to the output; it is needed to determine the task's PID, as described below. The task is then assigned to the specified core and started using the taskset command. This command can be used to set or retrieve the CPU affinity of an existing process, or to launch a new process with a given CPU affinity, which makes it ideal for starting and moving tasks on the many-core systems. The output of the task is set to be line-buffered, to prevent a delay when reading the output.

Once the task is started, its Process ID can be determined. This is done by using the ps command and filtering on the PPID of the task with grep, which yields the PIDs of the child processes. Since the actual process that needs to be monitored is not necessarily the direct child of the PPID, for instance when the task is executed via a shell script or with sudo, the found child processes are recursively checked for children until a process matching the task name has been found. Usually, the recursion consists of only one branch, as the initial PPID is a new process that generally starts one child process and does not fork. The intermediate PIDs of the parents are also stored for later use in the task interaction. One could wonder why the task name is not used directly to find the PID, but since multiple tasks with the same name can run at a time, this would not guarantee the right PID. Note that problems may arise when the task finishes before the PID could be determined; this could be fixed by checking whether the Python subprocess has terminated and taking action accordingly.
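The launch-and-resolve flow above can be sketched as follows. This is a simplified stand-in for the back-end's logic (names are mine), using taskset for pinning and GNU procps ps for the recursive PID search:

```python
import subprocess

def start_task(command, core):
    """Launch `command` (an argv list) pinned to `core`.

    taskset both starts the process and fixes its CPU affinity; the
    output is line-buffered to avoid delays when reading it.
    """
    return subprocess.Popen(["taskset", "-c", str(core)] + command,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            bufsize=1, text=True)

def child_pids(ppid):
    """Direct children of `ppid`, via ps (GNU procps)."""
    out = subprocess.run(["ps", "-o", "pid=", "--ppid", str(ppid)],
                         stdout=subprocess.PIPE, text=True).stdout
    return [int(p) for p in out.split()]

def find_task_pid(ppid, task_name, chain=()):
    """Walk the process tree below `ppid` until a process whose command
    name matches `task_name` is found.

    Returns (pid, intermediate_pids) so the intermediates (e.g. sudo)
    can be signalled later, or None if no match exists.
    """
    for pid in child_pids(ppid):
        comm = subprocess.run(["ps", "-o", "comm=", "-p", str(pid)],
                              stdout=subprocess.PIPE, text=True).stdout.strip()
        if comm == task_name:
            return pid, list(chain)
        found = find_task_pid(pid, task_name, chain + (pid,))
        if found is not None:
            return found
    return None
```

For example, a task started as `sudo ./bench` would be found one level down, with sudo's PID recorded in the intermediate list.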

5.1.3 Task interaction

When a task is created, certain actions can be performed to interact with the task. A task can be paused/resumed, moved between cores, stopped and killed.


A task is paused or stopped by sending a STOP signal to its process. When there are intermediate processes, like sudo, these will also be stopped; this has to be taken into account when continuing the task. Pausing and stopping a task have become internally the same, and the distinction between the two is now mostly a visual feature. In order to move tasks from one core to another on the Intel SCC, it was necessary to checkpoint a task, which was done with the Berkeley Lab Checkpoint/Restart (BLCR) library [17]. This creates a context file which can be moved to another core in order to restart the task there. A checkpointed task can be terminated, releasing its occupied resources. As evaluated by Van der Woning [24], checkpointing a task creates a lot of overhead and writing to disk can be slow; this, plus the fact that checkpointing is not necessary on the big.LITTLE and the Parallella, resulted in the BLCR commands being replaced by taskset. This has the downside that neither a stopped nor a paused task releases its resources; if memory usage is an issue, BLCR could be re-implemented. To kill a task, a KILL signal is sent to the process, the top command monitoring the task is stopped, and the task instance is removed from the back-end. When continuing a task, it is important to send a CONT signal not only to the process, but also to any intermediate processes (found and stored when determining the PID), in order to prevent the back-end from hanging on a task thread while trying to read the task's output. Moving a task to a specific core is also done using taskset, provided with the PID of the task, after which the task is continued in case it was not already running.
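The signal-based interactions can be sketched as below. This is an illustrative reduction of the mechanism, not ManyMan's code; `intermediates` stands for the stored parent PIDs (such as sudo) mentioned above:

```python
import os
import signal
import subprocess

def pause_task(pid, intermediates):
    """Pause (or 'stop') a task: STOP the task and any intermediate
    processes found while resolving its PID."""
    for p in [pid] + list(intermediates):
        os.kill(p, signal.SIGSTOP)

def resume_task(pid, intermediates):
    """CONT must reach the intermediates too, otherwise reading the
    task's output can leave the back-end's task thread hanging."""
    for p in list(intermediates) + [pid]:
        os.kill(p, signal.SIGCONT)

def move_task(pid, core):
    """Re-pin a running task with taskset, then continue it in case it
    was paused."""
    subprocess.run(["taskset", "-cp", str(core), str(pid)], check=True)
    os.kill(pid, signal.SIGCONT)

def kill_task(pid):
    """KILL the task; the back-end would also stop the monitoring top
    process and drop the task instance."""
    os.kill(pid, signal.SIGKILL)
```

Note that a SIGSTOP-ed task keeps its memory, which is exactly the downside compared to BLCR checkpointing described above.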

5.1.4 big.LITTLE specific back-end

For the big.LITTLE system, it is possible to change the CPU frequency of the big and LITTLE core groups. This is done with the cpufreq-info and cpufreq-set commands from cpufrequtils. When starting up, the back-end calls cpufreq-info to check whether the userspace governor is available (see section 4.2); this governor is needed to set the frequencies manually. If it is available, this governor is set for each core, along with the minimum and maximum frequency for the core. These limits are determined from the frequency tables in the settings. For the LITTLE cores, the frequency ranges from 200 to 1400MHz, and for the big cores from 200 to 2000MHz, both in steps of 100MHz.
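The governor check, the setup, and the later frequency requests could be driven from Python roughly as follows; this is a sketch around the cpufrequtils commands, with helper names of my own (the cpufreq-set calls require super-user rights, like the back-end itself):

```python
import subprocess

def frequency_table_mhz(lo_mhz, hi_mhz, step_mhz=100):
    """A frequency table like the ones in the settings, e.g. 200-1400MHz
    for the LITTLE cores and 200-2000MHz for the big cores."""
    return list(range(lo_mhz, hi_mhz + 1, step_mhz))

def userspace_available(core):
    """Ask cpufreq-info which governors are available on `core`."""
    out = subprocess.run(["cpufreq-info", "-c", str(core), "-g"],
                         stdout=subprocess.PIPE, text=True).stdout
    return "userspace" in out.split()

def setup_core(core, table_mhz):
    """Select the userspace governor and the min/max frequency limits."""
    subprocess.run(["cpufreq-set", "-c", str(core), "-g", "userspace",
                    "-d", "%dMHz" % table_mhz[0],
                    "-u", "%dMHz" % table_mhz[-1]], check=True)

def set_frequency(core, mhz):
    """Apply one frequency from the table to `core`."""
    subprocess.run(["cpufreq-set", "-c", str(core), "-f", "%dMHz" % mhz],
                   check=True)
```

Since the big and LITTLE cores are each clocked as a group, calling set_frequency on one core of a group effectively sets the whole group.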

When the back-end receives a request to change the frequency, it uses cpufreq-set to set the frequency of the specified core. In the big.LITTLE system used for this project, the big cores are grouped together and the LITTLE cores are grouped together, which means that all cores in the same group run at the same frequency. Accordingly, setting the frequency of all cores in a group or setting the frequency of just one core in that group has the same result.

The big.LITTLE system also allows for monitoring the power usage of several components. This is done by the INA231 sensors on the I2C buses. At the start-up of the back-end, these sensors are enabled. A simple script is then used to read the values from the sensors for the big and the LITTLE cores. The volts, watts and amperes can be retrieved from the sensors, but currently, only the watts are used in the front-end.
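A reading script along these lines could look as follows. The sysfs directory paths and file names (enable, sensor_W) are assumptions modelled on the ODROID-XU3's INA231 driver; the I2C addresses per rail differ between kernel versions, so they should be verified on the actual board:

```python
import os

# Hypothetical sysfs nodes for two of the four INA231 rails
# (A15/big and A7/LITTLE); placeholders, to be checked on the board.
SENSOR_DIRS = {
    "big":    "/sys/bus/i2c/drivers/INA231/3-0040",
    "LITTLE": "/sys/bus/i2c/drivers/INA231/3-0045",
}

def enable_sensor(sensor_dir):
    """The sensors must be switched on once at back-end start-up."""
    with open(os.path.join(sensor_dir, "enable"), "w") as f:
        f.write("1")

def read_watts(sensor_dir):
    """Read the power value; the driver also exposes amperes and volts,
    but only the watts are forwarded to the front-end."""
    with open(os.path.join(sensor_dir, "sensor_W")) as f:
        return float(f.read())
```

The back-end would poll read_watts once per second alongside the mpstat samples and send the values with the regular status update.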

As a side note, though an important one: the big.LITTLE back-end must be run with root privileges (sudo), as cpufreq-set requires super-user rights.

5.1.5 Parallella specific back-end

The monitoring of the Epiphany coprocessor is not a trivial task. In order to start a task or program on the Epiphany chip, one must start a host program on the host cores (the ARM A9 CPUs) which loads an Epiphany program onto the Epiphany cores and is responsible for its execution. This makes it impossible for the back-end to track the tasks running on the Epiphany chip and their status. To get any information about a running process on the Epiphany, the process must provide this information itself, for example by writing to certain registers, and the host program must read and process this information. As there is no standard method (yet) of doing this, it is program dependent. It has to be noted that running two incompatible programs on the Epiphany simultaneously can crash the chip, for example when the ERM (Epiphany Resource Manager) program, provided by Adapteva in the epiphany-examples repository [1], is active when starting another Epiphany example program (besides the "erm example" program). As writing programs for the Epiphany chip is not the focus of this project, the ManyMan back-end is able to start, and retrieve results from, host programs for the Epiphany, but no real-time monitoring of Epiphany tasks has yet been implemented.

Figure 5.1: Epiphany power consumption as a function of the voltage (from [3], edited)
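Under these constraints, the back-end can only treat an Epiphany workload as an opaque host process: launch the host binary and capture whatever it prints. A minimal sketch (the helper name is ours, not ManyMan's):

```python
import subprocess

def run_host_program(path, args=()):
    """Start an Epiphany host binary and capture its output.

    The host program (not the back-end) loads code onto the eCores;
    all the back-end sees is the program's exit status and output.
    """
    proc = subprocess.Popen([path, *args],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)
    output, _ = proc.communicate()
    return proc.returncode, output
```

The back-end can forward the collected output to the front-end once the host program exits, which matches the "start and retrieve results" level of support described above.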

Directly changing the frequencies of the CPU and coprocessor on the Parallella is not possible; they are fixed at 667MHz for the Zynq ARM A9 dual-core and 600MHz for the Epiphany. It is, however, possible to change the voltage of the Epiphany chip to some extent. Using the eVolt program, provided by Adapteva in the Parallella-utils repository [13], the voltage can be set in a range from 0.900V to 1.200V. As voltage and clock speed are related, changing the voltage also affects the frequency. Figure 5.1 shows the power consumption of the Epiphany as a function of the voltage with all 16 cores executing a heavy-duty workload; the maximum operating frequency is shown for each voltage level.

Since the Parallella does not come with an active cooling system (such as a fan), it has been found useful to monitor the temperature of the board. The temperature is retrieved with the xtemp utility, from the same repository as mentioned above, slightly modified for compatibility with the back-end. This utility reports the temperature of the Zynq chip in degrees Celsius.
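For reference, the Zynq XADC transfer function that converts a raw 12-bit temperature sample to degrees Celsius is shown below. Note that ManyMan obtains the value through the modified xtemp utility rather than computing it itself; the sysfs path where the raw sample would live (e.g. an `in_temp0_raw` node under `/sys/bus/iio/devices/`) is an assumption.

```python
def zynq_temperature(raw):
    """Convert a raw 12-bit Zynq XADC sample to degrees Celsius,
    using the transfer function from the Zynq XADC documentation."""
    return raw * 503.975 / 4096.0 - 273.15
```

A raw sample around 2400, for instance, corresponds to roughly room temperature.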

5.2

Front-end

The visualization of the many-core systems is done by the front-end. The front-end typically does not run on the many-core system itself, but on a separate device. When the front-end is started, it connects to the back-end whose IP address is supplied in the default settings or through a separate settings file. Once connected, the front-end receives the data needed to build up the user interface, such as the configuration of the cores.
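The start-up handshake can be sketched as a plain TCP client. The newline-delimited JSON framing and the message layout below are illustrative assumptions, not the actual ManyMan wire format:

```python
import json
import socket

def connect_backend(host, port):
    """Connect to the back-end and read its initial configuration
    message, e.g. the core layout used to build the main view."""
    sock = socket.create_connection((host, port))
    reader = sock.makefile("r")
    # Assumed framing: one JSON object per line.
    config = json.loads(reader.readline())
    return sock, config
```

The returned socket stays open for the periodic monitoring updates that drive the graphs.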

5.2.1

The main view

The main overview in the ManyMan front-end consists of several components. Figure 5.2 shows the overview for the big.LITTLE and figure 5.3 shows the overview for the Parallella.


On the left side, the list of available tasks is shown, along with a button to add new tasks to this list (see figure 5.4). When this button is pressed, a pop-up opens where the path to a new program/task can be entered, using an included virtual keyboard or the system keyboard. Clicking the create button adds the task to the task list. A task can be started on a core by dragging it from the task list onto a core in the middle section of the overview. The tasks in the list all provide two buttons: the left button duplicates the task in the list; the right button starts the task on a core chosen by ManyMan. The graph underneath the task list displays the overall CPU usage of the system. As of the next patch, this graph will also display the total memory usage of the system.

The right side (figure 5.5) of the main view contains a help and an exit button, and the list of finished and failed tasks executed by the ManyMan. In the bottom right corner, a graph is drawn. For the big.LITTLE system, this graph displays the power usage in Watts for the big and LITTLE core groups. For the Parallella, this graph is used to show the temperature of the Zynq chip in Celsius.

The middle part of the overview visualizes the cores of the many-core system, along with sliders to set the CPU frequency (on the big.LITTLE) or the voltage level of the Epiphany (on the Parallella). The cores are displayed in a way that corresponds to the characteristics of the many-core system: the big and LITTLE cores for the big.LITTLE, and the ARM A9 dual-core and the 16-core coprocessor for the Parallella. Each core has a coloured overlay that visualizes the CPU usage on that core, ranging from fully covering and red at 100% CPU usage, to not covering and green at 0% CPU usage.

5.2.2

The detailed views

Clicking on a core opens a pop-up that can be dragged, scaled and rotated. These properties allow the user to arrange multiple pop-ups in a way that he/she finds useful. This pop-up displays more detailed information about the corresponding core (figure 5.6): a list of the tasks on the core and their states, and two graphs displaying the CPU and memory usage of the tasks on the core. From here, a task can be moved to a different core by dragging it from the pop-up onto a core in the main view. If the task is not released on a core, it is stopped and moved to the task list in the main view. Each task in the list contains an information icon which, when clicked, opens a pop-up with the information of that task (see figure 5.7).


Figure 5.3: The Parallella ManyMan main view.

Figure 5.4: The task list and cpu-usage. Figure 5.5: Finished tasks and power usage.

Figure 5.6: Core information pop-up. Figure 5.7: Task information pop-up.

The task information pop-up again contains two graphs to display the CPU and memory usage of the task. On the right, there is a scrollview containing the last 100 lines of output of the task. This number can be changed, but note that a number that is too high might slow down the front-end. The complete output of a task is also written to a file on the front-end device.
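A bounded scrollback of this kind is cheap to implement. The sketch below (not the actual front-end code) shows the idea with a fixed-length `deque`, so a verbose task cannot grow the widget's memory use without bound:

```python
from collections import deque

class OutputTail:
    """Keep only the most recent lines of a task's output in memory
    (100 by default, matching the scrollview described above)."""
    def __init__(self, max_lines=100):
        self.lines = deque(maxlen=max_lines)

    def append(self, line):
        self.lines.append(line)  # the oldest line is dropped automatically

    def text(self):
        return "\n".join(self.lines)
```

The complete output can still be streamed to a file separately, as the front-end does.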

Above the output, buttons are provided to control the task. The stop and pause button will send a request to the back-end to stop the task, but not kill or terminate it. The difference between stopping and pausing a task is that a paused task will remain on the core, and a stopped task will be moved back to the task list in the main view. This is however mainly a visual feature. Internally, paused and stopped tasks do not differ and will both still be on a core and occupying resources, as explained in section 5.1.3. When a task is paused, the pause button becomes a resume button, in order to continue the paused task on the core.
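One way to realize this pause/stop behaviour without checkpointing is with POSIX job-control signals: a suspended process stays on its core and keeps its memory, exactly as described above. This is a sketch of the semantics, not necessarily the back-end's exact mechanism:

```python
import os
import signal

def pause_task(pid):
    # Suspend in place: the task keeps its core assignment and memory.
    os.kill(pid, signal.SIGSTOP)

def resume_task(pid):
    os.kill(pid, signal.SIGCONT)

def kill_task(pid):
    # Only killing actually releases the task's resources.
    os.kill(pid, signal.SIGKILL)
```

In this scheme, "stopped" and "paused" tasks are indeed identical internally; only the front-end presents them differently.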

The smart-move button moves the task to a different core, chosen by the back-end. The choice of core is based on the current workload of the core and the number of tasks assigned to it. It has to be noted that the smart-move option does not (yet) take the characteristics of the big and LITTLE cores into account when determining the best core.
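The described heuristic, lowest load first with the task count as tie-breaker, can be expressed in a few lines (a sketch; the actual selection code may differ):

```python
def pick_core(load_by_core, tasks_by_core):
    """Choose the target core for a smart-move: lowest CPU load first,
    fewest assigned tasks as a tie-breaker.  Core characteristics
    (big versus LITTLE) are deliberately ignored, matching the
    current implementation."""
    return min(load_by_core,
               key=lambda core: (load_by_core[core], tasks_by_core[core]))
```

Extending the key with a big/LITTLE preference would be the natural place to add core-characteristic awareness later.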

A kill button has been added to terminate a running task and remove it from the system. A killed task will not end up in the finished task list but its output will still be saved to a file.


CHAPTER 6

Evaluation

When evaluating ManyMan, one could perform usability tests for the front-end. This has already been done by van der Woning when developing ManyMan for the Intel SCC. The results of these tests pointed out that ManyMan is an intuitive tool for visualization and management of many-core systems and that it looked great. This was the general opinion of both Computer Science and non-Computer Science students. As the new front-ends created for this project do not introduce any radical changes with regard to the original front-end, these test results are assumed to apply to this project as well.

During the development of the new front- and back-ends, it has been noted that the ManyMan for the big.LITTLE system can easily be modified to run on a regular Linux system. Several times during the development of the big.LITTLE back- and front-end, a regular Acer laptop, containing four CPUs and running Linux Mint 17.1, has been used to test and run the program, as the big.LITTLE board was not always accessible, for instance when working at home. The only real changes needed to properly run ManyMan on a regular Linux system are disabling or removing the power monitoring functions in the back-end, as most systems do not provide the power sensors, and some small changes to hard-coded properties in the front-end which were implemented for the big.LITTLE (such as the layout difference between big and LITTLE cores). Beyond this, providing the appropriate settings files when starting the back- and front-end takes care of most of the differences in regular system characteristics.

A tool like ManyMan is not just a nice graphical toy for visualizing many-core systems; it can also be used for research purposes. With ManyMan, one can easily run test programs on specific cores while scaling the CPU frequencies with a single click. Provided with real-time feedback in the form of graphs as well as values, the user can easily see and interpret the results of his or her actions. For example, the power consumption of the big.LITTLE can be tested when running programs on the big and/or LITTLE cores, at different frequencies. All of this happens in a user-friendly manner: the user is spared the trouble of opening multiple terminals for running the test programs, changing cores, adjusting frequencies and retrieving the power usage, and does not need to know all the commands for doing this.


CHAPTER 7

Conclusions

ManyMan has been developed to offer interactive visualization and dynamic management of many-core systems. With this tool, many-core systems become more accessible and easier to test and evaluate. Few to no tools are yet available that provide the user with both information about the many-core system on task, core and system level, and the ability to start and manage tasks. ManyMan provides these possibilities through an intuitive, user-friendly, multi-touch supporting application. With the large supply of many-core systems, it has become important to test and compare different many-core systems, for both research and education. Needing many different tools for different many-core systems is a nuisance; as a result, a general tool is desired that can be used on different many-core systems in a consistent way. This project has set out to extend ManyMan for use on two new many-core systems, in order to set ManyMan on its path to become this general tool.

Two new back-ends have been created, one for the ODROID-XU3 big.LITTLE system and one for the Parallella-16. Task migration on these systems is done via the taskset command. This command has replaced the use of the BLCR library, providing faster migration of tasks. However, with checkpointing disabled, stopped and paused tasks remain on the CPU and never release their resources until they are either killed or finished. This can be an issue on systems with little RAM, in which case the BLCR library could be re-implemented.
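For illustration, re-pinning a running process with taskset amounts to a single command per migration; the helper below is a sketch, not the actual back-end code:

```python
import subprocess

def migrate_cmd(pid, core):
    # taskset -pc <core-list> <pid> changes the CPU affinity of a
    # running process, moving it to the given core.
    return ["taskset", "-pc", str(core), str(pid)]

def migrate_task(pid, core):
    subprocess.check_call(migrate_cmd(pid, core))
```

Because only the affinity mask changes, the process keeps its memory and state, which is what makes this faster than a BLCR checkpoint/restart cycle.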

The kernel for the big.LITTLE has been slightly modified to allow for CPU frequency scaling. Power statistics are retrieved from the INA231 sensors, which include power consumption, current and voltage for the big cores, LITTLE cores, GPU and DRAM. Currently, however, only the power consumption in watts is used in ManyMan. When changing the frequencies on the big.LITTLE, results can immediately be seen in terms of the power consumption displayed in the bottom-right graph of the front-end. It has also been found that the big.LITTLE ManyMan can easily be ported to a regular Linux system, such as a laptop, as the big.LITTLE does not have very special or abnormal hardware characteristics (its power resides in efficient scheduling of tasks between the big and LITTLE cores). Providing settings files suited to the regular system to the front- and back-end takes care of most of the differences.

Monitoring the Epiphany chip on the Parallella turned out not to be an easy task. In order to get any information about the processes running on the Epiphany chip, these processes must provide this information themselves, and the host program (running on the ARM A9 dual-core) must properly retrieve it. As such, it is the responsibility of the programmer to provide process information. This makes it nearly impossible for the back-end to perform any monitoring of tasks running on the Epiphany. It also has to be noted that running incompatible programs on the Epiphany, or improperly resetting the Epiphany between program executions, causes the Epiphany to crash, along with the rest of the board.


The new front-ends visualize the cores of both systems in a layout that matches their characteristics, contributing to the intuitive and user-friendly properties of ManyMan. They have been built to support Kivy version 1.9.0, introducing some bug fixes and improving the user experience through updated features. Furthermore, the ability to properly kill running tasks has been implemented, and some issues have been solved with regard to widgets not updating, or updating incorrectly. However, these fixes have not yet been applied to the Intel SCC ManyMan.

Besides having a nice graphical interface, ManyMan is also suited for research purposes. Through ManyMan, test programs can easily be run while monitoring information such as power consumption or memory usage, and while scaling frequencies, all with just a couple of clicks (or taps). This relieves the user from the need to have multiple terminals open to perform these tasks and to memorize the correct commands, which could cause the user to lose sight of what is going on.

By providing such easy-to-use and clear features, and with expanding support for additional many-core systems and architectures, ManyMan is well on its way to becoming a general visualization and management tool for many-core systems, and potentially even for regular PCs.


CHAPTER 8

Future work

During the development of this project, some tasks turned out to be more complicated and more work than expected. For instance, compiling the new kernel and successfully flashing the Micro SD card took more time than necessary due to inexperience with this process. Also, the Parallella was initially unstable and rarely booted correctly, requiring many reboots and power disconnections; flashing a more recent version of the Ubuntu image to a new Micro SD card eventually solved this. As a result, time did not allow the implementation of all desired features in ManyMan, which means there is some room for future work.

Most importantly, the monitoring of tasks on the Epiphany is currently not sufficient for the needs of the user. One would like to keep track of which program is running on which eCore; this could, for instance, be retrieved directly from the host program, which loads these programs onto the specific eCores. This, however, requires the programs to follow a certain format for providing this information, and the back-end should be able to process it. Furthermore, one would also like to monitor the percentage of activity on the Epiphany and its memory usage. The Epiphany Resource Manager (ERM) and the erm example program from the Epiphany example programs provided by Adapteva [1] are an example of tracking the activity on the Epiphany.

In the ManyMan for the big.LITTLE and Parallella, some bugs introduced by the SCC ManyMan were fixed and a more recent version of Kivy has been used. The ManyMan for the SCC, however, has not been modified. Applying these bug fixes and updating the Kivy version in the SCC ManyMan will help ManyMan to stay up-to-date.

Currently, ManyMan consists of three separate front-ends and three separate back-ends for the three many-core systems. To keep ManyMan organized and modular, the three front-ends could be integrated into one front-end, in which the user can switch between the available many-core systems, either internally in the front-end, for instance via a drop-down list, or by supplying the settings file for the targeted many-core system. In the current front-ends, solely providing a corresponding settings file is not sufficient for switching between many-core systems.

Improvements in the smart-move function can also be made, such as taking into account the big and LITTLE core characteristics when selecting the most suitable core to run the task on. Also, a scheduler could be implemented to let the back-end switch running tasks between cores. As an example, if a task is using 100% of the CPU on a LITTLE core, the scheduler can decide to move the task to a big core.
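Such a scheduler rule could start out as simple as the following sketch; all names and the data layout here are hypothetical, as the scheduler itself is future work:

```python
def rebalance(tasks, big_cores, little_cores):
    """Sketch of the proposed scheduler rule: any task saturating a
    LITTLE core is promoted to a big core.  `tasks` maps a task id to
    a (core, cpu_percent) pair; returns the proposed (task, target)
    moves.  Picking the first big core is a naive placeholder; a real
    scheduler would consider the load on the big cores as well."""
    moves = []
    for task_id, (core, load) in tasks.items():
        if core in little_cores and load >= 100.0:
            moves.append((task_id, big_cores[0]))
    return moves
```

Each proposed move would then be carried out with the existing taskset-based migration.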


Bibliography

[1] Epiphany-examples GitHub. Online, https://github.com/adapteva/epiphany-examples [Visited June 2015].

[2] Epiphany SDK Reference. Online, http://www.adapteva.com/docs/epiphany_sdk_ref.pdf [Visited June 2015].

[3] Epiphany E16G301 datasheet. Online, http://adapteva.com/docs/e16g301_datasheet.pdf [Visited June 2015].

[4] GNU Licenses. Online, http://www.gnu.org/licenses/.

[5] The gpfmon home page. Online, http://andrzejn.web.cern.ch/andrzejn/ [Visited June 2015].

[6] Hardkernel EnergyMonitor. Online, https://github.com/hardkernel/EnergyMonitor [Visited June 2015].

[7] INA231 Power Monitor, Texas Instruments. Online, http://www.ti.com/product/ina231 [Visited June 2015].

[8] Kivy Organization. Kivy - Open source Python library for rapid development of applications that make use of innovative user interfaces, such as multi-touch apps. Online, http://www.kivy.org/ [Visited May 2015].

[9] Manyman source. Online, https://github.com/FlorisTurkenburg/ManyMan.

[10] ODROID-XU3 Kernel. Online, https://github.com/hardkernel/linux/tree/odroidxu3-3.10.y [Visited April 2015].

[11] Parallella. https://www.parallella.org/.

[12] Parallella Kickstarter project. Online, https://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone [Visited June 2015].

[13] Parallella-utils GitHub. Online, https://github.com/parallella/parallella-utils [Visited June 2015].

[14] S. Borkar. Thousand core chips: a technology perspective. In Proceedings of the 44th annual Design Automation Conference, pages 746–749. ACM, 2007.

[15] S. Eranian. Perfmon2: a flexible performance monitoring interface for Linux. Citeseer, 2006.

[16] P. Greenhalgh. big.LITTLE Processing with ARM Cortex-A15 & Cortex-A7. ARM White paper, 2011. http://www.arm.com/files/downloads/big_LITTLE_Final_Final.pdf.

[17] P. H. Hargrove and J. C. Duell. Berkeley lab checkpoint/restart (BLCR) for Linux clusters. In Journal of Physics: Conference Series, volume 46, page 494. IOP Publishing, 2006.

[18] J. Held, J. Bautista, and S. Koehl. From a Few Cores to Many: A Tera-scale Computing Research Overview. Intel White Paper, 2006.

[19] J. Howard, S. Dighe, S. R. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow, M. Riepen, M. Gries, et al. A 48-core IA-32 processor in 45 nm CMOS using on-die message-passing and DVFS for performance and power scaling. Solid-State Circuits, IEEE Journal of, 46(1):173–183, 2011.

[20] S. Jarp, R. Jurga, and A. Nowak. Perfmon2: a leap forward in performance monitoring. In Journal of Physics: Conference Series, volume 119, page 042017. IOP Publishing, 2008.

[21] Kivy. Changelog. Online, http://www.kivy.org/#changelog [Visited May 2015].

[22] A. Olofsson, T. Nordström, and Z. Ul-Abdin. Kickstarting High-performance Energy-efficient Manycore Architectures with Epiphany. arXiv preprint arXiv:1412.5538, 2014.

[23] K. Olukotun and L. Hammond. The future of microprocessors. Queue, 3(7):26–29, 2005.

[24] J. van der Woning. Interactive visualization and dynamic task management of many-core systems. A case study: The Intel Single-chip Cloud Computer. 2012. http://dare.uva.nl/cgi/arno/show.cgi?fid=447352.

[25] J. van der Woning and R. Bakker. Interactive Visual Task Management on the 48-core Intel SCC. In The 6th Many-core Applications Research Community (MARC) Symposium, pages 40–45. ONERA, The French Aerospace Lab, 2012.

[26] A. Varghese, B. Edwards, G. Mitra, and A. P. Rendell. Programming the Adapteva Epiphany 64-core Network-on-chip Coprocessor. In Parallel & Distributed Processing Sym-posium Workshops (IPDPSW), 2014 IEEE International, pages 984–992. IEEE, 2014.
