Bachelor Informatica

An RTOS for embedded systems in Rust

Wicher Heldring

June 8, 2018

Supervisor: Drs. T.R. Walstra

Informatica, Universiteit van Amsterdam

Abstract

Rust is a programming language that offers safety and reliability without sacrificing run-time performance. Rust can be used as a language to build operating systems; examples of operating systems built in Rust include Tock and Redox. However, Rust has not yet been tested for its real-time qualities. This thesis investigates the hypothesis that it is possible to write a real-time operating system (RTOS) in Rust for embedded systems. The hypothesis is tested by measuring and comparing the performance of a Rust and a C scheduler. Results show that the Rust scheduler exhibits similar real-time characteristics to the original C scheduler. These results make it highly probable that it is possible to build an RTOS in Rust with performance competitive with C or C++ RTOSes. Further research is required to verify that a complete RTOS can be built in Rust.


Contents

1 Introduction
  1.1 Related work

2 Theoretical background
  2.1 What is an RTOS?
    2.1.1 Causes of indeterminism
  2.2 Why use Rust instead of C or C++
    2.2.1 Undefined behavior
    2.2.2 Affine type system
    2.2.3 Rust without standard library
  2.3 Scheduling methods
    2.3.1 Cooperative scheduling
    2.3.2 Preemptive scheduling
    2.3.3 Round-robin scheduling
    2.3.4 Rate monotonic scheduling
    2.3.5 Earliest deadline first scheduling
  2.4 Mutexes
  2.5 Isolation
  2.6 Multi-threading in an RTOS

3 Environment and method
  3.1 ChibiOS
    3.1.1 Thread allocation
    3.1.2 Context switch
    3.1.3 ChibiOS scheduler
  3.2 STM32F3DISCOVERY
  3.3 ARM Toolchain

4 Experiments and results
  4.1 Experiments
  4.2 Results


CHAPTER 1

Introduction

Rust is a programming language for developing programs that are reliable and efficient [1]. It has a safe type system and a safe memory model. Decades of research into making operating systems safer have had mixed success. Two-thirds of the bugs reported in the Common Vulnerabilities and Exposures for the Linux kernel in 2017 can be attributed to using an unsafe language [2]. Most safe languages are not practical to use within an operating system kernel, because these languages offer no control over when a program runs and how it uses its memory. This control over where and when code runs is how Rust differs: Rust offers safety without sacrificing control over time and memory, by moving the safety checks to compile time. Currently, there are a few operating system kernels built with Rust, such as Tock [3] and Redox [4]. These OSes prove that it is possible to develop an OS in Rust.

An RTOS (real-time OS) contains a kernel which enables programs to execute within a deterministic time-frame. An RTOS ensures that tasks never miss their deadline. Tasks for an RTOS are developed to finish their work within this specific time-frame. If a program is still running after its predefined deadline, the RTOS has failed [5]. RTOSes are used for tasks where any delay could cause failure. Examples of RTOS usage include aerospace, aviation, life-support equipment, nuclear power systems and milling machines. Currently, most RTOSes are written in C or C++, because in an RTOS deterministic performance is the primary requirement. Languages that do not allow the programmer to specify when or what is running cannot be used for an RTOS, since it would be impossible to ensure that deadlines are never missed. An example of a standard language feature that breaks this requirement is a garbage collector. A garbage collector runs alongside the program, and on a single-core system it has to run on the same core to keep memory usage down. This garbage collector interferes with execution.

This thesis will investigate the hypothesis that it is possible to write an RTOS in Rust for embedded systems. An existing C RTOS is compared to the same RTOS where the scheduler is replaced with a Rust scheduler. The hypothesis is tested by measuring and comparing the performance of the Rust and C scheduler.

1.1 Related work

Tock and Redox are OSes built in Rust. Redox is a Unix-like operating system that takes a microkernel approach. Redox is designed to be a general-purpose OS with a focus on safety, freedom, reliability, correctness, and pragmatism [4]. It is developed to run most Linux programs with minimal changes. If there is a trade-off to be made between correctness and compatibility with Linux, Redox chooses correctness over compatibility.

Tock is an embedded operating system designed to run concurrent and distrustful applications on low-power and low-memory microcontrollers. Tock makes a distinction between the core kernel, capsules, and processes. The core kernel consists of a hardware abstraction layer (HAL), a scheduler and some platform-specific configuration [3]. Capsules do not run in privileged hardware mode but use the safety of Rust for isolation. Capsules can access each other and the kernel by calling exposed functions and accessing exposed members. Using Rust as a safety measure allows capsules to have no memory or run-time overhead. The downside of this system is that if a capsule hangs, it brings the entire system down, since capsules are cooperatively scheduled. Additionally, capsules cannot dynamically allocate memory, as dynamic allocation could allow a faulty capsule to starve the entire system of resources.

Tock processes are isolated from each other using a hardware memory protection unit. Processes are preemptively scheduled in a way that faulty processes can never bring the system down. The downside is that processes incur a memory and run-time overhead. Processes in Tock allow for dynamic memory allocation. All capsules can request a certain amount of memory to be reserved for every process. Processes are not allowed to access this extra memory and can only access their own memory and stack space.


CHAPTER 2

Theoretical background

2.1 What is an RTOS?

A real-time operating system (RTOS) is an operating system that has to run in real-time. The definition of real-time is better explained with the word deterministic. Usually, operating systems try to optimize for executing as much work as possible. A real-time operating system ensures tasks always finish before their deadline, even if it has to sacrifice overall throughput. An RTOS has two primary requirements [6]: logical correctness, meaning the RTOS produces correct outputs, and temporal correctness, meaning the RTOS produces the output at the correct time. RTOSes show up in high-risk scenarios, where failure of one of these requirements could have catastrophic consequences. A typical RTOS is hard real-time: missing the deadline of a task, i.e. not being temporally correct, is a failure. Soft real-time OSes allow occasional deadline misses. Examples of soft real-time applications include telephone switches and video games; missing a deadline on these systems does not have catastrophic consequences [5]. In a soft real-time system deadlines should be met most of the time, although for these systems there is no precise definition of what most of the time means.

2.1.1 Causes of indeterminism

To have a deterministic system, every single layer in the system has to be deterministic. If one layer is not deterministic, every layer that depends on it will be non-deterministic as well. This is why in RTOSes it is of critical importance to be aware of all abstractions. Not only the software abstractions are essential to take into account; the hardware must also be capable of deterministic performance up to a certain degree. If, for instance, the cooling is not good enough, overheating could affect the deterministic characteristics of the system and cause a loss of timely performance. Another hardware cause of indeterminism is an energy-saving processor: a processor that starts to underclock itself as soon as it has nothing to do. When critical code has to execute after a small amount of idle time, the processor first has to notice that there is work to be done and will take some time to get back up to speed. This temporary loss of performance could cause an RTOS to fail its task by missing a deadline.

Not only hardware can cause RTOSes to miss their deadlines; software also has to take into account that it has to run deterministically. One of the problems in writing software for an RTOS is that memory allocation is not deterministic. It is preferred to allocate memory statically. However, it is not always possible to allocate everything statically. Depending on the choice of algorithm, the response time of a dynamic memory allocation can be high or even unbounded [7]. FreeRTOS solves this by supplying a set of different memory allocators with different response times and characteristics. The memory allocator can be selected at the start of the program to get optimal performance.


2.2 Why use Rust instead of C or C++

2.2.1 Undefined behavior

One of the most common problems in C and C++ code is undefined behavior. Undefined behavior has the advantage that code can be optimized more easily: the compiler can assume the most efficient path in case of undefined behavior. For instance, as soon as two numbers are added together in C, they can overflow. In C, overflow of signed integers is undefined. In the following code, the output differs when optimizations are enabled.

#include <stdio.h>

int main(int argc, char *argv[]) {
    int a = 0;
    // 'a' is set dynamically in a way that the compiler cannot
    // optimize the code away
    if (argc == 1) {
        // This is the maximum signed integer value
        a = 2147483647;
    }
    printf("a = %i\n", a);
    if (a + a < 0) {
        printf("Overflow!\n");
    } else {
        printf("No overflow\n");
    }
}

// Output without optimizations:
// a = 2147483647
// Overflow!

// Output with optimizations (-O3):
// a = 2147483647
// No overflow

This optimization is allowed, because overflow of signed integers in C is undefined behavior. The compiler can reason that 'a' is either 0 or 2147483647. Since overflow is undefined, the compiler may assume that a + a is always positive: adding two non-negative numbers always yields a non-negative number. Thus a + a can never be smaller than 0, and the overflow branch can never be taken. Dead code elimination will then compile the overflow branch away until only the statement 'printf("No overflow\n");' is left.

Rust defines signed integer overflow. The same program in Rust cannot compile the overflow case away, because overflow is defined to be two's-complement wraparound in release mode. In debug mode, Rust will automatically add checks that detect integer overflow at run-time. In the safe subset of Rust there is no undefined behavior. Because the safe subset is restricted to support this guarantee, Rust additionally has an unsafe subset. Unsafe code is code that can cause undefined behavior but allows programs to bypass certain restrictions. These bypasses include dereferencing raw pointers, which could point to invalid memory, and calling C functions.
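The defined wraparound behavior can be made explicit in code. The following minimal sketch uses the standard wrapping_add method, which performs two's-complement wraparound regardless of build mode (a plain + would panic in debug mode):

```rust
fn main() {
    let a: i32 = i32::MAX; // 2147483647
    // wrapping_add makes the two's-complement wraparound explicit,
    // so the result is the same in debug and release builds.
    let b = a.wrapping_add(a);
    println!("{}", b); // -2
}
```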

Writing a C program that never invokes undefined behavior is hard. Rust programs can still have undefined behavior, but the scope of where it can occur is limited to specific blocks of code which are marked unsafe. This guarantee ensures that, as long as these unsafe blocks are verified to cause no undefined behavior, there is no undefined behavior in the program. Absence of undefined behavior does not guarantee absence of program errors, but it does ensure that the program will not crash through unexpected behavior at run-time.

2.2.2 Affine type system

A distinct feature of Rust is the way the language implements references. References can be used and dereferenced in the safe subset of Rust code. To guarantee that references never point to invalid memory, Rust has a concept called ownership. Ownership ensures that any referenced object exists as long as there is a reference pointing to it, without requiring a garbage collector. This guarantee is validated at compile time [8]. Without any run-time overhead, a safe Rust program can never make an invalid memory access.

In Rust, there are two kinds of references: mutable and shared references. Any variable can have any number of shared references or exactly one mutable reference. Rust tracks the lifetime of references at compile time. The lifetime of a reference must be shorter than the lifetime of the referenced variable. For instance, the following code, which would be valid in C or C++, fails to compile in Rust.

fn sink(s: String) {
    println!("Received: {}", s);
}

fn main() {
    let a = String::from("Hello!");
    let b = &a;
    // 'a' gets used here, which will invalidate the reference 'b'
    sink(a);
    // 'b' points to invalid memory here
    println!("{}", b);
}

// Compiler gives the following compile error:
// error[E0505]: cannot move out of `a` because it is borrowed

As soon as the function sink is called with 'a', the original 'a' no longer exists, since the value it held has moved to a different place in memory. The reference 'b' would then point to an invalid memory location. Rust tracks the lifetimes of 'a' and 'b' and refuses to compile the program because of this.

In the previous code example, 'a' no longer exists as soon as the function sink is called. This is because Rust has an affine type system: any variable can be used at most once. Note that taking a reference to a variable does not count as using it. Using a variable is similar to moving in C++, but while C++ leaves the original variable in a valid but unspecified state, in Rust the original variable is destroyed. Trying to do anything with the variable after it has been destroyed results in a compile-time error. In the previous example, sink takes the string itself instead of a reference to it, which causes the original string to be destroyed. Another example of this affine type system occurs in the FnOnce trait.

pub trait FnOnce<Args> {
    type Output;

    fn call_once(self, args: Args) -> Self::Output;
}

Instead of taking a reference to 'self', FnOnce takes 'self' by value. After executing call_once, the variable it was called on has been used and cannot be used again.

These rules allow Rust to guarantee memory safety without sacrificing performance. Storing references and ensuring all references go out of scope before the object itself is destroyed can be difficult. To work around this problem, it is possible to use handles instead of references. For instance, if there is a big chunk of text and a reference to a part of it needs to be stored, it is better to store an offset into the text instead of a reference to the exact point in the text. When this offset needs to be turned into a reference again, the reference can be recomputed from the offset and the text. Using handles allows the original text to move between owners.
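The handle idea described above can be sketched as follows. This is a hypothetical illustration (the Excerpt type and its resolve method are not from the thesis): the stored offset carries no lifetime, and a reference is recomputed on demand from the handle and the current text.

```rust
// A handle into a text: an offset and a length, instead of a &str reference.
struct Excerpt {
    start: usize,
    len: usize,
}

impl Excerpt {
    // Recompute a reference from the handle and the current text.
    fn resolve<'a>(&self, text: &'a str) -> &'a str {
        &text[self.start..self.start + self.len]
    }
}

fn main() {
    let mut text = String::from("Hello, world!");
    let excerpt = Excerpt { start: 7, len: 5 };
    // The text can be moved or mutated while the handle is held...
    text.push('!');
    // ...and the handle is resolved against the current text when needed.
    println!("{}", excerpt.resolve(&text)); // world
}
```

Because the handle borrows nothing, it can be stored freely; the borrow only exists for the short span in which the reference is actually used.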

2.2.3 Rust without standard library

Since Rust competes as a systems language, it has the option to compile without its standard library. By default, Rust links against a standard library that uses features of an underlying kernel such as networking, threading or file I/O. Marking a crate as no_std prevents it from linking to the default standard library and stops the code from making system calls that might not exist. This allows Rust to run on platforms where there is no OS or where the OS is not supported.
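A minimal no_std crate skeleton might look as follows. This is a sketch, not the thesis's actual code, and it assumes a bare-metal target where the program supplies its own entry point: without the standard library, a panic handler must be provided manually, and there is no default main.

```rust
#![no_std]
#![no_main]

use core::panic::PanicInfo;

// Without std there is no default panic machinery; one must be supplied.
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}

// Hypothetical entry point; the real symbol depends on the target/linker.
#[no_mangle]
pub extern "C" fn _start() -> ! {
    loop {}
}
```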

2.3 Scheduling methods

There are multiple ways to measure how RTOS schedulers perform. The most obvious is how long it takes until a task executes after its arrival. Another is stability under transient overload. Transient overload happens when a system has so many tasks to process at the same time that it is impossible to meet all deadlines. A further characteristic is the maximum utilization at which the scheduler will never miss deadlines. There are also qualitative characteristics, such as how a scheduler deals with a faulty program or whether it can be configured to give tasks different priorities.

When analyzing the performance of a scheduler, the cost of a context switch needs to be taken into account as well. A context switch can take a non-negligible amount of time. If a sorted ready list has to be maintained, scheduling performs at best in O(n log(n)). Depending on how long it takes to switch between tasks, it can sometimes be better to finish a low priority task than to immediately start the highest priority task.

2.3.1 Cooperative scheduling

Cooperative scheduling is a method of scheduling where threads voluntarily yield control to the kernel, so that the kernel can schedule another task. A task switch only happens if the running task yields control back to the kernel. Yielding can be triggered, for example, when the task requests a lock on a mutex that is already locked, or when the task starts sleeping.

A downside of cooperative scheduling is that a thread may be executing a lengthy calculation. The time before it yields can grow so large that a higher priority task misses its deadline.

2.3.2 Preemptive scheduling

Preemptive scheduling is a method where the scheduler interrupts a running task. Instead of waiting for the task to voluntarily yield control, an interrupt hands control to the scheduler, which decides whether to continue running the existing task or to run another one. Preemptive scheduling falls back to cooperative behavior when a task voluntarily yields control to the kernel.

There are multiple ways to decide which task should run when an interrupt triggers. A typical way is to resume the highest priority thread. The priority of a thread is assigned when the thread is created, and during execution the scheduler always executes the highest priority active thread. This priority assignment is called Fixed Priority Preemptive Scheduling (FPPS). A downside of FPPS is the number of context switches. A possible way to reduce them is a slack stealing algorithm [9], which allows a lower priority task to continue running if the higher priority task has enough slack, that is, its deadline is far enough away that it can still run after the lower priority task finishes.


2.3.3 Round-robin scheduling

Round-robin scheduling distributes time-slices fairly over all tasks. A time-slice is the period a task can run between two interrupts. A round-robin scheduler works like a preemptive scheduler, except that when a task exceeds its time-slice, an interrupt fires and switches the active task for a task of equal priority. If there is no task of equal priority, the task continues executing. Round-robin scheduling has a small extra run-time overhead, but multiple high priority tasks cannot starve each other.

Round-robin scheduling has the same downside as preemptive scheduling, namely that as soon as CPU utilization becomes high, it will miss deadlines of lower priority tasks.

2.3.4 Rate monotonic scheduling

When transient overload occurs, a scheduling algorithm will prefer to fail lower priority tasks over higher priority tasks. The problem is that if the scheduler only focuses on the highest priority tasks, lower priority tasks are likely to miss their deadlines during high CPU usage. A solution to this problem is rate-monotonic scheduling [10].

Rate-monotonic scheduling schedules the task with the highest priority first, where priority is defined as the inverse of the period of the task. So a task with a shorter period has a higher priority, and a task with a long period will be preempted by any task with a shorter period.

A rate-monotonic scheduler is guaranteed to always meet all deadlines for all possible task start times if the condition holds that

\sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right)   (2.1)

where C_i is the execution time of task i and T_i is the period of task i [11]. The right-hand side of the equation starts at 1 for n = 1 and converges quickly to around 0.69. For large n the utilization can never exceed about 70%, or the deadlines are no longer guaranteed to be met. The left-hand side of the equation is the total utilization of the CPU. Note that in rate-monotonic scheduling the deadline of a task is the point at which the same task is scheduled again.

This 70% bound is the worst case, and it will usually not be reached during typical operation of an RTOS. A more realistic bound is over 90% [10].
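The schedulability test in equation 2.1 is easy to evaluate mechanically. The following sketch (a hypothetical helper, not part of the thesis's scheduler) checks the bound for a task set given as (execution time, period) pairs:

```rust
// Check the rate-monotonic schedulability bound (equation 2.1)
// for a task set of (execution time C_i, period T_i) pairs.
fn rms_schedulable(tasks: &[(f64, f64)]) -> bool {
    let n = tasks.len() as f64;
    // Total CPU utilization: sum of C_i / T_i.
    let utilization: f64 = tasks.iter().map(|(c, t)| c / t).sum();
    // Bound: n * (2^(1/n) - 1).
    utilization <= n * (2f64.powf(1.0 / n) - 1.0)
}

fn main() {
    // Three tasks with total utilization 0.3; the bound for n = 3
    // is 3 * (2^(1/3) - 1), roughly 0.78, so the set is schedulable.
    let tasks = [(0.1, 1.0), (1.0, 10.0), (10.0, 100.0)];
    println!("{}", rms_schedulable(&tasks)); // true
}
```

Note that passing this test is sufficient but not necessary: task sets above the bound may still be schedulable, just not guaranteed.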

2.3.5 Earliest deadline first scheduling

Earliest deadline first (EDF) scheduling schedules the task that has the closest upcoming deadline. EDF is an optimal scheduling algorithm in the sense that EDF will always be able to schedule all tasks if and only if the total CPU utilization is at most 100%:

\sum_{i=1}^{n} \frac{C_i}{T_i} \le 1   (2.2)

If the total CPU utilization is larger than 1, no algorithm can meet all deadlines. Even so, EDF scheduling does not see much use in commercial RTOSes, because it has a few major downsides [12].

• It is less predictable. When a task is running, and a task with an earlier deadline preempts it, the task will be deferred until later.

• It is less controllable since there is no way to change the priority of a task.

• There is more scheduling overhead. In EDF scheduling, the scheduler needs to keep track of the next task to run, which performs at best in O(log(n)), whereas other schedulers can run in O(1).
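The O(log(n)) selection mentioned above can be sketched with a priority queue. This hypothetical example (the task names and deadline ticks are invented) uses the standard library's binary heap, where insertion and removal both cost O(log(n)):

```rust
// Sketch of EDF task selection: a min-heap ordered by deadline.
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn main() {
    // (deadline tick, task name); Reverse turns the max-heap into a min-heap.
    let mut ready = BinaryHeap::new();
    ready.push(Reverse((30u32, "logging")));
    ready.push(Reverse((10u32, "sensor read")));
    ready.push(Reverse((20u32, "control loop")));

    // The scheduler always pops the task with the earliest deadline.
    while let Some(Reverse((deadline, name))) = ready.pop() {
        println!("run {} (deadline {})", name, deadline);
    }
}
```

A fixed-priority scheduler can instead index a per-priority ready list directly, which is where the O(1) comparison comes from.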


A domino effect occurs when an EDF scheduler runs near 100% CPU utilization. When one deadline is missed, it affects other tasks, which will also start to miss their deadlines. Other algorithms would prioritize the higher priority tasks, so that at least some tasks meet their deadlines while lower priority tasks do not.

2.4 Mutexes

Sharing variables between multiple threads can cause problems because data races can occur. A data race happens, for instance, when two threads share a counter and both increment it every five milliseconds. Incrementing a counter consists of three steps: load the existing value of the counter, increment the value, and write the result back into memory. If the second thread preempts the first thread in the middle of these steps, it will load the same value as the first thread and thus write the same result back to memory. Although the counter should have been incremented twice, it is only incremented once. This effect is called a data race. To solve this problem, OSes have a primitive called a mutex (mutual exclusion). A mutex ensures that when one thread locks it, no other thread can lock it at the same time. Implementing a mutex brings about a new problem called priority inversion.
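The shared-counter scenario can be sketched in hosted Rust (using std threads rather than an RTOS, so this is an illustration of the principle, not RTOS code). The mutex makes the load-increment-store sequence atomic with respect to the other thread, so no updates are lost:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u32));
    let mut handles = Vec::new();
    for _ in 0..2 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // Holding the lock makes load-increment-store one
                // indivisible step with respect to the other thread.
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // Without the mutex, some of the 2000 increments could be lost.
    println!("{}", *counter.lock().unwrap()); // 2000
}
```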

For priority inversion to occur, there must be at least three threads: A, B and C, with descending priority. Suppose threads A and C share a mutex and thread C locks it. Thread A can preempt C and start working, since it has a higher priority. As soon as A tries to lock the mutex, it blocks, because C currently holds it, and A must wait until the mutex is unlocked. Thread B will run next, since it has a higher priority than C, even though running C would allow thread A to continue its work sooner. This event is called priority inversion, because a thread with higher priority is effectively waiting on a thread with lower priority [5].

To solve the problem of priority inversion, operating systems employ a strategy called priority inheritance. When thread A starts waiting for thread C, thread C inherits the priority of thread A. Instead of B running, C will now run, since it has the higher priority. This strategy solves the problem of priority inversion.

2.5 Isolation

In most RTOSes isolation is limited. The overhead of adding isolation is too significant for most RTOSes. Some RTOSes such as Xenomai allow for running real-time applications in user space. Other RTOSes like ChibiOS provide a variety of safety options to help development and debugging.

Isolation helps with debugging and catches a range of errors that could otherwise cause complete system failure, since a bug in non-isolated code can write to arbitrary memory, causing segmentation faults or undefined behavior. Isolation prevents these errors from causing undefined behavior and allows the RTOS to continue while stopping the single faulty thread or notifying the developer of the error.

Some RTOSes provide isolation using an integrated memory management unit (MMU) or memory protection unit (MPU). An MPU protects memory at run-time through memory regions that may be accessed during execution of a part of the RTOS; any memory access outside the specified bounds causes an interrupt. Using the MPU has some overhead, because during a context switch the RTOS not only has to save and switch the registers, but also has to save the memory protection status and update the MPU to protect the right memory.

An MMU virtualizes the entire address space in a way that a running process cannot access any memory outside of its process. An MMU is more complicated than an MPU, so smaller processors tend to have an MPU, whereas faster processors tend to have an MMU. An MMU also has additional overhead compared to an MPU [13].

Figure 2.1: An example of priority inversion. Task A, the highest priority task, misses its deadline, because it is waiting for a lower priority task C, that is waiting for task B with a priority between task A and C.

Figure 2.2: An example of priority inheritance. Task A, the highest priority task, makes its deadline because the lower priority task C inherits task A's priority.

2.6 Multi-threading in an RTOS

Multi-threading introduces new problems during the development of an RTOS. Depending on the processor, it can use either symmetric multiprocessing (SMP) or asymmetric multiprocessing (AMP). The difference is that an SMP system shares memory between all processors, while an AMP system runs a separate instance of the RTOS on every core. These separate processors have their own memory and interrupts, which are not shared between cores, so an external system has to be designed to facilitate communication between the cores.

During the implementation of an SMP RTOS, a few problems need to be considered. An SMP system can run into cache thrashing [14]. Cache thrashing happens when a thread switches between cores and has to reload the cache on the new core. A solution to this problem is binding threads to specific cores, so that a thread never switches cores. Another problem is that the threads have to be appropriately distributed over the cores. A load-balancer can be developed that moves threads between cores as soon as utilization on one core is high while another core has less utilization.

The main advantage of an AMP system over an SMP system is that an AMP system can run in a heterogeneous environment: the cores do not have to run at the same speed. An AMP system needs a mechanism to communicate events between cores. This can lead to scalability issues as soon as there are more than two cores, since every event needs to be communicated through this mechanism instead of through the primitives that exist in an SMP environment. For legacy systems an AMP system might also be preferred, because it guarantees that the application runs as it would on a single threaded system.


CHAPTER 3

Environment and method

3.1 ChibiOS

ChibiOS is a small RTOS optimized for performance and code size. It supports most common RTOS features such as preemptive scheduling, semaphores, mutexes and message queues. Additionally, it has a hardware abstraction layer which supports a multitude of different interfaces. It is a GPLv3-licensed project and is free to use for teaching and hobby projects. For commercial usage, it is necessary to buy a license.

3.1.1 Thread allocation

There are two kinds of threads in ChibiOS: static and dynamic. For static threads, the amount of stack space needed must be specified at compile time, and only one thread can run in that static space at a time. For dynamic threads, the space for the thread data and thread stack is allocated on the heap.

An allocated thread consists of thread data and stack space. The thread data includes the state the thread is in, the saved registers and, in this project, also the thread data for the Rust scheduler. When the thread is not running, its register state is saved into this structure; on a switch, a different set of registers is loaded from another thread. When a static or dynamic thread is allocated, an amount of stack space has to be specified. If the thread uses more stack space than specified, it will overwrite the next allocated thread. ChibiOS has support for stack protection, which works by adding a few bytes at the end of the stack space. At every context switch, ChibiOS checks whether the values of these bytes are identical to their initial values. If they are not, ChibiOS stops the kernel and reports the error. This check has a performance overhead, so it is better to use this protection only during development.
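The guard-byte check described above can be sketched as follows. This is a hypothetical illustration, not ChibiOS code: the fill pattern and guard size are invented, and a real implementation would place the guard words at the end of each thread's stack region.

```rust
// Arbitrary fill pattern written into the guard area at thread creation
// (the actual value and guard size are assumptions for this sketch).
const CANARY: u32 = 0x5743_4855;

// At each context switch, verify the guard words are untouched.
fn check_canary(stack_guard: &[u32]) -> bool {
    stack_guard.iter().all(|&w| w == CANARY)
}

fn main() {
    let mut guard = [CANARY; 4];
    println!("{}", check_canary(&guard)); // true: stack within bounds

    // A stack overflow would overwrite the guard area...
    guard[0] = 0;
    // ...which the next context switch detects.
    println!("{}", check_canary(&guard)); // false: overflow detected
}
```

Note this only detects overflows that actually touch the guard words; a write that jumps past them goes unnoticed, which is why it is a development aid rather than full protection.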

3.1.2 Context switch

Whenever a thread is running, there are two ways to switch between threads: preemption and cooperative scheduling. Preemption occurs when an interrupt from a hardware subsystem such as a timer happens. When this interrupt fires, it is possible to save the current state of the stack and switch to another thread. Preemption means that any piece of code can be interrupted in the middle. During critical sections of the code, the kernel disables interrupts. In ChibiOS all scheduling procedures are critical. Other examples where interrupts are disabled are during the locking and unlocking of mutexes.

Another way for a context switch to happen is when a thread uses a system call that causes it to sleep, for instance when it tries to lock a locked mutex or waits for a result from a hardware subsystem.

A context switch saves the current state of the registers in the thread's allocated storage. The registers of the new thread are loaded and overwrite the current registers. For the thread itself, the context switch is entirely transparent: the original thread is resumed where the context switch happened. In case a context switch happens because a thread got preempted, the only way to notice it is by measuring the time between statements, or by noticing that the environment changed; a shared variable could now hold a different value.

3.1.3 ChibiOS scheduler

ChibiOS has a preemptive scheduler that switches to a higher priority task as soon as one becomes available, or when the current process yields control to the kernel. Additionally, it supports an optional round-robin extension that makes the processor switch between threads of the same priority every n milliseconds. For this thesis, the preemptive scheduler is replaced with a Rust version.

The original ChibiOS implementation of the scheduler keeps a ready list of all active threads sorted by priority. When a task is added, the scheduler iterates through the ready list with a linear search until it finds the first entry whose priority is lower than that of the new thread, and inserts the new thread at that point. To keep the measurements fair, the Rust scheduler uses the same strategy.
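A minimal sketch of this insertion strategy, using a Vec of (priority, name) pairs instead of the intrusive linked list ChibiOS uses; the higher-number-is-higher-priority convention is an assumption:

```rust
// Sketch of the priority-ordered ready list insertion described above.
// ChibiOS uses an intrusive linked list; a Vec is used here for clarity.
// Convention (assumed): a higher number means a higher priority.
fn insert_by_priority(ready: &mut Vec<(u32, &'static str)>, thread: (u32, &'static str)) {
    // Linear search for the first entry with lower priority than the new thread.
    let pos = ready
        .iter()
        .position(|&(prio, _)| prio < thread.0)
        .unwrap_or(ready.len());
    ready.insert(pos, thread); // keeps the list sorted; FIFO within equal priority
}
```

Inserting before the first strictly lower priority entry means threads of equal priority keep their arrival order, which is what the round-robin extension relies on.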

3.2 STM32F3DISCOVERY

The STM32F3DISCOVERY is the embedded development board used for measuring the difference in performance. It has 48 KB of RAM and an ARM Cortex-M4 core running at 72 MHz. Additionally, it provides an ST-LINK/V2 interface, which allows the code running on the embedded device to be debugged from a connected host using serial wire debugging.

Additionally, it has support for semihosting, a mechanism that allows communication with a debugger over the ST-LINK/V2 interface. For instance, the Rust code contains a function called 'bkpt':

    pub fn bkpt() {
        match () {
            #[cfg(target_arch = "arm")]
            () => unsafe { asm!("bkpt" :::: "volatile") },
            #[cfg(not(target_arch = "arm"))]
            () => unimplemented!(),
        }
    }

This function causes the externally attached debugger to break whenever it is called. Semihosting also supports sending arbitrary data to the debugger.


3.3 ARM Toolchain

The GNU ARM EABI toolchain is used to compile and link the RTOS. This is an open-source toolchain maintained by ARM that targets Cortex-M and Cortex-R ARM processors. The Rust code is compiled with the Rust compiler, which generates object files from Rust source files. These Rust object files are linked with the C object files to build the final image.


CHAPTER 4

Experiments and results

4.1 Experiments

A Rust implementation of the scheduler is compared with the original C version in ChibiOS to test how the Rust scheduler performs. Several benchmarks are run to measure the difference.

The benchmark uses an internal timer. During the test, the STM32F3DISCOVERY runs a generator thread and a sender thread. The generator thread wakes up every three milliseconds, calculates the difference between the current time and the last wake-up time, and adds the result to a circular buffer. The sender thread wakes up every two milliseconds and sends the contents of the circular buffer over UART to the output channel. This setup is similar to how cyclictest works, a benchmark built for the Real-Time Application Interface for Linux (RTAI).
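The generator thread's measurement logic can be sketched as follows; the buffer size and the tick values in the example are illustrative, not taken from the thesis:

```rust
// Sketch of the generator thread's measurement path: record the gap
// between successive wake-ups into a fixed-size circular buffer,
// which the sender thread later drains over UART.
struct JitterBuffer {
    samples: [u32; 8], // illustrative capacity; the real buffer is larger
    head: usize,
}

impl JitterBuffer {
    fn new() -> Self {
        JitterBuffer { samples: [0; 8], head: 0 }
    }

    /// Called on every wake-up with the current tick count; stores
    /// the elapsed ticks since the previous wake-up.
    fn record(&mut self, last_wakeup: u32, now: u32) {
        self.samples[self.head] = now - last_wakeup;
        self.head = (self.head + 1) % self.samples.len();
    }
}
```

Keeping the measurement path this small matters: the recorded jitter should reflect the scheduler, not the instrumentation itself, which is why the heavier UART transmission is delegated to a separate thread.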

Additionally, there are worker threads that put stress on the system by generating a configurable amount of random numbers. Each worker thread toggles a random LED on the board depending on the result of the generated work; using the result ensures that the work does not get optimized away by the compiler.

Multiple tests are run with different workloads. The changing parameters are the number of worker threads and how long they run. Each thread runs a small random number generator (RNG); as soon as the RNG completes, there is a small chance that it toggles one of the LED pins, depending on the result. The RNG works by executing an inner RNG multiple times and xoring the results together. The number of times the inner RNG executes can be modified to increase or decrease the workload per thread.
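A sketch of such a configurable workload, using a 32-bit xorshift as the inner RNG; the thesis does not name the actual generator, so this choice and the parameter names are assumptions:

```rust
// Sketch of the configurable workload: run an inner RNG `rounds` times
// and xor the results together. Xorshift32 is an assumed stand-in for
// the generator actually used in the thesis.
fn xorshift32(mut x: u32) -> u32 {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    x
}

/// Combined workload: `rounds` scales how much work each thread does.
fn workload(seed: u32, rounds: u32) -> u32 {
    let mut state = seed;
    let mut acc = 0;
    for _ in 0..rounds {
        state = xorshift32(state);
        acc ^= state; // xor the results together, as described above
    }
    acc // used to decide whether to toggle an LED, so it cannot be optimized away
}
```

Returning and using `acc` is the important part: if the result were discarded, the compiler could legally remove the whole loop and the threads would generate no stress at all.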

Measurements are taken over roughly a five-minute interval, yielding around 100,000 measurements. The first 100 and last 100 measurements are discarded, since starting and stopping the RTOS from the debug interface could influence those results.

4.2 Results

Some of the graphs are split into two parts, 1/10 of a millisecond apart. This difference equals the configured tick size of the OS: every 1/10th of a millisecond, ChibiOS increases the tick count and executes any attached timers. Why the split appears only for some numbers of stress threads is unclear.

Additionally, some banding can be seen in the graphs, at intervals of roughly 250 clock cycles. The banding shows up in every measurement except the case with no worker threads, and its cause is unclear. One hypothesis, that the interrupt only fires at 250-clock-cycle intervals, cannot explain it, since the C kernel does not show the problem; moreover, the specification of the architecture states that an interrupt should take 12 clock cycles [15].


Figure 4.1: Measurements of running the schedulers with 64 workload threads. Banding can be seen at 250-clock-cycle intervals for the Rust scheduler. The C scheduler shows some banding as well, but it is much less pronounced.

Figure 4.2: Measurements of running the schedulers with 32 workload threads. There is a split where some measurements take around 3.0 ms (216000 clock cycles) and some take around 3.1 ms (223200 clock cycles). Both schedulers show this behavior.


In clock cycles (C / Rust)   Average           Stdev          Max
0 threads                    215888 / 215890   2.85 / 3.50    215993 / 216019
16 threads                   215782 / 215852   6.99 / 95.1    216087 / 216246
32 threads                   215884 / 215830   80.1 / 79.3    216355 / 216397
64 threads                   215887 / 215891   266 / 355      216882 / 216904

Table 4.1: Measurements of scheduler performance under different workloads. In each cell, the left value is the C measurement and the right value the Rust measurement. Where a graph consists of two parts, the leftmost part is used.

Another hypothesis is that the Rust scheduler contains a loop in which every iteration takes about 250 clock cycles, which would produce the banding in the measurements. Both schedulers use a loop to decide where lower priority tasks belong on the ready list when a higher priority task preempts them, so the question remains why the C scheduler does not show similar banding. This hypothesis has not been tested.


CHAPTER 5

Discussion and conclusion

The results show that the determinism of the Rust scheduler is comparable to the original C scheduler, with some minor differences. The start time of a task is more evenly distributed in the Rust scheduler: the original C scheduler has fewer measurements close to the bounds than the Rust scheduler. Despite these differences, the results show that Rust is deterministic enough to replace C for simple programs. However, the current implementation is not complex enough to extrapolate these results, because it does not use the safety features of Rust. The question remains whether these results carry over to more complex programs.

In this thesis, the safety features of Rust have not been thoroughly exercised. The interoperability between C and Rust is unsafe, which means that every current operation contains an unsafe part, because the scheduler is invoked from C. Building an RTOS from scratch in Rust could eliminate these unsafe parts. Additionally, the Rust scheduler can never encounter dangling references: a thread, once created, exists until the RTOS shuts down. One common class of C errors that Rust could solve, dangling pointers, is therefore never tested, since all references are static by nature and there is nothing to dangle.

Using the safety features of Rust should not significantly impact the deterministic performance of the program. Most of Rust's safety checks are done at compile time, and the run-time overhead of the remaining ones is expected to be negligible. This makes it highly probable that it is possible to build an RTOS in Rust with performance competitive with C or C++ RTOSes. Further research is required to verify that writing an RTOS in Rust is feasible.


Bibliography

[1] Nicholas D Matsakis and Felix S Klock II. The Rust language. Ada Lett., 34(3):103–104, October 2014.

[2] Abhiram Balasubramanian, Marek S Baranowski, Anton Burtsev, Aurojit Panda, Zvonimir Rakamarić, and Leonid Ryzhyk. System programming in Rust: beyond safety. ACM SIGOPS Operating Systems Review, 51(1):94–99, 2017.

[3] Tock embedded operating system. https://www.tockos.org/.

[4] Redox - Your next(gen) OS. https://www.redox-os.org/.

[5] Jane WS Liu, Ajit Narayanan, and Quan Bai. Real-time systems. Citeseer, 2000.

[6] Joachim Wegener, Harmen Sthamer, Bryan F. Jones, and David E. Eyres. Testing real-time systems using genetic algorithms. Software Quality Journal, 6(2):127–135, 1997.

[7] Miguel Masmano, Ismael Ripoll, Alfons Crespo, and Jorge Real. TLSF: A new dynamic memory allocator for real-time systems. In Real-Time Systems, 2004. ECRTS 2004. Proceedings. 16th Euromicro Conference on Real-Time Systems, pages 79–88. IEEE, 2004.

[8] Amit Levy, Bradford Campbell, Branden Ghena, Pat Pannuto, Prabal Dutta, and Philip Levis. The case for writing a kernel in Rust. In Proceedings of the 8th Asia-Pacific Workshop on Systems, page 1. ACM, 2017.

[9] Robert I Davis, Ken W Tindell, and Alan Burns. Scheduling slack time in fixed priority pre-emptive systems. In Real-Time Systems Symposium, 1993., Proceedings., pages 222–231. IEEE, 1993.

[10] Lui Sha, Ragunathan Rajkumar, and Shirish S Sathaye. Generalized rate-monotonic scheduling theory: A framework for developing real-time systems. Proceedings of the IEEE, 82(1):68–82, 1994.

[11] C. L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. ACM, 20(1):46–61, January 1973.

[12] Giuseppe Lipari. Earliest deadline first. Scuola Superiore SantAnna, Pisa-Italy, 2005.

[13] Bernhard HC Sputh and Eric Verhulst. VirtuosoNext: Fine-grain space and time partitioning RTOS for distributed heterogeneous systems.

[14] M. Vaidehi and TR Gopalakrishnan Nair. Multicore applications in real time systems. CoRR, abs/1001.3539, 2010.

[15] ARM. Cortex-M3 technical reference manual, ARM DDI 0337E, 2006. Also available at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0337e/DDI0337E_cortex_m3_r1p1_trm.pdf.
