Ethernet implementation in Clash


Supervisors:

dr. ir. A.B.J. Kokkeler
ir. H.H. Folmer
dr. ir. P.T. de Boer
dr. ir. J. Kuper

Computer Architecture for Embedded Systems
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
P.O. Box 217
7500 AE Enschede
The Netherlands

Faculty of Electrical Engineering, Mathematics and Computer Science

Master Thesis


Rik (H.W.) Strijker

January 2021


Abstract

Connecting to the Internet is becoming more common by the day; sometimes users do not even know that a device is actively connected to the Internet. The way in which devices connect and communicate on the Internet is laid down in so-called Requests for Comments (RFCs). The RFCs are constantly updated to fix security issues or to keep up with the number of devices connected to the Internet, by introducing new Ethernet standards to be adopted by the Institute of Electrical and Electronics Engineers (IEEE).

The growing number of devices that connect to the Internet and communicate via an Ethernet protocol makes the use of Application-specific integrated circuits (ASICs) and Field Programmable Gate Arrays (FPGAs) more common. In general, ASICs and FPGAs are used for tasks in which large amounts of data are processed using the same steps. Many Ethernet standards have been implemented on an FPGA to route messages to the correct receiver, or even to handle (parts of) the communication and send a complete reply. Although large parts of the communication can be handled by the FPGA, there is no complete Ethernet stack that supports multiple protocols and is still flexible enough for future updates of the RFCs.

This project explores the possibilities of making a flexible Ethernet stack using Clash. Designing a complete Ethernet stack is too big a task to complete within one project; the stack should therefore be easy to extend and update in the future. Security plays a crucial role during development: it should be possible to detect which protocol is sent and to filter all information according to the latest RFCs. The implementation within this project focuses on both Internet Protocol version 4 (IPv4) and Internet Protocol version 6 (IPv6), although other Ethernet types, for example EtherCAT, can be added as well. The Ethernet stack is made and tested in Clash up to and including the Dynamic Host Configuration Protocol (DHCP).

The implementation of this project can be used to send and receive messages in both IPv4 and IPv6 using the User Datagram Protocol (UDP). An important part of IPv6 is formed by the extension headers that can be added to a message. The implementation can find extension headers in a received message even when the extension is not implemented in the system. To keep the implementation small, custom bit encoding is used for the Ethernet type, the IPv6 extensions and the DHCP options. For extensions and options with a fixed length, this length is used in the data constructor of the corresponding option or extension.


Acknowledgments

Now that I have finished my MSc thesis, I would first of all like to thank my supervisors ir. Hendrik Folmer from the University of Twente and dr. ir. Jan Kuper from QBayLogic B.V. for their patience, for their feedback on my work and for guiding me during this project. Thank you for the time you have invested in me during this process.

I would also like to take this opportunity to thank my wife Angela and my parents for reading my thesis over and over, to find all spelling mistakes, to suggest accurate wording and to find all correct words used in the wrong phrases and vice versa. As a person with dyslexia I could not do without such support. Without their feedback this thesis would be much harder to read and to understand. I would also like to thank the graduation committee members dr. ir. André Kokkeler and dr. ir. Pieter-Tjerk de Boer for their valuable and friendly feedback on this thesis. I would also like to thank the team working at QBayLogic B.V. for providing valuable advice during this project. I have had a great time working at QBayLogic B.V., where I learned how to use Clash and learned about some of the problems one has to face in compiler technology.

Finally, I would like to thank Albert Steendam, my remedial teacher a long time ago. He has not assisted directly with this thesis, but without his efforts I would never have successfully finished my school career and my Master's study. Without his help I would not have been able to read or write in English or even Dutch.


List of acronyms

ASIC Application-specific integrated circuit
ARP Address Resolution Protocol
CPU Central Processing Unit
DHCP Dynamic Host Configuration Protocol
DNS Domain Name System
ECN Explicit Congestion Notification
ESP Encapsulating Security Payload
FPGA Field Programmable Gate Array
HDL Hardware Description Language
HLS High Level Synthesis
HOPOPT Hop-by-Hop Options
LSB Least Significant Bit
ICMP Internet Control Message Protocol
IEEE Institute of Electrical and Electronics Engineers
IANA Internet Assigned Numbers Authority
I/O inputs and outputs
IP Internet Protocol
IPv4 Internet Protocol version 4
IPv6 Internet Protocol version 6
IRT Isochronous Real-Time
MAC Media Access Control
MMU Memory Management Unit
MPU Microprocessing Unit
MSB Most Significant Bit
PHY physical layer
RFC Request for Comments
OSI Open Systems Interconnection
TCP Transmission Control Protocol
UDP User Datagram Protocol
VHDL Very High Speed Integrated Circuit Hardware Description Language
VLAN Virtual LAN


Chapter 1

Introduction

1.1 Problem statement

In the modern world, connecting all sorts of electronic devices to a network has become more important: not only computers and telephones, but also freezers, lamps and sensors in the soil of a plant are connected to the Internet. This is the case not only in home or office environments but in industry as well. An example from industry is a valve that controls cylinders and can report when the cylinders need maintenance. There are multiple new industrial Ethernet protocols used in machines to read or write the state of a motor and other inputs and outputs (I/O) [1]. Industrial Ethernet is a special kind of Ethernet used in machines, which uses the same principles and cables as regular home or office Ethernet. The above implies more complex development and requires flexibility for later security updates. With the large growth in the number of devices connected to the Internet, and with more companies making products that can connect to these networks, it is important to stay ahead of competitors. This means that the time between development and shipment to the customer has to be even shorter. This so-called ‘time to market’ makes it more important to have a well-tested and flexible network stack within a project.

There are multiple chips that can connect to a network. Examples are the Central Processing Unit (CPU) in a PC, or a Microprocessing Unit (MPU) within embedded systems. In both systems the largest part of the Ethernet stack is handled by software. It is easy to update parts of the software via the operating system running on the processor. This can result in an easy-to-configure and well-updated Ethernet application. There is one practical problem when using a processor: on a single processor there can be interrupts, user applications and an Ethernet stack, all running at the same time. These processors are not suitable for handling multiple tasks simultaneously. The network process therefore becomes less predictable and can be interrupted at any point.

For most applications interrupts are not a problem. They can be a problem, however, for industrial networks and applications, because lots of data is processed and predictability is not guaranteed. To avoid these problems the CPU can be replaced by a so-called FPGA. On an FPGA, hardware can be configured for a certain task, for example to process Ethernet frames. Data within an FPGA is often processed in parallel, especially in systems with time-critical data. One example of industrial Ethernet that is often implemented on an FPGA is EtherCAT [2].

To configure an FPGA a Hardware Description Language (HDL) is used, either written directly for an FPGA or generated from another language. A benefit of HDL is full control over the hardware placement, but it results in more complex code. For large projects this complexity makes the code difficult to maintain or expand. The alternative is to use a High Level Synthesis (HLS) language to generate HDL. Using HLS increases readability by adding a level of abstraction from the hardware.

Generally an HLS language, like C, is not intended only for hardware design, which sometimes leads to complex hardware realizations. To design a multi-protocol network stack running on an FPGA that is generic and easy to expand, HDL is not suitable. The HLS language must give control over the bytes on the FPGA to provide a flexible and well-tested communication stack.

Clash, on the other hand, offers the possibility of using embedded languages as data types, a feature that seems useful for expressing protocols. Clash's type system provides the tools to create an embedded language that is easy to test and expand [3]. Clash gives full control over the generated hardware, which prevents complex hardware realizations. The ultimate goal is to investigate the possibilities for data exchange that can support multiple Ethernet-based protocols. Processing an Ethernet frame on an FPGA has been done before; this thesis, however, investigates a generic, expandable and testable multi-protocol solution.

The desire for a full and extendable multi-protocol solution is widely recognized, but to the author's knowledge only partial solutions exist. Hence, it is worthwhile to investigate whether Clash offers a suitable approach for a possible solution. The research of this thesis is carried out within QBayLogic B.V., the developers of the Clash compiler.

1.2 Research questions

In the foregoing it is stated that using Clash to read an Ethernet frame could be beneficial. This leads to the central question of this research:

What (dis)advantages will the Clash type system offer when investigating the implementation of a flexible, expandable and generic multi-protocol network interface on an FPGA?

In support of the main question the following sub-questions will be addressed.

• Is the type system of Clash suitable for building a network stack?

• Can IPv6 extension headers be handled reliably on an FPGA using Clash?

• In what way does the Clash type-checker help to handle Ethernet frames?

• How can similarities between protocols be handled in the Clash type system?

To answer the questions above, a proof of concept of an Ethernet stack has to be implemented in Clash. This system should be designed in such a way that new protocols can be added easily. Important elements in the design are the IPv6 extension headers: in the future, new extension headers or patches for security problems will be released. Chapter 4 contains the analysis of the Clash implementation, where the components needed to make an Ethernet stack are described.

Chapter 2

Background

A large number of Ethernet protocols are used in different applications. In the world of embedded systems, an increase in Ethernet use can be seen in all sorts of operational fields [4]. This leads to more Ethernet solutions, not only on CPUs but on FPGAs as well. The increase holds for known protocols and for industrial applications. With the increasing connectivity and the (re)use of standard building blocks, there is a greater need for an easy-to-configure and well-tested Ethernet stack.

Every message sent over, for instance, an Ethernet wire should comply with the RFC standard. This is the basis of Ethernet communication and describes the layout for every message and protocol. One model used to describe the position and functionality of different protocols is the Open Systems Interconnection (OSI)-model. The OSI-model is used to describe how a computer and a server communicate with each other. Ethernet communication is possible on an FPGA; to describe this in Clash, knowledge about the type system is needed.

The first part of this chapter discusses the basic functions used when visiting a website. The second part describes how the OSI-model works, and describes the difference between home and office Ethernet and industrial Ethernet. The last part contains the features in Clash that can be helpful building blocks in designing a parser.

2.1 Ethernet network

To understand how a computer is connected to a network, take the following example. Someone uses a computer to visit a website, for instance example.com, and types http://example.com/ in the address bar. The computer has to find out where the server is located (at which IP-address), and has to make a connection in order to get data back to be shown on the screen. Before doing so, the device must have an IP-address.

2.1.1 Getting an address

When a device connects to a network, wired or wireless, it will try to get an address, for example an IPv4-address, to identify the device within the network. When one or more DHCP-servers, which are used to assign IPv4-addresses, are present in this network, the device will use a DHCP-server. The device and the server send messages to each other to agree on the IP-address at which the device will live. This IP-address is a numeric address. The DHCP-server can be seen as a municipality where one has to register when moving into a new house without an address. The municipality takes the registration and assigns a unique address. In order to do so, it needs one's personal code to know who will live in the house. For devices the same method is used; the personal code for devices is called the Media Access Control (MAC)-address. This is a globally unique number stored in the network adapter of the computer. When all registration is done and the addresses are known, the real browsing can start.

To get an IPv4-address from the DHCP-server, the device sends a message containing its MAC-address to the DHCP-server. The DHCP-server then assigns an IPv4-address to the device.
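The registration described above can be illustrated with a small Python sketch. It is a toy model and not the DHCP protocol itself: a real DHCP exchange consists of several messages, whereas here a single call stands in for the whole negotiation. The class name `ToyAddressServer`, the address range and the MAC-addresses are assumptions made for this illustration only.

```python
import ipaddress


class ToyAddressServer:
    """Hands out IPv4-addresses from a pool, keyed by MAC-address (toy model)."""

    def __init__(self, network: str):
        # hosts() skips the network address and the broadcast address.
        self._free = list(ipaddress.ip_network(network).hosts())
        self._leases = {}  # MAC-address -> assigned IPv4-address

    def request(self, mac: str):
        # A device that asks again keeps the address it already holds.
        if mac in self._leases:
            return self._leases[mac]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[mac] = address
        return address
```

In the municipality analogy, the MAC-address plays the role of the personal code and the returned value is the assigned house address; asking twice with the same MAC-address yields the same IPv4-address.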

2.1.2 Find the website

Let us go back to the web browser: typing in http://example.com/ will display a website. The address is plain text; the device, however, needs a numeric address to find the server before it can send a message. Figure 2.1 shows a flowchart of this process. The computer is connected to the Internet through a router. The device first sends a message to the Domain Name System (DNS)-server, via the router. The router contains a list of connected devices or servers and routes messages to their destination. This is comparable to a logistics centre where packages are collected and brought to the correct truck via some complex system of conveyor belts. The truck delivers the message, sometimes via different logistics centres.

Finally the textual address can be converted into a numeric one; this is done in the DNS-server, "the phone book" of the Internet. This number travels back to the device, over a route via some routers, and arrives some time later back at the left side of the figure. Now all information needed by the device is present and can be used to load http://example.com/.

This process can be more complex; sometimes the DNS-server does not know where to find the webpage one is looking for. In that case it gives the address of a server which has more information.

On the other hand most devices have their own list of addresses for servers that are most often used, like the list of contacts in a mobile phone. The numbers that are most often used are saved with some textual representation such as the name of a person.
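The lookup order described above (first the device's own list, then a DNS-server, which may refer to another server) can be sketched as follows. This is a simplified model, not an implementation of the DNS protocol; the function `resolve`, the server names `dns-a` and `dns-b`, and the address `192.0.2.10` (taken from a range reserved for documentation) are hypothetical.

```python
def resolve(name, local_cache, servers, first_server):
    """Return (address, servers_asked); address is None when the name is unknown.

    servers maps a server name to a table: name -> ("address", ip) or
    name -> ("referral", other_server).
    """
    # The device's own list is consulted first, like contacts in a phone.
    if name in local_cache:
        return local_cache[name], []

    asked = []
    server = first_server
    # Follow referrals until an address is found; never ask a server twice.
    while server is not None and server not in asked:
        asked.append(server)
        kind, value = servers.get(server, {}).get(name, ("unknown", None))
        if kind == "address":
            local_cache[name] = value  # remember the answer for next time
            return value, asked
        server = value if kind == "referral" else None
    return None, asked
```

A second lookup of the same name is answered from the local list without contacting any server, which is the behavior the contact-list analogy describes.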

Figure 2.1: Sending a DNS request, with the steps and time. (Nodes along the time axis: Device, Router, DNS, Router, Device.)

2.1.3 TCP request

To simplify the description of the process in the following example, the routers will not be mentioned any more. A website is loaded via a so-called Transmission Control Protocol (TCP) request. The flowchart in Figure 2.2 shows what happens over time (without showing the details). First the device sends a request, a so-called synchronize (SYN) packet, which results in an acknowledgment (SYN-ACK) from the server. Some time later the device acknowledges the connection (ACK), the requested information is sent back to the device to be shown to the user, and the device sends an acknowledgment back to the server. Most servers send data in bits and pieces, which have to be reassembled in the device to form one webpage.
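The device side of this exchange can be written down as a small state machine. The sketch below is a strong simplification under stated assumptions: segments are represented by plain strings, sequence numbers, timeouts and connection teardown are left out, and the names `ClientState` and `client_step` are chosen for this illustration.

```python
from enum import Enum, auto


class ClientState(Enum):
    CLOSED = auto()
    SYN_SENT = auto()
    ESTABLISHED = auto()


def client_step(state, received):
    """Return (next_state, segment_to_send) for the device side."""
    if state is ClientState.CLOSED and received is None:
        return ClientState.SYN_SENT, "SYN"          # open the connection
    if state is ClientState.SYN_SENT and received == "SYN-ACK":
        return ClientState.ESTABLISHED, "ACK"       # acknowledge the server
    if state is ClientState.ESTABLISHED and received == "DATA":
        return ClientState.ESTABLISHED, "ACK"       # acknowledge received data
    return state, None  # anything unexpected is ignored in this sketch
```

Feeding the function `None`, `"SYN-ACK"` and `"DATA"` in turn reproduces the device's three transmissions in the description: SYN, ACK and ACK.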

The description above is a basic example of a server-client system, all connected via routers. This type of network is request driven: servers only send data at a client's request. There are innumerable servers and clients all over the world. Not all servers should be reachable for all devices; to protect networks a firewall is used. Access to the network is organized with blacklisting and whitelisting in the firewall: all devices on the blacklist are blocked, and only the ones on a whitelist can get access. Network operators manage this firewall.

Figure 2.2: Sending a TCP request, with the steps and time. (Messages along the time axis: Device to Server: Request (SYN); Server to Device: SYN-Ack; Device to Server: Ack; Server to Device: Data; Device to Server: Ack.)

2.1.4 Master Slave Network

Server-client networks are not the only network topology; there are also Master-Slave systems. In this type of network there is one Master connected to one or more Slaves. This system is primarily used in industrial system control. In contrast to a server-client network, the Master-Slave topology has a cyclic behavior. When the communication starts, the Master first discovers the number of Slaves, their physical location in the network, and how much data each Slave needs and can send back.

Take for example a simple system containing one Master and two Slaves, as in Figure 2.3. The first Slave (in the middle) has 3 inputs and 5 outputs; the second has 4 inputs and 2 outputs. The Master discovers the amount of data needed by the Slaves before the communication starts, and constructs the message in such a way that both Slaves have exactly the space they need.

Using the example with 7 inputs (3 and 4 for Slave 1 and 2 respectively) and 7 outputs (5 and 2 for Slave 1 and 2 respectively), one would expect the message to have a length of 14 bits, because this is the total amount needed. However, a Slave first reads its output data and then fills in its input bits. This is possible because a Slave always sends the same amount of data and cannot move to another position when sending data. This is an advantage for the length of a message, because the bits for the outputs are reused for the inputs. The amount of data per Slave is the maximum of the number of inputs and the number of outputs of that Slave; the larger number determines the bits needed for one Slave. This results in 5 and 4 bits for Slave 1 and 2 respectively, because Slave 1 needs at least 5 bits to represent all outputs and Slave 2 needs 4 bits to represent all inputs. Hence the total length is 9 bits instead of 14. The flowchart in Figure 2.4 shows that the Master sends a message to Slave 1, which takes out its outputs, fills in the input fields of the message and forwards it to Slave 2. Slave 2 drains its outputs from the message, fills in its input fields and sends the message back to the Master via Slave 1. On the way back Slave 1 does not update or add any fields, and has therefore been left out of the figure.
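The bit count of this example can be checked with a few lines of Python. The sketch also models how a Slave reuses its slot: it reads its outputs and then overwrites the same bits with its inputs. The function names, the slot layout (Slave 1 first, unused bits set to 0) and the bit values are assumptions made for this illustration; they are not taken from an actual industrial protocol.

```python
def frame_bits_separate(slaves):
    """Frame length when inputs and outputs each get their own bits."""
    return sum(n_in + n_out for n_in, n_out in slaves)


def frame_bits_shared(slaves):
    """Frame length when each Slave reuses its output bits for its inputs."""
    return sum(max(n_in, n_out) for n_in, n_out in slaves)


def pass_through_slave(frame, offset, width, n_out, inputs):
    """Read n_out output bits from the slot, then write the inputs into it."""
    outputs = frame[offset:offset + n_out]
    slot = list(inputs) + [0] * (width - len(inputs))  # pad unused bits
    new_frame = frame[:offset] + slot + frame[offset + width:]
    return outputs, new_frame


slaves = [(3, 5), (4, 2)]           # (inputs, outputs) for Slave 1 and Slave 2
print(frame_bits_separate(slaves))  # 14
print(frame_bits_shared(slaves))    # 9
```

Passing a 9-bit frame through Slave 1 (slot of 5 bits at offset 0) and then Slave 2 (slot of 4 bits at offset 5) leaves the frame length unchanged, which is why the Master can keep circulating a message of fixed size.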

Back at the Master all information is processed and a new state is calculated, after which the process is repeated all over again. This differs fundamentally from the server-client model, where a request has to be sent to every server from which information is wanted. http://example.com/ will never send data to a client that did not ask for it. A client only asks for data when needed; in the Master-Slave network the decision to ask for updates is made by the Master, and even if there is no update, messages will still be sent.

Figure 2.3: Message flow in a Master-Slave system; there is always one Master and there can be multiple Slaves. (Nodes: Master; Slave 1: 3 bits in, 5 bits out; Slave 2: 4 bits in, 2 bits out.)

Figure 2.4: Flowchart of a message sent over a network containing one Master and two Slaves; nodes are only shown when they process the data of a message. (Nodes along the time axis: Master, Slave 1, Slave 2, Master.)

2.1.5 Ethernet hierarchy

There are many different ways to send data over a network cable or wirelessly, using all kinds of setups. This research focuses only on Ethernet frames. However, the data is not necessarily processed in one stack. A leading part of this research is how to combine the different protocols used on the Ethernet wire in one stack. Every device connecting to a network uses (parts of) the OSI-model. This includes clients, servers and routers. The OSI-model describes, in seven layers of abstraction, how a device should interpret a message.

2.2 OSI-model

The OSI-model has seven layers: Physical, Data Link, Network, Transport, Session, Presentation and Application [5]. In Figure 2.5 the OSI-model is depicted on the left and an example on the right; both are explained below. The OSI-model is divided into three parts. The upper three layers are application dependent and are often taken care of by specific applications. The lowest two layers are used by all protocols: data always travels from one point to another, so a Physical layer is always present. Within the Data Link layer the basic functionality is always used; sometimes this layer is extended to get higher performance. For example, in EtherCAT the Data Link layer is used to make a connection between the Physical and Application layer: the Data Link layer handles the content of an incoming message and then constructs the outgoing message by filling it with application data.

The OSI-model is best described by the following example. Take the Application layer, where a letter is sent to someone. A message is written and the name of the receiver is included. An assistant (the Presentation layer) encrypts the letter to prevent others from reading it and then gives it to the mail department. All letters are sent in a box to make sure that the paper is not damaged during transport and delivery (Session layer). When sending a box, it should be certain that the receiver is at home, because there are no neighbors to take care of it. The Session layer checks this before sending the box.

All information needed by the mail department is sent via the mail company, which has a delivery point where the package can be handed in and where it is weighed to select the best service (Transport layer). There are two scenarios: either the message is sent really fast, but parts of the letter may be lost, or it is sent more slowly and more reliably, where after every page the receiver has to send an acknowledgment back to make sure that every page has been received.

The message goes to the logistics center to get it into the correct truck for delivery (the Network layer); there the name is replaced with a barcode (IP-address). The truck selected by the logistics center can be too small for the message, in which case it is cut in pieces to be reassembled at the receiver.

Next the mail man puts the packages in his truck (Data Link layer) and adds the complete address of the sender and receiver (MAC-address). The truck brings the message to the receiver, where all steps are repeated in reverse order.

The above works really well for loading a webpage or sending an e-mail. For streaming applications the checks are less important: the sender will not check whether you are still at home, it will just keep sending until it gets a letter back saying you moved away.

As discussed earlier, not all protocols will make use of all layers. Within some solutions where a low latency is a requirement, like industrial protocols, it is common to omit layers [6].
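The idea that every layer adds its own information on the way down, and that the receiver repeats the steps in reverse order, can be sketched in a few lines. The textual headers used here, such as `[network]`, are placeholders invented for this illustration; real protocols use binary headers with addresses, lengths and checksums. A protocol that omits layers would simply use a shorter `LAYERS` list.

```python
# Layers a message passes on its way down to the wire, in sending order.
LAYERS = ["transport", "network", "data link"]


def send(payload: bytes) -> bytes:
    """Wrap the payload: each lower layer puts its header in front."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode() + payload
    return payload


def receive(frame: bytes) -> bytes:
    """Unwrap the frame: the same steps as send, in reverse order."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame
```

The outermost header belongs to the layer closest to the wire, so `send(b"hello")` starts with the data link header and `receive(send(b"hello"))` returns the original payload.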

2.2.1 Industrial Ethernet

When the complete OSI-model is used (for example when browsing to a website), many checks are performed to make sure that all requested data has been sent and received correctly. These checks, and retrying when data packets get lost, take a lot of time and processing power. In fields where the amount of data is large and time is critical (for example streaming audio or video), a delay can make it difficult to have a conversation, whereas missing a frame is not a critical problem. As long as the majority of frames arrive on time, the conversation can still be understood. Sometimes it is important to have no delay at all (for example in industry), and missing a frame can be catastrophic when an emergency stop is set in that frame. To make the communication more robust, the Master sends an update every millisecond, where most of the content of this message stays the same. Because of this redundancy in the messages, fewer checks are needed to still guarantee the arrival of data.

Figure 2.5: The seven layers of the OSI-model. (Left: Physical, Data Link, Network, Transport, Session, Presentation, Application. Right, the example: Truck, Mail man, Distribution center, Post office, Mail department, Assistant, Letter.)

Fewer checks and more speed are the reason to use fewer OSI layers. Let us go back to the example of sending a letter. Normally you would like to know that it is delivered properly. But suppose you need an ambulance and have to order it via a letter. In that case it is less important how the data is represented, as long as there is a guarantee that it will be delivered, because in fact you only want to send an SOS. It does not matter if anyone between you and the receiver can read the content of the message. You run to the mail man and give him the message; if needed, the mail man cuts it in pieces. Because of the hurry he even writes the address on the envelope for you and puts it in the truck. The downside of this approach is that you will never know whether the message has been received. But you cannot wait for the receiver to answer your letter, because the ambulance is really needed. You send the same letter over and over again until the ambulance has arrived. After all this stress you are in need of a coffee. You send a letter in the same way, asking for coffee, and keep resending it until the coffee arrives. Again no confirmation is required that the letter has been received, because at some point you will get a response in the form of a cup of coffee. This is what happens in high-speed industrial Ethernet protocols. All time-consuming parts of sending information have been cut away. Messages are short, mostly a set or reset, or for example a value like a temperature. Messages always travel a short physical distance, most of the time within a machine or factory, i.e. within some (kilo)meters, and in any case not to a server at the other side of the world.

2.2.2 Properties of layers

In the Physical layer there are only variations in speed: a message could be sent over a fibre optic line (for example a sports car instead of the truck) or an old telephone line (for example walking instead of using the truck), but it is still the same data; only the representation differs. In the Data Link layer a few variations are possible: most packets will be in an Ethernet frame (the mail man could be a woman as well), and only the packet inside a frame differs. The Network layer, however, offers many different implementations; there are all sorts of systems that use an Ethernet frame for sending data. In the OSI-model, however, only IPv4 and IPv6 are used (there are many different logistics centers, specialized in different types of goods). In the Transport layer there can be multiple ways to transmit data; UDP and TCP are well-known examples of protocols on the Transport layer. Above the Transport layer, shown in Figure 2.5, processing is more complex and is done by applications that interpret the data.

The Physical and Data Link layer do not vary a lot in terms of implementation; the same basic hardware components are needed in an FPGA. In the implementation of the Network and Transport layer large variations are possible. This is why these layers are placed between horizontal bars in Figure 2.5. However, some protocols extend the Data Link layer, for example industrial Ethernet protocols like EtherCAT. This is shown in Figure 2.6, where the three layers used by EtherCAT are depicted. The Physical layer is exactly the same as in a standard OSI implementation, but the Data Link layer has a loop-back function to create a ring bus structure, and it has a synchronization mechanism [7]. The next layer used in EtherCAT is the Application layer; this layer provides the user with data or switches the physical I/O. This is only one example of an implementation of the OSI-model in which layers are omitted. The same type of modifications are made in the PROFINET Isochronous Real-Time (IRT) protocol. In most network implementations on FPGAs only the bottom four layers of the OSI-model are used [8], [9]. From the Session layer upwards, deciding what to do with the data is a user or application task; in most systems this is done by a processor.

Although there are lots of services using different protocols within Ethernet, they can all coexist on the same medium and most of them can be processed on the same hardware. Modern computers will have no problem with these kinds of multi protocol networks, with the correct software they can even be used as an industrial Ethernet Master.
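Different protocols can coexist on the same medium because every Ethernet frame carries a two-byte type field (the EtherType) directly after the destination and source MAC-addresses. The Python sketch below reads this field and names the protocol. It only illustrates the principle and is not the Clash implementation of this thesis; the function names are chosen for this example, the table is deliberately incomplete, and VLAN-tagged frames are not handled.

```python
import struct

# A few registered EtherType values; the table is intentionally small.
ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
    0x88A4: "EtherCAT",
    0x8892: "PROFINET",
}


def parse_ethernet_header(frame: bytes):
    """Split a frame into destination, source, EtherType and payload."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # 6 bytes destination, 6 bytes source, 2 bytes type, network byte order.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]


def classify(frame: bytes) -> str:
    """Name the protocol carried in the frame, or 'unknown'."""
    _, _, ethertype, _ = parse_ethernet_header(frame)
    return ETHERTYPES.get(ethertype, "unknown")
```

Supporting an additional protocol in this sketch means adding one entry to the table, which mirrors the goal of a stack in which other Ethernet types can be added later.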

Figure 2.6: The OSI-model used in EtherCAT communication. (Layers: Physical, Data Link, Application; labels in the figure: Cable, Loop back, Process Data, I/O or CPU; analogy: Truck, Mail man, Letter.)

2.3 Aspects of Ethernet

The OSI-model offers a way to describe network communication in seven layers of abstraction. The model is used in all kinds of devices serving different needs. The model is generic; delays and security are not embedded in the OSI-model. However, when making an Ethernet stack it is important to think about security and delays in advance. There is no description of a protocol in the OSI-model; all protocols are described in so-called RFCs. The RFCs are important guidelines when developing an Ethernet stack. The delays, security and RFCs mentioned above are described in more detail below.

2.3.1 Delays

As said above, delays in Ethernet are generally not a real problem; computers are really good at general Ethernet communication. However, in specific applications a standard (embedded) computer introduces undesirable delays or has a large power consumption [8]. This is one of the reasons for using an ASIC or FPGA in industrial applications. There are integrated microcontroller cores for EtherCAT; such a core keeps a connection with the Master. The EtherCAT core provides real-time processing of industrial protocols, although there is no clear definition of real-time Ethernet. There are calculations for the expected delays. So when an implementation is made on an FPGA, it is important to keep the delays in industrial Ethernet small.

There are different ways to generate hardware on an FPGA, depending on the manufacturer. Very High Speed Integrated Circuit Hardware Description Language (VHDL) and Verilog are two widely used languages for FPGAs. They are used to design large systems at bit level. It is less convenient to use them for data-flow problems. Data-flow modeling on an FPGA is easier when done in a language like Clash.

2.3.2 Security

One way to keep intruders out of a network is by using a firewall, which has to be configured so that not all messages can enter the network. Recent research [10] revealed problems regarding the correct configuration of firewalls in combination with IPv6. More details about IPv6 can be found in Section 4.2.3. This kind of misconfiguration can lead to a network being hacked. Especially with more Internet users, getting past a firewall is more beneficial for hackers.

When a problem is found in the guideline for implementing IPv6, a new RFC is published. RFC’s are documents describing how a sender and receiver should behave within a network; this is explained further in Section 2.3.3. Writing a new RFC takes time, and it has to be implemented in all running devices to make sure that the security problem has been solved. In practice this never happens, because the scale of the Internet has become too large. When there is an implementation mistake in the FPGA or when a new RFC is published, an update of the FPGA is required. Updating an FPGA at run-time is not common for various reasons, so a mistake in the Ethernet core will be hard to fix. Therefore it is better to use a design language that can be easily verified and tested. A possible design language that has these characteristics is Clash. By using Clash instead of HDL, most security and testing problems should be easier to tackle: with the strong type checker of Clash, faulty or unknown extension headers can be discarded. To add a new extension header to the Ethernet stack only the type should be extended; all other fields can be packed in the same type. These techniques can be a step towards a more robust Ethernet implementation using Clash compared to traditional FPGA languages.

2.3.3 RFC

The RFC’s are important for implementing an Ethernet stack, while Ethernet itself is an IEEE standard. Most of the recent RFC’s are published by the Internet Engineering Task Force. The RFC’s are written and reviewed carefully; however, there is always a possibility of mistakes in (parts of) protocols [11]. Every message can be seen as a frame built using the OSI-model: every layer has a number of possible protocols, and there will be an RFC describing the function and layout of each. To send a frame multiple layers are used, so there is always more than one RFC involved. Not all protocols have their own RFC; some are based on multiple RFC’s [12].

2.4 Ethernet frames

For every website we visit, e-mail we send or when starting the computer, parts of the OSI-model are used. The Physical layer of the OSI-model (at the bottom of Figure 2.5) is the layer where a message is sent over a medium, either wired or wireless. To get data from this connection an ASIC is needed.

Before sending data the ASIC, known as the PHY, has to be conﬁgured. Building a physical layer (PHY) on an FPGA is not possible without using other electronics. When the conﬁguration of the PHY is done, all data given to the PHY will be sent over the medium. The PHY and cables (or antennas) together form the Physical layer. In Section 2.2 the Physical layer is compared with the truck transporting a letter. From this wire the FPGA will receive an Ethernet frame, the structure is shown in Figure 2.7.

The Ethernet frame is passed to the Data Link layer to be processed. The Ethernet frame starts with an Ethernet header followed by the payload; in Figure 2.7 an IPv4 or IPv6 packet is used as an example. The "Payload (IPv4/6)" can again be split into a header (IPv4/6 Header) and a payload (Payload (UDP/TCP)).

Interpreting the header is done in the Network layer; in Figure 2.7 this is the second line, starting with the IPv4/6 Header followed by a payload. Again, in the next layer a header is used to interpret the payload; the last line of Figure 2.7 shows this header and payload. This header is needed by the Transport layer to select the correct Session layer protocol sent in the payload of the Transport layer.

As can be seen, every layer needs information on how to interpret the data sent in the payload.

The first header of the Ethernet frame is needed to know where a message came from and where it should be sent to. Every frame sent over a wire starts with this header, regardless of the protocol used inside the frame. The header is always present in an Ethernet frame and is used in the Data Link layer. It is important because it identifies the protocol used in the Payload.

The Payload is where the real data is stored.
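The split between the Ethernet header and its payload can be sketched in a few lines. The block below is an illustration in plain Haskell rather than Clash, so it runs with GHC alone: lists of Word8 stand in for the fixed-size vectors a hardware design would use, and the names EthernetHeader, toEtherType and splitFrame are hypothetical, not taken from the implementation. It assumes a frame without preamble and without a VLAN tag, so the header is 6 bytes of destination address, 6 bytes of source address and a 2-byte EtherType.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word16, Word8)

-- Protocols this sketch recognises; every other value ends up in Unknown.
data EtherType = IPv4 | IPv6 | EtherCAT | Unknown Word16
  deriving (Eq, Show)

-- The 14-byte Ethernet header: destination, source and EtherType.
data EthernetHeader = EthernetHeader
  { destination :: [Word8]
  , source      :: [Word8]
  , etherType   :: EtherType
  } deriving (Eq, Show)

-- Map the 16-bit EtherType field to a constructor.
toEtherType :: Word16 -> EtherType
toEtherType 0x0800 = IPv4
toEtherType 0x86DD = IPv6
toEtherType 0x88A4 = EtherCAT
toEtherType other  = Unknown other

-- Split a frame into header and payload; Nothing when it is too short.
splitFrame :: [Word8] -> Maybe (EthernetHeader, [Word8])
splitFrame bytes
  | length bytes < 14 = Nothing
  | otherwise         = Just (EthernetHeader dst src ty, payload)
  where
    (dst, rest)        = splitAt 6 bytes
    (src, rest')       = splitAt 6 rest
    (tyBytes, payload) = splitAt 2 rest'
    ty = case tyBytes of
      [hi, lo] -> toEtherType (fromIntegral hi `shiftL` 8 .|. fromIntegral lo)
      _        -> Unknown 0  -- unreachable after the length check
```

The payload returned by splitFrame is what the next layer would interpret, selected by the etherType field of the header.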


Figure 2.7: Stacking of protocols in an Ethernet frame. (First line, the Ethernet frame: Ethernet Header, Payload (IPv4/6). Second line: IPv4/6 Header, Payload (UDP/TCP). Third line: Header UDP, Payload (DHCP).)

2.5 Clash

FPGA’s are configured (also called programmed) using an HDL. There are different FPGA’s on the market, and different HDL’s. Two standardized HDL’s are VHDL and Verilog.

Processing an Ethernet frame on an FPGA has been done before, however never by using Clash. Using Clash could give a more flexible solution compared to a mainstream HDL: Clash has powerful abstraction mechanisms, which may give more freedom in designing and testing an implementation. When designing hardware on an FPGA it is efficient to reuse function blocks of hardware for the same calculation. This can only be achieved when the two parts of the system do not run at the same time; it is one way to get a small implementation on the FPGA. Similarities between layers can be used to achieve this goal, for example the Header_Checksum calculation of an IPv4 header and the checksum calculation of a UDP frame are equal and will never be calculated at the same time.
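The shared checksum calculation mentioned above is the Internet checksum: the one's complement of the one's complement sum of all 16-bit words. A minimal sketch in plain Haskell follows; the function names are hypothetical, and a list of Word16 stands in for the stream of words a hardware block would consume. For UDP the same arithmetic is applied, but over a pseudo-header, the UDP header and the data rather than over the IPv4 header alone.

```haskell
import Data.Bits (complement, shiftR, (.&.))
import Data.Word (Word16, Word32)

-- Fold the carries above bit 15 back into the low 16 bits.
foldCarry :: Word32 -> Word32
foldCarry s
  | s > 0xFFFF = foldCarry ((s .&. 0xFFFF) + (s `shiftR` 16))
  | otherwise  = s

-- One's complement of the one's complement sum of the 16-bit words.
-- The 32-bit accumulator is wide enough for any Ethernet-sized frame.
internetChecksum :: [Word16] -> Word16
internetChecksum ws =
  complement (fromIntegral (foldCarry (sum (map fromIntegral ws))))

-- An example IPv4 header with its checksum field (sixth word) set to zero.
exampleHeader :: [Word16]
exampleHeader =
  [0x4500, 0x0073, 0x0000, 0x4000, 0x4011, 0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7]
```

For exampleHeader the function returns 0xB861. A receiver can run the same function over the header with the checksum filled in; a result of 0 means the header is intact.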

It is common to design hardware for an FPGA using an HDL, from which the hardware for the selected FPGA is synthesized. There are different FPGA vendors; most of them make use of VHDL or Verilog to synthesize hardware. In this design process Clash can be used to get from a (mathematical) Haskell specification to hardware; this design flow is depicted in Figure 2.8. The scheme starts with a mathematical specification in Haskell, like an algorithm, that can be tested in isolation and as a complete system. The Haskell program is easily translated to Clash. The hardware behavior can be tested in Clash during the design, and from this program and the tests HDL is generated. The generated HDL can be either VHDL or Verilog; for both languages tools are available to test systems. The same tests written in Clash will run in the HDL as well and can be run using the vendor-specific tools. Then the HDL can be used to synthesize hardware, again using the tools provided by the FPGA vendor. Using this flow makes it easier to find mistakes in the algorithm, because all parts can be tested for functional correctness in isolation before being used as a system, whereas in HDL the design is about the timing in hardware.

Clash has the advantage of a strong type checker, where a function will only produce the specified type. In Ethernet this can be useful: for example, overlapping fields in IPv4 and IPv6 can be stored in the same type, which in hardware results in fewer registers. For example, time_to_live and hop_limit have the same practical use case.


Figure 2.8: Design flow of hardware when using the Clash language. (Flow, top to bottom: Haskell/Mathematical specification, Clash, Synthesize, Hardware.)

2.5.1 Types

The frames described in Section 2.4 can have different EtherTypes; this field will be used to pass only frames with a known protocol. This behaviour can be adopted in Clash. We can make a Clash type for Ethernet using the type value as an encoding for the data constructor; by applying this, the frame can be parsed without writing a parser. When a frame is unknown it will be discarded by the system using a wildcard. This behaviour helps to avoid the problems of wrongly parsing IPv6 described in Section 4.2.3.

The same structure will be used when handling options, the code used in the RFC to encode an option will be used in the data constructor as well. ’Packing’ the Code with the Length and Data will result in a complete parser. In the case of an unknown option the RFC speciﬁcation of dealing with this unknown option will be implemented as a fallback.
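The Code, Length and Data structure can be illustrated with DHCP options. The sketch below is plain Haskell, not Clash: the Option type and the functions parseOptions and toOption are hypothetical names, lists replace fixed-size vectors, and only two option codes are modelled (1 for the subnet mask and 53 for the DHCP message type). Code 0 (Pad) and code 255 (End) carry no length byte. Keeping the code and raw data of an unknown option is an assumption of this sketch; the fallback in a real design should follow the relevant RFC.

```haskell
import Data.Word (Word8)

-- One constructor per known option; the option code selects the constructor.
data Option
  = SubnetMask [Word8]           -- code 1, data is the mask (length not checked here)
  | MessageType Word8            -- code 53, one byte of data
  | UnknownOption Word8 [Word8]  -- fallback: keep the code and the raw data
  deriving (Eq, Show)

-- Turn a code and its data into an Option, with a wildcard as fallback.
toOption :: Word8 -> [Word8] -> Option
toOption 1  mask = SubnetMask mask
toOption 53 [t]  = MessageType t
toOption code raw = UnknownOption code raw

-- Parse code/length/data triples until the End option or the end of input.
parseOptions :: [Word8] -> [Option]
parseOptions []         = []
parseOptions (255 : _)  = []                 -- End option
parseOptions (0 : rest) = parseOptions rest  -- Pad option, no length byte
parseOptions (code : len : rest)
  | length body < n = []                     -- truncated option: stop parsing
  | otherwise       = toOption code body : parseOptions rest'
  where
    n             = fromIntegral len
    (body, rest') = splitAt n rest
parseOptions [_] = []                        -- code without a length byte
```

Adding support for a new option in this sketch takes one constructor in Option and one line in toOption; the traversal in parseOptions stays unchanged.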

There are some important types in Clash that keep all types at a fixed length; Vec is one of them, as it has a built-in length. The types often used in the realization of this Ethernet core are described below. The type system of Clash allows users to create their own types; when a new type follows a basic set of rules it can be translated to an HDL for programming the FPGA. Creating custom types makes the type system of Clash flexible, while the strictness of the type checker helps to keep the design synthesizable. It is part of this research to investigate whether the Clash type system is useful for protocol handling.

Maybe

One type used a lot in Haskell is the Maybe type, which can have the value Just a or Nothing. In Clash it is common to use this type when sending data in combination with a valid flag. For valid data Just data is used; when there is no valid data the sender will send Nothing. Using this data type makes creating a data bus much easier.
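The valid-flag use of Maybe can be sketched as follows. This is plain Haskell in which a list of samples stands in for a clocked signal; the names Bus, increment and validBytes are hypothetical and only serve the illustration.

```haskell
import Data.Maybe (catMaybes)
import Data.Word (Word8)

-- One bus sample per clock cycle: Just a byte when valid, Nothing when idle.
type Bus = Maybe Word8

-- A processing stage that changes valid samples and passes idle cycles on.
increment :: Bus -> Bus
increment = fmap (+ 1)

-- Collect the valid bytes seen on the bus, dropping the idle cycles.
validBytes :: [Bus] -> [Word8]
validBytes = catMaybes
```

Because the valid flag is part of the type, a stage such as increment cannot read the data of an idle cycle by accident.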


Undefined

One special value a type can have is undefined. In simulation this is represented by X; in hardware it means that there could be some voltage or none at all. When a system starts its state is unknown, so all lines are undefined; when a reset is applied the state becomes known and propagates through the system. There is one more case where an undefined value is used, namely when a type has more fields than there is data loaded into the system. Assume we know there are either 1 or 3 fields of valid information. This amount is determined by some number sent before the data: a kind of header telling the length. This results in a structure like the one depicted in Figure 2.9, a length field followed by 1 or 3 bytes of data.

This Input data type in Clash results in a type of 4 bytes plus one bit for encoding the data constructor, A or B. In HDL 33 bits will be allocated, even if the incoming data has length 1. Assume the information stream into the system comes in portions of 4 bytes; not all information has to be valid, as this depends on the first field. In this case the stream of data can be ’packed’ in this type. Let us describe two cases for the information stream, like the ones in Figure 2.10; in both, the valid fields have a box around them. The first example is a message with length 3 and data a, b and c. All fields contain valid information because the length is 3, so when packing this data all fields are used and stored with constructor B. The filled fields are shown at the top of Figure 2.11, where the length is set to 3, the constructor is B and the data is stored in the data fields. The information in the second case is the same, except that the length is 1, so only field a is valid. Only the length and data a are stored; the other data is not used. Two of the data fields then contain no valid data, so they are undefined (marked X in Figure 2.11). In the Clash simulation the values stored in these fields are not visible; in most HDL simulators they will be shown as an undefined value, represented by an X.

Figure 2.9: At the top, the input stream is depicted: a length field followed by data, drawn on a ruler of four bytes (bits 0 to 7 each). Next is the schematic representation of the data type Input, built out of two entries: A, with one length field (value 1) and one byte of data, and B, with one length field (value 3) and three bytes of data.

Custom encoding

Clash can ’pack’ bits into other data types, which facilitates transforming data from one data type to another. Packing data can be used in combination with custom encoding; this is an important step in designing an embedded language. For the last example in Section 2.5.1, where the Input data type can have a length of either 1 or 3 bytes, Clash will use 1 bit for encoding the constructor and 32 bits for the data, so a total of 33 bits. In this case the encoding will result in 0 = A and 1 = B. However, in the example the Input data type has a length field, and the value of this field is unique for every entry of the data type. There are two different lengths in this example, while the length field can encode up to 255 entries in the data type, so custom encoding can be used. The custom encoding results in a type of 32 bits where 1 = A and 3 = B. In this example only one bit is saved; however, when there are 255 entries in a type, 8 bits are saved. The second benefit is the direct translation from a bit pattern to a Clash type: when the pattern in the input has the value 3, constructor B will be used.

Figure 2.10: Two types of input data, one with a length of three and one with a length of one. (Rows, one byte per field: 3 | a | b | c and 1 | a.)

Figure 2.11: Undefined data stored in a variable. (Rows: B | 3 | a | b | c and A | 1 | X | X | a.)

Wildcard

When the custom encoding is used with fewer entries than the maximum number of possible encodings, all other codes (255 − 2 = 253 in the example) can be redirected to a wildcard. For example, the Input data type can be extended with an entry C having 3 fields of data; this extended type is shown in Figure 2.12. The entries A and B are still represented by 1 and 3 respectively, but C is represented by all other values. This can be used as a kind of fallback to detect errors in the data.
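The custom encoding and the wildcard can be mimicked with ordinary pattern matching. The sketch below is plain Haskell and does not use Clash's own mechanism for custom bit representations; the function decodeInput is a hypothetical name, Word8 stands in for an 8-bit BitVector, and the assumption is that the valid byte of entry A is the first data byte of the 4-byte portion.

```haskell
import Data.Word (Word8)

-- The Input type from the text; the length byte selects the constructor.
data Input
  = A Word8                    -- length 1: one valid data byte
  | B Word8 Word8 Word8        -- length 3: three valid data bytes
  | C Word8 Word8 Word8 Word8  -- wildcard: the length byte and three raw bytes
  deriving (Eq, Show)

-- Decode one 4-byte portion of the stream: a length byte and three data bytes.
decodeInput :: (Word8, Word8, Word8, Word8) -> Input
decodeInput (1, a, _, _)   = A a         -- the two unused bytes are dropped
decodeInput (3, a, b, c)   = B a b c
decodeInput (len, a, b, c) = C len a b c -- every other length value
```

The last clause plays the role of the wildcard: any length other than 1 or 3 is kept as C, so a later stage can detect the error instead of misreading the data.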

Records

The types described above can be used in functions within Clash. A function always has a number of input types and one output type. To get data from one type to another a function is needed. One way to make a function for a specific type is to use a record. Listing 1 is a minimal example of a record. There is a data type called Ip (line 1) with a record field called version. When the information stored in this Ip type is needed, the function version can be used; it returns the BitVector 4. The field version can also be used the other way around: given a value of type Ip and a BitVector 4, it can be used to update the Ip value.

1 data Ip = Ip { version :: BitVector 4 }

Listing 1: Example of a record called version in the datatype Ip.

Figure 2.12: A schematic representation of the data type Input, built out of three entries: A with one length field and one byte of data, B with one length field and three bytes of data, and finally the wildcard C, labelled Others, with one length field and three bytes of data.

The records are used a lot in the implementation of the Ethernet core as aliases for the fields of the different protocols on the different layers of the OSI-model. Updating one field of a type is easier when using this record syntax. In Listing 1 there is only one record field, in the implementation all fields of a protocol will have their own record.
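Reading and updating a record field can be sketched as follows. This is plain Haskell in which Word8 stands in for the BitVector types of Clash; the second field timeToLive and the functions isVersion4 and decrementTtl are hypothetical additions for the illustration.

```haskell
import Data.Word (Word8)

-- A record with two fields; each field name is also an accessor function.
data Ip = Ip
  { version    :: Word8
  , timeToLive :: Word8
  } deriving (Eq, Show)

-- Reading a field: apply the field name to a value of type Ip.
isVersion4 :: Ip -> Bool
isVersion4 ip = version ip == 4

-- Updating one field: record update syntax copies all other fields.
decrementTtl :: Ip -> Ip
decrementTtl ip = ip { timeToLive = timeToLive ip - 1 }
```

With record update syntax only the changed field is named, which keeps such functions short even when a protocol header has many fields.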

2.5.2 Parsing

As discussed in Section 2.3.3, RFC’s are the building blocks describing the layout of Ethernet frames and protocols used on a network. However, an incoming message is a stream of bits of a certain length. When the stream of bits is interpreted using the RFC’s, a chain of protocols can be identified, with one protocol for every layer of the OSI-model. The types in Clash can be made in the same way as the bit stream has to be interpreted. The translation between the bit stream and the Clash type is called parsing.

Writing a parser in any HDL can be complex. This is especially the case for large systems, as there will be a lot of cases and exceptions. When a new option is added, all paths leading to this option have to be updated. When using Clash, parts of the parser can be encoded in the data type; adding an option or frame only takes an extra line in the data type to specify the length and data. The custom encoding requires an extra line to make the translation between the input stream and the Clash type. By adding this line to the encoding, the extra option will be parsed automatically and stored using the specified data type. Of course, functionality has to be added before the system can deal with a new option. So the parser in Clash arises automatically and has to be written only once. Adding an option in this way only works with a fixed maximum length.

The above is a great advantage for IPv6, where extension headers are currently still under development, so they are likely to be adapted in the future. When a new extension header is published, the Clash implementation can be extended and tested quickly, whereas in HDL all bits from the bit stream have to be checked one by one.


Chapter 3

Related work

In this chapter other research concerning three aspects of network connectivity is reviewed, starting with the hardware architectures that are used for connecting to a network and the way different solutions are configured. The second part discusses two FPGA-based solutions, one for home and office networks and the second for Industrial Ethernet. The last part of this chapter discusses an HLS used for many Ethernet stacks, not only for FPGA’s but in routers and switches as well.

3.1 Network communication

Devices all over the world are connected via Ethernet and all modern operating systems can communicate via Ethernet [13]. There are all sorts of devices connected to the Internet for the purpose of reading data from sensors or turning things connected to the network on or off. In some cases CPU’s are replaced by FPGA’s to get more computational power in the system [14]. Replacing CPU’s by FPGA’s is common when the amount of data that needs to be processed is large.

Using FPGA’s for processing and sending data over a network is possible, for example in a sensor network [15], [16] where the FPGA is used for processing and sending data. The problem with sending data over a network via an FPGA is the configuration of the network core in the FPGA, this configuration has to match with the configuration of the network. Devices sending data over a home or office network need an Internet Protocol (IP)-address. In most research the IP-address is configured by a controller or CPU. Configuration of the IP-address via a controller or CPU means there is still an FPGA and a CPU needed in one application. In all examples the FPGA is only running in the core of the network to process and send the data. At the user side of the network a micro controller or CPU is used to collect and show the data. This same approach is found in particular data centers, where the connection is managed by the FPGA [17].

For connecting to an Ethernet-based network a number of IP-cores are available. The IP-core connects the user application in the FPGA to a network interface. An example of an IP-core for connecting to an Ethernet-based network is the Intel Triple-Speed Ethernet IP-core [18]; this IP-core can be used when connecting to a home or office network and takes care of all Ethernet handling. All data from the user application connected to the IP-core will be sent using an IPv4 or IPv6 frame. The core is made for two Ethernet protocols: UDP and a limited TCP implementation. The core cannot handle IPv6 extensions, and adding an implementation for this is not possible. The Intel Triple-Speed Ethernet IP-core is a good option for most applications; however, for an Industrial Ethernet application another Intel IP-core is needed [4]. With this IP-core the largest Industrial Ethernet protocols are covered; however, the Intel IP-core for Industrial Ethernet applications cannot be used for sending frames over a home or office network. The IP-core is used in combination with the Nios processor on the Intel FPGA: the IP-core is used for the real-time communication while the processor provides the configuration and data.

The combination of Industrial Ethernet hardware with a processor is used in dedicated chips as well. An EtherCAT example where an ARM Cortex-M4 is combined with EtherCAT hardware in a single chip is the XMC series from Infineon [19]. The EtherCAT core is connected to the ARM via the internal data bus. This bus is used for configuration and data exchange between the ARM and the EtherCAT core. The combination of the ARM and EtherCAT core has the advantage that an ARM Cortex running at 144 MHz can communicate over a real-time industrial protocol with a cycle time of less than 100 µs without much load on the controller. There are more FPGA-based solutions where this approach is taken, for example the Softing core [20], where a micro controller is needed for configuration and to provide data to the Industrial Ethernet application. In this IP-core there is some overlap between the hard real-time Industrial Ethernet and the more soft real-time EtherNet/IP or MODBUS TCP. The downside of this IP-core is that the user has to pick one protocol at the start of the project and cannot work with two protocols at the same time.

In the Industrial Ethernet example the configuration is done using a micro controller, this gives the user flexibility to configure the protocol without using HDL. There are also measurements done on EtherCAT networks where the frame size is optimized [21], [22]. In some cases the optimization of the frames will give a higher throughput. As described there are all kinds of solutions to connect an FPGA to a network.

However, there is a problem with the more generic IP-cores found: support for TCP or DHCP is missing, and a lot of coding is needed to get TCP or DHCP to work. There are cores supporting DHCP and TCP; one example is found on the OpenCores website [23]. This IP-core can request an IPv4 address and make a TCP connection to a server. For TCP only one connection at a time can be made by the hardware implementation. The TCP or UDP frame is sent over SPI to a micro controller, which puts it in an IPv4 frame and sends it to the destination. All examples above solve some part of the implementation of a complete Ethernet stack. However, none can handle both Industrial and home or office network communication at the same time. Most cores are connected via the Avalon bus or a custom bus. So there is room for a core where the networking is done in the user project and where expanding the stack is easy.

From the above we can conclude that there are advantages in using an FPGA in network applications, especially when a lot of data is processed in the network or when low latency is crucial. In the given examples a trade-off is made between flexibility and expandability.

3.2 Hardware-based solutions

In this research the Ethernet core is intended to be ﬂexible and expandable. There has been research done on ﬂexible and expandable Ethernet cores by others as well. Processing of UDP is done on an
