Academic year: 2021


Bachelor Informatica

CoCoNutMan: Configuration Management for CoCoNut

A novel DSL-based framework for defining, documenting, and parsing program configurations

Hendrik Huang

June 17, 2019

Supervisor(s): dr. Clemens Grelck

Informatica, Universiteit van Amsterdam


Abstract

Code that defines and interprets program configuration tends to look much the same from one program to the next, and it is heavy in boilerplate. It usually involves code that defines the configuration structure, parsing code for both the command line and configuration files, and documentation in the form of usage text and other documents such as webpages. This is a particular issue for compilers, which tend to have many options (e.g. GCC 8.3 has 717 architecture-independent options for pure C) and are nearly always offline tools without graphical configuration possibilities. CoCoNut (Compiler Construction in a Nutshell), a metacompiler framework in development for the UvA Compiler Construction (CoCo) course, presents an opportunity to develop and implement a better solution for configuration management.

This thesis presents a unique configuration framework that uses a compiler for a domain-specific configuration definition language to simplify writing configuration code. Also presented is the domain-specific language (DSL) itself. It supports and extends the GNU getopt command-line standard and offers simple documentation, automatic parsing of options, consistency checking, and easy definition of common parsing patterns in a minimal language.

Specifically, this thesis presents three things. The first is the process of designing this language, based on the requirements of the CoCo course and an analysis of the GCC, Clang, OpenJDK javac, and Single-Assignment C (SAC) compilers. The second is an overview of the DSL and surrounding framework. The third is an in-depth look at the framework design and logic.


Contents

1 Introduction
2 Background
2.1 CoCoNut
2.2 CLI conventions
2.2.1 Getopt
2.3 Supported compiler options
2.3.1 Flags
2.3.2 Options
2.3.3 Optimisation levels
2.3.4 Sac2c special options
3 Framework Features
3.1 Language design
3.1.1 Data types
3.2 Generated code
3.3 Parsing behaviour
3.3.1 Configuration file parsing
3.3.2 Command-line parsing
4 Domain-specific Language
4.1 Basic configuration
4.2 Type-specific configuration
4.3 Special options
5 Related Work
5.1 Docopt
5.2 GCC option language
5.3 Getopt and others
6 Discussion
6.1 Limitations
6.2 Features
6.3 Evaluation
6.4 Language
7 Conclusion
7.1 Future work


CHAPTER 1

Introduction

Most user-facing programs have extensive configuration possibilities. They often rely on program defaults and interact with the user through a command-line interface (CLI), configuration files, or a graphical user interface (GUI). In many cases the code used to specify, interpret, and set these options follows much the same structure. Implementing this code is often tedious and boilerplate-heavy, especially for programs with many configuration possibilities. A common solution to problems like this is a framework that handles most of the boilerplate code. Compilers are a particularly interesting instance of this problem. They naturally tend to expose a large amount of configuration, for example through extensive sets of flags, some of which aggregate other options or interpret complex sequences. Additionally, compilers do not usually have GUIs that would allow for more configuration, and the graphical interfaces that do exist are usually simple frontends for the CLI options.

A framework named CoCoNut (Compiler Construction in a Nutshell) is currently being developed for the Compiler Construction (CoCo) course at the University of Amsterdam (UvA). Over a period of 8 weeks, students following this course design and implement a complete compiler for a model language based on C called Civilised C (CiviC). This implementation is done with the help of a so-called metacompiler framework, of which CoCoNut is one.

Metacompilers or compiler-compilers are tools used in the development of compilers to generate program code from a specific model describing part of the compiler. Such a model is usually written in a domain-specific language (DSL) specifically designed for describing compilers. The CoCo course previously used parts of a different compiler framework developed for the Single Assignment C (SAC) project [1].

Currently, development of CoCoNut has progressed to include a DSL describing the abstract syntax tree (AST) and traversals upon the AST, code generation from the DSL, leak detection, garbage collection [2], traversal optimisation, and serialization of the AST [3]. However, as of now the CoCoNut compiler still lacks a way to handle configuration management.

This thesis presents a novel configuration management framework that handles the boilerplate associated with configuration specification and interpretation, and is designed as an extension of the CoCoNut project. The framework interprets a DSL designed for specifying configurations and their user interfaces. This is done through a metacompilation process resulting in program code for the definition of the configuration as well as interpretation of the end user input.

Like CoCoNut, the configuration management framework is written in and for C, a widely used, high-performance, low-level language. The framework's internal metacompiler reuses both Flex and Bison, so that it adds no dependencies beyond those CoCoNut already has. Finally, the DSL design is heavily based on the CoCoNut DSL and uses many of the same structures and semantics.

This thesis aims to design a DSL with minimal syntax that covers the most common use cases and option patterns. To help achieve this the following research questions are defined:

1. Which configuration patterns are used, particularly in compilers?
2. Which of the patterns are common and should be supported?
3. What code do we generate to support these patterns?
4. What information is needed from the developer to generate this code?

The answers to these questions allow the construction of a syntax that requires a minimal amount of information from the developer. Shorthand can also be used to further minimise the syntax for the most common cases.

The rest of the thesis is structured as follows. We start with background regarding the CoCoNut framework and POSIX and GNU CLI conventions in Chapter 2 to help establish the context of the developed framework. The same chapter discusses command-line options commonly seen in compilers and highlights some patterns for which support will be implemented, answering research questions 1 and 2.

After this the work of this thesis is presented, starting with the language design as well as the structure of the framework as a program in Chapter 3. Also discussed in the same chapter are the program behaviour and generated code needed to support the chosen configuration patterns, answering research question 3 directly and 4 as a consequence.

Chapter 4 then introduces the DSL designed with this knowledge. It does so mainly through examples of the language syntax alongside the corresponding generated program code, divided by major language features. The thesis concludes with an overview of related work in Chapter 5, followed by a discussion and conclusion in Chapters 6 and 7.


CHAPTER 2

Background

2.1 CoCoNut

CoCoNut [2] [3] is a metacompiler framework written in C that parses a domain-specific language (DSL) and generates C code describing an abstract syntax tree (AST) as well as traversals upon that tree. Abstract syntax trees are a model widely used in compiler development to describe program code in a structured and logical way.

In general, compilers operate by parsing a source language into an intermediate representation (IR) using the AST model. This AST is then analysed for correctness, operated upon, and finally translated again into a destination language. Examples of common operations performed upon an IR by many compilers might be simplification for later phases, or optimisations to improve the performance of the represented code. Analysis and modification of the AST is generally done using a traversal on that tree as a way to reach and process each node.

The design for the DSL used in CoCoNut is based on three guiding principles [2]:

• Consistent syntax across all DSL code.
• All code written should be forward compatible.
• Code syntax should ideally be self-explanatory.

CoCoNut DSL code is based on defining a number of root-level structures, most of which define some node to be used in the IR of the target compiler. Each structure includes an identifier, a type, and a list of properties. Each property has a specific meaning associated with the type, and may have a value. An example of the DSL blueprint is given in Figure 2.1, and a concrete example is given in Figure 2.2.

```
[<modifiers>] <type> <name> {
    [info = "<entry information>",]
    <property> = <value>,
    ...
    <property> {
        ...
    },
    ...
    <property>
    ...
};
```

Figure 2.1: DSL blueprint used in CoCoNut.

Valid datatypes in the CoCoNut DSL are enums defined in the DSL code, links to specific structure types, strings, booleans, integer numbers, and floating point numbers. Specifically, integer numbers make use of the fixed-width (unsigned) integer types defined in stdint.h, and floating point numbers use either of the native float and double types conforming to the IEEE 754 standard.

```
node BinOp {
    children {
        Expr LeftExpr { constructor, mandatory },
        Expr RightExpr { constructor, mandatory }
    },
    attributes {
        BinOpEnum op { constructor }
    }
};
```

Figure 2.2: Example of a node definition written in the CoCoNut DSL. In this case, Expr and BinOpEnum are previously defined types.

Once an AST and its traversals have been defined using this language, the CoCoNut metacompiler can be used to translate the definition to compilable C code. This generated code can then be combined with user-written program code to form the final compiler, as illustrated in Figure 2.3.

[Diagram components: AST source files, CoCoNut compiler, internal code, AST definition, external APIs, user code, library code, C compiler, compiler executable.]

Figure 2.3: Conceptual view of the compiler development process using the CoCoNut programming model.

2.2 CLI conventions

Different ways of handling CLI options exist. POSIX [4] prescribes that every option name consists of a single alphanumeric character preceded by a hyphen. The capital W option is reserved for vendor options. Under these rules, argumentless options (flags) can be chained together behind a single hyphen. An option's argument can be given either as a separate token or appended to the option name. For example, -abc is equivalent to -a -b -c, and -ofoo is equivalent to -o foo. A single hyphen token is interpreted as a regular argument, by convention denoting standard input or output. The double hyphen has a special meaning: it ends option parsing, and every argument following it is interpreted as a regular argument. Option arguments must not be optional, and all options must precede the program's other arguments. In strict POSIX parsing mode, the first non-option argument signals the end of option parsing. Every argument after it is interpreted as a regular argument, even if it looks like an option.
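The POSIX getopt function can be used to check the grouping and termination rules above. The sketch below is illustrative only. The option letters -a, -b and -o, the struct posix_demo, and the function parse_posix are invented for this example. The code assumes a POSIX system that declares getopt in unistd.h.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getopt() in <unistd.h> */
#include <stddef.h>
#include <unistd.h>              /* getopt, optarg, optind, opterr */

/* Hypothetical result of scanning for the options -a, -b and -o <arg>. */
struct posix_demo {
    int a;               /* set by -a */
    int b;               /* set by -b */
    const char *o;       /* argument of -o, or NULL if -o was not given */
    int first_operand;   /* index in argv of the first non-option argument */
};

/* Scans argv with getopt. Returns 0 on success, or -1 on an unknown option
   or a missing option-argument. */
static int parse_posix(int argc, char *argv[], struct posix_demo *out) {
    int c;
    out->a = 0;
    out->b = 0;
    out->o = NULL;
    optind = 1;          /* (re)start scanning at argv[1] */
    opterr = 0;          /* report errors through the return value, not stderr */
    while ((c = getopt(argc, argv, "abo:")) != -1) {
        switch (c) {
        case 'a': out->a = 1; break;
        case 'b': out->b = 1; break;
        case 'o': out->o = optarg; break;   /* "-ofoo" and "-o foo" both land here */
        default:  return -1;                /* '?' for unknown or incomplete options */
        }
    }
    out->first_operand = optind;            /* "--" itself has been skipped */
    return 0;
}
```

Take the argument vector {"prog", "-ab", "-ofoo", "--", "-c"}. Both flags are set and -o receives "foo". The token "-c" is left as an operand because it follows the double hyphen.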

GNU convention [5] extends this definition in two ways. It lets options and arguments appear in any order relative to each other, and it adds long options, which are preceded by two hyphens. An argument to a long option is appended after an equals sign without spaces. A required argument may also be passed as the next token. GNU convention also allows option arguments to be optional. For an optional argument on a long option, the equals sign signals that the argument is present. On a short option, the argument must be appended directly to the option token or left out entirely. For example, --name does not have an argument, while --name=value does. The user can also abbreviate long options as long as the abbreviation is not ambiguous.
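Abbreviating long options comes down to prefix matching against the list of known names. An exact match takes precedence, and two or more remaining candidates make the abbreviation ambiguous. The function below is a simplified sketch of that rule, not glibc's actual implementation. Its name and the return codes -1 (no match) and -2 (ambiguous) are assumptions of this example.

```c
#include <stddef.h>
#include <string.h>

/* Looks up a long option given without its leading "--", e.g. "verb" or
   "level=3". Any "=value" part is ignored for matching. Returns the index of
   the matching name, -1 if no name matches, or -2 if the abbreviation is
   ambiguous. An exact match always wins over abbreviations. */
static int match_long_option(const char *arg, const char *const names[], size_t count) {
    size_t len = strcspn(arg, "=");   /* length of the option-name part */
    int found = -1;
    for (size_t i = 0; i < count; i++) {
        if (strncmp(names[i], arg, len) != 0)
            continue;                 /* arg is not a prefix of this name */
        if (strlen(names[i]) == len)
            return (int)i;            /* exact match */
        /* The first candidate is remembered; a second one makes it ambiguous. */
        found = (found == -1) ? (int)i : -2;
    }
    return found;
}
```

Consider the names verbose, version, and help. Here --verb selects verbose, --ver is ambiguous, and --h=x selects help.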

These conventions are not used universally. In particular, older programs often support legacy option notations to preserve backwards compatibility. For example, tar accepts short options chained without a leading hyphen, and sort also allows options to be prefixed with a plus sign instead of a hyphen. Aside from legacy support, some programs such as git adopt a ‘hybrid’ convention, which takes a ‘sub-program’ as the first argument.

2.2.1 Getopt

The GNU C library implements the POSIX and GNU behaviours in the functions getopt and getopt_long respectively, declared in getopt.h. By default, glibc's getopt also permutes arguments GNU-style, unless the POSIXLY_CORRECT environment variable is set. An example usage of getopt is shown in Figure 2.4. As illustrated, getopt scans the command line and returns one option per call, handing control back to the main program each time. The developer still has to write the code that defines the configuration variables and handles each option.

```c
while ((c = getopt (argc, argv, "abc:")) != -1)
  switch (c)
    {
    case 'a':
      aflag = 1;
      break;
    ...
    case '?':
      if (optopt == 'c')
        fprintf (stderr, "Option -%c requires an argument.\n", optopt);
      else if (isprint (optopt))
        fprintf (stderr, "Unknown option `-%c'.\n", optopt);
      else
        fprintf (stderr,
                 "Unknown option character `\\x%x'.\n",
                 optopt);
      return 1;
    default:
      abort ();
    }
```

Figure 2.4: Example of getopt from the glibc manual [6].

As an extension of getopt, getopt_long lets the program define which long options map to which short options, and whether they take an argument. It also explicitly supports setting flags: an option can be associated with a pointer to an int and a value to store there. An example of this is shown in Figure 2.5. Like getopt, getopt_long does not declare the configuration variables or convert option arguments into typed values. That code must still be written by hand.
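The following sketch illustrates this division of labour. It is not taken from the glibc manual. The options --verbose, --brief, and --level and the variables verbose_flag and level are hypothetical. getopt_long stores verbose_flag by itself through the flag pointer, whereas the argument of --level still has to be converted and checked by hand.

```c
#include <getopt.h>   /* getopt_long, struct option, optarg, optind, opterr */
#include <stdlib.h>   /* strtol */

static int verbose_flag;   /* written by getopt_long itself via the flag pointer */
static long level = 1;     /* must be converted from optarg by hand */

/* Returns 0 on success, or -1 on an unknown option or a malformed --level value. */
static int parse_cli(int argc, char *argv[]) {
    static const struct option long_options[] = {
        {"verbose", no_argument,       &verbose_flag, 1},
        {"brief",   no_argument,       &verbose_flag, 0},
        {"level",   required_argument, NULL,          'l'},
        {0, 0, 0, 0}
    };
    int c;
    optind = 1;
    opterr = 0;
    while ((c = getopt_long(argc, argv, "l:", long_options, NULL)) != -1) {
        switch (c) {
        case 0:            /* a flag option: getopt_long already stored the value */
            break;
        case 'l': {        /* --level=N, --level N or -l N */
            char *end;
            long v = strtol(optarg, &end, 10);
            if (end == optarg || *end != '\0')
                return -1; /* not a number; level keeps its previous value */
            level = v;
            break;
        }
        default:           /* '?' for unknown options or a missing argument */
            return -1;
        }
    }
    return 0;
}
```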

2.3 Supported compiler options

Since this framework is designed specifically for use in a compiler, the framework DSL must support the option patterns that compilers commonly use. This section discusses the option specifications of a number of compilers, and highlights the patterns the framework intends to support. The main compilers considered are two C compilers: the C compiler of the GNU Compiler Collection (GCC) [8] and the Clang [9] C compiler. In addition, we consider the OpenJDK [10] Java compiler [11] (javac), since Java is another common language inspired by C. We consider these compilers for two reasons: they are relatively large projects with many options, and their languages are similar to CiviC, the language the students will build a compiler for. Finally, we also consider some options from the Single-Assignment C (SAC) [1] compiler (sac2c), a research compiler similar to the CiviC compilers the students build. Due to its widespread use and extensive configuration, GCC is considered the main target for DSL design.

```c
while (1)
  {
    static struct option long_options[] =
      {
        /* These options set a flag. */
        {"verbose", no_argument,       &verbose_flag, 1},
        {"brief",   no_argument,       &verbose_flag, 0},
        /* These options don't set a flag.
           We distinguish them by their indices. */
        {"add",     no_argument,       0, 'a'},
        ...
        {0, 0, 0, 0}
      };
    /* getopt_long stores the option index here. */
    int option_index = 0;

    c = getopt_long (argc, argv, "abc:d:f:",
                     long_options, &option_index);

    /* Detect the end of the options. */
    if (c == -1)
      break;

    switch (c)
      {
      ...
      }
  }
```

Figure 2.5: Example of getopt_long from the GNU C Library manual [7].

2.3.1 Flags

```
-faggressive-loop-optimizations  -falign-functions[=n[:m:[n2[:m2]]]]  ...
-fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt
-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin
--param name=value
-O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og
```

Figure 2.6: Excerpt from the GCC manual option summary showing optimisation options.

A large portion of command-line configuration in GCC is done through its so-called feature flags. Feature flags are options resembling GNU short and long options, but with the second hyphen replaced by a prefix letter (for example -fwhole-program). The prefix distinction is mostly mnemonic and does not necessarily indicate how the option is handled internally. The GCC 8.3 option summary [12] lists 717 different options relevant for writing machine-independent plain C. This number counts each feature flag as a separate option. It does not include developer, machine-specific, or language dialect options.

Of these 717 options, 602 are flags: they take no argument and directly set a value when the option appears. This count includes every listed option that does not explicitly use a metavariable to denote an argument. For an unspecified number of these flags, a paired flag also exists that disables the associated flag (the negative form of -ffoo is -fno-foo). These paired flags are intended for use with Makefiles or other build tools. Examples of these feature flags are shown in Figure 2.6.
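The paired-flag convention is mechanical enough to be driven by a table, which is the kind of code a configuration framework could generate. This is a minimal sketch under that assumption. The names struct feature_flag and apply_feature_flag are hypothetical, and web and vpt merely reuse flag names from Figure 2.6.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a feature flag name and the variable it controls. */
struct feature_flag {
    const char *name;   /* "web" handles both -fweb and -fno-web */
    bool *value;
};

/* Applies one command-line token. Returns true if it was a known feature flag.
   Later tokens override earlier ones, so "-fweb -fno-web" leaves web disabled. */
static bool apply_feature_flag(const char *arg, const struct feature_flag *flags, size_t count) {
    bool enable = true;
    if (strncmp(arg, "-f", 2) != 0)
        return false;                    /* not a feature flag at all */
    arg += 2;
    if (strncmp(arg, "no-", 3) == 0) {   /* paired negative form */
        enable = false;
        arg += 3;
    }
    for (size_t i = 0; i < count; i++) {
        if (strcmp(arg, flags[i].name) == 0) {
            *flags[i].value = enable;
            return true;
        }
    }
    return false;                        /* unknown feature flag */
}
```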


2.3.2 Options

A smaller subset of options consists of proper options that take an argument, along with feature flags that take arguments. The majority of these accept a natural number as their value, such as -fmax-errors=n or -Warray-bounds=n. A smaller subset takes one of a specific set of strings, for example -fdiagnostics-show-location=[once|every-line] or -fdiagnostics-color=[auto|never|always]. Several options take arbitrary strings, for example to specify paths or type names. Finally, a few remaining options take lists of values, for example to include or exclude files.
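An option restricted to a fixed set of strings maps naturally onto a C enum. The sketch below assumes a hypothetical enum color_mode for an option like -fdiagnostics-color. It is not GCC's own code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for an option such as -fdiagnostics-color=[auto|never|always]. */
enum color_mode { COLOR_AUTO, COLOR_NEVER, COLOR_ALWAYS };

/* Maps the option argument to its enum value. Returns 0 on success, or -1 if
   the string is not one of the allowed values (*out is then left unchanged). */
static int parse_color_mode(const char *value, enum color_mode *out) {
    /* Same order as the enum, so the array index is the enum value. */
    static const char *const names[] = {"auto", "never", "always"};
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (strcmp(value, names[i]) == 0) {
            *out = (enum color_mode)i;
            return 0;
        }
    }
    return -1;
}
```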

2.3.3 Optimisation levels

A well-known feature of GCC is its optimisation level option -O. As of the version mentioned above, the option has at least 8 possible values, and more may be defined on some platforms. Its main function is to enable many feature flags at once, specifically flags related to code optimisation. These flags are roughly grouped into three broad tiers, each adding compilation time in exchange for faster code, and each tier can be activated through the optimisation level option. Beyond these three tiers, special values optimise for other goals. These include the best debugging experience, minimal code size, or speed to the point of breaking standard compliance. A similar pattern exists for feature flags related to warnings: flags such as -Wextra or -Wall activate multiple warnings as a group.
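An aggregate option such as -O can be modelled as a function that sets a whole tier of flags at once. The sketch below simplifies under stated assumptions. The three flags and their assignment to levels are hypothetical and do not reproduce GCC's actual -O sets. A later level also simply overwrites an earlier one.

```c
#include <stdbool.h>

/* Hypothetical optimisation flags; the grouping below is illustrative and
   does not reproduce GCC's actual -O sets. */
struct opt_flags {
    bool inline_small_functions;
    bool unroll_loops;
    bool vectorize;
};

/* Aggregate option: each level enables everything the previous level enables,
   plus one more tier. Returns -1 for an unsupported level. */
static int apply_opt_level(int level, struct opt_flags *f) {
    if (level < 0 || level > 3)
        return -1;
    f->inline_small_functions = level >= 1;
    f->unroll_loops           = level >= 2;
    f->vectorize              = level >= 3;
    return 0;
}
```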

2.3.4 Sac2c special options

Both the Clang compiler and OpenJDK javac exhibit similar patterns, albeit with fewer options. Sac2c, on the other hand, is a research compiler and implements some powerful options not seen in the other compilers. Two such options from the 1.3.3 build of sac2c are highlighted in this section.

The first is an option that takes a sequence of character tokens as its argument. The program reads each token separately and, for each token encountered, sets a corresponding flag. Example usage for one such option is shown in Figure 2.7. This behaviour allows the end user to easily set a number of flags without polluting the overall flag namespace of the program.

```
-check [acgtbmehdi]+
        Incorporate runtime checks into executable program.
        The following flags are supported:
          a: Incorporate all available runtime checks.
          c: Perform conformity checks.
          g: Perform GPU error checks.
          t: Check assignments for type violations.
          b: Check array accesses for boundary violations.
          m: Check success of memory allocations.
          e: Check errno variable upon applications of external functions.
          h: Use diagnostic heap manager.
          d: Perform checks for the distributed memory backend.
             (Check that there are no illegal accesses to distributed arrays.)
          i: Use diagnostic heap manager for distributed memory backend.
```

Figure 2.7: Usage for the check option in sac2c.
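Reading such a character-token argument takes only a small loop over the string. The sketch below models only four of the characters from Figure 2.7, using hypothetical struct and function names. It is not sac2c's implementation, and here 'a' enables only the modelled checks.

```c
#include <stdbool.h>

/* Flags for a few of the -check characters listed in Figure 2.7. */
struct runtime_checks {
    bool conformity;  /* c */
    bool types;       /* t */
    bool bounds;      /* b */
    bool memory;      /* m */
};

/* Reads a character-token argument such as "tb", setting one flag per
   character; 'a' sets all modelled flags. Returns 0 on success, or the first
   unrecognised character. */
static char parse_checks(const char *tokens, struct runtime_checks *chk) {
    for (const char *p = tokens; *p != '\0'; p++) {
        switch (*p) {
        case 'a':
            chk->conformity = chk->types = chk->bounds = chk->memory = true;
            break;
        case 'c': chk->conformity = true; break;
        case 't': chk->types = true;      break;
        case 'b': chk->bounds = true;     break;
        case 'm': chk->memory = true;     break;
        default:  return *p;              /* unknown token */
        }
    }
    return 0;
}
```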

Another special option included in sac2c is the target mechanism. Targets are collections of configuration settings defined inside the sac2crc configuration file. Each target consists of text in the configuration file syntax, and can inherit from other targets. Specifying the -t or --target option on the command line selects one of these targets. Its configuration is then applied at that point during argument parsing. An example of target definitions can be seen in Figure 2.8. Using this feature, a programmer using the sac2c compiler can define and enable aggregate options similar to GCC’s optimisation levels. A similar feature is present in GCC


this requires targets to be split over multiple files and necessitates that all options are accessible on the command line, this is not implemented in favour of the SAC target mechanism.

```
target cuda:
BACKEND  := "Cuda"
CC       := "no"
CCLINK   := "-lcutil_i386 -lcudart -lcublas"
CCFLAGS  := "no --maxrregcount 20 -Xcompiler -Wall -Xcompiler -Wno-unused" "-Xcompiler -fno-builtin"
CCINCDIR := "-I$SAC2CBASE/include/ -I$CUTIL -L$CUTIL_LIB"
CCLIBDIR := "-L$SAC2CBASE/lib/"
LD_PATH  := "-Xlinker -L%path% -Xlinker -rpath=%path%"
CEXT     := "cu"
GENPIC   := "-Xcompiler -fPIC"

target cuda64 :: cuda:
CCLINK   := "-lcutil_x86_64 -lcudart -lcublas"

target cudahybrid :: cuda:
BACKEND  := "CudaHybrid"
```
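Target inheritance, as in target cuda64 :: cuda, can be resolved by applying the parent's settings first and the child's afterwards. The child then overrides any inherited values. The sketch below makes several assumptions. The key/value store, struct target, and apply_target are hypothetical. The depth counter is just one simple way to reject inheritance cycles.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SETTINGS 16

/* Minimal key/value store standing in for the program configuration. */
struct config_store {
    const char *keys[MAX_SETTINGS];
    const char *values[MAX_SETTINGS];
    size_t count;
};

/* Hypothetical in-memory form of a target such as "target cuda64 :: cuda". */
struct setting { const char *key; const char *value; };
struct target {
    const char *name;
    const char *parent;               /* NULL if the target inherits nothing */
    const struct setting *settings;
    size_t n_settings;
};

/* Sets key to value, overwriting an earlier value. Returns -1 if the store is full. */
static int store_set(struct config_store *s, const char *key, const char *value) {
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            s->values[i] = value;
            return 0;
        }
    }
    if (s->count == MAX_SETTINGS)
        return -1;
    s->keys[s->count] = key;
    s->values[s->count] = value;
    s->count++;
    return 0;
}

static const char *store_get(const struct config_store *s, const char *key) {
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->keys[i], key) == 0)
            return s->values[i];
    return NULL;
}

/* Applies target `name`: first its parent chain, then its own settings, so a
   child overrides what it inherits. `depth` (start at 0) bounds the recursion
   to catch inheritance cycles. Returns 0 on success, or -1 on an unknown
   target, a cycle, or a full store. */
static int apply_target(const struct target *ts, size_t n, const char *name,
                        struct config_store *store, size_t depth) {
    const struct target *t = NULL;
    for (size_t i = 0; i < n && t == NULL; i++)
        if (strcmp(ts[i].name, name) == 0)
            t = &ts[i];
    if (t == NULL || depth >= n)
        return -1;
    if (t->parent != NULL && apply_target(ts, n, t->parent, store, depth + 1) != 0)
        return -1;
    for (size_t i = 0; i < t->n_settings; i++)
        if (store_set(store, t->settings[i].key, t->settings[i].value) != 0)
            return -1;
    return 0;
}
```

Applying cuda64 in this sketch yields BACKEND "Cuda", inherited from cuda, and the CCLINK value of cuda64 itself.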


CHAPTER 3

Framework Features

3.1 Language design

The GNU getopt functions provide great functionality for parsing the command line. However, they do not cover the declaration of any configuration files, the parsing of argument string values, or the generation of documentation. The goal of the framework is to handle all of these aspects of configuration, in particular from a single source of truth. The framework is built around the idea of “making the common case fast”. Following this idea, it aims to reduce the time the developer spends writing a configuration interface that follows conventions. The developer can then spend more time on the development of the parent application.

To help accomplish this, the framework uses a DSL designed on the same core principles as the CoCoNut DSL. A DSL makes each definition more readable than direct struct and macro definitions in C. It also stays more concise than general-purpose data languages like YAML [13]. This section presents the general characteristics of the framework’s DSL.

Instead of approaching configuration from the command line, the DSL is built around declaring the exact structure and types of the configuration. Each field in the configuration must have a clear type from which parsing and option logic can be derived. Each field thus collects not only information about its features but also the ways in which it can be set. Special options that are associated with multiple fields or with no fields at all are defined separately from the overall configuration structure. A basic definition in the DSL thus consists of a root configuration definition with a list of fields or nested configurations.
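One way to derive parsing logic from field types alone is a table that pairs each field name with its type and the address of its storage. The sketch below is not the framework's generated code. The names enum field_type, struct field, and set_field, and the three fields, are assumptions chosen to illustrate the idea. The anonymous cfg struct loosely mirrors the single global configuration variable of an anonymous struct type described for the generated header.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of field types. */
enum field_type { FIELD_BOOL, FIELD_INT, FIELD_STRING };

/* One entry per configuration field: the type tells the parser how to convert text. */
struct field {
    const char *name;
    enum field_type type;
    void *target;            /* address of the field inside the configuration */
};

static struct {
    bool verbose;
    long level;
    const char *output;
} cfg;

static const struct field fields[] = {
    {"verbose", FIELD_BOOL,   &cfg.verbose},
    {"level",   FIELD_INT,    &cfg.level},
    {"output",  FIELD_STRING, &cfg.output},
};

/* Sets field `name` from its textual value, using only the declared type.
   Returns false for an unknown field or a value of the wrong form. */
static bool set_field(const char *name, const char *text) {
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++) {
        const struct field *f = &fields[i];
        if (strcmp(f->name, name) != 0)
            continue;
        switch (f->type) {
        case FIELD_BOOL:
            if (strcmp(text, "true") == 0)
                *(bool *)f->target = true;
            else if (strcmp(text, "false") == 0)
                *(bool *)f->target = false;
            else
                return false;
            return true;
        case FIELD_INT: {
            char *end;
            long v;
            errno = 0;
            v = strtol(text, &end, 10);
            if (end == text || *end != '\0' || errno == ERANGE)
                return false;
            *(long *)f->target = v;
            return true;
        }
        case FIELD_STRING:
            *(const char **)f->target = text;   /* stores the pointer; no copy is made */
            return true;
        }
        return false;
    }
    return false;
}
```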

Compared to the CoCoNut DSL, the framework DSL has an increased focus on nested definitions and a decreased focus on the attributes of each definition. In the CoCoNut DSL, attributes may have either a single value or a list of values, and nested definitions and lists of attributes share the same syntax. An example of this is shown in Figure 3.1.

```
enum BasicType {
    prefix = BT,
    values {
        int, bool, float, someval
    }
};
```

Figure 3.1: Example of an enum definition in CoCoNut. The values keyword opens a scope with a list of identifiers.

This is appropriate for CoCoNut, where attributes exist mostly as children of the root structures, but it may lead to confusion in the configuration framework. To distinguish between the two kinds of lists, a set notation is added: a single set, rather than multiple identifiers, is used as the right-hand side value of an attribute. This also hints that no element should repeat. In practice, this simply means including an additional equals sign between the attribute and its value, as shown in Figure 3.2. With this, a configuration can be seen as containing a set of fields.

```
enum BasicType {
    prefix = BT,
    values = {
        int, bool, float, someval
    }
}
```

Figure 3.2: Example of an enum definition in the configuration language. Here values is an attribute whose value is a set of identifiers.

3.1.1 Data types

The data types available for each field are boolean, unsigned integer, integer, double-precision floating point, and string, along with two complex types: enums and lists. As highlighted in Chapter 2, both lists and enumerated strings are often used as option values. Due to restrictions in the target language C, the developer must define enums within the framework to prevent name clashes, while list types are generated automatically by the framework.

Contrary to CoCoNut, neither specific fixed-width integer types nor single-precision floating point numbers are used in this framework. This is largely due to the fact that each field in the configuration only exists once and thus less care needs to be taken about field size. This is unlike the typed fields in CoCoNut which may exist hundreds of times in memory as an AST is created. In addition, unlike values inside a compiler’s IR, configuration values do not generally need to be translated to types in another language.

Unsigned integer types are an exception to this. In practice, integer numbers in configuration are most frequently used ordinally and are thus limited to natural numbers. To support this common case explicitly, unsigned integers are included as a type separate from generic integers. Making this case explicit makes user code less error-prone and provides some inherent value validation. Additionally, most numbers in configuration have an inherently limited valid range. For this reason, ranges can be defined on number types to generate validation code in addition to parsing code.
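Range validation for an unsigned field can be layered directly on top of strtoul. The function below sketches what such validation code might look like and is not the framework's actual output. The name parse_uint_in_range and the inclusive range bounds are assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses `text` as an unsigned integer within the inclusive range [min, max].
   On success stores the value in *out and returns true; otherwise *out is
   left unchanged. */
static bool parse_uint_in_range(const char *text, unsigned long min, unsigned long max,
                                unsigned long *out) {
    char *end;
    unsigned long v;
    if (!isdigit((unsigned char)text[0]))
        return false;   /* rejects "", "-1", "+1" and leading spaces, which strtoul accepts */
    errno = 0;
    v = strtoul(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;   /* overflow or trailing characters */
    if (v < min || v > max)
        return false;   /* outside the declared range */
    *out = v;
    return true;
}
```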

3.2 Generated code

The framework consists of a single program which takes a configuration specification written in a specialised DSL as input and generates the necessary program code, similar to the Flex and Bison tools. The generated code consists of a header file, C code files, and documentation files. All parts are illustrated in Figure 3.3. The framework is used by including the header in the target program and compiling and linking all C source files. Additionally, the CoCoNut-lib library [14] must be built and linked as well.

The header file contains all type, function, and global declarations required to include the generated code in a program. The main declaration is a single global variable of an anonymous struct type that represents the configuration and its fields, along with any enum types used in the declaration. Three functions are also declared: ccnm_print_usage, ccnm_parse, and ccnm_free. The ccnm_print_usage function prints the generated program usage text, and the developer may call it at any point. The ccnm_parse function serves as the entry point to the option and file parsing functionality. The two parsers are consolidated into one, both for ease of use and to guarantee the correctness of the target mechanism. Finally, the ccnm_free function frees the memory used by the framework’s lists and should be called at the end of the parent program.

[Diagram components: DSL source files, DSL compiler, header file, configuration file parser, command-line parser, documentation files.]

Figure 3.3: Overview of generated components.

The generated entry-point function takes a file pointer representing the configuration file, in addition to the program’s argc and argv variables. When called, the function completely parses both the command line and the configuration file, and validates the final configuration before returning control to the surrounding program. If an error occurs during parsing or validation, appropriate warnings are printed. The function then returns an integer representing the stage in which the error occurred. A part of the header file and an example of code using the function are shown in Figures 3.4 and 3.5 respectively.

```c
#include <stdbool.h>   /* bool */
#include <stdio.h>     /* FILE */

typedef enum CCNM_RESULT {
    CCNM_SUCCESS = 0,
    CCNM_IO_ERROR,
    CCNM_FILE_ERROR,
    CCNM_CLI_ERROR,
    CCNM_VALIDATION_ERROR
} CCNM_RESULT;

struct {
    bool help;
    bool version;
} globals;

CCNM_RESULT ccnm_parse(int argc, char *argv[], FILE *fp);
void ccnm_print_usage(void);
void ccnm_free(void);
```

Figure 3.4: Example of a framework-generated header file. In this case, the configuration contains two fields: help and version.

The usage printing function is kept separate from the main parse function so that it can be called depending on the resulting configuration. The programmer is thus free to decide under which conditions the usage is printed, and can print additional information both before and after the function is called.

3.3 Parsing behaviour

The final program configuration is built from three cascading layers: program defaults, configuration file settings, and command-line options. The developer defines the program defaults inside the framework, and they are applied first to serve as a base configuration. If the end user configures nothing, this is the configuration the program runs with. Next, configuration file settings are read from a file and applied on top of the default values. The file is in a location defined by the developer, and is meant for settings that should persist across executions. The configuration file may also hold target definitions, which represent groups of settings that can be enabled on the command line. Finally, options from the command line are parsed and applied. Figure 3.6 conceptually illustrates how parsing the configuration file and command line creates the final configuration.

```c
int main(int argc, char *argv[]) {
    FILE *fp = ...;
    CCNM_RESULT result = ccnm_parse(argc, argv, fp);

    if (result != CCNM_SUCCESS && result != CCNM_VALIDATION_ERROR) {
        ...
    }

    if (globals.help) {
        ccnm_print_usage();
    }
    ...

    ccnm_free();
}
```

Figure 3.5: Example of the framework entry-point function being called.

[Figure 3.6 diagram. Labels: Configuration Defaults; Parse Configuration File Settings; Saved Targets; Parse Command-Line Argument; Option or Argument; Target Option; Parse Targets; Apply Target; Parse Value; Finished?; Final Configuration.]

Figure 3.6: Conceptual view of parsing process which creates the final configuration.
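The layering can be pictured as successive partial overwrites of one struct. The following is a minimal sketch only: the struct, its two fields, the has_* markers and the function names are hypothetical and are not the code the framework generates; they just make the ordering (defaults, then file, then command line) concrete.

```c
#include <stdbool.h>

/* Hypothetical configuration with two fields. */
struct config {
    unsigned int verbosity;
    const char *outfile;
};

/* One partial layer: each has_* flag records whether this layer sets the field. */
struct layer {
    bool has_verbosity;
    unsigned int verbosity;
    bool has_outfile;
    const char *outfile;
};

/* Overwrite only the fields that this layer actually sets. */
static void apply_layer(struct config *cfg, const struct layer *l)
{
    if (l->has_verbosity)
        cfg->verbosity = l->verbosity;
    if (l->has_outfile)
        cfg->outfile = l->outfile;
}

/* Defaults first, then the configuration file, then the command line,
 * so each later layer wins over the ones before it. */
struct config build_config(const struct layer *file, const struct layer *cli)
{
    struct config cfg = { .verbosity = 3, .outfile = "-" };  /* program defaults */
    apply_layer(&cfg, file);
    apply_layer(&cfg, cli);
    return cfg;
}
```

With an empty file layer and an empty command-line layer, build_config returns the program defaults unchanged.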

3.3.1 Configuration file parsing

The first stage of configuration parsing is the configuration file. Settings are read from the file one per line and applied to the configuration in order. Additionally, any target definitions read are saved in memory for target specifications that may appear on the command line.

Since configuration files are generally written once and used many times, brevity of syntax is not a necessary requirement. Additionally, since command-line options can only operate on fields and can always be mimicked by setting values directly, no support is needed for directly writing options in the file. Settings in the configuration file thus use a simple format consisting of the name of the field, an equals sign, and the value the field should be set to.
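Splitting a setting line of the form name = value needs only standard C string handling. The sketch below is illustrative: the function name, the trimming of surrounding whitespace and the removal of enclosing double quotes are assumptions, not documented framework behaviour.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}

/* Split "name = value" in place; *key and *value then point into line.
 * A value wrapped in double quotes has the quotes removed.
 * Returns false if there is no '=' or the name is empty. */
bool parse_setting(char *line, char **key, char **value)
{
    char *eq = strchr(line, '=');
    if (eq == NULL)
        return false;
    *eq = '\0';
    *key = trim(line);
    *value = trim(eq + 1);
    if (**key == '\0')
        return false;
    size_t len = strlen(*value);
    if (len >= 2 && (*value)[0] == '"' && (*value)[len - 1] == '"') {
        (*value)[len - 1] = '\0';
        (*value)++;
    }
    return true;
}
```

A line such as target cuda contains no equals sign, so it is rejected here and would be handled by a separate target rule.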

Targets in the configuration file are defined by the target keyword. A target definition includes every setting that appears after the target declaration, up to the next target declaration. Targets can inherit from previously defined targets by using the extends keyword. An example of a configuration file is shown in Figure 3.7. When a target is referenced on the command line through specially defined flags, each setting belonging to the target and its parents is applied to the configuration.
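Applying a target together with its parents amounts to walking the extends chain. The sketch below assumes that ancestors are applied before the target itself, so that a child such as some_target in Figure 3.7 can override settings it inherits from cuda; the text does not state this order, and the table and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TARGETS 16

/* Hypothetical record of a target read from the configuration file;
 * the target's own settings are omitted for brevity. */
struct target {
    const char *name;
    const char *parent;   /* name given after "extends", or NULL */
};

static struct target targets[MAX_TARGETS];
static size_t n_targets;

static const struct target *find_target(const char *name)
{
    for (size_t i = 0; i < n_targets; i++)
        if (strcmp(targets[i].name, name) == 0)
            return &targets[i];
    return NULL;
}

/* Fill `out` with the target and its ancestors in application order
 * (oldest ancestor first, requested target last) and return the count.
 * Returns 0 if a name is unknown or the chain exceeds `max` entries. */
size_t target_chain(const char *name, const struct target **out, size_t max)
{
    size_t depth = 0;
    for (const char *n = name; n != NULL; n = out[depth - 1]->parent) {
        const struct target *t = find_target(n);
        if (t == NULL || depth == max)
            return 0;
        out[depth++] = t;
    }
    /* Reverse so that parents come before children. */
    for (size_t i = 0; i < depth / 2; i++) {
        const struct target *tmp = out[i];
        out[i] = out[depth - 1 - i];
        out[depth - 1 - i] = tmp;
    }
    return depth;
}
```

Each entry of the returned order would then have its settings applied in turn.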

3.3.2 Command-line parsing

The framework supports the creation of option interfaces following the POSIX and GNU conventions. Short and long options can be defined, which are then internally passed to the getopt_long function and parsed left-to-right. An option appearing multiple times takes the value of its last appearance, which allows the user to explicitly overwrite fields previously set by aggregate options. Likewise, special targets defined in the configuration file are applied at the position in which the option specifying them appears.

```
verbosity = 0
optimisations.some_optimisation = false

target cuda
compiler = nvcc

target some_target extends cuda
some_field = "some value"
```

Figure 3.7: Example of a possible configuration file in the configuration framework.
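Because getopt_long visits options from left to right, writing every occurrence straight into its field is enough to make the last occurrence win. In this minimal sketch the option names follow Figure 4.3, while the function name parse_cli and the strtoul-based validation are assumptions; optind = 0 is the glibc way of fully reinitialising getopt between scans.

```c
#include <getopt.h>
#include <limits.h>
#include <stdlib.h>

struct { unsigned int verbosity; } globals = { 3 };

/* Returns 0 on success, -1 on an unknown option or a malformed number. */
int parse_cli(int argc, char *argv[])
{
    static const struct option long_opts[] = {
        { "verbosity", required_argument, NULL, 'v' },
        { NULL, 0, NULL, 0 }
    };
    int c;

    optind = 0;   /* glibc: force full reinitialisation of getopt */
    opterr = 0;   /* let the caller report errors */
    while ((c = getopt_long(argc, argv, "v:", long_opts, NULL)) != -1) {
        if (c != 'v')
            return -1;
        char *end;
        unsigned long val = strtoul(optarg, &end, 10);
        if (*optarg == '\0' || *optarg == '-' || *end != '\0' || val > UINT_MAX)
            return -1;
        globals.verbosity = (unsigned int)val;  /* later occurrences overwrite */
    }
    return 0;
}
```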

In addition to the base GNU parsing mode, the framework design includes three other modes: exact, extended, and autoexpand. In exact mode, only exact occurrences of options are matched. The extended mode mixes the exact and GNU modes, first attempting to match an option exactly before passing control to getopt. Finally, autoexpand mode acts like exact mode but allows abbreviation up to ambiguity, similar to GNU. These modes are intended to be selectable in the framework compiler should the programmer wish to deviate from or extend GNU behaviour. As of the time of writing, only the GNU parsing mode is implemented.
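Since autoexpand is not implemented yet, the following is only a sketch of how its matching rule could be expressed: an exact match is always preferred, and otherwise a prefix is accepted only if it identifies a single option. The function name and the return codes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the option matched by `arg`, -1 if nothing matches,
 * or -2 if `arg` is an ambiguous abbreviation. */
int autoexpand_match(const char *arg, const char *const *opts, size_t n)
{
    size_t len = strlen(arg);
    int found = -1;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(opts[i], arg) == 0)
            return (int)i;                       /* exact match always wins */
        if (strncmp(opts[i], arg, len) == 0)
            found = (found == -1) ? (int)i : -2; /* second prefix hit: ambiguous */
    }
    return found;
}
```

With the options -verbose, -version and -v, the argument -v selects -v exactly, while -ver is rejected as ambiguous.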


CHAPTER 4

Domain-specific Language

The previous two chapters highlighted what configuration behaviour the framework aims to support and in which way the framework and DSL are used to achieve that. The current chapter describes in detail each individual feature of the DSL and its resulting generated code. The chapter is structured similarly to a tutorial, and describes the features mostly in order of increasing complexity. A complete tutorial in a more informal style can be found at the source repository [15].

4.1 Basic configuration

4.1.1 Defining configurations

The most basic configuration is defined using the config keyword with a name and a set of fields, where each field consists of a type and a name. From this definition a global struct is generated, whose fields can be accessed through the config and field names. Only one root-level config may exist. Both the DSL and the resulting C code are illustrated in Figure 4.1.

```
config globals {
    fields = {
        uint verbosity,
        string outfile
    }
}
```

Header file

```c
struct {
    unsigned int verbosity;
    char *outfile;
} globals;
```

Figure 4.1: Config definition and resulting header code.

Available types are bool, uint, int, float, and string. The uint, float and string types map to the C types unsigned int, double, and char * respectively. A field can also be another nested config, or be an enum or list type, which will be elaborated on later.

Each field can be extended with an info attribute as well as a default value. When set to a string value, the info is used for generating usage text as well as documentation pages. Configurations may also be given an info string which is printed before their members. Defaults are set by appending an equals sign and a value to the field, as shown in Figure 4.2. If no default is supplied, the default values false, 0, and NULL are used instead, depending on type. These defaults […]

```
config globals {
    fields = {
        uint verbosity = 3 {
            info = "Verbosity of output"
        },
        string outfile = "-" {
            info = "Location of output file"
        }
    }
}
```

Figure 4.2: Config definition with defaults and info.

4.1.2 CLI options

Each field in the configuration can be associated with one or more command-line options. Any such option which appears on the command line has its argument parsed and written directly into the associated field. By default, options are automatically generated for any field without an explicit option definition, by applying a prefix and case conversion to the field name. The exact prefix and case used can be configured as an option to the DSL compiler.

The default prefix is -- and the default case conversion is to kebab-case, which uses lowercase words separated by hyphens. If instead no option is desired, option generation can be explicitly silenced by specifying an empty set of options. Option specification is illustrated in Figure 4.3.

```
config globals {
    fields = {
        uint verbosity = 3 {
            options = { v, verbosity } // options -v and --verbosity
        },
        string output_folder = ".", // auto-generates option --output-folder
        string outfile = "-" {
            options = {} // cannot be set by CLI option
        }
    }
}
```

Figure 4.3: Example config definition with option specifications.
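Under the default settings, generating an option name means prefixing the field name with -- and converting it to kebab-case. This sketch assumes snake_case field names as in Figure 4.3, so the conversion reduces to lowercasing and replacing underscores; the infix parameter mirrors the polarity strings used for boolean flags. The function name is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Write prefix + infix + kebab-case(field) into buf, for example
 * ("--", "", "output_folder") -> "--output-folder".
 * Returns 0 on success, -1 if buf is too small. */
int make_option_name(char *buf, size_t size, const char *prefix,
                     const char *infix, const char *field)
{
    int n = snprintf(buf, size, "%s%s%s", prefix, infix, field);
    if (n < 0 || (size_t)n >= size)
        return -1;                      /* output would be truncated */
    /* Convert only the field part: lowercase, underscores become hyphens. */
    for (char *p = buf + strlen(prefix) + strlen(infix); *p != '\0'; p++)
        *p = (*p == '_') ? '-' : (char)tolower((unsigned char)*p);
    return 0;
}
```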

4.1.3 CLI arguments

Instead of an option, a field may also be a proper program argument. Arguments are specified using the argument attribute in a field definition. Arguments that appear on the command line are assigned to argument fields in the order in which those fields are defined in the config. Multiple arguments may be defined, and no options are generated for arguments. Strict POSIX scanning can be enabled as a compiler option or by setting the POSIXLY_CORRECT environment variable, in the same way as with getopt.
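Argument fields can be filled from whatever getopt leaves behind: after the option loop, argv[optind] onward holds the operands in command-line order. In GNU getopt, a leading '+' in the option string requests strict POSIX scanning (stop at the first non-option), which POSIXLY_CORRECT also triggers. The struct, its two argument fields and the function name below are hypothetical.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical config with one flag and two argument fields. */
struct {
    bool verbose;
    char *infile;
    char *outfile;
} args_cfg;

/* Parse options, then assign operands to argument fields in definition
 * order. `strict` selects POSIX scanning. Returns -1 on an unknown
 * option or on more operands than argument fields. */
int parse_args(int argc, char *argv[], bool strict)
{
    char **fields[] = { &args_cfg.infile, &args_cfg.outfile };
    const size_t nfields = sizeof fields / sizeof fields[0];
    int c;

    args_cfg.verbose = false;
    args_cfg.infile = args_cfg.outfile = NULL;

    optind = 0;   /* glibc: full reinitialisation, so a '+' prefix is honoured */
    opterr = 0;
    while ((c = getopt(argc, argv, strict ? "+v" : "v")) != -1) {
        if (c != 'v')
            return -1;            /* unknown option */
        args_cfg.verbose = true;
    }
    /* Remaining elements are operands, assigned in field-definition order. */
    size_t f = 0;
    for (int i = optind; i < argc; i++) {
        if (f == nfields)
            return -1;            /* more operands than argument fields */
        *fields[f++] = argv[i];
    }
    return 0;
}
```

In the default GNU mode, prog in.c -v out.c is accepted because getopt permutes the operands to the end; in strict mode the same command line stops at in.c and leaves three operands for two fields.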

4.1.4 Configuration file settings

By default, fields in the config are accessible from the configuration file through their name. Fields can explicitly be disallowed from the configuration file by setting the configfile attribute to false. This can be done for an individual field, or toggled for a configuration as a whole. An example configuration and a possible configuration file are given in Figure 4.4. Any field which is allowed in the configuration file can be accessed by the end user through its path from the root: the field name prefixed with the names of its parent configurations (excluding the root configuration), separated by periods. These fields can be set using values in the same formats as the default values in the DSL.

```
config globals {
    fields = {
        uint verbosity,
        config opts {
            configfile = false,
            fields = {
                bool do_strip_mine { configfile = true },
                bool do_loop_unroll
            }
        }
    }
}
```

Configuration file

```
verbosity = 3
opts.do_strip_mine = true
opts.do_loop_unroll = true // this is disallowed
```

Figure 4.4: Example of a config definition and possible configuration file. In this case, opts.do_loop_unroll cannot be set in the configuration file due to its parent configuration disabling the functionality.
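Resolving a setting such as opts.do_strip_mine can be pictured as a lookup in a table of dotted paths, each carrying the effective configfile permission after inheritance from its parent configuration. The table layout, enum and function name are hypothetical; generated code might equally dispatch through nested comparisons.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One entry per field; `in_file` is the effective configfile attribute:
 * the field's own setting if given, otherwise its parent's. */
struct field_entry {
    const char *path;   /* dotted path from the root, root name omitted */
    bool in_file;
};

/* Table corresponding to the definition in Figure 4.4. */
static const struct field_entry fields[] = {
    { "verbosity",           true  },
    { "opts.do_strip_mine",  true  },   /* explicitly re-enabled */
    { "opts.do_loop_unroll", false },   /* inherits configfile = false */
};

enum lookup { FIELD_OK, FIELD_UNKNOWN, FIELD_FORBIDDEN };

enum lookup lookup_setting(const char *path)
{
    for (size_t i = 0; i < sizeof fields / sizeof fields[0]; i++)
        if (strcmp(fields[i].path, path) == 0)
            return fields[i].in_file ? FIELD_OK : FIELD_FORBIDDEN;
    return FIELD_UNKNOWN;
}
```

Distinguishing FIELD_FORBIDDEN from FIELD_UNKNOWN lets the parser report a more precise warning for the disallowed line in Figure 4.4.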

4.2 Type-specific configuration

4.2.1 Enum types

Enum types can be defined outside of the main configuration and used as a type for its fields. Enums are defined through the enum keyword with a prefix and a list of values. Enum identifiers and prefixes must be globally unique, and no value may appear twice within the same enum. Enum types defined in this way become available in the code as a type CFG_<identifier>, whose values are made up of the specified values prepended with the prefix and an underscore. An example of an enum definition is shown in Figure 4.5.

```
enum bit_width {
    prefix = BW,
    values = {8, 16, 32, 64}
}
```

Figure 4.5: Example of an enum definition.

Defined enum types can be used as a type for fields in the configuration. When this is done, the default value, the command line, and the configuration file all accept only values which appear in the value list of the enum. If no default value is specified, the first value of the enum is used. Usage of an invalid value for the enum results in an error in the appropriate part of the process.
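For the bit_width enum of Figure 4.5, the generated type could look roughly as follows, together with a conversion from the textual value given on the command line or in the configuration file. Only the CFG_bit_width and BW_* naming comes from the text; the name table, the parse function and its error convention are assumptions.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    BW_8,
    BW_16,
    BW_32,
    BW_64
} CFG_bit_width;

/* Textual values in declaration order, so the index equals the enum value. */
static const char *const bit_width_names[] = { "8", "16", "32", "64" };

/* Convert a textual value to CFG_bit_width. Returns 0 on success and
 * -1 if the text is not one of the enum's values (out is left untouched). */
int parse_bit_width(const char *text, CFG_bit_width *out)
{
    for (size_t i = 0; i < sizeof bit_width_names / sizeof bit_width_names[0]; i++) {
        if (strcmp(text, bit_width_names[i]) == 0) {
            *out = (CFG_bit_width)i;
            return 0;
        }
    }
    return -1;
}
```

Declaring the values in order makes BW_8 the zero value, which fits the rule that the first value serves as the default.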

4.2.2 List types

It is possible to use lists of values instead of single values by adding the list keyword to any basic or enum type. Default values and configuration file values for lists are written using curly braces and commas, similar to array initialisers in C. When no default is defined, the default value for a list is the empty list. Defining a field as a list changes the command-line behaviour to append to the list instead of overwriting the field. To prevent ambiguity, only the final argument field may be a list. Additionally, an optional separator may be defined for lists used on the command line, which allows multiple values to be added to the list in a single argument. An example of a list definition is shown in Figure 4.6.

```
config globals {
    fields = {
        string list infiles { argument },
        string list exclude_files {
            separator = ";"
        },
        uint list align_functions = {32, 7} {
            separator = ":"
        }
    }
}
```

Figure 4.6: Example of a list field definition. In this example, align_functions is a list field with default values 32 and 7.

List fields are represented in the generated code as a pointer to a C array holding the values of the field after parsing is complete. The length of this array is available in a separate field whose name is constructed by appending length to the field name, formatted according to the configured case conversion. An example of the generated header file is shown in Figure 4.7.

Header file

```c
#include <stddef.h> /* size_t */

struct {
    char **infiles;
    size_t infiles_length;
    char **exclude_files;
    size_t exclude_files_length;
    unsigned int *align_functions;
    size_t align_functions_length;
} globals;
```

Figure 4.7: Example generated header code for three list fields.
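The append behaviour and the optional separator can be sketched for the exclude_files field of Figure 4.6. The pointer and length pair mirrors Figure 4.7; the function names, the realloc-based growth and copying each piece into its own allocation are assumptions, and error handling is kept minimal.

```c
#include <stdlib.h>
#include <string.h>

struct {
    char **exclude_files;
    size_t exclude_files_length;
} globals;

/* Append a copy of s[0..len) to the list; returns -1 on allocation failure. */
static int list_push(char ***items, size_t *length, const char *s, size_t len)
{
    char **grown = realloc(*items, (*length + 1) * sizeof **items);
    if (grown == NULL)
        return -1;
    *items = grown;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, len);
    copy[len] = '\0';
    (*items)[(*length)++] = copy;
    return 0;
}

/* Handle one occurrence of the option: split `arg` on `sep` (if given)
 * and append every piece, so repeated options accumulate. */
int list_append_arg(char ***items, size_t *length, const char *arg, const char *sep)
{
    if (sep == NULL || *sep == '\0')
        return list_push(items, length, arg, strlen(arg));
    for (;;) {
        const char *hit = strstr(arg, sep);
        size_t len = hit ? (size_t)(hit - arg) : strlen(arg);
        if (list_push(items, length, arg, len) != 0)
            return -1;
        if (hit == NULL)
            return 0;
        arg = hit + strlen(sep);
    }
}
```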

4.2.3 Number ranges

Valid ranges for numbers can be defined using the range attribute of a field. A range is specified by its lower and upper bounds, separated by a -> arrow. The bounds are enclosed by square brackets for closed intervals or round brackets for open intervals, and either bound can be omitted to leave that side unbounded. An example of a range is shown in Figure 4.8.

```
config globals {
    fields = {
        uint verbosity = 3 {
            range = [0 -> 3]
        }
    }
}
```

Figure 4.8: Example of a range definition on an unsigned integer field. In this case, the configuration is only valid if the verbosity field has a value between 0 and 3, inclusive, at the end of parsing.
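Checking a range only needs, for each side, whether a bound is present, whether it is inclusive, and its value. The struct below is a hypothetical runtime representation of a range such as [0 -> 3], checked once parsing is finished; using double lets one representation serve int, uint and float fields, which is exact for integers up to 2^53.

```c
#include <stdbool.h>

/* One side of a range: absent means unbounded; closed means [ or ]. */
struct bound {
    bool present;
    bool closed;
    double value;
};

struct range {
    struct bound lo, hi;
};

bool in_range(double v, const struct range *r)
{
    if (r->lo.present && (r->lo.closed ? v < r->lo.value : v <= r->lo.value))
        return false;
    if (r->hi.present && (r->hi.closed ? v > r->hi.value : v >= r->hi.value))
        return false;
    return true;
}
```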

4.2.4 Boolean flags

In addition to the options attribute, boolean fields take an additional disable attribute. Flags in options set the value to true, while flags in disable set the value to false. Neither set of options takes any arguments.

Automatically generated options for booleans include an additional string between the prefix and the case-converted field name. The exact string used depends on the polarity of the flag and can be configured in the compiler. By default, it is empty for enabling flags and no- for disabling flags.
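getopt_long can express such an enabling and disabling pair directly: when an entry's flag pointer is non-NULL, getopt_long stores val there and returns 0. This sketch pairs --do-strip-mine with --no-do-strip-mine using the default empty and no- strings; the field name is borrowed from Figure 4.4, and because the flag pointer has type int *, the field is an int rather than a bool here.

```c
#include <getopt.h>
#include <stddef.h>

static int do_strip_mine = 1;   /* field default */

/* Returns 0 on success, -1 on an unknown option. */
int parse_bool_flags(int argc, char *argv[])
{
    static const struct option long_opts[] = {
        /* flag != NULL: getopt_long stores val into *flag and returns 0 */
        { "do-strip-mine",    no_argument, &do_strip_mine, 1 },
        { "no-do-strip-mine", no_argument, &do_strip_mine, 0 },
        { NULL, 0, NULL, 0 }
    };
    int c;

    optind = 0;   /* glibc: force full reinitialisation of getopt */
    opterr = 0;
    while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1)
        if (c != 0)
            return -1;
    return 0;
}
```

As with other options, the last of several conflicting flags determines the final value.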

4.3 Special options

4.3.1 Multioptions

Multioptions can be defined to create an option which sets several fields at once, either to predetermined values or from a shared argument. To define one, the multioption keyword is used with an options attribute and a list of fields. Each field must be a valid path from the root config to an existing field, and must either have a value associated with it or be set to the special ? token representing the command-line argument. All fields which are set to the option argument must be of exactly the same type. An example of a multioption definition is given in Figure 4.9.

Like fields, multioptions may be given an info string for documentation.

```
multioption {
    options = gpu-freq,
    fields = {
        allow_clocking = true,
        core_freq = ?,
        h264_freq = ?,
        isp_freq = ?,
        v3d_freq = ?
    }
}
```

Figure 4.9: Example of a multioption definition. In this example, using --gpu-freq with an unsigned integer value would set the allow_clocking field to true, and core_freq, h264_freq, isp_freq, and v3d_freq to the specified value.
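Handling one occurrence of the --gpu-freq multioption of Figure 4.9 comes down to parsing the shared argument once, then writing it to every field bound to ? and applying the fixed assignment. The struct and function below are hypothetical stand-ins for generated code; validating before assigning anything is a design assumption.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

struct {
    bool allow_clocking;
    unsigned int core_freq, h264_freq, isp_freq, v3d_freq;
} globals;

/* Handle one occurrence of --gpu-freq. Returns -1 if the argument is not
 * an unsigned integer, leaving the configuration untouched. */
int apply_gpu_freq(const char *arg)
{
    char *end;
    if (*arg == '\0' || *arg == '-')
        return -1;
    unsigned long v = strtoul(arg, &end, 10);
    if (*end != '\0' || v > UINT_MAX)
        return -1;
    /* Field with a fixed value in the definition. */
    globals.allow_clocking = true;
    /* Fields bound to the shared argument '?'; all share one type. */
    globals.core_freq = globals.h264_freq = globals.isp_freq =
        globals.v3d_freq = (unsigned int)v;
    return 0;
}
```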

4.3.2 Optionsets

Optionsets can be used to configure many options at once by interpreting tokens in their argument as individual multioptions. An optionset is defined with the optionset keyword, a set of options, and a set of tokens. Each token has a name which is matched in the optionset argument, and a list of fields similar to a multioption. As the token itself is part of the optionset argument, fields appearing in a token must be set to a constant value. A separator can optionally be defined that must appear between tokens; it must be defined if any token is longer than one character. An example of an optionset definition can be seen in Figure 4.10. Like fields and multioptions, optionsets can have an info attribute.

```
optionset {
    options = opts,
    tokens = {
        s = { some_opt = true },
        a = { another_opt = true },
        f = { final_opt = true }
    }
}
```

Figure 4.10: Example of an optionset definition. In this example, using --opts sa would set some_opt and another_opt to true.
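With single-character tokens and no separator, as in Figure 4.10, an optionset argument can be processed character by character, applying each token like a small multioption. The handler below is a hypothetical sketch; rejecting the whole argument on an unknown token, without applying any of its tokens, is a design assumption.

```c
#include <stdbool.h>
#include <string.h>

struct {
    bool some_opt, another_opt, final_opt;
} globals;

/* Validate every token first, then apply them, so an argument containing
 * an unknown token changes nothing. Returns -1 on an unknown token. */
int apply_opts(const char *arg)
{
    if (arg[strspn(arg, "saf")] != '\0')
        return -1;
    for (const char *p = arg; *p != '\0'; p++) {
        switch (*p) {
        case 's': globals.some_opt = true; break;
        case 'a': globals.another_opt = true; break;
        case 'f': globals.final_opt = true; break;
        }
    }
    return 0;
}
```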

4.3.3 Target options

The options used to apply targets defined in the configuration file are configured using the targetoptions keyword at the root level with a set of options. For example, targetoptions = { -t, --target } would define -t and --target as ways to apply a target using the target mechanism. The targetoptions keyword is optional, and target options are not generated automatically.

CHAPTER 5

Related Work

5.1 Docopt

Docopt [16] is a framework which generates option and argument parsing based on a written help text. Docopt was originally developed for the Python [17] language, but has since been ported to a multitude of languages including a limited implementation in C. It is based on longstanding conventions in writing help texts, as well as the heuristic that “a good help message has all necessary information in it to make a parser” [18].

A help text in the docopt language consists of two parts: a usage pattern, and a list of option descriptions. The usage pattern consists of the program name followed by a list of arguments, options, or commands. Special syntax is used to denote optional, required, mutually exclusive, or list versions of these. Option descriptions contain the option name, possible arguments, descriptions, and default values. An example of a help text is shown in Figure 5.1.

```
Naval Fate.

Usage:
  naval_fate.py ship new <name>...
  naval_fate.py ship <name> move <x> <y> [--speed=<kn>]
  naval_fate.py ship shoot <x> <y>
  naval_fate.py mine (set|remove) <x> <y> [--moored|--drifting]
  naval_fate.py (-h | --help)
  naval_fate.py --version

Options:
  -h --help     Show this screen.
  --version     Show version.
  --speed=<kn>  Speed in knots [default: 10].
  --moored      Moored (anchored) mine.
  --drifting    Drifting mine.
```

Figure 5.1: Example of a help text in the docopt language for a Python program named Naval Fate.

When used in Python, docopt uses the help text and program argument array to return a dictionary holding the options and their values. Flags are formatted as booleans, while other values are set as strings or lists of strings. An example of a return dictionary is shown in Figure 5.2. When used in C, an include file must first be generated by running docopt on a separately defined help text. The framework can then be used to parse the arguments in a similar way, and returns the values in a generated DocoptArgs struct.

In summary, docopt is a very capable framework that is able to generate much functionality from a single canonical description. However, using it can complicate the development process, as each change of the command-line interface requires rewriting the help text, and simple prototyping without writing usage text is not possible. It is also unable to perform automated parsing of values beyond some language-specific conversions. Finally, it does not handle documentation outside of the help text and is unable to describe configuration file interactions.

```python
{'--drifting': False,    'mine': False,
 '--help': False,        'move': True,
 '--moored': False,      'new': False,
 '--speed': '15',        'remove': False,
 '--version': False,     'set': False,
 '<name>': ['Guardian'], 'ship': True,
 '<x>': '100',           'shoot': False,
 '<y>': '150'}
```

Figure 5.2: Example of a return dict from the docopt entry function.

5.2 GCC option language

Similar to the framework described in this thesis, GCC internally makes use of a special option definition language to define options available on the command line and their associated variables. This language is used in multiple places within GCC for both common and target-specific options. The language is structured as a list of records which can define variables, enums, enum values, options, and other records. The full specification of the format can be found in the official repository [19].

Much like the framework DSL, the GCC language supports explicit definition of enums as a commonly used pattern for option values. Enums are defined as a single language record, and their values are defined separately, as illustrated in Figure 5.3. Variables may also be defined and used in the language, and are aliased to fields in a global struct in the final C code.

```
Enum
Name(diagnostic_prefixing_rule) Type(int)

EnumValue
Enum(diagnostic_prefixing_rule) String(once) Value(DIAGNOSTICS_SHOW_PREFIX_ONCE)

EnumValue
Enum(diagnostic_prefixing_rule) String(every-line) Value(DIAGNOSTICS_SHOW_PREFIX_EVERY_LINE)
```

Figure 5.3: Example of an enum definition in GCC’s option definition language. In this case Name is the identifier for the enum and Type indicates the type of options using this enum. Values belonging to the enum are defined as EnumValue records, with Enum referring to an enum identifier, String indicating the value on the command line, and Value giving the C value corresponding to the string.

The main focus of the language is command-line options. The language allows the definition of options with their properties and help texts. Examples of properties are: the optionality and number of expected arguments; valid ranges and unsigned integer restrictions; which variables are used to store the option’s values; aliases to other options; option interactions such as negations and options enabled by combinations; as well as categories used when printing the help text.

Help text is defined at the end of each option record. Normally this text is printed to the right of the associated option, separated by a tab. This behaviour can be overridden by including a tab character in the help text, at which point the text to the left of the tab character is used as the left-hand text. A few examples of options are shown in Figure 5.4.

Compared to the framework language, which groups related definitions at the same location in code, the GCC language consists entirely of separate records which refer to each other. Advantages of the GCC language are increased modularity of records, especially when used in multiple places, as well as a flatter code profile. Information of the same kind relating to the final C code, such as variable definitions, can also be grouped together. Advantages of the framework language are decreased verbosity in some cases, such as enums, and more visible relations between options and code aspects like variables. The locally grouped structure also allows for natural grouping of logically related options.

```
fdiagnostics-show-location=
Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
-fdiagnostics-show-location=[once|every-line]	How often to emit source location at the beginning of line-wrapped diagnostics.

-verbose
Driver Alias(v)

O
Common JoinedOrMissing Optimization
-O<number>	Set optimization level to <number>.

Werror
Common Var(warnings_are_errors)
Treat all warnings as errors.
```

Figure 5.4: Some cherry-picked examples of option definitions in GCC’s option definition language. Each option consists of a name used on the command line (excluding the leading hyphen), a set of properties, and a help text.

The two languages also differ in terms of capabilities. For example, the GCC language is capable of defining relations between options, such as allowing certain combinations to enable or disable other options. This also allows it to do cross-validation, something that the framework does not provide. It also includes more specific functionality, such as transforming text to lowercase or masking values. Custom structures can also be included. In contrast, the framework DSL infers and extrapolates more information from fewer definitions and generates more code based on that, making the most common cases shorter to define. In general, the GCC language is capable of defining more functionality more concisely, while the framework language focuses on supporting a few core patterns and clarity at a glance.

5.3 Getopt and others

In addition to the two language-based code generators mentioned above, many generic command-line parsing frameworks also exist. The previously mentioned GNU getopt utility is an example of these, as is the argparse [20] module included in the standard Python library. Some configuration file parsing utilities also exist, such as the Python configparser [21] module.

The main difference between these utilities and the two language-based generators described above is that all configuration options must be defined inline in the code. As a result, such utilities are intrinsically unable to generate documentation beyond the standard program help text. They also require the specification to be written in the same language as the target program, which may be unnecessarily verbose or preclude syntax checking opportunities. However, they may have more direct access to code objects, or be defined non-linearly, as they are generally represented as code objects themselves.

As the naming in this section suggests, these utilities provide less functionality than a complete framework and require the programmer to do more of the work. In return, they are generally more flexible and extensible, as is the case with getopt. Thus these utilities may be useful if the […]


CHAPTER 6

Discussion

6.1 Limitations

While the described framework supports field value validation, it is unable to cross-validate multiple fields. To achieve this, it is recommended that the user define the configuration as normal and add any additional consistency checks in a separate code pass as required. The framework also does not support custom functionality, neither as a custom parser nor through custom structures or callbacks. For this too it is recommended that the user define a configuration, then parse any such strings or check flags for callbacks themselves. Finally, the framework is not built to explicitly handle the previously named ‘hybrid’ convention, which takes a ‘sub-program’ argument. To circumvent this, it is recommended to use a frontend program in strict POSIX parsing mode, together with a string list of ‘sub-arguments’ which is passed to a second instance of the framework.
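The suggested workaround for the hybrid convention can be sketched with getopt alone: a leading '+' in the option string makes GNU getopt stop at the first non-option, which is taken as the sub-program name, and everything from there on becomes a new argument vector for a second parser. The struct and function names are hypothetical; in the framework, the sub-arguments would be a string list field handed to a second generated parser.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

struct frontend {
    bool verbose;
    const char *subcommand;   /* first operand, or NULL */
    char **sub_argv;          /* remaining words, starting with the subcommand */
    int sub_argc;
};

/* Parse "prog [-v] subcommand [sub-arguments...]".
 * Returns -1 on an unknown frontend option. */
int parse_frontend(int argc, char *argv[], struct frontend *fe)
{
    int c;
    fe->verbose = false;
    optind = 0;   /* glibc: reinitialise so the '+' prefix takes effect */
    opterr = 0;
    while ((c = getopt(argc, argv, "+v")) != -1) {
        if (c != 'v')
            return -1;
        fe->verbose = true;
    }
    /* The subcommand doubles as argv[0] for the second parser,
     * which skips argv[0] just like the program name. */
    fe->sub_argc = argc - optind;
    fe->sub_argv = argv + optind;
    fe->subcommand = fe->sub_argc > 0 ? argv[optind] : NULL;
    return 0;
}
```

For prog -v build -O2 main.c, the frontend consumes only -v, and the second parser receives build -O2 main.c.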

The framework does not implement a parsing mode mixing the extended GNU and autoexpand modes. This is mainly due to the ambiguity that arises when options starting with a single hyphen are defined alongside POSIX short options, as parsing would then have to decide between interpretation as grouped POSIX short options and literal abbreviation: -ab could mean -a -b, or be an abbreviation of a longer single-hyphen option. Supporting this would increase the amount of logic required, and with it both the development time and the running time of the parser.

6.2 Features

The framework allows for the generation of parsing code for both the command line and the configuration file, as well as value validation and documentation, from a single DSL source. In this respect it differs from other tools which implement similar behaviour. Such tools may lack support for certain features, such as a separate configuration file in the case of GCC’s internal language or typed value parsing in the case of docopt, or they may be more minimal tools which confine themselves to a single aspect, like getopt.

An alternative to using the framework is to combine one or more of these tools to achieve the desired result. Doing so may make it possible to take advantage of the more complex features of specialised tools, at the cost of more code duplication. A complex crossover feature such as the provided target options could be mimicked in a manner similar to the @file option in GCC.

As seen, the framework forgoes features allowing increased customisation and control in order to generate the code that it does. By doing so, it is able to generate all code related to configuration from a single definition. In spite of this, the framework strives to cover common use cases by basing its feature set on real-world applications.


6.3 Evaluation

It is difficult to evaluate and compare the performance of the framework to a more manual solution, in a software development sense. For this a large scale user survey would be needed, ideally developing and comparing multiple programs using both solutions. It is also difficult to define and measure the individual components of this performance, as well as the way they are combined for an overall result. For these reasons such an evaluation is not included as part of this thesis.

A smaller-scale evaluation of the coverage of the framework may be possible by comparing the possible configuration definitions to real-world programs. However, much of this depends on the exact behaviour of the program in question and the exact implementation of its interaction with its configuration. A program's configuration may also have been designed a certain way for legacy reasons, while an alternative design of comparable implementation difficulty might have been better or worse suited to the framework. Such an evaluation would therefore have to be done on a subjective, case-by-case basis, and has thus also not been included as part of this thesis.

6.4 Language

The focus of this thesis so far has been on not only the capabilities of the framework but also the definition of the language. The main reason for the use of a DSL is to help balance code readability with brevity. It is however not strictly necessary to make use of a DSL. An alternative solution might have been to use a general-purpose data serialisation language such as YAML.

YAML is a data serialization language that aims to be human-readable. The two main data structures used in YAML are associative arrays and lists, and a typical file written in YAML consists of a list of key-value pairs, both of which may themselves be lists. Its syntax relies mainly on whitespace, using indentation for nesting similar to Python, and short symbols such as hyphens to denote list items. It is also capable of automatic type interpretation depending on the format of a value. An example of YAML syntax is given in Figure 6.1. The full syntax of YAML can be found at its homepage [13].

Interestingly, YAML is a superset of another data-serialization language, JSON [22]. This means that all JSON definitions are also valid YAML definitions. JSON syntax is also based on lists and associative arrays. However, it does not include automatic type conversion and requires explicit symbols to denote said data structures and strings. It can be inferred that YAML syntax is generally shorter than JSON, and in general can be said to be very short compared to other common data-serialisation languages such as XML.

```yaml
fields:
  - infiles:
      info: List of input files
      type: string list
      default: ['file1', 'file2']
      argument: yes
```

Figure 6.1: Example of a YAML definition.

Usage of YAML allows definitions which are both short due to the syntax, and readable at a glance due to the key-value structure used. The main advantage the usage of a DSL has over YAML is to allow special shorthand of certain very commonly used keys, such as default values or types. In YAML, each of these common values would require its own key, thus increasing the size of even basic configurations by a large amount. Some of YAML’s features as a generic language are also unnecessary or even detrimental; for example, arbitrary keys should not be used in places where attributes are expected.


CHAPTER 7

Conclusion

The framework described in this thesis automatically generates parsing code for program configuration, both from the command line and from file, as well as the corresponding documentation, all from a single source written in a custom DSL. This allows the programmer to define all aspects of the program's configuration in a single location and removes the overhead of writing this code and documentation by hand, allowing for faster development. By using C as the target language for code generation and supporting POSIX and GNU conventions, the framework aims to be widely adoptable. It also aims to support as many programs as possible by including commonly used features such as validation and enumerations.

To determine which features are common enough to warrant inclusion, a short survey of existing compilers is conducted. It shows that flags and options which directly set a value are the most common. Compilers also tend to provide some way to set multiple commonly combined options at once. The framework implements three such mechanisms: groupflags, optionsets, and the target mechanism. The survey and the chosen features answer research questions 1 and 2, respectively.
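To make the idea of setting several options at once concrete, the C sketch below models a groupflag as a named table entry that enables a fixed set of boolean fields, similar in spirit to how GCC's -Wall enables a group of warnings. This is not the code CoCoNutMan generates; `struct config`, its fields, the `groupflags` table, and `apply_groupflag` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration structure; field names are illustrative only. */
struct config {
    bool warn_unused;
    bool warn_shadow;
    bool warn_implicit;
};

/* A groupflag: a name plus the offsets of the bool fields it enables. */
struct groupflag {
    const char *name;
    const size_t *fields;
    size_t nfields;
};

/* Hypothetical "Wall" groupflag enabling two of the three warnings. */
static const size_t wall_fields[] = {
    offsetof(struct config, warn_unused),
    offsetof(struct config, warn_implicit),
};

static const struct groupflag groupflags[] = {
    { "Wall", wall_fields, sizeof wall_fields / sizeof wall_fields[0] },
};

/* Enable every field belonging to the named groupflag.
 * Returns false if no groupflag with that name exists. */
static bool apply_groupflag(struct config *cfg, const char *name)
{
    for (size_t i = 0; i < sizeof groupflags / sizeof groupflags[0]; i++) {
        if (strcmp(groupflags[i].name, name) != 0)
            continue;
        for (size_t j = 0; j < groupflags[i].nfields; j++)
            *(bool *)((char *)cfg + groupflags[i].fields[j]) = true;
        return true;
    }
    return false;
}
```

A table-driven layout keeps each groupflag declarative, which suits generation from a single definition; emitting one small function per groupflag would be an equally valid alternative.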

The survey also shows that, in many cases, the parsing and validation code surrounding the option interface is similar. Parsing is supported by limiting field types to bool, int, float, string, and list, and by requiring each type to be declared explicitly. Validation is provided through the uint and enum types, as well as number ranges. Hence code defining the configuration structure, code for parsing and validation, and documentation should all be generated to support these patterns, answering research question 3. This also makes clear which information the developer needs to provide, answering research question 4.
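As an illustration of the kind of hand-written code that the uint, enum, and range features are meant to replace, the C sketch below parses an unsigned value, checks it against a declared range, and maps enum names to values. The function names and the `opt_level_t` enum are hypothetical, and the framework's actual generated code may be structured differently.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical enum field; the values and names are illustrative only. */
typedef enum { OPT_LEVEL_NONE, OPT_LEVEL_SOME, OPT_LEVEL_FULL } opt_level_t;

static const char *const opt_level_names[] = { "none", "some", "full" };

/* Parse an unsigned integer and check that it lies within [min, max].
 * Stores the value and returns true on success, returns false otherwise. */
static bool parse_uint_range(const char *text, unsigned long min,
                             unsigned long max, unsigned long *out)
{
    char *end;

    /* strtoul silently accepts a leading minus sign, so reject it here. */
    if (text == NULL || *text == '\0' || strchr(text, '-') != NULL)
        return false;

    errno = 0;
    unsigned long value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0')
        return false; /* overflow or trailing characters */
    if (value < min || value > max)
        return false; /* outside the declared range */

    *out = value;
    return true;
}

/* Map the text of an enum option to its value; unknown names are rejected. */
static bool parse_opt_level(const char *text, opt_level_t *out)
{
    size_t count = sizeof opt_level_names / sizeof opt_level_names[0];

    for (size_t i = 0; i < count; i++) {
        if (strcmp(text, opt_level_names[i]) == 0) {
            *out = (opt_level_t)i;
            return true;
        }
    }
    return false;
}
```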

With this knowledge, a DSL is designed whose minimal syntax requires only strictly necessary information, keeping it concise for the most common cases while maintaining expressiveness and readability for more complex ones. Shorthand further reduces the number of characters needed for a minimal definition, while keywords for less common features improve readability and allow for more complex features. In this way, a minimal DSL is designed based on the answers to the four research questions.

7.1 Future work

In its current state the framework is fairly barebones, supporting only the options most common in compilers. Given the wide variety of configuration specifications in use, the framework could be extended in a number of ways. Some extensions may be targeted, but most should follow an evaluation of how well the framework's features apply to real-world use cases.

Possible evaluations include those mentioned in the previous chapter, in particular the evaluation of the framework's ability to model real-world program configurations, which would inform the choice of features to implement. This would require defining metrics, such as the amount of a program's configuration that is explicitly supported and the amount of code that must be written to model its behaviour, both in the DSL and in the resulting program.

Following this evaluation, specific features could be implemented to model patterns that are commonly used but not currently supported. One example is explicit support for 'subprogram' patterns. Although a reasonably accurate way to model this behaviour is proposed in the previous section, it requires more work from the developer and is somewhat limited, as the framework cannot detect relations between subprogram configurations. Explicit support for this pattern would make it possible to implement features that make use of these relations.

Another possible extension is support for cross-validation of options. Examples are configuration fields that are disabled by other fields, or enabled by logical combinations of other fields, as in GCC's option definition language. Extensions like these would require changes to the language syntax as well as new code-generation logic.
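A minimal sketch of what generated cross-validation code could look like, assuming it runs once after all command-line and file options have been parsed. Both rules and all field names are hypothetical: one field disables another, and a derived field is enabled by a logical combination of two others.

```c
#include <stdbool.h>

/* Hypothetical fields; names and rules are illustrative only. */
struct config {
    bool fast_math;
    bool strict_ieee;
    bool optimize;
    bool debug;
    bool inline_funcs; /* derived field, never set directly by the user */
};

/* Apply cross-field rules after all options have been parsed. */
static void cross_validate(struct config *cfg)
{
    /* Rule 1: strict_ieee disables fast_math. */
    if (cfg->strict_ieee)
        cfg->fast_math = false;

    /* Rule 2: inline_funcs is enabled by the combination optimize && !debug. */
    cfg->inline_funcs = cfg->optimize && !cfg->debug;
}
```

Running the rules as a separate pass after parsing means their outcome does not depend on the order in which options appear on the command line or in a file.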


Bibliography

[1] index [SaC-Home]. url: http://www.sac-home.org/doku.php (visited on 05/29/2019).

[2] M. Timmerman. "CoCoNut: A metacompiler-based framework for compiler construction in C: Scalability, modularity, space leak detection and garbage collection". In: (2017). BSc informatica, University of Amsterdam.

[3] L. Coltof. "CoCoNut: a Metacompiler-based Framework for Compiler Construction in C: High-productivity, Traversal Optimization, and AST Serialization". In: (2017). BSc informatica, University of Amsterdam.

[4] Utility Conventions. url: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html (visited on 05/01/2019).

[5] Argument Syntax (The GNU C Library). url: https://www.gnu.org/software/libc/manual/html_node/Argument-Syntax.html (visited on 05/01/2019).

[6] Example of Getopt (The GNU C Library). url: https://www.gnu.org/software/libc/manual/html_node/Example-of-Getopt.html (visited on 05/29/2019).

[7] Getopt Long Option Example (The GNU C Library). url: https://www.gnu.org/software/libc/manual/html_node/Getopt-Long-Option-Example.html (visited on 05/29/2019).

[8] GCC, the GNU Compiler Collection - GNU Project - Free Software Foundation (FSF). url: https://gcc.gnu.org/ (visited on 05/02/2019).

[9] Clang C Language Family Frontend for LLVM. url: https://clang.llvm.org/ (visited on 05/02/2019).

[10] OpenJDK. url: https://openjdk.java.net/ (visited on 05/02/2019).

[11] The Compiler Group. url: https://openjdk.java.net/groups/compiler/ (visited on 05/02/2019).

[12] Using the GNU Compiler Collection (GCC): Option Summary. url: https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html (visited on 05/02/2019).

[13] The Official YAML Web Site. url: https://yaml.org/ (visited on 06/03/2019).

[14] leegbestand/CoCoNut-lib: Library for the CoCoNut framework. url: https://github.com/leegbestand/CoCoNut-lib (visited on 06/05/2019).

[15] Airgetfrog/CoCoNutMan. url: https://github.com/Airgetfrog/CoCoNutMan (visited on 06/15/2019).

[16] docopt—language description of command-line interfaces. url: http://docopt.org/ (visited on 06/04/2019).

[17] Welcome to Python.org. url: https://www.python.org/ (visited on 06/04/2019).

[18] docopt/docopt: Pythonic command line arguments parser, that will make you smile. url: https://github.com/docopt/docopt (visited on 06/04/2019).

[19] gcc.gnu.org Git - gcc.git/blob - gcc/doc/options.texi. url: https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/doc/options.texi;h=1c83d24148818133f0854020c0d7f9c8dbba7798;hb=HEAD (visited on 06/03/2019).


[20] argparse — Parser for command-line options, arguments and sub-commands. url: https://docs.python.org/dev/library/argparse.html (visited on 05/01/2019).

[21] configparser — Configuration file parser. url: https://docs.python.org/3/library/configparser.html (visited on 05/01/2019).
