First-Class Support for Resugaring in Rascal


Master’s Thesis

Wouter Nederhof

Supervisor: dr. Tijs van der Storm Centrum voor Wiskunde en Informatica

Faculty of Science University of Amsterdam

The Netherlands January 5, 2016


Abstract

In this Master’s thesis, we aimed to investigate how capable resugaring techniques are for real-world use. As part of our research, we created a prototype based on a literature study and improved this artefact using observations from different case studies.

We started our research by studying two papers by J. Pombrio and S. Krishnamurthi on resugaring to build our initial prototype[1][2]. During our research, we observed that the techniques described in "Resugaring: Lifting Evaluation Sequences through Syntactic Sugar" were neither efficient nor expressive enough for our case studies. The techniques described in "Hygienic Resugaring of Compositional Desugaring" performed sufficiently well, but were too constrained for resugaring terms.

Following the literature study, we integrated the techniques from these papers into Rascal. During the implementation, we met two notable problems. First, we observed that patterns containing ellipses can break symmetry. As a solution for this problem, we devised a technique in which the lengths of ellipses are fixed after a transformation. Second, we had to work around a constraint in the techniques of the latter paper, which required transformation functions to consume and produce patterns. We found this to be a major restriction, since Rascal did not have a pattern data type. This meant that we had to devise new techniques to obtain a similar level of expressivity.

Although our prototype was based on techniques by J. Pombrio and S. Krishnamurthi, adapting and extending their techniques to Rascal ultimately led to a significantly different design. To support our design decisions, we sketched how we could formally address most of them.

Finally, we demonstrated that our finished prototype was capable of desugaring and resugaring multiple different cases. Some of these cases could not be addressed using the original techniques and required a change in our design. We showed that our prototype was able to desugar and resugar different categories of syntactic sugar in ES6. Additionally, we were able to build an evaluation stepper that desugars a functional programming language into the Lambda Calculus, steps through the code and resugars intermediate results, including Church numerals. Lastly, we demonstrated that our prototype is capable of desugaring and resugaring terms of over 50,000 characters in size in just over a second, and we used a performance benchmark to show how the different techniques perform.


Introduction

Krishnamurthi issues a call to arms for more research on the topic of desugaring, an essential tool to reduce the size of programming language implementations[3]. What is desugaring, and why is it important that desugaring gains more attention?

Desugaring is the process of eliminating "syntactic sugar". Syntactic sugar is syntax that is used to make a programming language easier for humans to read and write without increasing the language’s functionality[4]. Desugaring transforms constructs containing syntactic sugar into semantically equivalent constructs fundamental to a language processor.

The main advantage of this technique is that a language processor can remain relatively small, because it only needs to contain semantics for the desugared code. However, since the original program is transformed into another representation, a language processor may unintentionally produce output that is foreign to what the user has typed.

This problem becomes apparent during debugging. When a user debugs a program, the user is often interested in certain aspects of the state of that program at specific moments during its execution. In a functional programming language, for example, the user may be interested in the reduction steps taken by the evaluator. If the initial program was desugared, then these reduction steps may look very different from what the user has typed. This makes it hard for the user to debug a program after it is desugared.

To address this problem, J. Pombrio and S. Krishnamurthi introduced a technique called "resugaring"[1][2]. Resugaring is the act of reconstructing the surface level representation (what the user typed) from the core level representation (what the language processor uses). This way, desugared terms can be represented in the language of the user.

In this Master’s thesis, we investigate whether the resugaring techniques found in the literature are sufficiently capable to handle real-world problems, and what the requirements and obstacles are for integrating practical support for resugaring into an existing meta-programming language.


1.1 Context

The scope of our research is limited to the resugaring techniques found in two papers by J. Pombrio and S. Krishnamurthi, as we are unaware of other formalizations. For addressing real-world problems, we focus on expressiveness, performance and language-related challenges. We do not address hygiene¹[5] in full detail. Furthermore, we restrict our research to the use of resugaring techniques within a programming language, as altering an existing language processor to produce trackable desugared terms has already been discussed[1]. We use Rascal as our subject of study to research the challenges associated with implementing resugaring in an existing programming language.

1.2 Motivation

We believe that resugaring is a promising technique to address the problem that desugared terms are often foreign to the user (we discuss other techniques in the Related Work). It is also a novel technique, as it was first formally addressed in 2014[1].

We study resugaring techniques for real-world applications because the papers discussing these techniques do not provide much evidence that the techniques are sufficiently capable to address practical real-world problems. Instead, they are more theoretical in nature.

Furthermore, we study how resugaring can be integrated into an existing language because we are unaware of any programming language that integrates language-level support for resugaring. Although both papers by J. Pombrio and S. Krishnamurthi present a prototype of their resugaring techniques, both are essentially very small domain-specific languages instead of a fundamental part of a larger programming language. We believe that integrating resugaring into a larger meta-programming environment opens up many doors, because it allows these techniques to interplay with other tools for constructing programming languages.

This makes Rascal a perfect fit for our study. Rascal is a mature meta-programming workbench that attempts to integrate all the tools necessary for analyzing and constructing programming languages[6][7][8]. By using Rascal as our subject of study, we uncover the practical challenges related to resugaring instead of just the theoretical ones.

1.3 Research Questions

Central to our study are the following research questions:

• Are the resugaring techniques found throughout literature expressive enough for practical applications?

• Are the resugaring techniques found throughout literature efficient enough for practical applications?

• What are the challenges related to integrating expressive and efficient support for resugaring into a readily existing meta-programming language?

¹ Preventing the accidental capture of variables or other identifiers.

1.4 Research Method

Our approach to studying the practical challenges related to resugaring was to build a prototype based on a literature study. We tested this prototype against different use cases to determine whether it met our goals, and gradually improved it until it did. When the prototype was complete, we evaluated it using the cases we had used to develop it. Finally, we sketched a number of proofs to support our design. We based our answers to the research questions on our observations and the results of the evaluation. We used the following case studies to build and evaluate our prototype:

Performance

1. Desugaring and resugaring a functional programming language to and from the Lambda Calculus respectively

2. Performance benchmarking to measure the differences in the techniques we used

Expressiveness

1. Adding support for resugaring to three different cases of Rmonia’s ES6 to ES5 desugaring mechanisms

2. Construction of an evaluation stepper with support for resugaring for another functional programming language that desugars into the Lambda Calculus

1.5 Contributions

The main contributions of this thesis are:

• An analysis regarding the practical use of different desugaring and resugaring techniques

• An analysis of the engineering challenges regarding the integration of resugaring into a readily existing programming language

• A design for extending Rascal with first-class support for resugaring, supported by a proof sketch of its correctness

• A fully functional prototype for resugaring in Rascal

1.6 Related Work

Our work is based on two papers on resugaring by J. Pombrio and S. Krishnamurthi[1][2]. The basis of our work is to a large extent an evaluation and extension of their techniques in a practical context. We discuss their techniques in more detail in Chapter 4.


One of the key elements of resugaring is tracking the origin of terms. This is similar to origin tracking as described by A. van Deursen, et al.[9]. Origin tracking refers to relating a transformed term’s subterms to the respective subterms prior to the transformation. The paper by van Deursen, et al. presents a formalization of origin tracking for use in a term rewriting system during evaluation. However, while their work is focused on tracking origins during evaluation, resugaring is focused on the relation between evaluated terms and syntactic sugar.

Another technique to relate source code that originates from a transformation to its initial term is source mapping. Source maps are simply maps from locations in transformed source code to their respective locations prior to the transformation. Mozilla Firefox’s Javascript Debugger, for instance, allows the use of source maps to relate minified or transpiled Javascript code to its original source³. This is essentially a subset of the problem we study, because in this case terms are only related to other terms, whereas resugaring attempts to reconstruct these terms.

In other cases, such as in the Lambda Calculus Evaluator from Michael I. Schwartzberg, terms are simply translated to another representation without origin tracking. The benefit of this technique is that reconstruction is relatively simple and straightforward: parse the core term and return the semantically equivalent surface term. However, since terms are not tracked, there is no way of telling whether the output term reflects the input language of the user.

The techniques from J. Pombrio and S. Krishnamurthi are essentially well-behaved lenses. A lens is a bidirectional transformation used to create a "view" from an object that can be updated, thereby consistently updating the original representation as well. A lens $l$ consists of two functions, get ($l{\nearrow}$) and put ($l{\searrow}$)[10]:

$$l{\nearrow}(C) \subseteq A \qquad \text{(Get)}$$

$$l{\searrow}(A \times C) \subseteq C \qquad \text{(Put)}$$

A lens is called well-behaved when it adheres to the Lens Laws:

$$l{\searrow}(l{\nearrow}c,\ c) \sqsubseteq c \quad \text{for all } c \in C \qquad \text{(GetPut)}$$

$$l{\nearrow}(l{\searrow}(a,\ c)) \sqsubseteq a \quad \text{for all } (a, c) \in A \times C \qquad \text{(PutGet)}$$

Here $C$ is the source, $A$ is the target (the view), and $f(a) \sqsubseteq b$ is defined as $f(a) = \bot \lor f(a) = b$ (where $\bot$ means that there is no valid value). Essentially, the first law states that whenever the view obtained from a source $c$ is put back unchanged, the source remains the same or the result is invalid. The second law states that whenever a view $a$ is put into a source, getting the view from the result yields that same $a$ or is invalid.
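The two laws can be checked mechanically on a small example. The following Python sketch is an illustration only and is not taken from the cited work: the function names `get` and `put`, the helper checks, and the choice of a record with a name and an age as source are all assumptions made for this example. Both functions are total here, so the invalid case ⊥ never arises.

```python
# Illustrative lens: the source C is a dict with "name" and "age",
# the view A is the name. All names are assumptions for this sketch.

def get(c):
    """Get: project the view (the name) out of the source."""
    return c["name"]

def put(a, c):
    """Put: write an updated view back into the source, keeping the rest."""
    updated = dict(c)  # copy, so the original source is left untouched
    updated["name"] = a
    return updated

def get_put_holds(c):
    """GetPut: putting back the unmodified view leaves the source unchanged."""
    return put(get(c), c) == c

def put_get_holds(a, c):
    """PutGet: getting after a put yields exactly the view that was put."""
    return get(put(a, c)) == a

source = {"name": "Ada", "age": 36}
print(get_put_holds(source))           # True
print(put_get_holds("Grace", source))  # True
```

A `put` that also reset the age would still satisfy PutGet but would violate GetPut, which is why both laws are needed.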

Throughout this thesis we name different benefits of resugaring regarding debugging. Among the techniques that could potentially benefit the most from resugaring is tracing. Tracing is simply producing information about the state of a program at different moments in its execution. It is sometimes called "printf debugging", named after the C function that is often used to trace. Essentially, printing an evaluation sequence is a form of tracing (which we use in one of our case studies). Similar to tracing is postmortem debugging[11], which is essentially debugging a program after it has crashed. As we will see in one of the examples in Chapter 3, postmortem debugging may also benefit from our techniques.

³ See: https://developer.mozilla.org/en-US/docs/Tools/Debugger/How_to/Use_a_source_map

Resugaring is one of several research topics on desugaring. As we noticed during the evaluation of our prototype, desugaring may lead to an enormous overhead in terms of the amount of produced code. As it turns out, Krishnamurthi named two techniques to address this problem: "Shrinking output in a semantics-preserving way"[3] and "Shrinking output by altering semantics"[3]. The first technique is straightforward: replace bloated and/or inefficient terms with semantically equivalent, smaller and more performant terms. The second technique is similar to the first, but ignores some very specific edge cases such as the potential use of Javascript’s eval function and Java’s reflection mechanisms. The benefit of ignoring these cases is that there is much more room for deflating terms, but this comes at the price of possible semantic inconsistencies. These inconsistencies, however, may easily be avoided when the user is aware of these transformations. A case study by Junsong Li, et al. demonstrates this technique in practice[12].

1.7 Outline

This thesis has the following structure:

• In Chapter 2, we discuss concepts that are relevant for understanding this thesis.

• In Chapter 3, we explain desugaring and resugaring in more detail.

• In Chapter 4, we summarize the different resugaring techniques found in literature.

• In Chapter 5, we discuss the design of our prototype from a user perspective.

• In Chapter 6, we discuss how our prototype was built and what we observed during this process.

• In Chapter 7, we present a proof sketch of our design’s correctness.

• In Chapter 8, we proceed to evaluate our prototype.

• In Chapter 9, we discuss our findings and observations.

• Finally, we conclude this thesis in Chapter 10.


Chapter 2

Background

In this chapter, we discuss concepts that are relevant to understanding this thesis.

2.1 Concrete and Abstract Syntax

Throughout this thesis, we occasionally refer to concrete and abstract syntaxes. By concrete syntax we mean syntax containing information about the textual representation (e.g. whitespace and layout). By abstract syntax we mean syntax that does not contain this information.

We follow Rascal’s syntax for denoting terms. We denote concrete terms by wrapping ‘ and ‘ around the textual representation for untyped terms and (Type)‘ and ‘ for typed terms. For abstract terms, we denote arrays using [ and ], nodes or function calls using n(p₁, ..., pₙ) and maps using {key₁ → value₁, ..., keyₙ → valueₙ}.

2.2 Pattern Matching and Substitution

Every desugaring and resugaring technique discussed in this thesis is based on pattern matching and substitution. Pattern matching is the act of strictly matching a term against a pattern, producing a map of pattern variable names to the respective subterms. Substitution in this context is the act of replacing the variables in patterns by values, thereby producing a term.

Throughout this thesis, we use patterns that are similar to terms, but contain pattern variables that may be bound to respective (sub)terms after matching a term. We use untyped pattern variables that match arbitrary terms, typed pattern variables that match equally-typed terms and (typed) ellipses that match zero or more terms.

We follow Rascal’s syntax for denoting patterns. We denote concrete patterns using (Type)‘ and ‘, untyped pattern variables in abstract patterns using italics, and constants in abstract patterns using double quotes (""). We denote constants in concrete patterns using italics, since we do not consider untyped pattern variables in concrete syntaxes. We use the asterisk-token (*) to denote ellipses allowing zero or more terms, and the plus-token (+) to denote ellipses allowing one or more terms. Typed variables in concrete patterns are written using ‘<Type name>‘ and in abstract patterns using Type name. Finally, we denote concrete ellipses with a delimiter as {Type "constant"} followed by either a + or a *.

We now provide some examples of patterns to illustrate how we denote patterns throughout this thesis and how they can be matched against terms.

• (Exp)‘<Exp e1> + <Exp e2>‘ matches the term ‘1 + 1‘ if ‘1‘ is of the type Exp, after which both e1 and e2 are bound to 1.

• (Exp)‘<Exp e1>+<Exp e2>‘ does not match the term ‘1−1‘, because the minus-token is different from the plus-token in the pattern.

• ["a", "a", v] is an abstract pattern containing an array of two constants "a" and a pattern variable v. This pattern matches the term ["a", "a", "a"] where v is bound to "a", but does not match the term ["a", "a", "a", "a"].

• ["a", v*, "a"] matches the term ["a", "a", "a"], where v is bound to ["a"], and also matches the term ["a", "a", "a", "a"], where v is bound to ["a", "a"].

• ‘function({Var ","}+ var)‘ matches the term ‘function(arg)‘ (if arg is of the type Var), matches the term ‘function(arg1, arg2)‘, but does not match the term ‘function()‘.

Finally, we denote pattern matching using t/P, meaning that the term t is matched against a pattern P. We denote substitution as Pσ, in which the pattern variables in P are replaced by their values in the variable map σ (a variable map is a map from variable names to variable values).
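The abstract patterns above can be made concrete with a small matcher. The following Python sketch is an illustration under stated assumptions and is not the Rascal implementation: the classes `Var` and `Star` and the functions `match_term` and `substitute` are hypothetical names, terms are plain Python strings and lists, and only untyped variables and the zero-or-more ellipsis are covered. `match_term(t, P)` plays the role of t/P and `substitute(P, env)` the role of Pσ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str  # pattern variable, matches exactly one term

@dataclass(frozen=True)
class Star:
    name: str  # ellipsis, matches zero or more terms inside a list

def match_term(term, pattern, env=None):
    """Return a variable map if term matches pattern, otherwise None."""
    env = dict(env or {})
    if isinstance(pattern, Var):
        if pattern.name in env and env[pattern.name] != term:
            return None  # same variable bound to a different value
        env[pattern.name] = term
        return env
    if isinstance(pattern, list):
        if not isinstance(term, list):
            return None
        return match_list(term, pattern, env)
    return env if term == pattern else None  # constant

def match_list(terms, patterns, env):
    if not patterns:
        return env if not terms else None
    head, rest = patterns[0], patterns[1:]
    if isinstance(head, Star):
        # Try every possible length for the ellipsis, shortest first.
        for n in range(len(terms) + 1):
            candidate = dict(env)
            candidate[head.name] = terms[:n]
            result = match_list(terms[n:], rest, candidate)
            if result is not None:
                return result
        return None
    if not terms:
        return None
    env = match_term(terms[0], head, env)
    return None if env is None else match_list(terms[1:], rest, env)

def substitute(pattern, env):
    """Replace the pattern variables in pattern by their values in env."""
    if isinstance(pattern, Var):
        return env[pattern.name]
    if isinstance(pattern, list):
        out = []
        for p in pattern:
            if isinstance(p, Star):
                out.extend(env[p.name])  # splice the ellipsis back in
            else:
                out.append(substitute(p, env))
        return out
    return pattern

pattern = ["a", Star("v"), "a"]
env = match_term(["a", "a", "a", "a"], pattern)
print(env)                       # {'v': ['a', 'a']}
print(substitute(pattern, env))  # ['a', 'a', 'a', 'a']
```

Substituting the variable map produced by a successful match back into the same pattern reproduces the original term, which is the round trip that desugaring and resugaring rely on.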

2.3 Rascal

Throughout this thesis we provide examples using Rascal, and explain which mechanisms we altered to support resugaring. The most important facets of Rascal we discuss are Rascal’s type system, syntax declarations, function declarations and annotations. As such, we provide a brief overview of the topics we discuss throughout this thesis, and illustrate how they apply to Rascal using simple examples.

2.3.1 Static Typing

Rascal is designed to be statically typed. However, this is currently not enforced, and types are checked dynamically.

2.3.2 Syntax Declaration and ADTs

Rascal allows the user to define concrete syntax declarations and abstract datatypes (ADTs). Concrete syntaxes are defined using a syntax declaration. For example:


    syntax Exp = plus: Num n1 "+" Num n2;

Lexical tokens are defined using a lexical declaration. For example:

    lexical Num = [0-9]+;

These two simple definitions allow the user to write a term that adds two numbers, for example (Exp)‘10 + 2‘.

Abstract data types are defined using the data declaration. For example:

    data Exp = plus(Exp, Exp) | num(int n);

This abstract data type allows for multiple numbers to be added, e.g. plus(plus(num(1), num(1)), num(2)).

2.3.3 Functions

The simplest way in Rascal to define a function is using an expression function declaration. For example:

    int add(int m, int n) = m + n;

We can also add so-called when-conditions to expression functions, which are conditions that have to be met prior to the function’s execution. For example:

    int add(int m, int n) = m + n
      when m > 0, n > 0;

Another way to define a function is by using a statement block. For example:

    int add(int m, int n) {
      if (m > 0 && n > 0) {
        return m + n;
      }
      fail;
    }

This function is semantically equivalent to the function in the previous example.

A function’s signature may contain patterns that have to be matched prior to calling a function. For example (note the plus and num constructors in the function’s signature):

    int add(plus(num(int m), num(int n))) {
      if (m > 0 && n > 0) {
        return m + n;
      }
      fail;
    }

Here we use the previously defined ADT of plus. However, we can do the same for concrete patterns. For example:


    import String; // provides toInt

    int add((Exp)`<Num e1> + <Num e2>`) {
      int m = toInt("<e1>");
      int n = toInt("<e2>");
      if (m > 0 && n > 0) {
        return m + n;
      }
      fail;
    }

when-conditions may also contain matching conditions. For example:

    int add(Exp p) = m + n
      when plus(num(int m), num(int n)) := p;

Here, := is the matching operator, which takes a pattern on its left-hand side and an expression producing a term on its right-hand side. It returns true if the match is successful and false otherwise. If a match is successful, the variables bound by the match are added to the environment.

2.3.4 Annotations

Annotations are essentially "transparent" datatypes that can be attached to a term. By transparent we mean that programs in Rascal are oblivious to annotations unless specifically targeted. Furthermore, annotations are completely ignored during pattern matching.

Prior to annotating a term, the annotation attached to a type needs to be declared using the anno-token. For example:

    data Exp = plus(Exp, Exp) | num(int n);
    anno str Exp @ label;

We can now attach the label annotation to terms of the type Exp. Annotations can be set on a value using variable[@key = value], and can be retrieved from a term using variable@key. For example:

    import IO; // provides println

    Exp p = plus(num(1), num(1));
    p = p[@label = "1 + 1"];
    println(p@label);

This example prints the string "1 + 1". Note that we have to reassign p after setting the annotation, because all data in Rascal is immutable.

2.4 Lambda Calculus

Throughout this thesis, we provide examples in terms of the Lambda Calculus. Furthermore, we evaluate our design using two artefacts that desugar and resugar into the Lambda Calculus.


The Lambda Calculus is a calculus introduced by Alonzo Church in the 1930s. In 1936, he used it to prove that there is no general solution for the Entscheidungsproblem (the decision problem)[13][14]¹.

The Lambda Calculus consists of expressions that are composed of Lambda functions, applications and variables. Lambda functions are functions that take a number of arguments and a Lambda expression, and are written as ‘λp₁...pₙ.E‘, where E is the function’s internal expression and p₁...pₙ are its parameters (which are variables).

Applications are written as ‘E₁ E₂‘, in which E₁ is applied to E₂. An application of a Lambda function to an argument (i.e. a reducible expression or redex) means that every occurrence of the leftmost parameter in the Lambda function’s expression can be substituted by E₂, after which that parameter is removed from the list of parameters. This process is called β-reduction. For example, the Lambda expression (λx.x)y becomes y after β-reduction, since every x is substituted by y. However, (λxy.x)x becomes λy.x, since (λxy.x)x is essentially a shorter way of writing (λx.λy.x)x and (λx.(λy.x))x. (Note that variables are always bound by their closest enclosing Lambda function.)

Note that if we naively β-reduce (λxy.x)y, we get λy.y, in which the inner y is wrongly bound to the parameter y of the Lambda function. This is called variable capture. To avoid variable capture, the Lambda Calculus relies on capture-avoiding substitution, in which substitution may only take place if no variable capture can occur. To reduce a redex that cannot be reduced under the constraints of capture-avoiding substitution, a process called α-renaming may be used to rename a parameter of a Lambda function together with all variables bound to it. In the example above, we can rename the parameter y to z, giving us (λxz.x)y, which can then safely be reduced to λz.y (the argument y is a free variable and is left untouched). Similarly, λx.x can be changed to the α-equivalent expression λy.y, where two expressions are called α-equivalent when they can be made equal using α-renaming.

Finally, note that the Lambda Calculus only contains variables, functions and applications, and as such does not contain numbers, booleans, nodes, etc. However, data can be encoded into the Lambda Calculus using Church encoding. For example, natural numbers can be represented using "Church numerals", in which λf x.fⁿ x (f applied n times to x) represents the number n.
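The notions of β-reduction, capture-avoiding substitution and Church numerals can be sketched in a few dozen lines. The following Python code is an illustration only: the tuple encoding of terms, the helper names (`subst`, `reduce_step`, `normalize`, `church`, `unchurch`), the fresh-name scheme `v0, v1, ...` and the leftmost-outermost reduction order are assumptions made for this sketch, not part of any tool discussed in this thesis.

```python
import itertools

# Terms: ("var", x), ("lam", x, body), ("app", function, argument)
def var(x): return ("var", x)
def lam(x, body): return ("lam", x, body)
def app(f, a): return ("app", f, a)

def free_vars(t):
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])

def fresh(avoid):
    """Return a variable name that does not occur in avoid."""
    for i in itertools.count():
        if f"v{i}" not in avoid:
            return f"v{i}"

def subst(t, x, s):
    """Capture-avoiding substitution of s for the free occurrences of x in t."""
    if t[0] == "var":
        return s if t[1] == x else t
    if t[0] == "app":
        return app(subst(t[1], x, s), subst(t[2], x, s))
    _, y, body = t
    if y == x:
        return t  # x is shadowed by this parameter, nothing to do
    if y in free_vars(s):
        # alpha-rename the parameter so it cannot capture a variable of s
        z = fresh(free_vars(s) | free_vars(body) | {x})
        body, y = subst(body, y, var(z)), z
    return lam(y, subst(body, x, s))

def reduce_step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(f[2], f[1], a)
        r = reduce_step(f)
        if r is not None:
            return app(r, a)
        r = reduce_step(a)
        return None if r is None else app(f, r)
    if t[0] == "lam":
        r = reduce_step(t[2])
        return None if r is None else lam(t[1], r)
    return None

def normalize(t):
    step = reduce_step(t)
    while step is not None:
        t, step = step, reduce_step(step)
    return t

def church(n):
    """The Church numeral for n: f applied n times to x."""
    body = var("x")
    for _ in range(n):
        body = app(var("f"), body)
    return lam("f", lam("x", body))

def unchurch(t):
    """Read back n from a term shaped like a Church numeral, else None."""
    if t[0] != "lam" or t[2][0] != "lam":
        return None
    f, x, body, n = t[1], t[2][1], t[2][2], 0
    while body[0] == "app" and body[1] == var(f):
        body, n = body[2], n + 1
    return n if body == var(x) else None

# Successor function: λn.λf.λx. f (n f x)
succ = lam("n", lam("f", lam("x",
        app(var("f"), app(app(var("n"), var("f")), var("x"))))))

print(normalize(app(lam("x", lam("y", var("x"))), var("y"))))  # parameter renamed, y stays free
print(unchurch(normalize(app(succ, church(2)))))               # 3
```

The first call corresponds to (λxy.x)y from the text: the parameter y is renamed before substitution, so the free y is not captured.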

¹ This was independently proven in the same year by Alan Turing using the so-called Turing Machine, which turned out to be the foundation of modern computers[15].


Chapter 3

Desugaring and Resugaring

In the Introduction, we explained why we need desugaring and resugaring. In this chapter, we address this matter in more detail using different examples. The goal of this chapter is to provide a better insight into the problem domain.

3.1 Desugaring

Desugaring is essentially the process of transforming a term in the surface level representation to a term in the core level representation. By surface level representation we mean the representation of semantics in terms of the language that the user typed. By core level representation we mean the representation of semantics used by a language processor. Syntactic sugar can be used to extend a language or to build a language on top of another language. For example, we can add syntactic sugar to Javascript to allow for an unless-statement.

    unless (i == 0) {
      console.log("i is not 0");
    }

If the Javascript processor does not contain semantics for processing unless-statements, then we can use desugaring to transform this term into a semantically equivalent term that our Javascript processor does understand.

    if (!(i == 0)) {
      console.log("i is not 0");
    }

Similarly, if our hypothetical Javascript evaluator is unable to process for-loops, then we could use desugaring to rewrite a for-loop such as:

    for (var i = 0; i < 5; i++) {
      doSomethingWith(i);
    }

into the equivalent while-loop:

    var i = 0;
    while (i < 5) {
      doSomethingWith(i);
      i++;
    }

As a less trivial example, CoffeeScript may be considered syntactic sugar for Javascript. In the following example, we first show code written in CoffeeScript:

    class Car
      start: ->
        alert "Vroomm!!"

followed by the semantically equivalent "transpiled" Javascript code:

    var Car;

    Car = (function() {
      function Car() {}

      Car.prototype.start = function() {
        return alert("Vroomm!!");
      };

      return Car;

    })();

    // ---
    // generated by coffee-script 1.9.2

3.2 Benefits of Desugaring

If we look back at the previous examples, we can notice a pattern. Whenever a language processor does not contain semantics for constructs that are expressible using other constructs, we can use desugaring to implement support for them. This means that desugaring can sometimes be used instead of semantics code, ultimately leading to higher maintainability and lower production costs[16].

Note that we claim that desugaring thus leads to less code than adding semantics to a language processor. Although this claim may not be true in every case, consider the following example: extending a programming language to support octal numbers. If we allow for a language to use octal numbers besides decimal numbers without utilizing desugaring, this means that we have to add semantics for every operation in a similar way as decimal numbers. As such, we have to implement semantics for addition, subtraction, multiplication, division, etc. However, if we would implement octal numbers using desugaring instead, we would only have to implement a term rewriting system[17] from octal numbers to decimal numbers.

Now, consider an even simpler example. Say we want to add support for ‘T‘ and ‘F‘ to be used in a language next to ‘true‘ and ‘false‘. If we would implement semantics for these constructs without utilizing desugaring, we would have to add support for these constructs for the and-, or-, xor-, conditional branching, and many other operations. However, if we would use desugaring instead, it could be as simple as implementing the transformations T → true and F → false.
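The argument of this section can be illustrated with a toy language. The following Python sketch is hypothetical and not part of our prototype: the tuple encoding of terms and the names `desugar` and `evaluate` are assumptions made for this example. The evaluator only knows the core constructs, while ‘T‘, ‘F‘ and ‘unless‘ are supported purely through rewriting.

```python
# Terms are nested tuples. Core: ("lit", bool), ("not", e), ("and", e1, e2),
# ("if", cond, then, else). Sugar: ("T",), ("F",), ("unless", cond, then, else).
# All constructor names are illustrative assumptions.

def desugar(term):
    """Rewrite sugar into core constructs, bottom-up."""
    if not isinstance(term, tuple):
        return term  # a raw Python value inside ("lit", ...)
    head, *args = term
    args = [desugar(a) for a in args]
    if head == "T":
        return ("lit", True)
    if head == "F":
        return ("lit", False)
    if head == "unless":
        cond, then, other = args
        return ("if", ("not", cond), then, other)
    return (head, *args)

def evaluate(term):
    """Evaluator for the core only; it has no case for any sugar."""
    head = term[0]
    if head == "lit":
        return term[1]
    if head == "not":
        return not evaluate(term[1])
    if head == "and":
        return evaluate(term[1]) and evaluate(term[2])
    if head == "if":
        return evaluate(term[2]) if evaluate(term[1]) else evaluate(term[3])
    raise ValueError(f"no semantics for {head!r}")

program = ("unless", ("and", ("T",), ("F",)), ("T",), ("F",))
print(desugar(program))
print(evaluate(desugar(program)))  # True
```

Adding another operation such as xor to the core requires one new case in `evaluate`, and the sugar keeps working without further changes.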

3.3 The Need for Resugaring

Until now, we have only demonstrated what desugaring is and why it can be useful. We will now provide different scenarios to illustrate why resugaring is needed to utilize desugaring effectively.

When we extend a programming language using desugaring, the terms based on desugaring are transformed into another representation. For example, if we extend Javascript with the ‘unless (condition)‘-statement, then this term can be transformed into ‘if (!(condition))‘. As such, if we try to run the program:

    unless (x) {
      console.log("y");
    }

The interpreter will throw an error message in terms of the transformed programming language:

    /home/wouter/master-thesis/example-desu/test.js:1
    (function (exports, require, module, __filename, __dirname) { if (!(x)) {
                                                                        ^

    ReferenceError: x is not defined
        at Object.<anonymous> (/home/wouter/master-thesis/example-desu/test.js:1:69)
        at Module._compile (module.js:434:26)
        at Object.Module._extensions..js (module.js:452:10)
        at Module.load (module.js:355:32)
        at Function.Module._load (module.js:310:12)
        at Function.Module.runMain (module.js:475:10)
        at startup (node.js:117:18)
        at node.js:951:3

Notice that the error is referring to ‘if(!(x))‘, while this is not what the user has typed. Now, in this case, the user may simply mentally relate ‘if(!(x))‘ to ‘unless(x)‘, as it is obvious where this term came from. Alternatively, we could use source maps (as discussed in the Related Work) to solve this problem.

However, consider the following example. We use Michael I. Schwartzberg’s functional programming language to Lambda Calculus evaluator to evaluate the term let x = 5 in if (true) x else 0 (the \ represents a Lambda function).

    echo "let x = 5 in if (true) x else 0" | java Lambda -compile | java Lambda -evaluate -trace
    (\x.(\xy.x)(\a.xa)(\a.(\fx.x)a))(\fx.f(f(f(f(fx)))))
    ->
    (\xy.x)(\a.(\fx.f(f(f(f(fx)))))a)(\a.(\fx.x)a)
    ->
    (\ya.(\fx.f(f(f(f(fx)))))a)(\a.(\fx.x)a)
    ->
    \a.(\fx.f(f(f(f(fx)))))a
    ->
    \ax.a(a(a(a(ax))))
    ->
    \ax.a(a(a(a(ax))))

In this case, we get a very different representation than what the user has typed. It is much more difficult for the user to mentally relate the output terms to the input than in the previous example. The user probably wanted to see the following evaluation sequence instead:

    let x = 5 in if (true) x else 0
    if (true) 5 else 0
    5

Note that in this case, we cannot use a source map for relating the terms in the evaluation sequence above to the input terms. The x in the original term is replaced by 5 in the second term in the evaluation sequence. Therefore, no subterm of let x = 5 in if (true) x else 0 is equivalent to if (true) 5 else 0.

As such, we need a different solution to address this problem. Our options are:

1. Instead of using desugaring, implement semantics for each of the expressions that could not be evaluated;

2. Naively transform desugared terms to the surface-level representation;

3. Reconstruct the surface level representation from the core level representation using origin data.

Implementing semantics for the new constructs requires a lot of effort, both to implement and to maintain, as we illustrated in the previous section. As for the second option, let us illustrate why it does not work using a simple example: transforming decimal numbers into their Lambda Calculus encoding. If we simply transform terms of the form λf x.f^n x to n, then if a user provides the input ‘4‘, this works just fine. However, if the user types ‘λf x.f (f (f (f x)))‘ (which is the desugared equivalent of ‘4‘), then the output is still 4, which is again foreign to what the user has typed.
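The problem with the second option can be made concrete with a small sketch. The code below is not part of the prototype; it is a Python model in which terms are plain strings, and the names church_to_int, desugar and naive_resugar are hypothetical. It shows that a rewrite which ignores where a term came from turns a Church numeral typed by the user into a literal the user never wrote.

```python
PREFIX = "\\fx."

def church_to_int(term):
    # Return n if term is a Church numeral written as \fx.f(f(...(fx))), else None.
    if not term.startswith(PREFIX):
        return None
    body = term[len(PREFIX):]
    n = 0
    while body.startswith("f(") and body.endswith(")"):
        body = body[2:-1]
        n += 1
    if body == "fx":
        return n + 1
    if body == "x":
        return n
    return None

def desugar(surface):
    # Hypothetical desugaring: decimal literals become Church numerals,
    # everything else is already a core term.
    if surface.isdigit():
        n = int(surface)
        if n == 0:
            return PREFIX + "x"
        return PREFIX + "f(" * (n - 1) + "fx" + ")" * (n - 1)
    return surface

def naive_resugar(core):
    # Naive option 2: rewrite every Church numeral to a literal,
    # regardless of what the user originally typed.
    n = church_to_int(core)
    return core if n is None else str(n)

typed_literal = "4"
typed_lambda = "\\fx.f(f(f(fx)))"
print(naive_resugar(desugar(typed_literal)))  # the literal the user typed
print(naive_resugar(desugar(typed_lambda)))   # also a literal, which the user did not type
```

Both inputs produce the same output, so the second user sees a term that is foreign to their input. Avoiding this requires knowing the origin of each core term, which is what the third option provides.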

This leaves us with the third option: reconstructing the surface level representation from the core level representation using origin data. Note that this is resugaring. Using resugaring, we only need to specify the desugaring transformations under certain constraints (as we will demonstrate throughout this thesis). We do not need to add any semantics and we eliminate the problem that terms are foreign to what the user has typed.


Chapter 4

Resugaring Techniques

In this chapter, we provide a short summary of the two resugaring techniques by J. Pombrio and S. Krishnamurthi, as they form the basis for our research and prototype.

4.1 Resugaring: Lifting Evaluation Sequences through Syntactic Sugar

Central to J. Pombrio and S. Krishnamurthi’s approach in their first paper on resugaring are three properties to which their techniques adhere, defined as:

1. Emulation Each term in the generated surface evaluation sequence desugars into the core term which it is meant to represent. - [1]

2. Abstraction Code introduced by desugaring is never revealed in the surface evaluation sequence, and code originating from the original input program is never hidden by resugaring. - [1]

3. Coverage Resugaring is attempted on every core step, and as few core steps are skipped as possible. - [1]

The techniques described in this paper are based on pattern matching and substitution, and operate on a list of rules rs of the form P1 → P2. The essence of their desugaring technique is that each subterm in a term T (traversed top-down) is recursively replaced by its expansion until it can no longer be expanded. This is illustrated by Figure 4.1.

Expansion is the operation of trying to apply a rule in the transformation rule list rs to the provided term: if there is a rule P1 → P2 in rs for which P1 matches this term, select the first rule P1 → P2 for which this is the case. Then, match the rule’s P1 with this term. This induces an environment σ in which the variables in P1 are bound to the corresponding subterms of the matched term. This means that if a term 5 + 2 is matched against a pattern v1 + v2, this induces the environment σ = {v1 → 5, v2 → 2}.
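To make the induced environment concrete, the following Python sketch matches a pattern against a term and returns σ as a dictionary. It is a simplified model and not the paper's or Rascal's matcher: terms are assumed to be nested tuples of the form (constructor, child, ...), and the names Var, match and substitute are hypothetical.

```python
class Var:
    """A pattern variable such as v1 or v2."""
    def __init__(self, name):
        self.name = name

def match(pattern, term, env=None):
    # Returns the environment (a dict) on success and None on failure.
    env = {} if env is None else env
    if isinstance(pattern, Var):
        # A repeated variable must be bound to the same subterm every time.
        if pattern.name in env and env[pattern.name] != term:
            return None
        env[pattern.name] = term
        return env
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            if match(sub_pattern, sub_term, env) is None:
                return None
        return env
    return env if pattern == term else None

def substitute(pattern, env):
    # Replace every variable in the pattern by the term it is bound to.
    if isinstance(pattern, Var):
        return env[pattern.name]
    if isinstance(pattern, tuple):
        return tuple(substitute(p, env) for p in pattern)
    return pattern

# Matching 5 + 2 against v1 + v2.
sigma = match(("+", Var("v1"), Var("v2")), ("+", 5, 2))
print(sigma)
# Applying sigma to a right-hand side P2, here the hypothetical pattern plus(v1, v2).
print(substitute(("plus", Var("v1"), Var("v2")), sigma))
```

Note that a successful match may return an empty dictionary (a pattern without variables), so callers should compare the result against None rather than test its truthiness.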

Figure 4.1: Illustration of the traversal of desugaring f(g, h) with transformation rules rs = [f → f′, f′ → f″]. With "failed" we mean that a term cannot be expanded. The figure shows the following steps:

1. Desugaring f.
2. Expanded f → f′, desugaring f′.
3. Expanded f′ → f″, desugaring f″.
4. Expansion of f″ failed, desugaring g.
5. Expansion of g failed, desugaring h.
6. Expansion of h failed, desugaring successful.

After matching a pattern P1 in rs to a term, the induced environment σ is applied to P2. This way, the variables in P2 are substituted by the terms they are bound to in σ, thereby producing the desugared term. Expansion then returns a term in which every subterm is tagged by an indicator that this term is the result of desugaring (the so-called Body-tag). The root of this term is tagged with the rule index and the original term (using the so-called Head-tag). By storing the rule index, the algorithm ensures that it eventually resugars using the right transformation rule.

The process of resugaring (see Figure 4.2) consists of traversing the input term bottom-up. Each term that can be replaced by its unexpanded form is unexpanded. Unexpansion, then, is the process of applying the rule rs_index (where index is stored in the Head-tag) to the term in reverse. The variable map found by matching the term with P2 is thus used for substituting the variables in P1 to produce the resugared term. In some cases, P2 contains a strict subset of the variables found in P1. In this case, variables that are present in P1 but not in P2 are found by matching the original term with P1, and adding the missing variables to the variable map used for resugaring.

When the result does not contain any tags, it is considered to be successfully resugared, since no terms from the desugared term are present in the resugared term. As such, the paper describes how an evaluation stepper can be obtained. The evaluation stepper desugars the original input term and, while this term can be reduced, attempts to resugar the reduced term. If resugaring of the reduced term is possible, the result is emitted.
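The expansion and unexpansion steps described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the paper's algorithm in full: terms are nested tuples, patterns are assumed to be linear (each variable occurs once), Body-tags are omitted, and the names Var, Head, desugar, resugar and has_tags as well as the two example rules are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Head:
    index: int        # index of the rule in rs that performed the expansion
    original: object  # the surface term that was expanded
    term: object      # the expansion, itself desugared further

def match(pattern, term, env):
    if isinstance(pattern, Var):
        env[pattern.name] = term
        return env
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            if match(p, t, env) is None:
                return None
        return env
    return env if pattern == term else None

def subst(pattern, env):
    if isinstance(pattern, Var):
        return env[pattern.name]
    if isinstance(pattern, tuple):
        return tuple(subst(p, env) for p in pattern)
    return pattern

def desugar(rs, term):
    # Top-down: expand with the first matching rule, then keep desugaring the result.
    for index, (p1, p2) in enumerate(rs):
        env = match(p1, term, {})
        if env is not None:
            return Head(index, term, desugar(rs, subst(p2, env)))
    if isinstance(term, tuple):
        return tuple(desugar(rs, t) for t in term)
    return term

def resugar(rs, term):
    # Bottom-up: resugar the subterms first, then apply the tagged rule in reverse.
    if isinstance(term, Head):
        inner = resugar(rs, term.term)
        p1, p2 = rs[term.index]
        env = match(p2, inner, {})
        if env is None:
            return Head(term.index, term.original, inner)  # unexpansion failed, tag stays
        full = match(p1, term.original, {})  # variables of P1 that are missing from P2
        full.update(env)
        return subst(p1, full)
    if isinstance(term, tuple):
        return tuple(resugar(rs, t) for t in term)
    return term

def has_tags(term):
    if isinstance(term, Head):
        return True
    return isinstance(term, tuple) and any(has_tags(t) for t in term)

rs = [
    (("not", Var("x")), ("if", Var("x"), "false", "true")),
    (("const", Var("x"), Var("y")), ("id", Var("x"))),  # P2 drops the variable y
]
surface = ("not", ("const", "a", "b"))
core = desugar(rs, surface)
print(resugar(rs, core))
```

A stepper in this model would emit a resugared term only when has_tags returns False for it. For instance, if a core step replaces the body of the outer tag by "true", the right-hand side of the first rule no longer matches, the Head-tag survives resugaring, and the step is skipped.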

Figure 4.2: Illustration of the traversal of resugaring f″(g, h) with transformation rules rs = [f → f′, f′ → f″]. With "failed" we mean that a term cannot be unexpanded. The figure shows the following steps:

1. Resugaring h.
2. Unexpansion of h failed, resugaring g.
3. Unexpansion of g failed, resugaring f″.
4. f″ unexpanded to f′, resugaring f′.
5. f′ unexpanded to f, resugaring f.
6. Unexpansion of f failed, resugaring successful.

4.2 Hygienic resugaring of compositional desugaring

In the second paper of J. Pombrio and S. Krishnamurthi, "Hygienic resugaring of compositional desugaring", the approach to desugar and resugar terms is different.

First of all, most of the properties on which this paper is based differ from the properties in the first paper. Most notably, the properties in the first paper are more general, whereas the properties in the second paper are more specific to the techniques themselves.

The first property (Emulation) is similar to the first property of the first paper, stating: Every surface term desugars to (a term isomorphic to) the core term it purports to represent. - [2]

Here, isomorphic refers to a morphism (essentially a structure-preserving map of objects to other objects as defined in category theory) that is invertible (meaning that it is possible to ”undo” the transformation).

The second property (Abstraction) states:

If a term is shown in the reconstructed surface evaluation sequence, then each non-atomic part of it originated from the original program and has honest tags. (Assuming that evaluation does not modify tags.) - [2]

Terms have honest tags if each tagged subterm is unexpandable (i.e. unexpansion does not fail).

The third and fourth properties are more formal in nature. The third property (Hygiene) states that if two terms are α-equivalent, then they are also α-equivalent after desugaring and after resugaring. Finally, the fourth property (Coverage) is basically a formalization of the Coverage property found in the previous paper.

Another major difference is that this technique allows for desugaring using Turing-complete functions. These functions consume and produce a pattern. The consumed pattern C is the pattern matched during expansion, and the produced pattern C′ is the pattern representing the desugared term.

Figure 4.3: Illustration of desugaring using the techniques from "Hygienic resugaring of compositional desugaring". The figure shows the following matches and the core terms they produce:

    ‘<Exp x> + <Exp y>‘ := ‘(5 + 6) + 12‘    yields    plus(plus(5, 6), 12)
    ‘<Exp x> + <Exp y>‘ := ‘(5 + 6)‘         yields    plus(5, 6)
    ‘<Num i>‘ := ‘12‘                        yields    12

After each expansion, the consumed and produced patterns are stored in a tag (C → C′), and the pattern variables in the produced pattern are substituted by the desugaring of the respective subterm they match. Whenever a term cannot be expanded, the subterms of that term are then desugared. As such, desugaring proceeds in a top-down fashion (see Figure 4.3).

Resugaring also proceeds in a top-down fashion (see Figure 4.4). Each term that is tagged with the tag C → C′ is unexpanded by matching the term against C′, resugaring the bound variables and substituting the results into C. When a term cannot be unexpanded, the subterms of that term are then resugared.

Figure 4.4: Illustration of resugaring using the techniques from "Hygienic resugaring of compositional desugaring". The figure shows the following matches and the surface terms they produce:

    plus(x, y) := plus(plus(5, 6), 12)    yields    ‘(5 + 6) + 12‘
    plus(x, y) := plus(5, 6)              yields    ‘(5 + 6)‘
    int i := 12                           yields    ‘12‘

Furthermore, the techniques described in this paper are hygienic. By this we mean that when a term is desugared, the term is first resolved. This means that the abstract syntax tree (AST) representation is transformed into an abstract syntax directed acyclic graph (ASD) representation.

An ASD is similar to an AST. However, in contrast to an AST, every variable unambiguously refers to the location where the variable is declared and every node in the ASD has identity. Since each node has identity, there is no "node capture" (accidentally matching the wrong node) during pattern matching.

possibility of variable capture, the algorithm renames the variable. Resolution and unresolution are based on Romeo’s binding algebra[18] and a set of scoping rules. As hygiene is outside of the scope of this thesis, we do not go into depth about this topic.


Chapter 5

Resugaring in Rascal

In this chapter, we present our design for extending Rascal with support for resugaring. More information about the implementation can be found in Appendix A. The unit tests accompanying our prototype can be found in Appendix B.

5.1 Desugaring and Resugaring

We extended Rascal to allow for desugaring using the desugar-all<f, e> command. This command takes a function name f and an expression e, and traverses the term t obtained by interpreting e in a top-down fashion. Whenever a term is expandable using f, the command applies f to this term and stops traversing the current path. As such, f becomes responsible for desugaring the current term and all of its subterms.

Resugaring in Rascal works in a similar way. The resugar-all<e> command traverses the term t obtained by interpreting e in a top-down fashion. Whenever a term is annotated with an unexpansion function, this function is called with the current term. If this function fails, resugaring stops and the original input term t is returned. Whenever a term is successfully resugared, resugaring stops traversing the current path, again making the called function responsible for resugaring the current term and all of its subterms.

We used named functions for desugaring to allow for different syntactic sugar declarations throughout a program. For example, a typechecker may require a different set of transformations than a compiler. As such, a typechecker may use desugar-all<typeCheckSugar, ...>, whereas a compiler may use desugar-all<compilerSugar, ...>.
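The traversal performed by the two commands can be modelled with a short Python sketch. It is an approximation under stated assumptions and not the prototype's implementation: terms are nested tuples of the form (constructor, child, ...), a sugar function returns None when it cannot expand a term, and the names Annotated, desugar_all, resugar_all and the example function sugar are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Annotated:
    """A core term annotated with the function that undoes its expansion."""
    term: object
    unexpand: Callable

class ResugarFailed(Exception):
    pass

def desugar_all(f, term):
    result = f(term)
    if result is not None:
        return result  # f is now responsible for this term and all of its subterms
    if isinstance(term, tuple):
        return (term[0],) + tuple(desugar_all(f, t) for t in term[1:])
    return term

def _resugar(term):
    if isinstance(term, Annotated):
        surface = term.unexpand(term.term)
        if surface is None:
            raise ResugarFailed()
        return surface  # the unexpansion function handled the subterms
    if isinstance(term, tuple):
        return (term[0],) + tuple(_resugar(t) for t in term[1:])
    return term

def resugar_all(term):
    # If any unexpansion fails, the original input term is returned unchanged.
    try:
        return _resugar(term)
    except ResugarFailed:
        return term

def sugar(term):
    # Hypothetical sugar: inc(c) desugars to assign(c, plus(c, 1)).
    if isinstance(term, tuple) and len(term) == 2 and term[0] == "inc":
        c = desugar_all(sugar, term[1])
        def unexpand(core):
            # Only invert when the core term still has the produced shape.
            ok = (isinstance(core, tuple) and len(core) == 3
                  and core[0] == "assign" and core[2] == ("plus", core[1], 1))
            return ("inc", _resugar(core[1])) if ok else None
        return Annotated(("assign", c, ("plus", c, 1)), unexpand)
    return None

program = ("seq", ("inc", "i"), "j")
core = desugar_all(sugar, program)
print(resugar_all(core))
```

Passing the sugar function as an argument mirrors the use of named functions: a different function, for example one intended for a typechecker, can be supplied to the same traversal.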

5.2 Sugar Function Declarations

We allow users to define three different types of syntactic sugar transformations, depending on the required strategy:

1. Intermediate sugar function declarations;
2. Compositional sugar function declarations;
3. Custom sugar function declarations.

The first two types of sugar function declarations use a syntactical notation similar to that of Rascal’s expression function declarations, as we will see in the following sections. The third type of sugar function declaration is simply a Rascal function.

5.2.1 Intermediate Sugar Function Declarations

Intermediate sugar functions are functions similar to J. Pombrio and S. Krishnamurthi’s first technique in which each term is repeatedly expanded. Intermediate sugar functions can be declared as follows¹:

    CorePatternType func(SurfaceExpPat) * ⇒ CoreExpPat;

Here, func is the name of the function and CorePatternType represents the type of the output of the desugaring transformation. SurfaceExpPat and CoreExpPat are patterns (which can be matched) that also act as expressions (which can be evaluated to a term). Note that this notation represents the basic notation for declaring intermediate sugar functions. We discuss extensions of this notation later throughout this chapter.

When this function is called during desugaring, the current term is matched against SurfaceExpPat. If the term successfully matches this pattern, this induces an environment in which the variables of the pattern are bound to the respective subterms. The variables in CoreExpPat are then substituted by the terms they are bound to in the environment, creating the desugared (intermediate) core term². This term is annotated with a function that allows for inverting the transformation. Finally, the desugar-all command is called on the output term, which will try to expand the output term once more or proceed with the subterms if expansion is not possible.

5.2.2 Compositional Sugar Function Declarations

Compositional sugar functions transform the input term in a similar way as J. Pombrio and S. Krishnamurthi’s second technique.

Compositional sugar functions can be declared as follows:

    CorePatternType func(SurfaceExpPat) ⇒ CoreExpPat;

Note that the only difference between the basic notation of this declaration and the basic notation of intermediate sugar function declarations is the absence of the asterisk before the ⇒-sign.

When compositional sugar functions are used, SurfaceExpPat is matched against the input term, after which the variables in SurfaceExpPat are bound in the environment. The function then proceeds to desugar the variables in CoreExpPat using desugar-all. When these variables are desugared, the variables in CoreExpPat are substituted by their desugared values, yielding a term. This term is finally annotated with a function that allows for inverting the transformation.

¹ Implementation detail: We also support different levels of module visibility, similar to expression functions.

² In some cases, CoreExpPat contains a subset of the variables used in SurfaceExpPat. When that happens, the original values of the variables in SurfaceExpPat are used to fill in these gaps.

5.2.3 Custom Sugar Function Declarations

In some cases, the previously defined sugar functions simply do not suffice. As such, it is important to allow users to tailor their own strategies. Therefore we allow Rascal functions to be used for desugaring as well, provided that users are well-aware that they are themselves responsible for adhering to the properties central to resugaring. Nevertheless, we do prove that custom functions are capable of performing consistent and correct transformations in Chapter 7.

Custom sugar functions should return a term annotated with an inverse function. The annotation for this function is @__resugarFunction. This function accepts a term as its first and only argument. For example:

    resugarable Exp;

    ...

    LambdaData sugar(original:(Exp)‘a‘) {
      str nodeId = arbitraryIdentifier();
      return (Exp)‘b‘[@__resugarFunction =
        (Exp) (desugared:(Exp)‘b‘) {
          if (desugared@__nodeId != nodeId) fail;
          return (Exp)‘a‘ <<< original;
        }][@__nodeId = nodeId];
    }

This function accepts an expression containing ‘a‘ and returns ‘b‘. When it is resugared, it accepts ‘b‘ and returns ‘a‘.

It is difficult to understand what is going on in this code by simply reading it. As such, let us take a closer look at this code example.

First of all, we use the "resugarable Exp" declaration. This declaration allows Exp to use the annotations that are used for desugaring and resugaring: __nodeId and __resugarFunction. Second, we manually set and check the values of the __nodeId annotations. This annotation is used during resugaring to check whether the identities of the (sub)terms match. Finally, we use the <<< operator. This is an operator we added to Rascal to match a term (on the right-hand side) against a pattern (on the left-hand side) and substitute the pattern’s variables with the values bound in the environment. This way, all terms that are not bound to a pattern variable remain unchanged, thereby ensuring that no information is lost during resugaring.³

5.3 Fixing the Lengths of Ellipses

Recall that ellipses are variables consisting of zero or more terms. As we observed during the evaluation of resugaring ES5 to ES6, ellipses may cause ambiguity problems during resugaring (we discuss this problem in more detail in the following chapter). As such, we present a mechanism to ensure that the length of ellipses can be constrained to a certain length. This mechanism can simply be used by prepending a sugar function declaration with @fixedLength{name}. This fixes the length of ellipses after the expansion of a term. For example:

    @fixedLength{bef}
    @fixedLength{rest}
    Function functionSugar((Function)‘function (<{Param ","}* bef>, <Id pr> = <Expression defVal>, <{Param ","}* rest>) { <Statement* body> }‘)
      * ⇒ (Function)‘function(<{Param ","}* bef>, <Id pr>, <{Param ","}* rest>) {
          ’  <Statement initBody>
          ’  <Statement* body>
          ’}‘
      when Statement initBody := defaultParameter(pr, defVal, size((Params)‘<{Param ","}* bef>‘));

This example comes from the ES6 to ES5 case study (see chapter 8.2). In this case, the lengths of both ellipses bef and rest are fixed. Note, however, that (in this case) it is sufficient to fix the length of only one of the ellipses, because if the length of one ellipsis is not fixed, it can be derived from the other (fixed) ellipsis.
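The ambiguity that fixed lengths resolve can be illustrated with a small Python sketch. It assumes, for illustration only, that an ellipsis pattern is matched by trying every possible split of the parameter list; the function names splits and splits_with_fixed_length are hypothetical and do not appear in the prototype.

```python
def splits(params):
    # All ways to match the core pattern <bef...>, <pr>, <rest...>
    # against a list of parameters.
    return [(params[:i], params[i], params[i + 1:]) for i in range(len(params))]

def splits_with_fixed_length(params, bef_length):
    # Keep only the matches in which bef has the length recorded at expansion.
    return [s for s in splits(params) if len(s[0]) == bef_length]

# Core parameters after desugaring function (a, b = 1, c) { ... }
core_params = ["a", "b", "c"]

print(splits(core_params))                       # three candidate matches
print(splits_with_fixed_length(core_params, 1))  # exactly one match
```

Without the recorded length, any of the three parameters could be taken for the one that carried the default value. Fixing the length of bef leaves a single candidate, and the length of rest then follows from the total number of parameters.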

5.4 When-conditions

The techniques described in J. Pombrio and S. Krishnamurthi’s second paper[2] assume that we have full control over patterns. Unfortunately, this is not the case in Rascal. To overcome this limitation, we allow for so-called when-conditions, similar to when-conditions in Rascal’s expression functions. This allows us to gain a similar level of expressiveness without breaking abstraction. We discuss this design decision in more detail in the following chapter.

³ Implementation detail: While Rascal also has a field-update operation in which a single field of a term can be updated, this operation is not able to traverse through a term’s subterms. This makes it difficult to adjust complex terms. Furthermore, syntaxes containing terms that can be addressed using field-update must be labeled, requiring the user to alter their syntax definitions or look up these labels every time they need to perform a field-update. These practical issues do not apply to the substitution operator, since the user can simply specify the pattern containing the variables that need to be substituted. However, the substitution operator can be removed without any further consequence, by simply removing the substitution operator from src/org/rascalmpl/library/lang.rascal.syntax/Rascal.rsc and bootstrapping the syntax (i.e. regenerating the parser).

When-conditions allow users to conditionalize a desugaring transformation. For example, we can define the transformation from a concrete addition to its abstract form as:

    ExpData func((Exp)‘<Exp e1> + <Exp e2>‘) ⇒ plus(ExpData e1, ExpData e2)
      when transformToAST;

In this case, we allow this function to be called if and only if transformToAST is set to true.

There are other interesting cases that we can use when-conditions for as well. For instance, we can use when-conditions to inject variables into the core pattern/expression using the matching operator:

    @ensureUnchanged{tmp}
    Exp func((Statement)‘swap <Exp e1>, <Exp e2>‘)
      ⇒ (Statement)‘{ <Exp tmp> = e1; e1 = e2; e2 = <Exp tmp>; }‘
      when Exp tmp := uniqueName();

In this case, tmp in the statement on the right-hand side of the sugar function is set to a variable generated by the uniqueName() function. Note that we use the @ensureUnchanged-tag to ensure that tmp cannot be altered after desugaring (we discuss this requirement in more detail in Chapter 9.2.3).

5.5 Resugaring Fallback Mechanism

Some terms cannot be resugared, contrary to the author’s intention. For example, if an interpreter accepts the term ‘2 + 2‘, and desugars and executes this term using the Lambda Calculus, the interpreter may reduce this term to ‘λf x.f (f (f (f x)))‘. Since the original term consisted of ‘2‘ and ‘+‘ tokens, the result cannot be resugared to ‘4‘, since ‘4‘ cannot be expressed using terms in the input. Therefore, we added support for a Turing-complete fallback mechanism for handling such cases.

The mechanism is similar to exception handling mechanisms found in languages such as Java. The user may declare a throws-clause and the names of the ”exceptions” it throws in sugar function declarations. These exceptions are automatically thrown whenever a term cannot be resugared. For example:

    Exp func((Exp)‘plus(2, 2)‘) throws MaybeInteger ⇒ ...

Now say we have a sugar function that desugars 4 into the Lambda encoding equivalent.

    Exp func((Exp)‘4‘) ⇒ (Lambda)‘\fx.f(f(f(fx)))‘

By simply adding "catch MaybeInteger", we allow transformations that catch the exception to process the term in reverse. As such, the following declaration is able to catch a term with a Lambda encoding semantically equivalent to 4, and resugar that term into ‘4‘:

    Exp func((Exp)‘4‘) catch MaybeInteger
      ⇒ (Lambda)‘\fx.f(f(f(fx)))‘

Whenever this is insufficient, for example if we want arbitrary Lambda-encoded numbers to be resugared, we can also use custom fallback functions.


Here, the first argument is the name of the exception and the second argument is the input term. convertLambdaToExp represents the function to convert a Lambda encoding to the surface representation.
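The control flow of the fallback mechanism can be modelled in Python with ordinary exceptions. The sketch below is an analogy and not the prototype's implementation: lambda terms are assumed to be nested tuples, and the names church, convert_lambda_to_exp, unexpand_plus and resugar, as well as the core shape used for ‘2 + 2‘, are hypothetical stand-ins chosen for illustration.

```python
class MaybeInteger(Exception):
    """Raised when a term cannot be resugared but may encode an integer."""
    def __init__(self, term):
        super().__init__("term may encode an integer")
        self.term = term

def church(n):
    # Build the Church numeral for n as ("lam", "f", ("lam", "x", body)).
    body = "x"
    for _ in range(n):
        body = ("app", "f", body)
    return ("lam", "f", ("lam", "x", body))

def convert_lambda_to_exp(term):
    # Custom fallback: decode any Church numeral into a decimal literal.
    if not (isinstance(term, tuple) and len(term) == 3 and term[0] == "lam"):
        return None
    f, inner = term[1], term[2]
    if not (isinstance(inner, tuple) and len(inner) == 3 and inner[0] == "lam"):
        return None
    x, body = inner[1], inner[2]
    n = 0
    while isinstance(body, tuple) and len(body) == 3 and body[0] == "app" and body[1] == f:
        body = body[2]
        n += 1
    return str(n) if body == x else None

def unexpand_plus(core):
    # Assumed inverse of the sugar for 2 + 2: it only recognises the unevaluated core term.
    if core == ("plus", church(2), church(2)):
        return "2 + 2"
    raise MaybeInteger(core)

def resugar(core, unexpand, handlers):
    try:
        return unexpand(core)
    except MaybeInteger as exc:
        for handler in handlers.get(MaybeInteger, []):
            surface = handler(exc.term)
            if surface is not None:
                return surface
        return core  # no handler applies, so the term stays as it is

handlers = {MaybeInteger: [convert_lambda_to_exp]}
print(resugar(church(4), unexpand_plus, handlers))
```

After evaluation the core term is the numeral for four, which the inverse of the ‘2 + 2‘ sugar does not recognise. The raised exception hands the term to the registered fallback, which decodes it into a literal.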

5.6 Break-out functions

Different sets of syntactic sugar transformations may be relevant to different parts of a program. Therefore, we allow sugar functions to "break out" to other functions.

Both intermediate and compositional sugar functions allow for break-out functions. For intermediate sugar functions, we can specify the break-out function as follows:

    CorePatternType func(SurfaceExpPat -> Breakout) * ⇒ CoreExpPat;

After an expansion is successfully performed using an intermediate sugar function with a break-out function, the function specified in the Breakout-argument is used for subsequent desugarings (relative to the expanded term). For example:

    Exp sugar((Exp)‘<Exp e1> + <Exp e2>‘ -> numberSugar) * ⇒ plus(e1, e2)

In this example, we use numberSugar to desugar the term found after expansion.

Break-out functions for compositional sugar functions are specified in a different way. Instead of specifying one function for subsequent desugarings, the user may specify a break-out function for each of the pattern variables that are desugared.

    CorePatternType func(SurfaceExpPat | v1 -> f1, v2 -> f2, ... ) ⇒ CoreExpPat;

Here, v1 and v2 refer to the respective pattern variables, and f1 and f2 refer to the respective functions. For example:

    Exp sugar((Exp)‘<Exp e1> + <Exp e2>‘ | e1 -> numberSugar1, e2 -> numberSugar2)
      ⇒ plus(e1, e2)

In this example, e1 and e2 are desugared using numberSugar1 and numberSugar2 respectively.

5.7 Usage Examples

Now that we have specified our design, we will demonstrate how our prototype can be used in practice using simple examples. The purpose of this section is to gain some familiarity with the syntax of our prototype before we discuss the evaluation of our prototype in later chapters.

Furthermore, we begin to illustrate what the differences are between intermediate and compositional sugar functions, as it is important to understand why we use both. In the following chapter, however, we discuss their differences in more detail.

5.7.1 Basic Usage

As we discussed earlier, the notation for sugar function declarations is similar to that of expression function declarations. The main difference is that, instead of using an "="-sign between the declaration and the output expression, we use either ⇒ or * ⇒ for a compositional or an intermediate sugar function declaration, respectively. For example, say we want to desugar i++ to i = i + 1. We can write this as the combination of a syntactical definition and a compositional sugar function as:

    syntax Num = [0-9]+;
    syntax Const = [a-zA-Z]+;

    syntax Exp = Const c "++"
               | Const c "=" Const c "+" Num n;

    Exp sugar((Exp)‘<Const c>++‘) ⇒ (Exp)‘<Const c> = <Const c> + 1‘;

However, we can also use an intermediate sugar function instead, by replacing the last line with:

    Exp sugar((Exp)‘<Const c>++‘) * ⇒ (Exp)‘<Const c> = <Const c> + 1‘;

Using either of these declarations, we can now desugar (Exp)‘i++‘ using desugar-all:

    rascal> term = desugar-all<sugar, (Exp)‘i++‘>

Which results in:

    Exp: ‘i = i + 1‘

Finally, we can simply resugar this term using resugar-all:

    rascal> resugar-all<term>

Which results in:

    Exp: ‘i++‘

5.7.2 Intermediate Versus Compositional Desugaring

The desugaring strategies used by intermediate and compositional sugar functions are fundamentally different. To illustrate this difference, consider the following example:

    syntax Exp = "a" | "b" | "c";

Using compositional desugaring, if we define the transformations a → b and b → c using:

    Exp sugar((Exp)‘a‘) ⇒ (Exp)‘b‘;
    Exp sugar((Exp)‘b‘) ⇒ (Exp)‘c‘;

We get:

    rascal> desugar-all<sugar, (Exp)‘a‘>
    Exp: ‘b‘

While if we use intermediate desugaring:

    Exp sugar((Exp)‘a‘) * ⇒ (Exp)‘b‘;
    Exp sugar((Exp)‘b‘) * ⇒ (Exp)‘c‘;

We get:

    rascal> desugar-all<sugar, (Exp)‘a‘>
    Exp: ‘c‘

The difference in the output is caused by the fact that intermediate desugaring repeats the process of desugaring, whereas compositional desugaring simply proceeds to the variables in the pattern. As we will discuss later throughout this thesis, compositional desugaring is generally faster than intermediate desugaring. Therefore, it is sensible to use compositional desugaring whenever possible. In many cases, compositional sugar functions can be rewritten to yield the same output as intermediate sugar functions. For example, we can simply rewrite the rules above to:

    Exp sugar((Exp)‘a‘) ⇒ (Exp)‘c‘;
    Exp sugar((Exp)‘b‘) ⇒ (Exp)‘c‘;

The outcome of these transformations is equivalent to that of the intermediate sugar functions described above. In some cases, however, it is not possible to use compositional desugaring, as we will discuss in Chapter 6 and demonstrate in Chapter 8.2.
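The difference between the two strategies on this example can be reproduced with a minimal Python model. The sketch assumes that rules are a dictionary from atoms to atoms, which is enough here because the patterns a, b and c contain no variables; the function names are hypothetical.

```python
def desugar_intermediate(rules, term):
    # Intermediate desugaring: keep expanding the result until no rule applies.
    # (This sketch does not guard against cyclic rule sets.)
    while term in rules:
        term = rules[term]
    return term

def desugar_compositional(rules, term):
    # Compositional desugaring: expand once, then continue with the pattern
    # variables only. The atoms used here have no variables, so nothing follows.
    return rules.get(term, term)

chained = {"a": "b", "b": "c"}
rewritten = {"a": "c", "b": "c"}

print(desugar_intermediate(chained, "a"))     # repeated expansion
print(desugar_compositional(chained, "a"))    # a single expansion
print(desugar_compositional(rewritten, "a"))  # rewritten rules, single expansion
```

With the chained rules the two strategies disagree on the input a, while the rewritten rules let the compositional strategy reach the same result as the intermediate one in a single expansion.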

5.7.3 Desugaring Using Different Types

Another useful use case of compositional sugar functions is that we can desugar terms into terms of a different type. In fact, we can even desugar a concrete term into an abstract representation. For example, given the declarations:

    syntax HelloWho = "World" | "Rascal";
    syntax Exp = "Hello, " HelloWho g;

    data AHelloWho = aWorld() | aRascal();
    data AExp = aHello(AHelloWho who);

    AExp sugar((HelloWho)‘Rascal!‘) ⇒ aRascal();
    AExp sugar((HelloWho)‘World!‘) ⇒ aWorld();
    AExp sugar((Exp)‘Hello, <HelloWho h>‘) ⇒ aHello(AHelloWho h);

We can now desugar this into an abstract representation:

    rascal>desugar-all<sugar, (Exp)‘Hello, World!‘>
    AExp: aHello(aWorld()[
        @__nodeId="0ac9a2a3-37de-4e6c-9024-53ff54d26aa2",
        @__resugarFunction=func(
          value(),
          [adt(
            "AHelloWho",
            [])],
          [],
          {},origin=|project://UsageExamples/src/Usage.rsc|(243,6,<10,33>,<10,39>))
      ])[
      @__nodeId="280e62ca-3b60-4f09-8a48-a4a15b096ea9",
      @__resugarFunction=func(
        value(),
        [adt(
          "AExp",
          [])],
        [],
        {},origin=|project://UsageExamples/src/Usage.rsc|(301,11,<11,48>,<11,59>))
    ]

Notice that all (sub)terms are annotated with the two annotations we described earlier. If we remove these annotations, we can see that the resulting term is equivalent to the core term in our definition:

    rascal>delAnnotationsRec(desugar-all<sugar, (Exp)‘Hello, World!‘>)
    AExp: aHello(aWorld())

We can successfully resugar this term to the surface level representation:

    rascal> resugar-all<desugar-all<sugar, (Exp)‘Hello, World!‘>>
    Exp: ‘Hello, World!‘

Unfortunately, however, this will not work using intermediate desugarings due to typing issues we discuss in Chapter 6.

5.7.4 When-Conditions

In the following example, we desugar ‘Hello, {Name ”,”}*! ‘ into an AST representation. This requires the concrete names to be translated into strings. We can do this using when-conditions, which are called before desugaring:

    syntax Exp = "Hello," {Name ","}* names "!";
    data DExp = hello(list[str] names);

    @ensureUnchanged{outNames}
    DExp sugar((Exp)‘Hello,<{Name ","}* names>!‘) ⇒ hello(outNames)
      when outNames := ["<n>" | n ← names];

Here, we collect the concrete names and translate them into a list of strings using:

    when outNames := ["<n>" | n ← names];

As such, if we desugar (and remove the annotations of) the concrete expression ‘Hello, World, Rascal!‘, we get:

    rascal>delAnnotationsRec(desugar-all<sugar, (Exp)‘Hello, World, Rascal!‘>)
    DExp: hello(["World","Rascal"])

Resugaring the desugared term yields the original expression:

    rascal>resugar-all<desugar-all<sugar, (Exp)‘Hello, World, Rascal!‘>>
    Exp: ‘Hello, World, Rascal!‘

Note that the variables in the when-conditions need to be constant. For example, if we would change ”Rascal” in the AST representation using the substitution operator:

    rascal> s = "Readers";
    rascal>delAnnotationsRec(resugar-all<hello(["World", s]) <<< desugar-all<sugar, (Exp)‘Hello, World, Rascal!‘>>)
    DExp: hello(["World","Readers"])

The term is not resugared. Let us break down what we just executed:

1. We set s to "Readers";
2. We desugared the term (Exp)‘Hello, World, Rascal!‘ using desugar-all;
3. We then substituted the second element of the list in the desugared term by s using the substitution operator;
4. We then resugared this term (which failed and therefore yielded the original term);
5. Finally, we removed the annotations of the failed resugaring.

If we change s back to "Rascal", however, the term is successfully resugared:

    rascal>s = "Rascal"
    rascal>resugar-all<hello(["World", s]) <<< desugar-all<sugar, (Exp)‘Hello, World, Rascal!‘>>

Chapter 6

Implementation and Observations

In this chapter, we explain what we observed during the construction of our prototype and what decisions and trade-offs we had to make for our design. The goal of this chapter is to provide a better understanding of why we made certain design decisions.

6.1 Initial Design and Implementation

Our initial prototype was based on the paper "Resugaring: Lifting Evaluation Sequences through Syntactic Sugar"[1]. We implemented their desugaring and resugaring techniques in Rascal using Rascal’s internal pattern matching algorithm. For substitution, we wrote our own algorithm that traverses a match result (i.e. a pattern matched against a term) and, wherever the (sub)pattern is a variable, replaces the corresponding subterm with the value bound to that variable in the environment.

We extended the syntax of Rascal to support the definition of resugaring transformation rules. Since Rascal did not have a pattern datatype available to the user, we used overloaded functions to represent transformation rules. Note that this meant that a user could add transformation rules that are incompatible with the three properties defined in the paper. However, this turned out to be quite useful as we will illustrate later throughout this chapter.

For our initial implementation, we ignored Body-tags (tags indicating whether or not a term originates from desugaring), since this prototype was sufficient to evaluate whether or not this algorithm would be sufficiently expressive and efficient for our goals. We decided to utilize Rascal’s annotations to represent Head-tags (tags indicating how desugaring took place), since annotations are "transparent" datatypes. With this we mean that programs are generally oblivious to annotations unless they are specifically targeted, and techniques such as pattern matching ignore annotations completely.¹

¹ Implementation detail: we could not use "keyword parameters" instead of annotations, since concrete syntaxes use annotations and keyword parameters cannot be used in conjunction with annotations, which would mean that we could not use concrete syntaxes. Currently Rascal is transitioning away entirely from annotations, but this problem is present in the current version. When this transition is done, it should be fairly easy to replace our use of annotations by keyword parameters.
