FunCon4J


Reusable programming language implementations

through FunCons implemented with Object Algebras


This thesis was written in the context of graduating from the master of science program in software engineering at the University of Amsterdam.

Title: FunCon4J

Author: Jelle van Assema

Supervisor: Tijs van der Storm

Email: Jelle.van.Assema@gmail.com, storm@cwi.nl

Host: CWI

Host group: Software Analysis and Transformation

Curriculum: Master Software Engineering - Universiteit van Amsterdam

Number of Credits: 24 ECTS


Introduction

Programming languages are continuously being developed for a wide range of applications. These languages range from domain specific languages (DSLs) all the way up to general purpose languages (GPLs). Even though programming languages often share many similar constructs, reuse of these constructs in other languages is limited. Reuse is hindered by poor modularity and extensibility in the implementation of the language. Language engineers thus find themselves having to re-implement and redefine constructs which have already been implemented and defined in other languages.

This thesis takes up the challenge to provide a modular, extensible and reusable style of language engineering. The idea is to take a modular and reusable approach to formal semantics (FunCons), and make this executable using a modular and extensible design pattern (Object Algebras). The experiment is a framework (FunCon4J) that makes said formal semantics executable. The framework is evaluated on the basis of a case study: implementing the dynamic semantics of Caml Light. Results show that the implementation is indeed modular and extensible. It even supports the development of language variants. Performance is roughly 100 times slower than a native OCaml compiler (OCaml being the latest version of Caml), and roughly five times slower than Python. Python is used here as a reference because its implementation is a dynamically typed interpreter, which is similar to the implementation of Caml Light provided with this thesis. All this is achieved in native Java 8, by making heavy use of its new features such as functional interfaces and lambdas. The choice of Java opens up the framework to a wide audience of software engineers.

A discussion of previous work on reusable language implementations can be found in chapter 2, alongside reasoning on why the combination of FunCons and Object Algebras can provide reusable language implementations. In chapter 3 the reader finds an introduction to FunCons and Object Algebras. Chapter 4 explains the design behind FunCon4J and shows how FunCon4J can be extended and reused. Chapter 5 demonstrates how languages can be defined on the basis of FunCon4J. The case study on FunCaml, that is, the complete dynamic semantics of Caml Light implemented with FunCon4J, can be found in chapter 6. This thesis ends with a discussion and a conclusion in chapters 7 and 8 respectively.


Chapter 2

Reusable Language Implementations

Many high level programming languages exist. These languages allow developers to build software more easily than would otherwise be possible in a low level programming language. In order to support software written in higher level languages, there needs to exist an implementation of the language for the computer it is run on. Such an implementation could be a compiler that translates a program written in the programming language to, for instance, machine code, which can be run directly on the computer. Another option is an interpreter that directly interprets the program and simulates its intended behaviour.

New high level programming languages are continuously being developed[4][1]. This development ranges from fully fleshed out general purpose languages (GPLs) to domain specific languages (DSLs). A GPL is a language for general use, not tied to a particular context or domain. Such a language provides the developer with generic tools, and effectively resembles a one size fits all solution. A DSL focuses on a domain such as banking, database communication, medical prescriptions[11], or even meta-programming[18]. A DSL can provide domain specific forms of abstraction and potentially allows domain experts to understand and develop programs[27]. DSLs have much to gain from reusable language implementations. Being able to reuse parts of well tested existing languages could help mitigate the start-up cost of developing a DSL. The ease of building prototypes that reusable language implementations would provide can help boost the development of DSLs. DSLs are often quite small, and do not necessarily have a strong focus on performance. As such, development does not need to move far from the prototyping phase. This allows language engineers to create DSLs very quickly, by prototyping new DSLs built on well tested features of existing languages.

In this chapter the reader will find background information and previous work on the topic of reusable language implementations, and the challenges faced in this domain. These challenges include dependencies among common programming constructs, and the entities that constructs may require to function, such as environments and stores. Given an approach to reusable and extensible formal specification (FunCons) and a design pattern that strives for modularity and extensibility (Object Algebras), the question is posed: "Can a modular, extensible and reusable implementation of FunCons be used to create reusable language implementations?". Of interest is how easily the resulting language can be extended, how the modules (FunCons) on which the language is built can be reused, and whether the resulting language is usable in terms of performance.


2.1 Motivation

Reusing programming language implementations is challenging. One of the reasons is the existence of subtle dependencies among language constructs. For instance, the evaluation of a loop structure may depend on the existence of a break statement. If the implementation interweaves the break statement with the loop structure, one cannot reuse the one without the other. This lack of modularity on the implementation level hinders reuse. Zooming out to programming language specifications, that is, the documentation of a programming language design, a similar problem appears. Syntax can be captured well and formally in a programming language specification by means of a grammar. Semantics, however, is generally described in a natural language like English. This leaves room for ambiguity and imprecise specification.

Reuse of programming language constructs across programming languages is a goal worth pursuing, as several constructs are found in many high level programming languages and are often similar in their behaviour. Consider for instance conditional expressions, looping constructs, methods, recursion and binding. Currently a language engineer has to re-implement all of the constructs that she wants to use in the new language. In an ideal scenario the language engineer would be able to simply pick and choose which constructs she would like to reuse from existing languages. This would increase the speed with which new languages are developed, make languages extensible, and provide opportunities for building language variants. One example is building a variant of an existing language like Java or C# for educational purposes. One could choose to leave out certain features of the language to better match the level of the students using it, and extend the language with new features as the students progress. Another example is security. One could desire a language that can produce no side effects, which can be achieved by composing only those features which cannot produce side effects.

2.2 Challenges faced: ambiguity and lack of modularity

The question then arises: "How can reuse of programming language constructs be promoted in a new language?". This question requires answers on both the documentation and the implementation level.

2.2.1 Language documentation

Concerning documentation, attempts have been made to formally capture semantics[17][24][21][13]. Besides providing an unambiguous definition of the language, formal semantics also provide the opportunity to prove correctness of the language. To demonstrate the importance of such an unambiguous definition, consider the following expression: method1() + method2(). Here the results of two method calls are summed up, or maybe concatenated if the results are strings. Now consider that the language this expression is written in has exceptions, or some kind of program ending statement. All of a sudden it becomes important to know which part of the expression is evaluated first. If the call to method1 results in an exception, method2 may never be called. If subtle details like this are left undefined, multiple implementations of the same language can exist in which one calls method2 and the other does not. This is the case in C, where depending on the compiler one may get different results[9].
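As an illustration of why a fixed evaluation order matters, the following minimal sketch runs the expression from the text in Java, whose language specification prescribes left-to-right evaluation of operands. The class name EvalOrderDemo and the trace list are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; method1 and method2 follow the expression in the text.
class EvalOrderDemo {
    static final List<String> trace = new ArrayList<>();

    static int method1() {
        trace.add("method1");
        throw new IllegalStateException("method1 failed");
    }

    static int method2() {
        trace.add("method2");
        return 2;
    }

    // Evaluates method1() + method2() and reports which calls were made.
    static List<String> run() {
        trace.clear();
        try {
            int sum = method1() + method2();
            trace.add("sum=" + sum);
        } catch (IllegalStateException e) {
            trace.add("exception");
        }
        return new ArrayList<>(trace);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [method1, exception]
    }
}
```

Because the left operand is evaluated first and completes abruptly, method2 is never called. A language that leaves this order unspecified allows an implementation in which method2 runs before the exception is raised.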

Various approaches to formally capture semantics exist. These can be categorised into three major classes: axiomatic semantics[13], denotational semantics[24] and operational semantics[24]. Axiomatic semantics describes the semantics of a construct in a language through assertions on the program state. Denotational semantics, originally named mathematical semantics, tries to capture semantics by providing meaning in terms of mathematical objects, such as truth values and functions. Operational semantics describes the meaning of a program in sequences of computation steps. Frameworks for formal specification are not all necessarily bad in terms of modularity, but are often lacking in the area of reuse. For instance, the ASF+SDF[3] Meta-Environment, an IDE that combines a syntax definition formalism and an algebraic specification formalism, emphasises modularity in algebraic specification, a form of denotational semantics, yet it does not allow for reuse between programming languages. A trend found across frameworks is coupling to concrete syntax instead of abstract syntax[6]. This hinders reuse of semantics in new languages, as a new language may want to reuse the meaning of a construct but not its concrete syntax.

2.2.2 Language implementation

Modular language implementations bring with them many challenges. Language constructs can interact with each other: the presence of one may affect the operation of another. The simplest example here is exceptions. Exceptions interrupt the normal flow of the program, which can affect the operation of any language construct. Besides interaction between language constructs, language constructs also require different semantic entities. For instance, a binding statement necessarily requires access to an environment, whereas an addition operator has no need for any semantic entity. Explicit propagation of semantic entities poses a serious risk to extensibility. Consider a simple arithmetic language containing only integers and arithmetic operators like + and −. If such a language were to be extended with binding, for instance to facilitate storage of previously computed results, a mechanism would be needed to pass the environment along the language constructs. If this is done explicitly, the existing language constructs need to be reimplemented to pass the environment along.
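The cost of explicit propagation can be made concrete with a small sketch. It is not taken from FunCon4J; all names (Expr1, Expr2, Lit, Add, Var, Let) are hypothetical. The first version covers arithmetic only. The second version adds binding, which changes the evaluation interface and forces the literal and addition constructs to be rewritten even though neither uses the environment.

```java
import java.util.HashMap;
import java.util.Map;

// Version 1: arithmetic only. No semantic entity is needed.
interface Expr1 { int eval(); }

class Lit1 implements Expr1 {
    final int n;
    Lit1(int n) { this.n = n; }
    public int eval() { return n; }
}

class Add1 implements Expr1 {
    final Expr1 l, r;
    Add1(Expr1 l, Expr1 r) { this.l = l; this.r = r; }
    public int eval() { return l.eval() + r.eval(); }
}

// Version 2: binding is added and the environment is propagated explicitly.
interface Expr2 { int eval(Map<String, Integer> env); }

class Lit2 implements Expr2 {
    final int n;
    Lit2(int n) { this.n = n; }
    public int eval(Map<String, Integer> env) { return n; } // env is unused
}

class Add2 implements Expr2 {
    final Expr2 l, r;
    Add2(Expr2 l, Expr2 r) { this.l = l; this.r = r; }
    // env is only passed on to the operands
    public int eval(Map<String, Integer> env) { return l.eval(env) + r.eval(env); }
}

class Var2 implements Expr2 {
    final String name;
    Var2(String name) { this.name = name; }
    // Throws a NullPointerException when the name is unbound.
    public int eval(Map<String, Integer> env) { return env.get(name); }
}

class Let2 implements Expr2 {
    final String name;
    final Expr2 bound, body;
    Let2(String name, Expr2 bound, Expr2 body) {
        this.name = name; this.bound = bound; this.body = body;
    }
    public int eval(Map<String, Integer> env) {
        Map<String, Integer> inner = new HashMap<>(env); // extend a copy of env
        inner.put(name, bound.eval(env));
        return body.eval(inner);
    }
}

class PropagationDemo {
    public static void main(String[] args) {
        // let x = 1 + 2 in x + 4
        Expr2 program = new Let2("x", new Add2(new Lit2(1), new Lit2(2)),
                                 new Add2(new Var2("x"), new Lit2(4)));
        System.out.println(program.eval(new HashMap<>())); // prints 7
    }
}
```

Every construct of version 1 had to be touched to arrive at version 2, which is exactly the extensibility risk described above.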

Modular language implementations are not new. For instance, Hudak[14] proposes so called Domain Specific Embedded Languages (DSELs) to lower the start-up costs for building DSLs. DSELs are DSLs which are embedded in an existing higher-order and typed language. The result is the ability to reuse syntax, semantics, implementation and tools. It is also noted that the "look and feel" is necessarily reused, as one can never shake the feeling of the language in which the DSEL is embedded. This can be viewed as a good thing, as one can then more easily transfer existing knowledge of one DSEL to another. There are a few major downsides to the approach. Firstly, embedding necessarily means that the DSEL looks rather similar to the language it is embedded in. This is not always wanted; consider for instance DSLs which are meant to be used by non-technical personnel. Secondly, embedding grants access to the remainder of the host language, which is something one might want to restrict. Thirdly, the DSEL approach is demonstrated in Haskell, and requires a language with rather advanced type features. This restricts the target audience of the approach, as languages like Haskell are not commonly used in practice. Finally, syntax is necessarily reused: one cannot freely choose the syntactical structures of the language, which limits the customisability of the language.

Ekman et al.[8] propose the JastAdd system for building modular and extensible compilers. The novelty of this system is that its syntax is an extended version of Java, which makes it more accessible to the software engineering community than, for instance, Haskell. The JastAdd system works on the basis of attribute grammars. Attribute grammars define computations in a declarative fashion on an AST. Through use of a declarative approach, and by weaving in aspect oriented programming in an extended version of Java, one can build modular and extensible compilers in JastAdd. The approach, however, does require an extension of Java. With JastAdd one can create modular implementations, but the modules can be rather coarse unless the user makes use of techniques to make them less so. Finally, the authors have successfully implemented Java 1.4 in JastAdd, but the feature set of JastAdd is limited and the question remains whether every language can be implemented.

2.3 A candidate solution

Recently a new approach to formal specification was published by Mosses et al.[6] in the form of fundamental constructs (FunCons). The goal of FunCons is to provide an open-ended collection of reusable modular constructs in a component-based approach to both static and dynamic semantics. FunCons are simplified language constructs with a fixed interpretation. Semantics can then be defined by translating constructs of a language to FunCons. The approach is an evolution of action semantics[21]. Actions in action semantics are very similar to FunCons: they provide specification of operations such as control flow, abrupt termination and scoping. The collection of actions, however, is a closed set, meaning that if the available actions do not support what is needed, one is stuck. On top of this, action semantics only captures dynamic semantics, and not static semantics. FunCons thus effectively provide a reusable and modular approach to capturing semantics.

A recently introduced design pattern called Object Algebras[23] provides an approach to modularity, reusability and extensibility in Object Oriented Programming (OOP) languages. It provides a solution to the expression problem[28]. It is not the only design pattern to do so. However, other solutions rely on either advanced features such as F-bounds or wildcards[5], or much type parametrisation[26]. Object algebras use only simple generics and can thus be used in a mainstream language like Java. The novel idea of this thesis is then as follows: given a design pattern that promises modularity, reusability and extensibility, and a formal specification method (FunCons) that promises the same, the union of the two should result in a modular, reusable and extensible implementation, effectively making a language specification on FunCons executable.

2.4 Overview

We explore the combination of Object Algebras and FunCons to create reusable and extensible programming language implementations. The idea is to build these FunCons with Object Algebras, and translate abstract syntax to these FunCon implementations to form the language implementation. What is wanted here is that the resulting implementation is extensible with new data types and operations without imposing change on the existing code. This means that the set of implemented FunCons must be extensible in both data types and operations, as these are the units on which the language implementation is built. In terms of reuse, what is wanted is an implementation of FunCons that can be shared across languages without requiring change.

To explore the idea of combining Object Algebras and FunCons, a library of FunCons called FunCon4J was developed. On top of FunCon4J, FunCaml was built. FunCaml is a complete implementation of the dynamic semantics of Caml Light. All of this is done in Java, a popular mainstream language[2], which brings the implementation to a wide audience and not just the scientific crowd. Specifically Java 8 is used, which introduces lambdas and functional interfaces; these are good tools to have when implementing object algebras. Both FunCon4J and FunCaml are then evaluated along the axes of reusability, extensibility and performance.


Chapter 3

Background

The novel idea proposed in this thesis is to take an approach to modularity and extensibility in OOP languages (Object Algebras) and apply it to an approach to modularity and reuse in formal semantics (Funcons). This makes the latter executable, and ultimately results in an implementation of a programming language. In this chapter an introduction to Funcons and Object Algebras is provided.

3.1 Funcons

Fundamental constructs are a component-based approach to semantics. They are an evolution of action semantics[21] and a direct successor of Basic Abstract Syntax (BAS)[16]. In contrast to the collection of actions in action semantics, the collection of Funcons is left open-ended. Where BAS requires usage of both BAS and action notation, Funcons only require one notation. The semantics of a language can be defined by translating constructs of the language to Funcons.

Funcons were developed as a result of the research project PlanComps¹. The aim of the research project was to develop "a substantial collection of highly reusable, validated language components" called Funcons. Among other publications that came out of the project, a paper by Mosses et al.[6] describes the results of defining Caml Light on Funcons. The paper comes with a complete translation of Caml Light to Funcons, and with source code to perform the translation and test the semantics. In total 61 source code files that define Funcons are provided. Most source code files define a single Funcon, but some define several closely related ones. Everything that operates on values is termed an operation, but operations are used in a way similar to Funcons.

The dynamic semantics of Funcons are defined using Implicitly Modular Structural Operational Semantics (I-MSOS), which is a modular variant of Structural Operational Semantics (SOS), also known as small-step semantics. This approach describes how small steps of a computation take place. Consider the following example of conditional evaluation in SOS, taken from the paper, which is used as the definition of the dynamic semantics of the if-true Funcon:

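\[ \frac{E_1 \to E_1'}{E_1 \mathbin{?} E_2 : E_3 \;\to\; E_1' \mathbin{?} E_2 : E_3} \tag{3.1} \]

\[ \mathit{true} \mathbin{?} E_2 : E_3 \;\to\; E_2 \tag{3.2} \]

\[ \mathit{false} \mathbin{?} E_2 : E_3 \;\to\; E_3 \tag{3.3} \]

¹ http://www.plancomps.org/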


The inference rule (equation 3.1) specifies that evaluation of this computation may require evaluation of E1. The axioms (equations 3.2 and 3.3) specify what must happen once the computation of E1 results in a boolean. If entities are introduced, such as a store for side-effects or an environment for binding, the specification of conditional evaluation needs to be rewritten to include these entities, because SOS requires explicit propagation of entities, whereas I-MSOS does not. Since the Funcon collection is open-ended, and Funcons are to be defined once and for all, it is a requirement that Funcon specifications need not be rewritten when a new entity is introduced. Hence Funcons were defined in I-MSOS.
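The three rules for conditional evaluation can be read as a step function. The sketch below is one possible Java rendering of equations 3.1 to 3.3 in plain SOS style, without any entities; the term classes Bool, Num and Cond and the class SmallStep are hypothetical and not part of the Funcon definitions.

```java
// Terms of a tiny language: literals and the conditional E1 ? E2 : E3.
interface Term {}

class Bool implements Term {
    final boolean b;
    Bool(boolean b) { this.b = b; }
}

class Num implements Term {
    final int n;
    Num(int n) { this.n = n; }
}

class Cond implements Term {
    final Term e1, e2, e3;
    Cond(Term e1, Term e2, Term e3) { this.e1 = e1; this.e2 = e2; this.e3 = e3; }
}

class SmallStep {
    // Performs one computation step, or returns null when t is already a value.
    static Term step(Term t) {
        if (!(t instanceof Cond)) return null; // literals are values
        Cond c = (Cond) t;
        if (c.e1 instanceof Bool) {
            // Axioms (3.2) and (3.3): the condition is a boolean, pick a branch.
            return ((Bool) c.e1).b ? c.e2 : c.e3;
        }
        // Rule (3.1): step the condition, keep both branches unchanged.
        Term e1Next = step(c.e1);
        if (e1Next == null) throw new IllegalStateException("condition is not a boolean");
        return new Cond(e1Next, c.e2, c.e3);
    }

    // Repeats single steps until a value is reached.
    static Term run(Term t) {
        Term next;
        while ((next = step(t)) != null) t = next;
        return t;
    }

    public static void main(String[] args) {
        // (true ? false : true) ? 1 : 2
        Term t = new Cond(new Cond(new Bool(true), new Bool(false), new Bool(true)),
                          new Num(1), new Num(2));
        System.out.println(((Num) run(t)).n); // prints 2
    }
}
```

Adding a store or an environment to this sketch would mean adding a parameter to step and threading it through every case, which is the rewriting that I-MSOS avoids.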

Three Funcon sorts are defined, these are: comms, decls and exprs. The only thing that separates them is what they compute. Exprs compute values, decls compute environments, and comms compute unit (a void value). Each Funcon belongs to a sort, and is simply assigned to one based on what is computed. For instance, the Funcon if-true for conditional evaluation is assigned to exprs, and the Funcon bind-value to decls. Operations such as not and or can be lifted to a computation. This may be required as some Funcons are specified over generic computations. For instance, the if-true Funcon is not concerned with the sort of the second and third argument as it does not need to perform any type of operations on these arguments. Thus the result of the computation can be left generic.

The static semantics of Funcons are specified in the big-step style of I-MSOS, as static semantics generally requires performing checks on parts of the code once, in no particular order. In contrast to small-step semantics, big-step semantics relates a computation to a value (or multiple values) instead of describing in detail what happens to the computation. For the static semantics of Funcons this proved to be enough. Equation 3.4, taken from the paper, describes the static semantics of the if-true Funcon. In this example E must have type boolean, and X1 and X2 must share the same type. The resulting type is the type X1 and X2 share.

\[ \frac{E : \mathit{boolean} \quad X_1 : T \quad X_2 : T}{\textsf{if-true}(E, X_1, X_2) : T} \tag{3.4} \]

Besides the definition of quite a substantial collection of Funcons, the paper by Mosses et al.[6] also provides a complete translation of Caml Light to Funcons. Most translations are rather simple, or even one-to-one, where a language construct from Caml Light translates directly to one Funcon. This is a consequence of Funcons often modelling simple language constructs. Take the definition of the if ... then ... else structure in figure 3.1. Here the if ... then ... else structure of Caml Light maps directly onto the if-true Funcon. The parts between [[ ]] are patterns, and the instances of E are nodes of the abstract syntax tree. A more interesting example is found in figure 3.2. Here the for-loop structure of Caml Light is mapped to Funcons. Glancing over the translation, we find the Funcon apply-to-each, which takes an abstraction and applies it to every integer generated by the int-closed-interval Funcon. The abstraction is built by the patt-abs Funcon. This Funcon creates an abstraction that first tries to match a pattern (a computation of sort decls) and then executes a command. The pattern in this case is the binding of an identifier, and the command is the evaluation of E3 for effect. All put together, this defines the for-loop structure of Caml Light in a for-each loop style, where a list is created and in each iteration an element of the list is bound to an identifier.

    expr[[ if E1 then E2 else E3 ]] =
        if-true(expr[[ E1 ]], expr[[ E2 ]], expr[[ E3 ]])

Figure 3.1: Translation of Caml Light's if ... then ... else to Funcons


    expr[[ for I = E1 to E2 do E3 done ]] =
        apply-to-each(
            patt-abs(bind(id[[ I ]]), effect(expr[[ E3 ]])),
            int-closed-interval(expr[[ E1 ]], expr[[ E2 ]]))

Figure 3.2: Translation of Caml Light’s for loop to Funcons
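To make the two translations tangible, the following sketch implements them as a Java function from a small abstract syntax to Funcon terms. It only builds and prints terms and does not execute them. The classes Funcon, Ast, IntLit, BoolLit, If, For and Translate are hypothetical, and the translation of literals is an assumption made so that the example is complete.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// A Funcon term: a name applied to argument terms (hypothetical representation).
class Funcon {
    final String name;
    final Funcon[] args;
    Funcon(String name, Funcon... args) { this.name = name; this.args = args; }
    @Override public String toString() {
        if (args.length == 0) return name;
        return name + Arrays.stream(args).map(Funcon::toString)
                            .collect(Collectors.joining(", ", "(", ")"));
    }
}

// A minimal abstract syntax for a fragment of Caml Light (hypothetical).
interface Ast {}
class IntLit implements Ast { final int n; IntLit(int n) { this.n = n; } }
class BoolLit implements Ast { final boolean b; BoolLit(boolean b) { this.b = b; } }
class If implements Ast {
    final Ast e1, e2, e3;
    If(Ast e1, Ast e2, Ast e3) { this.e1 = e1; this.e2 = e2; this.e3 = e3; }
}
class For implements Ast {
    final String id; final Ast e1, e2, e3;
    For(String id, Ast e1, Ast e2, Ast e3) { this.id = id; this.e1 = e1; this.e2 = e2; this.e3 = e3; }
}

class Translate {
    // Mirrors expr[[ ... ]] from figures 3.1 and 3.2.
    static Funcon expr(Ast e) {
        if (e instanceof IntLit) return new Funcon(String.valueOf(((IntLit) e).n));   // assumption
        if (e instanceof BoolLit) return new Funcon(String.valueOf(((BoolLit) e).b)); // assumption
        if (e instanceof If) {
            If i = (If) e;
            return new Funcon("if-true", expr(i.e1), expr(i.e2), expr(i.e3));
        }
        if (e instanceof For) {
            For f = (For) e;
            return new Funcon("apply-to-each",
                new Funcon("patt-abs",
                    new Funcon("bind", new Funcon(f.id)),
                    new Funcon("effect", expr(f.e3))),
                new Funcon("int-closed-interval", expr(f.e1), expr(f.e2)));
        }
        throw new IllegalArgumentException("unsupported construct");
    }

    public static void main(String[] args) {
        System.out.println(expr(new If(new BoolLit(true), new IntLit(1), new IntLit(2))));
        System.out.println(expr(new For("i", new IntLit(1), new IntLit(3), new IntLit(0))));
    }
}
```

The first line printed is if-true(true, 1, 2), the one-to-one case. The second shows how a single Caml Light construct expands into a composition of several Funcons.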

3.2 Object Algebras

Object algebras[23] is a design pattern that solves the expression problem. The expression problem as formulated by Wadler[28] imposes four requirements and Odersky et al.[22] add a fifth. These requirements are:

1. Extensibility in both data types and operations
2. Strong static type safety
3. No modification or duplication of existing code
4. Separate compilation and type-checking
5. Independent extensibility

These requirements are directly relevant to an implementation of Funcons. Firstly, there needs to be support for static type checking in the language built on top of Funcons. This is required because Funcons define both static and dynamic semantics, and it is emphasised by requirements two and four. Funcons are supposed to be defined once and for all, that is, they may not require modification when new Funcons are introduced. This is exactly described by requirement three. The first requirement demands extensibility in data types, in our case Funcons, and in operations. Operations are for instance evaluation, compilation, type checking, pretty printing or debugging. Extensibility in operations gives a Funcon implementation based on Object Algebras the promise of a wide range of features that one gets for free. Consider Funcons implemented for both an evaluation operation and a debugging operation. Translating one's language to these Funcons then effectively provides an interpreter and a debugger for free. For the fifth requirement, the collection of Funcons needs to be open-ended and extensible by independently developed new Funcons. This allows the collection to grow in projects other than one's own, and thus enables the developer to reuse Funcons of other projects. Object algebras fulfil all of these requirements.

So what are object algebras? Object algebras are a design pattern that uses simple generics and object-oriented features to provide a modular and extensible approach to programming language development. There are three parts that require introduction: signatures, object algebras implementing these signatures, and operations. The general school of thought is: instead of executing operations directly, an object is built that will perform the operation eventually. This object effectively models the operation to be performed.

Consider the operation shown in figure 3.3. The operation IEval is implemented through one of Java 8's new features: a functional interface, with one unimplemented method called eval(). Making use of functional interfaces here allows lambdas to be used for creating instances of IEval later. Ultimately this is only syntactic sugar. An implementation with a class as operation is also possible, except that one then has to override the method within the class, instead of just providing a lambda which conforms to the method's signature. The IEval operation models evaluation: it takes no arguments and results in a Value, where Value is some tagging interface, or the top of a value hierarchy, that marks every value the operation can return.

    public interface IEval {
        Value eval();
    }

Figure 3.3: An operation (IEval) that models evaluation

Moving on to the signature (the abstract syntax) and the object algebras implementing it, consider the example in figure 3.4. Here StringAlg represents the signature and StringEval an object algebra over operation IEval. The signature only defines the data types over a generic operation E, whereas the object algebra provides an implementation over operation IEval. By keeping the operation type generic in the signature, one can provide object algebras for multiple operations, such as debugging or type checking. In the concrete implementation of StringEval the data types are defined for the operation IEval. Here the method string() takes a native Java string and returns an instance of IEval. The instance of IEval is created by returning a lambda that matches the signature of the eval() method, in this case a method that accepts no arguments and returns a Value. The method concat() accepts two instances of IEval, and returns an instance of IEval. In its body it calls the eval() method on both instances of IEval to retrieve the values they evaluate to. The results are then cast to type String and appended. The result is a concatenated String.

    public interface StringAlg<E> {
        E string(java.lang.String s);
        E concat(E s1, E s2);
    }

    public interface StringEval extends StringAlg<IEval> {
        // Wraps a native Java string in a String value once evaluated.
        default IEval string(java.lang.String s) {
            return () -> new String(s);
        }

        // Evaluates both arguments, casts the results to String and appends them.
        default IEval concat(IEval s1, IEval s2) {
            return () ->
                ((String) s1.eval()).append((String) s2.eval());
        }
    }

Figure 3.4: A signature (StringAlg) and an object algebra (StringEval) implementing this signature for operation IEval
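A short usage sketch may help to see how the signature and the object algebra of figure 3.4 work together. The definitions are repeated so that the example is self-contained. The value class is called StringValue here, a hypothetical name chosen to avoid a clash with java.lang.String, and the helper helloWorld is likewise an assumption of this example.

```java
interface Value {}

// Hypothetical value class standing in for the String value of figure 3.4.
class StringValue implements Value {
    final String s;
    StringValue(String s) { this.s = s; }
    StringValue append(StringValue other) { return new StringValue(s + other.s); }
}

interface IEval { Value eval(); }

interface StringAlg<E> {
    E string(String s);
    E concat(E s1, E s2);
}

interface StringEval extends StringAlg<IEval> {
    default IEval string(String s) { return () -> new StringValue(s); }
    default IEval concat(IEval s1, IEval s2) {
        return () -> ((StringValue) s1.eval()).append((StringValue) s2.eval());
    }
}

class StringDemo {
    // A term written once against the signature, independent of any operation.
    static <E> E helloWorld(StringAlg<E> alg) {
        return alg.concat(alg.string("Hello, "), alg.string("world"));
    }

    public static void main(String[] args) {
        // Instantiating the algebra selects the operation: here, evaluation.
        IEval term = helloWorld(new StringEval() {});
        // Nothing has been computed yet; eval() performs the concatenation.
        StringValue result = (StringValue) term.eval();
        System.out.println(result.s); // prints Hello, world
    }
}
```

Since StringEval consists of default methods only, an empty anonymous class suffices to obtain an instance. The term itself is built against StringAlg and never mentions IEval.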

In a nutshell this describes object algebras: signatures that define data types over a generic operation, and one or more object algebras that implement one or more signatures over an operation. Data types can be extended by simply introducing a new signature that extends the existing signature. Alternatively one could choose to create signatures independent of each other where possible, and have one signature extend all others. When new data types are added, existing object algebras can be extended in a similar fashion: simply by creating a new object algebra that implements the new data types over the operation, and that extends the existing object algebras. Operations can be extended by simply creating a new object algebra for that operation. In both cases of extension existing code is left untouched.
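Both directions of extension can be sketched on the StringAlg signature. In the example below, IPrint and StringPrint add a new operation (pretty printing), while UpperAlg and UpperPrint add a new data type (upper-casing). All four names, and the printed notation, are hypothetical and chosen for this illustration only.

```java
// Existing signature, left untouched.
interface StringAlg<E> {
    E string(String s);
    E concat(E s1, E s2);
}

// Extension 1, a new operation: pretty printing.
interface IPrint { String print(); }

interface StringPrint extends StringAlg<IPrint> {
    default IPrint string(String s) { return () -> "\"" + s + "\""; }
    default IPrint concat(IPrint s1, IPrint s2) {
        return () -> s1.print() + " ^ " + s2.print();
    }
}

// Extension 2, a new data type: the signature grows by extension.
interface UpperAlg<E> extends StringAlg<E> {
    E upper(E s);
}

// The new data type implemented for the print operation, reusing StringPrint.
interface UpperPrint extends StringPrint, UpperAlg<IPrint> {
    default IPrint upper(IPrint s) { return () -> "upper(" + s.print() + ")"; }
}

class ExtensionDemo {
    // A term that uses both the old and the new data types.
    static <E> E term(UpperAlg<E> alg) {
        return alg.upper(alg.concat(alg.string("fun"), alg.string("con")));
    }

    public static void main(String[] args) {
        System.out.println(term(new UpperPrint() {}).print()); // prints upper("fun" ^ "con")
    }
}
```

Neither StringAlg nor StringPrint is modified when UpperAlg and UpperPrint are introduced, and an evaluation algebra could be added for UpperAlg in the same way.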

3.3 Overview

Mosses et al.[6] provide a collection of Funcons, and translations from abstract syntax to these Funcons, for the complete specification of the static and dynamic semantics of Caml Light. To provide an extensible and reusable implementation of these Funcons we turn to object algebras. Object algebras are a design pattern that solves the expression problem[28][22], requires only simple generics, which are available in mainstream languages such as Java, and has a strong focus on modularity and extensibility. We aim to implement the Funcons as defined by Mosses et al.[6] with object algebras, and then to implement the translations from Caml Light's abstract syntax to these Funcons. Ultimately this should provide an extensible and reusable implementation of Caml Light built on Funcons in a mainstream language.


Chapter 4

FunCon4J's Design

4.1 A library of semantic building blocks

FunCon4J is a library of FunCons built with object algebras, available at https://github.com/Jelleas/ObjectAlgebraFunCons. The implemented FunCons are heavily inspired by Mosses et al.[6]. However, FunCons in FunCon4J currently only define dynamic semantics, as a static type checking operation is not implemented. Another key point where FunCons in FunCon4J deviate from the FunCons defined by Mosses et al. is that there are no semantic sorts. The entirety of the dynamic semantics is specified over a single sort. It is possible to introduce multiple sorts such as decls and comms. This would make the FunCons more strict about which sorts of arguments they accept and which they return.

FunCon4J exists to serve as a modular approach to re-usability in programming language definition and implementation. Programming languages can be defined by translating constructs from the to-be-defined programming language down to FunCons from FunCon4J.

4.2 Focus points

4.2.1 Reusability

FunCons are supposed to be used as reusable entities, allowing language engineers to define languages by translating abstract syntax to FunCons. FunCon4J is here to provide the reusable FunCons. FunCons as defined by Mosses et al.[6] are kept generic for reusability, and FunCon4J does not break this. The language engineer is restricted to using only FunCons in the translation. For instance, one cannot introduce new types of values outside of what can be done through FunCons, as FunCons are unable to operate on these values. By using only FunCons the translations themselves become directly reusable, as these do not depend on language specific code. This enables a language engineer to, for instance, easily build variants of existing languages built on FunCon4J.

4.2.2 Extensibility

The implementation of FunCon4J focuses on extensibility, as this is important for definitions of new programming languages. For instance, the definition of a domain specific language might require constructs that do not exist within the current collection of FunCons. Such constructs could include graphical user interface interaction, parallelism or event-driven control flow. Implementing every FunCon that a language engineer could possibly want or need is an infeasible task. It is not possible to predict or estimate what constructs one could need within a to-be-defined language. Therefore the collection of FunCons must remain open-ended. The language engineer should then be able to add new FunCons to the collection if the existing FunCons do not serve her needs.

Besides adding FunCons which simply have not been implemented, one might desire different implementations of existing FunCons. An example of this is optimisation for performance. If one’s language makes heavy use of imperative features such as looping constructs, it could be beneficial to prefer an imperative implementation of said looping construct over a recursive implementation within FunCon4J. So besides addition, it should be possible to override existing FunCons if the language engineer requires something that the current FunCon implementation does not provide.

A language engineer might also want to compose existing FunCons within a new FunCon for several reasons, such as reuse and performance. The opportunity to compose FunCons within a new FunCon gives the language engineer a tool to reuse existing translations to FunCons within several language constructs, essentially mimicking the role of a method within a programming language. Besides reuse, composing FunCons could also increase performance. For instance, an imperative for-loop in a language can be defined through the applyToEach FunCon, which mimics the semantics of a for-each loop but requires a list to operate on, possibly leading to the unnecessary allocation of a list. Alternatively, one could create a new FunCon specifically for the imperative for-loop and apply Java’s native for-loop within that FunCon, since within a FunCon’s definition in FunCon4J one has access to all of Java’s constructs, as we shall see shortly.

4.2.3 Modularity

FunCons are an approach to modular semantics. FunCon4J was therefore designed with modularity in mind, and this shows in different aspects of FunCon4J’s implementation. Firstly, FunCon4J maintains the modular specification of FunCons. That is, the specification of a new FunCon is independent of existing FunCon specifications. One can add FunCons without having to make any changes to existing FunCons.

FunCons can be built in terms of other FunCons, and as such there are dependencies amongst them. FunCon4J strives to restrict these dependencies as much as possible by giving the engineer the opportunity to explicitly limit the scope of FunCons when defining FunCons or families of FunCons. This gives the engineer a tool to keep track of dependencies among FunCons. With limited dependencies among FunCons, one can more easily remove subsets of FunCons from the collection. This can be used to enforce certain restrictions upon the language engineer, such as no access to IO FunCons for languages that prioritise some form of safety.

4.3 FunCon design

The following is a running example that serves as an introduction to the design of FunCons in FunCon4J, while also showcasing the modularity and extensibility features described earlier. It starts by introducing a single FunCon and steadily introduces more FunCons and concepts. Note that Java 8 is required to compile and run any of the examples in this section.

This section starts with an introduction to object algebras and how FunCons are defined in object algebras. What follows is how to independently extend object algebras, override object algebras, and manage dependencies among object algebras.


4.3.1 One FunCon: ifTrue

FunCons are defined through the Object Algebra design pattern[23] in FunCon4J. As such, implementation and definition are separated. The definition of a FunCon is given in a signature, and the implementation of said FunCon in an object algebra that implements said signature.

    interface ControlAlg<E> {
        E ifTrue(E boolExp, E exp1, E exp2);
    }

Figure 4.1: Signature of an ifTrue FunCon

    interface IEval {
        IValue eval(IMap env, IValue given);
    }

Figure 4.2: IEval, carrier type for evaluation

    interface ControlEval extends ControlAlg<IEval> {
        default IEval ifTrue(IEval be, IEval e1, IEval e2) {
            return (env, given) ->
                ((Bool) be.eval(env, given)).value()
                    ? e1.eval(env, given)
                    : e2.eval(env, given);
        }
    }

Figure 4.3: Object algebra that implements a FunCon ifTrue

For instance, take the FunCon ifTrue for conditional expressions. During execution it should accept something that eventually becomes a boolean, on the basis of which it executes either its second or third argument and returns the result. The ifTrue FunCon as defined in figure 4.1 thus takes three arguments: boolExp, exp1 and exp2. If during evaluation boolExp evaluates to the boolean value true, exp1 is evaluated, otherwise exp2 is evaluated. ifTrue returns the result of the executed expression, that is either exp1 or exp2. Note that the type of the FunCon and of each of its arguments is some generic type E. Here E is the carrier type of the algebra ControlAlg. This carrier type is eventually bound in some implementation of the algebra.

One such carrier type could be IEval, which represents evaluation. Making use of Java 8’s functional interfaces, a definition of IEval can be found in figure 4.2. eval() is a method that accepts an environment and a given value as arguments and returns some IValue. Here IValue is a tagging interface that marks values used for evaluation. Such values include integers, strings, lists, etc. An environment and a given value are passed because some FunCons require these entities in order to be built. The roles of the environment and the given value are explained in sections 4.4.1 and 4.4.2 respectively. Anticipating the inclusion of entities is not required, as one can add new entities without having to change existing code. However, including entities at a later stage requires existing algebras to be extended with the new carrier type as discussed in section 4.4.6, and doing so results in more code. Therefore some anticipation is preferable.

With the carrier type IEval and the signature ControlAlg defined, an object algebra can be implemented. Take for instance the implementation in figure 4.3. Even though ControlEval is an interface, it is a concrete object algebra of the signature ControlAlg, as it implements all of its methods. This is achieved through the use of Java 8’s default methods. Interfaces are beneficial for object algebras because interfaces in Java allow for multiple inheritance, in contrast to classes, which only allow for single inheritance. Using interfaces thus allows for easy composition of multiple signatures and object algebras, as we shall see shortly.

On the second line of the implementation is the concrete implementation of the FunCon ifTrue over carrier type IEval. The result of ifTrue is an instance of IEval where the method eval() is bound to the lambda function defined after the return keyword. Calling eval() executes the body of the lambda, which leverages Java’s native conditional expression operator ?: to determine whether e1 or e2 should be evaluated. Note that the result of the call to the eval() method of be is cast to type Bool. Casting is required here as the eval() method returns something of type Value, and a boolean is needed for the evaluation of this FunCon. Also note that the IEval implementation of the ifTrue FunCon only describes the dynamic context of the FunCon, not the static context. As such there is no type safety. Furthermore, there is no difference between lifted and non-lifted arguments for this FunCon. In other words, this implementation is unaware of whether be is a constant, or rather some expression that evaluates to a Bool. Care must thus be taken not to call an eval() method more than once, as side effects may occur. If the result of some IEval needs to be used more than once, it is best to cache the result by closing over it in an instance of IEval that directly returns the result. This can be done as in figure 4.4.

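    // evaluate once, in a context where env and given are available
    IValue result = someIEval.eval(env, given);
    // the parameters are ignored; they are named e and g to avoid
    // clashing with the env and given already in scope
    IEval cachedResult = (e, g) -> result;

    // example calls
    alg.someFunCon(cachedResult).eval(env, given);
    alg.someOtherFunCon(cachedResult).eval(env, given);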

Figure 4.4: Caching results
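The effect of caching can be observed with an expression that has a visible side effect. The following is a minimal sketch, not code taken from FunCon4J: IValue, IMap, Bool and IEval are stand-ins modelled on figures 4.2 and 4.5, and CachingSketch with its counter is a hypothetical helper introduced only for illustration.

```java
interface IValue {}
interface IMap {}

final class Bool implements IValue {
    private final boolean value;
    Bool(boolean value) { this.value = value; }
    boolean value() { return value; }
}

interface IEval {
    IValue eval(IMap env, IValue given);
}

final class CachingSketch {
    // Counts how often the effectful expression is evaluated.
    static int evaluations = 0;

    // An expression with a side effect: every evaluation bumps the counter.
    static final IEval EFFECTFUL = (env, given) -> {
        evaluations++;
        return new Bool(true);
    };

    // Uses the expression twice directly, so the side effect happens twice.
    static int withoutCaching() {
        evaluations = 0;
        EFFECTFUL.eval(null, null); // env and given are not used in this sketch
        EFFECTFUL.eval(null, null);
        return evaluations;
    }

    // Evaluates once and wraps the result in an IEval that simply returns it.
    static int withCaching() {
        evaluations = 0;
        IValue result = EFFECTFUL.eval(null, null);
        IEval cached = (env, given) -> result;
        cached.eval(null, null);
        cached.eval(null, null);
        return evaluations;
    }

    public static void main(String[] args) {
        System.out.println(withoutCaching()); // 2
        System.out.println(withCaching());    // 1
    }
}
```

The cached IEval can be handed to any number of FunCons; each of them sees the same value without triggering the side effect again.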

4.3.2 Extending algebras

On its own the FunCon ifTrue is useless, as there are for instance no FunCons for creating booleans and thus no way of making use of the ifTrue FunCon. An implementation of a boolean FunCon could be the following:

    public class Bool implements Value {
        private boolean value;

        public Bool(boolean b) {
            this.value = b;
        }

        public boolean value() {
            return value;
        }
    }

    interface BoolAlg<E> {
        E bool(boolean b);
    }

    interface BoolEval extends BoolAlg<IEval> {
        default IEval bool(boolean b) {
            return (env, given) -> new Bool(b);
        }
    }

Figure 4.5: Introducing Bool

Firstly, for completeness, a concrete class Bool that is tagged by the interface Value is shown. In this implementation Bool only wraps Java’s native boolean, but that is enough for this example. Moving on to the signature BoolAlg, the same pattern is used as for ControlAlg, with one subtle difference: the FunCon bool accepts a Java native boolean instead of some instance of generic type E. By doing so this FunCon serves as a bridge between Java values and FunCon values, converting a boolean to some carrier type. Directly below BoolAlg is its object algebra with carrier type IEval, BoolEval. Recall that IEval is a functional interface and thus has only one unimplemented method, in this case eval(). The instance of IEval that is returned by the FunCon bool is created by returning a lambda that matches the signature of eval(). That is, a method that accepts an environment and a given value and returns something of type Value, in this case a Bool.

To create generic ASTs, the algebras need to be combined into a single algebra. A solution to this problem is found by leaning on Java’s multiple inheritance for interfaces. One can create a signature that extends multiple other signatures. An object algebra can then be built by extending all the object algebras over the same carrier type that together implement the signature. For the running example this can be found in figure 4.6.

    interface AllAlg<E> extends ControlAlg<E>, BoolAlg<E> {}

    interface AllEval extends AllAlg<IEval>, ControlEval, BoolEval {}

Figure 4.6: Composing algebras

As ControlEval and BoolEval implement ControlAlg and BoolAlg, the combination of the two also implements AllAlg. AllEval is thus a concrete implementation of AllAlg. This approach can be applied to any number of signatures and object algebras of those signatures. Use of AllEval is demonstrated in figure 4.7.

    public static <E, A extends AllAlg<E>> E ast(A alg) {
        return alg.ifTrue(
            alg.bool(true),
            alg.bool(false),
            alg.bool(true)
        );
    }

    AllAlg<IEval> alg = new AllEval() {};

    // should result in false (wrapped in Bool)
    ast(alg).eval(new IMap(), new Null());

Figure 4.7: Creating generic ASTs

This example highlights one of the benefits of the Object Algebra design pattern. Data variants can be extended without having to change underlying operations, which is in this case IEval.
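Figures 4.1 to 4.7 can be assembled into one compilable unit. The sketch below does so under a few assumptions: IValue, IMap, EmptyMap and Null are minimal stand-ins for the value hierarchy and environment of FunCon4J, Bool implements IValue so that it matches the return type of eval() in figure 4.2, and RunningExample is a hypothetical wrapper class.

```java
interface IValue {}
interface IMap {}
final class EmptyMap implements IMap {}   // stand-in for an empty environment
final class Null implements IValue {}     // stand-in for the absent given value

final class Bool implements IValue {
    private final boolean value;
    Bool(boolean value) { this.value = value; }
    boolean value() { return value; }
}

interface IEval {
    IValue eval(IMap env, IValue given);
}

// Signatures (figures 4.1 and 4.5)
interface ControlAlg<E> { E ifTrue(E boolExp, E exp1, E exp2); }
interface BoolAlg<E> { E bool(boolean b); }

// Object algebras over IEval (figures 4.3 and 4.5)
interface ControlEval extends ControlAlg<IEval> {
    default IEval ifTrue(IEval be, IEval e1, IEval e2) {
        return (env, given) -> ((Bool) be.eval(env, given)).value()
                ? e1.eval(env, given)
                : e2.eval(env, given);
    }
}

interface BoolEval extends BoolAlg<IEval> {
    default IEval bool(boolean b) {
        return (env, given) -> new Bool(b);
    }
}

// Composition (figure 4.6)
interface AllAlg<E> extends ControlAlg<E>, BoolAlg<E> {}
interface AllEval extends AllAlg<IEval>, ControlEval, BoolEval {}

final class RunningExample {
    // Generic AST (figure 4.7): written once, usable with any carrier type E.
    static <E> E ast(AllAlg<E> alg) {
        return alg.ifTrue(alg.bool(true), alg.bool(false), alg.bool(true));
    }

    // Builds the AST over IEval and evaluates it.
    static boolean run() {
        AllAlg<IEval> alg = new AllEval() {};
        return ((Bool) ast(alg).eval(new EmptyMap(), new Null())).value();
    }

    public static void main(String[] args) {
        System.out.println(run()); // false
    }
}
```

Note that AllEval contains no code of its own; it only selects which object algebras take part in the composition.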

4.3.3 Overriding algebras

Implementations of FunCons may not suit every use case. Thus there needs to exist a mechanism to replace implementations with implementations that do suit the use case. For example, some languages may prefer logical operators that short circuit over those that do not, and vice versa. A signature for FunCons that resemble logical operators can be found in figure 4.8. The signature defines three FunCons: lAnd, lOr and not. These represent the logical operations and, or and not respectively. The signature LogicOpAlg can be implemented for carrier type IEval as in figure 4.9. This algebra simply implements the FunCons by mapping them to the corresponding Java operators and wrapping the result in a Bool. However, the FunCon lOr is built on top of the FunCons not and lAnd through De Morgan’s laws. Note that Java’s && operator short circuits, meaning that if the left-hand side evaluates to the value false, the right-hand side is never evaluated, as the result of the operator can already be determined. By building directly on top of the Java operator, this implementation of lAnd and lOr thus also short circuits.

    interface LogicOpAlg<E> {
        E not(E bool);
        E lAnd(E bool1, E bool2);
        E lOr(E bool1, E bool2);
    }

Figure 4.8: Signature for the FunCons not, lAnd and lOr

    interface LogicOpEval extends LogicOpAlg<IEval> {
        default IEval not(IEval b) {
            return (env, given) ->
                new Bool(!((Bool) b.eval(env, given)).value());
        }

        default IEval lAnd(IEval b1, IEval b2) {
            return (env, given) ->
                new Bool(
                    ((Bool) b1.eval(env, given)).value() &&
                    ((Bool) b2.eval(env, given)).value()
                );
        }

        default IEval lOr(IEval b1, IEval b2) {
            return not(lAnd(not(b1), not(b2)));
        }
    }

Figure 4.9: Object algebra implementing the not, lAnd and lOr FunCons in a short-circuiting fashion

Implementations of LogicOpAlg without short circuiting are also possible. Consider figure 4.10, where NoShortCircuitLogicOpEval extends a concrete implementation of LogicOpAlg directly in order to override one of its methods, in this case with a non-short-circuiting variant of lAnd. Note that because lOr in LogicOpEval is built on top of lAnd, it will also no longer short circuit in NoShortCircuitLogicOpEval. This is then an example of overriding FunCons to provide a different implementation. In this case, starting from LogicOpEval, one now has logical operators that do not short circuit. Overriding existing FunCons is thus a case of extending existing implementations and overriding the FunCons that need replacing.

    interface NoShortCircuitLogicOpEval extends LogicOpEval {
        @Override
        default IEval lAnd(IEval b1, IEval b2) {
            return (env, given) -> {
                // both operands are evaluated before they are combined
                boolean b1Val = ((Bool) b1.eval(env, given)).value();
                boolean b2Val = ((Bool) b2.eval(env, given)).value();
                return new Bool(b1Val && b2Val);
            };
        }
    }

Figure 4.10: Object algebra overriding the lAnd FunCon with a non short-circuiting variant
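The difference between the two algebras only becomes observable when an operand has a side effect. The following sketch counts how often the right-hand operand of lOr is evaluated under each algebra. The algebras follow figures 4.9 and 4.10; IValue, IMap and Bool are stand-ins, and ShortCircuitSketch is a hypothetical test harness.

```java
interface IValue {}
interface IMap {}

final class Bool implements IValue {
    private final boolean value;
    Bool(boolean value) { this.value = value; }
    boolean value() { return value; }
}

interface IEval {
    IValue eval(IMap env, IValue given);
}

interface LogicOpAlg<E> {
    E not(E b);
    E lAnd(E b1, E b2);
    E lOr(E b1, E b2);
}

// Short-circuiting algebra, as in figure 4.9.
interface LogicOpEval extends LogicOpAlg<IEval> {
    default IEval not(IEval b) {
        return (env, given) -> new Bool(!((Bool) b.eval(env, given)).value());
    }
    default IEval lAnd(IEval b1, IEval b2) {
        return (env, given) -> new Bool(
                ((Bool) b1.eval(env, given)).value()
                        && ((Bool) b2.eval(env, given)).value());
    }
    default IEval lOr(IEval b1, IEval b2) {
        return not(lAnd(not(b1), not(b2)));
    }
}

// Override of lAnd only, as in figure 4.10; lOr changes behaviour with it.
interface NoShortCircuitLogicOpEval extends LogicOpEval {
    @Override
    default IEval lAnd(IEval b1, IEval b2) {
        return (env, given) -> {
            boolean v1 = ((Bool) b1.eval(env, given)).value();
            boolean v2 = ((Bool) b2.eval(env, given)).value();
            return new Bool(v1 && v2);
        };
    }
}

final class ShortCircuitSketch {
    // Evaluates "true or rhs" and reports how often rhs was evaluated.
    static int rhsEvaluations(LogicOpAlg<IEval> alg) {
        int[] count = {0};
        IEval lhs = (env, given) -> new Bool(true);
        IEval rhs = (env, given) -> {
            count[0]++;
            return new Bool(false);
        };
        alg.lOr(lhs, rhs).eval(null, null); // env and given are unused here
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(rhsEvaluations(new LogicOpEval() {}));               // 0
        System.out.println(rhsEvaluations(new NoShortCircuitLogicOpEval() {})); // 1
    }
}
```

Because lOr calls lAnd through the algebra itself, overriding lAnd is enough to change the behaviour of lOr as well.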

4.3.4 Handling dependencies

So far only FunCons have been introduced that are independent of other FunCons. It may however be beneficial to build FunCons in terms of other FunCons for various reasons, such as prevention of duplicate code. This is once again something that can be achieved with Java’s multiple inheritance for interfaces.

Take the algebra introduced in the previous section: LogicOpAlg. It can also be implemented by means of the FunCons bool and ifTrue, as shown in figure 4.11. In this implementation all FunCons introduced in LogicOpAlg are built on FunCons from both BoolAlg and ControlAlg. The FunCons not and lAnd are simply calls to the FunCon ifTrue.

    interface LogicOp<E> extends BoolAlg<E>, ControlAlg<E>, LogicOpAlg<E> {
        default E not(E b) {
            return ifTrue(b, bool(false), bool(true));
        }

        default E lAnd(E b1, E b2) {
            return ifTrue(b1, b2, bool(false));
        }

        default E lOr(E b1, E b2) {
            return not(lAnd(not(b1), not(b2)));
        }
    }

Figure 4.11: Object algebra for a generic carrier type implementation of LogicOpAlg

Note also that in this implementation lAnd short circuits, as the argument b2 is only ever evaluated if b1 evaluates to true, as per ifTrue’s specification. This is obviously quite an extreme example, where every FunCon can be translated in terms of other FunCons. However, being able to do so has its merits. All type information is kept generic, which means that if an object algebra of BoolAlg and ControlAlg is provided over some carrier type (not just IEval), one gets an object algebra for LogicOpAlg for free. It is important that LogicOp is still an object algebra of signature LogicOpAlg, rather than having the implementation moved into LogicOpAlg. The latter would merge implementation and definition, and doing so would move the dependencies from algebra level to signature level. This would mean that if one wants logical operators, both BoolAlg and ControlAlg are required, no matter the object algebra provided for LogicOpAlg.
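To illustrate the "for free" claim, the sketch below reuses the generic LogicOp of figure 4.11 with a printing carrier type instead of IEval. IPrint follows figure 4.21; BoolPrint, ControlPrint, LogicPrint and GenericLogicSketch are hypothetical names introduced for this example and are not part of FunCon4J.

```java
interface IPrint {
    String print();
}

interface BoolAlg<E> { E bool(boolean b); }
interface ControlAlg<E> { E ifTrue(E boolExp, E exp1, E exp2); }

interface LogicOpAlg<E> {
    E not(E b);
    E lAnd(E b1, E b2);
    E lOr(E b1, E b2);
}

// Generic algebra from figure 4.11: no mention of any concrete carrier type.
interface LogicOp<E> extends BoolAlg<E>, ControlAlg<E>, LogicOpAlg<E> {
    default E not(E b) { return ifTrue(b, bool(false), bool(true)); }
    default E lAnd(E b1, E b2) { return ifTrue(b1, b2, bool(false)); }
    default E lOr(E b1, E b2) { return not(lAnd(not(b1), not(b2))); }
}

// Printing algebras for the two signatures LogicOp depends on.
interface BoolPrint extends BoolAlg<IPrint> {
    default IPrint bool(boolean b) {
        return () -> "bool(" + b + ")";
    }
}

interface ControlPrint extends ControlAlg<IPrint> {
    default IPrint ifTrue(IPrint c, IPrint t, IPrint e) {
        return () -> "ifTrue(" + c.print() + ", " + t.print() + ", " + e.print() + ")";
    }
}

// The logical operators over IPrint require no additional code.
interface LogicPrint extends LogicOp<IPrint>, BoolPrint, ControlPrint {}

final class GenericLogicSketch {
    // Prints the FunCon term that lAnd(true, false) expands to.
    static String printAnd() {
        LogicPrint alg = new LogicPrint() {};
        return alg.lAnd(alg.bool(true), alg.bool(false)).print();
    }

    public static void main(String[] args) {
        System.out.println(printAnd());
        // ifTrue(bool(true), bool(false), bool(false))
    }
}
```

The same LogicOp interface could be combined with ControlEval and BoolEval to obtain the evaluating variant, without changing LogicOp itself.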

4.4 Semantic entities

So far only stateless FunCons have been introduced. However, for any realistic programming language to be built on FunCon4J, support is needed for several processes. These include, but are not limited to: binding, abstractions, side effects and recursion. This section discusses how these processes are supported in FunCon4J.

4.4.1 Binding

This section starts with support for binding, that is, binding values to names and later having some method of retrieving these bound values. In FunCon4J this is done through several FunCons, most importantly the FunCons bindValue and boundValue, which deal with binding values to names and retrieving said values from those names respectively. To support this, an environment is needed in which names and values can be stored. This leads to the introduction of the first semantic entity in FunCon4J: an environment. All an environment has to be is a map from names to values. However, it is quite beneficial if this map is immutable, as scoping needs to be applied quite regularly in any language. If the environment is immutable, a method of passing it around to other FunCons is required. This is achieved by having the eval() method of IEval accept an environment as argument, which makes it possible to build FunCons that operate on this environment. Implementations of the FunCons id, bindValue and boundValue can be found in figure 4.12.

    interface BindAlg<E> {
        E id(java.lang.String name);
        E bindValue(E id, E value);
        E boundValue(E id);
    }

    interface BindEval extends BindAlg<IEval> {
        default IEval id(java.lang.String name) {
            return (env, given) -> new Id(name);
        }

        default IEval bindValue(IEval id, IEval v) {
            return (env, given) ->
                env.put(id.eval(env, given), v.eval(env, given));
        }

        default IEval boundValue(IEval id) {
            return (env, given) -> env.get(id.eval(env, given));
        }
    }

Figure 4.12: Object algebras for environment bindings

The FunCon id is introduced to have some value for identifiers in the FunCon world. All it does in this implementation is create an instance of IEval that, when called upon, returns a new instance of Id, where Id is some value in the value hierarchy of FunCons tagged by the interface Value. Do note that because the eval() method accepts arguments, the lambda returned by any FunCon that implements IEval must do so as well. Java will automatically infer the types, so type information is not required in the lambda.

On to the FunCons that actually interact with the environment. bindValue simply puts a binding into the environment and returns the result of that, where the result of the put operation should be a new Environment in which the identifier is mapped to the value it is supposed to bind. All that boundValue then needs to do is retrieve the value that is bound to id.

These are just two simple FunCons that FunCon4J provides for interacting with the environment. More complex FunCons are also provided, such as FunCons for scoping, closing and pattern matching on identifiers.
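The role of immutability can be made concrete with a small sketch. The Environment below is an assumption, not FunCon4J’s actual implementation: it copies its bindings on every put(), which is simple but not efficient, and it keys bindings by the name inside an Id. Str and BindingSketch are likewise hypothetical. The algebra itself follows figure 4.12.

```java
import java.util.HashMap;
import java.util.Map;

interface IValue {}

final class Id implements IValue {
    final String name;
    Id(String name) { this.name = name; }
}

final class Str implements IValue {
    final String text;
    Str(String text) { this.text = text; }
}

// Immutable environment: put() never changes the receiver.
final class Environment implements IValue {
    private final Map<String, IValue> bindings;

    Environment() { this(new HashMap<>()); }
    private Environment(Map<String, IValue> bindings) { this.bindings = bindings; }

    // Returns a new environment that additionally maps id to value.
    Environment put(IValue id, IValue value) {
        Map<String, IValue> copy = new HashMap<>(bindings);
        copy.put(((Id) id).name, value);
        return new Environment(copy);
    }

    // Returns the bound value, or null when id is unbound in this sketch.
    IValue get(IValue id) { return bindings.get(((Id) id).name); }
}

interface IEval {
    IValue eval(Environment env, IValue given);
}

interface BindAlg<E> {
    E id(String name);
    E bindValue(E id, E value);
    E boundValue(E id);
}

interface BindEval extends BindAlg<IEval> {
    default IEval id(String name) {
        return (env, given) -> new Id(name);
    }
    default IEval bindValue(IEval id, IEval v) {
        return (env, given) -> env.put(id.eval(env, given), v.eval(env, given));
    }
    default IEval boundValue(IEval id) {
        return (env, given) -> env.get(id.eval(env, given));
    }
}

final class BindingSketch {
    // Binds x in a derived environment and checks the original is unchanged.
    static boolean outerUntouched() {
        BindAlg<IEval> alg = new BindEval() {};
        Environment outer = new Environment();
        IEval hello = (env, given) -> new Str("hello");
        Environment inner =
                (Environment) alg.bindValue(alg.id("x"), hello).eval(outer, null);
        IValue inInner = alg.boundValue(alg.id("x")).eval(inner, null);
        IValue inOuter = alg.boundValue(alg.id("x")).eval(outer, null);
        return inInner instanceof Str
                && ((Str) inInner).text.equals("hello")
                && inOuter == null;
    }

    public static void main(String[] args) {
        System.out.println(outerUntouched()); // true
    }
}
```

Since a binding never leaks into the environment it was derived from, leaving a scope amounts to continuing with the older environment.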

4.4.2 Supply & Given

Two rather special FunCons in FunCon4J are Supply and Given. The dynamic of the two FunCons is as follows: Given returns the value bound by the closest enclosing call to Supply. These FunCons effectively act as a reserved space in the environment for passing a single argument to FunCons. They find their use in method calls and pattern matches. Rather than mingling the implementation of the environment with a reserved keyword that could then be used by Supply and Given, the choice was made to keep the given value as a separate semantic entity in FunCon4J. As such it should also be passed as an argument to any operation that needs access to a supplied value, like IEval. Supply and Given can now be implemented as in figure 4.13. With this implementation the value given can be accessed through the Given FunCon. A value can be supplied by making use of the FunCon Supply. For instance, constructions like the one in figure 4.14 are now possible. This construction passes a boolean value through the FunCon Supply to an ifTrue FunCon. Within this ifTrue FunCon is the FunCon Given, which will evaluate to whatever value was supplied, in this case the boolean value true. Ultimately the whole construct is evaluated under a new environment and a null value as given value. The result should be the boolean false. Such a construction can be used, for instance, to resemble functions in languages.

    interface SupplyGivenAlg<E> {
        E supply(E value, E exp);
        E given();
    }

    interface SupplyGivenEval extends SupplyGivenAlg<IEval> {
        default IEval supply(IEval v, IEval exp) {
            return (env, given) -> exp.eval(env, v.eval(env, given));
        }

        default IEval given() {
            return (env, given) -> given;
        }
    }

Figure 4.13: Object algebras for the Supply and Given FunCons

    supply(
        bool(true),
        ifTrue(given(), bool(false), bool(true))
    ).eval(new Environment(), new Null());

Figure 4.14: example usage of the Supply FunCon
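That Given refers to the closest enclosing Supply can be checked by nesting two supplies. The following sketch combines the algebras of figures 4.3, 4.5 and 4.13; IValue, IMap, EmptyMap, Null and Bool are stand-ins, and DemoEval and SupplySketch are hypothetical names.

```java
interface IValue {}
interface IMap {}
final class EmptyMap implements IMap {}
final class Null implements IValue {}

final class Bool implements IValue {
    private final boolean value;
    Bool(boolean value) { this.value = value; }
    boolean value() { return value; }
}

interface IEval {
    IValue eval(IMap env, IValue given);
}

interface ControlAlg<E> { E ifTrue(E boolExp, E exp1, E exp2); }
interface BoolAlg<E> { E bool(boolean b); }
interface SupplyGivenAlg<E> { E supply(E value, E exp); E given(); }

interface ControlEval extends ControlAlg<IEval> {
    default IEval ifTrue(IEval be, IEval e1, IEval e2) {
        return (env, given) -> ((Bool) be.eval(env, given)).value()
                ? e1.eval(env, given)
                : e2.eval(env, given);
    }
}

interface BoolEval extends BoolAlg<IEval> {
    default IEval bool(boolean b) { return (env, given) -> new Bool(b); }
}

interface SupplyGivenEval extends SupplyGivenAlg<IEval> {
    default IEval supply(IEval v, IEval exp) {
        return (env, given) -> exp.eval(env, v.eval(env, given));
    }
    default IEval given() {
        return (env, given) -> given;
    }
}

interface DemoEval extends ControlEval, BoolEval, SupplyGivenEval {}

final class SupplySketch {
    static boolean nested() {
        DemoEval alg = new DemoEval() {};
        // Inside the inner supply, given() sees false.
        IEval inner = alg.supply(alg.bool(false), alg.given());
        // The condition is false, so the else branch runs; its given()
        // is outside the inner supply and sees the outer value true.
        IEval outer = alg.supply(
                alg.bool(true),
                alg.ifTrue(inner, alg.bool(false), alg.given()));
        return ((Bool) outer.eval(new EmptyMap(), new Null())).value();
    }

    public static void main(String[] args) {
        System.out.println(nested()); // true
    }
}
```

The inner supply does not overwrite the outer one; it only changes the given value for the expression it encloses.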

4.4.3 Abstractions

In order to truly support functions, a value is needed to wrap function bodies. This can be used to delay evaluation and to separate the definition of a function from the evaluation of its body. That is, first building the FunCons that ultimately contain the code the function should execute (the function definition), and later evaluating its body (when the function is called). This is important as a function may be defined under some environment, and later called under a completely different environment. To support higher-order programming, such a value is also needed for functions to be stored in the environment.

The FunCon value used for this purpose in FunCon4J is Abs, which is short for abstraction. For the operation IEval it can be implemented as in figure 4.15. This value has several FunCons that operate on it, most notably the FunCons Abs and Apply. The FunCon Abs simply creates an abstraction, and Apply applies an argument to an abstraction. That is, apply first evaluates the abstraction, and then calls its body with the argument that is to be applied. Both can be implemented for operation IEval as in figure 4.16.

    public class Abs<E> implements Value {
        private E body;

        public Abs(E body) {
            this.body = body;
        }

        public E body() {
            return body;
        }
    }

Figure 4.15: A value for abstractions

    interface ApplyAlg<E> {
        E abs(E body);
        E apply(E abs, E arg);
    }

    // supply() is inherited from SupplyGivenEval (figure 4.13)
    interface ApplyEval extends ApplyAlg<IEval>, SupplyGivenEval {
        default IEval abs(IEval body) {
            return (env, given) -> new Abs<>(body);
        }

        default IEval apply(IEval abs, IEval arg) {
            return (env, given) ->
                supply(
                    arg,
                    ((Abs<IEval>) abs.eval(env, given)).body()
                ).eval(env, given);
        }
    }

Figure 4.16

    apply(abs(given()), bool(true))
        .eval(new Environment(), new Null());

Figure 4.17: Applying the boolean value true to an abstraction that simply returns what it is given.

Note that this implementation of the FunCon Apply evaluates the abstraction (the call to the eval() method of abs) when the FunCon Apply is evaluated. If earlier evaluation is wanted, that is, when the Apply FunCon is called, a different implementation can be supplied with the evaluation of abs outside of the lambda. Alternatively, the argument abs passed to the FunCon Apply can simply be one that directly returns an Abs, which can be achieved by means of caching as shown in section 4.3.1.

Through the Abs and Apply FunCons, functions can be defined as in figure 4.17. Here the construct abs(given()) is effectively a function that directly returns its input, i.e. an identity function, to which in this case the boolean value true is applied. This construct should thus result in the boolean value true. Note that this implementation is rather limited, as for instance there is no support for multiple parameters or curried functions. Support for this is introduced in FunCon4J with the Curry and Uncurry FunCons.
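The separation between defining a function and evaluating its body can be observed by giving the body a side effect. The sketch below is an illustration under assumptions: it lets ApplyEval extend SupplyGivenEval so that supply() is available, uses stand-ins for IValue, IMap and Null, and counts body evaluations in a hypothetical AbsSketch class.

```java
interface IValue {}
interface IMap {}
final class Null implements IValue {}

interface IEval {
    IValue eval(IMap env, IValue given);
}

// Value wrapping an unevaluated body, as in figure 4.15.
final class Abs<E> implements IValue {
    private final E body;
    Abs(E body) { this.body = body; }
    E body() { return body; }
}

interface SupplyGivenAlg<E> { E supply(E value, E exp); E given(); }
interface ApplyAlg<E> { E abs(E body); E apply(E abs, E arg); }

interface SupplyGivenEval extends SupplyGivenAlg<IEval> {
    default IEval supply(IEval v, IEval exp) {
        return (env, given) -> exp.eval(env, v.eval(env, given));
    }
    default IEval given() {
        return (env, given) -> given;
    }
}

interface ApplyEval extends ApplyAlg<IEval>, SupplyGivenEval {
    // Creating an abstraction stores the body; it does not evaluate it.
    default IEval abs(IEval body) {
        return (env, given) -> new Abs<>(body);
    }
    // Applying evaluates the abstraction, then its body with arg supplied.
    default IEval apply(IEval abs, IEval arg) {
        return (env, given) -> {
            @SuppressWarnings("unchecked")
            Abs<IEval> f = (Abs<IEval>) abs.eval(env, given);
            return supply(arg, f.body()).eval(env, given);
        };
    }
}

final class AbsSketch {
    // Returns {body runs after definition, body runs after two applications}.
    static int[] bodyRuns() {
        ApplyEval alg = new ApplyEval() {};
        int[] runs = {0};
        IEval body = (env, given) -> {
            runs[0]++;
            return given;
        };
        IEval fun = alg.abs(body);
        fun.eval(null, null); // definition only
        int afterDefinition = runs[0];
        IEval arg = (env, given) -> new Null();
        alg.apply(fun, arg).eval(null, null);
        alg.apply(fun, arg).eval(null, null);
        return new int[] { afterDefinition, runs[0] };
    }

    public static void main(String[] args) {
        int[] r = bodyRuns();
        System.out.println(r[0] + " " + r[1]); // 0 2
    }
}
```

Evaluating the abstraction itself leaves the counter untouched; only each application runs the body.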

4.4.4 Side effects

To support side effects in the form of mutable variables in a language, some mutable entity is needed. In FunCon4J this is implemented as the value Var, which represents a single variable. A Var wraps another value, but Var itself is mutable, so the wrapped value can change to another value. The choice for this implementation over a mutable or immutable store was made on the basis of performance. Look-ups in a mutable store proved too costly. Without a store these look-ups can be avoided entirely. The implementation of Var can be as in figure 4.18. There is nothing special here, just a class tagged by Value that can store another value. The introduction of mutability in the form of a set method, however, allows side-effect FunCons to be implemented. One such FunCon is the Assign FunCon, which assigns a value to a variable. Another notable FunCon is the Alloc FunCon, which allocates (creates) a Var and returns it as a result. These can be implemented as in figure 4.19. Here Alloc simply creates a variable that stores the result of evaluating val, whereas Assign first evaluates var and then calls its set() method with the result of the evaluation of val. Note that because Var is tagged through the Value interface, it can also be bound to some identifier by means of the bindValue FunCon introduced earlier.

A FunCon is required to actually interact with the value stored in a variable. In FunCon4J two variants exist: the FunCons AssignedValue and AssignedValueIfVar. AssignedValue takes some variable as argument and then unwraps it by calling its value() method. AssignedValueIfVar is a little more complex and only unwraps its argument if it is in fact a variable. A FunCon like this is effectively Java’s instanceof in the FunCon domain. It enables the language engineer to write code that is oblivious of whether side effects are supported or not, by simply making use of AssignedValueIfVar in either case.


    public class Var implements Value {
        private Value value;

        public Var(Value val) {
            set(val);
        }

        public Value value() {
            return value;
        }

        public void set(Value val) {
            value = val;
        }
    }

Figure 4.18: A value Var for side effects

    interface AssignAlg<E> {
        E alloc(E val);
        E assign(E var, E val);
    }

    interface AssignEval extends AssignAlg<IEval> {
        default IEval alloc(IEval val) {
            return (env, given) -> new Var(val.eval(env, given));
        }

        default IEval assign(IEval var, IEval val) {
            return (env, given) -> {
                ((Var) var.eval(env, given)).set(val.eval(env, given));
                // set() returns nothing, so some value has to be returned
                // explicitly; returning Null here is an assumption
                return new Null();
            };
        }
    }

Figure 4.19
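The two unwrapping FunCons described above are not shown in the figures. The following is one possible sketch of them, offered as a guess at their shape rather than as FunCon4J’s actual code: only the FunCon names come from the text, while the signature AssignedValueAlg, the Int value, the composed VarEval and the helper VarSketch are hypothetical. Returning Null from assign is likewise an assumption.

```java
interface IValue {}
interface IMap {}
final class Null implements IValue {}

final class Int implements IValue {
    private final int value;
    Int(int value) { this.value = value; }
    int value() { return value; }
}

// Mutable cell, as in figure 4.18.
final class Var implements IValue {
    private IValue value;
    Var(IValue val) { set(val); }
    IValue value() { return value; }
    void set(IValue val) { value = val; }
}

interface IEval {
    IValue eval(IMap env, IValue given);
}

interface AssignAlg<E> { E alloc(E val); E assign(E variable, E val); }

// Hypothetical signature for the two unwrapping FunCons.
interface AssignedValueAlg<E> { E assignedValue(E variable); E assignedValueIfVar(E val); }

interface AssignEval extends AssignAlg<IEval> {
    default IEval alloc(IEval val) {
        return (env, given) -> new Var(val.eval(env, given));
    }
    default IEval assign(IEval variable, IEval val) {
        return (env, given) -> {
            ((Var) variable.eval(env, given)).set(val.eval(env, given));
            return new Null(); // assumption: assignment evaluates to Null
        };
    }
}

interface AssignedValueEval extends AssignedValueAlg<IEval> {
    // Unwraps unconditionally; fails with a ClassCastException on non-variables.
    default IEval assignedValue(IEval variable) {
        return (env, given) -> ((Var) variable.eval(env, given)).value();
    }
    // Unwraps only when the evaluated argument is a Var.
    default IEval assignedValueIfVar(IEval val) {
        return (env, given) -> {
            IValue v = val.eval(env, given);
            return v instanceof Var ? ((Var) v).value() : v;
        };
    }
}

interface VarEval extends AssignEval, AssignedValueEval {}

final class VarSketch {
    static IEval lit(int n) { return (env, given) -> new Int(n); }

    // Returns {value read through a Var, value read from a plain Int}.
    static int[] demo() {
        VarEval alg = new VarEval() {};
        IValue cell = alg.alloc(lit(1)).eval(null, null);
        IEval variable = (env, given) -> cell; // cached, so one Var is shared
        alg.assign(variable, lit(2)).eval(null, null);
        int fromVar = ((Int) alg.assignedValueIfVar(variable).eval(null, null)).value();
        int fromPlain = ((Int) alg.assignedValueIfVar(lit(3)).eval(null, null)).value();
        return new int[] { fromVar, fromPlain };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]); // 2 3
    }
}
```

A translation that always reads values through assignedValueIfVar works unchanged whether the source language hands it a variable or a plain value.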


4.4.5 Recursion

Support for recursion can be quite a challenge in language engineering, as references to a recursive function are scoped in an environment where the function itself may not be defined yet. In FunCon4J this is dealt with in a similar fashion as variables, by means of a mutable entity. In this case that entity is called Fwd. Its implementation is similar to that of variables, and as such it is not shown here. As with variables, the choice was made not to have some form of a store, as the look-ups proved to be too costly in terms of performance.

Recursion in FunCon4J is handled through several FunCons; worth noting are the FunCons FreshFwds and SetForwards, which are responsible for creating several instances of Fwd and later setting them to lead to some value. On top of these two FunCons, the Recursive FunCon is built, which takes a list of identifiers that require recursive binding and a declaration in which this is supposed to happen. A declaration is some computation that, when evaluated, results in an environment. Firstly, all identifiers in the list are bound to a Fwd that points to an Undefined value. The resulting environment is scoped over the declaration, and the declaration is then evaluated. The environment that is created by evaluating this declaration is then used to set all instances of Fwd created earlier. Because Fwd is mutable, this effectively fixes the missing bindings (those instances of Fwd originally pointing to Undefined) in the declaration. This also demonstrates why it is crucial to support separating the evaluation of an abstraction (function definition) from the later evaluation of its body (function call).

Do note that because instances of Fwd are bound in the environment that then point to some value, a construction similar to that of variables is necessary to actually interact with the values. In the case of Fwd, the FunCons FollowFwd and FollowdIfFwd are available. Through use of these FunCons, the language engineer can interact with recursively defined values.
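The steps of the Recursive FunCon can be traced in plain Java. The sketch below is a simplification built on assumptions, since the source does not show Fwd: the Fwd class is modelled on Var, a mutable HashMap stands in for the environment, and Fun wraps a Java function as a stand-in for an abstraction. It illustrates the mechanism only and does not use the FunCons themselves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

interface IValue {}

final class Undefined implements IValue {}

// Mutable forward reference, assumed to resemble Var.
final class Fwd implements IValue {
    private IValue target = new Undefined();
    IValue follow() { return target; }
    void set(IValue value) { target = value; }
}

// Hypothetical stand-in for an abstraction whose body runs only when called.
final class Fun implements IValue {
    final IntUnaryOperator body;
    Fun(IntUnaryOperator body) { this.body = body; }
}

final class RecursionSketch {
    static int factorial(int n) {
        Map<String, IValue> env = new HashMap<>();

        // 1. Bind the identifier to a fresh Fwd that points to Undefined.
        Fwd fwd = new Fwd();
        env.put("fact", fwd);

        // 2. Evaluate the declaration under that environment. The body
        //    refers to "fact" through the environment, but only looks it
        //    up when it is called, not when it is defined.
        Fun fact = new Fun(k -> k <= 1
                ? 1
                : k * ((Fun) ((Fwd) env.get("fact")).follow()).body.applyAsInt(k - 1));

        // 3. Set the Fwd so that it leads to the declared value.
        fwd.set(fact);

        // 4. Call the function by following the forward reference.
        return ((Fun) ((Fwd) env.get("fact")).follow()).body.applyAsInt(n);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```

If the body were evaluated at definition time, step 2 would follow the Fwd while it still points to Undefined, which is why delayed evaluation of the body is essential.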

4.4.6 Adding new semantic entities

Algebras in an object algebra can be extended as long as the operation is the same. However, the introduction of new semantic entities may require a change in the signature of the operation. For instance, one may want to introduce an immutable store; this would however require the signature of IEval to be extended with such a store. FunCons are supposed to be defined once, and the collection of FunCons is open-ended, meaning anyone should be able to add new FunCons that may require introducing new semantic entities. Hence this poses a problem. Luckily a solution to this problem is found in a paper by Inostroza et al.[15] that introduces implicit context propagation in object algebras. The method proposed is to introduce a new operation when the signature of the old operation needs extension. All existing implementations then have to be re-implemented for the new operation. Instead of completely re-implementing the existing implementations to comply with the changed signature, one applies the pattern shown in figure 4.20.

The general thought behind the pattern is that one reuses the old implementation within the new implementation. By providing an instance of Alg over operation OldOp through the abstract method base(), one provides access to the already existing implementation. What is left to do is to wrap the existing implementation with the new signature in someFunCon(). The result is an implementation over NewOp without duplicate code. Thus, by applying this pattern, even the extension of semantic entities that requires a change in the operation signature is possible within FunCon4J.
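The pattern can be made concrete with a deliberately small example. Everything in the sketch below is hypothetical and chosen for brevity: values are plain ints, the old entity is an int named env, the new entity is a StringBuilder used as a log, and the algebra has two FunCons, lit and add. Arguments over the new operation are wrapped into the old operation by closing over the new entity, which is one way to realise the wrapping step described above.

```java
// Old operation: knows only the old entity.
interface OldOp {
    int eval(int env);
}

// New operation: additionally receives a new entity (a log).
interface NewOp {
    int eval(StringBuilder log, int env);
}

interface Alg<E> {
    E lit(int n);
    E add(E a, E b);
}

// Existing implementation, written before the log existed.
interface AlgOldOp extends Alg<OldOp> {
    default OldOp lit(int n) { return env -> n; }
    default OldOp add(OldOp a, OldOp b) { return env -> a.eval(env) + b.eval(env); }
}

// Implementation over NewOp that reuses the old one through base().
interface AlgNewOp extends Alg<NewOp> {
    Alg<OldOp> base();

    default NewOp lit(int n) {
        return (log, env) -> base().lit(n).eval(env);
    }

    default NewOp add(NewOp a, NewOp b) {
        // Each NewOp argument becomes an OldOp that remembers the log.
        return (log, env) ->
                base().add(e -> a.eval(log, e), e -> b.eval(log, e)).eval(env);
    }
}

// A new FunCon that actually needs the new entity.
interface LogAlg<E> {
    E logged(E exp);
}

interface LogNewOp extends LogAlg<NewOp> {
    default NewOp logged(NewOp exp) {
        return (log, env) -> {
            int v = exp.eval(log, env);
            log.append(v).append(';');
            return v;
        };
    }
}

interface AllNewOp extends AlgNewOp, LogNewOp {}

final class PropagationSketch {
    static String run() {
        AllNewOp alg = new AllNewOp() {
            public Alg<OldOp> base() { return new AlgOldOp() {}; }
        };
        StringBuilder log = new StringBuilder();
        int sum = alg.add(alg.logged(alg.lit(1)), alg.logged(alg.lit(2))).eval(log, 0);
        return sum + " " + log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 3 1;2;
    }
}
```

The addition logic exists only in AlgOldOp, yet the log reaches the logged FunCons nested inside add because the wrapped arguments carry it along.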


    interface AlgNewOp extends Alg<NewOp> {
        Alg<OldOp> base();

        default NewOp someFunCon(NewOp arg) {
            return (newEntity, oldEntity, ...) ->
                base().someFunCon().operation(oldEntity);
        }

        ...
    }

Figure 4.20: Pattern for implicit propagation in object algebras

    interface IPrint {
        java.lang.String print();
    }

Figure 4.21: The IPrint operation

4.5 More operations

Up to this point only the IEval operation has been discussed. However, object algebras, and thus FunCon4J, support adding new types of operations without requiring changes in previously implemented operations. As such it is easy to implement new operations, such as IPrint for pretty printing and ITypeCheck for type checking. These operations are not implemented in FunCon4J, as this was out of scope of the project. Nevertheless, this section provides some examples to hopefully inspire future work on these operations.

4.5.1 IPrint

An implementation of IPrint for FunCons means building a pretty printer that prints FunCon terms instead of what one probably wants, that is, source language constructs. This is somewhat of a limitation of the approach, but a logical one, as the building blocks (the FunCons) can only know about themselves and not about the house (the source language) they ultimately form. However, an implementation of IPrint is not meaningless. Even in its simplest form it can be used to debug a language, as one sees exactly those building blocks that make up each language construct. It also provides a translation from source language to FunCons.

To demonstrate, take the definition of IPrint in figure 4.21. This is a simple definition of IPrint as a functional interface with a method print() that returns a Java native String. Now take the signature and its algebras for both IPrint and IEval in figure 4.22. IntAlg is a signature with support for integer literals and integer addition. Here two algebras are provided for the signature. The IPrint algebra is concerned with printing, whereas the IEval algebra is concerned with evaluation.

    interface IntAlg<E> {
        E lit(int a);
        E intAdd(E a, E b);
    }

    interface IntPrint extends IntAlg<IPrint> {
        default IPrint lit(int a) {
            return () -> a + "";
        }

        default IPrint intAdd(IPrint a, IPrint b) {
            return () -> a.print() + " + " + b.print();
        }
    }

    interface IntEval extends IntAlg<IEval> {
        default IEval lit(int a) {
            return (env, given) -> new Int(a);
        }

        default IEval intAdd(IEval a, IEval b) {
            return (env, given) ->
                ((Number) a.eval(env, given))
                    .add(b.eval(env, given))
                    .toInteger();
        }
    }

Figure 4.22: The IntPrint and IntEval algebras

As they both implement the same signature, one can now write code such as in figure 4.23. This is where Object Algebras truly shine, as the translation to FunCons is only given once in the static method addTwoToFive(), and both algebras of these FunCons can be used simply by passing a different algebra to the method. Note that the implementations have no knowledge of each other’s existence, and as such more implementations can be added without requiring any change in the others.

    public static <E> E addTwoToFive(IntAlg<E> alg) {
        return alg.intAdd(alg.lit(2), alg.lit(5));
    }

    public static void main(String[] args) {
        addTwoToFive(new IntEval() {})
            .eval(new Environment(), new Null());
        // -> should result in the value 7

        addTwoToFive(new IntPrint() {}).print();
        // -> should result in the String: "2 + 5"
    }

Figure 4.23

    interface ITypeCheck {
        Type check();
    }

Figure 4.24: The ITypeCheck operation

This is just a simple implementation of IPrint for IntAlg. One might want more advanced features such as indentation or line numbers. My recommendation for supporting this is to pass more information to the print() method of IPrint, similar to how IEval needs to be called with an environment and a given value. However, the complete implementation of this is left to future work.
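As an illustration of this recommendation, the following is a minimal sketch of a printing operation that receives an indentation level. The names IIndentPrint, IntIndentPrint, Indent and IndentPrintDemo are hypothetical and not part of FunCon4J; the sketch only assumes the IntAlg signature from figure 4.22, which is repeated here to keep the example self-contained.

```java
// The signature from figure 4.22, repeated to keep the sketch self-contained.
interface IntAlg<E> {
    E lit(int a);
    E intAdd(E a, E b);
}

// Hypothetical operation: like IPrint, but print() receives the current
// indentation level, much like eval() receives an environment.
interface IIndentPrint {
    String print(int indent);
}

// Hypothetical helper that builds the leading whitespace (Java 8 compatible).
final class Indent {
    static String pad(int indent) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indent; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }
}

// Hypothetical algebra: prints one FunCon per line, children indented one level deeper.
interface IntIndentPrint extends IntAlg<IIndentPrint> {
    default IIndentPrint lit(int a) {
        return indent -> Indent.pad(indent) + "lit " + a + "\n";
    }

    default IIndentPrint intAdd(IIndentPrint a, IIndentPrint b) {
        return indent -> Indent.pad(indent) + "intAdd\n"
                + a.print(indent + 1)
                + b.print(indent + 1);
    }
}

final class IndentPrintDemo {
    static <E> E addTwoToFive(IntAlg<E> alg) {
        return alg.intAdd(alg.lit(2), alg.lit(5));
    }

    public static void main(String[] args) {
        // Prints the term as a tree: "intAdd", then "lit 2" and "lit 5" indented.
        System.out.print(addTwoToFive(new IntIndentPrint() {}).print(0));
    }
}
```

Since the indentation level is passed down on every call, each FunCon decides locally how its children are laid out, and the existing IntPrint and IntEval algebras remain untouched.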

4.5.2 ITypeCheck

One of the promises of FunCons is that, besides defining dynamic semantics as IEval does, they also define static semantics. For instance, static type checking can be achieved by implementing a different operation: ITypeCheck. One such implementation could be as in figure 4.24: a functional interface with a single method check() that returns a Type. This Type specifies the type of the term being checked. If there is a conflict, an exception can be thrown or a message can be reported. For instance, an implementation for IntAlg as defined earlier could be as in figure 4.25. This is a very simplistic, but functional, type checker for IntAlg. In a realistic setting, however, where assignment, binding and recursion exist, it cannot be this simple. Several entities, such as an environment for types, are needed to fully support a type checker. Besides sharing some requirements with the IEval operation, the ITypeCheck operation also introduces new ones. For instance, consider dealing with polymorphism or generics. Also consider statically type checked languages in which users can define their own types. New FunCons will be required to handle this, namely FunCons that are only truly interesting when type checking. Most of these FunCons are present in FunCon4J, but they are only implemented for the IEval operation and are thus rather trivial. The complete implementation of the ITypeCheck operation is left for future work.
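To give an impression of what an environment for types could look like, the following is a minimal sketch of a type checking operation that receives such an environment. Everything in it is an assumption made for illustration: the signature LetAlg with its ref and let constructs, the operation IEnvTypeCheck, the enum Ty and the algebra LetCheck are hypothetical and do not correspond to FunCons or entities in FunCon4J.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, minimal set of types.
enum Ty { INT, BOOL }

class TypeCheckException extends RuntimeException {
    TypeCheckException(String message) {
        super(message);
    }
}

// Hypothetical operation: check() receives a type environment,
// much like eval() receives an environment of values.
interface IEnvTypeCheck {
    Ty check(Map<String, Ty> types);
}

// Hypothetical signature with literals, addition, references and a binding construct.
interface LetAlg<E> {
    E lit(int a);
    E bool(boolean b);
    E intAdd(E a, E b);
    E ref(String name);
    E let(String name, E bound, E body);
}

interface LetCheck extends LetAlg<IEnvTypeCheck> {
    default IEnvTypeCheck lit(int a) {
        return types -> Ty.INT;
    }

    default IEnvTypeCheck bool(boolean b) {
        return types -> Ty.BOOL;
    }

    default IEnvTypeCheck intAdd(IEnvTypeCheck a, IEnvTypeCheck b) {
        return types -> {
            if (a.check(types) == Ty.INT && b.check(types) == Ty.INT) {
                return Ty.INT;
            }
            throw new TypeCheckException("intAdd expects two ints");
        };
    }

    default IEnvTypeCheck ref(String name) {
        return types -> {
            Ty found = types.get(name);
            if (found == null) {
                throw new TypeCheckException("unbound name: " + name);
            }
            return found;
        };
    }

    default IEnvTypeCheck let(String name, IEnvTypeCheck bound, IEnvTypeCheck body) {
        return types -> {
            // Check the body in a copy of the environment extended with the new binding.
            Map<String, Ty> extended = new HashMap<>(types);
            extended.put(name, bound.check(types));
            return body.check(extended);
        };
    }
}

final class TypeCheckDemo {
    // let x = 2 in x + 1
    static <E> E letXInXPlusOne(LetAlg<E> alg) {
        return alg.let("x", alg.lit(2), alg.intAdd(alg.ref("x"), alg.lit(1)));
    }

    // true + 1, which should be rejected
    static <E> E addBoolToInt(LetAlg<E> alg) {
        return alg.intAdd(alg.bool(true), alg.lit(1));
    }

    public static void main(String[] args) {
        System.out.println(letXInXPlusOne(new LetCheck() {}).check(new HashMap<>()));
    }
}
```

This sketch deliberately ignores polymorphism, recursion and user-defined types, which are exactly the parts that make a realistic type checker harder.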

4.5.3 IDebug

Another operation that one could implement is IDebug. Implementing such an operation would provide any language built on FunCon4J with a debugger for free. As with IPrint, there are some challenges to such an implementation. In FunCon4J, source code is ultimately translated down to FunCons, and this translation makes it quite difficult to keep track of which source language construct is being executed. Moreover, the user (the source code programmer) may be completely oblivious to the existence of FunCons. To build any form of meaningful debugger, source code information therefore needs to be carried over to FunCons. Such information could include line numbers and references to abstract or concrete syntax. This would require the language engineer to supply such information to FunCons. This could be done by building an operation that requires such information, similar to how eval() in IEval requires an environment and a given value. Implementation of this operation is, however, left to future work.


    interface IntCheck extends IntAlg<ITypeCheck> {
        // An integer literal always has type int.
        default ITypeCheck lit(int a) {
            return () -> new Type("int");
        }

        // Addition is well typed only if both operands have type int.
        default ITypeCheck intAdd(ITypeCheck a, ITypeCheck b) {
            return () -> {
                if (a.check().equals(new Type("int"))
                        && b.check().equals(new Type("int"))) {
                    return new Type("int");
                }
                throw new TypeCheckException();
            };
        }
    }

Figure 4.25: The IntCheck algebra

Alternatively, one could make use of Java's proxies to build a debugger. Doing so requires wrapping a concrete implementation of a signature in a proxy. This gives the language engineer the tools to determine what should happen before and after every call to a FunCon, without requiring changes to any other part of the program. One can repeat this process for the abstract syntax (more on this in section 5), which grants the language engineer access to both the abstract syntax and, later, the FunCons it calls in the proxy. This gives the engineer considerable means to build a meaningful debugger. On top of this, the approach does not require passing any source code information in the abstract syntax.
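The proxy idea can be sketched with java.lang.reflect.Proxy from the Java standard library. The sketch below is an assumption about how such a debugger could start out, not code from FunCon4J: the class DebugProxy and its methods are hypothetical, and the IntAlg signature and IntPrint algebra are repeated from figure 4.22 to keep the example self-contained. The algebra is wrapped in a proxy, and every operation object it produces is wrapped in a second proxy that records when the corresponding FunCon is entered and left.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

interface IntAlg<E> {
    E lit(int a);
    E intAdd(E a, E b);
}

interface IPrint {
    String print();
}

interface IntPrint extends IntAlg<IPrint> {
    default IPrint lit(int a) {
        return () -> a + "";
    }

    default IPrint intAdd(IPrint a, IPrint b) {
        return () -> a.print() + " + " + b.print();
    }
}

// Hypothetical helper, not part of FunCon4J.
final class DebugProxy {
    // Wraps an operation object so that each call on it is logged
    // under the name of the FunCon that produced it.
    @SuppressWarnings("unchecked")
    private static <E> E wrapOperation(Class<E> operation, E target,
                                       String funcon, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add("enter " + funcon);
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the original exception
            } finally {
                log.add("exit " + funcon);
            }
        };
        return (E) Proxy.newProxyInstance(
                operation.getClassLoader(), new Class<?>[] { operation }, handler);
    }

    // Wraps an algebra so that everything it builds is wrapped by wrapOperation.
    @SuppressWarnings("unchecked")
    static <E> IntAlg<E> debug(IntAlg<E> algebra, Class<E> operation, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                return method.invoke(algebra, args); // toString, hashCode, equals
            }
            Object built = method.invoke(algebra, args);
            return wrapOperation(operation, operation.cast(built), method.getName(), log);
        };
        return (IntAlg<E>) Proxy.newProxyInstance(
                IntAlg.class.getClassLoader(), new Class<?>[] { IntAlg.class }, handler);
    }
}

final class DebugDemo {
    static <E> E addTwoToFive(IntAlg<E> alg) {
        return alg.intAdd(alg.lit(2), alg.lit(5));
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        IntAlg<IPrint> traced = DebugProxy.debug(new IntPrint() {}, IPrint.class, log);
        System.out.println(addTwoToFive(traced).print());
        log.forEach(System.out::println);
    }
}
```

Neither IntPrint nor addTwoToFive() had to change for this. A real debugger would replace the log with breakpoints or stepping, and would repeat the wrapping for the abstract syntax as described above.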

Contents

Chapter 1 Introduction
Chapter 2 Reusable Language Implementations
2.1 Motivation
2.2 Challenges faced: ambiguity and lack of modularity
2.3 A candidate solution
2.4 Overview
Chapter 3 Background
3.1 Funcons
3.2 Object Algebras
3.3 Overview
Chapter 4 FunCon4J's Design
4.1 A library of semantic building blocks
4.2 Focus points
4.3 FunCon design
4.4 Semantic entities
4.5 More operations