

Translating Rascal Virtual Machine programs to the Java Virtual Machine

Ferry Rietveld

Ferry.Rietveld@gmail.com

Summer 2014, 62 pages

Supervisors: Prof. Dr. Paul Klint, Anastasia Izmaylova

Host organisation: Centrum Wiskunde & Informatica, http://www.cwi.nl

Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering


Contents

Abstract

1 Introduction and Motivation
  1.1 Context
  1.2 Problem definition
  1.3 Scope
    1.3.1 Expected results
  1.4 Research questions
  1.5 Outline

2 Translating the RVM to JVM
  2.1 Introduction
  2.2 Introduction to the RVM
    2.2.1 The RVM structure
    2.2.2 The RVMonJVM structure
  2.3 Design decisions
  2.4 The Structure at Runtime
  2.5 Function and coroutine structure
    2.5.1 Naming and signature
    2.5.2 Internal structure
  2.6 Function and coroutine translation
  2.7 Translation of RVM Instructions
    2.7.1 Primitive Manipulation
    2.7.2 Basic Control Flow
    2.7.3 Function Call and Return Instructions

3 Validation of the RVM to JVM translation
  3.1 Introduction
  3.2 Correctness of the translation
  3.3 Profiling
  3.4 Micro Benchmarking
    3.4.1 Measurement setup
    3.4.2 Selected Benchmarks
    3.4.3 Benchmark results

4 Analysis and Discussion
  4.1 Analysis
    4.1.1 On translation
    4.1.2 On measuring
  4.2 Discussion

5 Conclusions and Future Work
  5.1 Conclusions

Bibliography

Appendices

A Rascal virtual machine specification
  A.1 Primitive types
  A.2 The stores
    A.2.1 Function
    A.2.2 OverloadedFunction
    A.2.3 Constructor
    A.2.4 Type
  A.3 Instruction specification
    A.3.1 Instruction format
    A.3.2 Instruction set

B Translation of representative instructions

C Translation of the function main in Fib.rsc

D Benchmark programs


Abstract

In 1995 Niklaus Wirth wrote "software is getting slower more rapidly than hardware becomes faster"[11]. This also applies to new programming languages, and the Rascal language is no exception to this rule. Rascal[6] is a meta-programming language for software analysis and software transformation.

The Rascal language has enormous expressive power: it features, among others, pattern matching, immutable data, context-free grammars, relations and algebraic data types. But this expressive power comes at a cost, namely execution speed.

The current Rascal implementation uses an AST-based interpreter to execute its programs. A project is now ongoing to implement a complete compiler pipeline with the capability to run Rascal programs on the Java Virtual Machine (JVM) without an interpreter.

In this new approach a Rascal program is translated into so-called Rascal Virtual Machine (RVM) bytecode. The topic of this thesis is the translation of these generated RVM bytecode programs to JVM bytecode.

This project aims to answer the following research questions:

1. What is an effective way to translate Rascal Virtual Machine programs to the Java Virtual Machine?

2. How to measure the performance of the translated RVM code and how to use the measurements to improve performance?

Translated RVM programs showed a speedup ranging from 1.04 to 7.21 times over the RVM bytecode interpreter. These measurements were done with small applications and micro-benchmarks. In real-life applications a speedup factor of 1.2 is to be expected.

Measuring the turnaround times of JVM applications produces fluctuating results. Tests of at least 30 runs, each with a large execution time, did produce reproducible results.


Chapter 1

Introduction and Motivation

1.1

Context

Rascal is a meta-programming language embedded in the Eclipse workbench. Although Rascal is used in practice and in education, it is still considered to be in its alpha state. The architecture of the current system, an AST-based interpreter written in Java, is depicted in Figure 1.1.

Figure 1.1: Existing Interpreter Architecture.

To reduce the complexity of the system, work is under way to build a totally new system: no longer a single parser and interpreter, but a complete compiler pipeline in which the different parts of the translation are separated into multiple compilation steps. Two intermediate languages are introduced in this pipeline. The first is a high-level language called muRascal, which introduces guarded stackful coroutines. This language is used to implement Rascal language features, like pattern matching and iterators; these features rely heavily on the use of guarded coroutines[5]. The second is a low-level bytecode language called RVM, with support for overloaded functions, dynamic function invocation, exception handling and stackful coroutines. The RVM language runs on a stack-based virtual machine built for this compiler pipeline.

The compiler architecture in Figure 1.2 is proposed. The aim of this architecture is to realize "a fast, reusable and extensible implementation infrastructure that can accommodate future language evolution."[5]


Figure 1.2: Compiler architecture under construction.

1.2

Problem definition

This thesis project is driven by the quest for speed and is about the final stage of the compiler pipeline: the translation of the generated RVM code and its infrastructure onto the JVM (RVM2JVM in Figure 1.2). The RVM infrastructure contains features not available in the JVM:

1. Guarded stackful coroutines

The paper [5] describes the design of guarded coroutines used in the RVM interpreter. The following characteristics describe a stackful guarded coroutine:

• Local data of the coroutine persists between successive activations of the coroutine.
• The execution of the coroutine is suspended as control leaves (YIELD) it and carries on where it left off when control re-enters (NEXT) the coroutine at a later stage.
• Suspension and resumption of a coroutine is also allowed from nested function calls.
• Coroutines may have preconditions; these preconditions are validated during the creation of a coroutine. If the validation fails, an exhausted coroutine is created.

2. Dynamic function invocation

The RVM holds a number of instructions that use dynamic function calling: the name of the function to be invoked is contained in an element on the RVM stack. There are no JVM instructions to call functions based on a value in a variable. Since the implementation of "JSR 292: Supporting Dynamically Typed Languages on the Java™ Platform"[4], the JVM supports dynamic method invocation. In the currently used JVM version (1.7 update 45)[1] there are two approaches to invoke methods dynamically: first through a system of MethodHandles[7], and secondly through reflection. In both cases the methods are retrieved and activated by querying the running JVM, instead of calling them directly.
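As a rough illustration of these two approaches (not the thesis implementation; the class, the `fib` method, and all names below are made up for the example), the following sketch invokes a method whose name is only known at runtime, once via a MethodHandle and once via reflection:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class DynInvoke {
    // Stand-in target function whose name will only be known at runtime.
    public static int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

    // Approach 1: look up a MethodHandle and invoke it.
    public static int callByHandle(String name, int arg) {
        try {
            MethodHandle mh = MethodHandles.lookup().findStatic(
                    DynInvoke.class, name,
                    MethodType.methodType(int.class, int.class));
            return (int) mh.invokeExact(arg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // Approach 2: query the running JVM via reflection and invoke.
    public static int callByReflection(String name, int arg) {
        try {
            return (int) DynInvoke.class.getMethod(name, int.class)
                                        .invoke(null, arg);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Both variants retrieve the method from the running JVM by its name instead of calling it directly, which is exactly the property the RVM's dynamic call instructions require.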

3. Overloaded functions

Function overloading, which Christopher Strachey in 1967 called "ad hoc" polymorphism[10], is a kind of polymorphism in which polymorphic functions can be invoked with arguments of different types. In Java and C++ the binding between parameters and functions is done at compile time only, and based on type only (early binding). Rascal and the RVM allow binding on both type and value. In the RVM an overloaded function is built from a list of all functions matching the given parameter types. At run time the functions contained in the overloaded function are invoked with the provided parameters; each invoked function is responsible for checking whether the values of the parameters match. If there is a match on type and value, the function is executed and the overloaded function returns (late binding); otherwise the function returns a failed status, forcing the overloading mechanism to call the next function alternative.
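The probing behaviour described above can be sketched as follows. The sentinel value, the alternatives and the `ocall` helper are hypothetical and only illustrate the late-binding loop, not the RVM's actual classes:

```java
import java.util.List;
import java.util.function.Function;

public class Overload {
    // Sentinel returned by an alternative whose type/value check fails.
    static final Object FAIL = new Object();

    // Two hypothetical alternatives: one matches only the value 0,
    // the other matches any integer.
    static Object printZero(Object arg) {
        if (arg instanceof Integer && (Integer) arg == 0) return "zero";
        return FAIL;
    }
    static Object printInt(Object arg) {
        if (arg instanceof Integer) return "int:" + arg;
        return FAIL;
    }

    // Late binding: probe each alternative in order until one matches.
    static Object ocall(List<Function<Object, Object>> alternatives, Object arg) {
        for (Function<Object, Object> f : alternatives) {
            Object r = f.apply(arg);
            if (r != FAIL) return r;   // first match on type AND value wins
        }
        throw new RuntimeException("no matching alternative");
    }
}
```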

The RVM also has a system for exception handling, implemented in a similar way as in the JVM.

To implement a full RVM to JVM translator, the following parts have to be realized:

1. A system to run or to emulate stackful guarded coroutines on the JVM.
2. Methods for overloaded function invocation.
3. Methods for dynamic function invocation.
4. A system for exception handling, translating RVM exceptions to JVM exceptions.
5. A system for translating individual RVM functions to JVM methods.
6. A translation of individual RVM instructions to JVM bytecode instructions or method calls.
7. A system for presenting compile-time information at runtime, e.g., function specifications, constant declarations and type definitions contained in the "PDB".

To address the research questions two distinguishable parts have to be realized: (1) a system translating RVM programs to the JVM; (2) a profile and benchmark suite for testing the translated RVM programs.

1.3

Scope

The previous section lists the parts needed to realize a complete translation from RVM to JVM; the goal of this project is to implement at least items 1, 2, 3, 5 and 6. These items allow the benchmarking and profiling of the translated RVM programs.

1.3.1

Expected results

This project will have the following deliverables:

• A concise specification of the major parts of the RVM.

• A system that translates RVM programs to the JVM, such that the translated programs pass the compiler tests.

• A set of benchmarks on the effectiveness of the translated system compared to the RVM inter-preter.

• A master thesis with the answers on the research questions.

1.4

Research questions

The central research questions of this thesis are as follows:

1. What is an effective way to translate Rascal Virtual Machine programs to the Java Virtual Machine?

2. How to measure the performance of the translated RVM code and how to use the measurements to improve performance?

The definition of "effective" in this case would be: (1) the translated RVM programs execute faster than the interpreted programs; (2) the software implementing the RVM instruction set is more readable, and therefore better maintainable, than the code in the current RVM interpreter.


1.5

Outline

The subsequent chapters elaborate on the aspects of the research questions listed above.

Chapter 2 presents the global structure of the RVM interpreter and shows the new structure of the system after the translation to the JVM. This chapter also presents the design decisions taken in building the new system. The last part of this chapter describes the major parts of the translation in detail.

Chapter 3 presents the validation of the mapped system and addresses its effectiveness. This is done by testing the mapped system and measuring the turnaround time of Rascal programs. The program mapping is tested with the existing compiler tests.

Chapter 4 analyzes the results and discusses the design decisions. Chapter 5 concludes and proposes future work.


Chapter 2

Translating the RVM to JVM

2.1

Introduction

Before describing the translation from RVM to JVM, I give a global introduction to the RVM in both its interpreter form and its translated form (RVMonJVM). Then I present the design decisions taken before actually building the translator, followed by the parts of the translation in detail. The detailed description of the translation of a Rascal program is divided into four sections: (1) the structure at runtime; (2) function and coroutine structure; (3) function and coroutine translation; (4) translation of individual RVM instructions.

2.2

Introduction to the RVM

The Rascal virtual machine (RVM) is an application virtual machine that can execute functions written in RVM bytecode. The RVM runs as a normal application inside a JVM thread and is implemented in a single Java class called RVM.

2.2.1

The RVM structure

The major elements of the RVM are shown in Figure 2.1.

Figure 2.1: Structure of the RVM


The information about the RVM presented here is based on question sessions with the original designers and a study of the RVM source code. The following paragraphs give a brief explanation of each component. The preceding compilation steps from Rascal to RVM construct all the information contained in the RVM.

Entry points

The entry point of the RVM has multiple methods for the initialization and execution of RVM functions. The stores, holding functions, overloaded functions and constructors, play a central role in the RVM. They contain all the information necessary to execute functions and coroutines, or to create instances of defined types. The initialization is done by loading the internal stores (collections) with their entries. After the loading of the three stores the system is ready to execute functions.

Current Frame

The current frame holds the contextual information of a function that is currently active. It is created and initialized prior to the start of a function and destroyed on an explicit function return.

Codeblock

This component holds the list of instructions (function body) of the currently active function. It is loaded with the list of instructions contained in a Function. The functions are contained in the FunctionStore and retrieved before executing them.

Instruction Interpreter

This is the execution engine of the RVM; it executes the instructions retrieved from the Codeblock. The instruction interpreter realizes the features of the RVM. Among the usual instructions (like store, load, goto) the interpreter holds specialized instructions for: (1) guarded stackful coroutines; (2) overloaded function invocation; (3) dynamic function invocation; and (4) partial function application.

FunctionStore

The FunctionStore contains an entry for each function or coroutine in the system. Each function entry holds the complete set of information about a function/coroutine: (1) the number (arity) and type of the parameters and the return type of the function; (2) information about the number of local variables and the needed size of the stack frame; (3) the list of instructions building the function's executable body; (4) the exception handling information.

An entry from the FunctionStore is used to create a stack frame for a specific function or a FunctionInstance², and to load the Codeblock with the list of instructions. The RVM instructions have the capability to interact with the three stores and load elements from them (new functions). The FunctionStore is also used to create a FunctionMap, which contains a mapping between the Rascal name of a function and an integer id. An example of an RVM function is shown in Listing 2.1.

 1 FUNCTION("exp::Compiler::Examp::Fib/main(list(value());)#0",
     func(int(), [list(value())])
 3   [@isVarArgs=false], "", 2, 4, false, 7, [
     LOADLOC(0),
 5   LOADTYPE(list(value())),
     CHECKARGTYPE(),
 7   JMPFALSE("ELSE_LAB3"),
     LOADLOC(0),
 9   STORELOC(3),
     POP(),
11   LOADCON(33),
     LOADCON(()),
13   OCALL("exp::Compiler::Examp::Fib/main(list(value());)#0/use:fib(int();)", 2),
     RETURN1(1),
15   LABEL("ELSE_LAB3"),
     FAILRETURN()], [])

Listing 2.1: RVM function int main(list[value] args)

² FunctionInstances are created as result of a partial function application in the muRascal language, and can be

Clearly visible in this listing are the name of the function (line 1), the return type and the types of the parameters (line 2), the specification of the required stack frame (line 3), and the RVM instructions building the function body (lines 4-16). Appendix A describes each used RVM instruction in detail.

OverloadedFunctionStore

The OverloadedFunctionStore contains information about all overloaded functions in the system. It does not actually contain functions but information on how to aggregate overloaded functions. An overloaded function is built from a set of function alternatives and constructors. In an overloaded function call at runtime these function alternatives are tried one after another with the provided parameters. The function with the matching list of parameters is executed. If there is no match at runtime, a Constructor is used to create a datatype value.

To handle an overloaded function call, an object is created based on an entry in the OverloadedFunctionStore. This object holds the ids of all function alternatives and the constructors. It is used to iterate over the functions, creating stack frames, letting each function test parameter types and values, or using the constructor to create a value.

ConstructorStore

The ConstructorStore contains all constructors in the system; a constructor is used to create values of user-defined datatypes (Algebraic Data Types).

Module variables

The unit Module variables contains all the global variables from the compiled Rascal modules. The initialization of the module variables is done by initialization functions generated specially for this purpose.

Execution of a RVM program

The starting of an RVM program is done in 4 steps: (1) all elements are initialized, by traversing the stores and initializing each element; (2) based on the function entry in the FunctionStore, a Frame is created containing a stack, stack pointer, program counter, and the function itself; (3) the function's set of instructions is loaded into the Codeblock; (4) the interpreter starts executing the instructions in the Codeblock at address 0. Calling a function from within the interpreter follows the same procedure, except that the newly created frame has a link to the previous one. Returning from a function then simply means dropping back into the previous frame and resetting the instructions (Codeblock), program counter and stack pointer to their previous values.
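The frame discipline of calling and returning can be sketched as follows. The Frame fields and the helper methods are illustrative only, not the RVM's actual code:

```java
public class Frames {
    static class Frame {
        final String function;   // the function this frame belongs to
        final Frame previous;    // link to the caller's frame (null for the root)
        final Object[] stack;    // per-frame operand/local stack
        int sp = 0;              // stack pointer
        int pc = 0;              // program counter into the Codeblock

        Frame(String function, Frame previous, int frameSize) {
            this.function = function;
            this.previous = previous;
            this.stack = new Object[frameSize];
        }
    }

    // Calling a function: create a new frame linked to the caller's frame.
    static Frame call(Frame caller, String function, int frameSize) {
        return new Frame(function, caller, frameSize);
    }

    // Returning: drop back into the previous frame, whose pc and sp
    // still hold their saved values.
    static Frame ret(Frame current) {
        return current.previous;
    }
}
```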

2.2.2

The RVMonJVM structure

The following issues have to be addressed when translating RVM to JVM:

• Each Rascal function should be translated to a JVM method (this includes translating the Rascal name into a valid JVM method name).
• RVM instructions have to be translated to their JVM counterparts.
• A dispatcher has to be created to find the translated functions based on their original Rascal names.
• Suspension and continuation locations of coroutines have to be determined.
• The current frame should hold the continuation point of a coroutine.

The translated RVM has the structure shown in Figure 2.2. The function entries in the FunctionStore are now only used to create stack frames with a stack pointer and, in the case of dynamic function invocation, to create FunctionInstances. The list of instructions making up the function body is ignored at runtime. The OverloadedFunctionStore and the ConstructorStore have the same function as in the RVM interpreter.


Figure 2.2: Structure of a translated RVM/Rascal program

The translation is done after the stores are fully loaded by the preceding compiler steps. The individual entries in the FunctionStore and OverloadedFunctionStore are translated into JVM methods, and based on the number of entries in the FunctionStore a dynamic dispatcher method is created. The goal of this dispatcher is to call the generated JVM methods based on a name (String) or an integer id. The function of the ConstructorStore is the same in both systems.

Execution of a translated RVM function

The procedure for starting a function in the translated system is nearly identical to that in the interpreter. The only difference is in the last step: instead of starting the interpreter, the translated function is called through the dispatcher function. Returning from a function means dropping the current frame and executing a JVM RETURN instruction.
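A minimal sketch of the dispatcher idea is shown below; the generated methods and the hand-written FunctionMap entries are hypothetical (in the real system the dispatcher is generated from the FunctionStore, and the switch compiles to a direct TABLESWITCH rather than reflection, cf. DD.5):

```java
import java.util.HashMap;
import java.util.Map;

public class Dispatcher {
    // FunctionMap: original Rascal name -> integer id (hypothetical entries).
    static final Map<String, Integer> functionMap = new HashMap<>();
    static {
        functionMap.put("Fib/main", 0);
        functionMap.put("Fib/fib", 1);
    }

    // Stand-ins for methods the translator would generate.
    static Object main$gen(Object[] args) { return "main called"; }
    static Object fib$gen(Object[] args)  { return "fib called"; }

    // Generated dispatcher: select the translated method by name via its id,
    // calling it directly instead of querying the JVM at call time.
    static Object dynRun(String name, Object[] args) {
        Integer id = functionMap.get(name);
        if (id == null) throw new IllegalArgumentException("unknown: " + name);
        switch (id) {
            case 0:  return main$gen(args);
            case 1:  return fib$gen(args);
            default: throw new IllegalStateException();
        }
    }
}
```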

2.3

Design decisions

Based on the study of the Rascal code-base, the relevant literature and the proposed structure, as presented in the previous chapter, the following design decisions were made.

DD.1 Split the current RVM object into 3 parts: the first part is the interface with the Rascal execute function; the second part contains the methods implementing the RVM instructions and type conversion functions; the final part contains the Rascal functions that are generated by the object that interfaces with the Rascal execute function. The final part should also inherit from an object containing the instructions and support methods, forming a runtime environment for the Rascal functions. The generated class file should contain all functions from all modules, and keep the original Rascal function and module names if possible.

DD.2 Add the translator functionality to the existing interpreter infrastructure, leaving the interpreter functional. This structure allows for differential debugging[8] of the new system.

DD.3 Stack frames should be created prior to function invocation but are dropped by the return statement in the called function.

DD.4 Implement a dual stack system allowing the signaling of upstream callers information about the instruction causing the return of the function. This signaling mechanism should facilitate the suspension of stackful coroutines.


DD.5 If possible, avoid using reflection or MethodHandles to implement dynamic invocation in the dispatcher method.

DD.6 Translate each of the functions in the overloaded store into a function handling the overloading (probing the alternatives) and calling the constructor.

DD.7 Implement RVM instructions as calls to JVM methods; inline only if no other solution is possible. In special cases inline if the number of JVM bytecode instructions does not increase drastically; allow a factor of 2 times the bytecode size. Build a test comparing the impact of inlining done by the JVM with inlining done during the translation of RVM instructions.

DD.8 Split large instructions into micro-instructions or hybrids built of partly Java and partly bytecode. Combine instruction sequences like LOAD* + STORE* + POP into one single instruction for efficiency reasons.

DD.9 Build the bytecode generator in a single object, and pass it to the interpreter objects like Function, CodeBlock and Instruction, emitting bytecode during the traversal.

DD.10 Add the profiling and benchmarking code to the RVM's executeProgram method. Profiling will probably only show the RascalPrimitives and the translated functions, while instrumentation of the RVM instructions would influence the results. Micro benchmarking and analysis of the generated code is the preferred method, possibly complemented by analysis of the profiling data provided by the JVM itself.

DD.11 Add code to the CodeBlock object to calculate reentry labels for coroutines, and add fields to the Frame object to store the runtime reentry points.
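The measurement approach of DD.10, combined with the protocol of at least 30 runs mentioned in the abstract, could be sketched as a plain timing loop with JIT warm-up. The workload, warm-up count and run count below are placeholders; a production setup might use a harness such as JMH instead:

```java
public class Bench {
    // Stand-in benchmark workload; a real run would execute a Rascal program.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += i * 31L;
        return acc;
    }

    // Measure per-run turnaround times in milliseconds.
    static double[] measure(int runs, int n) {
        for (int i = 0; i < 5; i++) work(n);   // warm-up so the JIT compiles the workload
        double[] millis = new double[runs];
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            work(n);
            millis[r] = (System.nanoTime() - t0) / 1e6;
        }
        return millis;                          // report all runs, not just a mean
    }
}
```

Keeping every per-run measurement (rather than a single average) makes the fluctuation described in the abstract visible and the results reproducible across at least 30 runs.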

2.4

The Structure at Runtime

The structure of the translated system at runtime is shown in Figure 2.3. After all the stores in the RVM object are completely loaded, the functions can be generated and executed. The method executeProgram(String name, IValue[] args) in the RVM object calls the initialization method (finalize()), which traverses all internal data structures and emits a JVM class in memory. This newly created class (RVMRunner) inherits the entire RVM infrastructure from an existing class called RVMRun. After loading and instantiating an object of the created class, the stores currently contained in the RVM object have to be copied to the created object to allow function execution. The dynRun(String name, IValue[] args) method of the created object can now be used to execute a translated RVM function by providing its name and parameters.

Figure 2.3: Structure during runtime


2.5

Function and coroutine structure

The following issues have to be addressed when translating RVM functions to JVM methods:

• The Rascal function name (which also contains the module name) contains characters that are not allowed in a JVM method name[1].
• The location where the function starts or continues executing must be determined at function entry.
• Some instructions require variables, of a certain type, at a known location on the JVM stack.
• The JVM stack should contain the same number of elements, of the same types, at each coroutine continuation point.

2.5.1

Naming and signature

The RVM functions and coroutines are translated to JVM methods. To allow the loading of these translated functions, the names of the generated methods have to comply with the naming convention enforced by the JVM verifier[1]. This is achieved by a technique similar to byte stuffing: each character that is illegal to the JVM is replaced by a $ followed by a character that is valid in the JVM. This method allows forward and backward conversion of function and coroutine names (DD.1).

public Object exp$1$1Compiler$1$1Examp$1$1Fib$2main$3list$3value$3$4$4$C$4(Frame cf) {
    int sp = cf.sp;
    Object[] stack = cf.stack;
    Object rval;
    Boolean precondition;
    Coroutine coroutine;
    Frame prev;
    PRELUDE CODE;
    FUNCTION BODY;
    return rval;
}

Listing 2.2: Structure of a generated method
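The $-stuffing technique of Section 2.5.1 can be illustrated as follows. The character table below is invented for the example and is not necessarily the translator's actual mapping; the point is that the encoding is reversible because '$' itself is also escaped:

```java
import java.util.Map;

public class NameMangler {
    // Hypothetical table: illegal JVM name character -> legal replacement.
    static final Map<Character, Character> ENC =
            Map.of(':', '1', '/', '2', '(', '3', ')', '4', '[', '5', ']', '6');

    // Forward conversion: escape '$' and every illegal character.
    static String mangle(String rascalName) {
        StringBuilder sb = new StringBuilder();
        for (char c : rascalName.toCharArray()) {
            if (c == '$') sb.append("$$");
            else if (ENC.containsKey(c)) sb.append('$').append(ENC.get(c));
            else sb.append(c);
        }
        return sb.toString();
    }

    // Backward conversion: undo the stuffing to recover the Rascal name.
    static String demangle(String jvmName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < jvmName.length(); i++) {
            char c = jvmName.charAt(i);
            if (c != '$') { sb.append(c); continue; }
            char next = jvmName.charAt(++i);
            if (next == '$') { sb.append('$'); continue; }
            for (Map.Entry<Character, Character> e : ENC.entrySet())
                if (e.getValue() == next) sb.append(e.getKey());
        }
        return sb.toString();
    }
}
```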

2.5.2

Internal structure

The generated JVM method has the internal structure given in Listing 2.2. This structure is assumed by the instructions which have an inlined part, and is needed to pass the JVM runtime verifier. The local variables in the function can be used by different inlined instructions as long as the used types are respected. For example, the local variables precondition, coroutine and prev are all used by the GUARD instruction. The variable rval is used in the inline part of the CALL and RETURN instructions (see paragraph 2.7.3).

The part PRELUDE CODE in the body of the generated method is used to initialize the local variables of the method and, in the case of a coroutine, to jump to the entry point. The JVM verifier performs type and liveness analysis to ensure that the operand stack always has the same number and type of data elements independent of the execution path taken to get to a particular point. This means that it is not possible for the PRELUDE CODE to jump arbitrarily into the middle of a method. The jump performed by the PRELUDE CODE is needed for the reentry of a coroutine, to continue at its previously recorded suspension point. This liveness check forces the stack in a translated RVM function to have the same structure at each continuation point. Secondly, the JVM verifier checks that a register or stack element has been properly initialized with the correct type before it can be accessed. It also enforces that after first initialization the type of a local variable stays the same; the result is that after the initialization of a variable element with an object reference, it cannot be used for the other basic JVM primitive types.


The FUNCTION BODY contains the translated code from the RVM function's Codeblock and assumes three instance variables: a stack, a stack pointer and the call frame.

The JVM further imposes the following rules for its stack frames:

• Operand stack space is allocated statically.
• Frames are stored on the JVM stack (per thread) and are not accessible to JVM instructions; the only accessible part is the operand stack, and then only with instructions like PUSH/POP in the currently active method.
• There are no JVM instructions to manipulate the stack pointer.
• The stack is verified at jump targets, forcing the stack content to be specified at each labeled JVM bytecode instruction.

2.6

Function and coroutine translation

The following issues have to be addressed when translating RVM coroutines to JVM methods:

• Instructions that invoke functions should check the cause of the return of the invoked function.
• Return instructions must signal the upstream caller.
• The RVM stack structure should be maintained when executing a YIELD instruction and should contain all the continuation points needed to recreate the execution path on continuation (NEXT instruction).

The translation of stackful guarded coroutines[5] onto the JVM is conceptually simple and is shown in Listing 2.3. The technique used is similar to the technique used by Srinivasan[9] to demonstrate an architecture for suspending and resuming methods in Java.

The basic idea of this implementation is to record the current position (stack frame, nesting level and location in the function) when there is a YIELD instruction, and to recreate the execution call stack in the case of an activation of the coroutine (NEXT instruction).

In the current implementation the same translation also applies to functions, thus allowing coroutines invoking functions invoking coroutines.

The system uses two stacks: one custom stack for the RVM function parameters and return values, and the native JVM stack to signal the upstream caller the type of the return instruction. The type of return is determined by the instruction causing the return, e.g., YIELD, EXHAUST, FAILRETURN or NOACTION (plain return). This use of a secondary stack allows the calling function to handle suspension and resumption (DD.4) based on the value returned on the JVM stack.

The entry code in the prelude consults the provided stack Frame and starts the coroutine either at point 0 (first-time entry) or, in the case of reentry, straight at the previously recorded entry point. To facilitate the reentry of coroutines, two fields need to exist in the stack frame: one field to store the coroutine reentry point and one to store the downstream stack frame.

Listing 2.3 shows how the GUARD, YIELD and CALL instructions can be staged to allow entry and reentry of a coroutine. The GUARD instruction (lines 5-10) sets the reentry point in case it is a coroutine root instance, and returns to the CREATE instruction that caused the coroutine's initial invocation. If the coroutine was called as a function, it falls through and continues until the first YIELD or EXHAUST instruction. The code implementing the YIELD instruction (lines 15-17) sets the entry point past the return of the YIELD instruction. The CALL instruction is handled in lines 23-40; it sets the reentry point prior to the call instruction. Based on the stored entry point value, either an old or a newly created stack frame is passed to the called function. In the case of a return from a yield, or a yield in a nested function or coroutine, the entry point is set, the called stack frame is kept, and as a last step the function yields to the upstream caller.


 1 public Object coroutine(Frame cf) {
     switch (cf.entrypoint) {              // Prelude
 3   case 0:
       ...;                                // Precondition initialization
 5     guard();                            // Creating Coroutine object based on root and precondition
       if (isrootcoroutine) {
 7       store coroutine object on next caller stack;
         cf.entrypoint = 1;
 9       return NOACTION;
       }
11   case 1:

13     ...;

15     yield();                            // Local yield stores coroutine result
       cf.entrypoint = 2;                  // Yield sets new entry point
17     return YIELDED;

19   case 2:
       ...;
21   case 3:

23     if (cf.entrypoint == 3) {
         cf = cf.nextCalledFrame;          // Give old Frame
25     } else {
         cf.nextCalledFrame = new Frame(cf);
27       cf = cf.nextCalledFrame;
       }
29     Object returnState = callF(cf);     // Function executing YIELD instruction
31
       if (returnState == YIELDED) {       // CALL roundup return handling
33       cf.entrypoint = 3;
         cf = cf.previousCallFrame;
35       return YIELDED;
       } else {
37       cf = cf.previousCallFrame;
         cf.entrypoint = 0;
39       cf.nextCalledFrame = null;
       }
41     ...;
       exhaust();
43     return NOACTION;

45   default: throw IllegalEntryPointException;
     }
47 }

Listing 2.3: Translation of a coroutine

The creation of new stack frames is always done by the calling function (CALL or NEXT instruction) and the dropping of stack frames by the called function (RETURN or YIELD instruction); possible yielding downstream forces the call instruction to do both under certain circumstances (DD.3). The use of a second stack allows communication with the upstream caller even if the stack frame of the called function is destroyed (DD.4).

Although this method is not free of costs, the technique also allows for an elegant implementation of the handling of unbounded exceptions and of failed returns from overloaded function calls.
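The entry-point technique above can be illustrated with a minimal, self-contained sketch (hypothetical names, not the actual generated code): a frame object stores the continuation point, and each re-invocation switches on it to resume where the previous call left off.

```java
public class EntryPointSketch {
    static class Frame { int entrypoint = 0; int local; }

    // Each call resumes at the stored entry point and records the next one,
    // mimicking the GUARD/YIELD/EXHAUST pattern: returns the yielded value,
    // or -1 when the coroutine is exhausted.
    static int resume(Frame f) {
        switch (f.entrypoint) {
        case 0:
            f.local = 10;            // initialization before the first yield
            f.entrypoint = 1;
            return f.local;          // first "YIELD"
        case 1:
            f.local += 1;
            f.entrypoint = 2;
            return f.local;          // second "YIELD"
        case 2:
            f.entrypoint = 3;        // "EXHAUST": no further values
            return -1;
        default:
            return -1;
        }
    }

    public static void main(String[] args) {
        Frame f = new Frame();
        if (resume(f) != 10) throw new AssertionError();
        if (resume(f) != 11) throw new AssertionError();
        if (resume(f) != -1) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Because the continuation point lives in the frame rather than on the JVM call stack, the method can return to its caller at any yield and still be resumed later.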


2.7 Translation of RVM Instructions

The following issues have to be addressed when translating RVM instructions to JVM equivalent structures:

• Determine if the instruction has to be adapted to support coroutines.
• Determine the amount of necessary inlining.

There are three categories of instructions in the RVM:

1. Primitive manipulation instructions.
2. Basic control flow instructions.
3. Function call and return instructions.

Each category requires its own method of translation.

2.7.1 Primitive Manipulation

The choice of using the custom stack implementation from the interpreter allowed a simple implementation for most of the RVM instructions. This simple translation is done by just translating each instruction into a method call (DD.7); no inlining is necessary and no modification is needed to support coroutines.

To experiment with translation variants, three different translators were built: one with only method calls to instructions, a second implementation that inlines only the instructions that call the language primitives (CALLPRIM and CALLMUPRIM), and a third where inlining was done if the body of the RVM instruction was small (less than 16 JVM bytecode instructions). Examples of the instruction translations are given in appendix B.

2.7.2 Basic Control Flow

The RVM system has 5 control flow instructions, of which JMP, JMPFALSE and JMPTRUE have to be coded in JVM bytecode. The two other instructions, JMPINDEXED and TYPESWITCH, get a Java helper method. This helper method is used to load an integer on the JVM stack to enable the use of the JVM tableswitch instruction and, secondly, to keep the bytecode size of the translated instruction within limits. Appendix B shows the implementation of the control flow instructions JMPFALSE and TYPESWITCH in detail. The implementations of the basic control flow instructions are not influenced by the system of coroutines.
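The helper-method pattern can be sketched as follows (hypothetical names, not the actual generated code): a small Java method maps a runtime value to a dense integer index, and a switch on that index is what the generated tableswitch jumps on.

```java
public class SwitchHelper {
    // Helper: map a runtime value to a dense 0..n index so the generated
    // bytecode can use a single tableswitch instruction on it.
    static int switchIndex(Object label, Object[] labels) {
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(label)) return i;
        }
        return labels.length;            // default target
    }

    public static void main(String[] args) {
        Object[] labels = {"a", "b", "c"};
        int target;
        // A dense int switch like this compiles to a JVM tableswitch.
        switch (switchIndex("b", labels)) {
        case 0:  target = 10; break;
        case 1:  target = 20; break;
        case 2:  target = 30; break;
        default: target = -1;
        }
        if (target != 20) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Keeping the search in the helper, rather than inlining it, also keeps the translated instruction's bytecode footprint small.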

2.7.3 Function Call and Return Instructions

The RVM has two groups of instructions handling invocation and returning. The first group is related to calling stackful coroutines and functions and is comprised of CALL, CALLDYN, CREATE, CREATEDYN, GUARD, YIELD[01], RETURN[01], NEXT[01] and EXHAUST. These instructions need adaptation for coroutine suspending and activation as well as inlining to implement the suspension (temporary return) of a coroutine.

The second group is for named Rascal functions and is made up of OCALL, OCALLDYN, RETURN[01], FAILRETURN and THROW, of which RETURN[01] and FAILRETURN need an inline part to implement the return of a function.

The translation of these instructions was split out in four groups:

• Activating stackful coroutines.
• Returning from functions and coroutines.
• Dynamic invocation.
• Overloaded function calling and failreturn.

Each group needs its own method of translation, ranging from full inlining, through a hybrid, to a pure Java solution; these are discussed in the next paragraphs.


Activating stackful coroutines

The main issue in activating stackful coroutines is to recreate the nested call stack of a previously suspended coroutine. Figure 2.4 shows an example of the call sequence of a stackful coroutine. Each coroutine starts with a prelude, and the task of this prelude is to jump to the recorded continuation point in case of reentry of the coroutine.⁷ At the CREATE (*1) and CALL instruction (*5) new stack frames are created prior to the invocation of the coroutine. At the GUARD instruction (*2) the current stack frame is stored in a CoroutineInstance object on the stack of the Main routine⁸ and the coroutine returns. At the GUARD (*2), YIELD and CALL (*4) instructions the continuation points are recorded in the current stack frame and the routine returns. The NEXT and CALL instructions (*3) reactivate a coroutine by loading a saved stack frame prior to coroutine invocation, allowing reentry at a previously stored continuation point. The EXHAUST instruction (*6) just returns to the upstream caller.

Figure 2.4: Coroutine call sequence

The RVM functions and coroutines are translated to JVM methods; this allows a call to such a method to be done by the global dispatcher, but the return from that method still has to be implemented by a JVM bytecode instruction. This resulted in the following translation variants:

Instruction        | Translation variant
CREATE, CREATEDYN  | Full Java implementation with a dynamic call.
GUARD              | Java part to create the coroutine instance, inline bytecode to return.
NEXT[01]           | Java implementation with a dynamic call.
CALL, CALLDYN      | Java to call the coroutine, inline part to return in case of a downstream yield. See appendix B for the details of the CALL instruction.
YIELD[01]          | Java to store coroutine results, inline for the return.
EXHAUST            | Java part to disable the coroutine, inline bytecode to return.

Table 2.1: Translation variants for coroutine related instructions.

7 Newly created coroutines use continuation point 0.

8 The configured CoroutineInstance is only stored if the stack contains TRUE prior to the GUARD instruction.


Returning

The RETURN[01] instruction plays a role both when returning from functions and when returning from coroutines. In a coroutine its behavior is similar to a YIELD[01] instruction: the RETURN[01] instruction stores its reference values and returns to the calling function, while the YIELD instruction returns to the function calling NEXT.

This double function of the return instruction requires a Java part to handle the possible results of a coroutine and an inline return instruction. And last but not least, the instruction triggering the return has to signal the kind of return (YIELD, RETURN, etc.) to the upstream caller.

Dynamic function invocation

The following issues have to be addressed when implementing dynamic function invocation:

• The translated function name differs from the original name in Rascal.
• Function names are contained in a variable, and there is no direct JVM support for calling methods based on a value in a variable.
• Reflection or MethodHandles should be avoided if possible, due to the invocation overhead they introduce.

Function or coroutine invocation in the RVM is always based on a function name passed as argument. It can be provided as a constant during compile time or retrieved from an element on the RVM stack. This passed name is used in the RVM to retrieve a Function object from the FunctionStore, which contains detailed information about the function. This Function object is then used to create a correct stack frame and pass the required parameters. A reflective lookup of the generated method based on the name would then allow an invokedynamic technique[7] to call the named method with the created stack frame, but a much simpler and faster system is to use the integer values contained in the FunctionMap⁹ to call a generated dispatcher method with the integer value of the function (DD.5).

Listing 2.4 shows how to activate a Rascal module's main function. This is done by using the generated dispatcher method named dynRun(int index) and the id retrieved from the FunctionMap.

9 The FunctionMap holds a mapping between the Rascal function name and a unique id. This id is based on the


// Method to call a module's main function.
public Object dynRun(String fname, IValue[] args) {
    int n = functionMap.get(fname);

    Function func = functionStore.get(n);
    Frame root = new Frame(func.scopeId, null, func.maxstack, func);
    cf = root;

    stack = cf.stack;

    cf.stack[0] = vf.list(args);            // pass the program arguments
    cf.stack[1] = vf.mapWriter().done();
    sp = func.nlocals;

    cf.sp = this.sp;
    return dynRun(n);
}

// Generated dispatcher method
public Object dynRun(int index) {
    switch (index) {
    case 1:
        return function1();
    case N:
        return functionN();
    }
    return FALSE;
}

Listing 2.4: Dispatch methods.

One drawback of this method is caused by a limitation of the JVM: the maximum size of a JVM method body is 65535 bytes of bytecode, which limits the number of cases in the dispatcher. Each case entry requires about 10 bytes, 5 of them in the tableswitch and 5 in the invocation of the method and the return statement.
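The dispatcher entry limit quoted later (approximately 6550) follows directly from this arithmetic; a quick check, assuming the roughly-10-bytes-per-case estimate above:

```java
public class DispatcherLimit {
    public static void main(String[] args) {
        int maxMethodBytes = 65535;  // JVM limit on a single method's code size
        int bytesPerCase = 10;       // estimate: tableswitch entry + call + return
        // Maximum number of dispatcher cases under these assumptions:
        System.out.println(maxMethodBytes / bytesPerCase);  // prints 6553
    }
}
```

The real bound is slightly lower, since the switch header and the method prologue also consume part of the budget.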

Overloaded Function Calling and FailReturn

The following issues have to be resolved when implementing overloaded function calling:

• Function alternatives have to be tried as long as the invoked function executes a FAILRETURN instruction.
• The signaling of FAILRETURN and RETURN instructions should bypass the RVM stack.
• For each function tried, a stack frame should be created.

The working of overloaded functions is based on delegation: it is the task of the invoked function to signal the caller. The function signals whether there was a match on its arguments; this signaling is done by executing an RVM FAILRETURN instruction in case of a parameter mismatch. Two variants of overloaded function calling were built: (1) a fully generated JVM method and (2) a Java version with a dynamic call. The handling of overloaded function calls in the JVM differs significantly from the handling in the RVM interpreter. In the JVM implementation the call mechanism is responsible for handling the returned value of the called overloaded function. If the called function executed a FAILRETURN instruction, the caller tries the alternatives or, as a last resort (no more alternatives), calls the constructor. In the interpreter the calling of the alternative functions and the constructor is done in the FAILRETURN instruction.

Based on design decision DD.6, a function is created for each entry in the OverloadedFunctionStore. This generated function follows the algorithm shown in Listing 2.5. To handle the overloaded call, the set of alternative functions is unrolled in the function's main body (DD.6) and at the end the constructor is used to create the value.


public void overloadedFunctionHandler() {
    on entry;
    select context from overloadedFunctionStore;

    prepare stack frame;
    call overloaded function alternative 1;
    test return type;
    in case of return: return result;

    prepare stack frame;                    // FAILRETURN
    call overloaded function alternative 2;
    test return type;
    in case of return: return result;

    // last alternative failed
    create constructor from context;
    return constructor on caller's stack;
}

Listing 2.5: Pseudo code OCALL.

As a result of this translation, the OCALL instruction itself is reduced to a simple invocation of the generated method.
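As an illustration of this delegation pattern (a hypothetical sketch, not the RVM code): each alternative is tried in turn until one signals success, with the constructor as the fallback.

```java
import java.util.List;
import java.util.function.Function;

public class OverloadSketch {
    static final Object FAIL = new Object();  // stands in for FAILRETURN

    // Try each alternative; the first non-failing result wins,
    // otherwise fall back to building a constructor value.
    static Object ocall(List<Function<Integer, Object>> alternatives, int arg) {
        for (Function<Integer, Object> alt : alternatives) {
            Object result = alt.apply(arg);
            if (result != FAIL) {
                return result;                // RETURN: the arguments matched
            }
        }
        return "constructor(" + arg + ")";    // last resort: constructor
    }

    public static void main(String[] args) {
        List<Function<Integer, Object>> alts = List.of(
            x -> x % 2 == 0 ? "even" : FAIL,  // matches even arguments only
            x -> x > 10 ? "big" : FAIL);      // matches arguments > 10
        if (!ocall(alts, 4).equals("even")) throw new AssertionError();
        if (!ocall(alts, 11).equals("big")) throw new AssertionError();
        if (!ocall(alts, 3).equals("constructor(3)")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The RVM's real mechanism differs in that "failure" is signaled by an instruction rather than a sentinel value, but the control flow of trying alternatives in order is the same.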

Overloaded Function Calling: an Alternative Implementation

A second implementation for the translation of the OCALL instruction is more similar to the implementation of the instruction in the RVM interpreter. This implementation removes the need for translating each overloaded function to a JVM method. The FAILRETURN result is still handled in this instruction and the call to the overloaded alternative function is then handled by the dynRun(n) method (Listing 2.6, line 15).

 1 public void jvmOCALL(int ofun, int arity) {
 2     cf.sp = sp;
 3
 4     OverloadedFunction of = overloadedStore.get(ofun);
 5     OverloadedFunctionInstanceCall ofuncall = of.scopeIn == -1 ?
 6         new OverloadedFunctionInstanceCall(cf, of.functions, of.constructors, root, null, arity) :
 7         OverloadedFunctionInstanceCall.computeOverloadedFunctionInstanceCall(cf,
 8             of.functions, of.constructors, of.scopeIn, null, arity);
 9
10     Frame frame = ofuncall.nextFrame(functionStore);
11     while (frame != null) {
12         cf = frame;
13         stack = cf.stack;
14         sp = cf.sp;
15         Object rsult = dynRun(cf.function.funId);
16         if (rsult.equals(NONE)) return;  // Alternative matched.
17         frame = ofuncall.nextFrame(functionStore);
18     }
19     Type constructor = ofuncall.nextConstructor(constructorStore);
20     sp = sp - arity;
21     stack[sp++] = vf.constructor(constructor,
22         ofuncall.getConstructorArguments(constructor.getArity()));
23 }

Listing 2.6: OCALL implementation

Compared to the previous implementation, this version delays the lookup of function information to runtime, resulting in extra overhead in the handling of an overloaded function call at runtime. Evolution-wise, this implementation is easier to debug and maintain than the fully generated version from the OverloadedFunctionStore. This second method is therefore the preferred solution during the development of the RVM.


Chapter 3

Validation of the RVM to JVM translation

3.1 Introduction

The validation of the RVM to JVM translation consists of three independent parts: correctness, profiling and micro-benchmarking. The correctness of the translation is a prerequisite for the other two validation parts. In the current RVM to JVM translation, exception handling has not been implemented. The goal of profiling is to identify hot spots for quick-win optimizations. The micro-benchmarking is to compare the performance with the existing RVM interpreter and to determine if the translation is effective.

3.2 Correctness of the translation

The current system contains a large set of compiler tests. These tests were initially coded to test the RVM interpreter but can also be used to test the translated system. These tests are extensive and work by comparing the output from the Rascal AST based interpreter with the output of programs running on the RVM (interpreted or translated). The Rascal test set contains programs to test the following elements:

• Expressions: Test expressions on the Rascal basic types like maps, sets, strings, numbers, location, date and Booleans.

• Statements: Tests the Rascal statements like assignments, if, for and while.
• Patterns: Tests the pattern matching features.

• String Templates: Execute code contained in a string and incorporate the result in the string.
• Examples: The last test contains a set of Rascal examples. This set contains a wide variety of programs demonstrating Rascal features like visiting, overloading and string templates.

The workbench integrated in Eclipse contains an interactive Rascal console. This console contains a command to start loaded unit tests. Figure 3.1 shows how to define a unit test in a Rascal module and the highlighting after the test has run, indicating a pass (green) or fail (red underlined) of the test.


Figure 3.1: Workbench unit test result.

The tests showed a working translation, with the exception of the handling of exceptions.

3.3 Profiling

There are multiple methods to profile a JVM application. The most fine-grained method is to ask the JVM to do the profiling. The JVM has two built-in profilers:

1. A low-impact sampling profiler. Option -Xprof to the JVM.

2. A higher-impact instrumenting profiler, turned on with the option -Xrunhprof. The default invocation with no extra parameters records object allocations and high-allocation sites, which is useful for finding excess object creation. The option -Xrunhprof:cpu=times instruments all Java code in the JVM and records the actual CPU time a method spends.

Both profiling methods did not produce any useful information. The low-impact sampling profiler used a probing interval too large to produce useful information about translated RVM functions or instructions. The higher-impact instrumenting profiler slowed down the Eclipse workbench into an unworkable system.

3.4 Micro Benchmarking

Benchmarking was done using the three translation variants, and the performance was compared with the results obtained from the tests with the RVM interpreter.¹ The variants tested are: (1) naive; (2) partial; (3) full. In the "Naive" translation the instructions are coded as method calls where possible; there was only inlining if the instruction caused a return from a function or a goto to a new location within a function. In the "Partial" translation the RVM instructions for invoking the Rascal and MuRascal primitives were inlined. And in the last translation variant, "Full", all RVM instructions with bodies containing less than 16 JVM instructions were inlined.

3.4.1 Measurement setup

Based on the paper of Georges[3], the micro-benchmarks² were modified so that, when run multiple times, they produce timing results that are normally distributed. To realize this, each test had to run for at least 2000 ms using the "Full" translation variant. The programs were run 35 times each and the results from the first 5 runs were discarded, to avoid the warming-up effects of the JVM. This benchmark setup allows the use of the Student's t-test to verify whether the results are significantly different.
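The measurement protocol can be sketched as follows (a simplified harness, not the tooling actually used): time every run, discard the warm-up runs, and average the rest.

```java
public class BenchHarness {
    // Run `benchmark` `total` times, discard the first `warmup` timings
    // (JVM warm-up), and return the mean of the rest in milliseconds.
    static double averageRunTime(Runnable benchmark, int total, int warmup) {
        long sum = 0;
        for (int i = 0; i < total; i++) {
            long t0 = System.nanoTime();
            benchmark.run();
            long elapsed = System.nanoTime() - t0;
            if (i >= warmup) {
                sum += elapsed;           // only post-warm-up runs count
            }
        }
        return sum / (double) (total - warmup) / 1e6;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for a translated Rascal program.
        double ms = averageRunTime(() -> {
            long acc = 0;
            for (int i = 0; i < 100_000; i++) acc += i;
            if (acc < 0) System.out.println(acc);  // keep the work observable
        }, 35, 5);
        if (ms < 0) throw new AssertionError();
        System.out.println("mean run time (ms): " + ms);
    }
}
```

The 35/5 split mirrors the setup above; a real harness would additionally record the per-run samples for the normality check and the t-test.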

1 The used interpreter was the refactored version. The large interpreter case-statement was rewritten to use a method for each RVM instruction.

2 Benchmarking system: a late 2013 model MacBook Pro, Core i7, 16GB RAM. Configured with Eclipse Kepler Service Release 2 (64 bit) with the Rascal plugin version 0.6.2.201404200832, and the JVM version 1.7.0 45 using a maximum heap size of 3GB.


3.4.2 Selected Benchmarks

To test the effectiveness of the translations, benchmarks were selected from the available tests. The focus of the selected tests should be: coroutine handling, function calling, primitive invocation, variable manipulation, and last but not least a program with a mix that is comparable to a real-life situation. The source code of each new³ benchmark program is listed in appendix D.

Name          | LOC | Description | Uses
Mean          | 37  | Calculates the mean of a list of 100 integers | List iterator, integer and real arithmetic
Variance      | 37  | Calculates the variance of a list of 100 integers | List iterator, integer and real arithmetic
Marriage      | 69  | A stable marriage problem | Set matching, arithmetic, (dynamic) function calling and coroutines
OverLoading   | 26  | Two sets of 6 overloaded functions; the return values are passed as arguments to each other, creating a chain of overloaded function calls | Overloaded function calling
SendMoreMoney | 22  | Send-more-money puzzle | Coroutines, integer arithmetic
Fib           | 8   | Fibonacci(33) | Function calling, with no overloading alternatives, integer arithmetic
While         | 8   | Count down in a while statement | Minimal usage of Rascal primitives (2); expect maximal increase of speed-up
Sudoku        | 123 | Solve Sudoku puzzle | Maximal usage of Rascal primitives for integer arithmetic, recursion with overloading, coroutines, nested functions

Table 3.1: Selected tests with their specific usages.

3.4.3 Benchmark results

Table 3.2 shows the results of the measurements; all timings are given in milliseconds and represent the average of 30 executions of each program. The speedup factors are calculated relative to the times of the interpreter; the highlighted cells show the highest factor for each specific test.


Name          | RVM Interpreter (ms) | Naive (ms, factor) | Partial (ms, factor) | Full (ms, factor)
Mean          | 3808  | 1998, 1.91 | 1976, 1.93 | 2058, 1.85
Variance      | 2576  | 2012, 1.28 | 1958, 1.32 | 1983, 1.30
Fib           | 7980  | 3208, 2.49 | 2455, 3.25 | 2499, 3.19
Marriage      | 2312  | 2132, 1.08 | 2052, 1.13 | 2036, 1.14
SendMoreMoney | 4033  | 2293, 1.76 | 2220, 1.82 | 2215, 1.82
Overloading   | 4440  | 2733, 1.62 | 2568, 1.73 | 2484, 1.79
While         | 15577 | 4185, 3.72 | 2160, 7.21 | 2433, 6.40
Sudoku        | 4062  | 2312, 1.76 | 2222, 1.83 | 2143, 1.90

Table 3.2: Execution times and speedup.

As we can see in Table 3.2, the "Naive" variant is always outperformed by the other two, while the only difference is the inlining of the calls to the Rascal and MuRascal primitives. This difference in effectiveness of the "Naive" variant was expected, because the inlining removes the need for a method lookup.

More puzzling is the difference between the results of the variants "Full" and "Partial": we would expect the "Full" variant to outperform the "Partial" one on all fronts, but it does not. The JVM's own optimizations appear more effective than our inlining and optimization effort.

The difference between the speedup factors of the programs Mean and Variance, together with the analysis of the RVM code generated for the Marriage program, suggests that the ratio between the RVM instructions and the calls to Rascal primitives limits the realized speedup factor.

 1 module experiments::Compiler::Examples::Stat
 2 import util::Math;
 3 import List;
 4
 5 real mean(list[int] s) {
 6     int sum = 0;
 7     for (int el <- s) {
 8         sum = sum + el;
 9     }
10     return (sum / 1.0) / (size(s) / 1.0);
11 }
12 real var(list[int] a) {
13     real avg = mean(a);
14     real sum = 0.0;
15     for (int el <- a) {
16         sum = sum + (((el / 1.0) - avg) * ((el / 1.0) - avg));
17     }
18     return sum / size(a);
19 }
20 real stdev(real var) = sqrt(var);
21
22 value main(list[value] args) {
23     // List of 100 ints
24     list[int] llist = [1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,
25                        1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,
26                        1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,1,2,3,4,5,6,7,8,9,10,
27                        1,2,3,4,5,6,7,8,9,10];
28     real avg = mean(llist);
29     real vari = 0.0;
30     for (int i <- [1..10000]) {
31         vari = var(llist);
32     }
33     return <avg, stdev(vari), vari>;
34 }


Benchmarking with a modified version of the Variance program, in which line 16 was duplicated in 15 subsequent runs,⁴ resulted in the speedup factor decline shown in Figure 3.2.

Figure 3.2: Speedup vs sum lines.

Additional benchmarks found the minimal speedup factor in this configuration to be 1.046. To reach this factor, line 16 needed to be duplicated 550 times; at that point the function var reached the maximum size of an RVM function. Appendix E holds detailed information about this benchmark test.

4 A duplicated line adds 14 RVM instructions, of which 6 are calls to Rascal primitives.


Chapter 4

Analysis and Discussion

4.1 Analysis

4.1.1 On translation

The functions

The translation of the RVM functions and coroutines is straightforward, and the created JVM routines still have a name that can be translated back to the original Rascal module and function. One point of concern is the size of the translated methods: the "Naive" translation variant increases the size of a function 5-fold and the "Full" variant increases the size 10-fold. Appendix C shows the translation from a Rascal function to a JVM method.

The instructions

The instructions in the translated system are more straightforward and simpler than in the interpreter: each instruction now has its own method, which is not shared with other instructions. The tandem instructions OCALL and FAILRETURN now have a more straightforward implementation. In the interpreter the FAILRETURN is also responsible for calling overloaded function alternatives and constructors. In the JVM version FAILRETURN does what it says: it returns a failed state to the caller, signaling the OCALL instruction to try alternatives.

In the case of CALL and CALLDYN there is added complexity. The presence of the stackful coroutines changed the behavior of the CALL instructions to be atypical: based on the last instruction of the called function, the CALL instruction stores an entry point, executes a JVM return, or continues with the next RVM instruction.

The speedup

Analyzing the speedup in Table 3.2, we find a maximum speedup of 7.21 in program While using the "Partial" translation variant and a minimum gain in program Variance with a speedup value of 1.04 (after modification, as described in Section 3.4.3). If we analyze the RVM interpreter, we notice that it is basically doing the following in a continuous loop: get instruction from Codeblock; decode instruction; execute instruction.

In the translated program the instruction cycle is removed. The RVM instructions are now executed as sequences of JVM method calls, and RVM jump instructions are directly handled by the JVM.

On the part of function invocation the interpreter has the advantage: after the stack creation, the instruction array and program counter are set and the function is active. The JVM version must do two JVM method calls, after the stack creation, to invoke the function.

The effect on the speedup of inlining the RVM instructions is negligible; this is caused by the inlining capabilities of the JVM. There are, however, some exceptions: the inlining of the instructions for calling the Rascal and MuRascal primitives allows a direct call to those primitives. The inlining of the Rascal and MuRascal instructions eliminates the lookup of the primitive method at runtime.

The speedup of a translated RVM function comes from the following differences with the interpreter:

1. No instruction execution cycle.
2. Arguments to instruction methods are loaded as constants; no lookup in the instruction at runtime.
3. Jumps are directly handled by the JVM.
4. Direct calls to Rascal and MuRascal primitives.
5. The JVM can JIT-compile translated RVM functions to native machine code.
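The removal of the instruction cycle (point 1) can be illustrated with a toy example (a hypothetical mini-VM, not the RVM itself): the interpreter pays for fetch and decode on every instruction, while the translated form is a plain JVM method.

```java
public class DispatchContrast {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    // Interpreter: every step pays for fetch and decode.
    static int interpret(int[] code) {
        int[] stack = new int[8];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {          // fetch + decode
            case PUSH: stack[sp++] = code[pc++]; break;
            case ADD:  stack[sp - 2] += stack[sp - 1]; sp--; break;
            case HALT: return stack[sp - 1];
            }
        }
    }

    // Translated form of PUSH 2; PUSH 3; ADD: the cycle disappears
    // and the JVM can JIT-compile the method as ordinary code.
    static int translated() {
        return 2 + 3;
    }

    public static void main(String[] args) {
        int[] prog = {PUSH, 2, PUSH, 3, ADD, HALT};
        if (interpret(prog) != 5) throw new AssertionError();
        if (translated() != 5) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The same contrast holds for the real system, except that the RVM's "instructions" become method calls and inlined bytecode rather than a single constant expression.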

4.1.2 On measuring

The measuring system gave a stable result for the speedup factor. This was achieved by creating an execution time of at least 2000 ms for each individual run of a test. Analysis of the measured values showed that the set of 30 samples, with an individual test size of at least 2000 ms, could have come from a normally distributed population. This was tested with a Kolmogorov-Smirnov[2] test, choosing α = .05. This result allows me to compare the two means of the tests within an acceptable error margin. Blowing up the individual execution time for each test gave, as an end result, a stable speedup factor within a five percent variation.

This leaves us with the question: is there more to gain? If we look at the times measured, we can see each is composed of:

t_totalExecutionTime = t_internal + t_primitiveExecutionTime

This gives us the calculation of the speedup:

s = average(t_internalInterpreter + t_primitiveExecutionTime) / average(t_internalJVM + t_primitiveExecutionTime)

This shows that the maximum speedup of an RVM function is limited by the ratio of the internal and primitive execution time.
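A worked example of this bound (with purely illustrative numbers): the same 4x improvement in internal time yields very different overall speedups depending on how much primitive time dominates.

```java
public class SpeedupBound {
    // s = (t_internalInterpreter + t_primitive) / (t_internalJVM + t_primitive)
    static double speedup(double internalInterp, double internalJvm, double primitive) {
        return (internalInterp + primitive) / (internalJvm + primitive);
    }

    public static void main(String[] args) {
        // Internal work 4x faster after translation, little primitive time:
        System.out.println(speedup(800, 200, 100));   // prints 3.0
        // Same internal gain, but primitive calls dominate:
        System.out.println(speedup(800, 200, 4000));  // ≈ 1.14
    }
}
```

As t_primitiveExecutionTime grows relative to the internal time, the ratio approaches 1, which matches the Variance line-duplication experiment in Section 3.4.3.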

4.2 Discussion

Designing and building software can easily generate a lot of dissension: multiple solutions are possible based on the different knowledge and preferences of the designer/developer. The design decisions in Chapter 2 led to implementation or were discarded by newly acquired insights.

DD.1

The refactoring and separation of the RVM object into three separate objects created an interpreter which was faster than the original. The move from a switch-case to individual methods, together with the de-entanglement of the instructions, simplified the code of the instructions considerably. The choice of creating one object containing all functions from all Rascal modules limits the scalability of the system. This limitation in scalability is caused by the size of the generated class file and the maximum number of methods allowed in a class file. Creating a separate class file for each Rascal module would create a scalable and pluggable system allowing demand loading of pre-compiled modules at runtime.

DD.2


led to a double implementation of a number of instructions, forcing an ad-hoc naming convention to keep them apart. On the other hand, it created an interleaved testing environment with the same conditions for the interpreter and the mapped system. The integration of the mapped system in the interpreter facilitated a platform which gave access to differential debugging.

DD.3, DD.4

Although the mapping of a function or coroutine could easily have been done with a single custom stack implementation, the use of a secondary stack to communicate between functions removed the need for inlining extra instructions to store results in the upstream stack frame. The decision about when to create/delete stack frames was just common sense.

DD.5

The decision to avoid reflection or MethodHandles looks wise, and is also possible in a system with multiple Rascal modules mapped to objects, with minor modifications. The only drawback is caused by the fact that JVM methods have a maximum size, which limits the size of the dispatcher function to approximately 6550 entries.

DD.6

The creation of functions for the handling of overloaded function calling seemed a good idea at the start; changes in the stack frame creation for overloaded functions in the RVM, during this project, forced a new implementation using a dynamic call. This new implementation introduces a small overhead in calling the alternatives but is the preferred solution in a still-changing system. Another advantage of this new pure Java based solution is that it allows debugging and easier adaptation to changes of the system.

DD.7, DD.8, DD.11

The mapping of individual RVM instructions by combining them or splitting them up was not implemented. The choice to use the existing done and finalize methods made it extremely easy to generate the mapped functions and instructions, but made it impossible to combine or rewrite instructions into other sequences. Some instructions were inlined from the start based on profile information from the interpreter; at the time this looked like a good idea, but it totally ignored the inlining capabilities of the JVM.

DD.9

Coding the generator in a single object allows for an easy migration to a different runtime backend like JavaScript or LLVM.

DD.10

The current system, where Rascal is integrated in the Eclipse workbench, is nearly impossible to profile; the gathered information is far from complete and useless for analysis. The benchmarking information is a little better and accurate enough to make statements like "system A is faster than system B".


Chapter 5

Conclusions and Future Work

5.1

Conclusions

The speedup of a Rascal program translated to the JVM depends on the instruction mix within the program. Rascal programs with a relatively high percentage of primitive calls will benefit less than others. Further optimization of the RVM will improve the turnaround time of the system, but the gain will remain limited by the ratio between primitive calls and translated RVM instructions. The effect of inlining RVM instructions is negligible and is better left to the JVM.

The performance improvement of the selected benchmarks does not predict the performance improvement in realistic applications. In my opinion, investing in the system for translating RVM programs is wise: a performance gain of at least 1.20 to 1.30 is to be expected. A final reason to invest in the translator of RVM programs is that it will make it possible to create standalone Rascal programs contained in a class file.

The construction of primitives for handling all types and their operations is flexible, but it limits the maximum performance gain possible in RVM functions.

A redesign of how basic types are handled, closer to the types supported by the JVM, would give access to fast JVM bytecode instructions for manipulating them.

Measuring JVM performance is known to be non-trivial, and the measuring method produced different results on each test. Multiple runs of at least 2 seconds each produced calculated speedup results that were stable and varied by less than five percent.
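The measurement discipline described above can be sketched as follows (the workload and the thresholds are illustrative, not the actual benchmark code):

```java
public class Bench {
    // Stand-in workload; in the real measurements this would be a
    // translated RVM program.
    static long work() {
        long acc = 0;
        for (int i = 0; i < 1_000_000; i++) acc += i;
        return acc;
    }

    // Run the workload repeatedly for at least minMillis and return the
    // mean time per iteration in milliseconds.
    static double timedRun(long minMillis) {
        long start = System.nanoTime();
        long iterations = 0;
        while (System.nanoTime() - start < minMillis * 1_000_000L) {
            work();
            iterations++;
        }
        return (System.nanoTime() - start) / 1e6 / iterations;
    }

    public static void main(String[] args) {
        // Warm-up runs let the JIT compile the hot code before measuring.
        for (int i = 0; i < 3; i++) timedRun(500);
        // Measured runs of at least 2 seconds each, as described above.
        for (int i = 0; i < 5; i++) {
            System.out.printf("%.4f ms/iteration%n", timedRun(2000));
        }
    }
}
```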

Regarding maintainability, the entanglement of instructions in the interpreter makes the system hard to maintain; splitting up the instructions will improve the quality of the software and will have a minor positive impact on performance.

5.2

Future Work

The translation is not entirely finished. To create standalone Rascal programs, the following work still has to be done:

• Implement exception handling.

• Serialize the functionStore, overloadedFunctionStore, typeStore, and constantStore.

After a complete translation from RVM to JVM, the following topics are worth investigating further:

• Explore an alternative runtime configuration where Rascal modules are translated to individual class files instead of one file containing all Rascal modules and functions.

• Explore the possibility of moving Rascal and muRascal primitives into RVM instructions or directly called methods, and possibly design optimized instructions that combine often-occurring bytecode patterns.

• Evaluate the RVM instruction set: some instructions are complex and could easily be split into multiple smaller instructions, while others could be combined into new ones.


Bibliography

[1] Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. The Java Virtual Machine Specification, Java SE 7 Edition. Oracle Co., Inc., Oracle Parkway, Redwood City, California, 2nd edition, 2011.

[2] Andy Field. Discovering Statistics Using SPSS. SAGE Publications, 2005.

[3] Andy Georges, Dries Buytaert, and Lieven Eeckhout. Statistically rigorous Java performance evaluation. SIGPLAN Not., 42(10):57–76, October 2007.

[4] Supporting dynamically typed languages on the Java platform. https://jcp.org/en/jsr/detail?id=292, May 2014.

[5] Anastasia Izmaylova and Paul Klint. Guarded coroutines for language implementation. Unpublished, 2013.

[6] Paul Klint, Tijs van der Storm, and Jurgen Vinju. Rascal: a domain specific language for source code analysis and manipulation. In Proceedings of 9th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM’09). IEEE, 2009.

[7] John R. Rose. Bytecodes meet combinators: Invokedynamic on the JVM. In Proceedings of the Third Workshop on Virtual Machines and Intermediate Languages, VMIL ’09, pages 2:1–2:11, New York, NY, USA, 2009. ACM.

[8] D. Spinellis. Differential debugging. IEEE Software, 30(5):19–21, 2013.

[9] Sriram Srinivasan. A thread of one's own. University of Cambridge Computer Laboratory, 2006.

[10] Christopher Strachey. Fundamental concepts in programming languages. Higher Order Symbol. Comput., 13(1-2):11–49, April 2000.


Appendix A

Rascal virtual machine specification

This specification was created by reverse engineering the code of the RVM interpreter.

A.1

Primitive types

All the primitive types in the RVM are Java objects; the RVM itself allows all types of objects to be stored and manipulated on its stack. The set of objects it knows is rather limited and is given in the following list:

• Boolean

• Object[]

• IList

• Integer

• IValue

• Type

• Constructor

• Reference

• FunctionInstance

• OverloadedFunctionInstance

• Throw

• Coroutine

Other types of objects, like sets, maps, and relations, are allowed on the RVM stack; they can be moved or copied by the RVM instructions, but only the Rascal primitives can manipulate them.

A.2

The stores

The RVM runtime environment relies on four data sets called stores. Each of these stores holds specific information.

functionStore

Implemented as a Java ArrayList, it holds all functions, closures, and nested functions in the system and is queried for the creation of stack frames and functionInstances.

overloadedStore


constructorStore

The constructor store is implemented as an ArrayList of Type.

typeStore

This class manages type declarations. It stores declarations of annotations, type aliases and data-type constructors.

A.2.1

Function

The function object holds all information about a particular function: the number of parameters and their types, the return type, name, function id, scope id, and stack frame size, and, in the case of closures or nested functions, the scope of the function it is nested in.

final public String name;
public int funId;
final Type ftype;
public int scopeId;
public String funIn;
public int scopeIn = -1;
public final int nformals;
public final int nlocals;
public final int maxstack;
final CodeBlock codeblock;
public int continuationPoints = 0;
public IValue[] constantStore;
public Type[] typeConstantStore;
int[] froms;
int[] tos;
int[] types;
int[] handlers;
boolean isCoroutine = false;
int[] refs;

A.2.2

OverloadedFunction

An overloadedFunction consists of three parts: the set of function alternatives, the set of constructors in case there is no match on the parameters, and finally the identification of the scope in which the overloaded function is active (nested).

final int[] functions;
final int[] constructors;
final String funIn;

A.2.3

Constructor

A constructor is used to create values for user-defined datatypes (Algebraic Datatypes).

A.2.4

Type

This class is the abstract implementation for all types. Types are ordered in a partially ordered type hierarchy with ’value’ as the largest type and ’void’ as the smallest. Each type represents a set of values.

A.3

Instruction specification

A.3.1

Instruction format

Instructions are 32 bits wide and incorporate two arguments: arg1 is 13 bits wide and arg2 is 12 bits wide.
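A sketch of this encoding (the placement of the opcode in the remaining high bits is an assumption made here; the RVM's actual bit layout may differ):

```java
public class Instruction {
    static final int ARG1_BITS = 13, ARG2_BITS = 12; // 7 bits remain for the opcode

    // Pack opcode, arg1, and arg2 into one 32-bit instruction word.
    static int encode(int opcode, int arg1, int arg2) {
        return (opcode << (ARG1_BITS + ARG2_BITS)) | (arg1 << ARG2_BITS) | arg2;
    }

    // Decode the three fields by shifting and masking.
    static int opcode(int instr) { return instr >>> (ARG1_BITS + ARG2_BITS); }
    static int arg1(int instr)   { return (instr >>> ARG2_BITS) & ((1 << ARG1_BITS) - 1); }
    static int arg2(int instr)   { return instr & ((1 << ARG2_BITS) - 1); }

    public static void main(String[] args) {
        int instr = encode(5, 1000, 42);
        System.out.println(opcode(instr) + " " + arg1(instr) + " " + arg2(instr));
        // prints: 5 1000 42
    }
}
```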
