
based on the formation of rewrite rules. The application operator is explicit, which allows sets of results to be handled explicitly.
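To illustrate the idea of explicit application with sets of results, the following is a hypothetical sketch in Python (not the actual ρ-calculus workbench or interpreter): applying a rule to a term yields the set of all rewrite results, and failure is simply the empty set rather than an error.

```python
def match(pat, term, subst):
    """All substitutions extending `subst` under which `pat` equals `term`.
    Variables are capitalized strings; terms are nested tuples or constants."""
    if isinstance(pat, str) and pat[:1].isupper():      # pattern variable
        if pat in subst:
            return [subst] if subst[pat] == term else []
        return [dict(subst, **{pat: term})]
    if (isinstance(pat, tuple) and isinstance(term, tuple)
            and len(pat) == len(term) and pat[0] == term[0]):
        substs = [subst]
        for p, t in zip(pat[1:], term[1:]):             # match argument-wise
            substs = [s2 for s in substs for s2 in match(p, t, s)]
        return substs
    return [subst] if pat == term else []               # constants

def instantiate(rhs, subst):
    """Replace variables in `rhs` by their bindings from `subst`."""
    if isinstance(rhs, str) and rhs[:1].isupper():
        return subst[rhs]
    if isinstance(rhs, tuple):
        return (rhs[0],) + tuple(instantiate(a, subst) for a in rhs[1:])
    return rhs

def apply_rule(lhs, rhs, term):
    """Explicit application of the rule lhs -> rhs to term: the result is
    the SET of rewrite results; a failed match is the empty set."""
    return {instantiate(rhs, s) for s in match(lhs, term, {})}
```

For example, applying `f(X) -> g(X)` to `f(a)` gives the singleton set `{g(a)}`, while applying it to `h(a)` gives the empty set: failure is an ordinary value that further rules can inspect.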

The ρ-calculus is a relatively new rewriting formalism that can benefit from the Meta-Environment. We have prototyped a workbench for the complete ρ-calculus.

After that, we connected an existing ρ-calculus interpreter. This experiment was realized in one day.

The JITty interpreter [137] is a part of the µCRL [18] tool set. In this tool set it is used as an execution mechanism for rewrite rules. JITty does not have its own formalism or a specialized environment. However, the ideas of the JITty interpreter are more generally applicable. It implements an interesting normalization strategy, the so-called just-in-time strategy. A workbench for the JITty interpreter was developed in a few hours, allowing us to perform experiments with it.
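The following is a toy Python sketch of the *idea* behind a just-in-time strategy, not the actual JITty implementation: an argument is only normalized when the rewrite rules for the enclosing symbol need to inspect it, so here the untaken branch of an `if` is never rewritten. The term representation and step counter are ours, for illustration only.

```python
STEPS = [0]  # rewrite-step counter, to compare the two strategies

def reduce_root(t):
    """Apply one rewrite rule at the root, if one matches."""
    if isinstance(t, tuple):
        sym, *a = t
        if sym == 'plus':                  # plus(n, m) -> n + m
            STEPS[0] += 1; return a[0] + a[1]
        if sym == 'eq':                    # eq(n, n) -> true, else false
            STEPS[0] += 1; return 'true' if a[0] == a[1] else 'false'
        if sym == 'if':                    # if(true, x, y) -> x, etc.
            STEPS[0] += 1; return a[1] if a[0] == 'true' else a[2]
    return t

def innermost(t):
    """Classic innermost strategy: normalize every argument first."""
    if not isinstance(t, tuple):
        return t
    return reduce_root((t[0],) + tuple(innermost(a) for a in t[1:]))

def jit(t):
    """Just-in-time sketch: for 'if', only the condition is normalized
    before rule selection, so the untaken branch is never rewritten."""
    if not isinstance(t, tuple):
        return t
    if t[0] == 'if':
        return jit(reduce_root(('if', jit(t[1]), t[2], t[3])))
    return reduce_root((t[0],) + tuple(jit(a) for a in t[1:]))

term = ('if', ('eq', 1, 1), ('plus', 2, 3),
        ('plus', ('plus', 1, 1), ('plus', 1, 1)))
STEPS[0] = 0; innermost_result, innermost_steps = innermost(term), STEPS[0]
STEPS[0] = 0; jit_result, jit_steps = jit(term), STEPS[0]
```

Both strategies reach the same normal form (5), but the just-in-time version needs 3 rewrite steps instead of 6, because it never normalizes the `else` branch.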

2.6 Conclusions

Experiments with and applications of term rewriting engines come within much closer reach using the Meta-Environment than when designing and engineering a new formalism from scratch.

We have presented a generic approach for rapidly developing the three major ingredients of a term rewriting based formalism: syntax, rewriting, and an environment.

Using the scalable technology of the Meta-Environment significantly reduces the effort to develop them, by reusing components and generating others. This conclusion is based on practical experience with building environments for term rewriting languages other than ASF+SDF. We used our approach to build four environments. Two of them are actively used by their respective communities. The others serve as workbenches for new developments in term rewriting.

The Meta-Environment and its components can now support several term rewriting formalisms. A future step is to build environments for languages like Action Semantics [117] and TOM [127].

Apart from more environments, other future work consists of even further parameterization and modularization of the Meta-Environment. Making the Meta-Environment open to different syntax definition formalisms is an example.

Part II

Parsing and disambiguation of

source code

C H A P T E R 3

Disambiguation Filters for Scannerless Generalized LR Parsers

In this chapter we present the fusion of generalized LR parsing and scannerless parsing. This combination supports syntax definitions in which all aspects (lexical and context-free) of the syntax of a language are defined explicitly in one formalism. Furthermore, there are no restrictions on the class of grammars, thus allowing a natural syntax tree structure. Ambiguities that arise through the use of unrestricted grammars are handled by explicit disambiguation constructs, instead of implicit defaults that are taken by traditional scanner and parser generators. Hence, a syntax definition becomes a full declarative description of a language. Scannerless generalized LR parsing is a viable technique that has been applied in various industrial and academic projects.1

3.1 Introduction

Since the introduction of efficient deterministic parsing techniques, parsing has been considered a closed topic for research, both by computer scientists and by practitioners in compiler construction. Tools based on deterministic parsing algorithms such as LEX & YACC [120, 92] (LALR) and JAVACC (recursive descent) are considered adequate for dealing with almost all modern (programming) languages. However, the development of more powerful parsing techniques is prompted by domains such as reverse engineering, domain-specific languages, and languages based on user-definable mixfix syntax.

The field of reverse engineering is concerned with automatically analyzing legacy software and producing specifications, documentation, or re-implementations. This area provides numerous examples of parsing problems that can only be tackled by using powerful parsing techniques.

1 I co-authored this chapter with Mark van den Brand, Jeroen Scheerder and Eelco Visser. It was published in CC 2002 [46].

Grammars of languages such as Cobol, PL1, Fortran, etc. are not naturally LALR. Much massaging and default resolution of conflicts are needed to implement a parser for these languages in YACC. Maintenance of such massaged grammars is a pain, since changing or adding a few productions can lead to new conflicts. This problem is aggravated when different dialects need to be supported—many vendors implement their own Cobol dialect. Since grammar formalisms are not modular, this usually leads to forking of grammars. Further trouble is caused by the embedding of ‘foreign’ language fragments, e.g., assembler code, SQL, CICS, or C, which is common practice in Cobol programs. Merging of grammars for several languages leads to conflicts at the context-free grammar level and at the lexical analysis level. These are just a few examples of problems encountered with deterministic parsing techniques.

The need to tackle such problems in the area of reverse engineering has led to a revival of generalized parsing algorithms such as Earley’s algorithm, (variants of) Tomita’s algorithm (GLR) [116, 149, 138, 8, 141], and even recursive descent backtrack parsing [59]. Although generalized parsing solves several problems in this area, generalized parsing alone is not enough.
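What a generalized parser delivers can be shown with a deliberately naive Python enumerator (real GLR shares work in a compact parse forest and is far more efficient): for the ambiguous grammar `E -> E '+' E | 'a'`, it produces *every* parse tree instead of failing with a shift/reduce conflict at parser-generation time.

```python
def parses(s, i, j):
    """All parse trees of E over the token slice s[i:j], for the
    ambiguous grammar  E -> E '+' E | 'a'."""
    trees = []
    if j - i == 1 and s[i] == 'a':              # E -> 'a'
        trees.append('a')
    for k in range(i + 1, j - 1):               # E -> E '+' E, split at '+'
        if s[k] == '+':
            for left in parses(s, i, k):
                for right in parses(s, k + 1, j):
                    trees.append(('+', left, right))
    return trees

tokens = list('a+a+a')
forest = parses(tokens, 0, len(tokens))
# 'a+a+a' has two parses: the left- and the right-associated tree.
```

An explicit disambiguation construct, such as declaring `+` left-associative, would then act as a filter that keeps exactly one of these trees; the number of trees grows quickly (Catalan numbers) with longer inputs, which is why real implementations work on a shared forest.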

In this chapter we describe the benefits and the practical applicability of scannerless generalized LR parsing. In Section 3.2 we discuss the merits of scannerless parsing and generalized parsing and argue that their combination provides a solution for problems like the ones described above. In Section 3.3 we describe how disambiguation can be separated from grammar structure, thus allowing a natural grammar structure and declarative and selective specification of disambiguation. In Section 3.4 we discuss issues in the implementation of disambiguation. In Section 3.5 practical experience with the parsing technique is discussed. In Section 3.6 we present figures on the performance of our implementation of a scannerless generalized parser. Related work is discussed where needed throughout the chapter. Section 3.7 contains a discussion in which we focus explicitly on the difference between backtracking and GLR parsing and the usefulness of scannerless parsing. Finally, we conclude in Section 3.8.