Towards a Catalogue of Grammar Smells


Mats Stijlaart

Student nr. 11152990 mats.stijlaart@student.uva.nl

August 29, 2017, 99 pages

Research supervisor: dr. Vadim Zaytsev, vadim@raincodelabs.com
Host organisation: University of Amsterdam (UvA)

Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering

Abstract

Software engineers have the notion of code smells to reason about code, discuss code with colleagues, and identify refactoring opportunities. In grammar engineering, the art of working with grammars and grammar-dependent software, there is no comparable notion. Since software engineering is a complex task, software engineers require tools to reduce mistakes and to avoid structures that hinder the evolution of software. The same hindrances to evolution occur in grammar engineering. Some deficiencies are resolved by technology that parses and interprets grammars, but this does not change the fact that it is human beings who have the task of engineering grammars: engineers may have to select, rewrite, or make incremental changes to grammars. In this work we analysed the original definition of code smells by Fowler and constructed a catalogue of 14 grammar smells. We tested these smells against a corpus of 499 grammars and evaluated the effect of the detected violations on the subject grammars. For 12 of these smells we found significant results showing that the grammars in the corpus do contain the deficiencies the smell descriptions warn the engineer about. For the other two smells, we did not detect valid violations within the corpus. For 10 smells we identified that the readability or maintainability of the grammars improves when the violations are resolved. With the catalogue of grammar smells we propose, we extend the field of grammar engineering and widen the vocabulary of grammar engineers. This benefits the process of grammar engineering and will result in more readable and maintainable grammars, and a more professional discipline.


Contents

1 Introduction
1.1 Motivation
1.2 Research Question
1.3 Research Method
1.4 Related Work
1.5 Contribution
1.6 Outline
2 Smells
3 Grammars
3.1 Grammar Terminology
3.2 BGF
3.3 Capabilities
3.4 Definition
3.5 Structure
3.6 Ignored Grammar properties
4 Catalogue
4.1 Smell Template
4.2 Distant Levels
4.3 Unsequence
4.4 Proxy Nonterminal
4.5 Duplication
4.6 Foreign Names
4.7 Improper Responsibility
4.8 Mixed Definitions
4.9 Legacy Structures
4.10 Unclear Entry Point
4.11 Dead Code
4.12 Single List Thingy
4.13 Explicit Ambiguity
4.14 Scattered Nonterminal
4.15 Massagable
4.16 Additional Formal Definitions
4.16.1 Production Rules for Nonterminal
4.16.2 Index
4.16.3 Nonterminals of Expression (nonterminals)
4.16.4 Does Refer
4.16.5 Referred Nonterminals (references)
4.16.6 Tops (tops)
4.16.7 Grammar Expressions
4.17 Intentionally omitted properties
4.18 Fowler Smells which were not mapped
6 Implementation
7 Results
7.1 Distant Levels
7.2 Unsequence
7.3 Proxy Nonterminal
7.3.1 Single nonterminal production rules
7.3.2 Single Used Nonterminals
7.3.3 Summary of Proxy Nonterminal
7.4 Duplication
7.4.1 Duplicate Rules
7.4.2 Known Subexpression
7.4.3 Summary of Duplication
7.5 Foreign Names
7.6 Improper Responsibility
7.7 Mixed Definitions
7.8 Legacy Structures
7.9 Unclear Entry Point
7.9.1 Multiple Top Nonterminals and/or Start Symbols
7.9.2 No Start Symbol or Top Nonterminals
7.10 Dead Code
7.11 Single List Thingy
7.12 Explicit Ambiguity
7.13 Scattered Nonterminal
7.14 Massagable
8 Discussion
8.1 Research Question 1
8.2 Research Question 2
8.3 Research Question 3
8.4 Threats to Validity
9 Conclusion
9.1 Future Work
Acronyms
Appendices
A Inspected Files for Distant Levels
B Inspected Files for Unsequence
C Files without tops
D Inspected Files for Disconnected Nonterminal Graph
E Unclear Entry Point
E.1 Full Top/Start Files
E.2 2 Tops/Starts and Fully Connected Grammars
F Proxy Nonterminal
F.1 Single Nonterminal Production Rules


1 Introduction

There is no standard model to determine the quality of grammars. Zaytsev introduced a Grammar Maturity Model, which specifies to what extent a grammar can be used, regardless of whether the grammar actually “works” [60]. Others have recorded their findings on improving the quality of (recovered) grammars [3, 29, 47]. The applied techniques are intended to make the subject grammars complete, quantify completeness, eliminate deficiencies, or make the grammars readable and maintainable. The authors share their findings on these techniques but do not generalise them into a reusable framework for assessing quality in the grammar engineering discipline.

Determining the quality would be beneficial in the following tasks of a grammar engineer:

Selecting grammars For research or software engineering purposes, one may want to select a grammar from a set. For example, Fischer et al. selected grammars to analyse to what extent these grammars produced the same parse tree [16]. By discriminating grammars based on their quality, the variety of the selected grammars can be controlled.

Additionally, if one has to select a grammar from a set of apparently equal grammars, quality may ultimately be the decisive factor.

Evolving grammars Grammars are often defined in multiple iterations [3, 47, 51]. Just like software, features and ideas are added incrementally to the final artefact. An engineer may affect the quality of the grammar by applying changes to it, for example by introducing duplication or unnecessarily complex patterns. This effect may prevail especially in large teams, where authors have little knowledge of the code that others have written. Quality analysis may stop such changes from being applied, or make the authors of the grammar aware of their impact.

Recovering grammars If grammars are automatically recovered from source code, the ultimate artefact should be a usable, but also a maintainable grammar. Automatic quality assessment may help in this process. For example, Höschele and Zeller are working on this kind of grammar recovery [22].

1.1 Motivation

Klint et al. have laid out an agenda for grammarware engineering and state that “The underlying goal is to improve the quality of grammarware” [27]. They also identified that much ground has to be covered in this field to increase productivity, increase quality, and promote research.

Over the years there have been multiple projects that worked on measuring grammars [12, 43, 44, 60], analysing grammars [8, 16, 28, 30, 31, 59, 61–63], developing and evolving grammars [3, 22, 31, 47, 51], and refactoring grammars [3, 64, 65].

In this body of work, there has been little focus on the assessment of grammar quality by a grammar engineer.

Code smell detection is a well-known assessment method to identify refactoring candidates in source code. The concept has been mapped to many domains (usability [2], Aspect Oriented Programming [35], spreadsheets [20], and more), but never to the domain of grammars. A set of Grammar Smells is a tool to assess grammars, which enables refactoring, which in turn improves aspects such as maintainability and readability. We consider having such a tool beneficial.

Over the years, research has yielded mixed results on the effect of smells on the maintainability of infected code. Studies show that maintenance effort is not affected [49], increases [39], or decreases [38] on infected units of code. Aside from this, the motivation for having this new tool outweighed the need to first be certain that these Grammar Smells are effective. Besides, the concept of smells is mapped here to a new domain: smells may affect the maintainability of grammars differently than they affect the maintainability of code. The work in this thesis should support the capability to improve and maintain the quality of grammars, and to advance the field of grammar engineering.

1.2 Research Question

To a certain extent, the research in this thesis is composed of three parts.

RQ1: What characteristic properties are comprised by code smells?

RQ2: What set of Grammar Smells can be constructed by mapping known code smells to the domain of Context Free Grammars (CFGs) where we support the mapping with practical grammar engineering principles?

RQ3: To what extent do the violations of Grammar Smells yield valid results, in the sense of improving grammars while preserving the characteristic properties that code smells embed?

1.3 Research Method

We will analyse existing code smell literature, as well as literature that maps the concept of code smells to other domains. From this, we will derive the key characteristics of smells (Section 2), providing the properties for RQ1.

After this, we will adopt an approach comparable to the one applied in “Micro Patterns in Java Code” [19]. First, a set of proposed smells is curated (Section 4), which will provide an answer to RQ2; secondly, the set of smells is inspected via an experiment that will provide statistical information about the presence of the proposed smells in the corpus of grammars available through the Grammar Zoo [61] (RQ3). Finally, the smells will be evaluated and presented as a final set. The smells will be empirically evaluated and assessed based on the properties resulting from RQ1.

1.4 Related Work

“Refactoring: Improving the Design of Existing Code” [18] Fowler and Beck wrote a book on refactoring in which they included a chapter on code smells. Since then, smells have been an extensively researched topic.

The smells were originally defined for the Object Oriented (OO) paradigm, and the list of smells defined by the authors respects the modularity of OO: they refer, for example, to classes, instances, methods, inheritance, and global and local variables. There is no direct mapping from the OO paradigm to grammars, but we believe the motivations that led to the code smells may also apply to grammars. For example, Fowler defines smells related to the composition of data; since grammars (at least as we will treat them) contain no data or state, such smells may not map. On the other hand, the concept of a method being a sequence of invoked methods may be mapped to a sequence of derived nonterminals in a production rule.

The smells defined by Fowler are widely applied to different programming languages [15, 54] and even other domains [2]. In this work, we will try to relate the smells defined by Fowler to grammars (Section 4).

“Development, Assessment, and Reengineering of Language Descriptions” [47] Sellink and Verhoef's work focused on “tools that aid in the development, assessment and reengineering of language descriptions” [47, p. 151]. Their work is closely related to smells, as their tools give “indications as to what is wrong” and “hints towards correction”. This is close to the ‘local’ characteristic defined in Section 2, and the claim is reflected in the assessment tools they present. For example, list-redundant-rules prints duplicate production rules, identifying local problems such as ambiguity. Other tools, such as top and bottom nonterminals, also show local consequences, but their focus is on the ‘completeness’ or ‘progress’ of the grammar engineering process, rather than on humans recognising patterns whose removal would increase maintainability.

The paper does encapsulate identifications of smells in grammars. During the recovery of a grammar, Sellink and Verhoef encountered different problems that could be traced back to topics such as naming convention, overly complex structures, duplication, and inconsistent definition of production rules. This work thus shows that certain grammar structures should be avoided if possible, and it presents alternatives that are less error-prone.

Sellink and Verhoef applied their research to a single grammar in a highly specific context (mapping a Backus-Naur Form (BNF) grammar to the Syntax Definition Formalism (SDF)). In our work, we generalise the grammars, such that we can apply our smell detection over a larger set. In addition, we will try to generalise the issues to the concept of smells.

“A Case Study in Grammar Engineering” [3] Alves and Visser discuss the process of iterative grammar engineering. During their study, they applied techniques such as grammar metrics, unit testing and test coverage analysis.

The authors state that “Grammars for sizeable languages are not created instantaneously, but through a prolonged, resource consuming process”. This supports the argument that grammars do change over time and are non-static artefacts [27].

The authors “adopted, adapted and extended the suite of metrics” defined by Power and Malloy. With these ‘new’ metrics, the iterative development of an SDF grammar is inspected [55]. SDF is a syntax definition formalism that allows modular syntax definition for scannerless generalised LR parsing [7].

The last phase in their iterative development focused on refactoring: “changing the shape may make the description more concise, easier to understand, or it may enable subsequent correction, extensions, or restrictions” [3, p. 288].

We may assume that the changes applied to the grammar in this phase are in line with this goal, implying that the grammar's quality increases. We will apply the same iterative approach when we assess the elimination of smell violations. However, we will deviate from Alves and Visser's use of metrics to identify these properties. For example, splitting the production rule for a nonterminal into two new nonterminals will likely make the grammar lengthier, but more understandable. Since our focus is on identifying parts of the grammar that could hinder its maintainability for an engineer, we will assess whether resolving the violations improves maintainability.

Code smells mapped to other domains There have been different studies with a somewhat similar approach. For example, studies suggested smells for domains like spreadsheets [20], Python [54], JavaScript [15] and usability [2], in which the original set of smells suggested by Fowler is mapped onto the subject domain. Sometimes smells are added to complement the mapped smells within the domain.

We will relate to these studies as we try to apply the same method to the domain of grammars.

“Micropatterns in Grammars” [63] Zaytsev presents various micropatterns for grammars and shows to what extent these are used in the corpus of his research. His research does not classify the micropatterns as good or bad, or as anti-patterns; it is suggested that the presence of (different) patterns can be taken into account before applying transformations.

One way to identify a smell is with a metric-based approach, which is an actively applied method [15, 52]. In Zaytsev's work, extensive pattern matching is applied to detect the prevalence of each pattern. This kind of approach applies to this thesis as well, for the automatic detection of smells.

The setup of Zaytsev's work is quite interesting. To curate the list of micropatterns, the prevalence of a set of candidate micropatterns over a corpus is used. Prior to the experiment, “all possible combinations of metaconstructs were considered and tried on a corpus of grammars” [63, p. 120]. This resulted in a somewhat complete set of patterns¹.

We will adopt the detection mechanism that Zaytsev applied, but will not use the prevalence properties that he used as indicators. For the concept of smells, prevalence is interesting for detecting how deeply the smells are incorporated within a grammar. Aside from this, in this work we are interested in the individual assessment of smell violations, since that shows how, and whether, a smell affects the grammar.
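As a sketch of what prevalence-based curation looks like, the following hypothetical snippet counts, for each candidate pattern, in how many grammars of a corpus it occurs. The corpus representation and the detector names are ours, purely for illustration; they are not taken from Zaytsev's work.

```python
from collections import Counter

def prevalence(corpus, detectors):
    # Count, per candidate pattern, in how many grammars of the corpus it
    # occurs at least once -- the indicator used to curate a pattern set.
    counts = Counter()
    for grammar in corpus:
        for name, detect in detectors.items():
            if detect(grammar):
                counts[name] += 1
    return counts

# Hypothetical corpus: each grammar reduced to the set of metasymbols it uses.
corpus = [{'star', 'choice'}, {'choice'}, {'star'}]
detectors = {'uses-star': lambda g: 'star' in g,
             'uses-choice': lambda g: 'choice' in g}

assert prevalence(corpus, detectors) == {'uses-star': 2, 'uses-choice': 2}
```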

1.5 Contribution

¹ It is not stated to what extent the combinations were considered.

With this research we make the following contributions:


• We provide an extensive analysis of the characteristics of code smells, something we did not encounter in prior research.

• A catalogue of 14 proposed grammar smells that apply to Context Free Grammars (CFGs). These smells are accompanied by a description and a formalisation to detect violations of these smells (which could be applied for automatic detection).

• An empirical evaluation of the proposed grammar smells with examples where the smells are applicable or not.

• A Rascal code base containing the implementation of the smell formalisation².

1.6 Outline

Section 2 introduces the concept of smells and iterates over the identified properties that form the frame on which the catalogue is built. Section 3 sets the scope for grammars: due to the high variety of grammars, capabilities and technologies, we limit the scope to Context Free Grammars (CFGs). Section 4 contains the smells constructed on the basis of the previous chapters. The corpus used in the experiment is described in Section 5, and the actual implementation of the experiment is described in Section 6. This is followed by the results and evaluation of the defined smells (Section 7), and the threats to validity (Section 8.4).


2 Smells

Smells, originally coined by Martin Fowler as ‘code smells’, are symptoms within a system indicating that not all is ‘right’. These smells “aren’t inherently bad on their own — they are often an indicator of a problem rather than the problem themselves” [11]. However, these smells do not result in non-functioning software, but they do affect its evolution. For software to be evolved or maintained, one must understand the intent of the code's authors to make sense of it.

Fowler and Beck “give you indications” of different smells for OO programming [18, p. 75]. From their definition (and other sources [2, 15, 20, 26, 34, 41, 46]) we can deduce characteristic properties that provide an answer to RQ1. For this, we analysed the previously cited articles and the book written by Fowler.

Fuzzy The definitions that Fowler provides are somewhere between vague and clear. For example, “if you see the same code structure in more than one place, you can be sure that your program will be better if you find a way to unify them” [18, p. 76]. This statement is not clear about the size of the code structure: having ‘int i = 0’ in multiple locations in the code base is likely not identified as duplication. Fowler leaves the judgement of size to the engineer's common sense. Additionally, he states that “you can be sure that your program will be better”, but without stating the criteria for how or why the program becomes better.

This fuzziness is intentional, as they state that “one thing we won’t try to do here is give you precise criteria for when a refactoring is overdue” [18, p. 75]. In the context of smells, we believe, as Fowler does, that this fuzziness is beneficial for a human being applying smells. As also described under Recognisable by eye and Threshold of annoyance, smells should be recognisable and depend on a context. Conveying the message of why a smell should be eliminated is therefore more important than the exact criteria for when the elimination has to be applied. Munro observed this as well, stating that a code smell “lacks enough information that then makes it difficult to identify automatically” [36]. This lack of information leaves room for interpretation (Munro even names his paragraphs Interpretation).

This characteristic reappears in the other papers that define smells, which either adopt and adapt the smells defined by Fowler [2, 15, 41, 54], recognise this property [20, 26], or define new smells [34]. We see that informal definitions of smells are often mapped to a formal definition for detection. In this mapping, authors take a stance on which metrics or patterns represent the smell. For example, in the work of Munro we see a metric-based approach (‘if x is exceeded or y is less than z’) [36], while in the work of Fard and Mesbah a pattern-based approach prevails (‘when we encounter this thing, it is a smell’).
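The two detection styles can be contrasted in a small sketch. The toy grammar representation, the metric, the threshold, and the proxy-like pattern below are illustrative assumptions of ours, not definitions taken from the cited works.

```python
# Toy grammar: each nonterminal maps to its right-hand side, a list of symbols.
toy_grammar = {
    'stmt':  ['expr', ';'],
    'expr':  ['term', '+', 'term', '-', 'term', '*', 'term', '/', 'term'],
    'alias': ['expr'],
}

def metric_based(grammar, max_rhs_length=4):
    # Metric-based ('if x is exceeded'): flag rules whose right-hand side
    # exceeds a user-chosen threshold.
    return [nt for nt, rhs in grammar.items() if len(rhs) > max_rhs_length]

def pattern_based(grammar):
    # Pattern-based ('when we encounter this thing'): flag rules that merely
    # forward to a single other nonterminal (a proxy-like shape).
    return [nt for nt, rhs in grammar.items()
            if len(rhs) == 1 and rhs[0] in grammar]

assert metric_based(toy_grammar) == ['expr']
assert pattern_based(toy_grammar) == ['alias']
```

Both detectors encode the same informal idea (an overly long rule, an indirection rule) but take opposite stances: the metric detector is tunable and fuzzy at its boundary, while the pattern detector is binary.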

Recognisable by eye Although the smells identified by Fowler could all be detected automatically¹, all are described in such a manner that a software engineer can detect them by eye. An example of how a smell measurement should not be defined is: ‘if the average number of variables per function is greater than the average number of classes per package, then . . . ’. This kind of measurement is neither practical nor productive.

Instead, recognition of deficiencies in code is often effectively performed by manual inspection. We see this, for example, in the work of Mäntylä et al., where developers are asked to detect smells and find refactoring opportunities [33, 34]. And as Fard and Mesbah state: “A common heuristic-based approach to code smell detection is the use of code metrics and user defined thresholds” [15]. From this, we deduce that human engineers will, and have to, assess the quality of code.

¹ Although the smells are fuzzy, if these are parameterised sufficiently the violations could be detected using tools.

This does not exclude automatic detection of smells. Finding the most extreme cases [54] (A is greater than all that are not A) and threshold-based approaches [15, 20] are useful, as shown in the literature.

In addition, this recognition may be easier while the code is being evolved, rather than when it is inspected as a static artefact. For example, smells related to the effect of code evolution (Shotgun Surgery, Divergent Change [18, pp. 79-80]) may be easier to detect while the modification is in flight.

Not necessarily bad None of the smells affect the functioning of the software². Instead, all affect how the consumer can treat the system, or make it brittle to change in the future. This aligns with what Almeida et al. state about smells: “. . . not necessarily errors in the application’s source code, but they might indicate a weakness in the application’s implementation” [2, p. 175]. For code, this consumer is the programmer, while for the usability of software it is the end-user [2]. Fowler states two smells that are great examples of this property. The first is the Long Method smell, which identifies code that does not explain its intent and is likely hard to understand. The code, however, works, and developers are possibly able to make changes to it as well, which would likely make the customer happy. However, if the Long Method smell is removed, productivity may go up.

Local A violation of a smell is identified locally. Fowler identifies files, OO classes and methods, and comments as possible locations where violations occur. This excludes packages or the whole system. It implies that the violation of a smell contains a pointer to a part of the system; this plays nicely with the Resolvable characteristic described below. This pointer shows where the system may need to change.

Threshold of annoyance In addition to the fuzziness of the smell, each violation of a smell is subjective. Different developers will judge the severity of a smell, and the time frame in which it should be resolved, differently. Mäntylä and Lassenius “found that refactoring decisions are based on a wide variation of drivers, and that the drivers are greatly influenced by the type of the source code under evaluation” [33], and Gil and Maman state that “[the question] ‘Is Design A better than Design B?’ can, still, only be decided by force of the argumentation, and ultimately, by the personal and subjective perspective of the judge” [19]. This may imply that the same holds for engineers recognising and resolving smells in grammars. Although Fowler does not make a clear statement about this, in his smell descriptions he leaves this opportunity for interpretation, letting the reader decide the severity of a smell. A good example is his claim on the Refused Bequest smell: “So we say that if the refused bequest is causing confusion and problems, follow the traditional advice. However, don’t feel you have to do it all the time. Nine times out of ten this smell is too faint to be worth cleaning” [18, p. 87].

We want to add that the same threshold may affect whether an engineer will even recognise a certain construct as a violation of a smell. Discussions may arise between engineers due to the subjective nature of the smell.

Dirty A smell is likely the result of undisciplined work or a lack of proper system knowledge. The developer who committed a change did not take the time, or was not able, to keep the system ‘clean’; at some point, the system will be identified as dirty. Duplication is likely the result of copy-pasting, or of not noticing that duplication is introduced, while poor design decisions may arise from missing system knowledge. Poor coding constructs may be caused by deadline pressure, a strong focus on functionality, or inexperienced developers [46].

Resolvable Violations of smells should be resolvable. If they are not, perhaps due to a language limitation, then they should not be detected as a violation. The effort required to resolve a violation is unclear.

The literature on smells states how a smell can be resolved [2], for example with refactoring actions [18] or reimplementation actions [2].

² With the exception of “Towards a Catalog of Usability Smells” [2]. By definition of smells in this domain, these will focus


These seven properties will serve as input for the definition of smells for grammars and for assessing their effectiveness during the evaluation. The Fuzzy, Dirty and Not necessarily bad characteristics will mainly serve as input for the definition; the other four properties will be used in the evaluation of the violations.


3 Grammars

A grammar defines the structural rules that compose a language. Grammars exist in different forms to specify languages [24, 40, 50, 57, 59], but in this thesis we restrict ourselves to BNF and extended Backus-Naur Form (EBNF) notations for Context Free Grammars (CFGs).

3.1 Grammar Terminology

The different grammar-related terminology is stated below.

Terminal A terminal is the most elementary symbol of a language [1]; breaking it down further makes it lose its meaning. An example of a terminal is if in a programming language: just the i or the f would make no sense in that context.

Preterminal A preterminal is a symbol that resolves to terminal symbols in the derivation tree of a parse. These are often represented by keywords specific to the grammar technology, and by regular expressions that define non-constant lexical values. For example, to specify lowercase identifiers in the syntax of a language one can use a regex of the form [a-z]+.

Nonterminal A nonterminal composes nonterminals and terminals. Simply said, it is a “syntactic variable” [1].

Production Rule A production rule describes how a nonterminal derives a sequence of terminals and nonterminals¹.

Metasymbol EBNF grammars include constructs that increase the expressiveness of BNF [58, 59]. The allowed constructs differ per EBNF implementation, but common constructs are optionality, repetition (0 or more and 1 or more occurrences) and empty syntax (epsilon/ε) [63].

Expression An expression is a non-empty sequence of factors, where each factor can be a terminal, a nonterminal, or a metasymbol with its arguments (which are themselves expressions).
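To make the terminology concrete, the following sketch encodes it as data types. The class names and the example rule are ours, chosen purely for illustration; they are not part of any formalism discussed here.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Terminal:            # elementary symbol, e.g. the keyword 'if'
    text: str

@dataclass
class Nonterminal:         # a "syntactic variable"
    name: str

@dataclass
class Metasymbol:          # EBNF construct such as 'star' or 'optional'
    kind: str
    args: List['Expression']

Factor = Union[Terminal, Nonterminal, Metasymbol]

@dataclass
class Expression:          # non-empty sequence of factors
    factors: List[Factor]

@dataclass
class ProductionRule:      # a nonterminal derives an expression
    lhs: Nonterminal
    rhs: Expression

# Hypothetical rule sketch:  if_stmt ::= 'if' expr block
rule = ProductionRule(
    Nonterminal('if_stmt'),
    Expression([Terminal('if'), Nonterminal('expr'), Nonterminal('block')]))
```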

3.2 BGF

As shown by Zaytsev, and long ago by Wirth, there is a lot of diversity in grammar notations [58, 59]. The Grammar Zoo provides over 600 grammars in a single format named BNF-like Grammar Format (BGF) [30]. Using BGF for this work has two main benefits:

1. It is a central public repository that contains a large amount of test data. It is claimed that “Grammar Zoo is the largest of its kind” [63, p. 120].

2. BGF is defined in the Extensible Markup Language (XML) format, so the grammars are abstracted from their original notation and contain no distractions such as the original layout, actions², or deviating notations of EBNF dialects.

¹ Aho, Sethi, and Ullman divide production rules into a left-hand side and a right-hand side. Although this is adopted in this work, it should be noted that some grammar technology, such as SDF, switches these sides [6, 55].

² ANTLR [42] allows grammar engineers to define custom actions when a nonterminal is successfully derived.


In listing 3.1 we present a snippet from a BGF grammar in the XML format. Listing 3.2 represents the same snippet in a BNF plain-text format.

<?xml version="1.0" encoding="UTF-8"?>
<xns1:grammar xmlns:xns1="http://planet-sl.org/bgf">
  <xns1:production>
    <nonterminal>identifier</nonterminal>
    <xns1:expression>
      <sequence>
        <xns1:expression>
          <nonterminal>identifier_start</nonterminal>
        </xns1:expression>
        <xns1:expression>
          <star>
            <xns1:expression>
              <choice>
                <xns1:expression>
                  <nonterminal>identifier_start</nonterminal>
                </xns1:expression>
                <xns1:expression>
                  <nonterminal>identifier_extend</nonterminal>
                </xns1:expression>
              </choice>
            </xns1:expression>
          </star>
        </xns1:expression>
      </sequence>
    </xns1:expression>
  </xns1:production>
  . . .
</xns1:grammar>

Listing 3.1: Raw XML snippet from a BGF file.

identifier ::=

identifier_start (identifier_start | identifier_extend)*

Listing 3.2: BGF from listing 3.1 formatted in BNF.

In section 5 we describe in more depth how we used the Grammar Zoo in this work.
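To illustrate how such a BGF document can be processed, the following sketch converts the snippet of listing 3.1 into the BNF form of listing 3.2. It is a minimal illustration of ours covering only the constructs that occur in the snippet; a real converter would have to handle the full set of BGF constructs.

```python
import xml.etree.ElementTree as ET

BGF = '{http://planet-sl.org/bgf}'  # namespace used by BGF's structural tags

SNIPPET = """<xns1:grammar xmlns:xns1="http://planet-sl.org/bgf">
  <xns1:production>
    <nonterminal>identifier</nonterminal>
    <xns1:expression><sequence>
      <xns1:expression><nonterminal>identifier_start</nonterminal></xns1:expression>
      <xns1:expression><star><xns1:expression><choice>
        <xns1:expression><nonterminal>identifier_start</nonterminal></xns1:expression>
        <xns1:expression><nonterminal>identifier_extend</nonterminal></xns1:expression>
      </choice></xns1:expression></star></xns1:expression>
    </sequence></xns1:expression>
  </xns1:production>
</xns1:grammar>"""

def to_bnf(node):
    kind = node.tag.split('}')[-1]      # drop the namespace prefix
    if kind == 'expression':
        return to_bnf(node[0])          # unwrap the wrapper element
    if kind == 'nonterminal':
        return node.text
    if kind == 'sequence':
        return ' '.join(to_bnf(c) for c in node)
    if kind == 'choice':
        return '(' + ' | '.join(to_bnf(c) for c in node) + ')'
    if kind == 'star':
        return to_bnf(node[0]) + '*'
    raise ValueError('unhandled BGF construct: ' + kind)

root = ET.fromstring(SNIPPET)
bnf_lines = [p.find('nonterminal').text + ' ::= ' + to_bnf(p.find(BGF + 'expression'))
             for p in root]
print('\n'.join(bnf_lines))
```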

3.3 Capabilities

Different grammars can be written for different technology stacks that provide a variety of features: for example, YACC does not have a repetition metasymbol, which forces grammar engineers to make extensive use of recursive definitions [25]. Other languages, such as Rascal, have powerful metasymbols that allow repetition with separation tokens [28]. These metasymbols may be useful to define the syntax of comma-separated lists.

From this, we know that grammars contain constructs that are only present due to the limitations of the technology. Terms like ‘deyaccification’ have been introduced for the process of using modern constructs when replacing YACC with newer technologies [31]. Due to these varying capabilities, we separate the basic BNF constructs from the extended constructs.
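A toy sketch of what one deyaccification step could look like. The rule representation and the single recognised shape are simplifying assumptions of ours, not the actual transformations from the cited work:

```python
def deyaccify(nt, alternatives):
    # Recognise the YACC-style left-recursive list   nt : base | nt base
    # and rewrite it with a repetition metasymbol:   nt ::= base (base)*
    # `alternatives` is a list of alternatives, each a list of symbols.
    if len(alternatives) == 2:
        base, recursive = sorted(alternatives, key=len)
        if recursive == [nt] + base:
            b = ' '.join(base)
            return '%s ::= %s (%s)*' % (nt, b, b)
    return None  # not the one shape this toy rewriter handles

# items : item | items item ;   becomes   items ::= item (item)*
assert deyaccify('items', [['item'], ['items', 'item']]) == 'items ::= item (item)*'
```

Note that `item (item)*` is equivalent to the one-or-more repetition `item+`; which notation a rewriter emits depends on the metasymbols the target technology offers.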

The identified basic metasymbols originate from the definition of the International Algebraic Language (IAL) proposed by Backus [4]. These are:

Sequence Derive a sequence of expressions;

Choice Derive one expression of a list of expressions;

Epsilon Succeed without consuming any input.

Analysing the complete set of metasymbols used in the Grammar Zoo results in the following list of extended constructs³:

Optional Match an expression if present, otherwise continue;

Star Zero or more occurrences of an expression (the Kleene star, or reflexive transitive closure);

Plus One or more occurrences of an expression (the Kleene cross, or transitive closure);


Separator List Star Match zero or more occurrences of an expression which is separated by another expression;

Separator List Plus Match one or more occurrences of an expression which is separated by another expression.

The naming convention for these metasymbols is adopted from the XML tags used in BGF, which also recur in “Micropatterns in Grammars” [63]. The concepts that these metasymbols express are widely known.

It is unclear which metasymbols were available for each input grammar available in the Grammar Zoo [61]. During this research, we assume the following statement: Suppose C is the set of BNF metasymbols, E is the set of extended metasymbols, and g is the input grammar. Let P(x) be a predicate that holds if and only if metasymbol x is used in g. Let E′ be {e ∈ E | P(e)}. Then we will assume that C ∪ E′ is the set of metasymbols applicable to g. This directly implies that the ‘available metasymbols in grammar A is a subset of grammar B’-relation is a partial order.
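This assumption can be sketched as a small set computation; the metasymbol names below are our own, modelled after the BGF tags:

```python
# Basic BNF metasymbols (C) and extended metasymbols (E); the names are
# assumptions modelled after the BGF tags, not an official API.
BNF_METASYMBOLS = {"sequence", "choice", "epsilon"}
EXTENDED_METASYMBOLS = {"optional", "star", "plus",
                        "sepliststar", "seplistplus"}

def applicable_metasymbols(used_in_grammar):
    """C ∪ E′: all basic metasymbols, plus the extended metasymbols
    that the grammar actually uses (E′ = {e ∈ E | P(e)})."""
    return BNF_METASYMBOLS | (EXTENDED_METASYMBOLS & set(used_in_grammar))
```

A grammar using only sequences and the star thus gets all three basic metasymbols plus the star attributed to it.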

3.4 Definition

We adopt a common idea that is used to define a grammar, which consists of a grammar being a tuple of four elements (nonterminals, terminals, production rules and starting symbol) [12, 37, 43, 44, 54]. Most of these definitions slightly differ from one another, and we will do the same for our scope and context.

Important issues in our context are:

• Some grammar technologies allow multiple or no start symbols. For example, XML Schema Definition (XSD) allows you to define multiple top-level elements, which could all function as start symbol. This issue is also encountered and resolved by Zaytsev [64], and he states that “in practical grammarware engineering, grammars are commonly allowed to have multiple starting symbols” [63, p. 122].

• We treat the grammar as top to bottom (text-)file that will be displayed on a screen and edited by a grammar engineer. For this, we thus want to preserve the order in which production rules are defined.

Due to these points, we have set up our definition as follows: A grammar is a four-tuple of (N, T, S, P), where:

• N is a set of nonterminals.

• T is a set of terminals and preterminals.

• S is a set of start symbols for which S ⊂ N. It is valid to have 0, 1 or more start symbols.

• P is a list of production rules, which are pairs of an element of N and an expression (combination of nonterminals, terminals and metasymbols) as described in Section 3.1. The list is required to preserve the order of the grammar’s production rules.
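A minimal sketch of this definition, with field names of our choosing:

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    """The four-tuple (N, T, S, P). P is a list, so that the order of the
    production rules as written in the grammar file is preserved."""
    nonterminals: set      # N
    terminals: set         # T, including preterminals
    start_symbols: set     # S ⊆ N; zero, one or more start symbols
    productions: list      # P: ordered pairs (nonterminal, expression)

# A tiny example instance: a ::= b "x"  and  b ::= "x"
g = Grammar(nonterminals={"a", "b"},
            terminals={'"x"'},
            start_symbols={"a"},
            productions=[("a", ["b", '"x"']), ("b", ['"x"'])])
```

Keeping P as a list rather than a set is what lets the later smells reason about the position (index) of a production rule.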

3.5 Structure

The production rules of a grammar form a directed cyclic graph (due to the recursion and mutual recursion of using nonterminals in expressions). For example, when an expression for a production rule for nonterminal A refers to B it is said that (A, B) is in the successor relation of nonterminals.

When it is possible to derive a nonterminal B from a production rule for A, and vice versa, it is said that A and B are in the same grammatical level. This implies the following: let R be the transitive closure of the successor relation and let L be the set of grammatical levels, then if (A, B) and (B, A) are both members of R, then ∃x ∈ L ({A, B} ⊆ x). Also, it holds that all nonterminals in the grammar are part of a grammatical level: ∀n ∈ N (∃l ∈ L (n ∈ l)).
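Grammatical levels, as defined here, are exactly the groups of mutually reachable nonterminals (the strongly connected components of the successor relation). A minimal sketch, assuming the relation is given as a dict from nonterminal to successor set (our own representation), computes them via a naive reachability fixpoint:

```python
def transitive_closure(successors):
    """Naive fixpoint computation of the transitive closure R."""
    reach = {n: set(ms) for n, ms in successors.items()}
    changed = True
    while changed:
        changed = False
        for n in reach:
            extra = set()
            for m in reach[n]:
                extra |= reach.get(m, set())
            if not extra <= reach[n]:
                reach[n] |= extra
                changed = True
    return reach

def grammatical_levels(successors):
    """Group nonterminals that are mutually reachable: A and B share a
    level iff (A, B) and (B, A) are both in the transitive closure."""
    reach = transitive_closure(successors)
    nodes = set(successors) | {m for ms in successors.values() for m in ms}
    levels, seen = [], set()
    for n in sorted(nodes):
        if n in seen:
            continue
        level = {n} | {m for m in nodes
                       if m in reach.get(n, set()) and n in reach.get(m, set())}
        levels.append(level)
        seen |= level
    return levels
```

For the successor relation {A→B, B→A, B→C} this yields the levels {A, B} and {C}.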

Grammatical levels can relate to one another. Let L be the grammatical levels, S be the successor relation of nonterminals, and let K be a relation between grammatical levels; then for every two distinct elements x and y in L, let (x, y) be an element of K if and only if ∃n ∈ x, m ∈ y ((n, m) ∈ S). From this we can also deduce that K is asymmetric, and therefore it represents a directed acyclic graph. This


Figure 3.1: Relation between nonterminals, and the grammatical levels in which these nonterminals reside.

structure is used by Power and Malloy to identify the tree-ness of the grammar [43] and we depicted it in figure 3.1.

We say that a grammar is fully connected if the reflexive transitive closure of K ∪ K⁻¹ is equal to the cartesian product of L: (K ∪ K⁻¹)* = L².

A grammar can contain nonterminals that are defined but not referred to in the grammar, or are referenced but not defined. We call these respectively top and bottom nonterminals [47]. Having one or fewer top nonterminals automatically implies that the grammar is fully connected, while the converse does not hold: it is possible to have multiple tops and have a fully connected grammar regarding grammatical levels. Sellink and Verhoef claim that multiple top nonterminals and more than one bottom nonterminal can indicate that the grammar is incomplete [47, p. 156].
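Tops and bottoms follow directly from comparing defined and referenced names; a sketch with production rules as (nonterminal, referenced-nonterminals) pairs (our representation):

```python
def tops_and_bottoms(productions):
    """Top nonterminals: defined but never referred to.
    Bottom nonterminals: referred to but never defined.
    Multiple tops or bottoms may hint at an incomplete grammar."""
    defined  = {lhs for lhs, _ in productions}
    referred = {n for _, rhs in productions for n in rhs}
    return defined - referred, referred - defined
```

For a grammar whose rules define a, b and d, where a refers to b and d, and d refers to an undefined e, the single top is a and the single bottom is e.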

3.6 Ignored Grammar Properties

Modularity It is known that some grammar engineering technologies support modular definitions for grammars [6, 28]. Also, we state that language documentation is modular if the grammar is specified over different chapters or sections. During the experiment, we will ignore the origin of grammars and thus ignore the modularity. The BGF does not make a distinction between modules of the source grammar. While the experiment data is evaluated, we will take this property into consideration for smells that relate to the structure of the grammar.

Labels and Marks BGF introduces the concepts of Labels and Marks. These are constructs that allow one to encapsulate a subexpression of a production rule’s right-hand side. These can then be used as pointers in a grammar, which may be useful for tasks such as refactoring grammars. We detected that only half of the grammars included this concept. Besides, the labels and marks produce noise when applying pattern-matching on the grammar. Due to these points, we decided to ignore this property in the experiment to increase the success of pattern-matching and to obtain a more consistent data set.


Catalogue

In this section, we define a set of smells based on a format described in subsection 4.1. In the formal definitions of each smell we use recurring patterns and functions; these are collected in Section 4.16.

During the construction of this catalogue, we included smells which can sometimes conflict with one another. As defined in the Smells section (2), smells have a Threshold of annoyance. The balance between the thresholds of two conflicting smells needs to be managed by the engineer.

For example, in this catalogue the Distant Levels and the Unsequence smells could conflict with each other. Although we acknowledge this, recognising these balancing smells is not part of this work.

4.1 Smell Template

The format that will be applied to index the smells is:

Name A descriptive name for the smell.

Description A description of what the smell includes, how to detect the smell and why it is smelly. In this description we apply the same informal writing style as applied by Fowler in his book. This style may stand out from the other writing style in this chapter.

Formalisation and Refactor Strategies The formalisation identifies the parameters of the smell, to which grammar constructs it applies and how a violation of the smell is detected. These definitions set the boundaries of how the smells are implemented in the experiment.

The refactoring strategies describe the path of action that eliminates the particular violation of the smell.

Motivation The motivation outlines subject matter that led to the definition of the smells. This includes the relation between different literature, translation from existing smells and practical examples.

Table 4.1: Template for defining smells in the catalogue

The aggregation of ‘Name’, ‘Description’ and ‘Refactor Strategies’ is comparable to the form in which Fowler and others describe smells [2, 15, 18, 20]1. The formalisation is taken into account to formulate how the violations will be detected in the experiment.

4.2 Distant Levels

Description

Nonterminals that use other nonterminals should be located close to one another. The longer the distance between the nonterminal definition and the used nonterminals in another nonterminal, the more the reader of the grammar will have to switch context.


When you notice that you have to scroll a lot through a grammar to understand what is happening (which terminals get consumed and what the actual syntax is), this means there is something smelly about how the grammar is set up.

Moving the production rules that cause the scrolling closer to one another may result in an easier-to-understand grammar and, in the end, a more coherent structure. Which production rule you move is up to you, but you can deduce it from the production rule’s neighbours. If A refers to B and A does not really relate to its neighbours, move it closer to B. Otherwise, move B close to A.

Formalisation

Initially, we defined this smell in terms of distance between two nonterminals. Later in the description, we coin the term ‘neighbours’. We first explore the reference distance and note why it does not work in some cases. Then we move to a definition that takes the cohesion of closely related nonterminals into account.

The reference distance for each p in P is as follows:

Let N′ be the set of all nonterminals that are referred to in the expression of p. Then we define that the reference distance of p is equal to:

∑_{n ∈ N′} max({index(p) ∆ index(q) | q ∈ Pn}) (4.1)

The result is that we take the furthest definition of n into account from origin p2.

We define that total reference distance for the grammar is the sum of the reference distance for each production rule.
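Equation 4.1 and the total reference distance can be sketched as follows, reading ∆ as absolute difference and representing the grammar as an ordered list of (nonterminal, referenced-nonterminals) pairs (a representation we assume for illustration):

```python
def reference_distance(productions, p_index):
    """Reference distance of the rule at p_index (Equation 4.1): for each
    referenced nonterminal, its furthest defining rule is taken into account."""
    _, referenced = productions[p_index]
    defs = {}
    for i, (lhs, _) in enumerate(productions):
        defs.setdefault(lhs, []).append(i)
    return sum(max(abs(p_index - q) for q in defs[n])
               for n in set(referenced) if n in defs)

def total_reference_distance(productions):
    """Sum of the reference distances of all production rules."""
    return sum(reference_distance(productions, i)
               for i in range(len(productions)))
```

For the grammar of listing 4.1 this yields 6, and 4 for its permutation in listing 4.2.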

We could state that this smell is present in the grammar if there is a permutation Q of P, such that the total reference distance of the grammar arranged as Q is less than that of the grammar arranged as P. This may result in an unwanted grammar structure. For example, the production rules in Listing 4.2 are a permutation of the rules in Listing 4.1. Although the permutation has a lower reference distance (4 in comparison to 6), the grammar may feel counter-intuitive or even scrambled, for example by moving the top nonterminal a to the middle of the grammar.

a ::= b d
b ::= "b" c
c ::= "c"
d ::= "d" e
e ::= "e"

Listing 4.1: Production rules arranged in a top down manner with a reference distance of 6 (4 + 1 + 0 + 1 + 0).

c ::= "c"
b ::= "b" c
a ::= b d
d ::= "d" e
e ::= "e"

Listing 4.2: Production rules arranged with an optimal reference distance of 4 (0 + 1 + 2 + 1 + 0).

Due to this reason, we will not strive for the lowest reference distance, but just for a more optimal one. We do this by identifying nonterminals that are likely not in the correct position in the grammar. In the description, we coined the term ‘neighbours’. We define appropriate neighbours as nonterminals that are in the same grammatical level or derivable from the grammatical level, i.e. nonterminals that belong to the same branch in the graph that represents the relations between grammatical levels.

Suppose we have two nonterminals n and m in the same grammatical level L. Then let P′ be the set of production rules which are defined between the locations of the production rules for Pn and Pm. Let a member (x, y) of P′ be a violation if the nonterminal x cannot be derived by n or vice versa3:

(n, x) ∉ references(g)⁺ (4.2)

If the relation between x and n or m is missing, this may imply that there is a weak coupling between the elements and rearranging these would result in n and m moving closer to each other.

2An alternative would be the closest or average distance to the definitions of n.


Figure 4.1: An example tree of grammatical levels.

If there are more violations among the production rules for grammatical level L than there are members in L, this may imply that the members in L are not arranged properly. In this case, the members of L are identified as violations. Otherwise, the production rules between the members are identified as violations.

When we strictly look at the relations between the grammatical levels, it means that nonterminals can only be defined in other grammatical levels if that level is an ancestor or descendant of the level in which the nonterminal resides. Thus in figure 4.1 this means that nonterminals from grammatical levels A and B could be mixed through each other in terms of the production rule order, but B and C could not.

Listing 4.3 presents a violation of this kind. Nonterminal b has nothing to do with a or c; an improvement would be to remove b from its current position and move it closer to where it is referenced.

a ::= "a" c
b ::= "b"
c ::= "c" a | ε

Listing 4.3: Nonterminal b is mixed in between nonterminals a and c.

When a violation is moved closer to (or even beside) where it is referenced, the violation will automatically be resolved.

Motivation

The distance between the location where a nonterminal is referenced and where it is defined is due to one of the following causes:

• The author of the grammar organised the nonterminals as they are for no particular reason.

• The distance was introduced due to different refactoring steps. For example, the target syntax changed, or nonterminals were added in later versions of the grammar by appending them to the end.

• The long distance is due to the required structure of the grammar, i.e. if the production rule was reorganised in the grammar, other production rules would have a higher distance.

The first two causes are unintentional, while the last one is intentional and not resolvable. This unintentional structure results in spatial distance between two nonterminals. Spatial distance will result in a higher cognitive load for the reader [13].

Also, things that belong together should be put together. This is one of the motivations that Fowler describes in the Divergent Change smell [18, p. 79]. Grammatical levels identify parts of a grammar that relate to one another. By identifying parts that have no cohesion with these levels and removing these, the Divergent Change in a grammar may be reduced.

4.3 Unsequence

Description

The order of the production rules in the grammar should be set up consistently, in such a way that referred nonterminals in production rules refer either up or down in the grammar4. If these are not in the correct sequence, we detect the Unsequence smell.

4Up and down refer to the fact that a nonterminal is located more to the front or back of the list of production rules.


You can detect this smell when you notice that the nonterminals are placed in a counter-intuitive order. This could either be a violation of the smell or part of a grammatical level. In the latter case, you had better leave it as it is, but in the former, you may want to rearrange the production rules.

Formalisation

Let R be the references relation between nonterminals, such that R is equal to references(g). Then let d be the down references count, such that:

|{(a, b) ∈ R | ∃x ∈ Pa ∃y ∈ Pb (index(x) < index(y))}| (4.3)

And let u be the up references count such that:

|{(a, b) ∈ R | ∃x ∈ Pa ∃y ∈ Pb (index(x) > index(y))}| (4.4)

We will state that a grammar is down referencing when d > u and up referencing when u > d. If u = d the grammar will be even referencing. Note that production rules can be both up and down referencing for a single nonterminal reference due to the presence of the Scattered Nonterminal smell (Section 4.14).

In addition to this we define the referencing ratio, a value between 0 and 1 that specifies whether the referencing in the grammar is even referencing (0) or full up or down referencing (1.0):

(u ∆ d) / (u + d)

To create consistency in the referencing, the goal should be to have the referencing ratio as close to 1 as possible.
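The counts d and u and the referencing ratio can be sketched with the same list-of-pairs representation of a grammar used before (a representation we assume for illustration), reading ∆ as absolute difference:

```python
def referencing_counts(productions):
    """Down references d, up references u, and the referencing ratio
    |u - d| / (u + d). A reference pair (a, b) counts as down when some
    rule for a is located above some rule for b, and as up when some
    rule for a is located below some rule for b (it may count as both)."""
    pos = {}
    for i, (lhs, _) in enumerate(productions):
        pos.setdefault(lhs, []).append(i)
    refs = {(lhs, n) for lhs, rhs in productions for n in rhs}
    d = sum(1 for a, b in refs
            if any(x < y for x in pos[a] for y in pos.get(b, [])))
    u = sum(1 for a, b in refs
            if any(x > y for x in pos[a] for y in pos.get(b, [])))
    ratio = abs(u - d) / (u + d) if u + d else 0.0
    return u, d, ratio
```

A fully top-down grammar such as the one in listing 4.1 has u = 0 and a referencing ratio of 1.0.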

Violations of this smell are all the production rules that are referred to from other production rules in other grammatical levels in the counter direction. For example, let the grammar be down referencing, let production rule a be referred to from production rule b, and let a and b not be in the same grammatical level; then a is a violation if a is located above b.

To formalise this, let R be the transitive closure of references(g)5. Let f be a function that is either the identity if the grammar is down referencing, or a negation if the grammar is up referencing. Then the set of violations is:

{(x, y) ∈ P | ∃(z, a) ∈ P (f(index((x, y))) < f(index((z, a))) ∧ (z, x) ∈ R ∧ (x, z) ∉ R)} (4.5)

Motivation

The reader of a grammar wants to navigate through a grammar in a natural manner, either expecting that the grammar will break up in smaller and smaller derivations, or that it builds up from small blocks into bigger ones. Having a consistent flow in the grammar will also support the expectations of the reader. When a reader is ‘debugging’ a grammar, the same expectation management will help. Besides, by having all the things that relate to one another in the same place in the grammar, the natural flow also provides focus when an engineer edits the grammar.

Although there is no clear relation, this smell was inspired by the Switch Statements smell [18, p. 82]. That smell indicates that behaviour is put in the wrong part of the code base, i.e. the responsibility is in the wrong place. Due to this, we designed this smell to indicate whether parts of a grammar are given the wrong responsibility. In this case, we identify responsibility as defining a part of the syntax in a section of the grammar where it is not intended.

4.4 Proxy Nonterminal

Description

Nonterminals can be used to make abstractions over a part of the syntax of the grammar. When abstraction is used too extensively, too much (unneeded) indirection may be introduced. This abstraction for the sake of abstracting results in a lot of jumps through the grammar to understand it. Abstractions that do not really serve a purpose could be inlined for the sake of readability and reduction of the grammar size. We identified two forms of this:

5 This representation can be used to check whether a and b are in the same grammatical level, if and only if (a, b) ∈ R ∧ (b, a) ∈ R.


Single nonterminal production rules (chain rules) In listing 4.4 references to defining character literal could be replaced by charlit, and both defining character literal and character literal could be removed. These nonterminals may be introduced for future syntax, but this may never occur.

defining_character_literal ::= character_literal

character_literal ::= charlit

Listing 4.4: Example where defining character literal is detected as proxy nonterminal.

Single usage abstraction If the arrow nonterminal is used once (Listing 4.4) it may be better to just inline it in the production rule for the nonterminal that references it.

lambda ::=

"(" "\" identifier arrow expression ")"

arrow ::= "->"

Whether this makes sense, depends on the fact if the grammar becomes clearer or not. In the previous listing, the expression for arrow is defined as a single terminal symbol. If it were a more complex or bigger expression, the nonterminal name would clarify the grammar.

It could be that it is reasonable to introduce these middle man abstractions for future extension points; then the presence of this smell is not an immediate error. A high prevalence of violations in the grammar may show whether this is the case or not, or even give insight into the style of defining grammars by the engineer(s) of this grammar.

Of course, determining if this single used abstraction should be inlined or not depends on how big or complex it is. Maybe the extraction of the expression is reasonable for the readability of the grammar and the ease of understanding it. The grammar engineer should make the consideration whether an abstraction should be inlined; he or she is most capable of deciding if it is beneficial or not.

Formalisation

Single nonterminal production rules All nonterminals with a single production rule, where the expression is just a nonterminal, are violations of this smell. The violating production rules are identified as:

{Pn | n ∈ N, |Pn| = 1 ∧ ∀(x, y) ∈ Pn (y ∈ N)} (4.6)

Let V be the violations, then resolving a violation (x, y) in V would be to remove the production rule for x and replace all references of x with a reference to y.
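Resolving the chain-rule violations can be sketched as follows, with rules as (nonterminal, list-of-symbols) pairs and the convention that a symbol is a nonterminal iff it occurs as a left-hand side (both conventions are ours; chains are assumed to be acyclic):

```python
def inline_chain_rules(productions):
    """Remove Proxy Nonterminal violations of the chain-rule kind: a
    nonterminal with exactly one rule whose whole body is one nonterminal.
    Its rule is dropped and every reference to it is redirected."""
    defined = {}
    for lhs, rhs in productions:
        defined.setdefault(lhs, []).append(rhs)
    # x -> y for every chain rule x ::= y (y must itself be defined)
    chains = {lhs: rules[0][0] for lhs, rules in defined.items()
              if len(rules) == 1 and len(rules[0]) == 1
              and rules[0][0] in defined}

    def resolve(symbol):
        while symbol in chains:   # follow chains of proxies (no cycles assumed)
            symbol = chains[symbol]
        return symbol

    return [(lhs, [resolve(s) for s in rhs])
            for lhs, rhs in productions if lhs not in chains]
```

Applied to a grammar containing the chain of listing 4.4, the proxy rule disappears and its users refer to the underlying nonterminal directly.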

Single usage abstraction Nonterminals with a single production rule, for which the expression is not a nonterminal, should be inlined if it is only used once. The violating nonterminals are identified as:

{Pn | n ∈ N, |Pn| = 1 ∧ ∃(x, y) ∈ P (doesRefer(y, n) ∧ ∀(z, v) ∈ P (doesRefer(v, n) ⇒ (x, y) = (z, v)))} (4.7)

Motivation

This smell is influenced by the Message Chains and Middle Man smells defined by Fowler [18, pp. 84-85]. For example, in the earlier code example, when a reader sees a reference to arrow, he or she does not know what kind of arrow is defined. This arrow could have many forms: ->, -->, => or even <-. As Fowler describes, “After a while it’s time to cut out the middle man and talk to the object that really knows what’s going on”. We translated this into “remove the middle nonterminal and use the (non)terminal that defines what is going on”.

As mentioned in the description, this does not always apply. The constructs may be in place as a future extension point6, which does not imply an error but a smell that requires future work. But this may also be an indication that the Speculative Generality smell coined by Fowler [18, p. 83] is present.

6The presence of this smell may then indicate that the grammar is still subject to change.


In the definition of this smell, the thought of “I think we need the ability to do this kind of thing someday” [18, p. 83] is addressed, as well as the worthiness of this thought. We believe this also relates to the presence of the future extension points that we addressed.

4.5 Duplication

Description

Things that are the same should not be defined twice. Duplication will result in extra and possibly error-prone changes if the duplicated sections of a grammar have to be changed. If things are the same, these should be merged.

Duplication could take many shapes, like whole production rules that are copied, expressions or metasymbols that are duplicated, and production rules that are defined differently but result in the same language (x = y | z and x = z | y). A fictive example of this is the modifiers for OO methods and classes, which could be defined as:

class ::= classModifier className . . .
classModifier ::= "public" | "private" | ε
method ::= methodModifier methodName . . .
methodModifier ::= "public" | "private" | ε

In this example, we detect duplication between the classModifier and methodModifier nonterminals. Whether this is actually the case is up to the grammar engineer. Maybe there is the intention to add another modifier to the classModifier rule, which makes the example case a false positive (currently it seems the same, but with the future prospect it is not).

Eliminating the duplications with new nonterminals will reduce duplication and may result in a cleaner grammar, will likely make the grammar maintainable, and might show other deficiencies in the changed nonterminals.

It is always the question whether by removing the duplication you are not just over-abstracting. If this is the case, just let the code be as it is. For example:

if-statement ::= "if" condition-and-block
while-statement ::= "while" condition-and-block
condition-and-block ::= "(" condition ")" block

Even though duplication is removed, this may just obfuscate the grammar due to the extra level of indirection via the condition-and-block nonterminal.

Formalisation

We formalise an expression and a subexpression as one of the BGF constructs. The difference between an expression and a subexpression is that the latter is a part of the former, and that the former is the ‘right-hand side’ of a production rule. A partial expression is a proper sublist of either a BGF choice or a sequence.

We identified different categories of duplication and ordered these from ‘rough’ to ‘subtle’, and for each we define different violation criteria. In each category we ignored duplications that contain preterminals7.

With the exception of our last category, we only identified exact clones (“type 1” [45]). Due to the absence of functions in our definition of grammars, we could only identify the permuted expressions as a non-exact clone category. Our identified categories are:

0. Duplicate expression for nonterminal Production rules that define the same expression for the same nonterminal. In our list, this is the roughest form of duplication, which can simply be resolved by removing all but one of the violating production rules. The simplest example is demonstrated in listing 4.5.

a ::= "a"
a ::= "a"

Listing 4.5: Duplicate expression for nonterminal a.

7In the BGF these are generalised as string, boolean and integer. It could have been that a string in one context would only match uppercase characters and only lowercase characters in the other. It is thus likely that duplications containing preterminals will be false positives and thus we decided to ignore these.


1. Duplicate production rule All production rules with the same expression are duplication classes in this category.

Two simple options to refactor this are either to create a new nonterminal with a production rule stating the duplicate expression and rewrite both original production rules to reference that new nonterminal, or to eliminate one of the original two nonterminals. The former is applicable when the two original nonterminals are a specialisation of the same abstract syntax, the latter when there is a literal duplication of a grammatical concept (listing 4.6).

if ::= "if"
if_keyword ::= "if"

Listing 4.6: Duplicate production rule for the same grammatical concept.

2. Known Subexpression We identify subexpressions that are also defined as expressions of production rules as known subexpressions. Each violation can be resolved by replacing the subexpression with a reference to the nonterminal that defines that subexpression. Alternatively, a new nonterminal can be created that defines a production rule with the subject expression as its right-hand side, and the two original locations where the expression occurred need to be replaced with a reference to the new nonterminal.

block ::= "{" stmt* "}"

if_stmt ::= "if" condition "{" stmt* "}"

Listing 4.7: Snippet where the expression for the block nonterminal is a known subexpression.

3. Common Subexpressions A subexpression that occurs multiple times is detected as a violation. A duplication class in this category is a set of all the duplicate subexpressions.

To resolve this, a new nonterminal can be introduced with the common subexpression as its right-hand side. The locations where the violation occurs should be replaced with the new nonterminal. For example, in the listing below, the two subexpressions that define "{" stmt* "}" are common subexpressions:

if_stmt ::= "if" condition "{" stmt* "}" "else" "{" stmt* "}"
while_stmt ::= "while" condition "{" stmt* "}"

Listing 4.8: Two production rules share the same common subexpression "{" stmt* "}".

4. Permuted Expressions Expressions and subexpressions that are duplications of one another where

we ignore the order of the options in Choice structures, are members of the same duplication class. In this category, the expressions for nonterminals a and b are duplicates of each other, as demonstrated in listing 4.9.

a ::= c | d
b ::= d | c

Listing 4.9: The order of the choice expressions for nonterminals a and b are permutations of each other.

If the order of these choices does matter, then maybe the design of the syntax should be revised. Having both d | c and c | d within the same grammar is even more confusing for everyone8.

In the experiment and evaluation of this smell we only looked into Duplicate Production Rules and Known Subexpressions to limit the scope of this work. We selected the coarsest categories, and ignored Duplicate expression for nonterminal due to its triviality.

Violations for Duplicate Production Rules Production rules with the same expression are marked as violations:

∀(x, y), (z, a) ∈ P (index((x, y)) ≠ index((z, a)) ∧ a = y ∧ a ≠ Nonterminal(b) ⇒ (x, y) ∈ V ∧ (z, a) ∈ V) (4.8)

To account for unresolvable duplicate rules that are identified as Proxy Nonterminals, we ignore production rules that have just a single nonterminal as an expression.
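Detecting these violations amounts to grouping rules by right-hand side; a sketch with the same (nonterminal, list-of-symbols) representation used before (our own):

```python
from collections import defaultdict

def duplicate_production_rules(productions, nonterminals):
    """Group production rules by their right-hand side; every group with
    more than one rule is a duplication class (cf. Equation 4.8). Rules
    whose whole body is a single nonterminal are skipped, as in the text."""
    groups = defaultdict(list)
    for i, (lhs, rhs) in enumerate(productions):
        if len(rhs) == 1 and rhs[0] in nonterminals:
            continue  # chain rules are left to the Proxy Nonterminal smell
        groups[tuple(rhs)].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]
```

Applied to the two rules of listing 4.6, both rules end up in the same duplication class.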


Violations for Known Subexpressions All subexpressions that are not single nonterminals and occur as the expression of a production rule are duplications:

∀(x, y) ∈ P (∃s ∈ y ∃(z, a) ∈ P (s = a ∧ s ≠ Nonterminal(b) ⇒ s ∈ V)) (4.9)

Motivation

This smell is derived from Fowler’s Duplicate Code smell, which he identifies as “number one in the stink parade”[18, p. 76]. Indeed, it has been the focus of many studies [45].

Common sense tells us that duplication is a bad thing in code bases. As shown by Li et al., in the sense of security vulnerabilities, duplication may be a threat [32]. On the other hand, other studies show that duplicated code is modified less than non-duplicated code, and thus the impact on maintenance may be smaller [23].

In this study, we will not look into the effects of the duplication in grammars, first of all because it is a different domain than code and thus differs from the study of Hotta et al. [23], and secondly because duplication is a study on its own. We will evaluate duplication and its elimination on an empirical level, and identify merits and drawbacks when these prevail.

4.6 Foreign Names

Description

We can say a lot about naming conventions, but all in all, people will agree that these need to be consistent. Grammars allow you to define syntactic variables and give these meaningful names. For example, ‘else-block’, ‘ifStatement’, ‘PackageHeader’, ‘LITERAL’, ‘Switch block’, etc. Casing and usage of certain characters in names convey a message. If a name ‘jumps out’, we call it a Foreign Name. For example, an identifier written in full caps and with underscores is typically a static class variable in Java, while camel-cased identifiers are typically local variables. The same holds for grammars. Some technologies differentiate terminals and nonterminals by writing the former in uppercase and the latter in lowercase.

Consistency in the grammar will support the expectations of the engineer, but inconsistent naming might also indicate bugs. For example:

foo-bar ::= "a"
foo-bar ::= "b"
foo_bar ::= "c"

It may not become apparent at first glance, but the last production rule for the nonterminal foo-bar has a typo. It is highly likely that this inconsistency is a bug.

Formalisation

Suppose there is a set of naming styles X, for which each member x is a predicate such that x(n) holds when n conforms to naming style x. These will be adopted from “Micropatterns in Grammars” [63].

Let R be a relation from members of N to X, and let the following hold:

∀n ∈ N ∀x ∈ X(x(n) ⇔ (n, x) ∈ R) (4.10)

If there is a naming style x such that for all n ∈ N there is a relation from n to x, then x is the naming style for the grammar. It could be that there are multiple naming styles for which this holds; then the ‘strongest’ style holds.

With ‘strongest’ we imply that the set of elements for naming style a is smaller than the set of elements for naming style b. Let M be the set of all possible names, then it holds that:

∀m ∈ M (a(m) → b(m)) ∧ ∃m ∈ M (b(m) ∧ ¬a(m)) (4.11)

If this applies, a is stronger than b.

If there is a common naming style for all preterminal/terminal declarations, then all elements of N that define these preterminals/terminals are removed from the previous statement. This results in the fact that there can be two naming styles for the language: one for the set of terminals and preterminals (the lexical tokens of the grammar) and one for the remaining nonterminals. If a naming style cannot be determined, then the grammar is in violation of the naming convention smell.
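Picking the strongest matching naming style can be sketched as follows; the concrete styles and regular expressions below are illustrative assumptions, listed from strongest to weakest:

```python
import re

# Hypothetical naming styles, ordered from strongest (most specific) to
# weakest; every all-lowercase name also matches camelCase and kebab-case,
# so "lowercase" must be tried first.
STYLES = [
    ("lowercase",  r"[a-z]+"),
    ("UPPERCASE",  r"[A-Z]+"),
    ("camelCase",  r"[a-z][a-zA-Z0-9]*"),
    ("kebab-case", r"[a-z][a-z0-9]*(-[a-z0-9]+)*"),
]

def grammar_naming_style(names):
    """Return the strongest style that matches every nonterminal name,
    or None (a Foreign Names violation) if no single style covers all."""
    for style, pattern in STYLES:
        if names and all(re.fullmatch(pattern, n) for n in names):
            return style
    return None
```

The mixed names foo-bar and foo_bar from the earlier example yield None, flagging the grammar as a violation.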


Motivation

Different types of naming could easily mean that two different authors wrote the grammar. Alternatively, it could mean that someone made a typo. In the first, it might be good to set aside one another’s differences and commit to a coherent naming style in the grammar. In the second, the problem is obvious. Sellink and Verhoef show that these typos are common in grammar engineering [47, p. 156].

Zaytsev looked for the different naming patterns through the same corpus that is used for this project [63], and shows that there is a difference in the naming styles and that some styles are more prevalent. It is likely that some of the styles will be more prevalent due to technical limitations or community standards.

It is also worth mentioning that some techniques mix naming styles deliberately to separate concepts, for example by using a different naming style for nonterminals that define a single terminal than for the other nonterminals.

4.7 Improper Responsibility

Description

Production rules should only define what they are responsible for. Sometimes a nonterminal is added in front or to the back of a sequence to add or modify a new syntax structure. For example, in listing 4.10 this may have happened with the terminal or nonterminal d in the production rules for y and z.

x ::= "a" (y | z)
y ::= d b
z ::= d c
b ::= "b"
c ::= "c"
d ::= "d"

Listing 4.10: A snippet containing improper responsibility in the production rule for nonterminal x.

This is a form of duplication that may or may not be harmful. As a grammar engineer you should decide whether d should be defined in x or not, which may result in the following grammar:

x ::= a d (yNoD | zNoD)
yNoD ::= b
zNoD ::= c
. . .

Whether or not to do this depends on the syntax that the grammar represents and the syntactic meaning of d. One consideration to keep in mind: if d changes in production rule y, does production rule z change as well? When refactoring this, note that you have to duplicate y and z and rename them, since the original syntax defined by y and z may be used in other production rules. If y and z are used only once, this is an extra indicator that the common terms can be factored out. In this work, we limit ourselves to detecting improper responsibility at the front of the expression (reading from left to right).

Formalisation

Let there be a production rule p whose right-hand side is expression e and whose left-hand side nonterminal is n. Suppose e denotes a choice between options xs. Let f be a unary function that recursively inlines the expressions of all referred nonterminals in an expression (except expressions for nonterminal n itself). For example, inlining the expressions for (y | z) from listing 4.10 yields the expression (“d” “b” | “d” “c”). Let xs′ be the set of inlined expressions: {f(x) | x ∈ xs}. When all members of xs′ share a common non-empty prefix expression, every option in e derives from the same subexpression, and we say that p is a violation of this smell. If e denotes a sequence of expressions ys, then if a subexpression y in ys denotes a choice and the properties of the preceding sentences hold for it, p is also in violation of this smell.
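This detection can be sketched roughly as follows. It is a simplified sketch, not the thesis' actual implementation: grammars are assumed to be dicts mapping nonterminals to lists of alternatives (symbol sequences), the nested choice in x from listing 4.10 is flattened into two top-level alternatives, and only single-alternative nonterminals are inlined.

```python
# Simplified detection sketch for Improper Responsibility. A grammar is a
# dict mapping nonterminals to lists of alternatives; each alternative is
# a list of symbols (terminals quoted, nonterminals bare). The nested
# choice "a" (y | z) is flattened into two top-level alternatives here.
GRAMMAR = {
    "x": [['"a"', "y"], ['"a"', "z"]],
    "y": [["d", "b"]],
    "z": [["d", "c"]],
    "b": [['"b"']],
    "c": [['"c"']],
    "d": [['"d"']],
}

def inline(grammar, alt, avoid, depth=3):
    """Recursively inline referred nonterminals (except `avoid`); only
    single-alternative nonterminals are inlined, bounded by `depth`
    so the sketch terminates on recursive grammars."""
    if depth == 0:
        return alt
    out = []
    for sym in alt:
        rules = grammar.get(sym)
        if sym != avoid and rules is not None and len(rules) == 1:
            out.extend(inline(grammar, rules[0], avoid, depth - 1))
        else:
            out.append(sym)
    return out

def common_prefix(seqs):
    """Longest common prefix of several symbol sequences."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix

def has_improper_responsibility(grammar, n):
    alts = grammar[n]
    if len(alts) < 2:
        return False
    inlined = [inline(grammar, alt, avoid=n) for alt in alts]
    return len(common_prefix(inlined)) > 0

print(has_improper_responsibility(GRAMMAR, "x"))  # True: shared prefix
```

After inlining, the alternatives of x become "a" "d" "b" and "a" "d" "c", whose common non-empty prefix flags the violation.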

Motivation

This smell is derived from multiple perspectives. The first perspective is ‘Shotgun Surgery’ as defined by Fowler [18, p. 80]. There is a possibility that both y and z (in the example in the description) have to change when the specification changes. This is a form of the smell that Fowler describes, and as Fowler states, “when the changes are all over the place, they are hard to find, and it is easy to miss an important change” [18, p. 80]. In the example, the production rules to which the changes are made are located close to one another, but if they were spread throughout a bigger grammar, modifying the grammar would become harder. It is up to the grammar engineer to decide whether the change rates of y and z are alike, and thus whether or not to move the production rules closer together and lift out the common subexpressions.

Secondly, having this smell in the grammar may affect the performance of the generated parser. Depending on the grammar technology, the parser may need to parse nonterminal d multiple times for the alternatives specified in x.

We also consider this smell a form of Inappropriate Intimacy [18, p. 85]: the nonterminals derived from the violating nonterminal share a common subexpression, which couples them all.

Lastly, it may become a ‘cover up’ for an ambiguity issue in the grammar. We see this happen for the reference type in one of the grammars in the corpus, where nonterminal type-name is a common subexpression and two choices in reference-type only define this nonterminal:

reference-type ::= class-type | interface-type | array-type | . . .
interface-type ::= type-name
delegate-type ::= type-name
. . .

4.8 Mixed Definitions

Description

Nonterminals should be described consistently: either specify multiple production rules, or use a single production rule with a choice. Neither is necessarily better than the other, but mixing the two styles can make a grammar confusing very fast. For example, say you have the following rules:

a ::= b | c
a ::= c | d

It seems apparent that these two production rules can be merged into a ::= b | c | d; otherwise, the grammar will be spuriously ambiguous. However, there may have been a reason that these two got defined separately: maybe the language documentation defined these rules twice in different sections, and that duplication was actually a bug there. Once these rules are defined, and the language documentation is long gone, burned in the corner of the office, the knowledge of whether these rules can be merged may already be lost.
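The mechanical merge itself is straightforward to sketch; whether it is semantically safe remains the engineer's judgment. The sketch below assumes a hypothetical representation of rules as (lhs, alternatives) pairs.

```python
from collections import defaultdict

def merge_rules(rules):
    """Mechanically merge production rules with the same left-hand side
    into one rule, dropping spuriously duplicated alternatives."""
    merged = defaultdict(list)
    for lhs, alternatives in rules:
        for alt in alternatives:
            if alt not in merged[lhs]:
                merged[lhs].append(alt)
    return [(lhs, alts) for lhs, alts in merged.items()]

print(merge_rules([("a", ["b", "c"]), ("a", ["c", "d"])]))
# [('a', ['b', 'c', 'd'])]
```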

Formalisation

We state that a nonterminal is defined horizontally when it is defined by a production rule whose expression is a choice expression. A nonterminal is defined vertically when there are multiple production rules for it.

Let the language support both horizontal and vertical definitions of nonterminals. Let H(e) denote the horizontal property of an expression e; then the set of violations V of this smell is:

V = {n ∈ N | |Pn| > 1 ∧ ∃(m, e) ∈ Pn : H(e)} (4.12)

With this definition, a nonterminal can be defined both horizontally and vertically; we identify these nonterminals as Zig Zag. If a nonterminal has neither the horizontal nor the vertical property, we state that its definition style is undecided.

Grammars can contain nonterminals with different styles. For example, a grammar can contain nonterminals defined in an undecided, horizontal, and zig zag manner. This may imply that the zig zag nonterminals (the violations) should be rewritten in the horizontal style, since the vertical style is absent from the grammar.
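Under these definitions, the classification can be sketched as follows. This is a minimal sketch under an assumed representation: a grammar is a list of (lhs, rhs) production rules, with a top-level choice encoded as a ("choice", options) tuple; the names are hypothetical.

```python
from collections import defaultdict

def classify(grammar):
    """Classify each nonterminal as horizontal, vertical, zigzag
    (the violation of this smell) or undecided."""
    by_lhs = defaultdict(list)
    for lhs, rhs in grammar:
        by_lhs[lhs].append(rhs)
    styles = {}
    for n, rhss in by_lhs.items():
        horizontal = any(isinstance(r, tuple) and r[0] == "choice" for r in rhss)
        vertical = len(rhss) > 1
        if horizontal and vertical:
            styles[n] = "zigzag"
        elif horizontal:
            styles[n] = "horizontal"
        elif vertical:
            styles[n] = "vertical"
        else:
            styles[n] = "undecided"
    return styles

g = [("a", ("choice", ["b", "c"])),
     ("a", ("choice", ["c", "d"])),  # second rule for 'a': zig zag
     ("e", ("seq", ["f"]))]
print(classify(g))  # {'a': 'zigzag', 'e': 'undecided'}
```

Here a is both vertical (two rules) and horizontal (choice expressions), so it is flagged as Zig Zag, matching equation 4.12.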
