
Satisfying Coverage Criteria by Grammar Mutations and Purdom’s Sentence Generator

Eenass Butrus

eenassbutrus@gmail.com

August 22, 2014, 29 pages

Supervisor: Dr. Vadim Zaytsev
Host organisation: University of Amsterdam

Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering


Abstract

In this project, we systematically generate programs from context-free grammars written in ANTLR-based format, using the shortest path generation algorithm defined by Paul Purdom. The generated programs can be used for parser testing as well as for comparing multiple parsers that are intended to accept the same programming language; the algorithm generates sentences using each production at least once. We concluded that we can satisfy different coverage criteria with Purdom's algorithm by applying grammatical mutations and transformations before generation, rather than by extending the algorithm itself. In addition, we found out why this algorithm cannot reach production coverage in some cases.


Contents

Abstract
1 Introduction
  1.1 Overview
  1.2 Related Work
  1.3 Outline
2 Grammar Preprocessing
  2.1 ANTLR grammar
  2.2 Grammar data structure
  2.3 Grammar extraction
  2.4 Grammar normalization
  2.5 Grammatical mutations
    2.5.1 Grammar filtering (SubGrammar)
    2.5.2 Grammar adaptation
  2.6 Summary
3 Shortest path generation
  3.1 Purdom’s algorithm
    3.1.1 Phase One: Shortest Terminal String
    3.1.2 Phase Two: Shortest Derivation Algorithm
    3.1.3 Phase Three: Sentence Generation Algorithm
  3.2 Summary
4 Coverage criteria
  4.1 Trivial Coverage (TC)
  4.2 Nonterminal Coverage (NC)
  4.3 Production Coverage (PC)
  4.4 Branch Coverage (BC)
  4.5 Unfolding Coverage (UC)
  4.6 Context dependent branch coverage (CDBC)
5 Case study
  5.1 Framework and tools
  5.2 Grammar test set
  5.3 Test data generation
    5.3.1 Trivial, Nonterminal and Production coverage sets (TC, NC and PC)
    5.3.2 Branch Coverage sets (BC)
    5.3.3 Unfolding Coverage sets (UC)
    5.3.4 CDBC sets
  5.4 Results
6 Conclusion


Chapter 1

Introduction

1.1 Overview

Programming languages are used to express commands that are supposed to control the behaviour of a machine. These commands are mostly expressed as a string of symbols that is analyzed and converted into a parse tree showing the relations between different parts of the string. Analyzing the string and generating the parse tree are realized by a computer program called a “parser”.

Parsers are implemented according to certain specifications that define a given programming language's grammar. The accuracy of a parser, in generating the intended parse tree, depends on how the grammar is interpreted when the parser is generated, and on how readable and unambiguous the specifications are. As a result, the grammar might contain ambiguous rules, unused rules and errors. It is necessary to test parsers when they are built in order to detect errors and increase one's certainty that a particular parser actually accepts the intended language. One method to test parsers is to automatically generate test sentences and examine whether they expose mistakes and which productions cause them. A well-known algorithm for generating sentences from a context-free grammar was defined by Paul Purdom [11]. His algorithm aims at rapidly generating a small set of short sentences that uses each production of the grammar at least once.

Consequently, as this algorithm is able to reach all productions, it would be useful to cover different parts per rule explicitly after applying different mutations on the grammar. Thus our research question is:

“Can the shortest path algorithm be efficiently used to test grammarware?”

More specifically: how can the sentence generation algorithm defined by Purdom [11] be used to produce programs from a context-free grammar to achieve various coverage criteria?

In this study we built a tool that extracts context-free grammars written in ANTLR-based description, generates test sentences from these grammars using the shortest path algorithm under a variety of coverage criteria, and finally determines the efficiency of this algorithm. The tool is available online1.

1.2 Related Work

In this section we introduce earlier published projects and theories that are related to different parts of our project.

Grammar extraction/recovery: Grammar recovery is a long-established field in which complex algorithms are developed to obtain a grammar from documentation and specification manuals and to express it in a certain format. Compared to the cases named in [14, §2], our case is relatively simple, as we extract context-free grammars using the ANTLR grammar of ANTLR itself, as described in section 2.3.


Grammar transformation and mutation: Lämmel [4] has generalized several grammar manipulations in a programmable way that can be used in grammar development, as these manipulations are implemented as transformation operators. The semantics of these operators is given as denotations using several concepts such as focus, constraints and symbolic operands. In addition, Dean et al. [2] described several techniques for designing a base grammar and making specific changes to it in order to serve a specified task using the TXL language system.

Grammar mutation, as a way to program grammar changes so that many grammar adaptation steps execute as one uniform step, was introduced in [13, §3.4] and fully presented as an operator suite in [16]. One of the most used mutations within this project is SubGrammar, which is conceptually similar to program slicing as presented in [12]. With that method, programs can be automatically decomposed by analyzing their data flow and control flow. In a context-free grammar, a slice is a subtree of the grammar parse tree.

Grammar based testing: Lämmel and Schulte [5] have introduced a method for systematic test data generation from a grammar called combinatorial coverage. Test problems are modeled by different control mechanisms such as depth control, recursion control, balanced control, dependence control and construction control. Depending on the problem, one or more of these mechanisms are used to generate test data for grammar-based testing. This approach was adopted in a case study by Fisher et al. [3] to generate test data from a grammar according to different coverage criteria and then use these data to compare grammars. While this method emphasizes generating positive test cases, Zelenov and Zelenova [17] presented an approach for automated generation of both positive and negative test sets for parsers. In their project, negative test cases are generated by modifying the productions and adding terminals that do not belong to the language.

Domain Specific Language (DSL): To represent a context-free grammar in a certain data structure, we use the ANTLR [9] parser generator. The developers of this tool provide a DSL grammar which we use to convert grammars written according to the specifications of this tool into a Grammar data structure implemented in the Java programming language. The resulting extractor is limited to ANTLR-based grammars.

Grammar based metrics: When we apply mutations to a grammar, it is useful to collect some metrics in order to study the effect of these mutations on the target grammar and the accepted language. In our project, we limit the metrics to coverage percentage and the number of generated sentences. However, Power and Malloy [10] have applied widely known software metrics, such as size and complexity, to grammar syntax. Črepinšek et al. [1] have developed a tool for grammar metrics that includes Power and Malloy's metrics (size and structural metrics), LR automaton-based metrics and generated-language-based metrics. Finally, collecting metrics about micropatterns [15] is also useful, because micropatterns can serve as triggers for mutations.

1.3 Outline

This thesis is organized as follows: chapter 2 introduces the grammar type and the main transformations and changes that are applied to it. Chapter 3 presents the shortest path algorithm that is used to generate sentences from a grammar. Chapter 4 proposes the main principles that guide test data generation, with a full explanation and examples. Chapter 5 describes the case study of this project, shows how the theory given in the previous chapters is used in practice, and presents the main results. Finally, we conclude the work in chapter 6 and present several suggestions for future work.


Chapter 2

Grammar Preprocessing

This chapter presents the main transformations that we apply in order to extract a context-free grammar from ANTLR-based format into a data structure (implemented in Java) that simplifies further operations on the grammar. These operations include normalization (modifying inner choices), filtering the grammar to remove unused production rules and, lastly, optimizing the grammar to match certain objectives and the coverage criteria given in chapter 4.

2.1 ANTLR grammar

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator that generates a parser from a grammar1. This parser is capable of building a parse tree and walking through it.

To generate a parse tree, the grammar should be written in a particular format based on ANTLR description for parsers. Two different rule types are recognized, namely parser rules and lexer rules. Parser rules define the language syntax and lexer rules define tokens of that language [9].

2.2 Grammar data structure

We have created a data structure (implemented in the Java programming language) that represents the grammar. At the top level, the grammar is a list of production rules and a list of terminals. A production rule is a class that contains a rule name (the left-hand side) as a nonterminal and an expression which represents the right-hand side. The expression class is an abstraction for the different kinds of expressions that are well known in defining context-free grammars, such as sequences (sequential compositions), choices, optionals, star repetition (Kleene star) and plus repetition (transitive closure). The ANTLR parser description contains several patterns that may occur on the right-hand side of a production rule. First of all, a production rule may have multiple alternatives; for this kind of representation we define a Choice class which contains a list of expressions, and every alternative is added to the list as an expression. Secondly, a production rule or its alternatives may contain a sequence of expressions; for these we have developed a Sequence class which contains the expressions as a list. In addition, Star and Plus classes are introduced to represent star repetition and plus repetition respectively. Moreover, we defined the Empty class to represent the empty expression and the Optional class to express the optional expression. Finally, at the lowest level, each expression is built from terminals and nonterminals, defined as the Terminal and Nonterminal classes respectively.
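To make this concrete, here is a minimal sketch of such a class hierarchy in Java. The class names (Expression, Choice, Sequence, Star, Plus, Optional, Empty, Terminal, Nonterminal) come from the text above; the fields and constructors are illustrative assumptions, not the thesis implementation.

    import java.util.List;

    // Base abstraction for right-hand-side expressions.
    abstract class Expression { }

    // Leaves: terminals and nonterminals.
    class Terminal extends Expression {
        final String text;
        Terminal(String text) { this.text = text; }
    }

    class Nonterminal extends Expression {
        final String name;
        Nonterminal(String name) { this.name = name; }
    }

    // Composites.
    class Sequence extends Expression {     // e1 e2 ... en
        final List<Expression> elements;
        Sequence(List<Expression> elements) { this.elements = elements; }
    }

    class Choice extends Expression {       // e1 | e2 | ... | en
        final List<Expression> alternatives;
        Choice(List<Expression> alternatives) { this.alternatives = alternatives; }
    }

    class Star extends Expression {         // e*  (zero or more)
        final Expression body;
        Star(Expression body) { this.body = body; }
    }

    class Plus extends Expression {         // e+  (one or more)
        final Expression body;
        Plus(Expression body) { this.body = body; }
    }

    class Optional extends Expression {     // e?  (zero or one)
        final Expression body;
        Optional(Expression body) { this.body = body; }
    }

    class Empty extends Expression { }      // the empty expression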

2.3 Grammar extraction

At this point we have implemented the infrastructure to extract the grammar into the grammar data structure given in the previous section. The main operation here is building a grammar data structure that contains clean rules.


With clean rules we mean rules without semantic actions, semantic predicates, syntactic predicates, and rewrite rules; these constructs are allowed in the ANTLR description but are irrelevant for our work. In addition, lexer rules that are defined as character sets and range operators are minimized to one token; we always choose the first element of the set/range. For example, the lexer rule ID : [a-z], with a character set on the right-hand side, is rewritten to ID : ‘a’, and the range operator in Num : (‘0’..‘9’) yields Num : ‘0’. Moreover, optional parts and star repetition are removed from lexer rules by assuming the non-occurrence case; for example, ID : [a-z] [a-z]* is written as ID : ‘a’.

We use an ANTLR grammar extractor which reads an ANTLR-based grammar and builds a parse tree for this grammar using the ANTLR parser generator. ANTLR generates a parser and a lexer when it is not asked to generate visitors. However, when ANTLR is requested to generate visitors, it generates two Java visitor classes: <GrammarName>Visitor.java and <GrammarName>BaseVisitor.java. We re-implement the BaseVisitor class2 in order to turn the grammar into the data structure mentioned above. At the same time, all additional information that is inapplicable for our work is removed.

However, different transformations will be applied to the grammar, and having it as a list may be inefficient and hard to update. Therefore, we define a new grammar data structure named GrammarMap, in which we store production rules in a hash table with the production name (the left-hand side) as the key and its right-hand side as the value. In addition, we distinguish the start symbol so that it is accessible directly when needed. Moreover, nonterminals can also be accessed quickly as they are the hash table's keys.
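A GrammarMap along these lines can be as small as the following sketch, building on the expression classes above. The method names are assumptions; only the hash-table design and the distinguished start symbol are fixed by the text.

    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    // Grammar as a hash table: rule name (left-hand side) -> right-hand-side expression.
    class GrammarMap {
        private final Map<String, Expression> rules = new LinkedHashMap<>();
        private String startSymbol;

        void addRule(String lhs, Expression rhs) {
            if (rules.isEmpty()) startSymbol = lhs;   // by convention the first rule is the start symbol
            rules.put(lhs, rhs);
        }

        Expression rhs(String name) { return rules.get(name); }
        String start()              { return startSymbol; }
        void setStart(String name)  { startSymbol = name; }
        Set<String> ruleNames()     { return rules.keySet(); }  // nonterminals are directly accessible as keys
        void retainRules(Set<String> keep) { rules.keySet().retainAll(keep); }  // used for filtering (section 2.5.1)
    }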

2.4 Grammar normalization

The extracted grammar mostly has the same syntax as the original grammar; that is, if a production rule has an inner choice in the original grammar, the associated extracted rule also has an inner choice. For later use we need to take care of this kind of syntax. Therefore, every time we encounter an inner choice, we introduce a new nonterminal which replaces it. A new production rule is added with this nonterminal as the left-hand side and the expression of the inner choice as the right-hand side. At the same time we take care of repetitions and optionality; for example, if the inner choice was inside a plus repetition, then the introduced nonterminal is put inside a Plus expression.

As we can see in figure 2.1a, the rule a has two alternatives, where the first one again has two alternatives (an inner choice) inside a star repetition. Figure 2.1b shows the resulting grammar after applying this step.

    s : a c;
    a : (b|c)* | A ;
    b : B;
    c : C;

(a) Original Grammar

    s : a c;
    a : ch* | A ;
    ch : b|c;
    b : B;
    c : C;

(b) Normalized Grammar

Figure 2.1: Grammar normalization
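Under the sketched classes above, the extraction of inner choices can be programmed as a recursive rewrite; the following is an illustrative version (the fresh-name scheme and the topLevel flag are assumptions):

    import java.util.ArrayList;
    import java.util.List;

    class Normalizer {
        private final GrammarMap grammar;
        private int counter = 0;
        Normalizer(GrammarMap grammar) { this.grammar = grammar; }

        // Rewrite every rule's right-hand side; run after all rules are loaded.
        void normalizeAll() {
            for (String name : new ArrayList<>(grammar.ruleNames()))
                grammar.addRule(name, normalize(grammar.rhs(name), true));
        }

        // Replace each inner choice by a fresh nonterminal plus a new rule for it.
        private Expression normalize(Expression e, boolean topLevel) {
            if (e instanceof Choice) {
                List<Expression> alts = new ArrayList<>();
                for (Expression alt : ((Choice) e).alternatives) alts.add(normalize(alt, false));
                if (topLevel) return new Choice(alts);     // top-level alternatives stay where they are
                String fresh = "ch" + (++counter);         // new nonterminal replacing the inner choice
                grammar.addRule(fresh, new Choice(alts));
                return new Nonterminal(fresh);
            }
            if (e instanceof Sequence) {
                List<Expression> parts = new ArrayList<>();
                for (Expression part : ((Sequence) e).elements) parts.add(normalize(part, false));
                return new Sequence(parts);
            }
            // Repetitions and optionals keep their shape; only the body is rewritten, so a
            // choice inside a star ends up as a fresh nonterminal inside that star.
            if (e instanceof Star)     return new Star(normalize(((Star) e).body, false));
            if (e instanceof Plus)     return new Plus(normalize(((Plus) e).body, false));
            if (e instanceof Optional) return new Optional(normalize(((Optional) e).body, false));
            return e;   // Terminal, Nonterminal, Empty
        }
    }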

2.5 Grammatical mutations

The normalized grammar is the input that we use to generate data according to the different coverage criteria expressed in chapter 4. To generate test data we use Purdom's algorithm given in section 3.1.

2https://github.com/Eenass/Sentences-Generation/blob/master/Thesis/src/buildAST/


The grammar should meet several requirements per method. We introduce the main transformations that we apply to the grammar and indicate when each one is used.

2.5.1 Grammar filtering (SubGrammar)

The start symbol is the entry point to the grammar parse tree from which the different rules can be reached; in fact, each symbol can act as a start symbol. In the context of this project we are interested in grammars with a unique start symbol, and to realize that we use grammar slicing. Slicing is used in two different stages depending on the context.

• As we explained in section 2.3, we consider a start symbol for the grammar: we always assume that the first rule that appears in the grammar is the start symbol. It follows from this assumption that every nonterminal that does not occur in a production reachable from the start symbol is unneeded. Therefore, we filter the grammar by walking through the grammar's parse tree and storing every rule whose rule name (nonterminal) appears on the path; afterwards, we remove the remaining rules. The result of this step is the input to the sentence generation algorithm. This approach is defined as the EliminateTop mutation in [16]; a sketch of the underlying reachability computation follows this list.

However, another approach to get a unique start symbol would be to define a new start symbol which has all unreachable nonterminals (the real start symbols) as alternatives [15].

• The same algorithm is used in another case in this project, namely when the focus of sentence generation is on nonterminals, that is, when each occurrence of a nonterminal should be replaced by all its productions. To achieve this we build a new grammar in which the nonterminals are unfolded and new productions are obtained using the Cartesian product. Then the grammar is filtered according to the earlier adjusted start symbol, because when the occurrence of a nonterminal is replaced by its productions and it is not used by any other production, it becomes unreachable (see section 4.5).
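The filtering itself boils down to a reachability computation over the rules. A sketch on the simplified structures above (the thesis implements this as a visitor over the grammar parse tree; this version walks the GrammarMap directly):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    class SubGrammar {
        // Collect every rule name reachable from the start symbol; all others can be removed.
        static Set<String> reachable(GrammarMap g) {
            Set<String> seen = new HashSet<>();
            Deque<String> todo = new ArrayDeque<>();
            todo.push(g.start());
            while (!todo.isEmpty()) {
                String name = todo.pop();
                if (!seen.add(name)) continue;        // already visited
                Expression rhs = g.rhs(name);
                if (rhs != null) collect(rhs, todo);  // null: occurrence without a rule (e.g. a token)
            }
            return seen;
        }

        private static void collect(Expression e, Deque<String> todo) {
            if (e instanceof Nonterminal)   todo.push(((Nonterminal) e).name);
            else if (e instanceof Sequence) for (Expression x : ((Sequence) e).elements) collect(x, todo);
            else if (e instanceof Choice)   for (Expression x : ((Choice) e).alternatives) collect(x, todo);
            else if (e instanceof Star)     collect(((Star) e).body, todo);
            else if (e instanceof Plus)     collect(((Plus) e).body, todo);
            else if (e instanceof Optional) collect(((Optional) e).body, todo);
            // Terminal and Empty contribute nothing.
        }
    }

Filtering is then a matter of g.retainRules(SubGrammar.reachable(g)).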

2.5.2 Grammar adaptation

Several operations fall within grammar adaptation. Below we list the main changes that we apply to the grammar and when we need them; a sketch of the second change follows the list.

• Handling repetitions and optional expressions: star repetition represents the occurrence of an expression zero or more times in a sequence, plus repetition represents the occurrence of an expression one or more times, while an optional expression may be taken once or not at all. In one part of our work on test data generation, namely section 4.4, we pay attention to these kinds of expressions. Therefore, when we determine the productions of a nonterminal, we examine two cases for each type: for Star and Optional, the case where they are an empty expression and the case where the expression is taken into account; a Plus expression is taken with a sequence of length one and of length two (see figure 4.1).

• Adding a new start symbol: the main requirement of the sentence generation algorithm is that the start symbol is unique and does not appear on the right-hand side of any other rule. As we mentioned in the previous section, we assume that the first rule is the start symbol; therefore, we examine whether it appears in any production. If that is indeed the case, we add a new rule to the grammar whose right-hand side is the start symbol.
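The second change is only a few lines on top of the same structures; a sketch (the fresh rule name is an arbitrary choice here):

    class StartSymbolAdapter {
        // If the start symbol occurs on any right-hand side, add a fresh rule s0 : s
        // and make the fresh rule the new, unique start symbol.
        static void ensureFreshStart(GrammarMap g) {
            String start = g.start();
            for (String name : g.ruleNames()) {
                if (occursIn(g.rhs(name), start)) {
                    String fresh = start + "0";
                    g.addRule(fresh, new Nonterminal(start));
                    g.setStart(fresh);
                    return;
                }
            }
        }

        private static boolean occursIn(Expression e, String name) {
            if (e instanceof Nonterminal) return ((Nonterminal) e).name.equals(name);
            if (e instanceof Sequence) {
                for (Expression x : ((Sequence) e).elements) if (occursIn(x, name)) return true;
                return false;
            }
            if (e instanceof Choice) {
                for (Expression x : ((Choice) e).alternatives) if (occursIn(x, name)) return true;
                return false;
            }
            if (e instanceof Star)     return occursIn(((Star) e).body, name);
            if (e instanceof Plus)     return occursIn(((Plus) e).body, name);
            if (e instanceof Optional) return occursIn(((Optional) e).body, name);
            return false;   // Terminal, Empty
        }
    }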

2.6 Summary

In this chapter we described the grammar format that guides the project and the tool we use to extract it. Moreover, we clarified how to deal with lexer rules that have special patterns and which parts of the grammar are the most relevant for our work. Finally, we determined the main operations that are applied to the grammar in order to reach the different goals related to efficient data generation from a context-free grammar.


Chapter 3

Shortest path generation

In this chapter we provide an extensive explanation of Purdom's algorithm [11], using the original algorithm of Purdom, its interpretation by Malloy and Power [6], and that of Paracha and Franek [8]. With this algorithm, we are able to rapidly generate test sets of short sentences.

In 1972 Paul Purdom introduced an algorithm for sentence generation from a context-free grammar to test parsers. The algorithm is divided into three sub-algorithms, all expressed in pseudocode notation. It was later reformulated by Malloy and Power, who simplified it by dividing the main operations per phase into functions given in a C-like format. Malloy and Power's simplification was later used by Paracha and Franek to generate test data for the MACS grammar, but they used another approach for the third phase.

We will use a running example in order to show the values of the different data structures at each phase of the algorithm. The grammar is written in ANTLR-based description because we use ANTLR grammars for our case study. Our sample grammar consists of four rules, four nonterminals (s, a, b, c) and four terminals (A, B, C, +). The ANTLR description does not allow multiple definitions of the same nonterminal; instead, ‘|’ is used to introduce multiple productions for the same nonterminal. s is the start symbol. Figure 3.1 shows the grammar.

    s : a c;
    a : (a ’+’ b) | A ;
    b : B;
    c : C;

Figure 3.1: Sample grammar

3.1 Purdom’s algorithm

The main idea of Purdom's algorithm is to generate a minimal set of short sentences from a context-free grammar using every production at least once. In our grammar this means using every possible production of each rule at least once. To achieve this goal, the algorithm must be able to use each production and to choose the production that generates the shortest terminal string. Therefore, different data structures are used along the three phases to generate the intended sentences.

In the following sections we explain each phase according to [6]. In addition, we specify our modifications to the data structures, which preserve our grammar format as well as the algorithm's goal. Finally, for simplicity we use the same names for most data structures and variables.

3.1.1 Phase One: Shortest Terminal String

This phase gets the grammar as input and produces three data structures. The input grammar is a hash table of production rules with the rule name as a key and its expression as a value. This grammar is normalized as explained in section 2.4 and filtered as described in section 2.5.1. The output data structures are SLEN, RLEN and SHORT.

• SLEN: for each nonterminal, it stores the number of steps in the shortest derivation of a sentence starting with that nonterminal. The original algorithm also stores a value for each terminal, namely its length (as each terminal is a string), while that value is always 1 in Malloy and Power's interpretation. We do not store terminals' values because that is embedded in our way of calculating; we also give each terminal the value 1.

• RLEN: it stores the length of the shortest string that can be derived by each production associated with a nonterminal. The original algorithm uses the order of the rules as keys to this data structure. In our implementation it is a hash table with a nonterminal as key and a class named “ProductionsRLEN” as value; “ProductionsRLEN” is in turn a hash table from productions to their RLEN values.

• SHORT: the first two data structures are used to fill this one in. For each nonterminal it holds the production rule that leads to the shortest derivation. In our implementation this is a hash map from a nonterminal to the expression that has the minimum RLEN when the nonterminal has multiple productions, or to the right-hand side of that nonterminal when the rule has only one production.

At the beginning of the algorithm, these data structures are initialized as follows: each nonterminal is assigned the maximum integer as its SLEN value, the maximum integer is also given to each production as its RLEN value, and SHORT is initialized with -1 in the original algorithm and with an Empty expression in ours. The rest of this phase remains nearly the same as the algorithm given in [6], except for the changes in the data structures and the values stored in them. In addition, the for loops that iterate over all productions are modified to match our grammar representation and the changes explained above. Moreover, the SLEN calculation mentioned in lines (23 – 28) of figure 3.4b is implemented outside the algorithm but with the same idea. The main differences are highlighted in figure 3.4.

    s  5
    a  2
    b  2
    c  2

(a) SLEN

    N  P            RLEN
    s  a c          5
    a  (a ’+’ b)    8
    a  A            2
    b  B            2
    c  C            2

(b) RLEN

    s  a c
    a  A
    b  B
    c  C

(c) SHORT

Figure 3.2: Phase one

Figure 3.3: Parse Tree

The output of this phase will be used in the following two phases.

SLEN and RLEN will be used in the next phase, while SHORT is the key input to the third phase. We also forward RLEN to the third phase so that we do not have to regenerate the productions of each nonterminal. Table 3.2a illustrates the values of SLEN at the end of phase one. As we can see, the shortest derivation of the start symbol has length 5, which is illustrated in figure 3.3: a parse tree with s as root, a and c as interior nodes, and A and C as their respective children; the size of this tree is 5. RLEN is given in table 3.2b; every production has an entry in this table, and the values indicate the length of the shortest terminal string that can be derived using that production. Combining these two tables gives table 3.2c, which maps each nonterminal to the production with the minimum RLEN value.


     1 algorithm Shortest terminal string
     2 variable SLEN: terminals&nonterminals
     3 output RLEN: productions
     4 output SHORT: nonterminals
     5
     6 void init() {
     7   for (each terminal t) {
     8     SLEN[t] = 1;
     9   }
    10   for (each nonterminal n) {
    11     SLEN[n] = max_int;
    12     SHORT[n] = -1;
    13   }
    14   for (each production p) {
    15     RLEN[p] = max_int;
    16   }
    17 }
    18
    19 void PhaseOne() {
    20   bool change = true;
    21   while (change) {
    22     change = false;
    23     for (each production p) {
    24       int sum = 1; bool too_big = false;
    25       for (each element e of RHS[p]) {
    26         if (SLEN[e] == max_int) {
    27           too_big = true; break;
    28         }
    29         else sum += SLEN[e];
    30       }
    31       if (!too_big && sum < RLEN[p]) {
    32         RLEN[p] = sum;
    33         if (sum < SLEN[LHS[p]]) {
    34           SHORT[LHS[p]] = p;
    35           SLEN[LHS[p]] = sum;
    36           change = true;
    37         }
    38       }
    39     } // for
    40   } // while
    41 } // PhaseOne

(a) Malloy and Power interpretation

     1 algorithm Shortest terminal string
     2 variable SLEN : nonterminals
     3 output RLEN : productions
     4 output SHORT: nonterminals
     5
     6 void init() {
     7   for (each nonterminal n) {
     8     SLEN[n] = max_int;
     9     SHORT[n] = Empty;
    10     for (each production p of n) {
    11       RLEN[p] = max_int;
    12     }
    13   }
    14 }
    15
    16 void PhaseOne() {
    17   bool change = true;
    18   while (change) {
    19     change = false;
    20     for (each nonterminal n) {
    21       for (each production p of n) {
    22         int sum = 1; bool too_big = false;
    23         for (each element e of p) {
    24           if (SLEN[e] == max_int) {
    25             too_big = true; break;
    26           }
    27           else sum += SLEN[e];
    28         }
    29         if (!too_big && sum < RLEN[p]) {
    30           RLEN[p] = sum;
    31           if (sum < SLEN[n]) {
    32             SHORT[n] = p;
    33             SLEN[n] = sum;
    34             change = true;
    35           }
    36         }
    37       }
    38     } // for
    39   } // while
    40 } // PhaseOne

(b) Our interpretation

Figure 3.4: The main differences in phase one

3.1.2 Phase Two: Shortest Derivation Algorithm

This phase uses the data structures SLEN and RLEN computed in the first phase and introduces a local data structure called DLEN in order to build the main output of this phase, PREV. DLEN is defined for each nonterminal and contains the length of the shortest derivation that uses that nonterminal. PREV, on the other hand, is defined for each nonterminal except the start symbol and contains the production rule to be used to introduce that nonterminal into the shortest derivation. This is our modification to the original algorithm, which uses a production number for PREV in place of a production rule; since we do not have production numbers, we need to specify which production is to be used. The changes are highlighted in figure 3.6.

Before starting this phase, we assign the SLEN value as DLEN for the start symbol, the maximum integer for the rest of the nonterminals, and an empty production rule as each PREV.

Table 3.5a illustrates the values of DLEN at the end of this phase. The DLEN value of the start symbol is equal to its SLEN, which indicates the length of the shortest derivation of s. In addition, the length of the shortest derivation that uses a and c is 5, namely via the start symbol. The same idea applies to b.


The implementation of this phase is nearly the same as in [6], with some additional for loops to access SLEN, RLEN and DLEN. More details are shown in figure 3.6.

    s  5
    a  5
    b  11
    c  5

(a) DLEN

    a  s : a c
    b  a : (a ’+’ b)
    c  s : a c

(b) PREV

Figure 3.5: Phase Two

3.1.3 Phase Three: Sentence Generation Algorithm

The main goal of Purdom's algorithm, sentence generation, is reached in this phase. We followed Malloy and Power's [6] simplification when implementing it. The algorithm is divided into five functions that facilitate loading the shortest production, marking productions to be used and processing the stack. In addition, we applied a few modifications to multiple functions with respect to the original algorithm, as well as changing the data structures' types. We discuss these modifications for the data structures and the functions separately.

The Data Structures

Three important data structures are introduced in this phase, namely ONCE, ONST and MARK. Below we explain each one and present our changes to it.

• ONCE: for each nonterminal it represents the current state, given as an integer value ranging from −4 to n − 1, where n is the total number of productions in the grammar. The values less than 0 are given names: Null, Ready, Unsure and Finished, where Null = −1, Ready = −2 and so forth. Below is a brief explanation of these states.

1. Null: the initial value of the production number to be handled.

2. Ready: the nonterminal has been used with an appropriate production, and the next time this nonterminal is called it will be used with another production.

3. Unsure: this state indicates that this nonterminal was a PREV for some useful nonterminal on the stack.

4. Finished: the nonterminal has been rewritten with all possible productions and is not useful any more. When this happens and the nonterminal occurs on the stack again, it will always be rewritten by the shortest derivation calculated in the first phase.

5. Integer ≥ 0: the number of the production by which the nonterminal was rewritten the last time it occurred on the stack.

As we mentioned earlier in this section, we do not use production numbers but expressions. For this reason we changed this data structure into a hash table having nonterminals as keys and an expression (production) as value. The same terminology is used for the states Ready to Finished, but we define them as terminals, and we omit the Null state because we use the empty expression for the initialization of the production to be handled (line 3 of figure 3.8b). Moreover, in place of Integer ≥ 0 we use the expression that represents the particular alternative on the right-hand side of a rule.

• ONST: this data structure keeps track of how many times each nonterminal occurs on the stack. In other words, it contains an integer per nonterminal.

• MARK: each production is assigned a Boolean value which indicates whether the production has been used or not. In the initial state all productions are assigned false, and at the end they should all be true, so that every production is used at least once. Again, we changed this data structure to match our grammar description and the data structures of the previous phases: MARK is a hash table with a nonterminal as its key and a “ProductionsMark” data structure as value; “ProductionsMark” in turn is implemented as a hash table with productions (alternatives) as keys and a Boolean as value.
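Translated to the sketched Java structures from chapter 2, these three tables could look as follows. The sentinel objects standing for Ready/Unsure/Finished are an assumption; the text only fixes that the states are represented as terminals and that the Null state is replaced by the empty expression.

    import java.util.HashMap;
    import java.util.Map;

    // Bookkeeping of phase three, with expressions in place of production numbers.
    class PhaseThreeState {
        // Sentinel "terminals" standing for the special states.
        static final Expression READY    = new Terminal("<ready>");
        static final Expression UNSURE   = new Terminal("<unsure>");
        static final Expression FINISHED = new Terminal("<finished>");

        final Map<String, Expression> once = new HashMap<>();  // ONCE: nonterminal -> state or production
        final Map<String, Integer> onst = new HashMap<>();     // ONST: occurrences of each nonterminal on the stack
        // MARK: nonterminal -> (production -> already used?), mirroring "ProductionsMark".
        // Identity-based keys are fine here, since productions are shared objects.
        final Map<String, Map<Expression, Boolean>> mark = new HashMap<>();

        // containsONCEcase from the text: is this ONCE value one of the special states?
        static boolean containsONCEcase(Expression e) {
            return e == READY || e == UNSURE || e == FINISHED;
        }
    }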


     1 algorithm Shortest derivation
     2 input SLEN : terminals&nonterminals
     3 input RLEN : productions
     4 variable DLEN : nonterminals
     5 output PREV : nonterminals
     6
     7 void init() {
     8   for (each nonterminal n) {
     9     DLEN[n] = max_int;
    10     PREV[n] = -1;
    11   }
    12 }
    13
    14 void PhaseTwo() {
    15   bool change = true;
    16   while (change) {
    17     change = false;
    18     for (each production p) {
    19       if (RLEN[p] == max_int)
    20         continue;
    21       if (DLEN[LHS[p]] == max_int)
    22         continue;
    23       if (SLEN[LHS[p]] == max_int)
    24         continue;
    25       sum = DLEN[LHS[p]] + RLEN[p]
    26             - SLEN[LHS[p]];
    27       for (each nonterminal n on RHS[p]) {
    28         if (sum < DLEN[n]) {
    29           change = true;
    30           DLEN[n] = sum;
    31           PREV[n] = p;
    32         }
    33       } // for
    34     } // for
    35   } // while
    36 } // PhaseTwo

(a) Malloy and Power interpretation

     1 algorithm Shortest derivation
     2 input SLEN : nonterminals
     3 input RLEN : productions
     4 variable DLEN : nonterminals
     5 output PREV : nonterminals
     6
     7 void init() {
     8   for (each nonterminal n) {
     9     if (n == start symbol) {
    10       DLEN[n] = SLEN[n];
    11     }
    12     else {
    13       DLEN[n] = max_int;
    14       PREV[n] = empty production rule;
    15     }
    16   }
    17 }
    18 void PhaseTwo() {
    19   bool change = true;
    20   while (change) {
    21     change = false;
    22     for (each nonterminal n) {
    23       for (each production p of n) {
    24         if (RLEN[p] == max_int)
    25           continue;
    26         if (DLEN[n] == max_int)
    27           continue;
    28         if (SLEN[n] == max_int)
    29           continue;
    30         sum = DLEN[n] + RLEN[p]
    31               - SLEN[n];
    32         for (each nonterminal n1 of p) {
    33           if (sum < DLEN[n1]) {
    34             change = true;
    35             DLEN[n1] = sum;
    36             PREV[n1] = n : p;
    37           }
    38         } // for
    39       } // for
    40     } // for
    41   } // while
    42 } // PhaseTwo

(b) Our interpretation

Figure 3.6: The main differences in phase two


The Functions

Malloy and Power [6] have simplified the original algorithm of Purdom by dividing this phase into five functions, each responsible for certain operations. We followed the same idea but with small modifications to these functions, and we introduce a new helper function to check whether the state of a nonterminal is not a normal expression, in other words, whether the state is Ready, Unsure or Finished. This is necessary in different places of the main function for this phase, as we do not use integers. A detailed explanation of each function follows.


• init: assigns each nonterminal Ready as its ONCE state and 0 as its ONST value. Moreover, each production in MARK is marked false. Our implementation of this function is nearly the same, modulo the data types and how the productions of each nonterminal are filled in.

• short: returns the shortest derivation of a nonterminal using the SHORT data structure computed in the first phase and marks that production true. In addition, it checks whether ONCE of this nonterminal equals Finished and, if so, changes it to Ready. This was part of Malloy and Power's [6] addition to the original algorithm; they argue that this production may be useful in a derivation regardless of the algorithm's ability to assign a production in the usual way.

• load_ONCE: this function is responsible for rewriting nonterminals. That is, it checks whether a production is still unmarked (MARK[prod] = false); if so, it loads the production into ONCE so that it can be used when the corresponding nonterminal occurs on the stack, and marks it (MARK[prod] = true). However, we found out that Purdom's algorithm also checks the state of the nonterminal, namely whether it is Ready or Unsure, in addition to its MARK. In short, in order to load a production, its mark should be false and its state should be Ready or Unsure. We have added this condition to our implementation; it is necessary to ensure that only one production per nonterminal is loaded into ONCE and marked when the rule has multiple unmarked productions.

• process_STACK: this function is in charge of managing the stack used in this phase. It begins by decrementing the ONST value of the current nonterminal being handled and pushing all elements of the right-hand side of that nonterminal onto the stack; if an element is a nonterminal, its ONST value is incremented by 1. The function then processes terminals and nonterminals until the stack is empty.

A small modification is applied to this function before pushing elements that are encapsulated in repetitions and optionals: we check whether the element's class is Star, Plus or Optional, and if so we retrieve its contents (Terminal or Nonterminal) and push them onto the stack. We also keep track of ONST when we get a nonterminal from these kinds of elements.

• containsONCEcase: Malloy and Power did not need this function because, as they use integers, it is easy to check whether an ONCE value is one of the states Ready, Unsure or Finished, or a production. In our data structure we use expressions, and we need extra checks to find out whether ONCE matches one of these states. So we compare the given expression with the state values; if it matches one of them, true is returned, otherwise false.

• PhaseThree: the part where sentence generation takes place. The outer while loop iterates over the data structures until all productions are used and all sentences are generated. The inner while loop is responsible for generating a single sentence: generation starts with the start symbol and ends when the stack is empty, and this operation is repeated until all productions are used and all sentences are generated. The path of each iteration depends on the previous state and on which productions are used or unused. The main transitions are explained below:

1. The outer loop is exited and the phase ends when the state of the start symbol (ONCE[start]) is equal to Finished: line 5 of figure 3.8a and line 5 of figure 3.8b.

2. The inner loop checks whether the current nonterminal on the stack is equal to the start symbol and its state is equal to Finished. If so, the algorithm is done generating sentences and the inner loop is exited: lines (10 – 13) of figure 3.8a and lines (10 – 13) of figure 3.8b.

3. When ONCE of the current nonterminal to be handled (given as nt in the algorithm) is an expression, it is forwarded to the stack: lines (17 – 20) of figure 3.8a and lines (17 – 20) of figure 3.8b.

4. In case a nonterminal has been rewritten by all its productions, its state has become Finished, and it occurs on the stack again, its shortest derivation is used: lines (14 – 16) of figures 3.8a and 3.8b.


5. If none of these cases is true, productions are installed into ONCE and a path from the start symbol is created. The path contains, for each nonterminal, a production that is either still unused (its MARK is false) or the shortest derivation: lines (21 – 54) of figure 3.8a and lines (21 – 59) of figure 3.8b. In this part we have added an extra check whether a nonterminal has an entry in PREV, because our PREV data structure does not have an entry for the start symbol; this is illustrated at line 39 of figure 3.8b.

Beyond this, we did not apply any changes to the functions except for how we store the sentences. All changes to the functions are highlighted in figures 3.7 and 3.8.

3.2 Summary

In this chapter we have shown how Purdom's algorithm generates sentences from a grammar. We have implemented it in the Java programming language by following Malloy and Power's reformulation in addition to the original definition by Purdom, with Paracha and Franek for confirmation. In some places, Malloy and Power's implementation causes problems due to conditions mentioned by Purdom but not taken into account by the reformulation. Moreover, we have made several modifications to match our grammar data structure and format without affecting the algorithm's goal. We have tried to show these modifications in a format close to Malloy and Power's reformulation.


     1 algorithm: Sentence Generation
     2 Algorithm
     3 input PREV: nonterminals
     4 input SHORT: nonterminals
     5 variable MARK: productions
     6 variable ONST: nonterminals
     7 variable ONCE: nonterminals
     8
     9 void init() {
    10   for (each nonterminal n) {
    11     ONCE[n] = Ready;
    12     ONST[n] = 0
    13   }
    14   for (each production p)
    15     MARK[p] = false;
    16 }
    17
    18 int short(int nt) {
    19   int prod_no = SHORT[nt];
    20   MARK[prod_no] = true;
    21   if (ONCE[nt] != Finished)
    22     ONCE[nt] = Ready;
    23   return prod_no;
    24 }
    25 void load_ONCE() {
    26   for (each production p) {
    27     if (MARK[p] == false) {
    28       ONCE[LHS[p]] = p; MARK[p] = true;
    29     }
    30   }
    31 }
    32
    33 void process_STACK(int & nt,
    34                    int prod_no,
    35                    bool & do_sentence) {
    36   ONST[nt]--;
    37   for (each element e of RHS[prod_no]) {
    38     STACK.push(e);
    39     if (e is nonterminal) ONST[e]++;
    40   }
    41   bool done = false;
    42   while (!done) {
    43     if (STACK.empty()) {
    44       do_sentence = false; break;
    45     }
    46     else {
    47       nt = STACK.top(); STACK.pop();
    48       if (is_terminal(nt)) print(nt);
    49       else done = true;
    50     }
    51   }
    52 }

(a) Malloy and Power code

     1 algorithm: Sentence Generation
     2 Algorithm
     3 input PREV: nonterminals
     4 input SHORT: nonterminals
     5 input RLEN : nonterminals
     6 variable MARK: productions
     7 variable ONST: nonterminals
     8 variable ONCE: nonterminals
     9 variable nt: nonterminal
    10 variable onceCases: [Ready, Unsure,
    11                      Finished]
    12 variable doSentence: boolean
    13 variable output: generated sentence
    14
    15 void init() {
    16   for (each nonterminal n) {
    17     ONCE[n] = Ready; ONST[n] = 0
    18     for (each production p of n) {
    19       MARK[p] = false;
    20     }
    21 }
    22
    23 Expression short() {
    24   Expression prod = SHORT[nt];
    25   MARK[prod] = true;
    26   if (ONCE[nt] != Finished)
    27     ONCE[nt] = Ready;
    28   return prod;
    29 }
    30
    31 void load_ONCE() {
    32   for (each nonterminal n)
    33     for (each production p of n) {
    34       if (MARK[p] == false && (ONCE[n] ==
    35           Ready || ONCE[n] == Unsure)) {
    36         ONCE[n] = p; MARK[p] = true;
    37       }
    38     }
    39 }
    40 void process_STACK(Expression prod) {
    41   ONST[nt]--;
    42   for (each element e of prod) {
    43     if (e is Star || Plus || Optional) {
    44       elem = getNonterminals(e);
    45       if (elem is not empty) {
    46         STACK.push(elem);
    47         ONST[elem]++;
    48       }
    49       else {
    50         elem = cleanRepetition(e);
    51         STACK.push(elem);}
    52     } else {
    53       STACK.push(e);
    54       if (e is nonterminal) ONST[e]++;
    55     }
    56   }
    57   bool done = false;
    58   while (!done) {
    59     if (STACK.empty()) {
    60       do_sentence = false; break;}
    61     else {
    62       nt = STACK.top(); STACK.pop();
    63       if (is_terminal(nt)) {
    64         output.add(nt);
    65       }
    66       else done = true;}
    67   }
    68 }

(b) Our code

Figure 3.7: The main differences in phase three (the functions)


     1 void PhaseThree() {
     2   bool done = false;
     3   int prod_no = Null;
     4   while (!done) {
     5     if (ONCE[Start] == Finished) break;
     6     ONST[Start] = 1;
     7     nt = Start; do_sentence = true;
     8     while (do_sentence) {
     9       once_nt = ONCE[nt];
    10       if (nt == Start &&
    11           once_nt == Finished) {
    12         done = true; break;
    13       }
    14       elsif (once_nt == Finished) {
    15         prod_no = short(nt);
    16       }
    17       elsif (once_nt >= 0) {
    18         prod_no = once_nt;
    19         ONCE[nt] = Ready;
    20       }
    21       else {
    22         load_ONCE();
    23         for (each nonterminal I) {
    24           if (I != Start && ONCE[I] >= 0) {
    25             J = I; K = PREV[J];
    26             while (K >= 0) {
    27               J = LHS[K];
    28               if (ONCE[J] >= 0) break;
    29               else {
    30                 if (ONST[I] == 0) {
    31                   ONCE[J] = K;
    32                   MARK[K] = true;
    33                 }
    34                 else ONCE[J] = Unsure;
    35               }
    36               K = PREV[J];
    37             }
    38           } // if
    39         } // for
    40         for (each nonterminal n) {
    41           if (ONCE[n] == Ready) {
    42             ONCE[n] = Finished;
    43           }
    44         }
    45         if (nt == Start && ONCE[nt] < 0
    46             && ONST[Start] == 0) break;
    47         elsif (ONCE[nt] < 0) {
    48           prod_no = short(nt);
    49         }
    50         elsif (ONCE[nt] >= 0) {
    51           prod_no = ONCE[nt];
    52           ONCE[nt] = Ready;
    53         }
    54       } // else
    55       process_STACK;
    56     } // while (do sentence)
    57   } // while (!done)
    58 } // PhaseThree()

(a) Malloy and Power code

     1 void PhaseThree() {
     2   bool done = false;
     3   Expression prod = Empty;
     4   while (!done) {
     5     if (ONCE[Start] == Finished) break;
     6     ONST[Start] = 1;
     7     nt = Start; do_sentence = true;
     8     while (do_sentence) {
     9       once_nt = ONCE[nt];
    10       if (nt == Start &&
    11           once_nt == Finished) {
    12         done = true; break;
    13       }
    14       elsif (once_nt == Finished) {
    15         prod = short();
    16       }
    17       elsif (!containsONCEcase(once_nt)) {
    18         prod = once_nt;
    19         ONCE[nt] = Ready;
    20       }
    21       else {
    22         load_ONCE();
    23         for (each nonterminal I) {
    24           if (I != Start &&
    25               !containsONCEcase(ONCE[I])) {
    26             J = I;
    27             K = PREV[J];
    28             while (!containsONCEcase(RHS[K])) {
    29               J = LHS[K];
    30               if (!containsONCEcase(ONCE[J]))
    31                 break;
    32               else {
    33                 if (ONST[I] == 0) {
    34                   ONCE[J] = RHS[K];
    35                   MARK[LHS[K]] = true;
    36                 }
    37                 else ONCE[J] = Unsure;
    38               }
    39               if (!hasPREV(J)) break;
    40               K = PREV[J];
    41             }
    42           } // if
    43         } // for
    44         for (each nonterminal n) {
    45           if (ONCE[n] == Ready) {
    46             ONCE[n] = Finished;
    47           }
    48         }
    49         if (nt == Start &&
    50             containsONCEcase(ONCE[nt])
    51             && ONST[Start] == 0) break;
    52         elsif (containsONCEcase(ONCE[nt])) {
    53           prod = short();
    54         }
    55         elsif (!containsONCEcase(ONCE[nt])) {
    56           prod = ONCE[nt];
    57           ONCE[nt] = Ready;
    58         }
    59       } // else
    60       process_STACK;
    61     } // while (do sentence)
    62   } // while (!done)
    63 } // PhaseThree()

(b) Our code

Figure 3.8: The main differences in phase three (PhaseThree)


Chapter 4

Coverage criteria

In this chapter we present our approach to generating test data sets according to the coverage criteria described by Fisher et al. [3]. These criteria differ in their goals in terms of the depth of derivation trees, how repetitions such as star, plus and optional are handled, and how a nonterminal that can be rewritten by multiple productions is treated.

4.1 Trivial Coverage (TC)

Trivial coverage is achieved when the generated test set is not empty. In practice, we generate the shortest complete sentence that can be derived from the start symbol, without focusing on optional parts and repetitions.

4.2 Nonterminal Coverage (NC)

In order to achieve nonterminal coverage, the set of derivation trees should exercise each nonterminal at least once. In this case we have to take care of nonterminals that occur inside repetitions, especially star and optional, as plus is always taken at least once. To accomplish this coverage we generate all sentences that can be derived from the start symbol using Purdom's algorithm.

There are two main approaches to handle nonterminals that are surrounded by a repetition:

1. Grammar transformations: with this approach, grammar transformations are applied to the production rules to produce a new rule set that does not contain repetitions and optional parts. This can be achieved by going through all production rules, searching for these kinds of expressions and unfolding them, and then forwarding the resulting grammar to Purdom's algorithm without making any change to the algorithm, as there will not be any object that the algorithm is unable to deal with.

2. The second approach is to shift this operation to a later step and put it in Purdom's algorithm, specifically in the third phase. The algorithm is then modified to deal with repetitions by removing them before elements are pushed onto the stack or when they are popped from it. By removing repetitions we mean using the expression stored inside them.

We have adopted the second approach and handle repetitions before pushing them onto the stack in the third phase. This ensures the use of each nonterminal even when it appears as an optional part or inside star or plus repetitions. The adjustment is highlighted in figure 3.7b, lines (42 – 51). It means that a nonterminal will be used again even when it is already covered, which might seem redundant; nonetheless, omitting a covered nonterminal might break the algorithm, as each nonterminal is part of a derivation.


4.3 Production Coverage (PC)

In a context-free grammar each nonterminal has one or more productions (rewrite rules). To achieve production coverage, each production should be used at least once. In section 3.1 we presented an algorithm that generates a set of sentences for a context-free grammar using each production at least once.

As we mentioned in chapter 3, the grammars that guide this project are written in ANTLR format, in which productions appear as one expression separated by the ‘|’ token; they are represented as a Choice object in our Grammar data structure. As the grammars are normalized before entering the algorithm, each rule can have one or more productions without any inner choices, which can be processed directly along the three phases of Purdom's algorithm. In addition, repetitions and optionals are handled the same as for the previous criterion, so that each nonterminal, and hence its productions, are accessible.

4.4 Branch Coverage (BC)

Repetitions and choices are the focus of this criterion and should be explicitly exercised. Star and optional repetitions are varied using two cases: the first is the empty expression and the second is when they are taken into account and a tree is derived. A plus repetition is varied so that sequences of length 1 and length 2 are exercised. Finally, in the case of choices, all alternatives should be exercised at least once. Sentence generation is again done using Purdom's algorithm, but with some modifications to the productions.

The variation is applied to the productions before entering the algorithm: for each right-hand-side expression we examine two cases for each occurrence of star, plus and optional, and we produce two productions for each of them. The first one includes the star, optional and plus parts just as they are, while the second replaces star and optional with an empty expression and replaces plus with a sequence of length 2 of the same expression. An example is shown in figure 4.1, and a code sketch follows the figure.

    s : a* c;
    a : b c+ | A ;
    b : B;
    c : C;

(a) Original productions

    s : a c | c;
    a : b c | b c c | A ;
    b : B;
    c : C;

(b) Productions for BC

Figure 4.1: Rewriting productions to achieve BC
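The rewriting of figure 4.1 can be expressed as a recursive expansion over the expression classes of chapter 2. The sketch below returns, for one right-hand side, all productions that result from choosing one of the two cases at every repetition/optional; how the thesis treats nested repetitions is not spelled out, so their handling here is an assumption.

    import java.util.ArrayList;
    import java.util.List;

    class BranchVariants {
        // e* and e?  ->  { body taken once, empty }        e+  ->  { body, body body }
        static List<Expression> variants(Expression e) {
            List<Expression> out = new ArrayList<>();
            if (e instanceof Star || e instanceof Optional) {
                Expression body = (e instanceof Star) ? ((Star) e).body : ((Optional) e).body;
                out.addAll(variants(body));                      // the "taken" case
                out.add(new Empty());                            // the "omitted" case
            } else if (e instanceof Plus) {
                for (Expression v : variants(((Plus) e).body)) {
                    out.add(v);                                  // sequence of length 1
                    out.add(new Sequence(List.of(v, v)));        // sequence of length 2
                }
            } else if (e instanceof Sequence) {
                out.add(new Sequence(new ArrayList<>()));        // Cartesian product over the elements
                for (Expression part : ((Sequence) e).elements) {
                    List<Expression> next = new ArrayList<>();
                    for (Expression prefix : out)
                        for (Expression v : variants(part))
                            next.add(append((Sequence) prefix, v));
                    out = next;
                }
            } else {
                out.add(e);   // terminals, nonterminals and Empty are left unchanged
            }
            return out;
        }

        private static Sequence append(Sequence s, Expression e) {
            List<Expression> parts = new ArrayList<>(s.elements);
            if (!(e instanceof Empty)) parts.add(e);   // dropping Empty keeps "s : a* c" -> "s : a c | c" clean
            return new Sequence(parts);
        }
    }

Applied to s : a* c from figure 4.1a, variants yields exactly the two productions a c and c of figure 4.1b.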

4.5 Unfolding Coverage (UC)

The focus of this criterion is on nonterminals: each time a nonterminal occurs in a derivation, it should be substituted by all its productions. To generate data, Purdom's algorithm is used, but the grammar is modified to meet this condition. The grammar modification is applied as follows:

1. Unfold nonterminals: the first step towards unfolding coverage is to go through all productions of the grammar and replace each nonterminal by all its productions, then find the new combinations by applying the Cartesian product.

2. Grammar filtering: when the previous step is applied to the whole rule set, some nonterminals are no longer used, as their productions are used directly. Such nonterminals become unreachable; therefore, we filter the grammar to remove them.

As a result, the grammar is optimized for this criterion as well as for Purdom's algorithm. For illustration, assume the grammar given in figure 3.1; the resulting grammar is shown in figure 4.2, followed by a sketch of the unfolding step.


    s : (a ’+’ b) C | A C;
    a : (a ’+’ B) | A;
    b : B;

Figure 4.2: Grammar ready for UC
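A sketch of the unfolding step on the chapter 2 structures: each direct nonterminal occurrence in a right-hand side is replaced by every one of its productions, and a Cartesian product over sequences yields the new combinations. Note that the thesis keeps recursive occurrences in place (rule a in figure 4.2 still refers to a); detecting recursion is omitted here, so this version is only indicative.

    import java.util.ArrayList;
    import java.util.List;

    class Unfolder {
        // All combinations of replacing each nonterminal occurrence in e by its productions.
        static List<Expression> unfold(Expression e, GrammarMap g) {
            List<Expression> out = new ArrayList<>();
            if (e instanceof Nonterminal) {
                Expression rhs = g.rhs(((Nonterminal) e).name);
                if (rhs == null) out.add(e);   // token rule: keep the occurrence
                else if (rhs instanceof Choice) out.addAll(((Choice) rhs).alternatives);
                else out.add(rhs);
            } else if (e instanceof Choice) {
                for (Expression alt : ((Choice) e).alternatives) out.addAll(unfold(alt, g));
            } else if (e instanceof Sequence) {
                out.add(new Sequence(new ArrayList<>()));   // Cartesian product over the elements
                for (Expression part : ((Sequence) e).elements) {
                    List<Expression> next = new ArrayList<>();
                    for (Expression prefix : out)
                        for (Expression v : unfold(part, g))
                            next.add(concat((Sequence) prefix, v));
                    out = next;
                }
            } else {
                out.add(e);   // terminals, Empty and (in this sketch) repetitions stay untouched
            }
            return out;
        }

        private static Sequence concat(Sequence s, Expression e) {
            List<Expression> parts = new ArrayList<>(s.elements);
            parts.add(e);
            return new Sequence(parts);
        }
    }

After unfolding, the filtering from section 2.5.1 removes the rules that are no longer referenced (rule c in the example).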

4.6 Context dependent branch coverage (CDBC)

This criterion is a straightforward combination of unfolding and branch coverage. The sentences are generated using Purdom's algorithm, which receives a grammar that has first passed through the operations named in section 4.5 (the unfolding part); then the operations to achieve branch coverage are applied to the resulting grammar.


Chapter 5

Case study

In this chapter we introduce our case study of systematic test data generation from context-free grammars. We specify the main operations that were applied to the grammar test set and how the data were generated, and finally we present and discuss the results.

5.1 Framework and tools

The ANTLR parser generator is the main tool used in this project. It can be integrated into the Eclipse1 software development toolkit. ANTLR generates parsers in the Java programming language for each grammar written in ANTLR format. In addition, the ANTLR distribution provides a parser for the ANTLR grammar of ANTLR itself, which can be used to build an abstract syntax tree for a grammar using a data structure of interest2.

The developed framework consists of three main parts: first, the grammar extractor, which is used to build the intended grammar parse tree according to a data structure defined for this project; second, the sentence generation algorithm and its related classes; and last, the data generator, which uses the first two parts plus additional code to realize the ad hoc grammar modifications. The approximate LOC of the resulting tool is 2910.

5.2 Grammar test set

We collected a set of grammars written according to the ANTLR description from the ANTLR distribution, distributed under the BSD license3. We gathered grammars that contain pure production rules without rule attributes. A few of the remaining grammars contain tokens that were unreadable in the development tool (Eclipse) due to encoding style, while others contain parse errors; such grammars are not included in the test set. We did not try to fix these errors, to stay faithful to the source, and doing so is outside the scope of this project. Table 5.1 illustrates the grammar test set with its specifications: VAR is the number of nonterminals, PROD the number of productions and TERM the number of terminals. These numbers were gathered after extracting the grammar.

5.3 Test data generation

In this section we present our approach to test data generation per coverage criterion.

1https://www.eclipse.org/

2https://github.com/antlr/grammars-v4/blob/0836e297a51769b773f24e6e949bf0e261b748ac/antlr4/ANTLRv4Parser.g4

3https://github.com/antlr/grammars-v4

4The grammar was converted by Terence Parr, who is unsure of its provenance: it could be attributed, for different reasons, to Matthias Koester, Laurent Petit, Jingguo Yao and/or Terence Parr.

5Initial IDL spec implementation in ANTLR v3 by Dong Nguyen. Migrated to ANTLR v4 by Steve Osselton.


Grammar      Author             Year       VAR  PROD  TERM
Abnf         Rainer Schuster    2013        23    29    23
asm6502      Tom Everett        2013        23   153   142
bnf          Tom Everett        2013        30    44    24
C            Sam Harwell        2013       221   410   139
Clojure      Matthias Koester4  2009        25    59    42
creole       Tom Everett        2013        27    51    31
CSV          Terence Parr       2013         6     8     5
DOT          Terence Parr       2013        31    39    36
Erlang       Terence Parr       2013       110   207    75
fasta        Tom Everett        2013        13    29    19
gff3         Tom Everett        2013        20    35    23
ICalendar    Bart Kiers         2013-2014  449   809    78
IDL          Dong Nguyen5       2014       247   367   150
IRI          Bart Kiers         2013-2014  105   211    81
Java         Terence Parr       2013       248   381   122
JSON         Terence Parr       2013        13    25    17
jvmBasic     Tom Everett        2012       222   309   123
logo         Tom Everett        2013        43    72    46
ObjC         Terence Parr       2014       277   413   164
PCRE         Bart Kiers         2014       168   395   128
PGN          Bart Kiers         2013-2014   35    40    22
redcode      Tom Everett        2013        35    60    36
Smalltalk    James Ladd         2013        80   110    45
Swift        Terence Parr       2014       243   530   151
tnt          Tom Everett        2013        18    25    18
TURTLE       Alejandro Medrano  2014        52   100    62
Verilog2001  Terence Parr       2013       338   675   182
WebIDL       Rainer Schuster    2013       101   256    73

Table 5.1: Grammar test set

5.3.1 Trivial, Nonterminal and Production coverage sets (TC, NC and PC)

We treat these three criteria together, as the goal of trivial coverage is merely that the generated set is not empty. In addition, applying Purdom's algorithm yields production coverage, and by tracking the nonterminals that appear on the stack we determine nonterminal coverage.

Therefore, to generate sentences for these criteria we apply the following operations (a sketch tying the steps together follows the list):

1. Read in an extracted grammar and build its abstract syntax tree, then build the GrammarMap data structure.

2. Normalize the grammar by processing inner choices, as explained in section 2.4.

3. As mentioned in section 2.5.1, the grammar should have one start symbol, and we always assume the first rule is the start symbol. We implemented a visitor that walks through the grammar parse tree with the adjusted start symbol as the root. The nonterminals that occur on the path are marked as accessible, and the grammar is then filtered by removing the rules whose rule names were not marked.

4. The filtered grammar enters the sentence generation algorithm. The operation is accomplished in three successive stages, as explained in section 3.1. The main output of the last stage is the generated sentences plus two data structures that record the covered nonterminals and productions. To determine covered nonterminals, each nonterminal is marked when it is pushed onto the stack in the third phase; covered productions are obtained by marking each production that is forwarded to the process_STACK function.



5. Finally, the results of the previous step are used to collect the main metrics that we are interested in.
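Tied together, the first three steps amount to a short driver over the sketches from chapter 2. Steps 4 and 5 (Purdom's three phases and metric collection) are only indicated here, since sketching them would repeat figures 3.4 – 3.8; everything in this fragment is illustrative, not the thesis tool's actual API.

    import java.util.Set;

    class NcPcPipeline {
        // Steps 1-3 of the list above on the sketched structures.
        static GrammarMap prepare(GrammarMap extracted) {        // step 1 delivers `extracted`
            new Normalizer(extracted).normalizeAll();            // step 2: eliminate inner choices
            Set<String> keep = SubGrammar.reachable(extracted);  // step 3: reachability from the start symbol
            extracted.retainRules(keep);
            return extracted;                                    // step 4 would feed this to the generator;
        }                                                        // step 5 reads the coverage tables it fills
    }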

5.3.2 Branch Coverage sets (BC)

As we mentioned in section 4.4, we focus on Kleene star, transitive closure and optional expressions, which should be exercised in two cases each. Sentence generation for this criterion follows the same steps as the previous one, but a new step is added after filtering: the grammar is rewritten by going through all productions, looking for repetitions and optionals, and producing new productions that contain all combinations after adding the new case. The new case is an empty expression for star and optional, and a sequence of length 2 for plus.

The sentence generation algorithm itself does not change; it only has to process more productions.

5.3.3 Unfolding Coverage sets (UC)

To achieve this coverage criterion we follow the same steps as in section 5.3.1, but with a new step between grammar filtering and entering the sentence generation algorithm. The grammar is modified so that in each production, each nonterminal is unfolded by replacing it with its productions; then new productions are generated so that all combinations between the different parts of each production are obtained. Afterwards, the grammar is filtered to remove the rules whose rule names no longer appear in any production, as their productions are used directly.

5.3.4 CDBC sets

As we mentioned in section 4.6, this criterion is a combination of branch coverage and unfolding coverage. Therefore, we still apply the same steps as in section 5.3.1, but now two steps are added between filtering and the sentence generation algorithm: first nonterminal unfolding and filtering (section 5.3.3), then the grammar transformations to achieve branch coverage (section 5.3.2).

5.4 Results

By running the operations described in section 5.3.1 on the test set (28 grammars), we achieved trivial coverage for all of them: the number of generated sentences was always greater than zero, as shown in figure 5.1. Figure 5.2 shows that nonterminal coverage varied between 56.32% and 100%, while the spread in production coverage was wider, with a lower bound of 17.72%, as illustrated in figure 5.3.

From the results we observe that whenever production coverage is 100%, nonterminal coverage is also 100%. This indicates that those grammars are consistent and ideal for Purdom's algorithm. On the other hand, in the cases with coverage below 100% we detected rules in the grammar that consist of only one production, which is recursive. For such rules, the first phase of Purdom's algorithm is unable to determine the length of the shortest derivation for the nonterminal, nor for its parent when the parent has a single production, or multiple productions that all refer to such nonterminals. Consequently, these nonterminals cannot be introduced into a shortest derivation in the second phase, and hence the last phase may never reach them. The dissatisfaction ratio depends on the depth of the nonterminal and on how many productions use it or its parent. For example, asm6502, which has the lowest production coverage percentage of the whole set, contains the following rule:

argumentlist : argument (',' argumentlist)? ;

In another grammar, namely C, we found rules with multiple productions where each production has, somewhere in its derivation tree, a nonterminal exhibiting the same pattern as the example above. These rules are shown below:


Figure 5.1: Number of generated sentences for NC_PC

Figure 5.2: NC ratio for nonterminal coverage
Figure 5.3: PC ratio for production coverage

expression : assignmentExpression
           | expression ',' assignmentExpression ;
assignmentExpression : conditionalExpression
                     | unaryExpression assignmentOperator assignmentExpression ;
conditionalExpression : logicalOrExpression ('?' expression ':' conditionalExpression)? ;

However, when we generated sentences for branch coverage, for which we added the new case for repetitions and optionals, we achieved 100% production coverage for the whole set. The reason is that the situation mentioned above (a recursive rule with a single production, and its consequences) disappears, as most of the recursion occurs inside optional and star expressions. For example, the rule in the first example becomes:

argumentlist : argument
             | argument (',' argumentlist)? ;

Moreover, achieving 100% production coverage in this case also means that we achieve branch coverage, as all productions, which now include both cases of interest for repetitions as well as all choices (rule alternatives), are covered. In addition, the number of generated sentences for BC is mostly higher than for NC_PC, as illustrated in figure 5.4.

Figure 5.4: Number of generated sentences for BC

Nonetheless, the number of sentences is not the main metric for BC, as we detected a few cases in which it was lower than for NC_PC. In those cases the generated set contained very long sentences, which indicates that a large number of productions was used. On the other hand, for grammars that contain few repetitions and optional terms, the number of generated sentences was the same as for NC_PC, but with longer sentences.

With branch coverage achieved, we turn our attention to nonterminals by running the operations described in section 5.3.3. The results show a coverage percentage close to the nonterminal and production coverage for most of the grammars, as shown in figures 5.5 and 5.6; the reason is again the recursive pattern described above. In addition, as shown in figure 5.7, the number of generated sentences is significantly higher than for the first two criteria, which is expected as the grammar is expanded and new productions are added.

Figure 5.5: NC ratio for unfolding coverage
Figure 5.6: PC ratio for unfolding coverage
Figure 5.7: Number of generated sentences for UC

Finally, when we integrated the operations applied for branch coverage with those for unfolding coverage, we achieved 100% production coverage for the whole set except for one sample, Swift, as shown in figure 5.8. Examining that grammar, we found recursive rules in which the recursion was not part of an optional or star expression. The number of generated sentences is mostly higher than for the unfolding criterion, with a few equalities, as shown in figure 5.9. Figure 5.10 summarizes the number of generated sentences.

Figure 5.8: PC ratio for CDBC
Figure 5.9: Number of sentences for CDBC

Chapter 6

Conclusion

In this project we have implemented a tool that extracts context-free grammars written in ANTLR notation, using a grammar parser shipped with the ANTLR distribution. In addition, the tool generates test data systematically from a grammar using the shortest path algorithm defined by Purdom, which generates programs from a context-free grammar using each production at least once. This property was the cornerstone of this project.

We have found that Purdom's algorithm is effective for generating programs from grammars for various coverage criteria by altering the grammar and/or applying grammar mutations, especially when the grammar satisfies the requirements of the algorithm; we have seen a few patterns in some grammars that break the algorithm and make it impossible to introduce certain nonterminals into a derivation.

The cases in which the algorithm could not reach production coverage are instructive and can be used to study the grammar in order to determine the effect of certain patterns on the way the algorithm builds the derivation trees. We have detected one such pattern so far and shown how its effect can be eliminated.

The ability to generate programs from grammars opens the door to designing different kinds of tests related to parser testing. The generated sentences can be used to test whether the parser accepts them; if it does not, the parser contains errors and does not define the intended language. In addition, different parsers often claim to accept the same programming language. To prove or disprove such a claim, one can generate sentences for each parser and, using differential testing [7], examine whether each parser accepts the sentences generated from the others as well as its own. This method was used by Fischer et al. [3] for two case studies.


Bibliography

[1] M. Črepinšek, T. Kosar, M. Mernik, J. Cervelle, R. Forax, and G. Roussel. On Automata and Language Based Grammar Metrics. Computer Science and Information Systems, 7(2), 2010.

[2] T. R. Dean, J. R. Cordy, A. J. Malton, and K. A. Schneider. Grammar Programming in TXL. In Proceedings of the Second IEEE International Conference on Source Code Analysis and Manipulation (SCAM 2002), pages 93–102. IEEE, 2002.

[3] Bernd Fischer, Ralf Lämmel, and Vadim Zaytsev. Comparison of context-free grammars based on parsing generated test data. In Software Language Engineering, pages 324–343. Springer, 2012.

[4] R. Lämmel. Grammar Adaptation. In Proceedings of the International Symposium of Formal Methods Europe on Formal Methods for Increasing Software Productivity, volume 2021 of LNCS, pages 550–570. Springer-Verlag, 2001.

[5] Ralf Lämmel and Wolfram Schulte. Controllable Combinatorial Coverage in Grammar-Based Testing. In Umit Uyar, Mariusz Fecko, and Ali Duale, editors, Proceedings of the 18th IFIP TC6/WG6.1 International Conference on Testing of Communicating Systems (TestCom'06), volume 3964 of LNCS, pages 19–38. Springer Verlag, 2006.

[6] Brian A. Malloy and James F. Power. An interpretation of Purdom's algorithm for automatic generation of test cases. In 1st Annual International Conference on Computer and Information Science, Orlando, FL, 2001.

[7] William M. McKeeman. Differential testing for software. Digital Technical Journal, 10(1):101, 1998.

[8] A. M. Paracha and F. Franek. Testing grammars for top-down parsers. In Tarek Sobh, editor, Innovations and Advances in Computer Sciences and Engineering, pages 451–456. Springer Netherlands, 2010.

[9] Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Languages. Pragmatic Programmers. Pragmatic Bookshelf, first edition, May 2007.

[10] J. F. Power and B. A. Malloy. A Metrics Suite for Grammar-based Software. Journal of Software Maintenance and Evolution: Research and Practice, 16:405–426, November 2004.

[11] Paul Purdom. A sentence generator for testing parsers. BIT Numerical Mathematics, 12(3):366–375, 1972.

[12] M. Weiser. Program Slicing. In Proceedings of the Fifth International Conference on Software Engineering (ICSE), pages 439–449, San Diego, California, United States, March 1981. IEEE Press.

[13] Vadim Zaytsev. Language Evolution, Metasyntactically. Electronic Communications of the European Association of Software Science and Technology (EC-EASST); Bidirectional Transformations, 49, 2012.

[14] Vadim Zaytsev. Notation-Parametric Grammar Recovery. In Anthony Sloane and Suzana Andova, editors, Post-proceedings of the 12th International Workshop on Language Descriptions, Tools, and Applications (LDTA 2012). ACM Digital Library, June 2012.

[15] Vadim Zaytsev. Micropatterns in Grammars. In Martin Erwig, Richard F. Paige, and Eric Van Wyk, editors, Proceedings of the Sixth International Conference on Software Language Engineering (SLE 2013), volume 8225 of LNCS, pages 117–136, Switzerland, October 2013. Springer International Publishing.

[16] Vadim Zaytsev. Software Language Engineering by Intentional Rewriting. Electronic Communications of the European Association of Software Science and Technology (EC-EASST); Software Quality and Maintainability, 65, March 2014.

[17] Sergey Zelenov and Sophia Zelenova. Automated Generation of Positive and Negative Tests for Parsers. In Wolfgang Grieskamp and Carsten Weise, editors, Formal Approaches to Software Testing, volume 3997 of LNCS, pages 187–202. Springer Berlin / Heidelberg, 2006.
