Analysing and manipulating CSS using the M3 model


Nico de Groot

nico@nicasso.nl July 19, 2016, 66 pages

Supervisor: Vadim Zaytsev

Host organisation: University of Amsterdam, http://www.uva.nl

Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica Master Software Engineering

Contents

2 Cascading Style Sheets
  2.1 Selectors
    2.1.1 Combinators
  2.2 Declarations
    2.2.1 Properties
  2.3 Rules
    2.3.1 At-rules
  2.4 Inheritance
  2.5 Cascading
    2.5.1 Importance
    2.5.2 Specificity
    2.5.3 Source order
    2.5.4 Challenging
3 M3 model
  3.1 Abstract Syntax Tree
  3.2 Relational layer
  3.3 Rascal
4 CSS Parser
  4.1 CSS Support
  4.2 Vendor specific changes
  4.3 Remaining changes
5 Implementation
  5.1 Abstract syntax tree
  5.2 Relational layer
    5.2.1 Logical code locations
    5.2.2 Documentation
  5.3 Example
6 Validation
  6.1 Approach
    6.1.1 Sample set
  6.2 Basic analyses
    6.2.1 Volume metrics
  6.3 Specific CSS metrics
    6.3.1 Complexity Metrics for Cascading Style Sheets
    6.3.2 Defect prediction for Cascading Style Sheets
  6.4 Analysing coding conventions
    6.4.1 CSS analysis tools
    6.4.2 Coding conventions applied on sample set
    6.4.3 Re-factoring CSS
    6.4.4 Detecting code clones in CSS
7 Conclusions
  7.1 Threats to validity
    7.1.1 Internal
    7.1.2 External
  7.2 Future work
Appendices
A Abstract Syntax Tree
B Sample set
  B.1 Ignored websites
  B.2 Actual sample set
C Clone detection implementation

Abstract

Cascading Style Sheets (CSS) is a language that adds presentation semantics to web documents. Even though CSS has a relatively simple syntax, many imperfections are introduced during the development of style sheets, ranging from small syntactic flaws to duplicate code. In this work we introduce an M3 front-end for CSS. The M3 model is a simple and extensible model for capturing facts about source code, and the front-end allows developers to analyse and transform CSS without excessive development effort. Developing the M3 front-end for CSS also confirmed the proclaimed flexibility of the M3 model, which had previously been implemented successfully only for the Java and PHP programming languages. The unique characteristics of CSS, such as its cascading abilities and its dependency on HTML, did not result in any major complications for the M3 model.

The M3 front-end for CSS has been validated by implementing a wide range of analysis and manipulation tasks and conducting them on real-world CSS. From this we conclude that the M3 model is suitable for many different analysis and manipulation tasks, and that it successfully supports CSS level 3. Finally, by analysing the implementations of the conducted analyses, we support our claim that the M3 model allows developers to

Chapter 1

Introduction

Cascading Style Sheets (CSS) is a language that adds presentation semantics to hypertext documents and has been around since 1997 [Wal97]. Although most often applied to hypertext documents written in languages such as HyperText Markup Language (HTML) and Extensible Markup Language (XML), CSS can also be applied to other XML-based documents such as Scalable Vector Graphics (SVG). CSS is a highly popular language: approximately 95% of all websites use CSS¹, and it was ranked as the 6th most popular language on Github in 2015², rising from 10th place in 2013.

CSS has a relatively simple syntax, but this does not mean that no mistakes are made during the development of style sheets. In fact, Mazinanian et al. demonstrated that there is on average 60% duplication in style declarations when analysing 38 real-world web applications [MTM14], and Gharachorlu analysed a total of 500 websites and found that 499 of them contained at least one type of code smell [Gha14]. To help solve this problem, tools have been developed to give developers feedback on the quality of their CSS. The World Wide Web Consortium (W3C), which is responsible for evolving and maintaining the CSS specifications alongside other web standards [W3Ca], has also recognized this problem and created the W3C CSS Validation Service [W3Cd] “to help Web designers and Web developers check Cascading Style Sheets (CSS)” [tCV]. The W3C CSS Validation Service specifically checks whether the CSS is correctly implemented. For example, it checks that the style sheet contains no parse errors and that only valid properties and values are used.

Most third-party CSS analysis tools are built with specific goals in mind, for example refactoring to decrease the file size (with or without the use of preprocessors such as SASS³ or LESS⁴) [Pol], [Pra], checking whether the coding conventions are correctly applied [Gon], or checking whether there are any errors or typos present [W3Cd]. The implementations of these tools differ as much as their goals, ranging from IDE plugins [Gan], to browser plugins [Gen], to websites [NCZ], and from Java [AK], to Python [Gon], to Node.js [Wei]. One thing all these tools have in common is that they first have to extract relevant data from the CSS before conducting their analysis.

This is where the M3 model comes in. The M3 model is a simple and extensible model for capturing facts about source code for future analysis [BHK+15]. An M3 front-end for CSS would allow developers to produce metrics and perform analyses on their CSS without excessive development effort, since the M3 model stores all the facts about the source code in a predictable way. The M3 model can be used to analyse or refactor individual style sheets, a set of style sheets used for a web application, or even style sheets from multiple web applications. This can be helpful when comparing the quality of CSS across different web applications. Another application of the M3 model could be to analyse the evolution of style sheets over a period of time.

¹ http://w3techs.com/technologies/details/ce-css/all/all
² https://github.com/blog/2047-language-trends-on-github
³ http://sass-lang.com

To be able to conduct these kinds of analysis on CSS using the M3 model, a specific CSS front-end for the M3 model has to be developed first. However, CSS is a Domain Specific Language (DSL) and not a programming language such as Java or PHP, which are (at the time of writing) the only languages supported by the M3 model. The implementation of the CSS front-end for the M3 model will therefore test the claim of the authors of the M3 model that the M3 model “is not only restricted to handling programming languages”, and that it should be able to “model other kinds of formal languages like grammars, schema languages, or even pictorial languages.” [BHK+15].

1.1 Problem context

The problem studied in this work concerns the quality of CSS in real-world web applications, and how it can be improved by presenting a tool that allows developers to conduct analyses and transformations on CSS more efficiently. This tool is the M3 front-end for CSS, which is implemented in the Rascal meta-programming language and runs in the Eclipse IDE. So far the M3 model successfully supports Java and PHP. CSS, however, has very different characteristics than those languages, making it uncertain whether the M3 model is able to successfully support CSS as well. The following list shows the specific characteristics of CSS.

• Domain Specific Language

Java and PHP are imperative programming languages. CSS, on the other hand, is a domain-specific declarative language: you only specify what has to be done rather than how.

• Intention

Java and PHP are used to express functions in order to solve some kind of problem, whereas CSS only expresses the presentation of HTML or XML documents.

• Dependent

CSS is completely dependent on HTML or XML in order to perform its duty, as its only use is to style those documents. On its own, CSS has no added value whatsoever.

• Cascading

In Java and PHP, duplicate packages, classes, methods, and variables are not allowed; each must have a unique path in some way, for example by placing methods in different classes. In CSS, however, artefacts do not have to be unique. For example, a certain rule set can be duplicated 10 times in the same style sheet, and there are no packages or classes to structure these 10 rule sets and give them unique paths. Instead, CSS uses its cascading characteristic to order these duplicates and calculate which ones should be used (more on this later, in § 2.5).

All these characteristics of CSS make it uncertain whether the M3 model is flexible enough to successfully support CSS too, as the only two successfully implemented languages (Java and PHP) are very different and share none of these characteristics.


1.2 Research questions

This study will answer the following research questions:

RQ1: Is the M3 model able to support CSS without having to alter the M3 model’s core?

RQ2: How does the M3 model for CSS differ from the other models for Turing complete languages?

RQ3: Can the M3 model for CSS be used to conduct simple analysis such as volume metrics?

RQ3.1: Can the M3 model for CSS be used to measure CSS specific metrics?

RQ3.2: Can the M3 model for CSS be used to validate coding conventions?

RQ3.3: Can the M3 model for CSS be used to refactor CSS?

RQ3.4: Can the M3 model for CSS be used to detect code clones?

RQ4: Does the M3 model for CSS allow developers to be more efficient than other already existing tools?

Note that “efficient” in RQ4 means that fewer lines of code are required to implement analysis and manipulation tasks than in other third-party CSS analysis tools.

1.3 Research methods

RQ1 and RQ2 will be answered by conducting a confirmatory case study, meaning those questions will be answered by developing a prototype of the M3 front-end for CSS. The implementation of the final prototype will then be assessed, arguing its correctness, and answering the questions.

RQ3, its sub-questions, and RQ4 will be answered by performing an empirical case study. The M3 front-end for CSS will be validated by performing a range of analyses and refactorings with it on 50 real-world web applications, which are a selection of the most popular websites on the internet. Some of these analyses, the detection of code smells to be specific, will have their implementations compared to other CSS analysis tools in order to answer RQ4.

1.4 Related work

This section presents all the related work which has had an effect on our work.

1.4.1 Metrics for CSS

Defect prediction for Cascading Style Sheets

Biçer et al. [BD16] have introduced defect prediction to CSS. They have proposed a set of 14 different metrics with which they are able to predict defects in style sheets. One of their proposed metrics, the specificity, is used in this project to validate the M3 model.

CSS Code Quality: A Metric for Abstractness

Keller et al. [KN10] have presented a new metric for CSS, the abstractness factor. A higher abstractness factor indicates better maintainability and reusability for both the style sheets and the HTML documents. They have used this metric for analysing both CSS authored by humans and CSS generated by machines. Having analysed over 100,000 HTML documents and their related style sheets, they show that CSS authored by humans has a higher abstractness factor. The abstractness factor metric would have been used to validate the M3 model, had it not required additional data from the Document Object Model (DOM).

Complexity Metrics for Cascading Style Sheets

Adewumi et al. [AMIO12] introduced 6 CSS-specific metrics that can be used to measure the complexity of style sheets. 4 of the 6 metrics have been implemented using the M3 model in order to validate the M3 model.

1.4.2 Tools for analysing/transforming CSS

On the Analysis of Cascading Style Sheets

Genevès et al. proposed a novel approach to analyse CSS by using tree logic. Using their approach, they have created a tool that is capable of statically detecting a wide range of errors and the coverage of styling information [GLQ12].

Automated Refactoring for Size Reduction of CSS Style Sheets

Bosch et al. [BGL14] present a prototype of a static CSS semantical analyser and optimiser. This tool is capable of automatically detecting and removing redundant declarations and rules, and it uses tree logic to do so.

Reasoning with Style

Bosch et al. [BGL15] present another prototype, this time with automated refactoring techniques for the removal of redundant and inaccessible declarations and rules without affecting the layout of the document to which the style sheet is applied. They show that size reduction can be obtained while preserving the readability of the CSS and improving its maintainability. After testing their tool on 20 websites, the average size reduction they achieved was 7.75%.

Reverse Engineering a CSS Coding Conventions Catalogue

Goncharenko et al. [GZ16] created a catalogue of coding conventions that has been implemented in a tool called CssCoco, which detects violations of coding conventions. These coding conventions are configured in a specifically designed DSL, allowing developers to validate their own coding conventions. The catalogue is used for selecting popular coding conventions for the validation of the style sheets in the sample set. A selection of coding conventions from this catalogue has been implemented using the M3 model to validate the sample set.

Code Smells in Cascading Style Sheets: An Empirical Study and a Predictive Model

Gharachorlu [Gha14] has developed a tool called CSSNose, capable of detecting CSS code smells in any given website. Furthermore, he conducted a large empirical study on 500 websites to investigate which smells and errors are more prevalent and to what extent they occur in the CSS code of today’s web applications, showing that 499 out of 500 websites contained at least one code smell.

Detecting Redundant CSS Rules in HTML5 Applications: A Tree Rewriting Approach

Hague et al. [HLO14] created the TreePed tool which is a static analysis tool for the detection of redundant CSS rules. They do this by tree rewriting both the HTML and CSS.

Automated Analysis of CSS Rules to Support Style Maintenance

Mesbah et al. [MM12] proposed an automated technique to support styling code maintenance that analyses the runtime relationship between CSS and DOM elements of a given web application, and detects unmatched and ineffective selectors, overridden declaration properties and undefined class values. The tool has been validated on 15 open source and industrial web-based systems, finding on average 60% unused selectors and 52% unused properties.

Discovering Refactoring Opportunities in Cascading Style Sheets

Mazinanian et al. [MTM14] propose an automated approach to remove duplication in CSS code. More specifically, they propose a technique that detects three types of CSS declaration duplication and recommends refactoring opportunities to eliminate those duplications. Their technique is validated on 38 real-world web systems and 91 CSS files in total, showing that there is on average 60% duplication in style declarations.

1.5 Contributions

• Prototype M3 front-end for CSS

A prototype of the CSS front-end for the M3 model has been developed, allowing CSS to be analysed, manipulated, and generated by programmers in an efficient manner. The CSS front-end has been validated with real-world CSS, showing that it is capable of handling the latest CSS specifications that are used in the industry. Example analyses and transformations have also been implemented, supporting our claim that the M3 model makes these tasks efficient. Furthermore, all the source code of the prototype is publicly available on Github.com, as it is a forked repository⁵ of the official Rascal repository, for those wanting to use it or possibly continue this work.

• Support claim

By implementing a CSS front-end for the M3 model, we can now support the claim of its authors that the M3 model “is not only restricted to handling programming languages”. However, even though the front-end for CSS was successfully implemented, this does not guarantee that all other languages will have the same success, although the demonstrated flexibility of the M3 model does suggest that it will be able to cope with many languages.


1.6 Outline

In this section, the structure of the thesis is explained. Chapter 2 introduces Cascading Style Sheets, and Chapter 3 introduces the M3 model. The process of choosing a CSS parser and modifying it so that it aligns with the M3 model is explained in Chapter 4. How the front-end for CSS is implemented and what design decisions had to be made is described in Chapter 5. Chapter 6 then validates the M3 model with real-world style sheets. Finally, Chapter 7 presents the conclusions.

Chapter 2

Cascading Style Sheets

CSS is a language that adds presentation semantics to hypertext documents written in languages such as HyperText Markup Language (HTML) and Extensible Markup Language (XML), although it can also be applied to other XML-based documents such as Scalable Vector Graphics (SVG). It is a web standard and is actively being evolved and maintained by the World Wide Web Consortium (W3C)¹. CSS is very popular, as approximately 95% of all websites on the Internet² use CSS for styling their web documents. It is also ranked as the 6th most popular language on Github in 2015³, rising from 10th place in 2013.

One of the main benefits of CSS is that it allows developers to separate the styling of a page (CSS) from its structure (HTML/XML). By doing so it encourages the reuse of styles, since the style sheets can be applied to multiple web pages, or even multiple websites.

CSS can be included in an HTML file in a couple of ways. It can be internal, which means that the CSS is embedded within the HTML document between style tags. It can also be applied inline, which means that the CSS is written directly in the style attributes of the HTML elements. However, the most popular method of including CSS in an HTML file is to use an external CSS file⁴, in which case the CSS is completely separated from the HTML and is included in the HTML file by referring to it using a link tag. Overall, the external method is encouraged due to its clear separation from the HTML and thus its increased potential for reuse.

2.1 Selectors

To assign a style to one or more specific HTML elements, the elements have to be selected. CSS does this by using so-called selectors. There are three basic types of selectors, also called simple selectors. First there is the class selector, which is prefixed with a dot. It looks like .submit, and this example refers to a class with the value “submit”. Then there is the ID selector, prefixed with a hash sign. The ID selector therefore looks like #contact-form, and this example refers to an ID with the value “contact-form”. Finally there is the element selector, which directly refers to its related HTML elements. This selector has no prefix and looks like this: input. In this case the selector refers to the “input” HTML element.

Then there are the pseudo selectors, which can be divided into two groups. There are pseudo-classes, such as :first-child and :hover, and there are pseudo-elements, such as ::after and ::first-line (note the double colon for pseudo-elements). These are special selectors that are typically combined with one of the simple selectors to make them even more specific. An example is li:nth-child(even), which selects all even list items.

Finally there are the attribute selectors, which look at the attributes of HTML elements and their values to select them. These are also typically combined with a simple selector and look like this: a[target="_blank"], which selects all anchors whose target attribute has the value _blank.

¹ https://www.w3.org
² http://w3techs.com/technologies/details/ce-css/all/all
³ https://github.com/blog/2047-language-trends-on-github
⁴ http://w3techs.com/technologies/details/ce-css/all/all

2.1.1 Combinators

Selectors can also be combined to create a more specific selector. To combine selectors, so called combinators are used. A single combinator is placed in between two selectors to specify their relation. There are a total of four different combinators.

• Descendant combinator

div span: Selects all span elements within div elements (note that this combinator is a space).

• Child combinator

p > strong: Selects all strong elements that are direct children of paragraph elements.

• General sibling combinator

div ~ p: Selects all paragraph elements that are preceded by a div element and share the same parent.

• Adjacent sibling combinator

p + span: Selects all span elements that immediately follow a paragraph element.

2.2 Declarations

Declarations are used to set the property values of the selected HTML elements. Declarations are made up of a property and a value, separated by a colon (like: color: red). Declarations can only appear within declaration blocks (blocks starting and ending with curly brackets). Declaration blocks can contain zero or more declarations, although empty declaration blocks are considered a code smell by some CSS coding conventions⁵.

2.2.1 Properties

All the HTML elements have default values for all their properties, which are assigned by the browser. The style sheets of the websites will then override these values to give the website the desired style.

Properties can be very specific, overriding only a single value, but there are also special shorthand properties. These allow the overriding of multiple values using a single declaration. There are shorthand properties for backgrounds, borders, fonts, margins, and paddings. Listing 2.1 shows a total of 5 font-related declarations; Listing 2.2 shows the related shorthand property, which does exactly the same with a single declaration.

Listing 2.1: Font declarations

    font-style: italic;
    font-weight: bold;
    font-size: .8em;
    line-height: 1.2;
    font-family: Arial, sans-serif;

Listing 2.2: Font declaration shorthand

    font: italic bold .8em/1.2 Arial, sans-serif;
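The same one-to-many relationship holds for the other shorthand properties. As a small illustration, the following Python sketch expands a margin shorthand value into its four longhand declarations, following the standard CSS rule for one to four values. The function name expand_margin is made up for this example and is not part of the thesis tooling.

```python
def expand_margin(value):
    """Expand a margin shorthand value into its four longhand declarations.

    Illustrative sketch only: it splits on whitespace and does not
    validate the individual values.
    """
    parts = value.split()
    if not 1 <= len(parts) <= 4:
        raise ValueError("margin takes between one and four values")
    top = parts[0]
    # Missing values fall back as CSS prescribes: right -> top,
    # bottom -> top, left -> right.
    right = parts[1] if len(parts) > 1 else top
    bottom = parts[2] if len(parts) > 2 else top
    left = parts[3] if len(parts) > 3 else right
    return {
        "margin-top": top,
        "margin-right": right,
        "margin-bottom": bottom,
        "margin-left": left,
    }


two_values = expand_margin("5px 10px")
```

With two values, the first applies to the top and bottom margins and the second to the left and right margins.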

2.3 Rules

The rule set is the most common type of rule, since rule sets contain the actual style declarations. A rule set consists of one or more selectors (separated by commas), followed by a block containing the style declarations. Listing 2.3 shows a rule set that selects two types of buttons. The related style declarations are displayed on lines 2-4.

Listing 2.3: A single rule set with two selectors and containing 3 declarations

1 input.submit, button.reset {
2   background: #E32526;
3   padding: 5px;
4   border: 0;
5 }

2.3.1 At-rules

Besides rule sets there are the at-rules, which add different functionalities. At-rules can be recognized by the @ symbol with which they all start. Some at-rules function as if statements, allowing the conditional processing of certain parts of the style sheet. Other at-rules add the functionality to define external fonts, or are used to import other style sheets. An example of an at-rule is the @media rule. Listing 2.4 shows a @media rule on line 1, which states that the device should have a screen with a minimum width of 480px in order to apply its inner body and h1 rule sets.

Listing 2.4: A @media rule that applies to screens with a minimum width of 480px

1 @media screen and (min-width: 480px) {
2   body {
3     background-color: lightgreen;
4   }
5
6   h1 {
7     font-size: 1.5em;
8   }
9 }

Another example is the @font-face rule, which is used to load an external font that can then be used in font-family declarations. By adding the @font-face rule to a style sheet as shown in Listing 2.5, the font-family declaration on line 8 is able to use the custom “MyWebFont” font.

Listing 2.5: A @font-face rule declaring the “MyWebFont” font

1 @font-face {
2   font-family: 'MyWebFont';
3   src: url('myfont.woff2') format('woff2'),
4        url('myfont.woff') format('woff');
5 }
6
7 article p {
8   font-family: 'MyWebFont';
9 }

2.4 Inheritance

Inheritance in CSS is much like inheritance in programming languages. The properties that are assigned to a parent are passed on to its children. However, not all properties are inherited automatically, because it does not make sense for all properties to be inherited. For example, when assigning a padding property to an element, its children should not all inherit this padding too, as this would result in a very confusing layout. For properties such as font, though, inheritance does make sense. Setting the font property on the body element results in all its children having the same font. Otherwise the font property would have to be declared for every single element. There is also a way to bypass the default inheritance behaviour: the value inherit can be used on all properties, forcing the element to inherit its parent’s value for the related property.

All the properties that are automatically inherited are shown in the full property table⁶.
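The lookup described in this section can be sketched in a few lines of Python. This is a simplified illustration, not how a browser or the M3 front-end implements it: the element tree, the declared styles, and the initial values are all made-up example data, and INHERITED lists only two of the properties that CSS inherits by default.

```python
# Two properties that CSS inherits by default (illustrative subset).
INHERITED = {"font-family", "color"}


def resolve(element, prop, parent_of, declared, initial):
    """Return the value of `prop` for `element`, following inheritance."""
    value = declared.get(element, {}).get(prop)
    wants_parent = value == "inherit" or (value is None and prop in INHERITED)
    if wants_parent:
        parent = parent_of.get(element)
        if parent is None:
            # The root element has nothing to inherit from.
            return initial[prop]
        return resolve(parent, prop, parent_of, declared, initial)
    if value is None:
        # Not declared and not inherited: fall back to the initial value.
        return initial[prop]
    return value


# Hypothetical document: body > p > span
parent_of = {"body": None, "p": "body", "span": "p"}
declared = {
    "body": {"font-family": "Arial", "padding": "10px"},
    "span": {"padding": "inherit"},
}
initial = {"font-family": "serif", "padding": "0"}

span_font = resolve("span", "font-family", parent_of, declared, initial)
p_padding = resolve("p", "padding", parent_of, declared, initial)
span_padding = resolve("span", "padding", parent_of, declared, initial)
```

In this example the span receives the font of the body through inheritance, the paragraph does not receive the padding of the body, and the explicit inherit on the span copies the padding of its parent paragraph.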

2.5 Cascading

As the name “Cascading Style Sheets” already suggests, cascading is an important concept of the language. Cascading is a way to resolve conflicts between declarations (when multiple declarations apply to the same element). To resolve these conflicts, cascading uses importance, specificity, and source order.

2.5.1 Importance

The importance of a declaration depends on the style sheet in which the declaration is specified, as there are more types of style sheet than only that of the website itself. A style can be declared at one of five levels, shown in the list below. The list is ordered by increasing importance, so later entries are more important than earlier ones and will always override conflicting declarations from earlier entries [W3].

1. User agent style sheets

Built-in style sheet of the browser.

2. Normal declarations in user style sheets

A style sheet defined by the user (the visitor) for all websites. This is useful for people with disabilities. For example, people with poor eyesight can give all websites larger fonts, and people who are colour blind can alter the colours used by websites.

3. Normal declarations in author style sheets

The style sheet of the website.

4. Important declarations in author style sheets

Same as item 3, only with the !important flag this time.

5. Important declarations in user style sheets

Same as item 2, only with the !important flag this time.

2.5.2 Specificity

A selector has a low specificity when it is likely to match many elements, such as the div selector. As you might expect, when a selector is more specific, maybe even referring to only a single element, the specificity is higher, such as for #header > .left-box h1. The specificity value is made up of four numbers; let's refer to them as a, b, c, and d. a is 1 when the declaration is within a style attribute directly on the element, otherwise it is 0. b counts the number of ID selectors. c counts the number of class selectors, pseudo-classes and attribute selectors. Finally, d counts the number of element selectors and pseudo-elements [W3]. Some example selectors are shown in Table 2.1, presenting exactly how their specificity is calculated.

Table 2.1: Selectors and their specificity

Example selector      a  b  c  d  Specificity
h1                    0  0  0  1  0001
#header > .left       0  1  1  0  0110
ul li:hover           0  0  1  2  0012
div #left#top#blue    0  3  0  1  0301
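The counting rules above are mechanical enough to sketch in code. The following Python function is a simplified illustration that reproduces the values of Table 2.1; it is not the implementation used in this thesis. It ignores special cases such as :not() and legacy single-colon pseudo-elements, and the names specificity and _TOKEN are made up for the example.

```python
import re

# One alternative per selector part that contributes to the specificity.
_TOKEN = re.compile(
    r"""
    (?P<attr>\[[^\]]*\])                # attribute selector
  | (?P<id>\#[\w-]+)                    # ID selector
  | (?P<cls>\.[\w-]+)                   # class selector
  | (?P<pelem>::[\w-]+)                 # pseudo-element (double colon)
  | (?P<pcls>:[\w-]+(?:\([^)]*\))?)     # pseudo-class, optionally with an argument
  | (?P<elem>[a-zA-Z][\w-]*)            # element selector
    """,
    re.VERBOSE,
)


def specificity(selector, inline=False):
    """Return the specificity of a single selector as a tuple (a, b, c, d)."""
    a = 1 if inline else 0
    b = c = d = 0
    # Combinators and whitespace match no alternative and are skipped.
    for match in _TOKEN.finditer(selector):
        kind = match.lastgroup
        if kind == "id":
            b += 1
        elif kind in ("cls", "attr", "pcls"):
            c += 1
        else:  # element selector or pseudo-element
            d += 1
    return (a, b, c, d)


table_2_1 = [
    specificity("h1"),
    specificity("#header > .left"),
    specificity("ul li:hover"),
    specificity("div #left#top#blue"),
]
```

Returning a tuple rather than a concatenated string such as 0110 has the advantage that Python compares tuples element by element, which matches how specificities are compared.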

2.5.3 Source order

When two declarations apply to the same element(s) and both have identical importance and specificity, the source order is the deciding factor that determines which declaration will eventually be applied to the element(s). The declaration that appears later in the style sheet is used, overriding all conflicting declarations that come before it [W3].
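Taken together, importance, specificity, and source order form a three-level comparison. The Python sketch below illustrates this by picking the winning declaration from a set of conflicting ones. The Declaration class, the origin labels, and the example values are assumptions made for this illustration only.

```python
from dataclasses import dataclass

# Importance ranks follow the list in Section 2.5.1 (higher wins).
ORIGIN_RANK = {
    "user-agent": 1,
    "user": 2,
    "author": 3,
    "author-important": 4,
    "user-important": 5,
}


@dataclass(frozen=True)
class Declaration:
    value: str
    origin: str          # a key of ORIGIN_RANK
    specificity: tuple   # (a, b, c, d) as in Section 2.5.2
    order: int           # position in the source, later is larger


def winning_declaration(declarations):
    """Pick the declaration that wins the cascade among conflicting ones."""
    return max(
        declarations,
        key=lambda d: (ORIGIN_RANK[d.origin], d.specificity, d.order),
    )


# Four conflicting declarations for the same property on the same element.
conflicts = [
    Declaration("red", "author", (0, 0, 0, 1), order=1),
    Declaration("blue", "author", (0, 1, 0, 0), order=2),
    Declaration("green", "author", (0, 1, 0, 0), order=3),
    Declaration("black", "user-agent", (0, 1, 1, 1), order=0),
]
winner = winning_declaration(conflicts)
```

Here the user agent declaration loses on importance despite its higher specificity, the first author declaration loses on specificity, and the remaining tie is decided by source order.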

2.5.4 Challenging

Although cascading and inheritance are the key features of CSS, they are also the features which can make CSS challenging to implement correctly, which could lead to code smells within the style sheets, with code smells meaning a weakness in the design. Code smells for CSS range from having empty rule sets, to more severe code smells such as having duplicate or even dead code.

That cascading and inheritance are challenging to apply correctly is pointed out by Punt et al. [PVZ16]. Their research shows that approximately 25% of the rules within style sheets are guilty of the “undoing style” code smell, which is a code smell that relates to duplicate and dead code. The “undoing style” code smell means that a property is set to value “A”, then later to a different value “B” (one or more times), and later set back to value “A” again. This makes all the “B” declarations obsolete since only value “A” will be used no matter what, wasting both space in the style sheet and computational power. This is one of the code smells that is a direct consequence of cascading, as it allows for this kind of behaviour.
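The pattern described above can be illustrated with a short Python sketch. It assumes a strongly simplified setting: the input is the list of values that one property receives for one element, in source order, with equal importance and specificity so that source order alone decides. The function name is invented for this example, and this is not the detection approach of Punt et al.

```python
def undone_declarations(values):
    """Return the indices of declarations that sit between two equal values.

    `values` holds the successive values of one property, in source order.
    """
    obsolete = set()
    last_seen = {}
    for index, value in enumerate(values):
        previous = last_seen.get(value)
        if previous is not None and index - previous > 1:
            # Everything between the two occurrences has been undone.
            obsolete.update(range(previous + 1, index))
        last_seen[value] = index
    return sorted(obsolete)


# The value goes from "red" to other values and then back to "red".
undone = undone_declarations(["red", "blue", "green", "red"])
```

The result lists the positions of the “B” declarations, which could then be reported or removed by a refactoring.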


Chapter 3

M3 model

The M3 model is a simple and extensible model for capturing facts about source code for future analysis [BHK+15]. The goal of the M3 model is to create a unified form for storing facts about source code, with M3 models for different languages having predictable shapes. However, the M3 model gives accuracy a higher priority than reuse, as accuracy is required for conducting insightful analyses. When accuracy is lost in the early stage of fact extraction from source code, the downstream analyses may show interesting but nevertheless meaningless information [BHK+15]. Therefore it uses two types of intermediate representation to reason about code: the abstract syntax tree (AST) and a relational layer. The AST is specific to the language, while the relational layer is more abstract, allowing its structure to be reused for different languages. Since accuracy has the highest priority, the AST is prescribed first, as it allows for the most detailed analysis, and only then the relational layer, where abstraction is possible.

Benefits of the M3 model are that it is fully typed and serializable as readable text. This makes it easy to search through, read, and store. Another benefit of the M3 model is that it makes use of the source code locations provided by Rascal. These source code locations are hyperlinks that refer to pieces of source code. They are clickable in the Rascal Read-Eval-Print Loop (REPL), which is similar to a console, allowing users to navigate directly to the related source code.

The M3 model’s core is designed to be language independent, facilitating not only volume metrics, browsing visuals (drill-down) and generic aggregation over containment relations, but also dependence between artefacts and thus impact and coupling/cohesion analyses [BHK+15].

The M3 model’s core has to be extended to support a language. This is done by creating a front-end for that specific language, which extends the M3 model’s core AST and relational layer.

3.1 Abstract Syntax Tree

The AST is based on algebraic data types in Rascal, since they are easy to extend by adding constructors. Adding constructors allows users to extend the M3 model with new constructs for specific programming languages. The M3 model’s core already comes with 5 standard sorts: Expression, Declaration, Statement, Type, and Modifier. These should be used as much as possible without adding (too many) additional sorts: keeping the number of sorts as low as possible keeps the AST simple to use and makes extracting facts easier, since the structure of the AST is predictable regardless of the supported language.


The AST nodes also support annotations, which can store additional information related to the node. The list below shows the annotations per sort and what they are intended for.

• Declaration
  @src: Contains a physical source code location.
  @decl: Contains a logical source code location.
  @typ: Contains a sort called TypeSymbol, used to represent any kind of abstract value that variables and expressions in a language may produce.
  @modifiers: Contains possible modifiers as the Modifier sort.
  @messages: Contains error messages and warnings.

• Statement
  @src: Contains a physical source code location.
  @decl: Contains a logical source code location.

• Expression
  @src: Contains a physical source code location.
  @decl: Contains a logical source code location.

• Type
  @name: Contains the name of the Type.

• Modifier
  There are no annotations for the Modifier sort.

3.2 Relational layer

The relational layer only contains binary relations, almost all of which are between source code locations. The M3 model’s core provides 8 relations by default¹, which are shown in the list below.

• Declarations: Maps declarations to where they are declared. Contains any kind of data or type of code declaration.

• Uses: Maps source code locations of usages to the respective declarations.

• Containment: Maps what is logically contained in what else (not necessarily physically).

• Types: Assigns types to declared source code artefacts.

• Messages: Shows error messages and warnings produced while constructing a single M3 model.

• Names: Maps logical source code locations to readable names.

• Documentation: Assigns comments and Javadoc to declared artefacts.

• Modifiers: Assigns modifiers to declared artefacts.

¹ https://github.com/usethesource/rascal/blob/master/src/org/rascalmpl/library/analysis/m3/Core.

Although there are 8 relations defined in the core of the M3 model, only the first three of them are actually “necessary core relations” [Vin]. This means that only 3 relations are required for a correct implementation of the M3 model: the containment, declarations, and uses relations.
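To make the idea of a relational layer built from binary relations more concrete, the following Python sketch models a relation as a set of pairs and aggregates over containment with a transitive closure. It only illustrates the data structure and is not Rascal’s implementation; the location strings and the `css+stylesheet` and `css+ruleset` schemes are assumptions made for the example.

```python
from typing import Set, Tuple

# A binary relation is modelled as a set of (from, to) pairs.
Relation = Set[Tuple[str, str]]


def transitive_closure(relation: Relation) -> Relation:
    """Return all pairs reachable by following the relation one or more times."""
    closure = set(relation)
    while True:
        step = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if step <= closure:
            return closure
        closure |= step


# Hypothetical containment facts: a style sheet contains a rule set,
# which in turn contains a declaration.
containment: Relation = {
    ("css+stylesheet:///style.css", "css+ruleset:///style.css/left"),
    ("css+ruleset:///style.css/left", "css+declaration:///style.css/padding"),
}

everything_inside = transitive_closure(containment)
```

After taking the closure, the style sheet is also related to the declaration, which is the kind of generic aggregation over containment that the core relations are meant to support.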

3.3 Rascal

The M3 model is part of the Rascal meta-programming language. A meta-programming language allows the creation of meta programs, which are programs that use other programs as input. In our case, these programs are style sheets. Meta programs can be used for many different tasks, from counting lines of code to detecting code smells or implementing refactorings.

There already exist tools which provide solutions for these kinds of tasks, from grep for simple tasks to complete parsers for the more complex tasks. However, Rascal is specifically designed to support the complete domain, allowing both the analysis and the manipulation of source code. Although these are two different areas, Rascal merges them completely, living up to its self-proclaimed title as the “one-stop-shop” [KLvdP12]. Rascal can be used via its Eclipse² plugin, or via its standalone command-line console³.

² http://www.eclipse.org


Chapter 4

CSS Parser

The M3 model for CSS requires the CSS to be parsed first; the generated parse tree is then converted into an AST, or used to create the relational layer. Due to the limited amount of time, and because the focus of this thesis is on the M3 model and its capabilities for analysing and manipulating CSS, the decision has been made to use a third-party CSS parser. A third-party CSS parser saves time and energy, which are better spent on the M3 model. The third-party CSS parser has to comply with the following two requirements to be interesting for this project.

• Runs on JVM
Rascal also runs on the JVM, which makes such a parser compatible with it.

• CSS3 Support
CSS3 is the latest standard for CSS, adding new specifications since 2012. Although some specifications are still drafts, they are already supported by most browsers, and are therefore used a lot for web applications. To give an example, the box-shadow CSS3 property is already used by 52% of the 1,218,301 web pages scanned by the Bing search engine¹.

Therefore the CSS parser has to support a large subset of these CSS3 specifications in order to be able to parse CSS that is used in the industry now. To be more specific, the CSS3 specifications defined in the CSS Snapshot 2015 should be supported. This gives a clear point of reference since the CSS Snapshot 2015 defines the current scope and state of Cascading Style Sheets as of late 2015 and is given the “Stable” maturity level by the W3C [W3Cb].

The following CSS parsers have been considered for this project:

• SAC: The Simple API for CSS²
SAC is built in Java and maintained by the W3C. Unfortunately, since its latest release is from 2002, it does not support CSS3. Therefore SAC was not an option.

• CSS Parser³
CSS Parser is built in Java, and is still being maintained. It also supports a considerable subset of the CSS3 specifications.

• jStyleParser⁴
The jStyleParser is also built in Java and is actively maintained too, supporting a considerable subset of the CSS3 specifications.

¹ https://developer.microsoft.com/en-us/microsoft-edge/platform/usage/
² https://www.w3.org/Style/CSS/SAC/Overview.en.html

For this project the jStyleParser has been chosen due to its CSS3 support, development status, and documentation. Furthermore, the jStyleParser uses the ANTLR3 parser generator⁵ for parsing the CSS, and at the time of choosing a CSS parser, the jStyleParser was being updated in a forked repository to use a newer version of ANTLR, namely ANTLR4⁶. Since I already have some experience with ANTLR4, making modifications to the parser should take less effort and thus less time.

4.1 CSS Support

Although the jStyleParser supports a lot of CSS specifications, it did not support all specifications defined in the 2015 CSS Snapshot [W3Cb]. The specifications in the list below were either not supported or only partially supported by the jStyleParser. The parser has therefore been modified to support all the specifications noted in the 2015 CSS Snapshot [W3Cb].

• Namespaces
The @namespace rule is used for declaring a default namespace and for binding namespaces to namespace prefixes.

• CSS Counter Styles Level 3
The @counter-style rule is used to define custom counter styles for use with CSS list-marker and generated-content counters.

• CSS Animations
CSS animations can be used to animate the values of CSS properties over time. The specification uses the @keyframes rule to specify the values for the animating properties at various points during the animation. These @keyframes rules were not yet supported by the jStyleParser.

• CSS Values and Units Level 3
Only the calc function, used for calculating mathematical expressions, was not yet supported.

• CSS Speech
The Speech specification was not completely supported, since the dB unit was not yet implemented by the jStyleParser.

4.2 Vendor specific changes

Even after implementing all CSS specifications defined in the 2015 CSS Snapshot [W3Cb], parsing style sheets from the sample set (Appendix B) still resulted in some parse errors. These were mostly due to vendor specific properties or functions (with vendors being a different term for browsers).

⁴ http://cssbox.sourceforge.net/jstyleparser
⁵ http://www.antlr3.org


Even when a CSS specification is still in its experimental phase, browsers tend to implement it. They do this in order to test and perfect the functionality described in the specification on their layout engine (such as WebKit for Safari and Google Chrome, Gecko for Mozilla Firefox, and EdgeHTML for Microsoft Edge) prior to completely supporting the specification.

An example of a CSS specification which is in its experimental phase at the time of writing is the CSS animations specification, which currently has the “working draft” maturity level⁷. Even though the specification is still experimental, most major browsers already support it⁸. However, browsers do not implement the specification as-is; they create their own version of the new feature (which could be a new property, rule, or function). These vendor specific features can be recognized by their vendor prefix, which begins with a dash, followed by the browser or layout engine tag, another dash, and finally the property, rule or function. This implementation is then used to experiment with, and is finally applied to the original (non vendor prefixed) feature when it gets assigned a more stable maturity level. To give an example, Listing 4.1 shows 4 different versions of the “animation” property, with the first 3 being vendor specific.

Listing 4.1: Vendor specific animation declarations

#box {
  -webkit-animation: slideUp 5s infinite; /* Safari & Chrome */
  -moz-animation: slideUp 5s infinite; /* Firefox */
  -o-animation: slideUp 5s infinite; /* Opera */
  animation: slideUp 5s infinite;
}
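The prefix structure described above (a dash, the vendor tag, another dash, then the feature name) can be recognised with a short regular expression. The Python sketch below is only an illustration of that structure and is not part of the jStyleParser; the function name `split_vendor_prefix` is hypothetical, and the sketch only handles property and function names, not at-rules.

```python
import re
from typing import Optional, Tuple

# A dash, a lowercase vendor tag, another dash, then the feature name.
_VENDOR_PREFIX = re.compile(r"^-([a-z]+)-(.+)$")


def split_vendor_prefix(name: str) -> Tuple[Optional[str], str]:
    """Split a property or function name into (vendor tag, unprefixed name).

    Names without a vendor prefix are returned with None as the vendor tag.
    """
    match = _VENDOR_PREFIX.match(name)
    if match:
        return match.group(1), match.group(2)
    return None, name
```

Applied to the property names of Listing 4.1, `-webkit-animation`, `-moz-animation` and `-o-animation` all reduce to `animation`, each with its own vendor tag, while the unprefixed `animation` is returned unchanged.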

These vendor specific prefixes are not directly an issue when parsing the CSS, since grammatically the prefixed features are fairly similar to the standard ones. However, some browsers such as Internet Explorer add vendor specific features which are not based on any official W3C specification. This sometimes conflicts with the grammar of CSS. Two of these conflicts were found when testing the jStyleParser with CSS from the sample set (Appendix B). Since these vendor specific features conflict with the grammar of CSS, the jStyleParser was not able to parse them. The two features were the expression function (Listing 4.2) and the targeting hack (Listing 4.3), both introduced by Internet Explorer.

The expression function (Listing 4.2) was Internet Explorer’s attempt at creating a function that would be able to calculate mathematical expressions. Fortunately the expression function is no longer actively maintained⁹ by Microsoft (due to the introduction of the official W3C calc function). However, 9 of the 50 websites from our sample set (Appendix B) still used the expression function. It was used a total of 369 times, which was measured using a regular expression based search through all the style sheets. Due to the high usage of this function, support for the expression function has been added to the jStyleParser in order to acquire the most accurate representation of the style sheet, as otherwise the entire declaration would be ignored because it could not be parsed.

Listing 4.2: Expression function

left: expression(document.body.clientWidth/2-oDiv.offsetWidth/2);

The targeting hack (Listing 4.3) is Internet Explorer’s way to target a declaration at a specific version of Internet Explorer. The example below contains two padding declarations. If the user visits the website using Internet Explorer 10, the second padding declaration overrides the first. Other browsers ignore the second padding declaration, since it is grammatically flawed, and apply the first. Even though it is not an official feature of CSS, the targeting hack was used a total of 175 times by 22 of the 50 websites in our sample set (Appendix B), which was also measured using a regular expression based search through all the style sheets. Due to the high usage of the targeting hack, the decision has been made to modify the jStyleParser to ignore the hack (the backslash followed by the version of Internet Explorer, as can be seen in the second padding declaration of Listing 4.3), while still being able to parse the related declaration.

⁷ https://www.w3.org/TR/css3-animations/
⁸ http://www.w3schools.com/cssref/css3_pr_animation.asp & http://www.w3schools.com/cssref/css3_pr_animation-keyframes.asp

Listing 4.3: Internet Explorer’s targeting hack

.left {
  padding: 3px 0 1px 11px;
  padding: 4px 0 1px 11px \0;
}
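The usage counts for the expression function and the targeting hack were obtained with a regular expression based search. The exact expressions are not given in the text, so the Python sketch below shows one plausible way to perform such a count; both patterns and the function name `count_ie_features` are assumptions.

```python
import re
from typing import Dict

# Assumed pattern: the word "expression" directly followed by an opening parenthesis.
EXPRESSION_CALL = re.compile(r"\bexpression\s*\(", re.IGNORECASE)

# Assumed pattern: a backslash and digits right before the end of a declaration.
TARGETING_HACK = re.compile(r"\\\d+\s*(?=[;}])")


def count_ie_features(css: str) -> Dict[str, int]:
    """Count the Internet Explorer specific constructs in a style sheet."""
    return {
        "expression": len(EXPRESSION_CALL.findall(css)),
        "targeting_hack": len(TARGETING_HACK.findall(css)),
    }
```

Summing these counts over all style sheets of a website gives per-site totals. A text search of this kind can also match inside comments or strings, so the numbers it produces are approximations.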

Besides the unsupported situations in the jStyleParser, there is one specific situation in which the AST loses some information, regarding vendor specific @keyframes rules. Since the CSS3 animation specification is still in development, browsers implement these kinds of features using their vendor prefixes (§ 4.2). Properties and functions with a vendor prefix are supported by the parser and can also be printed by the pretty printer. However, vendor specific @keyframes rules such as @-moz-keyframes are parsed successfully, but are pretty printed without their specific prefix, resulting in just a normal @keyframes rule.

4.3 Remaining changes

Besides changes made to the jStyleParser due to its lack of CSS3 support or because of issues with some vendor specific features, there were still some remaining changes which had to be made in order to fully align the jStyleParser with the M3 model.

• Visitor pattern
The only way to traverse the AST produced by the jStyleParser was by manually looping through all the nodes, as shown in the jStyleParser demo¹⁰. To make the AST more easily traversable than with this manual approach, a visitor pattern was implemented. Furthermore, implementing the visitor pattern resulted in a more structured solution that is easier to modify.

• @import rule
The @import rules are used to import other style sheets. The jStyleParser actually imports and parses the other style sheets that are referred to using @import rules. A style sheet with several @import rules would therefore result in a single parsed style sheet. This is exactly how the W3C describes that user agents should treat an @import rule: “If an @import rule refers to a valid style sheet, user agents must treat the contents of the style sheet as if they were written in place of the @import rule.”¹¹ As a result, however, the @import rules are not present in the AST. Although this kind of behaviour makes sense in the case of a browser, the M3 model should also contain import rules in order to stay as accurate as possible when representing the actual style sheet.

• Documentation
In the ANTLR4 grammar of the jStyleParser, comments are set to be ignored. Although comments may not be useful when you simply want to parse the actual CSS, for the M3 model comments are very valuable. CSS supports comments with the following format: /* Some text here */, and these are therefore the only type of comments parsed by the jStyleParser. Single line comments with the following format: // Some text here are ignored, since those are not valid comments in CSS¹². Valid comments are now linked to the nodes in the AST to which they most likely relate; these nodes are the style sheet, rule set, the different at-rules, and declaration nodes. The comments are linked to these nodes using the @documentation annotation.

• Locations
The M3 model is heavily reliant on source code locations for all the software artefacts. Unfortunately the jStyleParser did not keep track of these locations for rules, declarations, and other software artefacts. Therefore all the location information was retrieved using the ANTLR4 parser and added to the AST nodes. A location consists of an offset, length, start line, start column, end line, and end column; these are all necessary for a complete source code location value in Rascal.

• Error messages
When the parser is not able to parse a piece of CSS, it should ignore that part and try to recover from the next semicolon or right curly bracket it encounters. These errors should then be present in the M3 model, to inform the user of any parsing errors that could have resulted in an incorrect or incomplete AST or relational layer. Since the jStyleParser did not log all these parse errors, it had to be modified so that it stores them, to be passed on to the M3 model later and placed in the messages relation.

• Empty rules
The jStyleParser had to be modified since it ignored empty rules. Even when a rule does not contain any declarations, it should still be parsed so that the M3 model can give the most accurate representation of the actual style sheet. Since empty rules are a code smell in some coding conventions¹³, ignoring them would prevent the M3 model from detecting them. Therefore the jStyleParser has been modified so that empty rule sets are parsed as well.

¹⁰ https://github.com/radkovo/jStyleParser/blob/master/src/test/java/test/ParserDemo.java
¹¹ https://www.w3.org/TR/css3-cascade/#at-import
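The six components of a source code location named under “Locations” can all be derived from the source text together with an offset and a length. The Python sketch below shows that derivation. It is not the code added to the jStyleParser, and it assumes that lines are counted from 1 and columns from 0; the names `SourceLocation` and `locate` are hypothetical.

```python
from typing import NamedTuple, Tuple


class SourceLocation(NamedTuple):
    offset: int
    length: int
    begin_line: int
    begin_column: int
    end_line: int
    end_column: int


def _line_and_column(text: str, offset: int) -> Tuple[int, int]:
    """Translate a character offset into a (line, column) pair."""
    line = text.count("\n", 0, offset) + 1  # assumption: lines start at 1
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when on the first line
    return line, offset - line_start  # assumption: columns start at 0


def locate(text: str, offset: int, length: int) -> SourceLocation:
    """Build the full location for the span [offset, offset + length)."""
    begin = _line_and_column(text, offset)
    end = _line_and_column(text, offset + length)
    return SourceLocation(offset, length, *begin, *end)
```

For the style sheet `.left {\n  padding: 0;\n}\n`, the declaration `padding: 0;` starts at offset 10 and is 11 characters long, which places it on line 2 between columns 2 and 13.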

¹² https://www.w3.org/TR/CSS2/syndata.html#comments


Chapter 5

Implementation

During the development of the CSS front-end, a couple of design decisions had to be made related to the AST structure and relational layer. These design decisions will be explained and motivated in this chapter.

5.1 Abstract syntax tree

For the creation of the AST for the M3 model, the nodes had to be one of 5 predefined sorts (Statement, Declaration, Expression, Modifier, and Type). These 5 sorts are predefined in the M3 model’s core to prevent a proliferation of sorts, which would decrease usability and re-usability. For the M3 model for CSS no additional sorts had to be created, as all 5 predefined sorts applied to CSS. However, some of the sorts did get additional annotations specific to CSS. Furthermore, some of the predefined annotations have not been used, as those were not relevant for CSS.

The @typ annotation has not been applied to any of the sorts. The reason is that the type of a node can immediately be resolved from the node itself, as all types are explicitly defined in the AST. The @typ annotation could prove useful later, when CSS chooses to implement the cascading variables, which have been proposed as a new primitive value type¹. The cascading variables could then use the @typ annotation to resolve their specific type, since that cannot be resolved directly based on only the variable name.

The 5 predefined sorts, and the annotations that they actually use, will be explained in the list below.

• Declaration
The Declaration sort contains all the larger software artefacts, such as the style sheets, rule sets, and the different at-rules, that are all in some sense declared. Since the Declaration sort also contains the node for an entire style sheet, it will always be the top node of the AST. A new annotation which has been added is the @documentation annotation, since declarations can have comments related to them. Furthermore it uses the @src annotation for its physical source code location, the @decl annotation for its logical source code location, and the @messages annotation for any parse errors from the jStyleParser.

• Statement
There is only one type of statement in CSS, which is an assignment statement, also known as a declaration in CSS (not to be confused with the Declaration sort). A CSS declaration looks as follows: border: none (a property and a value, separated by a colon), and it assigns a value to a property. For CSS declarations (the Statement sort), two additional annotations have been added: the @documentation and @modifiers annotations. The @modifiers annotation is used to contain any optional !important tag which a CSS declaration can contain. Furthermore the @src and @decl annotations are also used here.

• Expression
By definition, expressions in software are a combination of values and operators that produce another value. By this definition, selectors can be considered expressions, since they are made up of simple selectors, pseudo selectors, and attribute selectors, which can be seen as values, and combinators, which can be seen as operators. These selectors express a specific path to one or more Document Object Model (DOM) elements, but also produce another value, the specificity value. Therefore, selectors are of the sort Expression. Besides selectors, media expressions are also considered expressions, since they are converted to boolean values used in @media rules, which are conditional rules similar to an if statement. The Expression sort has been given an additional annotation, the @combinator annotation, which is used for selectors. It has been placed in an annotation since it is not present for every simple selector, which simplifies the AST design. Furthermore the Expression sort uses the @src and @decl annotations too.

• Type
Type contains types for both the values of declarations and (simple) selectors. Selectors can have types such as class, ID, or element, related to the .left, #header, and span selectors respectively. Values can have types such as angle, length, string, or uri, related to the 1deg, 1px, “this”, and url(logo.jpg) values respectively. Besides these two groups, there is one single node which does not belong to either. This is the media query, which is used as the value of a @media rule. Additional annotations for the Type sort are the @operator annotation, and the @src annotation, since the Type sort did not have it by default. The @operator annotation is used for values of declarations, as those sometimes require an operator such as a comma between values. This is not common, since most values are separated by spaces only. However, the occasional occurrence of an operator is kept in the annotation, which keeps the AST simple because the operator does not have to be embedded in all value nodes.

• Modifier
The Modifier sort contains three nodes. First there is the important modifier, which can be added at the end of a declaration to state that the declaration should not be overridden by other declarations, even if those are higher on the cascading priority list. Some CSS coding standards state that using the important modifier should be avoided (with some exceptions)² ³. Second there is the combinator modifier, used to combine selectors into a single selector. Finally there is the operator modifier, which is applied when a declaration contains multiple values that are separated not only by spaces but by other symbols too.

As can be seen in the AST in Appendix A, the Modifier sort is never referred to from any other node. This is because the modifier nodes are added to other nodes using annotations. This design decision has been made to prevent the AST from becoming complex by having to deal with occasional modifier nodes. Furthermore, annotations also make the AST easier to use, since detecting whether a node has a certain modifier is now possible with a one-liner. An example of this with the important modifier can be seen in Listing 5.1, where the existence of the important modifier is detected on declarations.

² http://cssguidelin.es

Listing 5.1: Function that detects the usage of the important modifier

list[loc] importantChecker(Declaration stylesheetAST) =
    [d@src | /d:declaration(str property, list[Type] values) := stylesheetAST, d@important?];

5.1.1 Pretty printer

The AST can easily be restructured, and restructuring it refactors the related CSS. To convert these changes from the restructured AST back to actual CSS, a pretty printer had to be developed. For the development of the pretty printer, the ADT2PP tool was used [Zay]. This is a tool that generates a minimalistic pretty printer in Rascal, based on the AST from the M3 front-end for CSS. With the basic pretty printer generated by ADT2PP, only the specific CSS syntax had to be configured.

To make sure the printed CSS is well formatted, Google’s CSS formatting rules were applied⁴. These formatting rules were used because they are very specific and complete, even describing where spaces should be placed. They consist of the following 8 rules.

1. Alphabetize declarations
2. Indent all block content
3. Use a semicolon after every declaration
4. Use a space after a property name’s colon
5. Use a space between the last selector and the declaration block
6. Separate selectors and declarations by new lines
7. Separate rules by new lines
8. Use single quotation marks for attribute selectors and property values
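The effect of rules 1 to 7 can be sketched in a few lines of Python. This is an illustration only and not the ADT2PP-generated Rascal pretty printer; the function names are hypothetical, the two-space indent is an assumption, and rule 8 (quotation marks) is left out.

```python
from typing import Iterable, List, Tuple

INDENT = "  "  # assumption: block content is indented by two spaces

Declaration = Tuple[str, str]  # (property, value)


def format_rule(selectors: List[str], declarations: List[Declaration]) -> str:
    """Print one rule set following rules 1 to 6."""
    head = ",\n".join(selectors)  # rule 6: one selector per line
    # Rule 1: alphabetize by property; the sort is stable, so repeated
    # properties keep their source order.
    ordered = sorted(declarations, key=lambda declaration: declaration[0])
    # Rules 2, 3, 4 and 6: indent, end with a semicolon, put a space
    # after the colon, and print one declaration per line.
    body = "".join(f"{INDENT}{prop}: {value};\n" for prop, value in ordered)
    return f"{head} {{\n{body}}}\n"  # rule 5: space before the block


def format_stylesheet(rules: Iterable[Tuple[List[str], List[Declaration]]]) -> str:
    """Print all rule sets, separated by an empty line (rule 7)."""
    return "\n".join(format_rule(selectors, decls) for selectors, decls in rules)
```

A stable sort matters here: a style sheet may declare the same property twice within one rule set, and swapping those two declarations would change which value wins.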

Besides the formatting rules, the decision has been made to print colour values as hexadecimals as long as the alpha value is 1, which means that the colour is non-transparent. For (partially) transparent colours (with an alpha value below 1), the rgba function is used instead, since it is specifically meant for representing transparent colours, which cannot be represented using hexadecimals.
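The colour printing decision amounts to a single branch on the alpha value, as the following Python sketch shows. The function name `format_colour` and the exact spacing inside the rgba call are assumptions.

```python
def format_colour(red: int, green: int, blue: int, alpha: float = 1.0) -> str:
    """Print an opaque colour as hexadecimal and a transparent one with rgba."""
    if alpha >= 1:
        # Fully opaque: six hexadecimal digits are sufficient.
        return f"#{red:02x}{green:02x}{blue:02x}"
    # (Partially) transparent: fall back to the rgba function.
    return f"rgba({red}, {green}, {blue}, {alpha:g})"
```

Opaque red is printed as `#ff0000`, while the same red at half transparency becomes `rgba(255, 0, 0, 0.5)`.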

In case developers want to use their own formatting rules instead of Google’s, they can easily recreate and modify the pretty printer in order to apply other formatting.


5.2 Relational layer

The relational layer is built upon binary relations, which are mostly between source code locations. The M3 model comes with 3 generic core relations that are necessary for the correct implementation of the relational layer [Vin]. How these relations are used for CSS is explained in the list below.

declarations: Contains code entities, and relates their physical source code location to their logical source code location. In other words, it maps code entities to where they are declared.

containment: Displays the logical containment of code entities. Examples are style sheets that contain rule sets, and rule sets that contain declarations.

uses: There are currently two situations in which the uses relation can be applied. First there are the font declarations, which refer to fonts that are declared in a @font-face rule. In the same sense there are the animation declarations, which refer to their related @keyframes rules. However, these two situations are very specific. According to statistics from builtwith.com, the @font-face rule is only used by 0.3% of the websites on the internet⁵. Statistics for the @keyframes rule are not available yet. However, since the CSS animation specification is still undergoing development (it has the “Working Draft” maturity level), as opposed to the “Candidate Recommendation” maturity level of the @font-face rule, its usage will most likely be even lower than 0.3%. Nevertheless, the uses relation has been implemented to support the two mentioned cases. In the future, CSS variables could also be added to the uses relation, since cascading variables have been proposed as a new primitive value type⁶.

The optional relations and how they are used in the M3 front-end for CSS are shown in the list below.

names: Is used to map names to locations. In the case of CSS, names are given to class and ID selectors, as those are the only places where developers can freely decide which names they want to use. Furthermore the name of the entire style sheet is also used, as its file name is also chosen by the developer.

documentation: Maps comments to their related rule sets, declarations, at-rules or style sheets.

modifiers: Is used to map modifiers, such as the important and combinator modifier, to their related declaration or type.

types: Is not used for the M3 front-end for CSS. The reason for this is that all types have been explicitly defined as nodes in the AST. This is possible because CSS has a limited number of types that can all be directly resolved, since there are no variables (yet) in CSS. The types relation has therefore been made obsolete for now. This can change in the future when the cascading variables get accepted as a new primitive value type⁷. The cascading variables could then use the types relation to resolve their specific type, since the types of variables cannot be resolved directly based only on their variable name. In the AST the @typ annotation would then be applied for the same goal.

messages: A special kind of relation, as it is different from the relations described above. It contains a list of messages, parse errors to be specific, which relate to the entire M3 model. These errors contain a short error message and a source code location pointing to where the error occurred in the style sheet, allowing developers to easily locate, and possibly fix, the parse error.

⁵ http://trends.builtwith.com/docinfo/Font-Face-Rule
⁶ https://www.w3.org/TR/css-variables-1/

Inspired by the uses relation, a new relation could be implemented in the future, relating the CSS to the HTML to which it assigns its styles. Unfortunately, no HTML front-end for the M3 model exists at the time of writing, making these kinds of relations not yet possible. Nevertheless, a combination of CSS, HTML, and even JavaScript M3 front-ends would create interesting possibilities for relations, improving the possibilities for analysing and manipulating web applications. Examples are the detection of unused CSS rules as researched by Adegeest [Ade15], or the detection of the AB*A code smell as researched by Punt et al. [PVZ16].

5.2.1 Logical code locations

The relational layer contains relations between source code locations. These source code locations can have one of two forms. First there are the physical code locations, as can be seen in Listing 5.2. Then there are the logical code locations, as can be seen in Listing 5.3.

Listing 5.2: A physical code location

|home:///workspace/youtube.com/style/main.css|(119953,16,<1,119953>,<1,119968>)

Although the path up to the CSS file itself is easily readable in the physical code location (Listing 5.2), the position information at the end consists of a couple of integers, making it hard to read and to remember. Furthermore, there is no information on whether a location refers to a rule set, selector, or a different code entity. Because of this, the M3 model uses logical code locations, which are much more readable (Listing 5.3). These logical code locations are created for all source code elements that are in some sense declared [BHK+15].

Listing 5.3: An ambiguous logical code location

|css+declaration:///style.css/overflow|

By looking at the logical code location shown in Listing 5.3, we can immediately see that it refers to an overflow declaration. Although these logical code locations are an improvement regarding readability, a problem is introduced when applying them to CSS. Logical code locations for Java applications are all unique, as every class, method, or variable has a unique signature path. This is not the case for CSS, since CSS can have multiple occurrences of the same selector, declaration, or rule. This is due to the cascading characteristic of CSS. To make sure that logical code locations can be distinguished from each other, numbers have been added at the end of each logical source code location, incrementing for each logical source code location that already exists. This results in logical code locations such as the one shown in Listing 5.4.

Listing 5.4: A unique logical code location

|css+declaration:///style.css/overflow(6)|

Appending line numbers at the end was also briefly considered, but since CSS can be minified⁸ when used in production, the line numbers would not be very helpful. Furthermore, style sheets can be very large, with tens of thousands of lines of code. Remembering such large line numbers would require more effort than remembering the smaller sequence numbers, which are therefore a bit easier to comprehend.
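The numbering scheme can be sketched in a few lines of Python. This is an illustration only, not the thesis implementation: the function name number_locations is hypothetical, and the sketch assumes that every location receives a suffix and that counting starts at 1, which the text does not state explicitly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def number_locations(locations: Iterable[str]) -> List[str]:
    """Append an occurrence counter to each logical code location.

    Assumption: every location gets a suffix and counting starts at 1.
    """
    seen: Dict[str, int] = defaultdict(int)
    numbered = []
    for location in locations:
        if not location.endswith("|"):
            raise ValueError(f"expected a |...| location: {location!r}")
        seen[location] += 1
        # Insert the counter just before the closing bar.
        numbered.append(f"{location[:-1]}({seen[location]})|")
    return numbered


declarations = [
    "|css+declaration:///style.css/overflow|",
    "|css+declaration:///style.css/color|",
    "|css+declaration:///style.css/overflow|",
]
for location in number_locations(declarations):
    print(location)
```

Because the counter depends only on the order of occurrence, it stays small and is unaffected by minification, unlike a line number.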

Special characters

Source code locations do not support all characters that CSS uses; examples are the # and > characters, as well as spaces, all of which occur in selectors. A selector such as the one in Listing 5.5 would be turned into the source code location shown in Listing 5.6, in which every unsupported character is encoded, greatly reducing readability.

To solve this issue, the prefixes for IDs (the hash) and classes (the dot), and all the combinators, have been replaced with related keywords. The prefixes for IDs and classes are replaced with “id:” and “class:” respectively. Combinators are replaced with their abbreviation in parentheses: “CH” for child, “GS” for general sibling, “AS” for adjacent sibling, and “DS” for descendant.

The selector from Listing 5.5 now results in a more readable source code location, which can be seen in Listing 5.7.

Listing 5.5: Selector containing unsupported characters

#left > .header + #top ~ div span

Listing 5.6: Logical code location containing unsupported characters

|css+selector:///style.css/%23left%3E.header+%23top~div%20span|

Listing 5.7: Transformed logical code location

|css+selector:///style.css/id:left(CH)class:header(AS)id:top(GS)div(DS)span|
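The two ways of turning a selector into a location path can be contrasted with a short Python sketch. It is an illustration under stated assumptions, not the thesis code: the function names are hypothetical, standard percent-encoding stands in for whatever encoding the location type applies, and only ID prefixes, class prefixes and the four combinators are handled (attribute selectors, pseudo-classes and strings are out of scope).

```python
import re
from urllib.parse import quote

# Keyword per combinator; a single space means "descendant".
_COMBINATOR_KEYWORDS = {">": "(CH)", "+": "(AS)", "~": "(GS)", " ": "(DS)"}


def _normalise(selector: str) -> str:
    """Collapse whitespace so that a space only ever means 'descendant'."""
    compact = re.sub(r"\s*([>+~])\s*", r"\1", selector.strip())
    return re.sub(r"\s+", " ", compact)


def encoded_path(selector: str) -> str:
    """Percent-encode the unsupported characters (the hard-to-read form)."""
    return quote(_normalise(selector), safe="+")


def readable_path(selector: str) -> str:
    """Replace prefixes and combinators with keywords (the readable form)."""
    parts = []
    for char in _normalise(selector):
        if char == "#":
            parts.append("id:")
        elif char == ".":
            parts.append("class:")
        else:
            parts.append(_COMBINATOR_KEYWORDS.get(char, char))
    return "".join(parts)


selector = "#left > .header + #top ~ div span"
print(encoded_path(selector))
print(readable_path(selector))
```

For the selector of Listing 5.5, the second function yields the keywords of Listing 5.7 in the same order: id:left, (CH), class:header, (AS), id:top, (GS), div, (DS), span.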

5.2.2 Documentation

The @documentation annotation in the M3 model contains comments taken from the style sheet. These comments relate to code entities such as rules or declarations, most likely explaining a design decision that is not immediately clear from reading the CSS. Since a comment without any context is unlikely to be informative, the comments used for the relational layer must be related to a code entity. To relate a comment to a code entity, its position relative to that entity has to be taken into account: comments are used or ignored depending on their position. To find out which positions are common for comments in style sheets, a search was conducted on CSS coding conventions. As a result, comments at the positions listed below are used and linked to their related code entity, and comments at other positions are ignored.

• At the beginning of the style sheet

A comment at the beginning of the style sheet is recommended by multiple coding conventions⁹ ¹⁰ ¹¹. Typically, a comment at this position contains information such as a table of contents (in the case of large style sheets), the authors, a general description, and a licence.

⁹ https://make.wordpress.org/core/handbook/best-practices/coding-standards/css/
¹⁰ https://www.drupal.org/node/1887862


• Above rule sets and at-rules

Different coding conventions require comments that relate to rules to be positioned right above the rule¹² ¹³.

• Above declarations

Multiple coding conventions require comments to be on separate lines¹⁴ ¹⁵ ¹⁶, although some other coding conventions allow comments related to declarations to be on the same line, right behind the declaration¹⁷ ¹⁸. In an ideal case, comments both above and behind a declaration could be linked to the declaration. However, the part of the ANTLR4 grammar that handles declarations only allows a relation to be created with comments right above a declaration. Supporting comments behind declarations would require excessive changes to the grammar. Since the ANTLR4 grammar for declarations is equal to the official CSS grammar [W3Cc], the decision has been made to adhere to the official definition and not to rewrite the ANTLR4 grammar, thus only supporting comments above declarations.

Listing 5.8 shows a small style sheet with comments in the correct positions for being linked to their related code entity.

Listing 5.8: Example CSS with comments in the correct positions

/**
 * fancy.css v2.26
 * Author: Nico de Groot
 * MIT License
 */

#wrapper {
    padding: 10px 5px 10px 5px;
    /* !important is required for IE 10 */
    width: 100% !important;
}

/**
 * Standard block style
 */
.block {
    background: red;
}
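The position rules above can be approximated with a small token scanner. The Python sketch below is an assumption-laden illustration and not the ANTLR4-based implementation described in the text: the function link_comments and its output tuples are hypothetical, and the scanner does not handle strings or URLs that contain braces or semicolons.

```python
import re
from typing import List, Optional, Tuple

_TOKEN = re.compile(r"(?P<comment>/\*.*?\*/)|(?P<punct>[{};])", re.S)

Link = Tuple[str, Optional[str], str]  # (entity kind, entity text, comment)


def link_comments(css: str) -> List[Link]:
    """Link comments to the style sheet, a rule, or a declaration."""
    links: List[Link] = []
    pending: Optional[str] = None   # comment waiting for the next entity
    previous: Optional[str] = None  # previous token; None at start of file
    text = ""                       # source text since the previous entity
    depth = 0                       # nesting level of curly braces
    pos = 0

    def attach(kind: str, entity: str) -> None:
        if pending is not None:
            links.append((kind, entity, pending))

    for match in _TOKEN.finditer(css):
        text += css[pos:match.start()]
        pos = match.end()
        token = match.group()
        if match.lastgroup == "comment":
            if text.strip():
                continue  # comment inside an entity: ignored
            if previous in (";", "}") and "\n" not in text:
                continue  # comment behind an entity on the same line: ignored
            if previous is None:
                links.append(("stylesheet", None, token))
            else:
                pending = token
            previous, text = "comment", ""
            continue
        entity = " ".join(text.split())
        if token == "{":
            attach("rule", entity)
            depth += 1
        elif entity:
            attach("declaration" if depth > 0 else "at-rule", entity)
        if token == "}":
            depth -= 1
        pending, text, previous = None, "", token
    return links


SAMPLE = """/* fancy.css: header comment */

#wrapper {
    padding: 10px; /* trailing comment, ignored */
    /* !important is required for IE 10 */
    width: 100% !important;
}

/* Standard block style */
.block {
    background: red;
}
"""

for kind, entity, comment in link_comments(SAMPLE):
    print(kind, entity, comment)
```

The trailing comment behind the padding declaration is dropped, mirroring the decision to support only comments above declarations.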

5.3 Example

The small style sheet displayed in Listing 5.9 will be used as the input for the creation of an M3 model. The created AST is shown in Listing 5.10, and the relational layer can be seen in

¹² https://www.drupal.org/node/1887862
¹³ https://api.backdropcms.org/css-standards
¹⁴ https://github.com/cbracco/css-conventions
¹⁵ https://github.com/ThinkUpLLC/ThinkUp/wiki/Code-Style-Guide:-CSS
¹⁶ http://wiki.liquid-contact.com/index.php?title=CSS_Coding_Conventions
¹⁷ https://make.wordpress.org/core/handbook/best-practices/coding-standards/css/
¹⁸ https://api.backdropcms.org/css-standards
