A Test Suite Reduction Method for User-Session-Based Testing With Client-Side Logging


Niels Boerkamp

niels.boerkamp@student.uva.nl

August 3, 2018, 34 pages

Supervisor: dr. Ana-Maria Oprescu
Host supervisor: Jaap de Vries

Host organisation: Voorraedt, https://www.smarteventmanager.nl

Universiteit van Amsterdam


Abstract

We propose a new reduction method for user-session-based testing. With user-session-based testing, we monitor what a user is doing in the application and use that information to test the application. Eventually, it will become inefficient to test the application with all the recorded user-sessions. For example, among the recorded user sessions might be duplicates, or user-sessions that might not be using any significant features. To increase the efficiency of a test suite, we apply test suite reduction. With test suite reduction we create a subset of the test suite which, ideally, has the same coverage and detects the same errors, but with fewer test cases.

Existing reduction methods used user sessions that were collected on the server using request logs. For modern web applications, it is not sufficient any more to solely rely on request logs. Older web applications were server-side generated and, therefore, all user actions could be observed from the request logs. This is not the case anymore due to the change of architecture and use of JavaScript in those modern web applications.

The method we propose utilizes the additional information that comes with client-side logging. With client-side logging, we can obtain more detailed information about what the user does with the application. For example, we can count the number of actions that were performed, or observe which HTML-elements were touched by the user. This additional information enables more thorough decision making about which user-sessions to put in the reduced test suite. The proposed reduction method creates a reduced test suite that retains its coverage (0% coverage loss) and still detects 80% of the faults when tested on the bookstore application.


6.1.2 Most elements
6.1.3 Charts
6.2 Dijkstra
6.2.1 Most actions
6.2.2 Most elements
6.2.3 Charts
7 Discussion
7.1 Results
7.1.1 Pre-selectors
7.1.2 Test selector
7.1.3 Code coverage
7.2 Threats to validity
7.2.1 User-sessions
7.2.2 Thresholds
7.2.3 Test application
7.3 Reflection on usefulness
7.4 Limitations
7.5 Answering the research questions
8 Related work
8.1 Synthesis table
8.2 Concept Analysis
8.3 HGS
8.4 Greedy
8.5 Genetic algorithm
9 Conclusion
Bibliography


Chapter 1

Introduction

It is hard to imagine our daily lives without web applications. Webshops, social media, and Software as a Service (SaaS) applications illustrate our dependency on them. Previously we went to physical stores to make a purchase. Nowadays, we often make our purchases in a webshop. Moreover, the usage of social media has increased significantly. According to Zephoria [32], Facebook has 2.2 billion monthly users, and this number is likely to keep growing. The estimated market value of SaaS applications [28] also indicates how much more we have come to use web applications. The SaaS market was estimated to be worth 73.6 billion U.S. dollars in 2018, which is 10.3 billion more than the year before.

With this heavy usage of web applications, it becomes even more important for software developers to make sure their software works correctly, since malfunctioning software can severely disturb the daily lives of people and businesses. Besides disturbing people and businesses, malfunctioning software also has a financial impact. According to the National Institute of Standards and Technology, the annual estimated cost of inadequate infrastructure for software testing is between 22.2 and 59.5 billion dollars [20].

Modern web applications started to offer more features that were previously only limited to desktop applications. To facilitate this, (business) logic had to be added to the client-side as well. To structure the additional client-side logic, JavaScript libraries and frameworks are used. In the statistics from StackOverflow [27] we can see that the popularity and usage of these frameworks and libraries increased significantly since 2012.

As mentioned, it is important to make sure web applications work correctly, given their heavy use and the costly consequences malfunctioning software can have. A technique used for testing web applications is user-session-based testing. With user-session-based testing, we monitor what a user is doing in the application, and use that information to test the application. In previous research [9, 22, 24] those user sessions were collected using server-side logging by monitoring the request logs. For older web applications it was sufficient to observe the requests sent between front-end and back-end, but this is not the case anymore due to the transition of (business) logic to the front-end. With front-end logic, there are user actions that do not trigger a request to the server anymore. Therefore, for accurate and re-playable user-sessions, we also need to monitor what happens on the client-side. Client-side logging might not only be necessary; it might also bring some advantages.

With this research, we investigated what the advantages of client-side logging are and how these could be leveraged by a reduction method for user-session-based testing. We showed the potential of these advantages by proposing a new reduction method. Besides the investigation of client-side logging and proposing a new reduction method, we also did a pre-study wherein we analyzed new automated testing tools, which are required for user-session-based testing.


1.1 Problem analysis

In previous work [13, 24], it has already been stated that creating and maintaining scenario tests is a time-consuming task. In 2005 Elbaum et al. [9] provided a solution which makes it easier to create those scenario tests, and they named it user-session-based testing. With user-session-based testing, we monitor what a user does in the application, and use that information to replay their actions as tests.

However, user-session-based testing comes with a challenge. It is not efficient to use all the collected user-sessions when testing the application. It would just take too long to execute all the tests in the test suite. Moreover, among the user sessions might be duplicates and user sessions that are not using any significant features. Therefore, we need reduction methods to reduce the number of user-sessions in the test suite.

Thus far, user-sessions were mainly collected using server-side request logging. Recently, client-side logging of user sessions was mentioned [15, 16], but only as an alternative to server-side logging. We investigated the possible advantages client-side logging might have over server-side logging to improve test suite reduction.

A second problem we addressed with this thesis is the use of scenario testing in continuous delivery environments. In continuous delivery environments, additions to the code base are validated in a series of steps. Steps that could be part of this process are compiling the code and testing it with automated tests. Additions to the code base happen multiple times per hour and, since resources are limited, executing the entire (reduced) test suite is not feasible. We investigated whether it is possible to reduce the reduced test suite even further in a smart way, so that it becomes more applicable in a continuous delivery environment.

1.2 Research questions

With the first research question, we investigated what benefits and additional information could be collected with client-side logging. In previous work [15,16], client-side logging has been mentioned as a replacement for server-side logging, but the possible advantages were never explored. This resulted in the following question:

• Research question 1 - What benefits does client-side logging provide, compared to server-side logging, for user-session-based testing reduction methods?

With research question 1 we learned about what additional information can be obtained with client-side logging. With the next research question, we wanted to put this into practice and find out how this can be used to create a reduction method. This brought us to the following research question:

• Research question 2 - How could the benefits of client-side logging be leveraged to create a reduction method?

The reduced test suites are sometimes still too large for continuous delivery environments. To investigate how reduction methods could be made a better fit for continuous delivery environments, we asked ourselves the following question:

• Research question 3 - How could the reduced test suite be reduced even further, and what is the impact of this on performance?


1.3 Solution outline

We implemented client-side logging and the reduction method to find out what the advantages of client-side logging are, and how these could be used for improving reduction methods.

For logging user-sessions, we created a JavaScript library which records user-sessions by subscribing to several events. When a user session ends, this library sends the event logs to our back-end where they are stored for later use. The most important information we collected was: on which HTML-element the user's action was performed, what the user input was, and what the event type was.

The reduction method uses this information for decision making. The method selects one test case per web page, based on the number of HTML-elements that were touched during a user session. The user session that has touched the most HTML-elements on a web page will be used for testing. The rationale behind this is that the user session that covers the most HTML-elements is likely to detect the most faults.
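The selection rule described above can be illustrated with a minimal sketch. It assumes, hypothetically, that every log record carries the session id, the web page on which the action took place, and the XPath of the touched element; the record layout, the sample data, and the tie-breaking rule are illustrative assumptions and not the implementation used in this thesis.

```python
from collections import defaultdict

# Hypothetical log records: (session_id, page, xpath_of_touched_element).
logs = [
    ("s1", "/login", "//input[@id='username']"),
    ("s1", "/login", "//input[@id='password']"),
    ("s1", "/login", "//button[@id='submit']"),
    ("s2", "/login", "//button[@id='submit']"),
    ("s2", "/books", "//a[@id='details']"),
    ("s3", "/books", "//a[@id='details']"),
    ("s3", "/books", "//button[@id='add']"),
]


def select_sessions(records):
    """Return {page: session_id}, picking per page the session that
    touched the most distinct HTML-elements on that page."""
    touched = defaultdict(lambda: defaultdict(set))
    for session_id, page, xpath in records:
        touched[page][session_id].add(xpath)

    selection = {}
    for page, per_session in touched.items():
        # Assumption: ties are broken by the smallest session id,
        # only to keep the result deterministic.
        selection[page] = min(
            per_session, key=lambda s: (-len(per_session[s]), s)
        )
    return selection


reduced = select_sessions(logs)
```

In this example, session s1 is selected for /login (three distinct elements) and session s3 for /books (two distinct elements).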

For the further reduction of the test suite, to make it more applicable to continuous delivery environments, we assigned a score to each web page. This score indicates how important it is to test that web page. With a threshold, we determine which web pages should be included. To calculate this score we used two different graph algorithms: the Cluster Coefficient and Dijkstra.
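The thresholding step can be sketched as follows. The scores below are made-up values, and the sketch assumes that a higher score means a more important web page and that the threshold is inclusive; how the scores are actually derived from the graph algorithms is not shown here.

```python
def pages_above_threshold(scores, threshold):
    """Return the pages whose score reaches the threshold.

    Assumption: higher score = more important to test, and a page
    with a score equal to the threshold is still included.
    """
    return sorted(page for page, score in scores.items() if score >= threshold)


# Hypothetical scores per web page, for illustration only.
scores = {"/login": 0.9, "/books": 0.6, "/about": 0.1}
selected = pages_above_threshold(scores, threshold=0.5)
```

Raising the threshold shrinks the test suite further, at the cost of leaving more web pages untested.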

1.4 Outline

This thesis continues with a chapter with background information (2). In chapter 3, we go into detail about the difference between client-side and server-side logging, and how we have implemented logging for the experiments. In chapter 4, we investigate tools that can be used for user-session-based testing. Chapter 5 elaborates on the proposed method and in chapter 6 we show the results that we have obtained with the experiments. Those results are discussed in chapter 7. Previous work that is related to this thesis can be found in chapter 8. The last chapter of this thesis is the conclusion (9).


Chapter 2

Background

This chapter provides information regarding topics related to this thesis. The main subjects of this thesis are user-session-based testing and test suite reduction. Therefore, we dedicate two sections of this chapter to those subjects. Furthermore, there is a section on the differences between web applications and websites, because this will come into play in chapter 3 on client-side logging. The last section of this chapter explains the oracle problem, and how it is related to this thesis.

2.1 Test suite reduction

Test suite reduction is about reducing the number of tests used for testing the application, while retaining the same (or close to the same) test coverage and fault detection. We want to reduce test suites, because it sometimes becomes impractical and inefficient to execute the entire test suite on the application. Test Suite Reduction is formally described by Harrold et al. [12] as follows:

”Given a test suite TS, and a set of test case requirements r_1, r_2, ..., r_n that must be satisfied to provide the desired testing coverage of the program, and subsets of TS, T_1, T_2, ..., T_n, one associated with each of the r_i's such that any one of the test cases t_j belonging to T_i can be used to test r_i, find a representative set of test cases from TS that satisfies all of the r_i's.” - Harrold et al. [12]
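To make the definition concrete, the sketch below computes a representative set with a simple greedy heuristic: it repeatedly picks the test case that satisfies the most requirements that are still unsatisfied. This is a generic greedy set-cover approximation chosen for illustration, not the heuristic of Harrold et al. [12] and not the method proposed in this thesis; the test and requirement names are hypothetical.

```python
def greedy_reduce(coverage):
    """coverage maps each test case to the set of requirements it satisfies.

    Returns a list of test cases that together satisfy every requirement
    satisfied by the full test suite.
    """
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # Pick the test covering the most uncovered requirements;
        # iterating in sorted order makes ties deterministic.
        best = max(sorted(coverage), key=lambda t: len(coverage[t] & uncovered))
        selected.append(best)
        uncovered -= coverage[best]
    return selected


# Hypothetical test suite TS with requirements r1..r4.
coverage = {
    "t1": {"r1", "r2"},
    "t2": {"r2", "r3"},
    "t3": {"r3", "r4"},
    "t4": {"r1"},
}
representative_set = greedy_reduce(coverage)
```

Here the four test cases are reduced to t1 and t3, which together still satisfy r1 through r4. A greedy choice is not guaranteed to yield the smallest possible set.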

2.2 User-session-based testing

User-session-based testing is a technique used for creating scenario tests [29]. Scenario testing goes by several names; besides scenario testing, the most common are end-to-end (e2e) testing and regression testing. In essence, these tests interact with the user-interface to validate that the application works as intended and still works as it did before.

Scenario tests have proven their worth, but, in our experience, creating and maintaining those tests remains a time-consuming activity. Fortunately, user-session-based testing can help with creating scenario tests.

With user-session-based testing, we monitor the behavior of the users. Every time a user enters the web application we start a new user session. All the actions performed by the user are linked to this session. A session ends either when the user leaves the web application or when the session times out.

The user-sessions that are collected can be used to create scenario tests. The logs that were recorded during the user session are turned into commands that are executed on the application. In this way, the recorded user session is replayed as a scenario test. In essence, with user-session-based testing, the users of the application create the scenario tests.
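The step from recorded logs to executable commands can be sketched as below. The record fields (timestamp, event type, XPath, value) and the command tuples are illustrative assumptions; a real replay would hand these commands to a test tool rather than return them.

```python
def to_commands(log_records):
    """Translate the recorded logs of one session into replay commands,
    ordered by the time at which the user performed each action."""
    commands = []
    for record in sorted(log_records, key=lambda r: r["timestamp"]):
        if record["event_type"] == "click":
            commands.append(("click", record["xpath"]))
        elif record["event_type"] == "input":
            commands.append(("type", record["xpath"], record["value"]))
        # Other event types (e.g. leaving the page) are skipped in this sketch.
    return commands


# Hypothetical logs of a single session, deliberately out of order.
session_logs = [
    {"timestamp": 2, "event_type": "click",
     "xpath": "//button[@id='submit']", "value": None},
    {"timestamp": 1, "event_type": "input",
     "xpath": "//input[@id='username']", "value": "guest"},
]
commands = to_commands(session_logs)
```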


2.3 Web application versus web site

In 1990 Tim Berners-Lee [4] wrote about a network of linked documents. This network eventually became the World Wide Web as we know it today. Not much later, also in 1990, the first website was published¹. This was, compared to today's websites, a straightforward website which consisted only of HTML. On this website, they wrote about the possibilities of the web and how it could share knowledge.

As the development of websites continued, developers found out that purchases could also be made over the web. This resulted in the first online pizza order [17] for PizzaHut in 1991. One could say that this website was an early version of a web application. The PizzaHut website enabled us to do something over the web that was not just read-only.

The degree to which a user can manipulate the content of a web page can be used to make a distinction between web applications and websites. Websites are mostly read-only and do not allow for any manipulations by the user. Web applications, however, are more complicated. In previous work Sampath et al. [24] described it as follows:

”Web applications may include an integration of numerous technologies, third-party reusable modules, a well-defined layered architecture, dynamically generated pages with dynamic context, and extensions to an application framework. Large Web-based software systems can require thousands to millions of lines of code, contain many interactions between objects, and involve significant interaction with users.” - Sampath et al. [24]

Although this is a good definition of a web application, several things have changed since it was formulated in 2007. The web application used by Sampath et al. [24] is the Bookstore application [11]. This bookstore uses JavaServer Pages (JSP) which are generated in the back-end. The content of these pages can be manipulated, but every change requires a new request to the server to generate a new web page.

Currently, most web applications do this differently. Web applications can perform asynchronous requests with which parts of the web page are fetched from the server. A technique which could be used for such asynchronous requests is Asynchronous JavaScript And XML (AJAX) [18]. A good example that demonstrates these asynchronous requests is the comment section beneath a YouTube² video. This section is only retrieved from the server when it becomes visible to the user. The advantages of this type of web page building are performance and data usage. Web pages can be rendered faster when they are smaller, and less data is used since the comment section is not requested when the user is not looking at it.

For testing purposes, it is essential to know the type of website or web application you are testing, since this could influence the techniques you can use. In the remaining part of this document, we distinguish three types: a (static) website, a web application, and a modern web application, where modern web applications are those that use asynchronous requests.

2.4 Test oracle problem

The test oracle problem is about finding an automated way to distinguish desirable behavior from incorrect behavior. For the testing process, it is important to automate this step, since without it a human always has to decide what is correct and incorrect behavior, and this decision will remain the bottleneck of the process. Barr et al. [2] described the oracle problem as follows:

”Testing involves examining the behaviour of a system in order to discover potential faults. Given an input for a system, the challenge of distinguishing the corresponding desired, correct behaviour from potentially incorrect behavior is called the ’test oracle problem’.” - Barr et al. [2]

¹ http://info.cern.ch/hypertext/WWW/TheProject.html
² https://www.youtube.com/


The oracle problem is relevant to this research because we need an oracle to determine whether or not a user-session is replayed correctly. When test scenarios are created manually, one already knows what the outcome should be, and it is clear how it can be validated, but with generated or recorded tests this is more complicated. For this research, we used the following criteria for a correctly replayed user-session. First, all the actions of the user-session should be performed correctly. This means that the test tool is able to find all the HTML-elements and can perform the corresponding action on each of them. Second, a replayed user-session should not encounter any errors in the console of the browser. If a single request to the server comes back with an error, the entire test case is marked as failed.
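These criteria can be written down as a small oracle function. The sketch below assumes a hypothetical ReplayResult structure that a test tool would fill in after replaying a session; the field names are illustrative and not taken from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayResult:
    """Hypothetical summary of one replayed user-session."""
    actions_total: int
    actions_performed: int
    console_errors: list = field(default_factory=list)
    failed_requests: list = field(default_factory=list)


def replay_passed(result):
    """Oracle: every action was performed, the browser console shows no
    errors, and no request to the server came back with an error."""
    return (
        result.actions_performed == result.actions_total
        and not result.console_errors
        and not result.failed_requests
    )


ok = ReplayResult(actions_total=4, actions_performed=4)
missing_element = ReplayResult(actions_total=4, actions_performed=3)
server_error = ReplayResult(4, 4, failed_requests=["POST /login.jsp -> 500"])
```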


Chapter 3

Client-side logging

Client-side logging for user-session-based testing was first mentioned by Brooks and Memon [5] as an alternative for server-side logging, but the advantages of it have never been fully explored. In this chapter, more details are provided on client-side logging and how it was applied in the experiments that were conducted for this research.

3.1 Current way of logging

Over the last years, most user-sessions have been collected using request logs [9, 23, 24]. Every time the client requests something from the server, a new record is added to the logs. Depending on the type of web page these logs can vary. For static web pages, a log record consists solely of the requested URL. This is enough information for static web pages since the requested page will always be the same. For dynamic web pages, this is different. The server generates the HTML for dynamic web pages based on some variables. Those variables can be written in the form of key-value pairs. If we want to collect user-session data from dynamic web pages, we also have to store the key-value pairs that are sent along with the requests.

3.2 Why is server-side logging not applicable to modern web applications?

In section 2.3, we explained the difference between websites and web applications. We concluded that web applications are more complicated because, among other things, they offer users the opportunity to manipulate data.

However, there are also differences between web applications. Early versions of web applications were generated on the server based on parameters (key-value pairs) sent along with the requests. This way of creating and generating web pages has changed over the years. In the introduction (chapter 1) we already mentioned that there has been a transition of logic from the back-end to the front-end. This transition influences the way we monitor the user's behavior. Because of the added logic in the front-end, we can no longer rely solely on the data we gather from request logs. When the application uses front-end logic, there are user actions that go unnoticed when one only looks at request logs.

For user-session-based testing all user actions are valuable and, therefore, cannot be missed. If we want accurate and re-playable user-sessions, we need those actions to be logged as well. The logging method used for this research is a solution to this problem. This logging method is able to log actions that do not necessarily trigger a request to the server.


3.3 Client-side logging

As mentioned in the previous section, a different method is used for collecting user-sessions. This is necessary since request logging has not kept up with the modern web applications used in industry. Modern web applications contain client-side logic with which the state of the application can be altered. As a result, solely looking at request logs is no longer enough if we want to replay user sessions accurately.

The logging method that should be used cannot be based on requests anymore, but, instead, should use events. Event logging for web applications was first mentioned by Brooks and Memon [5] in 2007, but they did not use it due to the heavy load of data it would cause. Later research was able to manage the load of data [15, 16], but there was barely anything written about how this was accomplished. The most recent paper wherein event based user-session logging has been applied was by Khanna et al. [15].

3.3.1 Implementation of client-side logging

In this section, we specify how we have implemented client-side logging for modern web applications with JavaScript¹. This JavaScript library is able to log user-sessions, including the triggered events, and send them to a back-end.

When the user enters the main page of the application, typically the URL '/', a user-session is started by requesting a new session id from the back-end. This session id is important since it is used for linking actions to a user session. The session id is generated by the back-end, in the format of a Globally Unique Identifier (GUID)².

With this session id in place, we can start monitoring the user's behavior. To monitor the behavior, we need to attach event listeners to every element on the web page that might be of interest for the user-session (e.g., buttons, input fields, clickable links). For our purposes, native JavaScript events on HTML-elements suffice. Click-events are added to buttons and other clickable HTML-elements, and FocusOut-events are added to all sorts of input fields. The click event is straightforward and is triggered once the user clicks on something. The FocusOut-event triggers once the user leaves an input field. When the user leaves an input field, we are able to get the value that the user just entered.

All events are attached to the HTML-elements every time the page is fully loaded. Since most modern web applications also use asynchronous web requests (e.g., AJAX [18]) with which they can obtain new HTML, events need to be attached to these new HTML-elements as well. This (asynchronous) HTML is injected into the web application to make it feel more dynamic and responsive, and it is something we had to keep in mind while developing this JavaScript library.

Our implementation always starts a new session on the main page of the web application. A session also ends once the user returns to this main page. In addition, a session can time out. A session is marked as timed-out when there has not been any activity from the user within 15 minutes of the last logged action.
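The time-out rule can be expressed as a short check. This is a sketch in Python rather than the JavaScript of the actual library, and it assumes that a session counts as timed-out only when strictly more than 15 minutes have passed since the last logged action.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=15)


def is_timed_out(last_action, now):
    """True when no user activity was logged within the time-out window.

    Assumption: exactly 15 minutes of inactivity is still within the window.
    """
    return now - last_action > SESSION_TIMEOUT


last_action = datetime(2018, 8, 3, 12, 0, 0)
still_active = is_timed_out(last_action, datetime(2018, 8, 3, 12, 10, 0))
expired = is_timed_out(last_action, datetime(2018, 8, 3, 12, 15, 1))
```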

While the user goes through the application, the event-logs are temporarily stored in the local storage of the browser. The event-logs are waiting in local storage to be sent to the back-end where all the other sessions are stored as well. There are multiple moments at which the logs are sent to the server. In this implementation, we configured it to send the logs every 2 minutes to the back-end and any remaining logs once the session is ended.
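The buffering behavior can be sketched as follows. The LogBuffer class, its send callback, and the explicit `now` argument are hypothetical; the sketch also checks the interval whenever a log is recorded, whereas a browser implementation would more likely use a timer. It is written in Python for illustration and does not reproduce the JavaScript library itself.

```python
class LogBuffer:
    """Keeps event-logs locally and sends them in batches to a back-end."""

    def __init__(self, send, flush_interval=120.0):
        self._send = send                # callback that ships a batch of logs
        self._interval = flush_interval  # seconds between batches (2 minutes)
        self._pending = []
        self._last_flush = 0.0

    def record(self, entry, now):
        """Store one log entry; flush if the interval has elapsed."""
        self._pending.append(entry)
        if now - self._last_flush >= self._interval:
            self.flush(now)

    def flush(self, now):
        """Send all pending logs, e.g. on the interval or at session end."""
        if self._pending:
            self._send(list(self._pending))
            self._pending.clear()
        self._last_flush = now


sent_batches = []
buffer = LogBuffer(send=sent_batches.append)
buffer.record("click #submit", now=10.0)    # kept locally
buffer.record("input #username", now=130.0)  # interval elapsed: batch is sent
buffer.record("click #logout", now=140.0)   # kept locally
buffer.flush(now=150.0)                      # session ended: send the remainder
```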

Once all the logs are stored in the back-end, a final request is sent in which we close off the session and mark it as completed. Once a session is marked as completed, a new session is started immediately.

3.3.2 Data model

The data set is important for user-session-based testing since it contains all the collected user sessions. In the previous section we described how the data is collected; in this section we show how the data is structured. Tables 3.1 and 3.2 clarify in what format the user sessions are stored. These structures can store all the information that is needed for user-session-based testing.

¹ https://developer.mozilla.org/nl/docs/Web/JavaScript
² http://guid.one/guid


Field        Description
ID           A unique identifier for a session
Start        Timestamp when the session started
Last update  Timestamp of the last update of this session
End          Timestamp when the session ended

Table 3.1: Data structure of a session

Field       Description
ID          A unique identifier for a log
Session ID  An id which links this log to a session
Event type  Type of event triggered by the user (e.g., click, input, and leave web page)
Timestamp   Timestamp when the user performed this action
XPath       XPath [3] to the element that was targeted by the user's action
Value       Some HTML-elements have a value; the value of an input field could be stored here

Table 3.2: Data structure of a log
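The two structures could be modelled as follows. This is a sketch with assumed Python types (UUIDs for the GUIDs, optional fields for values that may be absent); the class and attribute names are illustrative and the actual storage format of the back-end is not shown in this thesis.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Session:
    id: UUID                        # unique identifier for the session
    start: datetime                 # when the session started
    last_update: datetime           # last update of this session
    end: Optional[datetime] = None  # stays empty while the session is open


@dataclass
class LogEntry:
    id: UUID                        # unique identifier for the log
    session_id: UUID                # links this log to a session
    event_type: str                 # e.g. "click" or "input"
    timestamp: datetime             # when the user performed the action
    xpath: str                      # XPath to the targeted element
    value: Optional[str] = None     # e.g. the content of an input field


started = datetime(2018, 8, 3, 12, 0, 0)
session = Session(id=uuid4(), start=started, last_update=started)
entry = LogEntry(
    id=uuid4(),
    session_id=session.id,
    event_type="input",
    timestamp=started,
    xpath="//input[@id='username']",
    value="guest",
)
```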


Chapter 4

Tools for user-session-based testing

An essential aspect of user-session-based testing is the part wherein the sessions are replayed. While replaying the collected sessions, we try to find errors that are introduced by changes that have been made to the application. Those user-sessions could be replayed by hand, but it would be much more convenient to have a tool which can automate this. One could develop such a tool themselves, but there are already some useful tools available. In this chapter, we investigate some of the tools that are currently available.

The web automation tools can roughly be divided into two groups. The groups are distinguished by the manner in which the actions of the web automation tool are executed on the browser. For both groups, we elaborate on their workings and provide examples.

4.1 Web driver-based

Web driver-based automation tools use web drivers to control browsers. With a web driver, commands can be sent to a browser. Each browser (e.g., Chrome, Firefox & Safari) knows how to process such a command because a web driver needs to be made for each browser specifically. Web drivers are built according to the web driver protocol [30]. This protocol defines how browsers can be approached remotely.

This protocol often calls the web driver a ’remote control interface’. This makes sense since this is precisely what a web driver enables you to do. A web driver enables you to control a browser remotely (i.e., from code), so you can perform automated tasks.

Tools that use web drivers for controlling browsers have the advantage that this immediately enables multi-browser support. If the tool is also adhering to the web driver protocol [30], then, for example, a Chrome driver [10] can easily be replaced with a Safari driver [6].

4.1.1 Selenium

A well-known tool in the industry that utilizes web drivers for web automation is Selenium [25]. The development of Selenium started in 2004, and the tool is still being updated. Selenium has many applications, but is mainly used for scenario and load testing. As said before, tools that use web drivers can be used across multiple browsers. Support for cross-browser testing is one of the main reasons Selenium is still very competitive as a test tool.

The manner in which web drivers work has many advantages (e.g., multi-browser support), but there are also some disadvantages. For Selenium, the web driver is the bottleneck of its performance. The speed at which commands can be executed on the browser is determined by the web driver. Selenium cannot influence this.

To demonstrate this, we take a simple action: a click of a button in the Chrome browser. It already takes the ChromeDriver several seconds to perform this task. In scenario testing, many of these (seemingly) simple tasks are combined to create a scenario. If a click of a button already takes several seconds, you can imagine that executing an entire scenario can take quite some time.


We called the task above 'seemingly' simple on purpose, because a lot happens in the background when this task is performed. The first thing the browser needs to do, if not done already, is to navigate to the right URL. Once it has navigated to the right URL, the driver needs to wait for the page to be fully loaded. Depending on the network connection or the complexity of the web page, this can vary in time. After this, a request for the specified button can be made from the driver to the browser. With this information, the browser goes through the HTML and looks for elements that match the provided criteria. After all these steps, the driver is finally able to send the last request to perform the click of the button.

Because of all these steps, it takes several seconds for a seemingly simple task to be performed by the web driver. To make Selenium (and web drivers) workable in a commercial environment, some improvements need to be made. As mentioned earlier, it is not possible to improve the speed at which a task, and therefore a scenario test, can be executed. However, the time in which an entire test suite can be executed can be improved. Running Selenium on multiple threads is possible. This enables us to run scenario tests in parallel. One drawback of this approach is that the application under test needs to support concurrent users. Fortunately, this is not a problem for most business applications.
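Running scenarios in parallel can be sketched with a thread pool. The run_scenario function below is a hypothetical stand-in that only sleeps; in practice it would drive its own browser instance through a web driver. The scenario names and the worker count are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_scenario(name):
    """Stand-in for replaying one scenario in its own browser instance."""
    time.sleep(0.05)  # placeholder for the slow web driver round-trips
    return name, True  # (scenario name, passed?)


def run_suite(scenarios, workers=4):
    """Execute the scenarios concurrently and collect the results by name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_scenario, scenarios))


results = run_suite(["login", "create_user", "search_book", "checkout"])
```

Each individual scenario takes as long as before, but the wall-clock time of the whole suite shrinks roughly with the number of workers, provided the application under test can handle the concurrent users.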

4.2 Browser-based

Browser-based tools do not use web drivers to instruct the browser. Instead, they communicate directly with the browser. Some browsers have an automation API. Browser-based tools utilize that API to control the browser. Communication via this API is faster than using a web driver.

For this gain in performance, some sacrifices were necessary. One thing that was lost is cross-browser support. The automation API is specific to a particular browser and, therefore, it is harder to make the tool work across multiple browsers.

4.2.1 Cypress

Cypress [8] is a browser-based tool that has been released recently (March 2015). This tool has been built from the ground up so the latest features could be used for maximum performance. According to their website¹, Cypress runs 'much, much faster' than Selenium. Most other tools, including the web driver-based ones, run outside the browser and send requests remotely. Cypress does this differently by running in the same loop as the browser. This enables native support and direct access to the web application. This is also why the tests can be executed 'much, much faster'.

The language in which the test scripts for Cypress are written is JavaScript. The scripts need to be written in JavaScript since Cypress runs on Node.js². Cypress offers less flexibility regarding programming languages compared to Selenium. Selenium offers seven different programming languages in which tests can be written (Java, C#, Python, Ruby, PHP, Perl and JavaScript).

4.3 Tool comparison

To compare the performance of both tools we executed two scenario tests on both Selenium and Cypress. The scenarios we have written target the bookstore application [11], which we also used for the experiments. In these scenarios, we log in to the application with the guest account and we create a new account. The steps of these scenarios are listed in tables 4.1 and 4.2.

The results we obtained (table 4.3) by executing the scenarios with both test tools confirm the statement made on the website of Cypress. Cypress claims to be ’much, much faster’ than Selenium. From the results, it appears that Cypress can run the login scenario 6.7 times faster than Selenium. The scenario wherein the account is created is executed 22 times faster than with Selenium. With this information we can conclude that Cypress indeed runs ’much, much faster’.
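The speed-up factors follow directly from the execution times in table 4.3 (Selenium time divided by Cypress time, rounded):

\[ \frac{5.20\,\text{s}}{0.77\,\text{s}} \approx 6.75, \qquad \frac{63.00\,\text{s}}{2.81\,\text{s}} \approx 22.4 \]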

¹ https://www.cypress.io/
² https://nodejs.org/en/

| Step | Action | Element | Value |
|---|---|---|---|
| 1 | Navigate | | /login.jsp |
| 2 | Input | #username | guest |
| 3 | Input | #password | guest |
| 4 | Click | #submit | |

Table 4.1: A scenario for logging in to the bookstore [11] application.

| Step | Action | Element | Value |
|---|---|---|---|
| 1 | Navigate | | /registration.jsp |
| 2 | Input | #member_login | Jan |
| 3 | Input | #member_password | pass |
| 4 | Input | #member_password2 | pass |
| 5 | Input | #first_name | Jan |
| 6 | Input | #last_name | Janssen |
| 7 | Input | #email | jan@janssen.nl |
| 8 | Input | #address | Streetname 1 |
| 9 | Input | #phone | 0123-456789 |
| 10 | Input | #card_number | 147963 |
| 11 | Click | #submit | |

Table 4.2: A scenario for creating a user in the bookstore [11] application.

| Tool | Login | Create user |
|---|---|---|
| Cypress | 0.77s | 2.81s |
| Selenium | 5.20s | 63.00s |

Table 4.3: Execution time of the scenarios in tables 4.1 and 4.2 per tool.

Despite the impressive performance of Cypress, there are still valid reasons to use Selenium. Cypress is still in development and does not yet have all the features Selenium has to offer. One of these features, for example, is the support for finding an element on a web page by XPath [3]. Moreover, cross-browser support is a recurring argument in favor of Selenium.


Chapter 5

Method

This reduction method is specific for user-session-based testing. In this chapter, we go into detail about the workings and specifications of this method.

The first section explains how the application graph is constructed. This graph is necessary for all further steps of the reduction method. The second section is about the pre-selector. During this step, some of the web pages are omitted from testing because we mark them as not important for testing. The last section is about the test selector. In this step, tests are selected for the web pages that passed the pre-selector.

5.1 Application graph

The first step of this method is to construct a graph of the application. A graph can best be explained as a network of nodes that are connected to each other. A node in the graph represents a unique URL of the application. Two nodes are connected when one can go from one URL to the other. The application graph is the foundation of this method: all reduction steps are based on it. The application graph is a directed graph, which means that the connections between the nodes are one-way connections. If one can go from A to B, one cannot necessarily go from B to A. Going from B to A requires a second connection in the opposite direction.

Creating such a graph is a time-consuming task. Furthermore, it is difficult to maintain since it is likely to change when the application changes. Fortunately, we can utilize the recorded user-sessions to construct the application graph. The user-sessions can tell us what pages can be visited and in what order. Those sessions contain enough information to construct the directed application graph.

5.1.1 From user-sessions to application graph

To construct a graph from all the user-sessions, we create a graph for each user-session first. A graph of a user-session contains all the page transitions that have occurred during that session. Nodes in this graph also contain the information about the HTML-elements that the user has used on that URL. This information will be valuable later on when we start reducing the test suite. The user-session graphs are then merged one by one. An example of how two graphs are merged is shown in figure 5.1. The resulting graph is in turn merged with the next user-session graph, until all user-sessions have been processed.

With all the user-sessions merged into the application graph, we have an overview of the pages and elements that are covered by the test suite. It is important to note that each node in the application graph still knows what sessions have passed that node.
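The construction described above can be sketched in Python as follows. This is a minimal sketch, not the implementation used in this thesis: the representation of a user-session as an ordered list of `(url, element)` pairs, the function names and the example sessions are all assumptions made for illustration.

```python
from collections import defaultdict


def session_graph(session_id, events):
    """Build the graph of one user-session.

    `events` is an ordered list of (url, element) pairs; `element` is an
    identifier of the touched HTML-element, or None for a plain page visit.
    """
    nodes = defaultdict(lambda: {"elements": set(), "sessions": set()})
    edges = set()
    previous_url = None
    for url, element in events:
        nodes[url]["sessions"].add(session_id)
        if element is not None:
            nodes[url]["elements"].add(element)
        if previous_url is not None and previous_url != url:
            edges.add((previous_url, url))  # directed page transition
        previous_url = url
    return dict(nodes), edges


def merge(app_nodes, app_edges, nodes, edges):
    """Merge one user-session graph into the application graph (in place)."""
    for url, info in nodes.items():
        target = app_nodes.setdefault(url, {"elements": set(), "sessions": set()})
        target["elements"] |= info["elements"]
        target["sessions"] |= info["sessions"]
    app_edges |= edges
    return app_nodes, app_edges


# Hypothetical user-sessions, for illustration only.
sessions = {
    "s1": [("/", None), ("/login.jsp", "#username"), ("/login.jsp", "#submit"), ("/", None)],
    "s2": [("/", None), ("/registration.jsp", "#email")],
}
app_nodes, app_edges = {}, set()
for sid, events in sessions.items():
    merge(app_nodes, app_edges, *session_graph(sid, events))
```

After the loop, every node of `app_nodes` holds both the elements that were touched on that URL and the identifiers of the sessions that passed it, which is the information the later steps rely on.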

5.2 Pre-selector

The next step of the reduction process is the pre-selector. The pre-selector, as the name implies, is used for filtering out nodes of the application graph in this stage of the reduction process. The nodes that are filtered out by the pre-selector are probably not relevant for the reduced test suite. For this method, we experimented with two different pre-selectors.

Figure 5.1: An example of how two user-session graphs are merged.

The two pre-selectors we experimented with are well-known graph algorithms. The first algorithm is Dijkstra’s algorithm [7]. Dijkstra’s algorithm calculates the distance from one node to all the other nodes in the graph. The second algorithm that we used is the Cluster Coefficient [31]. The Cluster Coefficient algorithm calculates a score for each node which indicates how related nodes are to each other.

5.2.1 Dijkstra

Dijkstra’s algorithm [7] calculates the shortest path from one node to all the other nodes. In this reduction method, we use Dijkstra to calculate the number of page transitions that are necessary to reach a particular page. In most cases ’/’ is the homepage of a web application. We can use Dijkstra’s algorithm to calculate the number of steps from the homepage because we stated that each node of the graph is a unique URL.

However, this knowledge about the number of page transitions has not reduced the test suite yet. To reduce based on the distance, we use it as a threshold. Only nodes with a distance from the homepage that is greater than or equal to the threshold are included in the next step of the reduction process. The intuitive reasoning behind Dijkstra as a pre-selector is that nodes that are near the homepage are probably navigational pages and/or have features that are not very specific yet. The farther away one gets from the homepage, the more specific the content and features on that page will get. We are interested in those features because, when we can test those, most of the features on the pages near the homepage are most likely also covered.
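A minimal Python sketch of this pre-selector is given below. It assumes that the application graph is available as an adjacency dictionary, that every page transition has a cost of 1, and that the threshold is inclusive (a threshold of 0 then keeps every reachable page). The URLs in the example are hypothetical.

```python
import heapq


def distances_from(graph, start="/"):
    """Dijkstra's algorithm with every page transition costing 1."""
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour in graph.get(node, ()):
            candidate = d + 1
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def dijkstra_preselect(graph, threshold, start="/"):
    """Keep the nodes whose distance from the homepage is at least the threshold.

    Nodes that cannot be reached from `start` are not returned.
    """
    return {node for node, d in distances_from(graph, start).items() if d >= threshold}


# Hypothetical application graph: URL -> URLs reachable in one page transition.
graph = {
    "/": {"/login.jsp", "/books.jsp"},
    "/login.jsp": {"/"},
    "/books.jsp": {"/book.jsp"},
    "/book.jsp": {"/cart.jsp"},
}
print(sorted(dijkstra_preselect(graph, 2)))  # ['/book.jsp', '/cart.jsp']
```

Since all edges have the same cost, a breadth-first search would yield the same distances; Dijkstra's algorithm is used here only to stay close to the text.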

5.2.2 Cluster Coefficient

The second pre-selector we experimented with is the Cluster Coefficient [31]. With the Cluster Coefficient, one can express how connected a graph is. A clique [21] will get a score of 1, and a star, which is the opposite, will get a score of 0.

The Cluster Coefficient indicates how many of a node’s neighbors are also neighbors of each other. For our method, the Cluster Coefficient finds clusters of features in the application graph. A cluster of features means that those pages have a strong cohesion, which implies that those features likely share the same code. Therefore, we could cover multiple features with one test case without losing much code coverage.

The Cluster Coefficient is calculated for each node in the application graph. Only nodes above a certain threshold will get a test case assigned. A higher threshold indicates that only nodes with a high cohesion will be selected for the next step.

The Cluster Coefficient for one node in the application graph is calculated with equation 5.1. In this equation, $N_v$ is the number of links between the neighbors of $v$, and $K_v$ represents the degree of $v$.

\[ CC(v) = \frac{2 N_v}{K_v (K_v - 1)} \tag{5.1} \]
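Equation 5.1 can be evaluated with a few lines of Python. The sketch below rests on two assumptions that are not stated in the text: the directed application graph is treated as undirected when counting neighbors and links, and a node with fewer than two neighbors (for which the equation is undefined) gets a score of 0. The helper names and example graphs are hypothetical.

```python
from itertools import combinations


def undirected(edges):
    """Turn a set of (a, b) edges into an undirected adjacency dictionary."""
    result = {}
    for a, b in edges:
        result.setdefault(a, set()).add(b)
        result.setdefault(b, set()).add(a)
    return result


def cluster_coefficient(neighbours, v):
    """CC(v) = 2 * N_v / (K_v * (K_v - 1)), see equation 5.1."""
    adjacent = neighbours[v] - {v}
    k = len(adjacent)  # K_v: degree of v
    if k < 2:
        return 0.0  # assumption: the undefined case is scored as 0
    # N_v: number of links between the neighbours of v
    links = sum(1 for a, b in combinations(adjacent, 2) if b in neighbours[a])
    return 2 * links / (k * (k - 1))


clique = undirected({("a", "b"), ("b", "c"), ("a", "c")})
star = undirected({("hub", "x"), ("hub", "y"), ("hub", "z")})
print(cluster_coefficient(clique, "a"))   # 1.0
print(cluster_coefficient(star, "hub"))   # 0.0
```

The two examples reproduce the extremes mentioned earlier: every node of a clique scores 1 and the center of a star scores 0.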

5.3 Test selector

The test selector is the last step of the reduction method. With this step, we select a test case for each node that passed the pre-selector described in the previous section. A user session can only be selected for a node if it performed at least one action on it. Furthermore, a test can only be added to the reduced test suite if it is not already part of it, since we do not want any duplicate user sessions in the reduced test suite. We experimented with two test selectors for this reduction method: most actions and most covered elements.

5.3.1 Most actions

The most actions-test selector selects a test case based on the number of actions that were performed by the user. A session wherein the most actions were performed (i.e., clicks and field inputs) has most likely used the most features of the application. The number of actions is measured per node. In essence, the most actions-test selector selects the test for a node wherein the most actions were performed on that particular page.

5.3.2 Most covered elements

Most covered elements is similar to most actions, but this test selector looks at the number of distinct HTML-elements that have been touched by the user, instead of the number of user actions. This is different because a user session that consists of many user actions does not necessarily cover the most elements on a web page.
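Both test selectors can be sketched as scoring functions that are plugged into the same selection loop. The Python sketch below is an illustration under assumptions: the per-node action log is a dictionary of the hypothetical form `{url: {session: [element, ...]}}` with one list entry per user action, nodes are processed in sorted order, and ties are resolved in favor of the session that was logged first.

```python
def most_actions(elements):
    """Score a session by the number of actions it performed on the node."""
    return len(elements)


def most_covered_elements(elements):
    """Score a session by the number of distinct HTML-elements it touched."""
    return len(set(elements))


def select_tests(selected_nodes, actions, score):
    """Select the best-scoring user-session per node, without duplicates."""
    suite = []
    for node in sorted(selected_nodes):
        # only sessions that performed at least one action on the node qualify
        candidates = {s: e for s, e in actions.get(node, {}).items() if e}
        if not candidates:
            continue
        best = max(candidates, key=lambda s: score(candidates[s]))
        if best not in suite:
            suite.append(best)
    return suite


# Hypothetical action log: one list entry per logged user action.
actions = {
    "/login.jsp": {
        "s1": ["#username", "#password", "#submit"],
        "s2": ["#submit", "#submit", "#submit", "#submit"],
    },
    "/registration.jsp": {"s2": ["#email"]},
}
nodes = {"/login.jsp", "/registration.jsp"}
print(select_tests(nodes, actions, most_actions))           # ['s2']
print(select_tests(nodes, actions, most_covered_elements))  # ['s1', 's2']
```

The example shows how the two selectors can disagree: session `s2` performs the most actions on the login page, but session `s1` touches more distinct elements.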

5.4 Experimental setup

In this section, we describe the experimental setup. This includes, among others, the applications and tools that were used to validate this method.

5.4.1 Applications and configuration

The proposed method was tested on the bookstore application [11]. This is a JavaServer Pages (JSP) application which can be hosted with Tomcat¹. This application is frequently used for evaluating reduction methods (e.g., [9, 24]).

As mentioned in section 3.3.1, a client-side logging library has been developed to log the events. This small JavaScript library needed to be included in every JSP file. The JavaScript library sends the logged events to a back-end at an interval of 20 seconds. This back-end is able to receive RESTful requests with the data structures provided in tables 3.1 and 3.2. The data that arrives at the back-end is stored in a database with a similar data structure.

This same back-end is used to replay user-sessions in the Chrome browser². With the Selenium WebDriver NuGet package³ we were able to control Selenium with C# code. Instead of generating a script that is executed by Selenium, we instructed the browser with commands that are created based on the user session. The commands that can be generated are typically the click and the input command. Each command finds the corresponding HTML-element by XPath and performs its action on it. This is either clicking or inserting the value that was logged during the user session.

¹ http://tomcat.apache.org/
² Google Chrome version 67.0.3396.99 (Official build) (64-bits)
³ https://www.nuget.org/packages/Selenium.WebDriver/

When executing the Selenium commands, we monitor if the user-session can be executed like it was recorded. A user-session is marked as ’failed’ when Selenium cannot find an HTML-element, it cannot insert the value, or an error occurs in the browser’s console. This is relevant for monitoring the number of faults that are detected by Selenium.
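The replay logic and the ’failed’ marking can be sketched as follows. Note that this is a Python illustration and not the C# code used in this thesis. To keep the sketch runnable without a browser, the element lookup is passed in as a function and a fake page is used; with the Selenium Python bindings, the lookup could be something like `lambda xpath: driver.find_element(By.XPATH, xpath)`. The event format, the `console_errors` callable and all XPaths are assumptions.

```python
def replay(events, find_element, console_errors=lambda: []):
    """Replay the logged events of one user-session.

    Returns "failed" as soon as an element cannot be found, an action
    cannot be performed, or the browser console reports an error.
    """
    for event in events:
        try:
            element = find_element(event["xpath"])
            if event["action"] == "click":
                element.click()
            elif event["action"] == "input":
                element.send_keys(event["value"])
            else:
                return "failed"  # unknown command type
        except Exception:  # broad on purpose: any driver error fails the session
            return "failed"
        if console_errors():
            return "failed"
    return "passed"


class FakeElement:
    """Stand-in for a browser element so the sketch runs without a browser."""

    def click(self):
        pass

    def send_keys(self, value):
        pass


FAKE_PAGE = {
    "//input[@id='username']": FakeElement(),
    "//input[@id='submit']": FakeElement(),
}


def fake_find_element(xpath):
    return FAKE_PAGE[xpath]  # raises KeyError when the element is missing


login = [
    {"action": "input", "xpath": "//input[@id='username']", "value": "guest"},
    {"action": "click", "xpath": "//input[@id='submit']"},
]
broken = [{"action": "click", "xpath": "//a[@id='missing']"}]

print(replay(login, fake_find_element))   # passed
print(replay(broken, fake_find_element))  # failed
```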

The code coverage is measured with OpenClover⁴. OpenClover is attached to the JSP files when the application is built. The code coverage results are flushed to a local OpenClover database at an interval of half a second. When all the user-sessions are replayed on the bookstore application, a report can be generated.

We applied manual fault seeding. Andrews et al. [1] found that manually seeded faults are harder to detect than faults seeded by mutation algorithms. Therefore, we consider this possible threat to validity covered. The manually seeded faults were of different types: we seeded faults in the database queries, inverted logical operators and mutated ’magic strings’.

5.4.2 Metrics

In this section, we list the metrics that have been applied to measure the effectiveness of the proposed reduction method.

• Code coverage [14]

For the effectiveness of the reduced test suite, we want to know what part of the application is covered by the tests. The code coverage is measured with OpenClover. There are different definitions for the term code coverage. In the context of this thesis, we use statement coverage. With statement coverage, we check per line if it is covered or not.

• Detected faults [12, 14]

The number of detected faults is relevant because this can be used to determine how effective the new test suite is. This can be expressed as the percentage of faults which have been detected by the reduced test suite, compared to the original test suite.

• Fault Detection Effectiveness loss (FDE) [14]

This metric indicates how well the reduction method performs regarding fault detection. $|F|$ indicates the number of faults exposed by the original test suite and $|F_{red}|$ stands for the number of faults detected by the reduced test suite.

\[ \text{Fault Detection Effectiveness} = \frac{|F| - |F_{red}|}{|F|} \times 100 \tag{5.2} \]

• Fault Detection Density (FDD) [22]

The Fault Detection Density indicates how many faults are detected per user-session. This metric is useful when we compare our results to other reduction methods. Given a set of test cases $T$ with $t_i \in T$ and a set of faults $F$ detected by the test cases in $T$, let $tf_i$ be the number of faults detected by $t_i$.

\[ \text{Fault Detection Density} = \frac{tf_1 + tf_2 + \ldots + tf_n}{|T| \times |F|} \tag{5.3} \]
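Equations 5.2 and 5.3 translate directly into code. The Python sketch below uses hypothetical fault identifiers and session names; only the two formulas are taken from the text.

```python
def fde_loss(faults_original, faults_reduced):
    """Fault Detection Effectiveness loss in percent (equation 5.2)."""
    return (len(faults_original) - len(faults_reduced)) / len(faults_original) * 100


def fault_detection_density(faults_per_test, faults):
    """Fault Detection Density (equation 5.3).

    `faults_per_test` maps each test case t_i in T to the set of faults it
    detects; `faults` is the set F.
    """
    total = sum(len(detected) for detected in faults_per_test.values())
    return total / (len(faults_per_test) * len(faults))


# Hypothetical example: the original suite exposes four faults, while the
# reduced suite consists of two user-sessions that expose three of them.
original = {"f1", "f2", "f3", "f4"}
reduced = {"s1": {"f1", "f2"}, "s2": {"f3"}}
detected_by_reduced = set().union(*reduced.values())

print(fde_loss(original, detected_by_reduced))      # 25.0
print(fault_detection_density(reduced, original))   # 0.375
```

In this example, one of the four faults is lost, which gives a loss of 25%, and the two sessions detect three faults in total, which gives a density of 3 / (2 × 4) = 0.375.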

5.4.3 Pipeline

At first, we needed to record the user-sessions to create the initial test suite. To collect user-sessions, we asked colleagues to go through the application and order several books, create an account and perform other actions they would typically do in a webshop.

When the user-sessions were in place, we could start with the experiments. Each experiment started with cleaning up the results from the previous experiment. At the start of every experiment, we needed to reset the Clover database, so that a new code coverage report could be generated without being contaminated by data from a previous experiment. Furthermore, we needed to set the thresholds of the pre-selectors according to the configuration of that experiment.

After this, we could run the reduced test suite that was created by the reduction method. When all the user-sessions of that reduced test suite were executed on the bookstore application, we could get the values for our metrics. From OpenClover we could read the statement coverage, and from the Selenium results that were stored in the database we could read how many faults were detected. With this information, we could calculate the other metrics, Fault Detection Effectiveness and Fault Detection Density.

We repeated these experiments with different configurations until all thresholds and variations were covered.


Chapter 6

Results

The proposed method has been evaluated in different configurations. The results of each configuration are shown in this chapter and discussed in chapter 7.

6.1 Cluster coefficient

As explained in chapter 5, the cluster coefficient is a score that indicates how related nodes in a graph are to each other. In this method, the cluster coefficient is used as a threshold to determine which tests should be included in the reduced test suite.

6.1.1 Most actions

Table 6.1 shows the results of the proposed method with the ’Cluster Coefficient’ as the pre-selector and ’Most actions’ as test selector.

| Coefficient | Tests | Reduction | Detected errors | Code coverage | Lost coverage | FDD | FDE |
|---|---|---|---|---|---|---|---|
| 0.0 | 7 | 61.11% | 80% | 66.9% | 0.14% | 0.25 | 0.00 |
| 0.1 | 7 | 61.11% | 80% | 66.9% | 0.14% | 0.25 | 0.00 |
| 0.2 | 7 | 61.11% | 80% | 66.9% | 0.14% | 0.25 | 0.00 |
| 0.3 | 7 | 61.11% | 80% | 66.9% | 0.14% | 0.25 | 0.00 |
| 0.4 | 6 | 66.67% | 60% | 60.2% | 10.14% | 0.21 | 25.00 |
| 0.5 | 6 | 66.67% | 60% | 60.2% | 10.14% | 0.21 | 25.33 |
| 0.6 | 2 | 88.89% | 40% | 36.3% | 45.82% | 0.38 | 50.00 |
| 0.7 | 1 | 94.44% | 20% | 25.5% | 61.94% | 0.25 | 75.00 |
| 0.8 | 1 | 94.44% | 20% | 25.5% | 61.94% | 0.25 | 75.00 |
| 0.9 | 1 | 94.44% | 20% | 25.5% | 61.94% | 0.25 | 75.00 |
| 1.0 | 1 | 94.44% | 20% | 25.5% | 61.94% | 0.25 | 75.00 |

Table 6.1: Pre-selector: Cluster Coefficient, Test selector: Most actions

6.1.2 Most elements

Table 6.2 shows the results of the proposed method with the ’Cluster Coefficient’ as the pre-selector and ’Most elements’ as test selector.

| Coefficient | Tests | Reduction | Detected errors | Code coverage | Lost coverage | FDD | FDE |
|---|---|---|---|---|---|---|---|
| 0.0 | 6 | 66.67% | 80% | 67.4% | 0.00% | 0.29 | 0.00 |
| 0.1 | 6 | 66.67% | 80% | 67.4% | 0.00% | 0.25 | 0.00 |
| 0.2 | 6 | 66.67% | 80% | 67.4% | 0.00% | 0.25 | 0.00 |
| 0.3 | 6 | 66.67% | 80% | 67.4% | 0.00% | 0.25 | 0.00 |
| 0.4 | 5 | 73.33% | 60% | 60.4% | 10.39% | 0.25 | 25.00 |
| 0.5 | 5 | 72.22% | 60% | 60.4% | 10.39% | 0.25 | 25.00 |
| 0.6 | 2 | 88.89% | 40% | 36.5% | 45.85% | 0.38 | 50.00 |
| 0.7 | 1 | 94.44% | 20% | 25.7% | 61.87% | 0.25 | 75.00 |
| 0.8 | 1 | 94.44% | 20% | 25.7% | 61.87% | 0.25 | 75.00 |
| 0.9 | 1 | 94.44% | 20% | 25.7% | 61.87% | 0.25 | 75.00 |
| 1.0 | 1 | 94.44% | 20% | 25.7% | 61.87% | 0.25 | 75.00 |

Table 6.2: Pre-selector: Cluster Coefficient, Test selector: Most elements

6.1.3 Charts

Figure 6.1 shows the code coverage that has been achieved with both test selectors and the Cluster Coefficient as pre-selector. In this chart, one can see that there is no significant difference between both test selectors in terms of code coverage. Figure 6.2 shows the number of tests in the reduced test suite for each Cluster Coefficient threshold.

Figure 6.1: Code coverage of Cluster Coefficient with Most actions and Most elements. [Chart; x-axis: Cluster Coefficient threshold; y-axis: code coverage of reduced test suite (%); series: Original, Most actions, Most elements.]

Figure 6.2: [Chart; x-axis: Cluster Coefficient threshold; y-axis: number of tests in reduced test suite; series: Most actions, Most elements.]

6.2 Dijkstra

The results that have been obtained with Dijkstra’s algorithm as a pre-selector are shown in this section.

6.2.1 Most actions

Table 6.3 shows the results from the experiments where Dijkstra [7] has been used as pre-selector and ’Most actions’ as test selector.

| Distance | Tests | Reduction | Detected errors | Code coverage | Lost coverage | FDD | FDE |
|---|---|---|---|---|---|---|---|
| 0 | 7 | 61.11% | 80% | 66.9% | 0.14% | 0.20 | 0.00 |
| 1 | 6 | 66.37% | 60% | 66.9% | 0.14% | 0.23 | 0.00 |
| 2 | 3 | 83.33% | 40% | 58.6% | 12.53% | 0.20 | 50.00 |
| 3 | 2 | 88.89% | 20% | 56.4% | 15.82% | 0.30 | 75.00 |

Table 6.3: Pre-selector: Dijkstra, Test selector: Most actions

6.2.2 Most elements

Table 6.4 shows the results from the experiments where Dijkstra [7] has been used as pre-selector and ’Most elements’ as test selector.

| Distance | Tests | Reduction | Detected errors | Code coverage | Lost coverage | FDD | FDE |
|---|---|---|---|---|---|---|---|
| 0 | 6 | 66.67% | 60% | 67.4% | 0.00% | 0.23 | 25.00 |
| 1 | 6 | 66.67% | 60% | 67.4% | 0.00% | 0.23 | 25.00 |
| 2 | 2 | 88.89% | 40% | 56.4% | 16.32% | 0.30 | 50.00 |
| 3 | 2 | 88.89% | 40% | 56.4% | 16.32% | 0.30 | 50.00 |

Table 6.4: Pre-selector: Dijkstra, Test selector: Most elements

6.2.3 Charts

Figure 6.3 shows the code coverage that has been achieved with both test selectors and Dijkstra as pre-selector. In this chart, one can see that there is no significant difference between both test selectors in terms of code coverage. Figure 6.4 shows the number of tests in the reduced test suite for each threshold.

Figure 6.3: Code coverage of Dijkstra with Most actions and Most elements. [Chart; x-axis: Dijkstra distance threshold; y-axis: code coverage of reduced test suite (%); series: Original, Most actions, Most elements.]

Figure 6.4: [Chart; x-axis: Dijkstra distance threshold; y-axis: number of tests in reduced test suite; series: Most actions, Most elements.]

Chapter 7

Discussion

In this chapter, we discuss the results that we have obtained with the experiments that have been conducted. Furthermore, we reflect on the usefulness of this research and elaborate on the threats to validity. Finally, we provide answers to our research questions.

7.1 Results

In this section, we discuss the results of the experiments that were presented in chapter 6.

7.1.1 Pre-selectors

In chapter 5 we have explained that the pre-selectors are responsible for determining what pages are important for testing. In essence, pages that are important for testing consist of distinct features and have a strong cohesion with other pages. Both pre-selectors seem to perform well. However, some differences can be observed.

Dijkstra

The Dijkstra algorithm gives us the distance for each node from the home page. We think of this distance as the number of page transitions that it takes to reach a particular web page. This distance is used as a threshold to filter out some of the web pages. In our experiment, we have looked at different thresholds to determine what works best in terms of test suite reduction.

While conducting the experiments, we observed that Dijkstra’s results are better than expected. However, the rationale for using Dijkstra as a pre-selector is still based on several assumptions. One of these assumptions is that pages further away from the homepage provide more specific features (those specific features we are interested in for testing). For some applications, this will be true, but there are most definitely applications for which this is not true.

Nonetheless, the results of the Dijkstra algorithm on the bookstore application are good. The code coverage turns out to be higher than that of Concept Analysis and the other methods in the synthesis table 8.1. The downside of this pre-selector is that it might only work for applications similar to the bookstore application. Future work should clarify this.

Cluster Coefficient

The Cluster Coefficient is the second pre-selector that has been investigated. The Cluster Coefficient is an algorithm which can determine the cohesion between pages by looking at the edges of a graph. The Cluster Coefficient will be calculated for each node in the graph. With a threshold, we determine for what pages a test case should be selected.

The Cluster Coefficient seems to perform better than Dijkstra since it is able to achieve the same code coverage with fewer user-sessions. Where Dijkstra needed seven user-sessions to achieve a code coverage of 67.4%, the Cluster Coefficient only needs six sessions. This makes the Cluster Coefficient the better-performing pre-selector.

Moreover, we think, by intuition, that the Cluster Coefficient will be applicable to a broader set of web applications, since it actually looks at the cohesion of web pages instead of making an assumption about the cohesion, like we did with the Dijkstra algorithm.

Thresholds

In this section, we will look at the thresholds that have been used for the two pre-selectors.

The code coverage measured with Dijkstra as pre-selector holds up well when the threshold is changed. The difference between the lowest and highest code coverage is only about 10 percentage points, as indicated by figure 6.3. The fault detection goes down gradually: with every step, another 20% of the faults goes undetected.

The threshold for the Cluster Coefficient has been varied between zero and one in steps of 0.1. Figure 6.1 shows that up to a threshold of 0.3 there is almost no loss in code coverage and fault detection. From this point on, both decline. Code coverage gradually drops to around 25%. The number of faults that are detected by the reduced test suite goes down a bit faster, especially between the coefficients of 0.5 and 0.7.

From the results, we can conclude that the thresholds do not have to be used as static values. From the figures in chapter 6 we can see that the code coverage and fault detection go down when the thresholds are changed, but it never becomes unworkable, especially the code coverage. This provides us with some great opportunities for continuous development.

Most industry software development teams have something like a build-pipeline. Every time a change has been made, and this has been submitted, the code will go through this pipeline. The pipeline will validate the code and the working of the application, by compiling it, executing unit tests and, possibly, scenario tests. To keep it practical, a pass through this pipeline should be kept short, since the code will be submitted multiple times per hour.

As mentioned before, a run of the entire test suite on the application can take quite some time, and this would be impractical for the pipeline. If you have a large application with many nodes/web pages, some of the thresholds might still result in a test suite that is too large. Therefore, one could set a threshold for those passes through the pipeline such that the execution time is still manageable and the effectiveness of the reduced test suite is still acceptable. From tables 6.1 and 6.2 we see that coefficient 0.6 has the best ratio between reduction and effectiveness of the test suite, since it has a Fault Detection Density of 0.38. When preparing for a release, one could change the threshold again for a more thorough test of the application.

Changing the threshold of the pre-selectors provides the ability to change how thorough you want to test the application. This will be a handy feature that fits in an Agile development environment wherein you want feedback on your work quickly.
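How a threshold could be chosen for a particular pipeline pass can be sketched with the numbers from table 6.1. In the Python sketch below, the function `pick_threshold` and the notion of a test budget (the maximum number of user-sessions that fits in one pass) are assumptions made for illustration; only the table values come from the experiments.

```python
# (threshold, tests in reduced suite, detected faults in %), from table 6.1
TABLE_6_1 = [
    (0.0, 7, 80), (0.1, 7, 80), (0.2, 7, 80), (0.3, 7, 80),
    (0.4, 6, 60), (0.5, 6, 60), (0.6, 2, 40),
    (0.7, 1, 20), (0.8, 1, 20), (0.9, 1, 20), (1.0, 1, 20),
]


def pick_threshold(rows, max_tests):
    """Return the lowest threshold whose reduced test suite fits the budget.

    `rows` must be sorted by ascending threshold. Returns None when no
    threshold fits.
    """
    for threshold, tests, _detected in rows:
        if tests <= max_tests:
            return threshold
    return None


print(pick_threshold(TABLE_6_1, max_tests=2))  # 0.6, for a quick pass per commit
print(pick_threshold(TABLE_6_1, max_tests=7))  # 0.0, for a thorough pre-release run
```

Choosing the lowest threshold that fits the budget keeps as many user-sessions as the available time allows, and therefore as much fault detection as possible.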

7.1.2 Test selector

The test-selector has the responsibility for selecting a test case for each of the remaining nodes of the application graph. Two types of test selectors have been used to find the best results: most actions and most elements.

Comparison

The results confirmed our intuition that ’most elements’ would be slightly better than ’most actions’. The ’most actions’ test selector does not work as well because a session with many actions does not necessarily go through a lot of the features of the application. For example, one could be clicking between two pages several times without doing much. This makes the user-session less interesting for testing purposes. ’Most elements’ seems to score better on both code coverage and number of faults detected.


7.1.3 Code coverage

The results of the proposed reduction method are above our expectations. Primarily, the low loss of code coverage is something which was not anticipated. Therefore, it is essential to find out how this came to be.

The significant difference between previous reduction methods, like Concept Analysis [24], and this reduction method is the use of the additional information that comes with client-side logging. Prior reduction methods try to find the minimal set of tests with which they cover all the URLs and requirements. None of these methods look at what happens during those user-sessions and what elements on the web page are touched/visited. The proposed reduction method can look at what happens during the user-sessions because the logged user-sessions have more details (event logs instead of request logs). This additional information enables for more thorough decision making.

We believe that having more information and, therefore, being able to make a better decision about what user-sessions to use to test the application with is the explanation for the higher code coverage.

7.2 Threats to validity

No work is free of issues and threats. In this section we address the issues we have encountered, and what we have done to mitigate them.

7.2.1 User-sessions

Despite using the same application and metrics as previous research for validating the proposed method, the comparison is still challenging to make. The main reason is that we do not have the same user-sessions that other researchers have used. The user-sessions used for reduction can influence the effectiveness of the reduction method. For example, a few large user-sessions can achieve the same code coverage as many small user-sessions. It is for this reason that multiple metrics have been applied, to make the results as comparable as possible. However, to make the best comparison, one should use the same user-sessions to compare the reduction methods. Nonetheless, we are confident that our results are trustworthy. In the end, the test suite consists of the same use cases: create an account, search for a book, and order a book.

7.2.2 Thresholds

In section 7.1.1 we already talked about the thresholds and how they worked for the bookstore application [11]. In this section, we will talk about the potential applicability of those thresholds to other applications.

The distances we used as a threshold for Dijkstra are very likely specific to the bookstore application. We have used distances between zero and three as thresholds, with three as a maximum since there was no page further away from the homepage. With the experiments, we have shown that Dijkstra can be used as a pre-selector. However, if one wants to use it for a different application, the effectiveness of the distances needs to be explored again.

In contrast, the thresholds used for the Cluster Coefficient are likely to be application independent. The calculated value for each node depends only on the neighbors of that node, so the size of the graph will not affect the values calculated for each node. This makes the Cluster Coefficient pre-selector application independent. The effectiveness of the thresholds on other applications needs to be investigated in future work.

7.2.3 Test application

Although we stated in chapter 3 that the bookstore application [11] is no longer representative of modern web applications in industry, we still used it to test our reduction method. We used the bookstore application to show the backwards compatibility of the newer client-side logging method. Moreover, testing our reduction method on the bookstore application allowed for a better comparison with other reduction methods.


Another issue is that there are not many open-source modern web applications. A modern rich web application should have a dynamic and responsive user interface, which is driven by JavaScript. Most of those applications use a JavaScript framework to support this.

7.3 Reflection on usefulness

A contribution of this thesis is the comparison of Selenium and Cypress. Cypress is a new tool that can be used for automated testing. In chapter 4 we explored the possibilities of both tools and did a performance test. In these experiments, Cypress showed great potential. However, it missed some important features we needed for our reduction method, for example, the possibility of finding an HTML-element by its XPath. Cypress has only just been released (March 2015) and it is still heavily in development. Future work might be able to take advantage of this (open-source) automated test runner.

Moreover, with this thesis, an implementation for client-side logging is provided. In previous work [5, 15, 16] client-side logging was mentioned and applied, but no details on the implementation were provided. The description and details about how client-side logging was implemented are also part of the contributions of this thesis.

This thesis also showed how the additional information, gathered by client-side logging, can improve decision making for reduction methods. Selecting user-sessions based on the number of elements and user actions has not been done before. This implementation of leveraging client-side logging for developing a reduction method might be relevant for future work.

7.4 Limitations

While developing the reduction method and client-side logging implementation we came across some limitations. Those limitations will be addressed in this section.

A limiting factor of our reduction method is the way we construct the application graph. We construct the application graph based on the user-sessions that have been recorded. This means that the application graph only knows about the features that have been used by the users. Features that are not used by the users and, therefore, not recorded, cannot be tested. However, one could argue that unused features are not relevant for testing since the user will not notice it when an error occurs there.

The second limitation concerns the client-side logging implementation. The JavaScript library assumes that all user actions are performed within the application. Only clicks and inputs can be observed. Actions that are executed by the browser, or by the user via the browser cannot be recorded. An example of this is direct navigation via the browser during a user-session, since the logging implementation assumes all navigation is done by clicking and value input within the application. This is something that needs to be improved in future work.

7.5 Answering the research questions

In this section, answers to the research questions are provided.

• Research question 1 - What benefits does client-side logging provide, compared to server-side logging, for user-session-based testing reduction methods?

Client-side logging differs from server-side logging: client-side logging records the events that are triggered by the user, whereas server-side logging records the requests that arrive at the server. Event logging has the advantage that the user's actions are logged in more detail. Request logs, on the other hand, only show us what page a user has visited and what information has been submitted, in the form of a POST-request.

The additional information that can be collected with client-side logging will enable a reduction method to make more thorough decisions about the usefulness of a particular user session.


It should be noted that server-side logging is easier to implement, since most servers have a built-in option for request logging. Client-side logging mostly requires a custom implementation.

• Research question 2 - How could the benefits of client-side logging be leveraged to create a reduction method?

With this research question, we aimed at finding interesting information in the event logs that is not available in the request logs. It turns out that two aspects are interesting for the reduction method: the number of actions that were performed by a user and the HTML-elements that were touched during the session. When it is unclear which user-session to choose to test a particular web page, these two pieces of information turn out to be helpful.

The reduction method, based on these additional pieces of information, performs very well. The results obtained during the experiments seem to be higher than the results of the other reduction methods in the synthesis table (table 8.1).
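The selection step described in this answer could be sketched as follows. This is only an illustration under stated assumptions: the record type `PageVisit`, its fields, and the ranking rule (prefer the session that touched the most distinct HTML-elements on a page, with the number of actions as tie-breaker) are hypothetical choices made for this example and are not the exact rule of the proposed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageVisit:
    """Hypothetical summary of what one user-session did on one page."""
    session_id: str
    page: str
    action_count: int
    elements_touched: frozenset


def pick_session(candidates):
    # Assumed ranking: more distinct elements first, then more actions.
    return max(candidates, key=lambda v: (len(v.elements_touched), v.action_count))


def reduce_suite(visits):
    """Keep one session per page, chosen with the assumed ranking."""
    by_page = {}
    for visit in visits:
        by_page.setdefault(visit.page, []).append(visit)
    return {page: pick_session(c).session_id for page, c in by_page.items()}


visits = [
    PageVisit("s1", "/books", 3, frozenset({"#search", "#add"})),
    PageVisit("s2", "/books", 5, frozenset({"#search"})),
    PageVisit("s3", "/cart", 2, frozenset({"#pay"})),
    PageVisit("s4", "/cart", 4, frozenset({"#remove"})),
]
selected = reduce_suite(visits)
print(selected)
```

Neither quantity can be derived from request logs alone, which is why the sketch takes its input from client-side event data.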

• Research question 3 - How could the reduced test suite be reduced even further, and what is the impact of this on performance?

In this research, we also wanted to investigate whether it was possible to apply sampling to the reduced test suite. This means that we reduce the reduced test suite even further, with the consequence that code coverage and fault detection will go down. Sampling of the reduced test suite becomes helpful in environments in which continuous development/testing is applied.

With Dijkstra and the Cluster Coefficient, we have calculated a value for each node in the application graph. Experiments have been conducted with different thresholds. The value of this threshold determines how thoroughly the application is tested. This seems to work best with the Cluster Coefficient, since, as noted in the discussion, the Cluster Coefficient calculates the actual cohesion, instead of assuming it as we did with Dijkstra.

To determine which threshold works best, the Fault Detection Density (FDD) is helpful. This metric shows the ratio between detected errors and reduction. In a development pipeline, one wants to have as few tests as possible, which detect as many errors as possible. From tables 6.1 and 6.2 we can conclude that a threshold of 0.6 works best, since this creates a reduced test suite with an FDD of 0.38.
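As an illustration of threshold-based sampling, the sketch below computes the standard local clustering coefficient of each node in a small undirected graph and keeps the nodes whose value reaches a threshold. The example graph, the function names, and the choice to keep nodes at or above the threshold are assumptions made for this example; the thesis implementation may differ in these details.

```python
from itertools import combinations


def clustering_coefficient(graph, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `graph` maps each node to the set of its neighbours.
    """
    neighbours = graph[node] - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0
    # Count the links that exist between pairs of neighbours.
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def select_nodes(graph, threshold):
    # Assumption: nodes with a value at or above the threshold are kept.
    return {n for n in graph if clustering_coefficient(graph, n) >= threshold}


# Hypothetical application graph with symmetric adjacency sets.
graph = {
    "home": {"books", "cart", "login"},
    "books": {"home", "cart"},
    "cart": {"home", "books"},
    "login": {"home"},
}
kept = select_nodes(graph, 0.6)
print(sorted(kept))
```

Raising or lowering the threshold changes how many nodes, and therefore how many user-sessions, remain in the sampled test suite.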


Chapter 8

Related work

In this chapter, we elaborate on information and papers related to this thesis. First, we investigate other reduction methods that were used for user-session-based testing. The results that were achieved by these methods will be put together in a synthesis table. This table shows the results of methods that were compared and validated in previous work with the addition of more recent papers.

Moreover, we elaborate on methods that are closely related to our approach. The reduction methods we looked at are Concept Analysis, HGS, Greedy and a genetic algorithm.

8.1 Synthesis table

Table 8.1 presents the results obtained by previous research [26], which we filtered based on our requirements. The properties of a test suite reduction method we are interested in are the reduction rate, code coverage and percentage of detected faults. More recent papers were also added to this table.

Method     %-of original suite   %-method cov   %-stmt cov   %-cond cov   %-faults detected
Random*    4                     54             59           31           89
Greedy*    1.5                   54             55           29           75
HGS*       1.5                   38             37           18           51
Concept*   4.5                   54             59           35           92
GA [19]    n.a.                  n.a.           n.a.         n.a.         80

Table 8.1: Results obtained by previous research [26] are marked with *, filtered using our set of requirements.

• %-of original suite

The percentage of the original test suite shows how many test cases are selected from the original suite.

• %-method coverage

The percentage of methods that is covered by the reduced test suite.

• %-statement coverage

The percentage of statements that is covered by the reduced test suite.

• %-condition coverage

The percentage of conditionals that is covered by the reduced test suite.

• %-faults detected

The percentage of faults detected by the reduced test suite in comparison to the number of faults detected by the original test suite.

A Test Suite Reduction Method for User-Session-Based Testing With Client-Side Logging