tApp: testing the future


by Karsten Westra University of Groningen

March 17, 2014


Change log

Version Date Author Comment

0.1 01-02-2013 Karsten Westra
- Initial version/plan.

0.2 16-07-2013 Karsten Westra
- Added layout for application chapter (6).
- Illustrated basic layout using example. Minor detailed description of general workflow.

0.8 29-08-2013 Karsten Westra
- Changed title "Specification: tApp" to "tApp: Testing the future".
- Added change log.
- Added acknowledgment.
- Expanded applications chapter (6).

0.8.1 30-08-2013 Karsten Westra
- Chapter 6: explained device package.
- Chapter 6: explained device browser.

0.8.2 05-09-2013 Karsten Westra
- Explained and implemented report browser and graphs.
- Stakeholder summary.
- Added references chapter.

0.8.3 07-09-2013 until 09-09-2013 Karsten Westra
- Expanded chapter 7.7 with detailed cases: Monkeytalk demo app, Fit for free, LG Klimaat.

0.9 10-09-2013 Karsten Westra
- Written discussion.

1.0 11-09-2013 Karsten Westra
- Written conclusion.

1.0.1 02-10-2013 until 05-10-2013 Karsten Westra
- Thorough review.

1.1 07-10-2013 Karsten Westra
- Added chapter with related work.

1.2 09-10-2013 until 11-10-2013 Karsten Westra
- Thorough review of references to chapters, goals, figures and tables.

1.3 14-10-2013 Karsten Westra
- Added specification, distribution, reporting layout to chapter 5 (previously known as implementation).
- Reviewed chapter 6+ and added references to relevant requirements, figures and sections.

1.4 15-10-2013 Karsten Westra
- Reviewed size and placements of figures in chapter 6.
- Reviewed and added references to figures.

1.5 16-10-2013 Karsten Westra
- Merged Chapter 5.4 with 6. Rewritten implementation chapter and called it application.
- Rewritten introduction and reviewed references.


1.6 17-10-2013 Karsten Westra
- Reviewed old applications chapter and renamed it ’experiment’.

1.7 17-10-2013 Karsten Westra
- Reviewed and rewrote chapters 7 and 8.

2.0 18-10-2013 Karsten Westra
- Reviewed and rewrote discussion and conclusion.

2.1 22-11-2013 Karsten Westra
- Changed structure of chapter 2.7 into Specification, Distribution and Reporting.
- Added 7.1.2, 7.1.3 and 7.1.5 to discussion.
- Renamed reference labels.
- Changed all usages of "you" to "a user", "a developer" and so on.

2.2 19-01-2014 until 12-02-2014 Karsten Westra
- Refined chapter 2:
- Added general comments about automated testing and how and why to tackle fragmentation with it.
- Elaborated on how other researchers tackle fragmentation.
- Added section on the "perfect solution" in conclusion subsection.


Acknowledgements

First of all, I would like to thank my main supervisor, Alex Telea, for the large amount of feedback and suggestions he gave. He really helped me in the process from idea to working prototype.

Thanks to Peperzaken, the app development company I work for, for having the patience to let me finish my study next to my work, and for lending me a workspace setup with test devices and apps. It really helped me to try everything on a range of devices with existing apps.

Thanks to my family, who always supported my choices and wishes. You motivated me whenever possible when things got really challenging.

Thanks to you all for helping me get to where I am now!


Abstract

Mobile phones are a fast-growing technology on the market. Nearly everybody has a smartphone nowadays, and there is an astonishing number of different device types to choose from. It is important to adequately test an app on these device types, but owning them all is not feasible.

There are two possible solutions: own a subset of the devices on the market, or try to potentially reach all of them without owning them. We present a solution in the form of a testing tool called tApp. This tool makes it possible to execute a test on a device without owning it. Furthermore, we investigated whether we can group certain sets of devices and predict the behavior of software based on similar device types. We elaborate on the entire process, from a ’blank’ app to a test, to inspecting results and automating the execution process. The final goal was to present these results in an insightful way.


1 Introduction

It is difficult for developers nowadays to develop an app for a mobile platform that runs flawlessly on every device out there. This is caused by the sheer number of different device types on the market. Testing an app for iOS, the mobile Operating System (OS) designed by Apple, comes down to about five different phones and three types of tablets, a total of eight devices. Testing an app for Android, the mobile operating system designed by Google, is more difficult: significantly more Android devices are on the market. This is often referred to as fragmentation.

The number of different device types (phones and tablets) on the market that run Google's Android operating system is so high that an adequate overview of them is very difficult to obtain. Let us begin with the notion that there are at least ten manufacturers that create different device types, each with about five device types on the market (10 manufacturers * 5 device types = 50 device types).

This is already more than six times the eight device types Apple has on the market. And this is only a very rough estimate of how many device types running Android exist today; there are probably many more.

The next difficulty is that manufacturers of Android device types each ship their own slightly different version of the Android OS. Apple also releases new versions, but the number of different OS versions in use is not as high as with Android. Accurately testing an app, or trying to guarantee that it works on all devices, is therefore an extremely expensive and time-consuming task.

Figure 1: Mobile testing in practice

Business experience shows that testing an app consists of a list of actions that have to be performed in the app in a certain order. A test subject receives a device type and a list of actions. These actions are executed on this device and any unexpected behavior is noted.

When this is done, the test subject has to repeat it all again on the remaining 49 devices. The first problem here is that this approach is incredibly time-consuming. This, in turn, creates economic problems, e.g. these costs have to be passed on to customers.

We also have to conclude that it is not feasible to own all these 50 devices, since owning them all would be rather expensive. Another issue is that a device only shows that an app crashed. It does not show which piece of code caused the crash, which is unfortunately exactly what a developer wants to know. If the tester is a less technical person, it is difficult to pass this information on to a developer.

1.1 Scope

We argued that testing mobile apps is an expensive process. To reduce its cost we propose a software testing tool called tApp. tApp tries to simplify and generalize the testing process across different operating systems and device types. With detailed test reports, tApp helps developers see what goes wrong in an app they built; these reports also show when and where things go wrong. Furthermore, we investigate whether tApp can predict the behavior of an app on a certain device type, or group of device types, based on past results. Ultimately we want to make a statement about future test results based on a smaller set of device types.

1.2 Goal

tApp aims at providing the infrastructure, tools, and techniques allowing developers and end users to create and use a "bidirectional device-to-app stability mapping". The mapping encodes whether a given app runs stably on one or more devices, and conversely, which apps run stably on a given device. End users can use this mapping to assess the stability of an app on their device types. Developers can use the mapping to assess the stability of their newly developed apps. The list below summarizes the top-level requirements for this mapping. Each requirement has a label so we can refer to it later; for example, G1 refers to Goal 1.

• Ease of test result inspection (G1): end users and developers should have an easy-to-use way to query the mapping, e.g. find out which apps are stable on a device type, on which device types is a given app stable, and similar;

• Level of detail (G2): the mapping (and presentation) should allow browsing the contained information at several levels of detail, e.g. by organizing device types, applications, and stability reports in a hierarchical manner;

• Queries (G3): besides the above simple queries, the mapping should support more advanced queries, such as finding out which apps, or device types, behave similarly with respect to stability;

• Ease of test execution (G4): developers should have an easy-to-use way to add new stability information about their app(s). In particular, test-running the app on a family of device types should be a lightweight process, which is executed as automatically as possible, and which does not require physical ownership of the device types.

• Scalability (G5): the presented solution should be scalable to accommodate a large number of apps, users, and device types (conservative bounds are tens of such instances).
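The requirements above can be made concrete with a small data structure. The Python sketch below is one possible, hypothetical realization of the bidirectional mapping: it keeps two indices over the same test outcomes, so that both query directions of G1 are cheap, and adds a naive similarity query in the spirit of G3. The class and method names, the boolean notion of "stable", and the device and app names are illustrative assumptions, not tApp's actual data model.

```python
from collections import defaultdict


class StabilityMap:
    """Hypothetical bidirectional device-to-app stability mapping."""

    def __init__(self):
        self._by_device = defaultdict(dict)  # device type -> {app: stable?}
        self._by_app = defaultdict(dict)     # app -> {device type: stable?}

    def record(self, device, app, stable):
        """Store one test outcome in both directions."""
        self._by_device[device][app] = stable
        self._by_app[app][device] = stable

    def stable_apps(self, device):
        """G1: which apps run stably on this device type?"""
        return sorted(a for a, ok in self._by_device.get(device, {}).items() if ok)

    def stable_devices(self, app):
        """G1: on which device types does this app run stably?"""
        return sorted(d for d, ok in self._by_app.get(app, {}).items() if ok)

    def similar_devices(self, device):
        """G3 (naive): device types whose outcomes agree with `device`
        on every app that both have been tested with."""
        own = self._by_device.get(device, {})
        similar = []
        for other, results in self._by_device.items():
            shared = own.keys() & results.keys()
            if other != device and shared and all(own[a] == results[a] for a in shared):
                similar.append(other)
        return sorted(similar)


if __name__ == "__main__":
    m = StabilityMap()
    m.record("phone-A", "app-1", True)
    m.record("phone-A", "app-2", False)
    m.record("phone-B", "app-1", True)
    m.record("phone-B", "app-2", False)
    m.record("tablet-C", "app-1", False)
    print(m.stable_apps("phone-A"))      # ['app-1']
    print(m.stable_devices("app-1"))     # ['phone-A', 'phone-B']
    print(m.similar_devices("phone-A"))  # ['phone-B']
```

Requiring full agreement on shared tests is a deliberately crude similarity measure; with many device types, a scoring that tolerates partial disagreement would likely be needed.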

An end user does not care why an app does or does not run on their device type, only whether it runs. To answer such a question, we need to say something sensible about the end user's device type with respect to what we have seen before in the available test cases. This could be achieved by defining equivalence classes. These classes divide device types into groups that should show similar, stable, behavior. When a user then wants to know whether an app runs stably on their device type, we only have to compare the information of their device to the known device types and test cases in tApp’s database. Based on this we can predict whether an app will show stable behavior. Chapter 4 elaborates further on what we mean by stable behavior and equivalence classes.
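A minimal sketch of this idea, assuming that an equivalence class is defined by exact equality of a few hand-picked device traits and that prediction is a simple majority vote over tested devices of the same class. Both the trait list (`CLASS_TRAITS`) and the voting rule are assumptions for illustration only; they are not tApp's definitions.

```python
from collections import Counter, defaultdict

# Assumption: devices with equal values for these traits form one class.
CLASS_TRAITS = ("os_version", "screen_density", "manufacturer")


def class_key(device):
    """Map a device description (a dict of traits) to its equivalence class."""
    return tuple(device[t] for t in CLASS_TRAITS)


def build_classes(tested_devices):
    """Group already-tested devices by equivalence class."""
    classes = defaultdict(list)
    for device in tested_devices:
        classes[class_key(device)].append(device)
    return classes


def predict_stable(app, new_device, tested_devices, results):
    """Predict whether `app` is stable on an untested `new_device`.

    `results` maps (device name, app) -> bool for past test runs. The
    prediction is a majority vote over tested devices in the same class;
    None means "no evidence or a tie", i.e. the app should really be tested.
    """
    peers = build_classes(tested_devices).get(class_key(new_device), [])
    votes = Counter(
        results[(d["name"], app)] for d in peers if (d["name"], app) in results
    )
    if votes[True] == votes[False]:
        return None
    return votes[True] > votes[False]


if __name__ == "__main__":
    tested = [
        {"name": "phone-A", "os_version": "4.1", "screen_density": "xhdpi", "manufacturer": "M1"},
        {"name": "phone-B", "os_version": "4.1", "screen_density": "xhdpi", "manufacturer": "M1"},
        {"name": "phone-C", "os_version": "2.3", "screen_density": "hdpi", "manufacturer": "M2"},
    ]
    results = {("phone-A", "app-1"): True, ("phone-B", "app-1"): True,
               ("phone-C", "app-1"): False}
    new = {"name": "phone-D", "os_version": "4.1", "screen_density": "xhdpi", "manufacturer": "M1"}
    print(predict_stable("app-1", new, tested, results))  # True
```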

1.3 Thesis structure

This document proposes a tool that tries to solve the difficulties with testing that exist in the mobile app development field. We start by looking at existing tools on the market and describe their features in chapter 2. After that we identify the stakeholders of the system in chapter 3 and explain the requirements that each of these stakeholders might have. Chapter 3 also explains on a conceptual level what problem tApp addresses and clarifies how the stakeholders fit in the picture. Chapter 4 then proposes a design, and in chapter 5 we elaborate on the implementation of the test tool. To verify that the approach works, we discuss possible applications of the tool in the form of use cases in chapter 6. When this verification is completed, we discuss in chapter 7 which requirements tApp covered and which proved to be difficult. Finally, in chapter 8 we draw conclusions on tApp as a framework and its results.


2 Related Work

In the previous chapter we mentioned that testing a mobile application on the abundance of device types on the market is time-consuming. We begin by looking at activities in the academic world to find existing theories, solutions and/or ideas that can be used to propose a solution to the fragmentation issue in the mobile testing field. Besides that, we look at existing test automation tools currently available on the market that could simplify this testing process. The best-known tools are monkeyrunner, uiautomator, Robotium, Telerik Test Studio, Seetest and Monkeytalk. We explain what each of these tools offers a user and assess their strengths and weaknesses with respect to our goal. Finally, we assess a "perfect solution" and what kind of skill set a user of the "perfect" tool needs to be able to work with it.

2.1 Automated testing

An important question for a developer is whether it is feasible to put a lot of effort into black-box testing of an application. There are only a few references to theories and solutions that tackle the fragmentation issue by automating the test execution process. This could be explained by the fact that smartphones are relatively new. Another reason could be that many solutions might work in principle but prove difficult to use or hard to implement. At the time of writing there is no out-of-the-box solution that solves all our challenges.

The software development industry is beginning to see that testing is important to stay at the top of the rankings among the ever-growing number of apps in the different app stores. A few negative reviews leave a lasting impression, and it is virtually impossible to recover from a bad name. This makes a well-tested app essential to its success. To achieve a perfect status there really is only one theoretical solution: test an app on all device types that exist in the world. We already mentioned that this is not feasible. Baride and Dutta [2], Dubinsky and Abadi [3] and Haller [1] all note that a good solution might currently be a cloud-based test platform. However, deciding which range of device types is sufficient to cover a significant part of the device types used in the world is a difficult hurdle. The perfect solution would be to connect every device out there wirelessly to a "test cloud" and choose a representative subset of them to cover all relevant devices to test with.
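The text does not prescribe how such a representative subset should be chosen. One standard technique that fits the problem is greedy set cover, sketched below in Python under the assumption that each device type is described by a set of trait values (for example OS version and screen size class) and that "representative" means every trait value seen in the market is covered by at least one chosen device. The device names and traits are made up for illustration.

```python
def choose_representative_subset(devices):
    """Greedy set cover over device traits (illustrative sketch).

    `devices` maps a device name to the set of trait values it exhibits,
    e.g. {"os:4.1", "screen:large"}. Devices are picked until every trait
    value occurring in `devices` is covered. Greedy is not optimal, but its
    result is within a logarithmic factor of the smallest possible subset.
    """
    uncovered = set().union(*devices.values())
    chosen = []
    while uncovered:
        # Pick the device covering the most still-uncovered trait values;
        # iterating in sorted order makes ties deterministic (first name wins).
        best = max(sorted(devices), key=lambda d: len(devices[d] & uncovered))
        gain = devices[best] & uncovered
        if not gain:
            break  # defensive: cannot happen, uncovered is built from devices
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    market = {
        "phone-A": {"os:2.3", "screen:small"},
        "phone-B": {"os:4.1", "screen:normal"},
        "phone-C": {"os:4.1", "screen:small"},
        "tablet-D": {"os:4.1", "screen:large"},
        "tablet-E": {"os:2.3", "screen:large"},
    }
    print(choose_representative_subset(market))  # ['phone-A', 'phone-B', 'tablet-D']
```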

2.2 Tackle fragmentation

The main challenge that we noted is the fragmentation issue. There are many different combinations of device types, built by different manufacturers, running different versions of operating systems.

This abundance of device types makes developing an app for these platforms very time-consuming. Since different device types have different traits, it is vital that an app is tested properly on them.

Untested apps that do not function properly may receive low ratings in the app stores, which in turn means that few users will use them.

2.2.1 Proposed solutions

Baride and Dutta [2] propose a cloud-based system with emulators and real devices to accurately test a mobile app. This approach has the advantage that a developer can test an app on a phone connected to a cloud. They also mention that there are many aspects of an app that should be tested.

Business apps (or apps that communicate over a network) are vastly more complex and require more extensive testing. Another useful notion is that an automated script should be abstracted from the UI of the app, since there are too many different devices running different operating systems.

According to Baride and Dutta [2], automated testing is the solution.

Dubinsky and Abadi [3] have assessed which parts of testing in mobile development are important. They list the issues that need to be addressed and propose an agenda for tackling these key drivers. Many of these issues stem from having to cope with many device platforms and the diversity of device types running them.

A solution proposed by Ridene and Barbier [4] is a Domain-Specific Modeling Language (DSML) called MaTEL (Mobile Applications Testing Language): a modeling language in which the behavior of an application, which in theory is platform specific, can be specified uniformly. Their solution offers a clever approach to controlling the sensors of a device type (e.g. WiFi, GPS, etc.). They treat changing a sensor setting as a set of actions on a device that lead to the correct setting of the sensor. In this view, navigating a phone is basically the same as navigating through an app.

A very detailed description of the test process of a mobile application is given by Haller [1]. The trade-off between different types of tests, the time spent testing extensively and the user reviews obtained afterwards is extremely difficult to make. Haller notes that to make this process easier to manage, a developer wants to automate different parts of the testing process.

2.2.2 Useful theories

A possible solution that all researchers propose is some sort of cloud-based test platform that connects real devices to a test bed and remotely starts tests on these device types. This could also be done by attaching simulators/emulators to such a cloud. In practice, however, the large number of different device types shows that emulators do not accurately mimic the behaviour of all device types on the market.

This is due to the many (small) differences between the versions of the operating systems out there.

Since device types are so different, one could argue that a useful testing tool needs some sort of general specification of a test. Such a test can then be (automatically) executed on a device connected to a cloud. By analyzing the current screen with image processing, a test tool could determine the UI elements on it. These UI elements, and the gestures performed on them, in theory form a solid platform-independent script that can be executed on any device type. Such a specification could be written down using a DSML. Another approach is to use natural language instead of a domain language to describe these scenarios. One could argue that a DSML is a more structured language for test specification. If, however, the goal is to save a developer time, natural language lets someone other than a developer write down a test.
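A minimal Python sketch of such a general test specification, assuming a test is an ordered list of abstract steps that name logical UI elements rather than screen coordinates. Each platform would supply its own driver that maps the steps onto real UI elements. `Step`, `Driver`, `run_test` and the `FakeDriver` that simulates a login screen are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Step:
    """One platform-independent test step (hypothetical format)."""
    action: str                  # e.g. "tap", "enter_text", "verify_text"
    element: str                 # logical UI element name, not a coordinate
    value: Optional[str] = None  # text to enter or to expect, if any


class Driver(Protocol):
    """What each platform back end (Android, iOS, ...) would implement."""

    def perform(self, step: Step) -> bool: ...


def run_test(steps, driver: Driver):
    """Execute steps in order and stop at the first failing step."""
    report = []
    for step in steps:
        ok = driver.perform(step)
        report.append((step, ok))
        if not ok:
            break
    return report


class FakeDriver:
    """Stand-in driver simulating a login screen, for illustration only."""

    def __init__(self):
        self.fields = {}
        self.screen_text = "Login"

    def perform(self, step):
        if step.action == "enter_text":
            self.fields[step.element] = step.value
            return True
        if step.action == "tap" and step.element == "login_button":
            if self.fields.get("username"):
                self.screen_text = "Welcome"
            return True
        if step.action == "verify_text":
            return self.screen_text == step.value
        return False  # unknown action or element


if __name__ == "__main__":
    login_test = [
        Step("enter_text", "username", "alice"),
        Step("tap", "login_button"),
        Step("verify_text", "screen", "Welcome"),
    ]
    for step, ok in run_test(login_test, FakeDriver()):
        print("PASS" if ok else "FAIL", step)
```

The same `login_test` list could be handed unchanged to an Android or iOS driver; only the drivers would need to know how each platform identifies `login_button`.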

There are many proposed solutions for creating a testing framework, preferably in a cloud, that automates the execution of a test on a certain device type. Hardly anyone elaborates on the step after test execution: how do we accurately present the results of a test so that an analyst immediately sees which parts of a test run succeeded or failed? Given the large number of device types on the market, this seems to be an important part of the testing process that deserves extra attention. However, we found few references from researchers who elaborate on what a test report should look like. This could be because there is no perfect solution yet for automating test execution on a representative set of device types.
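As an illustration of the reporting step, the Python sketch below condenses raw per-device step results into one line per device type and groups failing devices by the step at which they first failed, so an analyst can see at a glance whether a failure is device-specific or shared. The input format and the output wording are assumptions, not an existing report format.

```python
from collections import defaultdict


def summarize(results):
    """Condense per-device step results into an overview.

    `results` maps a device name to an ordered list of (step, passed) pairs.
    Returns (one line per device, {first failing step: [devices]}).
    """
    lines = []
    by_failure = defaultdict(list)
    for device in sorted(results):
        steps = results[device]
        passed = sum(1 for _, ok in steps if ok)
        failed = [step for step, ok in steps if not ok]
        if failed:
            by_failure[failed[0]].append(device)
            lines.append(f"{device}: FAIL at '{failed[0]}' ({passed}/{len(steps)} steps passed)")
        else:
            lines.append(f"{device}: PASS ({passed}/{len(steps)} steps passed)")
    return lines, dict(by_failure)


if __name__ == "__main__":
    results = {
        "phone-B": [("open app", True), ("tap login", False)],
        "phone-A": [("open app", True), ("tap login", True)],
        "tablet-C": [("open app", True), ("tap login", False)],
    }
    lines, by_failure = summarize(results)
    print("\n".join(lines))
    print(by_failure)  # {'tap login': ['phone-B', 'tablet-C']}
```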

2.3 Existing tools

When a developer looks for a (commercial) tool that makes automated application testing possible, there are a few tools that implement some of the theories explained in the previous section.

We now list some existing testing tools and their functionality. These tools also show that some of these theories are useful and feasible to implement in practice.

2.3.1 Monkeyrunner

Google provides a tool [5], called monkeyrunner, that app developers can use to test their apps. It tests apps on a functional level. Google offers a Python API to control device types. This means that monkeyrunner can initiate gestures like pushing a hardware button or executing a gesture on a touch screen, if one is present. It is not so much coupled to the actual user interface of a certain app; it is more of a remote device controller. That is, it executes an action on a hardware or software part of a device type and observes what an app does in response. Monkeyrunner runs outside of, and independently from, an app. It is thus not necessary to have the actual source code of the app under test. It works more or less out of the box.

Monkeyrunner is started from a developer’s workstation. A test written in Python can be executed on multiple devices connected to that workstation. A developer can create a program which installs an app, runs it, sends different gestures to it and takes screenshots during the process. A device must be connected to this workstation using the Android Debug Bridge (ADB) provided by Google.

ADB is an interface between a workstation and a device type. A workstation can communicate with the operating system running on an Android device type through ADB.
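Monkeyrunner scripts themselves are written against monkeyrunner's own Python (Jython) API, which we do not reproduce here. To illustrate the same install, launch, gesture and screenshot cycle over ADB, the sketch below instead composes standard `adb` commands (`install`, `shell monkey`, `shell input tap`, `shell screencap`, `pull`) in Python. The device serial, package name, APK path and tap coordinates are placeholders; by default the commands are only printed (dry run).

```python
import subprocess


def adb(serial, *args):
    """Build an adb command addressed to one device (selected with -s)."""
    return ["adb", "-s", serial, *args]


def smoke_test_commands(serial, apk_path, package, taps, shot="/sdcard/tapp_shot.png"):
    """Install an app, launch it, tap screen coordinates, take a screenshot.

    Gestures are plain (x, y) screen coordinates: like monkeyrunner, this
    controls the device, not the app's UI components.
    """
    cmds = [
        adb(serial, "install", "-r", apk_path),
        # One monkey event restricted to the launcher category starts the app.
        adb(serial, "shell", "monkey", "-p", package,
            "-c", "android.intent.category.LAUNCHER", "1"),
    ]
    for x, y in taps:
        cmds.append(adb(serial, "shell", "input", "tap", str(x), str(y)))
    cmds.append(adb(serial, "shell", "screencap", "-p", shot))
    cmds.append(adb(serial, "pull", shot, "."))
    return cmds


def run(cmds, dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(smoke_test_commands("emulator-5554", "app.apk", "com.example.app", [(360, 460)]))
```

Every gesture here is a raw screen coordinate, which is exactly why this style of test is tied to one screen size; a real script would also have to wait for the app to finish starting before tapping.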

Monkeyrunner is specifically designed with Android in mind. A user needs to be a programmer to use monkeyrunner. The fact that it runs outside of an app gives it potential for a very generic specification of what to test. It works quite well but focuses too much on controlling the device rather than controlling the app under test. In the previous chapter we mentioned that physical ownership of many device types is expensive. Monkeyrunner needs a connection through ADB, which means that physical ownership is required.

2.3.2 UIAutomator

Another tool [6] that Google provides for developers using Google’s operating system is uiautomator.

This testing framework gives developers the opportunity to efficiently create automated functional tests for multiple Android device types. It requires Android API level 16, which effectively means version 4.1 and above. Many device types currently on the market still use older versions of Android.

This means that using uiautomator excludes a large group of device types from testing.

Uiautomator requires a connection to a workstation through ADB. Compared to monkeyrunner, uiautomator focuses more on controlling the UI than on controlling the device type it runs on.

It is still based on the same principle of controlling the device. But uiautomator gives more control in testing a user interface than monkeyrunner does. The API that uiautomator offers has more functionality that lets a user select a certain User Interface element like a button.

Another very strong feature of uiautomator is that part of the tool shows a visual analysis of a user interface. It does this by taking a screenshot and analyzing the views that are currently visible. Selecting a specific element is easy, since the tool shows which elements are available together with information about their properties, such as their type and their location on the device type’s screen. Uiautomator relies on the accessibility support of Android: components are identified by the text on labels and the content description of a UI component. This requires specific knowledge of the app's UI structure and the naming of its UI elements. All tests are packaged in a single jar file, which makes reuse of parts of the scripts difficult.
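To show how selection by text or content description works, the Python sketch below parses a small, hand-written sample in the XML format produced by `adb shell uiautomator dump` (trimmed to a few attributes), finds a node by its `text` or `content-desc` attribute, and computes the centre of its `bounds`, which is where a tap would be sent. The sample screen and the helper names are our own.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the style of `adb shell uiautomator dump` output.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node index="0" text="" content-desc="" class="android.widget.FrameLayout" bounds="[0,0][720,1280]">
    <node index="0" text="Username" content-desc="" class="android.widget.EditText" bounds="[40,300][680,380]" />
    <node index="1" text="Log in" content-desc="login_button" class="android.widget.Button" bounds="[40,420][680,500]" />
  </node>
</hierarchy>"""

BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def find_node(dump_xml, text=None, description=None):
    """Return the first node whose label text or content description matches."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if text is not None and node.get("text") == text:
            return node
        if description is not None and node.get("content-desc") == description:
            return node
    return None


def center(node):
    """Centre of a node's bounds, i.e. where a tap gesture should go."""
    x1, y1, x2, y2 = map(int, BOUNDS.fullmatch(node.get("bounds")).groups())
    return (x1 + x2) // 2, (y1 + y2) // 2


if __name__ == "__main__":
    button = find_node(SAMPLE_DUMP, description="login_button")
    print(button.get("class"), center(button))  # android.widget.Button (360, 460)
```

If a developer renames `login_button` or changes the label text, the lookup fails, which illustrates why this approach requires knowledge of the app's UI naming.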


Uiautomator is, like monkeyrunner, a good and simple tool for testing an app by controlling the device type it runs on. An improvement over monkeyrunner is that it focuses more on the UI. It gives more freedom in where to execute a gesture on a touch screen. A downside is that it only works on higher versions of Android.

2.3.3 Robotium

Robotium [8] is an open-source testing framework for Android apps. An added bonus over monkeyrunner and uiautomator is that the latest version also supports testing hybrid apps. These apps use HTML for the UI instead of the native features Android offers. So Robotium can also test PhoneGap-based apps.

Robotium focuses on the UI of an app. A developer sends gestures to an app using a simple Java framework. This framework can send all kinds of actions to a specific UI component in an app. This is different from monkeyrunner and uiautomator, which could only send a gesture to a certain point on a touch screen. Robotium can control a UI component without knowing exactly where it is located on the screen: a test names the component, and the framework searches for it and executes the gesture there. Because of this, Robotium can also verify certain properties of UI components.

A developer knows how their app works, so they can write a test that verifies the result on the screen that appears after a certain button is pressed. This gives a lot more flexibility than monkeyrunner and uiautomator offer. Because of this, the API Robotium offers is simpler than the APIs offered by monkeyrunner and uiautomator, which depend on the developer to execute gestures on the location of a UI component.

For Robotium to work, an Android device type needs to be connected to a workstation through ADB. This means we need physical ownership of the device. It is not necessary to have the source code of the app under test, so Robotium can also test apps that are pre-installed by the manufacturer of a device type.

The result of a test is a unit test report. This comes down to a list of the created tests with marks that show whether each was successful or not. Robotium does not take screenshots like monkeyrunner and uiautomator. Yet a visual result in the form of a screenshot of a device is useful to immediately see what succeeded and what did not.

2.3.4 Seetest

An interesting commercial tool created by Experitest, called Seetest [7], takes a somewhat different, visual approach to mobile testing. It supports testing on multiple platforms: besides Android and iOS, it also supports Windows Phone and Blackberry. It relies on the accessibility features of the platforms it runs on to recognize UI elements and execute actions on them. However, it also introduces another way to search for and find UI elements: image recognition of the screen.
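Image-based recognition can be illustrated with the simplest form of template matching: slide a small image of the wanted UI element over a screenshot and keep the position with the lowest sum of absolute differences (SAD). The Python sketch below works on plain lists of grayscale values; it is a generic textbook technique, not a description of how Seetest works internally.

```python
def locate(screen, template):
    """Find the top-left position where `template` best matches `screen`.

    Both are 2-D lists of grayscale values (0-255). The score is the sum of
    absolute differences (SAD); 0 means a pixel-perfect match.
    Returns ((row, col), score).
    """
    sh, sw = len(screen), len(screen[0])
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, float("inf")
    for r in range(sh - th + 1):
        for c in range(sw - tw + 1):
            score = sum(
                abs(screen[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


if __name__ == "__main__":
    screen = [[0] * 6 for _ in range(5)]
    for r, c in [(1, 2), (1, 3), (2, 2), (2, 3)]:
        screen[r][c] = 200  # a bright 2x2 "button"
    button = [[200, 200], [200, 200]]
    print(locate(screen, button))  # ((1, 2), 0)
```

A practical tool would add a threshold on the score to decide that nothing matched, and would have to cope with different screen densities and themes, which is one reason why image recognition does not always find the right element.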

Tests are created by recording gestures on a connected device type. A device has to be connected to a workstation or to a "VPN cloud", as Experitest calls it.

The source code of the app under test is not necessary. A developer can simply provide an application binary and record tests on it. Specification of a test is done in a visual editor: a user connects the device, provides a binary and pushes a record button, and all actions executed on the device are recorded by the tool. A developer can distribute their tests to all the supported platforms.

A specified script is thus portable enough to be executed on all device types running the supported platforms. This makes it a very strong tool for testing the same app on different platforms.


Reports of the tests are presented as a list of the recorded actions. This list shows the command as executed on the device, a symbol indicating its status and a screenshot after execution. The screenshot also contains a highlighted area: the area where the user executed the action, or more precisely, the UI element that Seetest identified while recording.

Another interesting feature of Seetest is the ability to set up a device hub. With this hub, a customer of Experitest can create a private cloud of connected device types to test on. The hub can be accessed from anywhere in the customer’s VPN. This potentially includes all the device types that are available at the customer for testers to work with. It would still not include all available device types in the world, but one could envision creating a subset of device types that covers what is generally available on the market.

Experitest created an interesting tool that takes a visual approach to UI recognition and script specification. However, the image recognition does not always work reliably, which makes the tool somewhat difficult to work with: UI elements are not always correctly recognized. Distributing tests over different device types running different platforms is very interesting, and the added feature of creating one's own device cloud makes it a tool worth investigating. A downside is that it is a commercial tool that comes at a price.

2.3.5 Telerik Test Studio

Telerik uses a different approach to testing an application. It offers a tool for testing iOS apps called Test Studio [9], which can test native, hybrid and web apps. A test can be created on the device type itself: an app available in the App Store lets a developer specify a test as a set of actions to be executed.

The actions are executed in order, and the results are then sent back to a web portal. The test specifications can also be sent to the web portal. The Telerik app can also synchronize scripts that were created on another device type. The cross-device playback of a recorded script and the web portal synchronization give flexibility with respect to ownership of the device. In theory, anyone with the credentials for an account can execute the scripts that are connected to that account.

Telerik explicitly mentions on its website that it does not use image-based detection to find a UI element and execute actions on it; it uses object-based recording instead. It seems reasonable to assume that it uses the accessibility features of iOS, similar to how Robotium uses those of Android. With image-based recognition like Seetest's, a redesign of the UI of a developer’s app would require a redesign of the test specification, even though a UI redesign does not necessarily change the functionality of an app. Object-based recording is less sensitive to such changes.

The web portal stores and shows all the created reports of a certain app. It lists the crash reports of an executed test and gives a simple overview of how many tests succeeded or failed.

It does not show screenshots of the device type under test like monkeyrunner and uiautomator do. Instead, it shows a list of commands and whether each succeeded or failed. Another welcome feature is the overview of all reports that have been executed, which shows where each test was executed and what its results were.

2.3.6 Monkeytalk

The last tool we look at is Monkeytalk [10], created by Gorilla Logic. It is an open-source tool that is free to modify and use. It supports tests across different platforms, including the largest native platforms, iOS and Android, which at the time of writing are the largest platforms on the market. It furthermore supports testing websites and web apps. Tests can be executed on all the supported platforms from a provided Integrated Development Environment (IDE). Tests are written in a simple command language, a simple form of English in which a script can be specified. Alternatively, a developer can specify a test in JavaScript.
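To give an impression of such a command language, the sketch below parses lines of the form `Component MonkeyId Action [args...]`, which is roughly the shape of Monkeytalk commands (e.g. `Button LOGIN Tap`). It is a simplified assumption: the real Monkeytalk syntax has more features than this parser handles, and the `#` comment convention is our own.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Command:
    component: str   # kind of UI component, e.g. "Button"
    monkey_id: str   # identifier used to find the component
    action: str      # what to do with it, e.g. "Tap"
    args: list = field(default_factory=list)


def parse_script(text):
    """Parse one command per line; blank lines and '#' comments are skipped."""
    commands = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = shlex.split(line, comments=True)  # honours "quoted args"
        if not parts:
            continue
        if len(parts) < 3:
            raise ValueError(f"line {lineno}: expected 'Component Id Action [args]'")
        commands.append(Command(parts[0], parts[1], parts[2], parts[3:]))
    return commands


if __name__ == "__main__":
    script = """
    Input username EnterText fred
    Button LOGIN Tap
    # check that the greeting appears
    Label greeting Verify "Welcome, fred!"
    """
    for command in parse_script(script):
        print(command)
```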

Another strong point is that a script can be recorded directly from a device. A developer pushes a record button, and the executed commands are sent back and combined into a script. Monkeytalk uses accessibility features to recognize UI elements in these scripts. A recorded or specified script can be executed on all the platforms that Monkeytalk supports.

Since developers are busy people, it seems like a good idea to leave test specification to other members of a development team. However, this needs a side note. Writing a test script that works on all supported platforms is rather difficult when someone does not consider naming conventions for UI elements across platforms, and a non-developer generally might not know about these. Fallback methods, such as the content description and the text on labels, make it possible to give all UI components names that are understood on all platforms. This, however, needs to be carefully communicated to all developers across a team, which generally requires more effort than with other tools.

To execute a test, a user has to prepare an app with a submodule (a so-called agent) that listens for the commands to execute. Having the source code is therefore required for this tool to work. The agent is a basic network component that listens for test commands through a network connection. This requires a little extra preparation, but in return a developer can execute a test whenever and wherever they want, as long as the IDE can reach the app through a network connection. This makes it a very strong tool. In theory a developer could execute a test on every device type in the world, provided it has a network connection. In practice, however, not every device is reachable over a network; as mentioned before, an app has to be reachable by the IDE.
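The agent idea can be sketched with Python's standard `socketserver` module. The line-based protocol below (one command per line, answered with `OK` or `FAIL`) is made up for illustration and is not Monkeytalk's actual protocol; the hypothetical `execute` method only checks whether the addressed component id is known, where a real agent would drive the app's UI.

```python
import socket
import socketserver
import threading


class AgentHandler(socketserver.StreamRequestHandler):
    """Reads one command per line and answers OK or FAIL (made-up protocol)."""

    def handle(self):
        for raw in self.rfile:  # iterates until the IDE closes the connection
            command = raw.decode("utf-8").strip()
            ok = self.server.execute(command)
            self.wfile.write(b"OK\n" if ok else b"FAIL\n")


class Agent(socketserver.ThreadingTCPServer):
    """Stand-in for the in-app agent."""

    allow_reuse_address = True

    def __init__(self, address, known_components):
        self.known = set(known_components)
        super().__init__(address, AgentHandler)

    def execute(self, command):
        # Toy rule: succeed only if "Component Id Action" names a known id.
        parts = command.split()
        return len(parts) >= 3 and parts[1] in self.known


def send(address, commands):
    """IDE side: send commands and collect one reply per command."""
    replies = []
    with socket.create_connection(address) as conn, \
            conn.makefile("rw", encoding="utf-8", newline="\n") as stream:
        for command in commands:
            stream.write(command + "\n")
            stream.flush()
            replies.append(stream.readline().strip())
    return replies


if __name__ == "__main__":
    agent = Agent(("127.0.0.1", 0), {"LOGIN", "username"})  # port 0: any free port
    threading.Thread(target=agent.serve_forever, daemon=True).start()
    print(send(agent.server_address, ["Button LOGIN Tap", "Button MISSING Tap"]))
    agent.shutdown()
    agent.server_close()
```

Because the IDE only needs an address to connect to, the sketch also makes the limitation visible: if `send` cannot open a connection to the agent, no test can run.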

The reports that Monkeytalk provides are HTML pages listing the different tests. The reports show the commands that were executed and whether they were successful. When a command fails, a screenshot of the screen at the time of the error is taken. This gives a good visual overview that makes it rather easy to see what went wrong. Failures are detected faster than when a developer has to read through the stack trace that the Android IDE gives while debugging the source code of an app.

2.4 Conclusion

After looking at different fields of research, we have enough information to define a "perfect solution": a testing tool that tackles the fragmentation issue. There is no out-of-the-box solution yet. However, existing implementations of certain theories tell us that it is feasible to develop a tool that covers the needs of an app developer. It is worth noting that creating a tool that makes testing less time consuming is itself very time consuming. The perfect solution has an incredibly large feature set, some features of which have not yet been thought out in great detail.

We have, however, learned that the perfect solution is a cloud-based tool that preferably connects all the device types in the world. A test run can be started both from a device and remotely. The report of a test run makes it immediately clear what went wrong in the app under test. It furthermore gives an overview of the behavior of an app across different device types.

We looked at some of the tools that are available on the market for testing purposes, and at how they implement the theories we discussed before. They differ in platform support, in their methods of recognizing UI elements and in how they present results.

Some support only Android or iOS; others chose portability and support script execution across multiple platforms. We will now briefly summarize the core features we will use in our implementation of tApp, which is introduced in chapter 3.


2.4.1 UI element recognition

Monkeyrunner and UIAutomator take the approach of controlling a device instead of the actual UI of an app. In contrast, Monkeytalk focuses on controlling the app instead of the device. The ability that Telerik’s Test Studio and Monkeytalk offer to specify (or record) a script on an actual device makes script specification easier and more visual. When supporting different platforms, the remaining challenge is detecting the same UI element in the same app on a different platform. Many tools succeed in this by using accessibility features. However, none of them recognizes UI elements correctly anywhere close to 100% of the time.

2.4.2 Presentation of issues

Nearly all tools present results by listing the executed commands and their statuses.

Showing a screenshot at the time of an error or success, as Monkeyrunner, UIAutomator, Seetest and Monkeytalk do, makes inspecting reports visual and intuitive. All tools seem to present report overviews in a web portal. This gives an overview of what happened in a test on a single device type. What all tools appear to be missing is an overview of what happens across different devices.

How does an app behave across different device types? It might work well on some and not at all on others, yet no tool shows this in a single overview.

2.4.3 Cover representative group of device types

Most of the tools we examined need to be connected to a workstation for testing to succeed or even start. Seetest takes an interesting approach by offering the ability to set up a private device cloud.

However, this is still limited to the device types owned by the organization that sets up the cloud.

Monkeytalk takes an interesting approach to the distribution of a test. The network component that listens for commands and sends back results gives flexibility in where the device is located in a network. Compared to a device connected to a workstation by a USB cable, this widens both where a device can be and who can own it.

2.4.4 Nice to haves

Combined, the features that these tools offer are fairly useful. However, no single tool offers the right combination of features to tackle the existing fragmentation problem.

Support across multiple platforms is welcome, but it requires some compromises in the development of an app. UI element recognition does not appear to be trivial. A web portal that shows app behavior across platforms and device types, which would be welcome, is not really present. Finally, a connection to a workstation is not ideal if a developer wants to test an app on all the mainstream device types on the market.

Monkeytalk, developed by Gorilla Logic, is closest to what a developer wants in a test tool. In chapter 3 we discuss the basic principles for specifying and executing a test that we have seen in the available tools, and why the current tools lack some features needed to effectively tackle the fragmentation problem. Besides that, we propose a new tool, based on Monkeytalk’s ideas, that combines the best features of the existing tools.


3 Concept

Given the testing challenges mentioned in the previous chapter, our main research question is as follows:

How can we provide tooling support that reduces the costs, and enhances the quality, of testing mobile apps on the Android platform?

We will answer this question practically and pragmatically, by developing such a tool called tApp.

Since we cannot cover the entire list of potential requirements emerging from the above research question, we focus on a subset of these. Specifically, we aim to provide efficient and effective ways for users and developers to explore the correspondence, or mapping, between a set of tested apps and a set of device types. This offers high-level insight into how the various aspects of tested apps behave on the tested devices. As such, our redefined research question is:

How can we provide insight into the test results of a set of apps on a set of device types?

Concrete examples of the use-cases covered by our proposal are answering questions such as: Which aspects of a given app work well (or not) on a set of device types? Which is the set of device types where a given app aspect passes (or fails) testing?

To begin to understand how we can answer this research question, we first need a conceptual flow from test specification to test report. This chapter explains a test pipeline that separates the different steps in tApp’s workflow. It furthermore defines the stakeholders of the test tool, their requirements, and a priority that we use to order the work that needs to be done.

Based on the observations in chapter 2, the functionality tApp offers is separated into three components. Figure 2 below shows the main pipeline of the ideal test tool. Each role uses the system in a different way, and a stakeholder might not be interested in all of the output that each component offers. Therefore the next section identifies three stakeholders.

Figure 2: Test pipeline

tApp offers a three-step process for testing a mobile app. A developer starts with the specification step. In this step one or more scenarios are written for a certain app. A scenario in this case is a list of actions to be executed on the device, such as pushing a button. We’ve seen multiple approaches to this specification in chapter 2: we could specify a scenario in code, record it on a device itself or specify it in English.

The scenario and the executable app are combined and exported to the distribution, a component that is responsible for multiple actions. These actions do not need to depend on a platform: in the ideal tool a scenario can be specified once and executed on all supported platforms.


When looking at Monkeytalk and Seetest, we noticed that some sort of web portal is useful for a good overview of a set of executed tests. tApp adopts this notion by offering a distribution perspective. In this perspective a tester can register a device, browse all registered devices, and prepare and enrich a specification exported from the specification step. The package can be enriched with a selection of known device types to run on and a set of sensors to switch on or off. Finally, the distribution collects the results and stores them.

An analyst is the final stakeholder in tApp’s workflow. He or she can access test results from various executed test scenarios, query the result set and see how an app behaves on different device types. The other way around is also possible: an analyst can see how a device behaves with different apps. These results can be visualized so that it is clear which devices and/or parts of an app need more attention from a developer.

3.1 Description of stakeholders

Because the components of tApp are used differently, we identify three stakeholders. This division implies that tApp will probably offer three different views, which the stakeholders can use separately. The three stakeholders and a description of their concerns can be found in Table 1 below.

Stakeholder | Description

App developer | A developer who wants to create tests and find out whether his/her app shows stable behavior on a wide range of devices.

Tester | A tester donates some time with his/her device to give developers the possibility to test the behavior of an app.

Analyst | The customer for whom the app under test was built, who wants to see a test result showing which devices can run the app and how it behaves in a set of scenarios.

Table 1: identified stakeholders

3.2 Stakeholder requirements

The general system requirements are decomposed further on a per-role basis. Table 2 below refines the general requirements for each of the roles identified in the previous section. Furthermore, the requirements link back to the goals given in chapter 1.2: the G and the major number before the dot refer to a goal in chapter 1.2, and the minor number refers to a sub-goal that covers a small part of that main goal.


Stakeholder | Requirement

App developer | Interoperability/Extensibility (G5.2): a test should be executable on the most widely used operating systems on the market (Android, iOS).

App developer | Usability (G4.1): it should be possible to easily specify a test suite that is (automatically) executable on many devices.

App developer | Customizability (G2.1): a developer can customize the device options of a test package for optimal test coverage (manipulate sensors and network connections).

App developer | Portability/Interoperability (G5.1): a developer should be able to publish tests that are executable on most of the available devices on the market.

Tester | Usability (G2.2): see relevant information on the behavior of a combination of apps and devices.

Tester | Usability (G4.2): run a specified test with as few actions as possible.

Tester | Scalability (G5.3): run a test on many devices without difficulties.

Analyst | Usability (G2.3): see whether the app that is released is stable enough to see the light of day.

Analyst | Usability (G1.1): tApp should show clear and easy-to-read test reports.

Analyst | Testability (G1.2/G2.4): get an overview of the flaws that exist in the app to be released.

Analyst | Testability (G3.2): get an overview of the stability of similar apps on the same devices, with respect to the behavior of apps on similar devices in past test results.

Analyst | Level of detail (G2.5): an analyst should be able to change the level of detail that a report shows with as little effort as possible.

Analyst | Simplicity (G3.1): see the behavior of an app on a specific device with respect to other devices.

Table 2: requirements for each stakeholder

3.3 Summary

There are three stakeholders that interact with the system. The app developer has created an app and wants to easily create a test, run it on a device, customize device settings and publish the test to the world in the specification perspective. The tester wants to run the exported scenarios and see a general overview of a test on devices in the distribution perspective. Finally, an analyst wants to see an overview of app behavior and the details of a single run in the reporting perspective.

That is, in short, what Tables 1 and 2 describe. Table 2 is used extensively in the remainder of this thesis to keep track of which of these requirements we have covered.

4 Design

In the previous chapter we identified our main research question and the stakeholders for tApp. The tool focuses on developers and testers of mobile apps who want to guarantee that their app is functionally perfect. Figure 3 below proposes a design for a possible implementation of tApp.

This chapter furthermore explains the global functionality that each component/subsystem in the model offers. We use flow charts extensively to illustrate the different components of the tool, and we point out the workflow that leads from the specification of a test to the desired test results. We explain how each component in Figure 3 should be used. The model shows the test pipeline from the previous chapter in more detail. A detailed list of all the requirements can be found in appendix A.

Figure 3: tApp’s proposed architecture


4.1 General data flow

The three steps of the test pipeline defined in chapter 3 can be seen in Figure 3. A developer creates a specification in an editor, where he/she also enters the location of an executable app. The app and specification are exported to the distribution system in a test package, which is further explained in section 4.1.2. In the distribution a developer can prepare his/her test for publishing and enrich the scenario with specific devices to run on and settings for these devices. A tester can list and run tests from the device package, which shows all published projects provided by the distribution system and gives the option to download and run a project.

All data necessary for running a test is retrieved from the distribution in the form of a test package. When the test is finished, the results are stored in the distribution system. This supports our goal of ease of test execution mentioned in chapter 1.2: once a test is specified, it is executed by selecting it from a list and pushing start.

The reporting system retrieves the results from the distribution and visualizes them for the analyst. These visualizations support our bidirectional device-to-tested-app mapping introduced in chapter 1.2.

4.1.1 Project and versioning

To distinguish between different scenarios and their results, we introduce the notion of a project in tApp. An app and a specification can change, which means that the test results of certain scenarios can become obsolete. A project is a component that stores a certain scenario and the test results that it receives.

When a specification and/or app changes, the previous test results are archived. This introduces the notion of a version, and with it history. A version is a combination of a specification, an executable app and test results. When the specification of a scenario changes, the old version is replaced by a new one as the current version.

The project is a container that stores these versions and enables the distribution to distinguish between them. The distribution gives an analyst the option to visualize the history, which gives a good overview of how problems in one version of an app are fixed in newer versions.

This improves the ease of result inspection across different versions of an app, as introduced in chapter 1.2.
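To make the project/version relationship concrete, here is a minimal Java sketch. It assumes that "archiving" means the old version stays in the project's history but accepts no new results; the class names `Project` and `Version` and their fields are hypothetical, not tApp's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model: a version bundles a specification, an app binary location and results.
final class Version {
    final int number;
    final String specification;   // e.g. the name of an .mts suite
    final String appBinaryUrl;     // where the prepared binary can be found
    private final List<String> results = new ArrayList<>();
    private boolean archived = false;

    Version(int number, String specification, String appBinaryUrl) {
        this.number = number;
        this.specification = specification;
        this.appBinaryUrl = appBinaryUrl;
    }

    // Archived versions are kept for history but accept no new results.
    void addResult(String result) {
        if (archived) throw new IllegalStateException("version " + number + " is archived");
        results.add(result);
    }

    void archive() { archived = true; }
    boolean isArchived() { return archived; }
    List<String> results() { return Collections.unmodifiableList(results); }
}

// Hypothetical project: an ordered history of versions, the last one being current.
final class Project {
    final String name;
    private final List<Version> history = new ArrayList<>();

    Project(String name) { this.name = name; }

    // Called when the specification and/or the app change: archive the current
    // version and start a new, empty one.
    Version newVersion(String specification, String appBinaryUrl) {
        if (!history.isEmpty()) current().archive();
        Version v = new Version(history.size() + 1, specification, appBinaryUrl);
        history.add(v);
        return v;
    }

    Version current() {
        if (history.isEmpty()) throw new IllegalStateException("project has no versions yet");
        return history.get(history.size() - 1);
    }

    List<Version> history() { return Collections.unmodifiableList(history); }
}
```

Keeping the archived versions in the project, rather than deleting them, is what allows the history visualization described above.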

4.1.2 Test and device package

To make accessing, storing and listing published projects easier, tApp introduces two structures that make it more convenient to handle test specification and execution. We start by defining a test package: a container that wraps a test specification, an executable app, and the target devices and their settings.

Test packages can be downloaded and executed on a device. A mobile device should be capable of listing and running the available test packages; for this, a device package is introduced. This package lists all the published tests and gives the option to start an app. Before running a test, it also notifies the distribution system that it can expect test results. This is to prevent test runs without test results in the distribution component.
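The "announce before running" step can be sketched as a small registry on the distribution side. This is a minimal Java sketch under the assumption that each run gets an identifier when it is announced. `RunRegistry` and its methods are hypothetical names. Announced runs that never report back stay visible as pending instead of silently disappearing.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical bookkeeping on the distribution side: a device package announces a
// run before starting it, so the distribution knows which results to expect.
final class RunRegistry {
    private final Set<String> pending = new LinkedHashSet<>();
    private final Map<String, String> completed = new HashMap<>();

    // Called by the device package before it starts a test; returns a run id.
    String announce(String projectName, String deviceId) {
        String runId = projectName + "/" + deviceId + "/" + UUID.randomUUID();
        pending.add(runId);
        return runId;
    }

    // Results are only accepted for runs that were announced and not yet completed.
    void submitResult(String runId, String result) {
        if (!pending.remove(runId)) {
            throw new IllegalArgumentException("unknown or already completed run: " + runId);
        }
        completed.put(runId, result);
    }

    // Runs that were announced but never reported back (e.g. the app crashed hard).
    Set<String> pendingRuns() { return Set.copyOf(pending); }

    String resultOf(String runId) { return completed.get(runId); }
}
```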

4.2 Detailed data flow

When a developer starts tApp, he/she first has to create a project or open an existing one. When a project is opened, the user sees a view that contains three perspectives. A developer can choose which of the perspectives below to open.

• the specification perspective (decomposed in section 4.3.2).

• the distribution perspective (decomposed in section 4.3.3).

• the reporting perspective (decomposed in section 4.3.4).

Each perspective is a step in the test pipeline explained in chapter 3 and is further decomposed in the remainder of this chapter. The flow chart in Figure 4 shows the basic workflow in tApp.

Some actions are marked with a ’*’. These actions are decomposed in section 4.3, which explains the usage of the different perspectives and their workflow in more detail.

Figure 4: General workflow

A developer should start in the specification perspective, where it is possible to specify a scenario. A scenario can be a script or a suite; in a suite a developer can add multiple smaller scripts to be executed in sequence. Scripts can be written in an editor or recorded from actions on a device, and a developer can switch between editors at will. More details on the specification perspective follow in section 4.3.2.

When a developer has exported their scenario in the form of a test package, he/she can switch to the distribution perspective. It offers the possibility to publish the test package that was just created. Optional device settings can be added to the project for a selected scenario. Besides device-specific settings, he/she can select a range of device types in tApp’s database to run the scenarios on. For optimal customizability, the developer could also set permissions to (dis)allow other devices to run the test. Everything is now ready to publish the test. More details on the distribution perspective follow in section 4.3.3.

Finally, there is the reporting perspective. In this perspective an analyst can look at all the available test reports. He/she can customize what kind of reports to see and the level of detail, as introduced in chapter 1.2: for example, the behavior of an app on different devices, or different devices running an app. More details follow in section 4.3.4.

4.3 Refined design

This section explains the necessary components in more detail, as refinements of the workflows from the previous sections of this chapter.

4.3.1 Create a project

Initially a user has to create a project or open an existing one. To create a project, the developer has to provide a project name, the platforms he/she wants to create tests for, and a connection to one or more prepared app binaries for each platform. This makes it easier to query the mapping introduced in chapter 1.2.

Figure 5 below refines the action "Create new project".

Figure 5: Create new project


4.3.2 Specification perspective

In the specification perspective a developer and/or tester can specify tests in an editor. A developer can specify a test by:

1. recording actions on a physical device.

2. writing a script.

A device can be connected to tApp through a network connection: a developer adds a device by entering its IP address, after which tApp and the device can communicate and tApp can record scripts through this connection. When the developer pushes the record button, record mode is switched on. Under the hood, the recorded actions are converted into a script, which the developer can change in the script view to modify the scenario.

Figure 6: Specify test

Changes made in the script or by recording can be viewed in real time in both editor views. The resulting scenario can then be executed on all devices by exporting it as a test package. More on this in section 4.3.3.
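The conversion from recorded actions to script lines can be sketched in Java as follows. The line layout (component type, monkey ID, action, then arguments) is modeled on how Monkeytalk commands are written. Everything else is an assumption: the `RecordedEvent` record, the `ScriptWriter` class and the simple quoting rule. Other details of the real Monkeytalk syntax are omitted.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical shape of one recorded UI event as it might arrive from the device.
record RecordedEvent(String componentType, String monkeyId, String action, List<String> args) {}

final class ScriptWriter {
    // Quote a token when it is empty or contains whitespace.
    // Embedded double quotes are not handled in this sketch.
    static String quote(String token) {
        boolean needsQuotes = token.isEmpty() || token.chars().anyMatch(Character::isWhitespace);
        return needsQuotes ? "\"" + token + "\"" : token;
    }

    // Render one event as "ComponentType MonkeyId Action [args...]".
    static String toLine(RecordedEvent e) {
        StringBuilder sb = new StringBuilder()
                .append(e.componentType()).append(' ')
                .append(quote(e.monkeyId())).append(' ')
                .append(e.action());
        for (String arg : e.args()) sb.append(' ').append(quote(arg));
        return sb.toString();
    }

    // Turn a recording session into script text, one command per line.
    static String toScript(List<RecordedEvent> events) {
        return events.stream().map(ScriptWriter::toLine).collect(Collectors.joining("\n"));
    }
}
```

Because the script is plain text generated from the same events, the script view and the recording view can stay in sync.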


4.3.3 Distribution perspective

Once a developer has specified a test, he/she has to publish it to the world. The specification perspective gives the option to export a test package. After switching to the distribution perspective, the developer can tweak the publish settings, such as a specific device or set of devices to execute the tests on. The only requirement here is that these devices are registered with tApp. Besides choosing devices, there is also an option to choose which sensors and connections of a device should be switched on or off during the test. Figure 7 refines the action "Prepare test publish" in Figure 4.

Figure 7: Publish test

With the push of a button a developer can then publish the scenario including the device settings.

In the distribution perspective a developer can prepare everything that is necessary to start a public test run with this exported test package.
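The one hard rule in this step is that target devices must be registered with tApp. The check below is a minimal Java sketch of that rule. It also assumes that only a known set of sensor and connection switches may be configured. `PublishSettings` and `PublishValidator` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical publish request assembled in the distribution perspective:
// target devices plus sensor/connection switches (true = on, false = off).
record PublishSettings(List<String> targetDevices, Map<String, Boolean> deviceSettings) {}

final class PublishValidator {
    // Returns the problems that block publishing; an empty list means "ready to publish".
    static List<String> validate(PublishSettings settings,
                                 Set<String> registeredDevices,
                                 Set<String> supportedSwitches) {
        List<String> problems = new ArrayList<>();
        for (String device : settings.targetDevices()) {
            if (!registeredDevices.contains(device)) {
                problems.add("device not registered: " + device);
            }
        }
        for (String name : settings.deviceSettings().keySet()) {
            if (!supportedSwitches.contains(name)) {
                problems.add("unknown setting: " + name);
            }
        }
        return problems;
    }
}
```

Returning a list of problems, instead of failing on the first one, lets the publish view show everything that still has to be fixed.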

When a tester wants to run a published test, tApp asks him/her to download a device package that runs the tests and sends back test results to the data collection component. A specific test can be downloaded as a test package that contains the published executable scenario. The returned data is stored and analyzed by the distribution perspective, which is able to show the test result to the user. This approach mainly supports goals 1 and 4 in chapter 1.2, namely ease of result inspection and ease of test execution.

4.3.4 Reporting perspective

After running a scenario, the reporting perspective gives an analyst an overview of the results of all (past) tests and the devices they were executed on. When a test is completed, an analyst should see at a glance what works well and what does not. Besides that, the analyst should be able to get an overview of the devices on which the app does (not) work. tApp should additionally be able to show a report that can be presented to a customer or user who might have less technical knowledge. Figure 8 below shows how this supports the goal of ease of result inspection (G1) mentioned in chapter 1.2.

Figure 8: Show test report.

The test results that were obtained by the test runs are displayed in a simple but detailed manner.

This makes sense, as we defined the level of detail as one of our main goals in chapter 1.2. A tester is able to view different types of reports of the test runs, and the report can be customized based on the viewer’s needs. This makes querying the mapping (G3) and controlling the level of detail (G2), both defined in chapter 1.2, easier. The test viewer shows:

• Details per device: which apps run stably and which do not.

• Details per app: which devices run the app stably and which do not.

• Details per OS version: which versions are capable of running an app and which are not.
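All three views can be computed from the same flat result rows by choosing different grouping keys. The following is a minimal Java sketch. It assumes each stored result records the app, device, OS version and a pass/fail flag, and it uses the pass rate as the measure of stability. `TestResult` and `ReportViews` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical flat result row as the data collector might store it.
record TestResult(String app, String device, String osVersion, boolean passed) {}

final class ReportViews {
    // Pass rate per (row, column) pair, e.g. rows = devices and columns = apps.
    // Swapping the key functions gives the per-app or per-OS-version view.
    static Map<String, Map<String, Double>> passRates(List<TestResult> results,
                                                      Function<TestResult, String> rowKey,
                                                      Function<TestResult, String> columnKey) {
        Map<String, Map<String, int[]>> counts = new TreeMap<>();  // int[]{passed, total}
        for (TestResult r : results) {
            int[] c = counts.computeIfAbsent(rowKey.apply(r), k -> new TreeMap<>())
                            .computeIfAbsent(columnKey.apply(r), k -> new int[2]);
            if (r.passed()) c[0]++;
            c[1]++;
        }
        Map<String, Map<String, Double>> rates = new TreeMap<>();
        counts.forEach((row, columns) -> {
            Map<String, Double> rowRates = new TreeMap<>();
            columns.forEach((column, c) -> rowRates.put(column, (double) c[0] / c[1]));
            rates.put(row, rowRates);
        });
        return rates;
    }
}
```

For example, `passRates(results, TestResult::device, TestResult::app)` gives the per-device view, and swapping the two key functions gives the per-app view.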

4.4 Summary

In this chapter, we proposed a design for a test tool based on the concept described in chapter 3. We derived this concept from the features offered by the existing tools described in chapter 2. We defined three perspectives that support the specification, distribution and reporting of tests. We have taken ease of result inspection (G1), controlling the level of detail and querying the mapping (G2 and G3) and ease of execution (G4) into account. With these goals and the design based on them in mind, we explain the necessary components for a successful implementation in the next chapter.

5 Application

In the previous chapter we proposed a design for our test tool, which divided the tool into three perspectives: the specification, distribution and reporting perspective. This chapter gives an overview of the decomposed functionality that tApp offers and elaborates on what is necessary to arrive at an implementation based on our design from chapter 4. We start by defining what a scenario or test is in tApp. With this definition, we explain how a scenario flows from an empty scenario to a report: first how a developer can specify a scenario, then how this specification is distributed, and finally how the report viewer is implemented. Throughout, we keep the concept and goals from chapter 3 in mind.

5.1 Specification

We repeatedly mentioned the words scenario and test in the previous chapters, but kept their definition abstract. Previously we identified a scenario as a test that has to be executed on a device type. Concretely, a scenario in tApp is a Monkeytalk script, or a chained list of scripts called a suite. Such a script contains commands that can be executed on a device type. Desirably, the commands are platform independent to support execution on multiple platforms, as stated in requirement G5.1 in Table 2. In chapter 2 we mentioned Monkeytalk as one of the tools that take an interesting approach to the test cycle. Because of the approach Gorilla Logic takes with Monkeytalk [10] and the fact that it is open source, we use their system to implement the features of tApp identified in the previous chapters.

Monkeytalk scripts support multiple platforms. Our initial research question names Android as our primary focus, but keeping support for multiple platforms open for the future is useful. A script can be specified by recording actions on a device [11] or by writing it in the Monkeytalk scripting language [12] or in JavaScript [13]. A script and a suite are stored in an .mt file and an .mts file, respectively; mt stands for MonkeyTalk and the extra s for suite. These scenarios can be exported in a test package, which can be executed on many devices. A test or scenario should be seen as a set of chained actions that fulfill a task in an app.
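A tool that stores and displays .mt scripts needs to read individual command lines. The following minimal Java sketch tokenizes one line into component type, monkey ID, action and arguments, and keeps double-quoted arguments together. This is a simplified grammar assumed for illustration, not Monkeytalk's own parser. `Command` and `ScriptParser` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parsed form of one script line: ComponentType MonkeyId Action [args...]
record Command(String componentType, String monkeyId, String action, List<String> args) {}

final class ScriptParser {
    // Split on whitespace, keeping double-quoted tokens together ("" is an empty argument).
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch == '"') {
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (Character.isWhitespace(ch) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(ch);
                hasToken = true;
            }
        }
        if (inQuotes) throw new IllegalArgumentException("unterminated quote: " + line);
        if (hasToken) tokens.add(current.toString());
        return tokens;
    }

    // Returns null for blank lines; otherwise the first three tokens are required.
    static Command parse(String line) {
        List<String> t = tokenize(line);
        if (t.isEmpty()) return null;
        if (t.size() < 3) throw new IllegalArgumentException("expected at least 3 tokens: " + line);
        return new Command(t.get(0), t.get(1), t.get(2), List.copyOf(t.subList(3, t.size())));
    }
}
```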

5.1.1 Test package

To start a test as defined in the previous section, a device downloads a test package. This is done using a device package, which is explained in section 5.2.2. The test package contains all the data that is specific to an app and to the test that is about to be executed. The test package contains:

1. Binary location: a URL where the app binary used for testing purposes can be found.

2. A test suite. This suite contains the test scripts that a developer has specified, which can be executed on the binary.

3. Result location: a URL to which the results should be sent.

4. A list of device settings that should be toggled on or off for the current scenario.
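The four parts of a test package map directly onto a small value type. The following Java sketch assumes that both locations must be absolute URLs, that a suite contains at least one script, and that settings are booleans keyed by name. The record name `TestPackage` and its field names are hypothetical.

```java
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory form of a test package with the four parts listed above.
record TestPackage(
        URI binaryLocation,                    // 1. where the prepared app binary can be downloaded
        List<String> suite,                    // 2. the suite's scripts, in execution order
        URI resultLocation,                    // 3. where the device package sends the results
        Map<String, Boolean> deviceSettings) { // 4. setting name -> on (true) or off (false)

    TestPackage {
        if (!binaryLocation.isAbsolute() || !resultLocation.isAbsolute()) {
            throw new IllegalArgumentException("binary and result locations must be absolute URLs");
        }
        if (suite.isEmpty()) {
            throw new IllegalArgumentException("a test package needs at least one script");
        }
        suite = List.copyOf(suite);
        // Sorted, read-only copy so that settings are applied in a stable order.
        deviceSettings = Collections.unmodifiableMap(new TreeMap<>(deviceSettings));
    }
}
```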

The device package then runs the test project based on the data in the test package. When the scenario is executed, the device package collects the results and sends them back to tApp’s distribution unit. This system has a data collector that stores the test results, and the reporting component uses the information collected by the data collector. More details on the device package can be found in section 5.2.2.

5.2 Distribution

When a developer has specified a scenario that meets his/her requirements, it has to be distributed to the available device types. This section explains how a specification, or more precisely an exported test package, can be distributed to registered devices. This device package approach makes test execution easier and improves scalability, as required by G4.2 and G5.3 in Table 2.

5.2.1 Test execution

As mentioned earlier, tApp offers functionality to specify and distribute a test scenario for a mobile app. This test can be executed given an app binary for some platform. For the actual execution of the test scenarios, instrumentation is available for monitoring the recorded data. Afterwards the result data is stored in tApp’s distribution system. Before thinking about how to look at reports, we first need to define how Monkeytalk can test for us. To export a test, a developer needs to provide a prepared binary, which is downloaded and started by the device package.

The device package then executes the test in a background service provided by Android.

5.2.2 Device package

For the connection between tApp and a device we need to define a component that is able to register the current device type, download and start an app, run a test on the device and send back test results.

The device package is a communication component between a device and the tApp system. It is typically an app that runs on a device type. Starting the package initiates a registration of the device for storage in our database.

The device package can browse through published tests, start a test, collect device data and send back results. A device package contains:

1. a component that collects device info. To be able to predict behavior and divide mobile devices into equivalence classes, we need information about the hardware and software, which can be obtained through the android.os.Build [14] class.

2. a component that can change device settings, for example to (re)run a test with certain sensors and network connections turned on or off.

3. a component that collects test results and stores them in such a way that they can be sent back to tApp for analysis. A test report is created from this information.

4. a component that can run a scenario: a server-like component that can send requests with commands that initiate the automated test in the binary.
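The settings component (item 2) can be sketched so that a test run leaves the tester's device as it found it. This restore-afterwards behavior is a design choice assumed here; the text does not state it. The sketch hides the real Android switches behind a hypothetical `DeviceSwitches` interface, so it also runs off-device with the in-memory `FakeSwitches` stand-in. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical abstraction over the device's sensor and connection switches.
// On Android this would be backed by platform services; here it is an interface.
interface DeviceSwitches {
    boolean isOn(String name);
    void set(String name, boolean on);
}

// Applies the settings of a test package and restores the previous state on close().
final class SettingsScope implements AutoCloseable {
    private final DeviceSwitches device;
    private final Map<String, Boolean> previous = new LinkedHashMap<>();

    SettingsScope(DeviceSwitches device, Map<String, Boolean> wanted) {
        this.device = device;
        wanted.forEach((name, on) -> {
            previous.put(name, device.isOn(name));  // remember the tester's own setting
            device.set(name, on);
        });
    }

    @Override
    public void close() {
        previous.forEach(device::set);
    }
}

// In-memory stand-in for trying the sketch without a device; unknown switches read as off.
final class FakeSwitches implements DeviceSwitches {
    final Map<String, Boolean> state = new HashMap<>();
    public boolean isOn(String name) { return state.getOrDefault(name, false); }
    public void set(String name, boolean on) { state.put(name, on); }
}
```

Wrapping a test run in a try-with-resources block on `SettingsScope` restores the settings even when the run throws an exception.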

The device package shows all public test runs. Specific test settings and properties are downloaded in a test package when a tester starts a test. A tester can choose to participate in a test: a message is shown to the user, who can then allow or disallow participation. This approach helps us meet requirements G4.2 and G5.3 from Table 2 on ease of test execution with as few actions as possible.


5.2.3 Collected device information

When a tester creates an account, he/she can connect the current device to that account. This step analyzes the device type’s information, which only needs to happen once. When a test package is downloaded, the device package only needs to check whether the current device is the one the user connected to his/her account. If it is an unknown device, tApp needs to register it and analyze its information. The distribution system processes and stores the registration data. The report viewer can then generate test reports based on the user’s wishes, using the data collected from different devices. tApp tries to collect the following information, if available:

1. Hardware

(a) CPU type and speed.

(b) Memory type and amount.

(c) GPU type and speed.

(d) Screen dimension and resolution.

2. Software

(a) OS information. Think of version, amount of memory, CPU power.

(b) A fingerprint that identifies an OS build, which could be used to divide devices into equivalence classes.

(c) Device manufacturer and type.

All this information is used to generate more detailed test results later on. This approach simplifies requirement G2.2 from Table 2 about finding relevant information on the device types on the market.
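One way to use the build fingerprint for equivalence division is sketched below in Java. The sketch assumes the layout that Android documents for `Build.FINGERPRINT` (brand/product/device:release/id/incremental:type/tags). It treats devices with the same brand, product, device and OS release as one class and falls back to manufacturer and model when a fingerprint does not match that layout. `DeviceInfo`, `EquivalenceClasses` and the grouping rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical record of what the device package reports at registration time.
// On Android the values would come from android.os.Build fields; here they are strings.
record DeviceInfo(String id, String manufacturer, String model, String fingerprint) {}

final class EquivalenceClasses {
    // Assumed fingerprint layout: brand/product/device:release/id/incremental:type/tags.
    // The class key keeps "brand/product/device:release".
    static String classKey(DeviceInfo d) {
        String[] parts = d.fingerprint().split("/");
        if (parts.length < 3 || !parts[2].contains(":")) {
            return d.manufacturer() + " " + d.model();  // fallback for unexpected formats
        }
        return parts[0] + "/" + parts[1] + "/" + parts[2];
    }

    // Group registered device ids by class key, sorted by key.
    static Map<String, List<String>> group(List<DeviceInfo> devices) {
        Map<String, List<String>> classes = new TreeMap<>();
        for (DeviceInfo d : devices) {
            classes.computeIfAbsent(classKey(d), k -> new ArrayList<>()).add(d.id());
        }
        return classes;
    }
}
```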

5.3 Reporting

When a developer has specified a test and a prepared binary has been provided, everything is ready for execution of the test. The device package gives the tester control over when a test is executed.

The data collected from these tests can be examined in the reporting perspective. We start by defining why an analyst wants to view a report in the report viewer, and how an analyst can query the data tApp obtained from executed tests. We first define the main goal of tApp and how it shows in a report. Then we elaborate on what the device package measures and how. Finally, we define some boundaries on the types of reports tApp is able to show.

5.3.1 Stable behavior

What developers ultimately want is stable behavior of their app. This means that the app behaves as the developer specified it in his/her requirements. A tester should be able to perform the tasks and actions that a developer implemented without the app crashing or slowing down, and the app should not obstruct the user in finishing a desired task. An app is stable if it succeeds in the majority of the tests specified in tApp. What counts as a majority has to be decided by the analyst. Because of this, the reporting perspective must show reports that are easy to read.
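The stability criterion above reduces to comparing a pass rate with a threshold chosen by the analyst. A minimal sketch, assuming each executed test is summarized as a boolean; the function name `is_stable` and the default threshold of 0.5 (a simple majority) are our own choices, not values prescribed by tApp.

```python
from typing import Sequence


def is_stable(results: Sequence[bool], threshold: float = 0.5) -> bool:
    """Return True if the fraction of passing tests exceeds `threshold`.

    `threshold` is chosen by the analyst; 0.5 means a simple majority.
    """
    if not results:
        raise ValueError("no test results to assess")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    pass_rate = sum(results) / len(results)  # True counts as 1
    return pass_rate > threshold
```

With the default threshold, two passes out of three count as stable, while one pass out of two does not, since exactly half is not a majority.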


5.3.2 Verify expected values of components

A test in monkeytalk is the execution of a set of commands [15] on a device. When all of these commands succeed, the test is considered successful. A command can be a physical action like pushing a button, but it can also be the verification of a certain piece of data in the app. Monkeytalk gives testers the ability to verify expected values. When a test command is issued, two types of verification occur:

• Does a component exist in the view of the app?

• Do the expected values of this component match the ones that appear in the app?

To verify expected values, monkeytalk offers the verification commands listed below. These commands support the overviews of existing flaws (G1.2/G2.4) required in Table 2.

1. Verify - Verify that the component’s value is equal to the argument.

2. VerifyNot - Verify that the component’s value is NOT equal to the argument.

3. VerifyRegex - Verify that the component’s value matches the Regular Expression provided in the argument.

4. VerifyNotRegex - Verify that the component’s value does NOT match the Regular Expression provided in the argument.

5. VerifyWildcard - Verify that the component’s value matches some wildcard expression provided in the argument.

6. VerifyNotWildcard - Verify that the component’s value does NOT match some wildcard expression provided in the argument.
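The two verification steps and the six commands can be mimicked in plain Python to make their semantics concrete. This is a minimal sketch, not monkeytalk's implementation. We assume that the view is a mapping from component id to its current value, that a missing component fails every command (including the `VerifyNot` variants), that a regular expression must match the whole value, and that Python's `fnmatch` glob syntax (`*`, `?`) approximates monkeytalk's wildcards.

```python
import fnmatch
import re
from typing import Callable, Dict, Mapping

# Positive checks; each VerifyNot* command negates its positive counterpart.
_CHECKS: Dict[str, Callable[[str, str], bool]] = {
    "Verify": lambda actual, arg: actual == arg,
    "VerifyRegex": lambda actual, arg: re.fullmatch(arg, actual) is not None,
    "VerifyWildcard": lambda actual, arg: fnmatch.fnmatchcase(actual, arg),
}


def verify(view: Mapping[str, str], component_id: str, command: str, arg: str) -> bool:
    # Step 1: does the component exist in the view?
    if component_id not in view:
        return False
    actual = view[component_id]
    # Step 2: does its value match the expectation?
    negate = command.startswith("VerifyNot")
    base = command.replace("Not", "", 1) if negate else command
    if base not in _CHECKS:
        raise ValueError(f"unknown verify command: {command}")
    result = _CHECKS[base](actual, arg)
    return not result if negate else result
```

Folding the negated commands onto their positive counterparts keeps the table of checks to three entries.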

5.3.3 Execute native code

Monkeytalk offers further support for inspecting behavior on a specific device type, as requirement G3.1 in Table 2 dictates. To reach the deeper, harder-to-test areas of an app, monkeytalk can execute native code. A developer can add an action to his/her app that reads values in user settings and preferences. For Android apps, a test can call functions on a custom class; the only requirement is that the class is available in the app under test. With such custom code, one could verify values in storage, settings and anything else the native platform offers. This gives more insight into flaws in parts of an app that are not visible on screen.
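Calling a custom class comes down to name-based invocation: the test names a class and a method, and the call only succeeds if that class is present in the app. On Android this would use Java reflection; the sketch below is a Python analogue that only illustrates the idea. `SettingsInspector`, `get_preference` and `exec_custom` are hypothetical names, and the dictionary `available_classes` stands in for the classes compiled into the app under test.

```python
from typing import Any, Dict, Mapping


class SettingsInspector:
    """Hypothetical custom class that a developer compiles into the app."""

    def __init__(self, preferences: Mapping[str, Any]) -> None:
        self._preferences = preferences

    def get_preference(self, key: str) -> Any:
        return self._preferences.get(key)


def exec_custom(
    available_classes: Dict[str, type],
    class_name: str,
    method_name: str,
    *args: Any,
    ctor_args: tuple = (),
) -> Any:
    """Look up a class by name, instantiate it and call a method by name."""
    cls = available_classes.get(class_name)
    if cls is None:
        # Mirrors the requirement that the class exists in the app under test.
        raise LookupError(f"{class_name!r} is not available in the app under test")
    instance = cls(*ctor_args)
    return getattr(instance, method_name)(*args)
```

The returned value can then be compared with an expected value in the same way as the visible values above, which is what lets such a test cover state that never appears on screen.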

5.3.4 Boundaries

To conclude, we summarize the boundaries tApp has when testing apps. tApp distinguishes three levels of test reporting:

• Minor detail results: Runs OK/Crash.

• General detail results: Verify expected values of views.

• High detail results: Test expected values not visually present on the screen.
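The three levels can be read as an ordering: a test script reaches at least the minor level, reaches the general level once it verifies a view value, and reaches the high level once it calls custom native code. A minimal sketch of that classification, assuming a script is reduced to the list of its action names; `DetailLevel`, `detail_level` and the marker `"ExecCustom"` are hypothetical names, not monkeytalk commands.

```python
from enum import IntEnum
from typing import Iterable


class DetailLevel(IntEnum):
    MINOR = 1    # runs OK / crash
    GENERAL = 2  # expected values of views verified
    HIGH = 3     # values not visible on screen verified via custom code


VERIFY_COMMANDS = {
    "Verify", "VerifyNot", "VerifyRegex",
    "VerifyNotRegex", "VerifyWildcard", "VerifyNotWildcard",
}
CUSTOM_CODE_ACTION = "ExecCustom"  # hypothetical marker for native-code calls


def detail_level(actions: Iterable[str]) -> DetailLevel:
    """Return the highest reporting level a test script can reach."""
    level = DetailLevel.MINOR
    for action in actions:
        if action == CUSTOM_CODE_ACTION:
            return DetailLevel.HIGH
        if action in VERIFY_COMMANDS:
            level = DetailLevel.GENERAL
    return level
```

Using an `IntEnum` keeps the levels comparable, so a report viewer could, for example, filter on `level >= DetailLevel.GENERAL`.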
