
From monoliths to microservices

The decomposition process

Dennis Kruidenberg

kruidenbergdennis@gmail.com

August 28, 2018, 49 pages

Supervisor: Ana Oprescu

Host organisation: Avanade Nederland, https://www.avanade.com/nl-nl

Universiteit van Amsterdam

Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering


Contents

Abstract

1 Introduction
   1.1 Research Questions
   1.2 Contributions
   1.3 Outline

2 Background
   2.1 Cloud Computing
   2.2 Microservices
       2.2.1 Definition
   2.3 Advantages and challenges
       2.3.1 SOA
       2.3.2 Types of microservices
       2.3.3 Communication
       2.3.4 Scaling
       2.3.5 Testing
       2.3.6 Migration
       2.3.7 Domain Driven Design
   2.4 Service Cutter
       2.4.1 Entity and Nanoentity

3 Related work
   3.1 Size
   3.2 Migration
   3.3 Decomposition
   3.4 Validating microservices

4 Automatic decomposition
   4.1 Decomposition framework: Service Cutter
       4.1.1 Architecture of the project
       4.1.2 Criteria
   4.2 Static program analysis
       4.2.1 Relations
   4.3 Dynamic analysis
       4.3.1 Use Cases
       4.3.2 Byte Code injection
   4.4 GitHub analysis
       4.4.1 Contributions
       4.4.2 Contributors
   4.5 Classification
       4.5.1 Jenks (Fishers) natural breaks classification method
   4.6 Validation
       4.6.2 Team Size Reduction
       4.6.3 Average Domain Redundancy
       4.6.4 Lines of Code
       4.6.5 Decomposition Questionnaire
       4.6.6 Generating java files

5 Evaluation and Results
   5.1 Evaluation
       5.1.1 DDD Sample App
       5.1.2 Experiments
       5.1.3 Use Cases
       5.1.4 Suggesting Services
       5.1.5 Experiment setup
   5.2 Results
       5.2.1 Automated input
       5.2.2 Manual input
       5.2.3 Combined input

6 Discussion
   6.1 Research Questions
   6.2 Threats to validity

7 Conclusion
   7.1 Future Work

Appendices
A Byte Code Injection
B Generated JSON files
C Suggested Services DDD Sample app
   C.1 Textual representation of suggested services
       C.1.1 Automated input
       C.1.2 Manual input
       C.1.3 Combined input
   C.2 Decomposition Questionnaire rationale
       C.2.1 Automated input
       C.2.2 Manual input


Abstract

Many companies are migrating their applications to the microservices architecture to increase their scalability and manageability. During the decomposition process, which is a prerequisite for the migration, an application has to be split up into multiple services. Currently, this is a complex, manual and unstructured task that relies on the expertise of the system architect. Many current research efforts address the need for a structured methodology to make the process more reliable and objective. This thesis builds on this topic by investigating how the decomposition process can be (further) automated and how the suggested microservices can be validated.

In order to automate the process, we need to be able to describe the monolithic application as completely as possible and pass this information to an existing decomposition tool. We made use of static and dynamic analysis to describe the monolithic application. Static analysis was used to find the relations between classes. Dynamic analysis was used to find out how the program behaves during a use case. We also made use of information available on GitHub to find out how often the code is changed and by whom. Our research found that the generated input was comparable with the manually created input. Our implementation will have an advantage over manual input when used in large and complex projects.

During the literature study, we found metrics that are used to validate microservices. However, we concluded that the metrics should be used as a guideline for the system architect to detect possible flaws in the decomposition.


Chapter 1

Introduction

As the cloud is becoming the platform of choice to run large-scale applications, multiple organisations are experiencing drawbacks. For example, in August 2008 Netflix experienced a major database corruption which interrupted their DVD shipment process for three days [2]. This led to the decision that vertically scaled single points of failure, such as their relational database, needed to be replaced with highly reliable, horizontally scalable, distributed systems in the cloud. Other organisations such as Amazon, Uber and Groupon also migrated to a microservices architecture to overcome the drawbacks of a vertically scaled architecture [3].

A microservice architecture is a cloud-native architecture that realises applications as a set of small services. The term "microservice" was first discussed in May 2011 at a workshop of software architects [4]. The term was used to describe what the participants saw as a common architectural style they had been researching. The principle of microservices emerged from the industry and was not introduced by a single organisation or party. The microservice architecture provides multiple advantages, such as better scalability, economic benefits and an improved organisational structure. However, there is no free lunch in computer science: the challenges of a distributed system, such as storing data, testing and combined uptime, also need to be addressed.

Companies that choose to migrate to microservices face a difficult task. The process of migrating from a monolithic application to microservices is large and difficult [5]. The decomposition process is one of the many steps in the migration. This step is a prerequisite for the migration process and has been identified as an extremely challenging and complex task [5]. The process is usually done by hand. Researchers are addressing the decomposition challenge by providing a structured methodology. One data-driven approach splits business logic into microservices [5]. Another decomposition tool provides three formal coupling strategies based on traditional software decomposition and applies them to microservices [6]. The decomposition process can also be solved in an iterative manner, where in each step the granularity of the network is changed to find the best performing composition [7]. Literature and industry-driven requirements form the basis of the Service Cutter [8]. The Service Cutter is a decomposition tool that uses these requirements to create a graph that represents the connectivity between attributes. This graph is clustered to suggest the structure of the microservices.


1.1 Research Questions

This research addresses the challenges of the decomposition process by providing an automated way to decompose an application with the help of the Service Cutter. We do not intend to improve the Service Cutter. We use this framework to provide the actual cuts. Furthermore, objective metrics are needed to validate the suggested services. These observations lead to the following research questions:

RQ1: How can we automate the decomposition process?
RQ2: How can the suggested microservices be validated?

1.2 Contributions

This research effort brings the following contributions:

• The research helps to mitigate the challenges of the decomposition process by providing a semi-automated decomposition process based on the Service Cutter. Our implementation can be adapted to work with other decomposition tools or projects.

• The services that are suggested by the decomposition tool are validated with the metrics found in the literature.

The provided automated decomposition process gives the industry better guidance in the design phase of the migration process, and it will become more useful as more and more companies move their systems to the cloud-native microservices architecture.

1.3 Outline

Chapter 2 provides the theoretical background of the microservice architecture by highlighting its advantages and challenges. The contributions made by other researchers regarding size and the migration process are discussed in Chapter 3. That chapter also looks critically at decomposition tools and the metrics used to validate microservices. Chapter 4 describes the research process and elaborates on the decisions made during the project. In Chapter 5 we show the results of decomposing an application. Chapter 6 discusses the results and answers the research questions. Chapter 7 concludes the research and states possible future work.


Chapter 2

Background

In this chapter, we will discuss the theories behind microservices. This information can be used to help understand the research. We present the definition of a microservice and where they are used. The characteristics of microservices are also explained by highlighting the advantages and challenges.

2.1 Cloud Computing

Cloud computing is a term used to describe the use of shared system resources to perform computations or host applications. Instead of running an application on local hardware, the application runs on a collection of online, virtualised resources owned by a company. These resources can be accessed on demand and include both computing and storage capabilities [9]. Examples of cloud computing providers are Microsoft Azure [10], Amazon AWS [11] and Google Cloud [12]. Cloud computing brings many advantages, such as the fact that the resources are dynamic and can scale when demand increases. This creates a flexible and efficient environment for an application. However, this efficiency is not always possible for a monolithic application. Due to the nature of such an interwoven system, if one component requires more computing power, the whole system needs to scale [13]. This system-wide scaling includes the components that do not require extra resources, which defeats the purpose of the flexible and efficient environment of cloud computing. The microservice architecture tries to overcome this problem, as we will discuss in Section 2.3.4.

2.2 Microservices

The term 'microservices' is gaining popularity in the cloud computing environment [14]. The microservice architectural style is based on a simple principle: each microservice has to do one thing and do it well. Microservices are small individual components that together form a distributed system. Microservices can be deployed on local hardware but typically run in the cloud.

2.2.1 Definition

Although there is not a clear consensus on the definition of a microservice, the definition of Lewis and Fowler is the most used [15]:

The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery. There is a bare minimum of centralised management of these services, which may be written in different programming languages and use different data storage technologies. [16]

In contrast, the monolithic application has long been the standard way of building a system. The software is developed as a single unit that contains all the logic and is often not developed with modularity in mind. Any change in the code base requires rebuilding and deploying the entire system. In the case of microservices, the logic of the application is split up into multiple independently working services. These services communicate via a simple communication protocol. A service is defined as follows: services are out-of-process components which communicate with a mechanism such as a web service request or remote procedure call [16]. Services can be updated and deployed independently from the rest of the system as long as the communication layer does not change.

2.3 Advantages and challenges

The microservice architectural style has many advantages but also brings challenges. In this section, we will go over the characteristics of microservices.

2.3.1 SOA

The microservice style is very similar to the Service Oriented Architecture (SOA). SOA is a design approach where multiple services collaborate to provide some end set of capabilities [17]. This definition looks a lot like the definition of microservices. Just like microservices, SOA tries to combat the challenges of a large monolithic application. Despite this effort, there is a lack of consensus on how to do SOA well. The microservice approach emerged from real-world industry use [17]. It applies an understanding of systems and architecture to do SOA well [17]. Microservices are applied SOA, just as Scrum is applied Agile software development.

2.3.2 Types of microservices

Generally, there are two kinds of services supported by cloud providers: stateful and stateless. In computer science, the state of a program refers to the condition of its entities at a certain time. In other words, the state describes the data that is used by the program. The response of a stateless service is independent of the state of the system, since it does not require previous states or store information. An example of a stateless service is a data processing service. The service receives data as input, processes it and returns the processed data. In such a service there is no reason to capture the state, since the data is returned; any subsequent request to the service would not require previous data. In the case of a stateful service, the state of previous requests is captured and used in subsequent requests. A log-in service is a stateful service: it has to store the state of the password in order to verify that it is correct.

Moreover, there is also a combination of these two types of services, called a hybrid microservice. Such a microservice is a stateless microservice with a cache. The cache is used to store small amounts of data temporarily. A subsequent request can use this data (if needed). The cache prevents the service from sending a request to a stateful service. Hybrid services improve the performance of both the service and the network, since there is no extra data request.
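As an illustration, the sketch below shows what such a hybrid service could look like in Java. The names (ExchangeRateService, RateRepositoryClient) are hypothetical and do not refer to any application discussed in this thesis; the point is only that a small in-memory cache sits in front of a call to a downstream stateful service.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ExchangeRateService {

    // Hypothetical client for a downstream stateful service.
    interface RateRepositoryClient {
        double fetchRate(String currencyPair);
    }

    private final RateRepositoryClient repository;

    // Local cache: repeated requests are served without contacting the stateful service.
    private final Map<String, Double> cache = new ConcurrentHashMap<>();

    public ExchangeRateService(RateRepositoryClient repository) {
        this.repository = repository;
    }

    public double rateFor(String currencyPair) {
        // Only a cache miss triggers a request to the stateful service.
        return cache.computeIfAbsent(currencyPair, repository::fetchRate);
    }
}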

2.3.3 Communication

The services need to communicate with each other via the network. This is realised through a language-neutral and platform-neutral protocol. The most commonly used protocol is REST. The language-neutral aspect of REST allows engineers to implement services in different programming languages and on different platforms. This gives engineers the freedom to choose the language that fits their needs without having to take other services into account.

REST uses the standard HTTP protocol. Each resource is represented as a URL. Different HTTP commands such as POST, HEAD, PUT, GET and DELETE indicate what actions need to be taken. What an action exactly does is not standardised in the REST style, but is defined by the developer of the service. REST is not a protocol like HTTP but more of an architectural style [18]. Every service can have a different implementation of REST, but they all follow the same style. REST is stateless, so every action stands on its own and no data is stored. XML and JSON are the most used formats for REST messages. HTTP and JSON are supported in almost every programming language, which makes REST very versatile.
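To make this concrete, the sketch below (assuming Java 11 or later) shows how one service could read another service's resource with a plain HTTP GET. The URL, port and resource name are made up for illustration and are not part of any system described in this thesis.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestClientExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // GET the representation of a (hypothetical) cargo resource.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/cargoes/ABC123"))
                .header("Accept", "application/json")
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode()); // e.g. 200
        System.out.println(response.body());       // JSON representation of the resource
    }
}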

2.3.4 Scaling

An advantage of cloud computing is its scaling capabilities. There are two types of scaling in the cloud: horizontal and vertical scaling. Vertically scaled services raise their throughput by increasing the underlying resources, such as CPU or RAM. There is a limit on how much an application can grow with vertical scaling. Traditional monolithic applications generally make use of this type of scaling. In the case of horizontal scaling, more instances of the service are introduced and a load balancer divides the traffic between the instances. When a monolithic application scales horizontally, parts that do not require scaling are also duplicated. This wastes resources and decreases efficiency. Microservices are more efficient in this respect, since horizontal scaling is only applied to the service that requires it.

Apart from performance, horizontal scaling can also be used for redundancy. If one of the instances fails, the load balancer can redirect the traffic to the other instances. Netflix applies this principle on a global scale [19]. Netflix has set up three exact copies of their system in three different geographical regions. This method increases performance, as users have lower latency to the services, but more importantly it introduces redundancy. When there is a crash in one of the regions, the traffic can be redirected to the two other regions, which can horizontally scale their services to handle the higher load. Horizontal scaling of stateful nodes is difficult. Scaling read operations can be done by duplicating data: a load balancer can direct requests to different stateful nodes to manage the high load. Horizontal scaling does not mitigate the problems of writing data, as every instance needs to handle every write request.

2.3.5 Testing

The distributed network of microservices requires a new way of testing. Apart from standard functionality testing, the network also needs to be tested. The network needs to handle outages correctly to ensure a high uptime of the system. Chaos engineering can be applied to test this. Chaos engineering tries to uncover systemic weaknesses [20]. By artificially crashing services, engineers can observe how the network reacts. The design of the architecture can change based on these observations.

The microservices themselves also need to be resilient to failures, and other services need to know how to react if a service crashes. This resilience is needed due to the principle of combined uptime. For example, if ten microservices are chained in a network and they all have an uptime of 99.99%, they have a combined uptime of 0.9999^10 ≈ 99.90%, which is still very good. However, if the services have an uptime of just 99% (which is considered good for a monolithic system), the combined uptime is 0.99^10 ≈ 90.44%, which is very undesirable.
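The combined uptime above is simply the product of the individual availabilities; the short snippet below reproduces the two figures.

public class CombinedUptime {
    public static void main(String[] args) {
        // Combined uptime of ten chained services, each with the same availability.
        System.out.println(Math.pow(0.9999, 10)); // ~0.9990, i.e. 99.90%
        System.out.println(Math.pow(0.99, 10));   // ~0.9044, i.e. 90.44%
    }
}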

The testing of microservices is done in multiple stages. After building the new microservice, static, unit and integration tests are run to ensure the functionality of the code. After passing these tests, a new node is created in the distributed network. This node, which contains the new build, runs next to the live node that handles the traffic. Two common ways to test the updated service in production are blue/green testing and canary testing. With blue/green testing, the traffic is directed to the new node, which is observed for errors. If errors emerge, the traffic is redirected back to the old node. In the case of canary testing, a small portion of the live traffic is directed to the new node. Again, the deployment system decides (sometimes with the help of an engineer) to fully direct the traffic to the new node or to not deploy it. This approach allows engineers to keep their feedback loop short and keeps the live system from crashing.


2.3.6 Migration

The migration from a monolithic to a microservice architecture is a non-trivial task and a multi-dimensional problem [21]. Before starting the migration process, we first need to assess whether microservices are the best fit for the application.

The main advantages of migrating to a microservice architecture are improved scalability and maintenance. The microservice architecture further improves the organisational structure as it decreases the complexity of the system. It allows engineers to work on a more confined part of the system while being less dependent on others. However, a small application would not benefit from these advantages. The transition would work adversely and would create burdens such as setting up an automated pipeline, monitoring tools and the initial infrastructure; challenges that cannot be solved by a small team with less expertise or a small budget. The microservice architecture works well at large companies such as Google, LinkedIn and Netflix. If one relatively small team manages the whole product, microservices are not the right choice. Martin Fowler states: "You must be this tall to use microservices" [22], indicating that the introduction of a distributed system should be avoided as long as possible. The decision to change to or start with a microservice architecture should not be taken lightly.

Conway's law

The structure of an organisation and an application are tightly coupled. This observation is also what Conway states in his law:

Organisations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organisations. [23]

The law affects microservices. Organisational structures are often taken into account when monoliths are split up into microservices. This pattern is popular since it does not require changing the organisation, which is a difficult and costly process. However, teams should be structured around the microservices in such a way that each team handles one microservice [24]. These independent teams can own the whole lifecycle of the microservice, allowing for a greater degree of autonomy than is possible with larger teams [25]. This approach creates a more effective environment. Amazon and Netflix managed to change their organisational structure to fit their microservices [25]. These companies have overcome the challenge of changing their organisation to best fit their microservice architecture.

Granularity

The granularity of the network also has to be considered during the migration. The granularity describes the size and number of services in a network. A coarse-grained network has few, large microservices, while a fine-grained network has many small microservices. Both fine- and coarse-grained networks have advantages. A fine-grained network will have better scaling, better maintainability and better code quality due to the small code base of an individual service. However, the large number of services also creates overhead. Organisational structuring and maintaining a good overview can also be hard in a fine-grained network. Research on the granularity and the size of a microservice is not conclusive, as we discuss in Chapter 3.

Migration Strategies

The migration from a monolith can be done in two ways: gradually transitioning from the monolith, or transitioning all at once. The first one is commonly known as the Strangler pattern. The pattern specifies that functionalities are incrementally replaced with new services. Existing features are migrated to the new system gradually. This pattern minimises the risk of the migration and spreads the development over time [26]. As features are migrated to the new system, the legacy system is eventually "strangled": it is no longer necessary and can be safely retired.


Starting an application in the microservice architectural style is also possible. The options here are to start with coarse-grained services and split them further as the system matures, or to start with fine-grained services from the beginning. However, starting with microservices does bring challenges: boundaries between services evolve over time and are hard to identify at the start [17].

2.3.7 Domain Driven Design

The Domain Driven Design (DDD) approach reasons about designing software based on models of the underlying domain [27]. The modelling principle is used to structure and create an overview of large and complex problems [28]. The model is based on the reality of the business and has to be relevant to the use cases of the system. In DDD, problems are presented as a domain, and multiple similar domains can be grouped into a Bounded Context. This means that a Bounded Context represents a certain type of problem. Each Bounded Context has to be solved by a microservice, and that microservice will only handle one type of similar problems.

The model that is created with DDD also creates structure. It helps with the communication between software developers during the implementation of the microservices, as it allows the developers to use the same vocabulary and pursue the same goals. The model needs to be consistent to be effective, and staying consistent gets progressively harder on larger projects [27].

2.4 Service Cutter

The Service Cutter framework is a semi-automatic decomposition tool. The researchers have compiled a list of 16 coupling criteria (CC) that together form the requirements for a microservice architecture. These CC are abstracted from literature and industry experience. In Section 4.1.2 we will go over each CC. The CC are used to score the input of the decomposition tool. This scoring is used to build a graph that represents the connectivity between nanoentities. The graph is split with the help of a graph clustering algorithm to make service suggestions. After the suggestions are shown, a user can change the weight of a criterion and thereby make it more or less important. See Figure 2.1 for the overview of the Service Cutter, where suggested services are presented and weights can be changed.


Figure 2.1: Service Cutter suggested services overview

2.4.1 Entity and Nanoentity

The Service Cutter makes use of entities and nanoentities. An entity is another word for a class in the Service Cutter framework. A nanoentity is an attribute that is part of a certain class. However, if that attribute is a class itself, it is not assigned to that entity as a nanoentity; instead it is specified as a relation between the two classes. In Listing 2.1 a code example is shown of the Leg class. The class contains five variables. The access modifier (public, private) does not matter in this process.


public class Leg implements ValueObject<Leg> {

    private Voyage voyage;
    private Location loadLocation;
    private Location unloadLocation;
    private Date loadTime;
    private Date unloadTime;

    public Leg(Voyage voyage, Location loadLocation, Location unloadLocation,
               Date loadTime, Date unloadTime) {
        Validate.noNullElements(new Object[] {voyage, loadLocation, unloadLocation,
                                              loadTime, unloadTime});
        this.voyage = voyage;
        this.loadLocation = loadLocation;
        this.unloadLocation = unloadLocation;
        this.loadTime = loadTime;
        this.unloadTime = unloadTime;
    }
}

Listing 2.1: Example Class Leg from the DDD-sample Application [29]

Every attribute of an object (entity) that is not user specified is considered a nanoentity. As can be seen, two variables are of the class Date. Date is a Java class; thus loadTime and unloadTime are considered nanoentities. The variables voyage, loadLocation and unloadLocation have types that are user-defined classes in the model of this application. This means that they are not nanoentities, but other entities that the Leg entity has a relation with. The different types of relations and how to find them are discussed in Section 4.2.1.


Chapter 3

Related work

In this chapter, we introduce the work that relates to our research. We focus on the topics that are part of the decomposition process. These topics are size, migration, decomposition tools and validation techniques.

3.1 Size

In terms of the size of a microservice, there is no clear definition in the community. Many papers do not even report on size [21, 30, 15, 8, 31, 32, 33, 34]. Thönes [35] reports on the size of a microservice in terms of LOC, but remains very vague and states that they range "from a couple of hundred lines of code up to a couple of thousand lines of code". Mazlami et al. [6] state that an empirical study of 42 cases found vastly different sizes, ranging from under 100 LOC to more than 10,000 LOC. The reason that few papers report on size in terms of LOC becomes clear as Newman et al. [17] state that giving a number of LOC for a microservice would be problematic, as some languages are more expressive than others. Furthermore, some parts of the domain may be rightfully complex and do require more code.

The size of a microservice is the result of a trade-off. For example, many papers [36, 14, 17, 37] relate the size of a microservice to business capabilities. The size also depends on the number of microservices the developers are able to manage, as Thönes [35] states. He argues that it is better to have bigger but fewer microservices if there is no automated deployment into production.

Another trade-off for the size of a microservice is the process of splitting up a microservice, as discussed by Hassan and Bahsoon [37]. They argue that systematically addressing the trade-offs is essential for assessing the extent to which splitting up a microservice is beneficial with regard to the potential value. This highly depends on the scenario in which the system is operating.

The Single Responsibility Principle is a frequently used term. Newman et al. [17] relate the size of a microservice to this principle. The principle states: "Gather together those things that change for the same reason, and separate those things that change for different reasons". The boundaries of a service, and thus its size, should match the business boundaries. In this way, it is obvious where the code lives for a given functionality. They state that people have a very good sense of what is too big, and it can be argued that a microservice is small enough when it no longer feels big. Newman also discusses the trade-offs with regard to the size of a microservice: the smaller the microservice, the more benefits of the microservice architecture, but also the more complexity from having more moving parts. Microservices should become smaller as teams get better at handling these complexities. This is the same point that Thönes made, and Dragoni et al. [14] also acknowledge it. Furthermore, Dragoni et al. state that the size of a microservice is highly dependent on the structural design of the organisation producing it. This is in line with Hassan et al. [37] and another paper of Dragoni [36], which state that a microservice should only focus on one business capability.


Dragoni et al. also argue, just as Newman [17], that a microservice should be split up when it is too large, in order to bring benefits in terms of service maintainability and extensibility. The exact conditions of when to split a microservice are not given, only that it should remain one business capability. From this literature study, we can conclude that the size of a microservice is not concrete or strict. The size of a microservice depends on the business capability it represents, the organisational structure, the team that needs to manage the microservice and the trade-offs of splitting up a microservice.

3.2 Migration

Regarding migration, the most frequently investigated approach is Domain Driven Design [8, 17, 33, 35, 21, 38]. DDD (see Section 2.3.7) is the main approach for dealing with large models by dividing them into Bounded Contexts and defining their relations. The Strangler approach (see Section 2.3.6) is applied in other papers [39, 21, 32]. Both have their advantages, as discussed in the background. Almost all papers mention that the migration process is a non-trivial task and should not be taken lightly.

In the empirical studies regarding the migration process, multiple types of applications are used. The types of applications used in the papers range from cloud-based architectures [33], to RDBMS as a Service [21], to critical systems of a Foreign Exchange [36].

During the migration process, there is a tight coupling between the organisational structure and the microservices that are created, as we discussed in Section 2.3.6. This coupling is also acknowledged by Balalaie [33], who mentions that cross-functional teams are formed to support the new services. Balalaie also notes that core teams were introduced to support individual teams and manage shared capabilities.

3.3 Decomposition

The decomposition process of migrating a monolithic architecture into a microservice architecture is usually done by hand [5] and is considered an art [40] that depends on the expertise of the software architect. The survey of Francesco et al. [41] found that the decomposition process is seen as a major activity and challenge in the migration process. The process is usually an unstructured one [7]. Multiple studies [8, 5, 6, 7] have tried to address this challenge by providing a structured methodology. One of the first attempts at providing a structured way is the creation of requirements for microservices [8]. The 16 coupling criteria that are introduced by the Service Cutter decomposition tool are abstracted from literature and industry experience. These requirements are used to build a graph that represents the connectivity between the components of an application. The graph is later split up via graph clustering algorithms to suggest the composition of the microservices. The Service Cutter does not incorporate information from the monolith itself and relies on user input. This is considered subjective by Chen et al. [5].

Another approach is more data-driven and tries to break business logic into microservice candidates [5]. The semi-automatic mechanism includes one manual and two automatic steps. First, the engineers perform a business requirement analysis to build a data flow diagram of the business logic. Their algorithm combines operations with the same type of output data and extracts individual modules from the data flow to identify microservice candidates. The implementation has a similar approach to the Service Cutter and performs similarly on small projects. Neither implementation has been tested on large projects yet.

Another decomposition tool introduces a formal model that makes use of more traditional software decomposition techniques [6]. The model contains three formal coupling strategies: logical coupling, semantic coupling and contributor coupling. These strategies are embedded into a graph-based clustering algorithm. The coupling strategies rely on information from the code base of a monolithic system to construct a graph, which is clustered to find the microservices. No behaviour of the application is captured; only static analysis is used.

The problem can also be solved via a black-box approach [7]. This iterative approach tries to find the best composition for the microservices without requiring an enormous amount of time to analyse business processes, an analysis that in any case does not consider non-functional requirements such as performance. In each iteration, the composition of the microservices is changed and the performance is measured. With this approach, Mustafa et al. try to optimise the granularity of the system. This approach is used to decompose web services. As these systems are typically very well isolated, this approach is possible. For systems that are more entangled, this approach would not work.

3.4 Validating microservices

The decomposition process results in service suggestions. However, how do we know these services are correct? In order to validate the suggested microservices, we need metrics. While there are many well-known quality metrics in fields such as object-oriented design, microservice research lacks such metrics. Researchers are validating their microservices with different approaches. Some of these approaches lack objective metrics.

The Service Cutter decomposition tool [8] makes use of a 'Decomposition Questionnaire' to validate its microservices [42]. With questions such as "Does the service cut comply with all constraint criteria?" and "Is the coupling between services similar?" they try to assess whether a cut is correct or not. In this approach, they validate the cut done by the tool and not the suggested services. Also, the questionnaire does not contain objective metrics and is therefore very subjective. The questionnaire does, however, act as a guideline for the system architect to find possible flaws in the suggested services.

Other ways of validation are tied to the core principles of microservices. Microservices have many characteristics. For example, microservices should be small and autonomous, loosely coupled, have high cohesion and be language neutral. These principles are used to validate microservices [5]. However, during this validation, no hard metrics are introduced. They subjectively argue that a suggested microservice is adequate if it fits these characteristics.

Among the main benefits of microservices are an improved team structure and reduced team complexity and size. To measure any improvement regarding these benefits, the Team Size Reduction metric is used [6]. This metric shows the team size reduction of the suggested services. The team size can be calculated by looking at the number of contributors of the GitHub repository. This metric is objective and does quantify a benefit of microservices.

Furthermore, the Average Domain Redundancy metric was introduced to objectively measure the amount of repetition and duplication with respect to domain concepts in the source code of different microservices [6]. This reflects the 'has to do one thing well' principle that is often mentioned with microservices. Since every microservice has to do one thing well, there should not be high amounts of duplication or repetition of code. The Team Size Reduction metric and the Average Domain Redundancy metric were found to be good indicators of a good microservice [6]. Other metrics, such as the contributor overlapping ratio and external communication, were found to be less effective.


Chapter 4

Automatic decomposition

In this chapter, we will discuss the elements that make up our research and the design decisions of our implementation of the automatic decomposition. We start with explaining the architecture of the project. This is followed by the criteria and how they are automated.

4.1 Decomposition framework: Service Cutter

This research will help to further structure and automate the monolith-to-microservice decomposition process. During this decomposition process, we make use of the Service Cutter. The framework is only used to actually cut the microservices. In our research, we focus on gathering information about the monolithic application, which is then passed to the Service Cutter. This is done with the help of static and dynamic analysis to capture the state and the behaviour of the application. The generated input is made specifically for the Service Cutter, but can be adapted to function with other decomposition tools. The analysis gives a better representation of the current system than the user input that the Service Cutter currently relies on, and should therefore lead to better service suggestions. We do not aim to improve the Service Cutter itself.

4.1.1 Architecture of the project

The architecture (Figure 4.1) starts with the monolithic application in the form of a .jar file. This .jar file is used by the three analysis programs. The static analysis produces a JSON file (model.JSON) that represents the structure of the application. The GitHub analysis makes use of the GitHub repository and produces commits.txt and contributors.txt. The contributors.txt file is used during the validation. The dynamic analysis makes use of the user-specified use cases and commits.txt, and produces a JSON file (user_representations.JSON) that represents the behaviour and functioning of the application. The model.JSON and user_representations.JSON files can be found in ./Outputs and are passed manually to the Service Cutter.

The instructions for starting the Service Cutter can be found in the main GitHub repository of the Service Cutter [43]. The default login for the Service Cutter is username 'admin' and password 'admin'. The files have to be loaded in the System Specification tab: the model has to be loaded first and the user representations second. After the files have been successfully loaded, the suggested cuts can be found in the Service Cuts tab. In this tab, there is also the option to change the priorities of the CC to best fit the environment of the monolithic application. It is also possible to choose different clustering algorithms. A JSON file with the suggested cuts can be exported. This file is used by the validation program, which creates the suggested .java files and validates them with the help of the validation metrics.

We have built a small script that can run the programs and generate the JSON files. The user only has to specify four parameters in the run.sh script:


1. The path to the jar of the monolithic application that has to be analysed.
2. The package name, to further specify which section of the application has to be analysed.
3. The path to the local Git directory.
4. The path to a directory which contains one or multiple jar files that each contain one use case.

Figure 4.1: Architecture of the decomposition tool

4.1.2 Criteria

The Service Cutter model makes use of 16 coupling criteria (CC). These 16 CC are the requirements for a good microservice. The CC are used to build a graph that represents the relations between the components of the application [44]. This graph is later used to decompose the monolithic system into microservices. In order to reach our goal of further automating the decomposition process, we need to automate criteria so that they become more objective and reliable and represent more information. Not all criteria can be automated, since some are user dependent. Below, we go over the definition of each criterion and assess whether it can be automated.

CC-1 Identity and life cycle commonality. This criterion represents nanoentities that belong to the same identity and therefore have the same life cycle. This is based on the idea that objects are defined by their attributes. To find this common identity and life cycle we can look at the classes and the Composition relations between them (see Section 4.2.1). This CC will be automated by performing static analysis.

CC-2 Semantic proximity. Two nanoentities have semantic proximity when they have a semantic connection in the business domain. This can be found by looking at Aggregation or Association relationships between classes. The criterion can be automated by performing static analysis. This CC also states that the strongest indicator of semantic proximity is the use cases. This statement is based on the method of Richardson, that services should be partitioned based on the use cases, and the Single Responsibility Principle of Martin (discussed in Section 3.1), where "things that change for the same reason should be grouped together". Therefore we will also analyse use cases to find semantic proximity.

CC-3 Shared owner. A shared owner can be a person or department that is responsible for a group of nanoentities. The shared owner is a result of the organisational structure that develops the application. This is user dependent information.


CC-4 Structural volatility. Structural volatility represents how often a structure, and therefore which nanoentities, change. We cannot automate this CC by looking at future design changes, but we can look at the past. By looking at the GitHub repository of the project, we can see how often the structure has changed in the past. This information can be used to indicate how likely a structure is to change in the future. In other words, if a class has not been changed for months, it is not likely to change soon.

CC-5 Latency. Groups of nanoentities that are required to have high performance should be bundled together to avoid remote calls. The CC states that all nanoentities that are read and written in the same use case should be grouped together. This is based on a performance guideline that 'round trips' should be minimised to reduce call latency. By automating the use case analysis that we also need for Semantic proximity (CC-2), we also cover this CC.

CC-6 Consistency criticality. This CC states that some nanoentities, such as bank records, need to have high consistency. This information is user dependent and will not be automated in our research.

CC-7 Availability criticality. Nanoentities can have different characteristics regarding their availability. Some elements need to have high availability, and without these nanoentities the system cannot function. The nanoentities with a need for high availability should be grouped together on high-availability servers. The availability characteristic can only be classified by a system architect who has a complete understanding of the system. This CC will not be automated.

CC-8 Content volatility. This criterion states that a nanoentity can be characterised by its volatility and thus by how often it changes. Highly volatile and more stable nanoentities should not be in the same service. This can be calculated based on the information of the use cases. We analyse this by injecting bytecode into compiled java files to alter methods and track how often a nanoentity changes. This is further elaborated in Section 4.3.2.

CC-9 Consistency constraint. A group of nanoentities that have a dependency on each other should be kept consistent. This is based on the Domain Driven Design principle that a cluster of associated objects should be treated as one. These constraints can be found in the Aggregation relations between the classes. This will be done by performing static analysis of classes, just as with CC-1 and CC-2.

CC-10 Mutability. This criterion is not yet implemented in the Service Cutter. It states that it is preferable to share immutable information rather than mutable information. We will not incorporate this CC in our research.

CC-11 Storage similarity. Nanoentities are classified into categories by their size. This is system specific; the category 'large' can be 1 MB in one system and 1 GB in another. Currently, this is classified by a user, but it can be automated. By looking at the memory usage of the application during a use case, we can classify nanoentities into groups (Large, Normal, Tiny).

CC-12 Predefined service constraint. This CC overrules the system by specifying which nanoentities should become a service. This is information that only a system architect can specify and thus will not be automated.

CC-13 Network traffic similarity. This criterion is not yet implemented in the Service Cutter. Service decomposition should take into account how often data is transferred via the network and how large it is, as this puts a strain on the network.

CC-14 Security contextuality. A user-defined criterion that allows the user to group nanoentities with the same security constraints.

CC-15 Security Criticality. Data with high financial damage in case of a loss should not be grouped. This can only be determined by a user.


CC-16 Security constraint. Groups of nanoentities that are semantically the same but should not be grouped together in a microservice because of security constraints. This is a user-specified criterion.

To summarise, we will perform static analysis on the code base of the monolithic application to automate Identity and life cycle commonality (CC-1), Semantic proximity (CC-2) and Consistency constraint (CC-9). Dynamic analysis will be used to automate Semantic proximity (CC-2), Content volatility (CC-8) and Storage similarity (CC-11). Lastly, GitHub will be used to find information about Structural volatility (CC-4) and automate this CC.

4.2 Static program analysis

Static program analysis is the analysis of computer software that is performed by looking at the code base. The code of the program is not executed. The static analysis is used to gather information about the structure of the monolithic application. We will analyse the class structure to get information about their relations.

In our static analysis, we load already compiled Java code in the form of a JAR. With this approach, there is no need to alter the original code, which would be undesirable. Once the classes of the JAR are loaded in our runtime environment, we make use of Java Reflection [45]. With this library, we can inspect the loaded classes and find their fields. By analysing the constructors and the fields of the classes, we can find the relations.
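As an illustration of this step, the sketch below loads the classes of a JAR and lists their declared fields, treating fields whose type lies outside the application's package as nanoentities and fields of application-defined types as candidate relations. This is a simplified sketch under those assumptions, not the implementation used in this research; the jar path and package prefix are command-line arguments.

import java.io.File;
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class StaticFieldScan {
    public static void main(String[] args) throws Exception {
        String jarPath = args[0];    // path to the monolith's .jar
        String appPackage = args[1]; // package prefix of the application

        URL[] urls = { new File(jarPath).toURI().toURL() };
        try (URLClassLoader loader = new URLClassLoader(urls);
             JarFile jar = new JarFile(jarPath)) {

            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) continue;

                String className = name.replace('/', '.').replace(".class", "");
                if (!className.startsWith(appPackage)) continue;

                try {
                    // Load without initialising the class (no static initialisers run).
                    Class<?> clazz = Class.forName(className, false, loader);
                    for (Field field : clazz.getDeclaredFields()) {
                        Class<?> type = field.getType();
                        // Fields of application-defined types point to other entities
                        // (relations); everything else is treated as a nanoentity.
                        boolean isRelation = type.getName().startsWith(appPackage);
                        System.out.printf("%s.%s : %s -> %s%n",
                                clazz.getSimpleName(), field.getName(),
                                type.getSimpleName(),
                                isRelation ? "relation" : "nanoentity");
                    }
                } catch (Throwable t) {
                    // Skip classes whose dependencies are not on the classpath.
                }
            }
        }
    }
}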

4.2.1 Relations

The relations between the classes are based on the relations of the Unified Modelling Language (UML). UML specifies eight types of relations. We are only interested in two of those relations for the two criteria we are discussing. CC-1, which states that entities sharing the same life cycle should be grouped, focuses on the Composition relation. A Composition relation implies a relationship where the child cannot exist independently of the parent [46]. For example, consider the classes House (parent) and Room (child): the Room cannot exist without the parent and thus shares the same life cycle. CC-2, regarding the semantic proximity of entities, states that Aggregation relations indicate the same business domain and should be grouped together. Aggregation implies a relationship where the child can exist independently of the parent [46]. For example, consider the classes Class and Student: one is the parent of the other, but the child can survive on its own.

An automatic way of finding those relations is difficult, and the manual process is not trivial either. The original creators of the Service Cutter found these entities, nanoentities and relations by hand [29]. Even though this process was done with great care (we presume), they made a mistake. The Composition relation is a transitive relation. A transitive relation states that if A has a relation with B and B has a relation with C, then A also has a relation with C. Thus if A has a Composition relation with B and B has a Composition relation with C, A also has a Composition relation with C. For example, take the three classes Person, Hand and Finger: a Finger cannot survive without a Hand and a Hand cannot survive without a Person, so a Finger cannot survive without a Person. However, in the hand-made input of an example of the Service Cutter they stated the following:

Origin Destination Relation type

Cargo Itinerary Composition

Itinerary Leg Composition

Cargo Leg Aggregation

These relations do not hold according to the transitive property of the Composition relation. We contacted the authors of the Service Cutter and asked why these relations were made. They replied by saying that they were not entirely sure since their research was done over two years ago.


They also said: "We considered Cargo → Itinerary and Itinerary → Leg to be of type Composition as we had the feeling that they share a common life cycle." The word "feeling" indicates that they had a preconceived idea of what a service should be and created the input in such a way that this would be the case. We can also argue that they were not objective and that the decomposition process will benefit from being automated.

We automated the classification of the relations by looking at the constructor of the class. When an object is passed as a parameter to the constructor, this indicates that it can survive on its own: after all, it was created before the constructor was called and thus has a different life cycle. These relations are classified as Aggregation relations. If the object is created inside the constructor, the two objects do share the same life cycle: they are created at the same time, and when the parent object is deleted, so is the child object. These relations are classified as Composition relations. For the Cargo class (Listing 4.1), this means that we generate the following relations:

Origin Destination Relation type

Cargo TrackingId Aggregation

Cargo RouteSpecification Aggregation

Cargo Location Composition

Cargo Delivery Composition

Cargo Itinerary Composition

public class Cargo implements Entity<Cargo> {

    private TrackingId trackingId;
    private Location origin;
    private RouteSpecification routeSpecification;
    private Itinerary itinerary;
    private Delivery delivery;

    public Cargo(final TrackingId trackingId, final RouteSpecification routeSpecification) {
        Validate.notNull(trackingId, "Tracking ID is required");
        Validate.notNull(routeSpecification, "Route specification is required");

        this.trackingId = trackingId;
        // Cargo origin never changes, even if the route specification changes.
        // However, at creation, cargo origin can be derived from the initial route specification.
        this.origin = routeSpecification.origin();
        this.routeSpecification = routeSpecification;
        this.delivery = Delivery.derivedFrom(
                this.routeSpecification, this.itinerary, HandlingHistory.EMPTY);
    }
}

Listing 4.1: Example Class Cargo from the DDD-sample Application [29]

The information about the entities, the nanoentities and the relations between entities is written to the model.JSON file, which the Service Cutter can process. An example of such an output can be seen in Listing B.1.
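A minimal sketch of this constructor-based rule is shown below. It is an illustrative simplification, not the exact implementation of this research: it assumes that the application's package prefix identifies user-defined entity classes, and it only inspects declared constructors and declared fields.

import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RelationClassifier {

    /** Prints one relation per field whose type belongs to the application package. */
    static void classify(Class<?> entity, String appPackage) {
        // Collect every parameter type that appears in any constructor of the entity.
        Set<Class<?>> constructorParams = new HashSet<>();
        for (Constructor<?> ctor : entity.getDeclaredConstructors()) {
            constructorParams.addAll(Arrays.asList(ctor.getParameterTypes()));
        }

        for (Field field : entity.getDeclaredFields()) {
            Class<?> type = field.getType();
            if (!type.getName().startsWith(appPackage)) {
                continue; // not a user-defined entity, so it is a nanoentity, not a relation
            }
            // Passed in from outside -> independent life cycle -> Aggregation;
            // created inside the constructor -> shared life cycle -> Composition.
            String relation = constructorParams.contains(type) ? "Aggregation" : "Composition";
            System.out.printf("%s -> %s : %s%n",
                    entity.getSimpleName(), type.getSimpleName(), relation);
        }
    }
}

Applied to the Cargo class of Listing 4.1 (with the DDD sample's package prefix as argument), this rule reports Aggregation for TrackingId and RouteSpecification, which are constructor parameters, and Composition for Location, Itinerary and Delivery, matching the relations in the table above.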

4.3 Dynamic analysis

Dynamic program analysis is the analysis of an application by executing the program. With the help of dynamic analysis, we gather information about the behaviour of the application. The analysis looks at use cases to obtain information for the Semantic proximity (CC-2), Content volatility (CC-8) and Storage similarity (CC-11) criteria. Bytecode injection is used on the original code to analyse use cases.

4.3.1 Use Cases

A use case is a list of actions (functions) to achieve a goal. For example, a use case for sending a letter can be: write the letter, print the letter, put the letter in an envelope and bring it to the post office. Thus, in order to find nanoentities that have semantic proximity, we need to analyse the use cases. These use cases are defined by a system architect of the monolithic application. The architect creates small programs that each define a single use case. With the help of these use cases, we can automate multiple CC.

4.3.2 Byte Code injection

In order to find which nanoentities are used or changed during a use case, we need logging in each function that is called. This can be done by changing the original code to perform logging. In large projects, changing the code by hand would be a very labour-intensive and error-prone task. We chose to perform bytecode injection to solve this problem.

Bytecode injection is the action of adding extra code to an already existing and compiled program. To perform these injections, we made use of Javassist [47]. The framework is designed to manipulate Java bytecode in an easier way. Javassist is used from a so-called Java agent. This Java agent has a function called premain which, as its name suggests, is called before the classes are loaded into the JVM and the application starts. The method transformations are done in this function. All the classes, methods, fields and other information are read with the help of the Javassist framework before they are loaded into the JVM. The framework gives us the opportunity to alter the code. The new code helps us to analyse the application without changing the original source code, which is one of the main advantages of this approach. The performance of the original code is not affected, but the added code does create extra computations.
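The skeleton below shows the shape of such a Javassist-based agent: a premain registers a ClassFileTransformer that rewrites every method of the application's classes as they are loaded. For brevity, the injected code here only logs method entry and exit; the agent used in this research additionally records nanoentity hashCodes and object sizes, as described in the following subsections. The package prefix passed as the agent argument, and the agent jar name in the comment, are assumptions of this sketch (the agent jar must declare TracingAgent as its Premain-Class).

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

public class TracingAgent {

    // Started with: -javaagent:tracing-agent.jar=<application package prefix>
    public static void premain(String agentArgs, Instrumentation inst) {
        String packagePrefix = agentArgs.replace('.', '/');

        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<?> redefined,
                                    ProtectionDomain domain, byte[] classfileBuffer) {
                if (className == null || !className.startsWith(packagePrefix)) {
                    return null; // leave classes outside the application untouched
                }
                try {
                    ClassPool pool = ClassPool.getDefault();
                    CtClass ctClass =
                            pool.makeClass(new java.io.ByteArrayInputStream(classfileBuffer));
                    for (CtMethod method : ctClass.getDeclaredMethods()) {
                        if (method.isEmpty()) {
                            continue; // skip abstract and native methods
                        }
                        String tag = ctClass.getName() + "." + method.getName();
                        // The real agent records nanoentity hashCodes here; this sketch only logs.
                        method.insertBefore("System.out.println(\"ENTER " + tag + "\");");
                        method.insertAfter("System.out.println(\"EXIT " + tag + "\");");
                    }
                    byte[] transformed = ctClass.toBytecode();
                    ctClass.detach();
                    return transformed;
                } catch (Exception e) {
                    e.printStackTrace();
                    return null; // on failure, keep the original bytecode
                }
            }
        });
    }
}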

Nanoentities write

In our implementation, we inject bytecode before and after every method. This bytecode is written as normal code, but is compiled and placed in the classes as they are loaded into the JVM. Before each method, we store the hashCode of a nanoentity in a new variable if the nanoentity is of a non-primitive type; if it is of a primitive type (int, double, char, etc.), we copy the value into a new variable. After the method, we walk through the nanoentities again and generate a new hashCode. If we see that the hashCodes changed, we know the value of the nanoentity changed. For the primitive types, we use the '==' operator to compare the copy with the nanoentity. These operations give us insight into which nanoentities changed in which method during a use case.

Nanoentities read

In order to find which nanoentities are read during a use case, we had to compromise. We initially thought it would be possible to access the JVM or the garbage collector in such a way that we could get a notification when the program accesses a nanoentity in memory. After all, the problem 'we want to know when a nanoentity is read' does not seem very troublesome. We thought of an approach based on the reference counting of the garbage collector; however, modern Java garbage collectors do not use reference counting but trace the application for better performance. We also read about the JVM Tool Interface [48]. This tool gives applications access to the state of the JVM to perform profiling, debugging, monitoring, thread analysis and coverage, and this C library can be used to implement event handlers. Although this approach would (very likely) solve the problem, it is overkill for our application, since it requires a large amount of knowledge about the workings of the JVM, C and profiling. We decided to go back to the Javassist framework and hashCodes for our solution.


Via Javassist it is possible to access the parameters of a function when it is called. We opted to generate a hash for the value of each parameter and compare it to the hash of every nanoentity in the program. With this approach we know whether a nanoentity has been passed to a function and thus has been read.
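A self-contained sketch of this read detection is shown below; the nanoentity registry and the method name are illustrative, and in the real implementation the comparison is injected at the start of every method by the agent.

import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the read-detection idea: hash the value passed as a parameter
// and compare it against the known hashCodes of the nanoentities.
public class ReadDetectionSketch {

    public static void main(String[] args) {
        String trackingId = "ABC123"; // a nanoentity value

        // Registry of nanoentity hashCodes, conceptually maintained by the agent.
        Map<Integer, String> nanoentityHashes = new HashMap<>();
        nanoentityHashes.put(trackingId.hashCode(), "Cargo.trackingId");

        printLabel(trackingId, nanoentityHashes);
    }

    // Conceptually, this check is injected at the start of every method.
    static void printLabel(Object parameter, Map<Integer, String> nanoentityHashes) {
        String match = nanoentityHashes.get(parameter.hashCode());
        if (match != null) {
            System.out.println("READ " + match + " in printLabel");
        }
    }
}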

Hash Collision

Please note that we are aware of hash collisions [49]. A hash collision is the fact that two different inputs can map to the same output. This is inevitable, as there can be an unbounded number of different objects but only a limited number of outputs; in the case of Java's .hashCode() function, the size of an int. In our application we only hash nanoentities, so the number of hashCodes will be small. However, as the 'birthday problem' [50] shows, probabilities can be very deceiving: the probability of a hash collision with 1000 nanoentities would be about 1 in 10,000 [51]. Although this is higher than intuition might suggest, 1000 nanoentities in a program would already be a lot, and we mean 1000 distinct nanoentities rather than many instances of a few nanoentities, as we handle this per object and not statically. If hash collisions do become a problem in the future, or with very large applications, we can adapt the Java agent to use a larger hash (such as MD or SHA).

Size

We also obtained the size of the Java objects with the help of this Java agent. The size of an object in Java is not as concrete as in other languages such as C. One of the reasons is that object variables can be shared between objects by the JVM in order to save memory; therefore it is more of a size estimation. In our implementation we made use of a Size Estimator [52], based on an article of JavaWorld [53], that looks at the entire data graph maintained by the JVM and estimates the size of the object by taking it as the root node of that graph. We estimate the size after every method.

A before (Listing A.1) and after (Listing A.2) of the bytecode injection can be seen in Appendix A. The injected bytecode performs a standard System.out.println() whose output is piped to a file that is parsed by the program (see Listing A.3).

4.4 GitHub analysis

The GitHub repository of a project is also a valuable source of information that will be used during our analysis and our validation.

4.4.1 Contributions

The Structural Volatility criterion represents how often a structure changes. We argue that the past is a good indicator of the future, and GitHub keeps an excellent record of the past. By looking at the number of contributions (commits) to a class, and therefore to its nanoentities, we can find out how often a structure changed in the past. With this data we can classify the nanoentities into the categories often, normal and rarely. The data can be collected in two ways: via the git command in the terminal or via the GitHub API. We chose the git log terminal command so we can benefit from shell commands to count and group the commits, and these commands can easily be executed from Java at run time, as sketched below. We execute the command git log --oneline -- [fileName] | wc -l to retrieve the total number of commits per file. The information of all files is then used to classify the structural volatility into the categories.
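A possible way to run this command from Java is sketched below; the repository path and file name are placeholders, and error handling is kept minimal.

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

// Hedged sketch: count the commits that touched a file by shelling out to git.
public class CommitCounter {

    public static int countCommits(String repoDir, String fileName) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "bash", "-c", "git log --oneline -- " + fileName + " | wc -l");
        pb.directory(new File(repoDir));
        Process process = pb.start();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line = reader.readLine(); // wc -l prints a single number
            process.waitFor();
            return Integer.parseInt(line.trim());
        }
    }

    public static void main(String[] args) throws Exception {
        // Illustrative path; in the real setup this points at the monolith's repository.
        System.out.println(countCommits(".", "src/main/java/Cargo.java"));
    }
}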

4.4.2 Contributors

Not all gathered information is used during the decomposition process; some data is needed to validate the suggested microservices. In Section 3.4 we found that, in order to validate a suggestion, we need to look at the team size reduction ratio. This ratio is calculated by looking at the number of unique people that contributed to a class. These unique contributors can be found in the GitHub repository. We execute the command git log --format='%aN' -- [fileName] to find the contributors of a file.

4.5 Classification

After all the information about the monolithic application has been gathered, we need to process this data. As stated before, the output of the bytecode injection for a use case is piped to a file. These files are read and parsed by the program. We also added the ability to temporarily pause the registration of events with the keywords STOPREADING and STARTREADING, which is useful when a certain part of a use case, such as setup or initialisation, does not need to be registered.

The information is loaded into UseCase objects. These objects contain information about how often a read or write occurred and about the size of the nanoentities. This information is used for the Content Volatility (CC-8) and Storage Similarity (CC-11) criteria. These CC use three categories: rarely/tiny, normal and often/huge. To split the nanoentities into the correct category we make use of the Jenks (Fishers) natural breaks classification method.

4.5.1 Jenks (Fishers) natural breaks classification method

To find the categories for the nanoentities, we need to find intervals in the data. This is done with the help of the Jenks (Fishers) natural breaks classification method. The algorithm, with proven validity and complexity, can find 'breaks' in data, with the ability to specify how many breaks it has to find; it can be seen as the k-means algorithm for one-dimensional data. Visual Paradigm [54] describes the algorithm as the "classification of an array of n numeric values into k classes such that the sum of the squared deviations from the class means is minimal." The mathematical validity and the proof of the O(k × n × log(n)) complexity can also be found on the website of Visual Paradigm [54]. The intervals found by the algorithm are used to classify the nanoentities into their respective categories. We did not implement the algorithm ourselves, since multiple open-source implementations can be found online; we made use of the Java port by Philipp Schoepf [55] of the original implementation in C by Maarten Hilferink [56].

After the Jenks method has found the intervals in the data, we map the original data onto these intervals and categorise the nanoentities, as illustrated below. The same approach is used for the Structural Volatility (CC-4) criterion, where the input data is the number of commits per nanoentity.
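The sketch below illustrates this mapping for the Structural Volatility case, using made-up commit counts and break values rather than output of the actual Jenks implementation.

import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch: map raw values (commits per nanoentity) to categories using
// break values as a natural-breaks run might produce them. All numbers are illustrative.
public class VolatilityClassifier {

    static String classify(int commits, int firstBreak, int secondBreak) {
        if (commits <= firstBreak) return "rarely";
        if (commits <= secondBreak) return "normal";
        return "often";
    }

    public static void main(String[] args) {
        Map<String, Integer> commitsPerNanoentity = new LinkedHashMap<>();
        commitsPerNanoentity.put("Cargo.trackingId", 3);
        commitsPerNanoentity.put("Cargo.routeSpecification", 14);
        commitsPerNanoentity.put("HandlingEvent.type", 41);

        // Suppose the Jenks run on the commit counts returned breaks at 5 and 20.
        commitsPerNanoentity.forEach((nanoentity, commits) ->
                System.out.println(nanoentity + " -> " + classify(commits, 5, 20)));
    }
}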

The program can now output the information into the user representations.JSON file that can be read by the Service Cutter. An example of such a JSON output can be seen in Listing B.2.

4.6 Validation

In this section, we will go over the metrics and guidelines that will be used to validate the suggested microservices.

4.6.1 Metrics

During the validation of our suggested services, we need to be as objective as possible. The optimal way of validating would be to actually develop the suggested services, launch them in a cloud environment and test them. These tests could be unit tests, to check that the code still functions as it should, and chaos tests (Section 2.3.5), to find possible weak points in the network. However, this research focuses on the design aspect of the migration process, and this type of validation is out of scope. Therefore, we need to work with metrics and guidelines that can be applied without actually running code.


In the literature study on validating microservices (Section 3.4), we found that metrics are scarce in the design process of microservices. Mazlami et al. introduced the Team Size Reduction metric to objectively measure one of the main benefits of microservices. This metric can be calculated with the help of the GitHub repository and will be used during the validation. We also adapted the Average Domain Redundancy metric slightly to incorporate it into our project: Mazlami et al. used semantic coupling to find the bounded contexts of the domain-driven design, whereas our implementation works with use cases. The metric is adapted to work with the number of duplicate lines to find redundancies in the domain.

In the literature study on the size of a microservice, we found that the size of a microservice cannot be described in terms of LOC. However, the LOC does say something about the quality of the suggestions. For example, if 90% of the code ends up in one microservice, a system architect has to assess whether this is correct. Therefore, we count the LOC to find very large and very small microservices. This does not mean that these microservices are inherently bad, but they do require manual validation. This helps the system architect find possible weak spots in the suggested services more easily.

We stated in Section 3.4 that the questionnaire used by the Service Cutter is not an objective way to validate the suggested services. It should, just like the LOC, help the system architect focus on possible flaws in the decomposition solution. We will recommend and use the decomposition questionnaire as a guideline for the system architect and not as an objective metric.

4.6.2 Team Size Reduction

According to Mazlami et al. [6], the formula for the Team Size Reduction (tsr) is as follows:

tsr = \frac{\text{Contributors}_{\text{Microservice}}}{\text{Contributors}_{\text{Monolith}}}

If we look at the overview of the architecture in Figure 4.1, we see that the GitHub analyser outputs two files: commits.txt and contributors.txt. For this metric we need contributors.txt; in this file, each nanoentity lists the unique names of the contributors that made changes to it. The contributors file is used to build the set of unique contributors of all the nanoentities in a suggested service, and the same is done for the monolith as a whole. The tsr ratio can then easily be calculated, as sketched below. The output of the metric lies in 0 < tsr ≤ 1. There is no guideline for when a tsr value is good or bad; the metric gives an indication of what the team size will be after the monolith has been split up.
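The sketch below illustrates the calculation with made-up contributor names; in the real implementation the sets are read from contributors.txt.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of the tsr calculation; contributor names are illustrative.
public class TeamSizeReduction {

    static double tsr(List<Set<String>> contributorsPerNanoentityInService,
                      Set<String> contributorsOfMonolith) {
        Set<String> serviceContributors = new HashSet<>();
        contributorsPerNanoentityInService.forEach(serviceContributors::addAll);
        return (double) serviceContributors.size() / contributorsOfMonolith.size();
    }

    public static void main(String[] args) {
        Set<String> monolith = new HashSet<>(List.of("alice", "bob", "carol", "dave"));
        List<Set<String>> service = List.of(
                new HashSet<>(List.of("alice", "bob")),
                new HashSet<>(List.of("bob")));
        System.out.println(tsr(service, monolith)); // 2 unique contributors / 4 = 0.5
    }
}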

4.6.3 Average Domain Redundancy

The Single Responsibility Principle (Section 3.1) states that each microservice should handle one Bounded Context and that code duplication between the services should be as small as possible. The Average Domain Redundancy (adr) objectively measures adherence to the Single Responsibility Principle. The adr is calculated with the following formula:

adr = \frac{\sum_{i \neq j} \text{sim}(S_i, S_j)}{\text{total lines of code}}

where sim(S_i, S_j) denotes the number of duplicated lines between the suggested services S_i and S_j.

The adr expresses how much redundancy there is between the bounded contexts in the domain, and thus how well the nanoentities are grouped. If the adr is low, the amount of duplicate code is low and the nanoentities are grouped correctly. Conversely, if the adr is high, there is much redundancy and the Single Responsibility Principle is not applied correctly across the microservices.
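As a small illustration with made-up numbers, the sketch below evaluates the formula for three suggested services, where sim[i][j] holds the duplicated lines between services i and j.

// Hedged sketch of the adr computation with illustrative numbers.
public class AverageDomainRedundancy {

    public static void main(String[] args) {
        // sim[i][j]: duplicated lines between suggested services i and j (i != j).
        int[][] sim = {
                {0, 12, 3},
                {12, 0, 0},
                {3, 0, 0}
        };
        int totalLinesOfCode = 1500;

        int duplicated = 0;
        for (int i = 0; i < sim.length; i++) {
            for (int j = 0; j < sim.length; j++) {
                if (i != j) {
                    duplicated += sim[i][j];
                }
            }
        }
        // adr = sum of pairwise duplicated lines / total lines of code
        System.out.println("adr = " + (double) duplicated / totalLinesOfCode); // 30 / 1500 = 0.02
    }
}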


We made use of the Duplo duplicate finder [57] to find the duplicates. The block size for duplicates is set to three lines. We chose three lines, instead of the default of four, to include getters and setters in the adr; these small functions are an essential element of the Java language.

4.6.4 Lines of Code

The lines of code (LOC) will be used to find abnormalities in the suggested services, for example a service with an extremely large or small amount of LOC. The system architect needs to check this result to assess whether the LOC of the suggested service is reasonable.

We make use of the open-source project CLOC [58] to count the LOC. CLOC is a good fit for our implementation as it is a portable, lightweight and easy-to-execute tool. With the command ./cloc [fileName], CLOC counts the number of lines. A small downside of CLOC is that the program gives more information than our implementation needs, see Listing 4.2, and there is no option to output just the LOC. Therefore we had to parse the output to extract the LOC; in the case of Listing 4.2 this would be 157.

1 text file.
1 unique file.
0 files ignored.

github.com/AlDanial/cloc v 1.76  T=0.02 s (52.0 files/s, 9621.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                             1             23              5            157
-------------------------------------------------------------------------------

Listing 4.2: Example output of cloc
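One possible way to extract the LOC from this output is to take the last column of the language row, as sketched below; the parsing is tailored to the single-file case of Listing 4.2.

// Hedged sketch: pull the "code" column out of cloc's text output.
public class ClocOutputParser {

    static int parseLoc(String clocOutput) {
        for (String line : clocOutput.split("\n")) {
            if (line.startsWith("Java")) {
                String[] columns = line.trim().split("\\s+");
                // Columns: Language, files, blank, comment, code -> the last one is the LOC.
                return Integer.parseInt(columns[columns.length - 1]);
            }
        }
        return -1; // no language row found
    }

    public static void main(String[] args) {
        // Mirrors the Java row of Listing 4.2.
        String output = "Java                             1             23              5            157";
        System.out.println(parseLoc(output)); // 157
    }
}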

4.6.5 Decomposition Questionnaire

The decomposition questionnaire [42] of the Service Cutter is as follows:

1. Does the service cut comply with all constraint criteria?

2. Does the service cut combine as few nanoentities with diverging characteristics into one service as possible?

3. Does each service depend on as few nanoentities of other services as possible? A use case should cross as few service boundaries as possible.

4. Are the nanoentities that are part of a published language between services suitable for inter-service communication?

5. Is the coupling between services similar? It is not the size of the services that requires homogeneity within the system, but the amount of published language between services.

6. Are there not too many services? This is called the nanoservice antipattern (REF).

7. Are there not too few services? This would be a monolithic architecture.

All questions should be answered with 'Yes'. If this is the case, the performed cuts are correct according to this guideline. This does not mean the services are perfect, but it gives an indication of the quality of the cuts that led to the suggested services.


4.6.6 Generating java files

The Team Size Reduction metric and the decomposition questionnaire can be applied without actual code; only the information about the nanoentities is needed. The other metrics do need actual code and thus require the code files of the suggested services. These code files need to be generated, and we make use of a Java Parser to do this.

The Java Parser [59] is used to create a symbol table of the original monolithic program. In a symbol table, every reference to a variable is stored. This table helps to find which variables in a method map to which nanoentity. If we examine the symbol table for every method, we can generate a list of methods in which a nanoentity is used. When this information is combined with the result of the decomposition tool, we can construct a code file for each suggested service.
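A simplified sketch of this analysis is shown below; it uses the JavaParser API without the symbol-solving setup of the real implementation, only listing which names each method references, and the file path is illustrative.

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.expr.NameExpr;

import java.io.File;
import java.util.List;
import java.util.stream.Collectors;

// Hedged sketch: list, per method, which names (candidate nanoentities) it references.
public class NanoentityUsageSketch {

    public static void main(String[] args) throws Exception {
        CompilationUnit cu = StaticJavaParser.parse(new File("src/main/java/Cargo.java"));
        for (MethodDeclaration method : cu.findAll(MethodDeclaration.class)) {
            List<String> referencedNames = method.findAll(NameExpr.class).stream()
                    .map(NameExpr::getNameAsString)
                    .distinct()
                    .collect(Collectors.toList());
            System.out.println(method.getNameAsString() + " uses " + referencedNames);
        }
    }
}

The real implementation additionally resolves each name to its declaring class, so that the usage list can be expressed in terms of nanoentities rather than local names.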

This results in a Java file that resembles the suggested service. However, this file cannot be compiled, as it is not working code: getters and/or setters would have to be created for nanoentities that live in other services, and a communication layer would be needed to make the code operational. Although the code is not compilable, we can use it to validate the service, as it resembles the majority of the code base of the suggested service.
