
“Component Evolution Management in Maintenance Tools for Content Management Systems”

Panagiotis Stravopodis

pstrav@hotmail.com
Summer 2015, 68 pages

Supervisor: Dr. Magiel Bruntink, m.bruntink@uva.nl
Host organisation: Byte Internet B.V., http://www.byte.nl

Universiteit van Amsterdam
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Master Software Engineering


Contents

1 Introduction
1.1 Introducing Content Management Systems
1.2 Problem Statement & Motivation
1.3 Research Questions
1.4 Research Methodology

2 Background
2.1 Components & Packages
2.1.1 Components
2.1.2 Packages
2.2 Package Managers
2.3 Dependencies
2.3.1 Software Packages, Components & Dependencies
2.3.2 Types of Dependencies
2.3.3 Formalizing Dependencies
2.4 Dependency Resolution in Linux Package Managers
2.4.1 Aptitude
2.4.2 APT and comparison with Aptitude
2.5 Content Management Systems Structure
2.5.1 Core System Structure
2.5.2 Requirements
2.5.3 Extensions - Components

3 Bridging package managers in the context of Content Management Systems
3.1 Content management systems as applications
3.2 Dependencies on Content Management Systems
3.2.1 Pre-Depends
3.2.2 Depends
3.2.3 Conflicts
3.3 Current Solutions for CMS monitoring/upgrade

4 The Prototype Maintenance Tool
4.1 First Installation
4.2 CMS System Discovery & Listing
4.2.1 Tool Decomposition
4.2.2 Retrieve Version(s) Information
4.2.4 Scanning Implementation
4.2.5 CMS Tools API (PINO)
4.2.6 Servicepanel - CMS Overview
4.3 Provide Upgrade Suggestions
4.3.1 Determine Dependencies
4.3.2 Find Possible Upgrade Paths
4.3.3 Determine Additional Dependencies
4.4 Support Future Versions

5 Evaluation
5.1 Functionality & Usability
5.2 Correctness, Completeness & Assessibility
5.2.1 First Installation & Cms Overview
5.2.2 Upgrade Paths Validation
5.2.3 Algorithm Verification
5.3 Performance
5.4 Threats to validity
5.4.1 Implementation-related threats
5.4.2 Research approach threats

6 Conclusion
6.1 Brief Summary
6.2 Discussion
6.3 Future Work
6.3.1 Component dependency prediction
6.3.2 Complete upgrade
6.3.3 Dependency resolution algorithm enhancement & validation

Bibliography

Acknowledgements

A Installer (Installabot)

B File Version Retrieval

C CMS Overview


Chapter 1

Introduction

1.1 Introducing Content Management Systems

Content Management Systems (CMS) are used by millions of users, and their adoption keeps growing. They offer a variety of advantages, such as a central instance of content that enables reuse when needed, and a shorter delivery time for new publications (entries).

The main advantage of a CMS is arguably the ease of use and maintenance that the user enjoys. According to Bob Boiko [1], “content management is about gaining control over the creation and distribution of functionality. Is a process of getting organised about creating your publication”.

Content Management Systems require neither extensive technical expertise for their maintenance nor knowledge of programming languages. The administrator panel is web-based, with strict user privileges, and once the user becomes familiar with the web interface, the development process is fast.

Byte, the host organisation, is a major web hosting provider in the Netherlands, with a focus on professionals and especially on hosting CMS systems (Magento, Joomla, Drupal and WordPress). In order to provide customers with the best user experience, a new one-click tool for installing and monitoring CMS systems is required. The tool will be accessible through the service panel (administrator panel) that Byte provides to its customers.

1.2 Problem Statement & Motivation

Problem Statement The problem that we are focusing on is known as the “Installability Problem” [2]. Given a user request to install a software package, the system must be able to handle the request, find and solve any dependencies that the package introduces, and finally perform the required operation. A software component is a bundle of files that are about to be installed, configuration rules that will be executed during deployment, and metadata describing the component's expectations. Thus, we have to take some basic principles into consideration before proceeding to the installation or upgrade of a component. To begin with, components have expectations when they are deployed, as they require other components in order to function properly. When a component requires another component in order to be fully functional, we have a dependency between the two components (the first component depends on the second one). Dependencies may also include details of packages that are incompatible with the package we are trying to install and should not be present when the package is deployed. CMS systems are web applications with strong and loose dependencies, not only between the core components of the application and the core components of the system hosting the application, but also between components (or extensions) installed specifically to enable extra features. Currently, there are no package managers optimised for web applications (see footnote 2), but the package managers used in Linux distributions can be examined to decide whether their basic principles can be adapted for web applications.

Motivation CMS systems are becoming more and more popular in the web community. Upgrades from outdated versions, regardless of the CMS type, often lead to corrupted applications that are no longer accessible. In particular, a corrupted or unsupported database may lead to complete failure of the application. Protecting the property of a user is one of the goals of a web hosting company. The ease of use that current Content Management Systems offer is their key strength: a user without expert knowledge of web technologies is able to install, maintain and use a fully functional website for personal or commercial purposes. Especially when a CMS is used for commercial purposes, it is of paramount importance to ensure that the system will not become unavailable because of an unsuccessful upgrade.

1.3 Research Questions

The questions that we will try to answer during the research part are:

• Can we apply principles (existing solutions) from known package management systems to a custom solution for web applications?

1. What particular dependencies exist in Content Management Systems and how can we resolve them?

1.4 Research Methodology

Before answering the research questions, an extensive literature study on package managers, and especially on the dependency solving capabilities they expose, is required. The steps we followed in order to answer the research questions are the following:

Examined Current Implementations: The first step was to examine current implementations (package managers) that deal with dependency solving and to understand the basic concepts they use.

Designed tool for the first installation: After thoroughly investigating the domain of Content Management System installers, we designed a tool that performs the initial installation of a CMS.

Designed tool for existing installations: When systems are already installed, the tool is able to track the installed systems, identify the current versions, check for available updates, notify users of required actions and finally suggest an upgrade path for the user to perform the upgrade.

Footnote 2: There are some package managers in the web-applications field, but they mainly deal with small libraries (e.g. JavaScript

Prototype solution: We implemented the tool that is able to perform all described actions in CMS systems and integrated it in the Service Panel of Byte. A part of the tool was also distributed as a pilot in production.

Evaluation: We assessed the tool against several criteria and real use cases. As illustrated in the evaluation chapter, a part of the tool was used as a pilot and real data was collected.


Chapter 2

Background

The aim of this chapter is to set the context and the basic principles on which the research part and the maintenance tool are based.

2.1 Components & Packages

2.1.1 Components

Although there are no specific criteria to determine what can and cannot be considered a software component, some widely accepted definitions are used. For example, Szyperski et al. [3] state that: A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties. According to Clemens Szyperski and David Messerschmitt, a software entity can be considered a component when it:

• Supports parallel execution (with other software entities).

• Is not context-specific (can be exchanged).

• Can be composed with other components.

• Supports encapsulation (access only through interfaces while implementation decisions are hidden).

• Can be considered as an independent deployment and versioning unit.

Another commonly found definition of components was outlined by Meyer [4]:

A component is a software element (modular unit) satisfying the following conditions:

1. It can be used by other software elements, its clients.

2. It possesses an official usage description, which is sufficient for a client author to use it.

3. It is not tied to any fixed set of clients.

As we mentioned before, a software component is a bundle of files that are about to be installed, configuration rules that will be executed during deployment, and metadata describing the component's expectations. A more abstract view of what a software component is can be drawn by digging into the semantics of software components. Software components are software units with provided services and required services. Provided services can be considered tasks that are performed by the component, while required services are services used by the component to produce its provided services. Interfaces of components are very important, as they consist of the specifications of both provided and required services and declare the relation between the two. A detailed dependency specification requires matching each required service to a conforming provided service.

Remark: A component may have multiple interfaces, each one serving a different set of services. It is common, though, to treat a component as having a single interface that collects all those distinct interfaces.

2.1.2 Packages

According to Stefano Zacchiroli et al., packages are abstractions defining the granularity at which users can act (add, remove, upgrade, etc.) on available software components [5]. A distribution is a collection of packages that are maintained together, and the subset of the distribution that corresponds to the packages actually installed on a system is called the package status. Package managers are able to alter this status according to the user request. A package is mainly composed of three parts:

Set of files: Describes the file system encoding of the package content (executable binaries, documentation, data, etc.). Configuration files, which determine the runtime behaviour of the package once it is deployed on a system, are part of this set of files. Configuration files should be handled with extreme care, as they contain local adjustments made for the specific system we want to deploy on.

Set of valued meta-information: Contains general package information (e.g. name of the package, software version, description) but also the declaration of inter-package relationships. Dependencies on other packages, conflicts and feature provisions are detailed in the meta-information.

Executable configuration scripts: These scripts are regular programs performing all allowed actions on a package. The scripts are able to customize parts of the package installation, so their functionality cannot be exported or enhanced at file level, since there is logic strictly connected to them. Finally, those scripts are designed to provide alternatives or fixes if an installation cannot be performed or if an error occurs.

2.2 Package Managers

Package managers are a distinct form of component managers and are quite popular in modern computer systems, especially for their efficient handling of complex installations where many different and intricate packages are required.

Package managers have some basic and absolutely required functions to perform:

• Retrieve data from remote repositories and verify the required components.

• Plan the upgrade procedure (specify paths) after taking into consideration the deployment restrictions (dependencies).

• Perform the upgrade: install or remove components in the right order and verify the result (abort if errors occur).

There are several aspects to consider when assessing the quality of a package manager. A package manager must create plans that are valid (do not violate restrictions) and can be fully completed, must be able to handle repository growth (performance-wise), and must enable the user to declare preferences on the configuration of the components.

2.3 Dependencies

This section focuses on the formalization of packages and components and on the introduction of the term dependency.

2.3.1 Software Packages, Components & Dependencies

As we mentioned above, according to Zacchiroli et al. [5], a software component is a bundle of files that will be installed on the system, configuration logic necessary for the deployment, and metadata which describe the component's expectations. For the purpose of this thesis and the scope of the research, we will examine the metadata, as they are used by every package manager when creating upgrade paths. The usual form of metadata includes:

Name: Used as the identifier of a component; it usually does not change over time (or across different releases).

Version: Indicates the specific release of a component and is closely related to the name of the component.

Dependencies: A collection of other components essential for the installation and functionality of the component.

Conflicts: Describes components that cannot be installed along with the component that is about to be configured.

Features: Describes the names of virtual components that a component may provide. They can be used by different components to satisfy dependencies (explained in detail in the next part), but they are not allowed to conflict with other components.

A valuable approach to describing and understanding components and software packages in depth is to formalize their definitions and use formal specification for their description. Thus we define:

Package, unit: A package is a pair $(u, v)$ where $u$ is a unit and $v$ is a version. Units are arbitrary strings, while versions are non-negative integers.

Repository: A repository is a tuple $R = (P, D, C)$ where $P$ is a set of packages, $D : P \to \mathcal{P}(\mathcal{P}(P))$ is the function that describes the dependencies and $C \subseteq P \times P$ represents the conflict relation. With $\mathcal{P}(X)$ we refer to the set of subsets of $X$. The following conditions must be satisfied by the repository:

• The relation $C$ must be symmetric: $(\pi_1, \pi_2) \in C$ if and only if $(\pi_2, \pi_1) \in C$ for all $\pi_1, \pi_2 \in P$.

• Two packages that have the same unit but different versions should always conflict: if $\pi_1 = (u, v_1)$ and $\pi_2 = (u, v_2)$ with $v_1 \neq v_2$, then $(\pi_1, \pi_2) \in C$.

Given a repository $R = (P, D, C)$, we are able to describe the dependencies of each package $p$ with a set of sets of packages, expressed as $D(p) = \{d_1, \ldots, d_k\}$. Assuming that $p$ is the package that we want to install, all $k$ dependencies must be satisfied before proceeding. In order to satisfy a dependency $d_i$, at least one of the packages in $d_i$ must be available. When even one of the $d_i$ is the empty set, the package $p$ cannot be installed, as that dependency cannot be satisfied.
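The repository definition can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function names and the example packages are hypothetical, and the thesis does not prescribe this representation. It builds R = (P, D, C), enforces the two repository conditions (symmetry of C and conflicts between versions of the same unit), and detects an empty dependency set.

```python
from itertools import combinations


def make_repository(packages, depends, conflicts):
    """Build R = (P, D, C) from (unit, version) pairs.

    C is closed under symmetry, and two versions of the same unit
    are always made to conflict, as the definition requires.
    """
    P = set(packages)
    C = set()
    for p, q in conflicts:
        C.add((p, q))
        C.add((q, p))
    for p, q in combinations(P, 2):
        if p[0] == q[0] and p[1] != q[1]:
            C.add((p, q))
            C.add((q, p))
    # D maps every package to its list of alternative sets d_1, ..., d_k.
    D = {p: [frozenset(d) for d in depends.get(p, [])] for p in P}
    return P, D, C


def has_unsatisfiable_dependency(D, package):
    """True if some d_i is empty, so the package can never be installed."""
    return any(len(d) == 0 for d in D[package])


# Hypothetical example data, chosen only for illustration.
cms = ("cms", 2)
old_cms = ("cms", 1)
php = ("php", 5)
broken = ("broken", 1)
P, D, C = make_repository(
    packages=[cms, old_cms, php, broken],
    depends={cms: [{php}], broken: [set()]},
    conflicts=[],
)
```

Here the two versions of `cms` end up in C in both orders even though no conflict was declared explicitly, and `broken` is flagged because one of its dependency sets is empty.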

It is essential at this point to introduce the different types of dependencies before proceeding further.

2.3.2 Types of Dependencies

In order to examine dependencies in packages and how package managers resolve them more closely, an extensive study of current package managers in Linux systems was required. Two of the most popular package managers available are dpkg and rpm. Dpkg is the low-level package manager in Debian-based systems (e.g. Ubuntu, MEPIS), while RPM is the low-level package manager for RPM-based distributions (e.g. Fedora, Red Hat Linux). We will focus mainly on dpkg in this thesis, due to its popularity and the current systems of the host organisation (Byte B.V.). Another important reason is that DEB (derived from Debian) packages are easier to manipulate than RPM packages, as DEB packages are produced with standard tools (e.g. tar) and use a plain textual representation for their metadata. RPM packages, on the other hand, are encoded in a specific binary format. As a result, it is harder to use them for research purposes. A further reason is that the RPM package manager does not resolve dependencies by itself, and special front-end tools (e.g. yum) are required in order to perform an update successfully. Since dpkg is low-level as well, two main front-end (or high-level) tools were developed: APT and Aptitude. In the following sections, we will illustrate in detail how those tools resolve dependencies.

Adopting the dpkg terminology for relationships between packages, the dependency types found in Debian systems are:

Depends: Describes an absolute dependency. When a package manager encounters a “Depends” field, the configuration of the package will continue only if all the packages included in this field are successfully configured.

Pre-Depends: Similar to the “Depends” field, but the installation of the new package is initiated only if all requirements in the field are already met. Dpkg must first complete the installation of all the required packages (or ensure that their configuration is done simultaneously with the current installation), otherwise the installation is not even started. The main difference from Depends is that Pre-Depends does not allow unsatisfied circular dependencies. When the package manager reaches a circular dependency while trying to perform a package configuration, the installation is cancelled.

Recommends: This type describes a strong but not absolute dependency: the packages listed in this field should be installed together with the package that the package manager is currently configuring, but under some unusual circumstances they can be missing.

Suggests: This type indicates that the package we want to configure can be more effective when combined with one or more other packages. It is valid, though, to skip the suggestions and install only the package that was initially requested.


Enhances: Similar to “Suggests”, but this type describes specifically that one package can enhance the functionality of another package.

Breaks: When a package declares that it breaks another one, the package manager refuses to proceed and unpack this package unless the broken package is first uninstalled. An exception is made for packages stating that they break themselves, as this is not a real dependency that may cause a break.

Conflicts: This type of dependency is encountered when a package states that it conflicts with another package; as a result, dpkg does not allow both packages to be unpacked at the same time. If one package is to be unpacked, the package causing the conflict must be removed first. The mere presence of configuration files of a package cannot create a conflict; only packages that are at least “Half-Installed” can cause one. When a package declares a conflict with itself or with a virtual package [26] that it provides, the package manager will not stop the installation because of the self-conflict; the conflict still applies to other packages, until an alternative is found. A virtual package is a name that groups packages offering similar functionality. It exists only as a name in the metadata (there is no real package behind it), hence the term virtual. For example, all browsers capable of browsing HTML files (e.g. Google Chrome, Mozilla Firefox) belong to the www-browser virtual package.

Provides: When a package is about to be configured, a virtual package may be required in order for the package to be fully functional (e.g. a mail client may be required). In this case, it is not important which specific package satisfies the dependency, only that some package with the matching “Provides” field is present. When both concrete packages and virtual packages are mentioned, the package manager will first try to satisfy the dependency with the concrete package and, if that is not successful, virtual packages will be used.

Replaces: Under some circumstances, a package may need to replace files of other packages. At first sight this seems an unwanted situation. Sometimes, though, it is essential to replace files in order to ensure proper functionality of the package we are configuring. The “Replaces” field also affects conflict resolution, as it can indicate which package the package manager should try to remove first in order to resolve the dependency.

A package is a binary bundle that contains a component, data required by the component to function properly and some metadata that describe properties and requirements of the package. Dependencies are properties of the package and are therefore declared in the metadata as well. Let us have a look at a typical snippet from the dependency statement of a Debian package:

Package: firefox
Version: 1.5.dfsg+1.5.0.1-2
...
Depends: fontconfig, psmisc, libatk1.0-0 (>= 1.9.0), libc6 (>= 2.3.5-1) ...
Suggests: xprint, firefox-gnome-support (= 1.5.dfsg+1.5.0.1-2), latex-xft-fonts
Conflicts: mozilla-firefox (<< 1.5.dfsg-1)
Replaces: mozilla-firefox
Provides: www-browser
...

The labels used to describe the properties and requirements of the package are fairly straightforward. The package name is firefox and the major version number is 1.5. The package depends on the packages fontconfig, psmisc, libatk1.0-0 and libc6; while there is no version specification (or restriction) for the first two packages, libatk1.0-0 must be at version 1.9.0 or greater and libc6 at version 2.3.5-1 or greater. In Debian packages, version requirements are expressed with the well-known comparison operators from mathematics. For example, the requirement that libatk1.0-0 be at version 1.9.0 or greater is expressed with the greater-or-equal operator (>=). The symbols used by the Debian package manager to indicate version relationships are <<, <=, =, >= and >>; from left to right they represent strictly earlier, earlier or equal, exactly equal, later or equal and strictly later. In some cases, two or more packages that provide the same functionality but are otherwise unrelated can satisfy a dependency. Debian uses the pipe symbol (|) to declare alternatives among specific packages. Assuming that we want to install package P, which depends on either package a or package b (the presence of either package satisfies the requirement), the dependency is expressed as:

Package: P
Depends: a | b
...
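To make the syntax tangible, the following sketch parses the value of a Depends field into a conjunction of alternative groups. It is an illustrative assumption rather than the parser used by dpkg or by the prototype: the function name `parse_depends` is hypothetical, and the regular expression covers only the forms shown above (a name, optionally followed by a relation and a version in parentheses).

```python
import re

# One dependency item: a package name, optionally followed by
# "(relation version)" using one of the five Debian relation symbols.
_ITEM = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)"
    r"(?:\s*\(\s*(?P<rel><<|<=|>=|>>|=)\s*"
    r"(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_depends(field):
    """Return a list of alternative groups.

    Commas separate requirements that must all hold; pipes separate
    alternatives within one requirement. Each alternative becomes a
    (name, relation, version) triple, with None where unrestricted.
    """
    groups = []
    for clause in field.split(","):
        alternatives = []
        for item in clause.split("|"):
            match = _ITEM.match(item)
            if match is None:
                raise ValueError(f"cannot parse dependency: {item!r}")
            alternatives.append(
                (match["name"], match["rel"], match["version"])
            )
        groups.append(alternatives)
    return groups


parsed = parse_depends("fontconfig, psmisc, libatk1.0-0 (>= 1.9.0), a | b")
```

Each inner list corresponds to one set of alternatives, so the result has the same shape as the formal dependency function D introduced earlier. Comparing version strings correctly needs the full Debian ordering rules, which this sketch deliberately leaves out.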

Remark: Dependencies may sometimes be more generic, in the sense that a package may require any package performing a certain task (e.g. a web browser, indicated as www-browser). In that case, the “Depends” field will include a virtual package that can satisfy the requirement. Reusing packages that are already installed is a smart way for the package manager to reduce the number of dependencies that must be resolved by installing or removing additional packages.

Relations between packages are expressed in a similar way in both Debian-based and RPM-based systems, by using package metadata. Although there are some system-specific properties, we can distinguish three “universal” relations (dependency types) in both architectures, as the semantics are the same even though the naming may differ slightly:

• Depends (Debian), Requires (RPM): Describes the additional packages that must be present in order for the package being configured to be fully functional.

• Conflicts (Debian, RPM): Describes the packages that cannot be installed in the system at the same time as the package that the package manager is trying to configure. For a successful installation, no conflicts should be present.

• Pre-Depends (Debian), PreReq (RPM): Describes the packages that must already be installed in the system before the deployment of the new package can start.

2.3.3 Formalizing Dependencies

Having specified the essential context in the domain of dependencies, we are able to understand the requirements and express them in a formal way. Zacchiroli et al. [5] describe dependency solving as a demanding aspect that requires extra caution. As a result, the best approach is to identify and settle dependency solving as a separate concern.

Indeed, the package managers that we examined in the previous section deal with dependencies as an isolated factor. The rpm package manager in particular does not resolve dependencies by itself, while dpkg is able to solve simple low-level dependencies. However, for complicated package installations and demanding dependency solving, higher-level tools like APT and Aptitude are required. Before examining the way that APT and Aptitude approach dependency solving, we should express the

Remark: We follow the concept of Mancinelli et al. [2], who claim that Pre-Depends relationships can be considered as Depends relationships when we examine the repository from the distribution editor's side rather than for a package installation. Therefore, we assume that Pre-Depends relationships are equivalent to Depends in this case.

Recalling the subsection Software Packages, Components & Dependencies, let $R = (P, D, C)$ be the repository, with $P = \{a, b, c, d, e, f, g, h, i, j\}$ a set of packages, $D$ the dependency function and $C$ the conflict relation. As we mentioned, the dependencies of each package are described as a set of subsets. For example, the dependencies of package $a$ may be expressed as $D(a) = \{\{b\}, \{c, d\}, \{d, e\}, \{d, f\}\}$: package $a$ requires package $b$, either $c$ or $d$, either $d$ or $e$, and either $d$ or $f$. Respectively, conflicts may be expressed as $C = \{(a, b), (b, a), (d, e), (e, d), (f, g), (g, f)\}$. Because the repository conditions require $C$ to be symmetric, each conflicting pair has to be included twice, once in each order: $(a, b)$ but also $(b, a)$.

We can express dependencies by using conjunctive and disjunctive forms. Assuming that $a$ is the target package that we are trying to install and $b_1, b_2, \ldots, b_s$ are the requirements of the package, $a$ can be described as

$$a \rightarrow b_1 \land b_2 \land \cdots \land b_s$$

On the other hand, if an installation prescribes more complex dependencies, a conjunction of disjunctions is required. Again, if $a$ is the package to be installed and the requirements describe alternatives (at least one of the alternatives should be satisfied), then the $i$-th requirement is expressed as $b_i^1 \lor \cdots \lor b_i^{r_i}$ with $1 \le i \le s$. This term is satisfied when at least one of the $b_i^j$ is satisfied ($1 \le j \le r_i$). As a result, package $a$ can be described as:

$$a \rightarrow (b_1^1 \lor \cdots \lor b_1^{r_1}) \land \cdots \land (b_s^1 \lor \cdots \lor b_s^{r_s})$$

Consequently, the dependencies of package $a$ will be:

$$D(a) = \{\{b_1^1, \ldots, b_1^{r_1}\}, \ldots, \{b_s^1, \ldots, b_s^{r_s}\}\}$$

In order for the dependency condition to be satisfiable, no term may be empty. If even one term is empty ($\emptyset \in D(a)$), $a$ cannot be satisfied. As far as the conflict relation is concerned, two packages $\pi_1 = (u_1, v_1)$ and $\pi_2 = (u_2, v_2)$ that belong to the set of packages $P$ ($\pi_1, \pi_2 \in P$) conflict when the pair also belongs to the relation $C$ ($(\pi_1, \pi_2) \in C$; the order is not important, as $C$ is symmetric).
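As a worked instance, the example dependency set $D(a)$ given above translates clause by clause into the conjunctive form, one disjunction per inner set:

```latex
D(a) = \{\{b\}, \{c, d\}, \{d, e\}, \{d, f\}\}
\quad\Longrightarrow\quad
a \rightarrow b \land (c \lor d) \land (d \lor e) \land (d \lor f)
```

Read together with the example conflict relation, this shows how the two parts interact: the first clause forces $b$ to be present whenever $a$ is, while $(a, b) \in C$ forbids exactly that combination. Taken literally, the example repository therefore does not allow $a$ to be installed, which is a useful reminder that dependencies and conflicts have to be checked jointly.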

The following definitions from Mancinelli et al. [2] are of great importance:

Installation Definition: An installation of a repository $R = (P, D, C)$ is a subset $I$ of $P$ that gives the set of packages currently installed on a system. For an installation to be considered valid, two conditions must hold:

• Abundance: for every $\pi \in I$ and every $d \in D(\pi)$, $I \cap d \neq \emptyset$. In simple words, every installed package must have all of its dependencies satisfied.

• Peace: no two installed packages conflict, i.e. $(I \times I) \cap C = \emptyset$.
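The two conditions translate almost literally into code. The sketch below is a minimal illustration, assuming packages are represented by plain strings and D by a dictionary of alternative sets; the toy repository is hypothetical.

```python
def is_valid_installation(installed, D, C):
    """Check abundance and peace for a candidate installation I."""
    I = set(installed)
    # Abundance: every dependency set of every installed package
    # shares at least one element with I.
    abundance = all(I & set(d) for p in I for d in D.get(p, []))
    # Peace: no pair of installed packages appears in C.
    peace = not any((p, q) in C for p in I for q in I)
    return abundance and peace


# Hypothetical toy repository: "cms" needs one of two PHP versions,
# and the two PHP versions conflict with each other.
D = {"cms": [{"php5", "php7"}]}
C = {("php5", "php7"), ("php7", "php5")}

ok = is_valid_installation({"cms", "php7"}, D, C)
missing_dep = is_valid_installation({"cms"}, D, C)
clash = is_valid_installation({"cms", "php5", "php7"}, D, C)
```

The first installation is valid, the second violates abundance (no PHP version is present) and the third violates peace (both PHP versions are present).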

Installability and co-installability: A package $\pi_1$ belonging to a repository $R$ is installable if a valid installation $I$ exists such that $\pi_1 \in I$. Similarly, a set of packages $Q$ of $R$ is co-installable if there is at least one valid installation $I$ such that $Q \subseteq I$. Although each member of a set $Q \subseteq P$ may be installable, this does not mean that $Q$ is also co-installable, as conflicts may appear. For example, if package $\pi_1$ depends on $a$, package $\pi_2$ depends on $b$, but $a$ conflicts with $b$, then $\{\pi_1, \pi_2\}$ is not co-installable, although $\pi_1$ and $\pi_2$ are each installable and do not conflict directly.
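The example can be checked mechanically with a brute-force search over all candidate installations. This is a sketch for tiny repositories only, since the search is exponential in the number of packages; the function names are hypothetical and the approach is not the one used by real package managers.

```python
from itertools import combinations


def is_valid(I, D, C):
    """Abundance and peace, as in the installation definition."""
    abundance = all(I & d for p in I for d in D.get(p, []))
    peace = not any((p, q) in C for p in I for q in I)
    return abundance and peace


def co_installable(Q, P, D, C):
    """Search every I with Q <= I <= P for a valid installation."""
    Q = frozenset(Q)
    rest = sorted(set(P) - Q)
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            if is_valid(Q | set(extra), D, C):
                return True
    return False


# The example from the text: p1 depends on a, p2 depends on b,
# and a conflicts with b.
P = {"p1", "p2", "a", "b"}
D = {"p1": [frozenset({"a"})], "p2": [frozenset({"b"})]}
C = {("a", "b"), ("b", "a")}

p1_alone = co_installable({"p1"}, P, D, C)
p2_alone = co_installable({"p2"}, P, D, C)
both = co_installable({"p1", "p2"}, P, D, C)
```

Installability of a single package is the special case where Q contains one element, which is why the same function covers both notions.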

Dependency closure: The dependency closure of a set of packages $Q$ of a repository $R$ is the smallest set of packages included in $R$ that contains $Q$ and is closed under the immediate dependency function $\overline{D} : \mathcal{P}(P) \to \mathcal{P}(P)$, defined as:

$$\overline{D}(Q) = \bigcup_{\substack{\pi \in Q \\ d \in D(\pi)}} d \qquad (2.1)$$

The dependency closure is written $\Delta(Q)$ (see footnote 3) and is given by:

$$\Delta(Q) = \bigcup_{n \ge 0} \overline{D}^{\,n}(Q) \qquad (2.2)$$

The dependency closure thus contains $Q$, all packages that are direct dependencies of $Q$, all packages that appear as direct dependencies of direct dependencies of $Q$, and so on.
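The closure can be computed by iterating the immediate dependency function until no new packages appear. The following is a minimal sketch with hypothetical function names and a hypothetical repository, again representing packages as strings.

```python
def immediate_dependencies(Q, D):
    """Union of every alternative set d in D(p), for every p in Q."""
    result = set()
    for p in Q:
        for d in D.get(p, []):
            result |= set(d)
    return result


def dependency_closure(Q, D):
    """Smallest superset of Q closed under immediate_dependencies."""
    closure = set(Q)
    frontier = set(Q)
    while frontier:
        # Only packages not seen before need to be expanded further.
        frontier = immediate_dependencies(frontier, D) - closure
        closure |= frontier
    return closure


# Hypothetical repository: a needs b and (c or d); c needs e.
D = {"a": [{"b"}, {"c", "d"}], "c": [{"e"}]}
closure = dependency_closure({"a"}, D)
```

Note that the closure includes every alternative (both c and d here), so it over-approximates what a concrete installation needs. Its value is that any question about installing Q can be answered by looking only at the packages in the closure.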

Trimmed Repository: A repository is considered trimmed if every package $\pi$ of $R$ is installable with respect to $R$. When a repository is not trimmed, it contains packages that cannot be installed in any configuration. Since these packages behave as if they were not part of the repository, either there is an issue with their metadata or they should not have been there in the first place.

2.4 Dependency Resolution in Linux Package Managers

Solving dependencies is a difficult, recurring problem. As already mentioned, dependency solving should be treated as a separate concern, the goal being to decouple the process of dependency solving from package managers and components. In order to provide sufficient input for the resolution of dependencies, a description of all information related to the install/upgrade problem is required, including:

Installed & available components: Information regarding all known components and their status (whether they are currently installed or not).

User request: Detailed list of components that are to be modified (installed, removed or upgraded), including version requirements.

User preferences: Explicit definition of the criteria by which a user chooses one solution among others.
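One possible way to bundle these three inputs is a small set of record types. The sketch below is purely illustrative: the class names, field names, preference strings and version numbers are assumptions made for this example and are not taken from any existing tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    installed: bool = False


@dataclass(frozen=True)
class Request:
    action: str                               # "install", "remove" or "upgrade"
    name: str
    version_constraint: Optional[str] = None  # e.g. ">= 3.4"


@dataclass
class ResolutionProblem:
    universe: List[Component]                 # installed and available components
    requests: List[Request]                   # what the user asked for
    # Hypothetical preference criteria, applied in order.
    preferences: List[str] = field(default_factory=list)

    def installed(self):
        """Components that are currently installed."""
        return [c for c in self.universe if c.installed]


problem = ResolutionProblem(
    universe=[
        Component("joomla", "3.3.6", installed=True),
        Component("joomla", "3.4.1"),
    ],
    requests=[Request("upgrade", "joomla")],
    preferences=["fewest removals", "newest versions"],
)
```

Keeping the problem description separate from the solver mirrors the separation of concerns argued for above: any resolution strategy can consume the same description.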

Both APT and Aptitude, are basically enforcing that principle, as both of those tools were mainly designed to address dependency resolution and use later dpkg to perform required package manipulations. Both tools are based on the apt suite4, a high-level library which enables C++ applications to determine what actions can be performed for a given set of installed packages. Those tools are able to perform several actions such as determining required packages, download and install them or remove others. All apt-based tools make use of algorithms that resolve recursively all unsatisfied dependencies till they are all satisfied. At the same time, the algorithms ensure that the installation process will not even be triggered in case the final state of the system might be inconsistent.

We gained better insight into the way APT and Aptitude resolve dependencies by studying their official repositories and examining the source code.

3Mancinelli et al.[2] prove that since the domain of D is a complete lattice and D is a continuous function, from


2.4.1 Aptitude

Aptitude5 uses two different algorithms to resolve dependencies:

Immediate Resolution: For each unsatisfied dependency, aptitude tries to resolve it immediately, following a “best-first” [6] approach and applying the following steps to each one:

1. For “Recommends” dependencies, aptitude determines whether it is a new dependency or a previously satisfied recommendation. To do so, aptitude checks for the presence of the recommended packages: if they are present, the dependency is considered satisfied; otherwise it is treated as a new dependency. A boolean option in the configuration file (APT::Install-Recommends) controls whether “Recommends” dependencies are taken into consideration at all.

2. If the dependency lists multiple alternative packages combined with the logical OR operator, each alternative is handled sequentially, in order of appearance.

3. For each alternative, aptitude tries to resolve the dependency. If the alternative is a conflict or is already installed, it is dropped. Otherwise, aptitude installs the candidate version of the current alternative, provided that it satisfies the dependency. If it does not, or if there is no candidate version6, aptitude tries to install the package with the highest priority (according to dpkg) whose candidate version provides the target required by the current alternative.

4. If a package was installed successfully in step 3, the algorithm is applied recursively to resolve all its dependencies and then terminates.
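The steps above can be illustrated with a small greedy resolver. This is a simplified sketch and not aptitude's actual implementation: the repository layout (a dictionary with `candidate`, `depends`, `recommends` and `conflicts` keys), the package names and the versions are assumptions, version constraints and virtual packages are ignored, and the sketch does not roll back a partially applied plan.

```python
def resolve_immediately(name, repo, installed, install_recommends=True):
    """Greedily install `name` and its dependencies; return True on success."""
    if name in installed:
        return True                           # already installed: nothing to do
    pkg = repo.get(name)
    if pkg is None:
        return False                          # no candidate version available
    if any(other in installed for other in pkg.get("conflicts", [])):
        return False                          # conflicting alternative is dropped
    installed[name] = pkg["candidate"]        # install the candidate version

    def satisfy(group):
        # A group is an OR-list of alternatives, tried in order of appearance.
        if any(alt in installed for alt in group):
            return True
        return any(
            resolve_immediately(alt, repo, installed, install_recommends)
            for alt in group
        )

    for group in pkg.get("depends", []):
        if not satisfy(group):
            return False                      # hard dependency cannot be met
    if install_recommends:                    # mirrors the Install-Recommends switch
        for group in pkg.get("recommends", []):
            satisfy(group)                    # best effort, failure is tolerated
    return True


# Hypothetical repository and system state.
REPO = {
    "joomla": {"candidate": "3.4.3",
               "depends": [["php"], ["mysql", "postgresql"]],
               "recommends": [["apache"]]},
    "php": {"candidate": "5.4"},
    "mysql": {"candidate": "5.5", "conflicts": ["mariadb"]},
    "postgresql": {"candidate": "9.1"},
    "apache": {"candidate": "2.4"},
}
installed = {"mariadb": "10.0"}
ok = resolve_immediately("joomla", REPO, installed)
```

In this example the first database alternative is skipped because it conflicts with an installed package, so the second alternative is chosen instead.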

Although this algorithm finds accurate solutions most of the time, it may be unable to resolve dependencies in the following cases:

• When a package is removed due to a “conflict” dependency but other installed packages depend on the removed package. Removing the package creates new unresolved dependencies, but the algorithm will not try to satisfy them.

• When version restrictions apply, the algorithm takes into consideration only the candidate versions. As a result, if the “depends” package is available in versions 1.0 and 1.5, the candidate version is 1.0, and the package we are trying to install declares that it requires at least version 1.5, the algorithm will be unable to provide a solution.

In general, when immediate resolution fails, interactive resolution is required.

Interactive Resolution: When the immediate dependency resolver cannot find a solution for a dependency, the interactive resolver is called. Aptitude uses the interactive resolver to help the user deal with complex dependencies by offering suggestions. The interactive resolver is nothing more than a manual resolver that helps the user address issues by asking them to approve or reject specific actions that are part of a possible solution. Aptitude will choose an approved action whenever it is available (possibly ignoring other actions), but it will never choose an action that the user rejected.

5Aptitude dependency resolution: https://www.debian.org/doc/manuals/aptitude/ch02s03s01.ja.html

6Candidate Version may be not present if the case is a virtual package or if the metadata describing the dependency


2.4.2 APT and comparison with Aptitude

As already mentioned, APT7 is a high-level package manager that works as a wrapper for dpkg. APT is the default tool in many distributions, in particular in all Debian-based ones. APT comes in two different versions: Apt-get and Dselect. Their main difference is that Dselect provides a graphical user interface for interacting with APT: the user selects packages through the graphical interface, while all actions are actually performed by APT. Thus, even though there appear to be two different flavours of APT, Dselect is just an added interface (or wrapper) to APT for an improved user experience. APT uses the same algorithmic approach as the “immediate resolution” (best-first search) described for aptitude.

While solving dependencies, both aptitude and APT use a priority queue of potential solutions. Based on heuristics specific to each tool, solutions are assessed and sorted by how suitable each one is for the given state of the problem. This process is repeated and, at each step, the best “partial solution” is removed from the queue. If that solution is “complete”, the algorithm terminates. The main difference between the tools lies in the selection of alternatives. There are two main factors that APT and aptitude handle differently:

Generation of successors: This refers to the next possible solutions. Although we could generate every possible change, this is not an option, as it would create an enormous branching factor8, leading to multiple changes for even a single package or to changes in packages that are out of scope but were considered potentially relevant at the beginning.

Heuristics for solution assessment: Even though we could use the unsatisfied dependencies, they do not provide any information about the order in which the different actions should be performed. Assume that the installation encounters two packages, package 1 and package 2, where package 1 depends on package 2 but package 2 is missing. The best choice would be to install package 2 instead of removing package 1. APT and aptitude use a scoring system to decide in such situations. The criteria for this scoring are the simplicity of the solution (reaching a solution fast) and the quality of the solution (whether it is a good and satisfactory solution).
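The priority-queue search and the package 1 / package 2 example can be sketched as follows. This is an illustrative best-first search, not the scoring system of APT or aptitude: the dependency table, the penalty values (installing is cheap, removing is expensive) and the restriction of successors to the first broken dependency are assumptions chosen to keep the branching factor small.

```python
import heapq
from itertools import count

# Hypothetical dependency table: package1 depends on package2.
DEPENDS = {"package1": ["package2"], "package2": []}


def broken(state):
    """List (package, missing dependency) pairs for the given installed set."""
    return [(p, d) for p in sorted(state)
            for d in DEPENDS.get(p, []) if d not in state]


def successors(state):
    """Generate successors for the first broken dependency only."""
    package, missing = broken(state)[0]
    if missing in DEPENDS:                       # the missing package is available
        yield "install " + missing, state | {missing}, 1
    yield "remove " + package, state - {package}, 5


def best_first(start):
    """Return the cheapest list of actions that leaves no broken dependency."""
    tie = count()                                # tie-breaker for equal scores
    queue = [(0, next(tie), frozenset(start), [])]
    seen = set()
    while queue:
        cost, _, state, actions = heapq.heappop(queue)   # best partial solution
        if not broken(state):
            return actions                       # the solution is complete
        if state in seen:
            continue
        seen.add(state)
        for action, next_state, penalty in successors(state):
            heapq.heappush(
                queue,
                (cost + penalty, next(tie), frozenset(next_state), actions + [action]),
            )
    return None


plan = best_first({"package1"})
```

With these penalties the search prefers installing package 2 over removing package 1, which matches the preferred outcome described above.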

Among Linux users, opinions differ on the effectiveness and dependency solving capabilities of the two tools. Under regular circumstances, both tools perform equally well. On Linux-related portals, users claim that APT is the more robust tool for dependency solving. Although similar claims appear in the literature, there is no concrete evidence in favour of one tool or the other. The only safe conclusion is that Aptitude is more user-friendly, as it provides a better user interface, but that does not mean that APT is hard to use.

2.5 Content Management Systems Structure

Content management systems are designed with a basic principle in mind: a monolithic architecture is not efficient for complex systems that manage different types of content. The main goal of a CMS is to provide a platform that enables the user to manage different types of content, such as multimedia and web pages, and that also offers easy user administration. The ultimate aim of a Content Management System is to be a completely distributed framework that not only handles content, but also provides administration services. For the purposes of this thesis, we focus on the Joomla! Content Management System, as it is considered one of the most popular systems on the market and Byte (the host organisation) is particularly interested in this project.

2.5.1 Core System Structure

Joomla! is one of the largest open-source Content Management System projects. It appeared in 2005 and is based on an older project called Mambo. The system became popular very quickly because of the framework approach on which it was developed and the ease of integrating extra features. After ten years in production, it is currently at version 3.4.3 and powers almost 3,000,000 websites. One of the most suitable ways to understand how Joomla! is structured is to examine the official MVC (Model-View-Controller) architectural view offered by the contributors of the project:

Figure 2.1: Joomla! MVC View

MVC is a design pattern widely used in interactive software systems to describe the architecture of a software system. The “View” component presents information to the user, while the “Controller” component processes the user’s interaction, creating the user interface. The “Model” component includes the information represented by the “View” and the logic that transforms this information into responses according to the user’s actions. In Joomla!, the “Model” is expressed by the database, which stores the information, while the schema expresses the logic (or the interconnections between elements). The “View” is realised by the interface layout that the user actually sees, while the “Controller” contains the dispatcher, the routes and the web server that turn the information retrieved from the “Model” into an interface “View” for the user. The real components of a typical Joomla! system are:

• The database.

• The additional packages (components, modules, plugins).

• The templates.

• The website.

All content of the system (except documents and images) is stored in the database. The Joomla! Framework is a collection of open-source implementations that form the base of the Joomla! core. Additional packages can be either native (created by Joomla! contributors) or third-party (created by external developers). These additional packages, including templates, are usually referred to as “extensions”, as they add features to the system. We consider templates a separate category, as they are strictly connected to the layout of Joomla! and to the “View” part of the model. Components are similar to small applications that provide special features to the Content Management System: whenever a user wants to add a new feature to the system, an extra component is required. Modules are light and efficient extensions that are used mainly in page rendering. They can display their own content or be used to present data from a component. The collection of modules installed on a system is managed by a component called “Module Manager”. Plugins are small pieces of code that are triggered on specific events and perform specific tasks. Templates, as mentioned above, are responsible for adjusting the appearance of the website; we can consider them the decorative part of the website. Finally, the website is the highest-level layer of the system and is the actual end point of user interaction: a user inserts or retrieves all required data through the website layer. An abstract representation of these layers is the following:

Figure 2.2: Joomla! Layers

2.5.2 Requirements

Joomla! requires a LAMP stack in order to function. LAMP is a collection of software components named after their initials: L stands for Linux, A for the Apache HTTP Server, M for the MySQL database management system and P for the PHP programming language (the CMS is written in object-oriented PHP). Depending on the operating system of the server that hosts the system, the stack may instead be referred to as WAMP (if the server is based on Windows) or MAMP (if the server is based on Macintosh). Joomla! also supports PostgreSQL or Microsoft SQL Server as the database system and Nginx or Microsoft IIS as the web server. The following table gives a short list of the recommended and minimum supported versions for Joomla!:


Software                  Recommended Version    Minimum Version
PHP                       5.4 +                  5.2.4 +
Supported Databases:
MySQL                     5.0.4 +                5.0.4 +
SQL Server                10.50.1600.1 +         10.50.1600.1 +
PostgreSQL                8.3.18 +               8.3.18 +
Supported Web Servers:
Apache                    2.x +                  2.x +
Nginx                     1.1                    1.0
Microsoft IIS             7                      7

Table 2.1: Joomla! Requirements
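Minimum versions such as those in Table 2.1 can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, only a subset of the table is encoded, and version strings are compared numerically on their first four components, which is a simplifying assumption that ignores distribution-specific suffixes.

```python
import re

# Minimum versions taken from Table 2.1 (subset, for illustration).
MINIMUM = {"php": "5.2.4", "mysql": "5.0.4", "postgresql": "8.3.18", "nginx": "1.0"}


def parse_version(text, width=4):
    """Turn '5.2.4' into (5, 2, 4, 0) so that versions compare numerically."""
    numbers = [int(n) for n in re.findall(r"\d+", text)][:width]
    return tuple(numbers + [0] * (width - len(numbers)))


def unmet_requirements(detected, minimum=MINIMUM):
    """Return a message for every detected package below its minimum version."""
    problems = []
    for name, required in minimum.items():
        found = detected.get(name)
        if found is None:
            continue          # not detected, e.g. an alternative database is used
        if parse_version(found) < parse_version(required):
            problems.append("%s %s < %s" % (name, found, required))
    return problems


# Hypothetical server: PHP is recent enough, MySQL is one patch release too old.
report = unmet_requirements({"php": "5.3.10", "mysql": "5.0.3"})
```

Padding every version to the same length avoids treating “1.0” as older than “1.0.0”, which a plain tuple comparison would do.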

2.5.3 Extensions - Components

When developing web applications, we are commonly required to extend the basic functionality or enhance existing features. The Joomla! code is designed to be extensible in order to avoid severe modifications to the core code base. Thus, instead of changing core code, it is preferable to implement an extension. Extensions are managed as self-contained (isolated) applications, and when the core files are updated, extensions are not modified.

Although extensions are considered isolated elements, they do not live in sealed environments, and mixing different extensions is possible. Joomla! supports 4 different extension types:

Plug-ins: Plug-ins are small implementations that execute code in response to certain events (Joomla! core events or custom, user-created events). Plug-ins extend the basic functionality of Joomla! and are usually needed in multiple areas of the system. They are also commonly used to format the output of components and/or modules.

Components: Components are the main elements that are displayed on the templates of a Joomla! system. They usually enhance the user experience by adding extra information to the main content display area and can be considered complete (separate) software applications. Joomla! is designed to run exactly one component for each page. The core functionality of the content management system is considered a component as well. Components usually perform demanding back-end actions such as transactions with the database. Provided that an implementation is written in PHP, there are no actual restrictions on the actions that components may perform.

Modules: Modules are flexible extensions that provide a reusable way of including information in the Content Management System (e.g. sidebars, content menus). Modules may be used multiple times, on different pages and in different positions that the administrator specifies through the Joomla! administrative panel. Modules are also used to enhance the content of a component by applying basic style formatting.

Languages and templates can also be considered forms of extensions, as they add functionality to the current system and are reusable elements. The main features of extensions are summarized in the following table:

                 Plug-ins               Modules                 Components
Complexity       Low to medium          Low                     Medium to high
Visibility       Usually not visible    Various places/pages    Single page
Position         Various places         Various places          Main body of page
Configuration    Low                    Low                     Extensive

Table 2.2: Extensions Overview

Although extensions are extremely useful, they often cause issues during upgrades. It is hard to create forward-compatible extensions, as elements of paramount importance (core Joomla! components) often change from one version of Joomla! to the next. Thus, before upgrading a Joomla! core system, it is vital to update all extensions and ensure that they are compatible with the new version that we will try to install. Otherwise we may end up with a corrupted system that is unable to even start.


Chapter 3

Bridging package managers in the context of Content Management Systems

3.1 Content management systems as applications

Content management systems belong to the family of web applications. Although web applications are becoming more and more popular, there are no package managers that target them. Some package managers for libraries exist (mainly in the PHP and JavaScript domains) and are capable of acquiring packages, but they are tied to the execution of a web application rather than to its installation. More specifically, they are used to ensure that the essential packages are in place when the web application starts. These package managers therefore work from a different perspective than the Linux package managers, as they check for the packages as soon as the application is initiated and not beforehand. As a result, we need to create a “prototype” solution that ensures a smooth installation or upgrade of a Joomla! system by also checking the prerequisites declared by the system to be installed. As a solid basis for the principles of our tool, we use the dpkg package manager and in particular aptitude, the high-level tool on top of dpkg (already mentioned in the section Types of Dependencies). Since information on Content Management Systems is scarce, extensive research into the source code and the channels of the project maintainers was required.

A Content Management System follows the main principles of a regular package, as described in the previous chapter. The main package is composed of the core components of the system (as described for the Joomla! MVC view), while the whole system requires certain technologies to be present in order to function. In this chapter we map the research questions to the context of Content Management Systems, and Joomla! in particular.

3.2 Dependencies on Content Management Systems

Real dependency solving systems like APT or Aptitude are extremely complex, and the way those tools reason about dependencies can be hard to describe. In order to describe the dependencies found in Content Management Systems, we need to introduce an abstract model1 capable of describing dependency relationships in a more generic way. As described in Chapter 2, we encounter three types of dependencies, regardless of the implementation of the package manager:

1. Pre-depends

2. Depends

3. Conflicts
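The three dependency types can be expressed in a very small abstract model. The following is a sketch under stated assumptions rather than the model used by the prototype: the class, the function and all package names are hypothetical, and versions are deliberately left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PackageSpec:
    name: str
    pre_depends: FrozenSet[str] = frozenset()   # must exist before installation starts
    depends: FrozenSet[str] = frozenset()       # must exist for the package to work
    conflicts: FrozenSet[str] = frozenset()     # must not co-exist with the package


def blockers(spec: PackageSpec, present: FrozenSet[str]) -> List[str]:
    """List every reason why `spec` cannot be installed on top of `present`."""
    found = []
    found += ["pre-depends missing: " + n for n in sorted(spec.pre_depends - present)]
    found += ["depends missing: " + n for n in sorted(spec.depends - present)]
    found += ["conflicts with: " + n for n in sorted(spec.conflicts & present)]
    return found


# Hypothetical upgrade package and server state.
joomla_upgrade = PackageSpec(
    name="joomla-upgrade",
    pre_depends=frozenset({"php", "mysql", "apache"}),
    depends=frozenset({"joomla-core"}),
    conflicts=frozenset({"legacy-gallery"}),
)
server = frozenset({"php", "mysql", "apache", "joomla-core", "legacy-gallery"})
issues = blockers(joomla_upgrade, server)
```

An empty result means that none of the three dependency types blocks the installation in this simplified model.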

3.2.1 Pre-Depends

In the context of Content Management Systems, Pre-depends dependencies are expected to relate to the configuration of the server on which we install or upgrade the system, and especially to packages that must be installed before the installation is initiated. In case of an upgrade, the Pre-depends expectations are extended with packages that belong to the currently installed system, but we will discuss that later on. Joomla! explicitly requires a LAMP stack with specific packages and versions installed; otherwise the Content Management System cannot function properly2. As a result, the first step towards a successful installation is to ensure that the Pre-depends dependencies are satisfied.

There are three main dependencies (as described in background) in this category that we have to satisfy:

1. PHP package installed on the web server.

2. Database package (MySQL, Microsoft SQL Server, PostgreSQL) that provides the database for the system.

3. Web server package (Apache, Nginx, Microsoft IIS).

Satisfying Pre-depends for a new “clean” installation is relatively easy: it suffices to ensure that the general requirements of the current version are satisfied. When a pre-installed system needs to be updated, on the other hand, the dependency handling can differ significantly. Joomla! maintains a website3 that provides information on upgrades from one version to a newer one. For most versions, there are special entries declaring the minimum requirements for the PHP package, but the other packages are not mentioned. In general, the files provided by Joomla! are not well structured, and essential data is missing or declared in a vague way. The prototype resolves Pre-depends dependencies for initial installations as well as for upgrades4.

3.2.2 Depends

There are two different cases to examine in this category of dependencies: a fresh installation, and the upgrade of a previously installed system.

1According to D. Burrows [6], package dependencies can be reduced to the satisfaction of Boolean equations, but this approach is too extreme and unsuitable for explaining the original problem of resolving dependencies, as the reasoning is hidden by the mathematical model that is produced.

2Due to “loose coupling” between packages in web applications, we can deploy the files manually even if the infrastructure does not satisfy the requirements set by the application, but the application will not be able to function properly.

3Joomla! update website: http://update.joomla.org/core/

4Chapter 4: First Installation & Provide Upgrade Suggestions


New installation: Depends dependencies are related only to the packages to be installed. In other words, once we verify the integrity5 of the main package containing the Joomla! system, we expect the system not to require any further actions.

Upgrade of a previously installed system: The Joomla! packages provided for upgrades contain only upgraded or new components. As a result, the components from the original installation must be present.
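The integrity check mentioned for new installations amounts to comparing an MD5 digest of the downloaded archive with the published one. A minimal sketch using Python's standard `hashlib` module follows; the function names are hypothetical, and how the published hash is obtained from the release site is left out.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, published_md5):
    """Compare the local digest with the hash published for the release."""
    return md5_of_file(path) == published_md5.strip().lower()
```

A call such as `is_intact("Joomla_3.4.3-Stable-Full_Package.zip", published_hash)` would then decide whether the installation may proceed; the file name here is only an example.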

To verify internally the presence of libraries and core packages, Joomla! uses a PHP package manager called Composer6. Composer is strictly speaking a dependency manager rather than a package manager, as it installs the required packages per project and not globally on the system. Although having a package manager is of paramount importance for ensuring coherence, Composer can only verify the presence of files and not their integrity.

3.2.3 Conflicts

Conflicts dependencies describe the packages that cannot co-exist with the package to be installed. Components that are not maintained regularly commonly cause problems after upgrades, as they are not forward-compatible. If we upgrade a given system while a component that the newer system does not support is present, the final system will probably fail. During a web-based upgrade there is an explicit notification to first upgrade all components (extensions in the case of Joomla!) and only then perform the core system upgrade. For each component, metadata files declare which versions it supports. The main reason that components can “break” a Joomla! system is that a major release adds new features and introduces enhancements to existing implementations. Thus, components fail due to changes in the architecture of the Joomla! system.
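The conflict check described above can be sketched as a comparison between the target core version and the versions each extension declares as supported. The data layout below is an assumption made for illustration: real Joomla! extension metadata is stored in manifest files whose parsing is not shown, and the extension names are hypothetical.

```python
def unsupported_extensions(extensions, target_version):
    """Return the extensions that do not declare support for the target major version."""
    target_major = target_version.split(".")[0]
    return sorted(
        name for name, supported_majors in extensions.items()
        if target_major not in supported_majors
    )


def upgrade_plan(extensions, target_version):
    """Extensions first, core second: refuse the core upgrade while conflicts remain."""
    blocked = unsupported_extensions(extensions, target_version)
    if blocked:
        return ["update or remove extension: " + name for name in blocked]
    return ["upgrade core to " + target_version]


# Hypothetical installed extensions with the major versions they support.
EXTENSIONS = {"com_gallery": {"2", "3"}, "plg_legacy_login": {"1", "2"}}
plan = upgrade_plan(EXTENSIONS, "3.4.3")
```

The plan only proposes the core upgrade once every installed extension declares support for the target version, mirroring the notification shown during a web-based upgrade.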

3.3 Current Solutions for CMS monitoring/upgrade

One of the most demanding parts of this thesis was to investigate whether solutions that are already on the market and capable of supporting maintenance operations on CMSs would be useful for our tool. Several solutions targeting different CMSs are currently available (e.g. Watchful.li7 or Installatron8), but some restrictions prevent us from using them as they are:

• Those tools require actions from the users, such as installing components in their CMS to enable monitoring or manually creating the whole background environment (databases, FTP accounts etc.).

• They can have a substantial cost, while their functionality is allowed for a limited number of users only.

• Integration in the Byte Service Panel is not always possible in an acceptable way, for two reasons:

1. Implementation bottlenecks: some solutions can only be used from within their own environment, as no access by external resources (e.g. APIs) is allowed.

2. It is important to avoid redirecting the user to a different location with a different user interface, as this can degrade the quality of the user experience.

• Only a limited number of tools support multiple platforms (a one-for-all CMS tool).

5We can easily verify the integrity of a Joomla! compressed file containing the system by comparing the MD5 hash from the official release site with that of the downloaded file.

6Official Website of Composer: https://getcomposer.org/

7Watchful.li Joomla! Monitoring website: https://watchful.li/

8Installatron one-click website: http://installatron.com/

Some tools, like Installatron, both support multiple platforms and provide external access, so integration in the Byte Service Panel can be achieved with them. For all those reasons, we decided to create a new tool under the name “Cms Tools”. Although it uses Installatron for the setup part, the implementation that prepares the environment, the monitoring and the dependency resolution were created from scratch.


Chapter 4

The Prototype Maintenance Tool

We might consider a CMS like Joomla! to be an entity composed of different components. CMSs enhance functionality in new versions by adding new features and resolving known issues. We aim for a complete tool that is able to perform the initial installation, check for new versions, determine which versions our system can safely be updated to, and make suggestions on the upgrade path. This chapter describes the design and implementation decisions for the tool, applying the research findings on dependency solving and the main principles of package managers.

Remark: The maintenance tool is mainly written in Python on top of the Django framework, making extensive use of RabbitMQ and Celery task distribution for task management and asynchronous execution. To connect the different systems of the hosting organisation, we used the REST Framework to create APIs for some implementations (a detailed analysis is given with each solution). All implementations are extensively tested using unit tests (nosetests), source code coverage analysis and, in some cases, integration tests. More information on the tools that were used can be found in the Appendices. The tool source code is more than 1,700 lines, supported by more than 2,700 lines of unit tests. In total, the Radon software metrics1 tool counted about 4,460 lines of code.

4.1 First Installation

The product owner of the Installer at Byte B.V. provided some user stories describing how the user would interact with the system. The user stories cover all the different Content Management Systems (Joomla!, Drupal, WordPress and Magento) for cluster (shared) hosting packages and are described as follows:

1. As a customer, I want to be able to install the newest version of the CMS I want to use on my hosting package via the Service Panel of Byte.

• I want to be able to choose where I want to install the CMS

• I want to be able to install more than 1 CMS on 1 package

• I want to be able to see my credentials

• I want to be able to go to the admin panel in 1 click after I installed the CMS


To proceed to the maintenance part, we need a Content Management System that is installed and configured. CMSs are usually installed manually by the user, since this requires access to the file system of the hosting server as well as a database in which the content of the CMS (except for multimedia and documents) is stored. The hosting organisation of this project (Byte B.V.) decided to provide its customers with an automated tool called “Installabot” that is able to perform a full installation of a CMS. Automating the process posed two major problems: the password hashing that CMS factories perform while creating accounts, and the migration of the database, as prefixes and user details are created upon installation. To overcome this bottleneck, Byte turned to an external Application Programming Interface (API) that performs the basic part of the installation. First, however, several important issues had to be addressed. To begin with, the Service Panel (the user panel for hosting account administration) that Byte offers has no direct access to the file system of the user. As a result, the installation has to be performed remotely. All the tasks described are grouped into a sequential chain using Celery, both to ensure correct sequential execution and to track potential errors more precisely through the error reporting of the Celery back-end server. The basic steps that Installabot (the name of the Byte installer) follows are:

Figure 4.1: Installabot view for Joomla! from the Byte Service Panel

• Preliminary checks: Users have a limit on the number of databases they can own and only if their hosting plan allows the creation of a new database, the installation will begin. The installer also checks for the current version of PHP and MySQL that the server is running and if they comply with CMS requirements. Finally the installer validates that the domain that the user chose to use is hosted by Byte.

• FTP access creation: Installabot creates a new FTP user with read/write access on the required path that the user has predefined and ensures that this FTP user can log in successfully.

• Database and database user creation: The installer is able to create a new database, a


• Installation with Remote API call: Installabot calls the installer with all the necessary parameters, such as FTP credentials, database credentials, type of CMS, and the domain and path that the user chose for the installation.

• Monitoring Installation Progress: After a successful call to the API, Installabot tracks the progress of the installation, showing relevant information to the user. Once the installation is done, randomly generated credentials are returned to the user for logging in to the CMS administrator’s page. If the installation fails for some reason, relevant messages appear and the Byte technical team is notified.

• Clean Up: Regardless of whether the installation succeeds or fails, Installabot deletes the FTP access point that it created and finishes logging with details of the success or failure.
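The control flow of the steps above (sequential execution, stopping at the first failure, and a clean-up that always runs) can be sketched in plain Python. The tool itself runs these steps as a Celery chain; the sketch below only mimics that flow so that it runs without a message broker, and every function name, context key and value in it is hypothetical.

```python
def preliminary_checks(ctx):
    # The hosting plan limits the number of databases a user may own.
    if ctx["databases_used"] >= ctx["database_limit"]:
        raise RuntimeError("database limit reached")
    return ctx


def create_ftp_user(ctx):
    return dict(ctx, ftp_user="tmp_installer")       # temporary access point


def create_database(ctx):
    return dict(ctx, database="cms_db")


def call_remote_installer(ctx):
    return dict(ctx, installed=True)                 # stands in for the remote API call


def remove_ftp_user(ctx):
    ctx.pop("ftp_user", None)                        # clean up the temporary FTP access


STEPS = [preliminary_checks, create_ftp_user, create_database, call_remote_installer]


def run_installation(context, steps, cleanup):
    """Run the steps in order; always clean up and return (ok, log)."""
    log = []
    try:
        for step in steps:
            log.append("start " + step.__name__)
            context = step(context)                  # each step feeds the next one
        log.append("success")
        return True, log
    except Exception as error:
        log.append("failed: %s" % error)
        return False, log
    finally:
        cleanup(context)                             # runs on success and on failure
        log.append("cleanup done")


ok, log = run_installation({"databases_used": 0, "database_limit": 1},
                           STEPS, remove_ftp_user)
```

Placing the clean-up in a `finally` clause reflects the requirement that the FTP access point is deleted regardless of the outcome.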

Installabot supports multiple CMSs: the user can choose among Joomla!, WordPress, Magento or Drupal (depending on the hosting plan).

4.2 CMS System Discovery & Listing

Once the user has completed the installation of a CMS (either automated or manual), the system needs to be monitored for possible updates that will enhance its functionality in the future. The hosting organisation asked for a complete overview that lists the domains a user owns and shows appropriate messages according to the status of, or the actions required for, each system. But first we have to detect the installed systems!

4.2.1 Tool Decomposition

The tool was designed and implemented in compliance with the infrastructure of the hosting organisation, so specific systems and tools had to be used. To begin with, the tool shall be able to scan all domains of a given user and discover the installed Content Management Systems. At this point it is important to explain the structure of the system we are building the tool for. Assuming that we own a domain and the domain is hosted at Byte, a root folder named after the username is created, and it contains a folder named after the domain. For example, if the username is “user” and the domain name is “uva.nl”, the home directory of the user is named “user” and the domain folder “uva.nl” is located in that home directory. The same applies to each subdomain: separate folders are created under the user’s home directory. As a result, we have to scan all those folders to check for the existence of a CMS, as users may have multiple systems across all domains and subdomains (more details regarding the scan are discussed later). To store the results of the discovery and make them easily accessible, a new database is required. Byte has a strict policy of using a dedicated database for each application, and the Service Panel does not perform database transactions directly. A middleware implementation called “Pino” connects the databases and the Service Panel. “Pino” is nothing more than a set of APIs designed to perform internal actions and return their results. Since we are creating a new tool, a new database was created, together with a new API serving the purpose of our application. The orchestration of the whole “scan and update” task shall be assigned to the task queue (controlled by Celery), which ensures sequential execution and tracking of the results. Byte currently provides customers with two different types of hosting packages: a cluster (shared) one and a managed platform called “hypernode”. As a result, both platforms shall be supported by the tool. Finally, the scanning for CMSs and the check that they are at the latest version shall be done in an automated way (without requiring manual interaction). The same applies to the retrieval of the latest versions of each CMS. To design the tool more efficiently, we should take into consideration the user stories that the marketing department of Byte provided for the tool beforehand:

• As a customer, I want to be able to see if all of my CMSs are up-to-date.

• I want the ServicePanel to show me what the latest version of my CMS is and what version I'm using, for all of my domains and for all of the CMSs (in case I have installed more than one CMS on a hosting package).

• I want the ServicePanel to show me how I can update my CMS if necessary (through a KnowledgeBase article).

Keeping in mind those restrictions and suggestions, we can describe some key requirements that the tool must fulfill:

• The tool shall be accessible through the Byte Service Panel.

• Database transactions shall be performed through the internal Byte API infrastructure called “Pino”.

• File scanning shall be done on the web servers, as for security reasons neither the Service Panel nor “Pino” has access to the file system of the user.

• The update task shall be performed as an asynchronous task in order to track its status and to ensure correct execution, as different infrastructures take part in the whole procedure.

• The tool shall be compatible with all the different platforms of the hosting organisation.

• The latest-version and installed-systems database entries for each CMS shall be updated automatically to the latest values without human interaction. Scanning shall also be available to be triggered manually.

A first approach from an architectural point of view is illustrated in the following figure.

Figure 4.2: Byte Infrastructure Architecture

The user is able to interact with the Service Panel. The Service Panel can only interact with “Pino” (which manages the databases), and “Pino” can also interact with the Byte web servers. The database on the

subdomains, packets etc. With CMS DB we denote the new database that will host the detected Content Management Systems. Those two databases are of course closely connected, as they share a foreign key: the domain. The structure of the database is described in the following entity-relationship diagram:

Figure 4.3: Cms Tools Database Schema

Django, the framework that the tool is based on, uses an object-relational mapping (ORM) approach for database transactions. As a result, the database tables are referred to as models, and the entries of the database are represented by instances of those models. More information on the ORM that the Django framework uses can be found online.² For the new database of the prototype, three tables were used:

CmsDomain: Contains the domains that are introduced to the system (after a CMS has been detected on them); its entries are instances of the Hybrid Domain model. Hybrid Domain is a Byte implementation that contains logic to ensure that the domain is also present in the main database of the domains (it works as a validator). The attributes of this model are set according to the specifications of Byte; some key elements are the name of the domain (e.g. uva.nl), a foreign key to the customer who owns the domain, the cluster that hosts the domain, etc.

CmsSupportedVersions: This table contains all the information regarding the CMSs that our tool supports. For each CMS we store its name, the latest version currently made available by its authors, the date when the latest version check was performed and the link to the knowledge base of the hosting organisation for that particular CMS.

CmsSystem: This table contains an entry for each CMS detected. Each entry has a foreign key to CmsDomain, represented by the domain attribute, a cms attribute with a foreign key to CmsSupportedVersions, the full path where the system is installed and the currently installed version.
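
The relationships between the three tables can be sketched in plain Python. This is only an illustration and not the prototype's Django code: dataclasses stand in for Django models, object references stand in for foreign keys, and every field name other than domain and cms (as well as the is_up_to_date helper and the example URL) is a hypothetical choice made for this sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CmsDomain:
    name: str      # e.g. "uva.nl"
    customer: str  # stand-in for the foreign key to the owning customer
    cluster: str   # the cluster that hosts the domain


@dataclass
class CmsSupportedVersions:
    name: str                # name of the CMS, e.g. "Joomla!"
    latest_version: str      # kept as a string, e.g. "3.4.3"
    last_checked: date       # date of the last latest-version check
    knowledge_base_url: str  # link to the knowledge base article


@dataclass
class CmsSystem:
    domain: CmsDomain          # "foreign key" to CmsDomain
    cms: CmsSupportedVersions  # "foreign key" to CmsSupportedVersions
    path: str                  # full path where the system is installed
    installed_version: str     # currently installed version


def is_up_to_date(system: CmsSystem) -> bool:
    """Hypothetical helper: plain string comparison of the two versions."""
    return system.installed_version == system.cms.latest_version


# Example entries; all values are made up for illustration.
joomla = CmsSupportedVersions("Joomla!", "3.4.3", date(2015, 7, 1),
                              "https://example.com/kb/joomla")
domain = CmsDomain("uva.nl", "user", "cluster-1")
system = CmsSystem(domain, joomla, "/home/user/uva.nl", "3.4.1")
```

With such a structure, the first user story (seeing whether a CMS is up to date) reduces to comparing installed_version with the latest_version of the referenced CMS entry.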

4.2.2 Retrieve Version(s) Information

Content Management Systems are stored in web server folders under a structure that their creators have specified beforehand. Each CMS contains a file that states the version of the release currently installed on the web server. After research on the official repositories of Content Management Systems and source code analysis, we were able to determine which file contains version-related information and how the version is spread across the file, as it is usual to store sub-versions under different names (e.g. Dev Level or Minor Version). Version files are text files containing either strings or variable declarations (in string format) that describe the version of the currently installed CMS. In order to match and extract versions, regular expressions are used, while the version itself is treated as a string, because some systems use a versioning scheme in xx.xx.xx format, which floating-point numbers cannot represent. For Joomla!, the file containing the version information is “version.php”, located in the path “libraries/cms/version/”. The file content is illustrated in the following snippet:

    /** @var string Product name. */
    public $PRODUCT = 'Joomla!';
    /** @var string Release version. */
    public $RELEASE = '3.4';
    /** @var string Maintenance version. */
    public $DEV_LEVEL = '3';
    /** @var string Development STATUS. */
    public $DEV_STATUS = 'Stable';

Listing 4.1: Joomla! Version File

Joomla! is written in PHP, thus the file has a PHP format. As the comments describe, the release version is 3.4 and the development level is 3. Joomla! combines those two elements to declare the version (in this case, 3.4.3). To capture the version reliably, we used two regular expressions:

• One to capture the release version: ^\s*public \$RELEASE = '([0-9\.]+)';$

• One to capture the development level: ^\s*public \$DEV_LEVEL = '([0-9\.]+)';$

Both regular expressions capture the part of the string enclosed in quotes ('') after the equals sign. In our example, the first expression captures 3.4 and the second one 3. The next step is to join those two strings and return the result as the detected version. The wrapper function performs this step using Python's built-in string join method. Following the same approach, we were able to detect (at Byte's request) versions for other CMSs such as Drupal, WordPress and Magento.
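
The extraction step for Joomla! can be sketched as follows. This is a minimal illustration rather than the prototype's actual wrapper: the function name detect_joomla_version, the choice of "." as the separator and the use of re.MULTILINE (so that ^ and $ match at each line of the file content) are assumptions of this sketch.

```python
import re

# The two expressions from the text; MULTILINE lets ^ and $ match per line.
RELEASE_RE = re.compile(r"^\s*public \$RELEASE = '([0-9\.]+)';$", re.MULTILINE)
DEV_LEVEL_RE = re.compile(r"^\s*public \$DEV_LEVEL = '([0-9\.]+)';$", re.MULTILINE)


def detect_joomla_version(content):
    """Return the version as a string, or None if either part is missing."""
    release = RELEASE_RE.search(content)
    dev_level = DEV_LEVEL_RE.search(content)
    if release is None or dev_level is None:
        return None
    # The version stays a string; the two parts are joined with a dot.
    return ".".join([release.group(1), dev_level.group(1)])


# Excerpt of a version file in the format of Listing 4.1.
SAMPLE = """\
    /** @var string Release version. */
    public $RELEASE = '3.4';
    /** @var string Maintenance version. */
    public $DEV_LEVEL = '3';
"""

print(detect_joomla_version(SAMPLE))
```

Returning None when one of the two declarations is absent keeps a partially matching file from being reported as a detected version.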

4.2.3 System Scan (Folder Traversal)

Locating the version files can be a time-consuming and resource-intensive task. Assuming that a user may run multiple CMSs from one hosting account (using either sub-domains or sub-folders), we have to track and display all of them, no matter where they are located. In order to achieve that, we have to retrieve all folders and traverse them. The hosting organisation uses a classic folder structure on the web servers, and the home folder of each user contains all the domains and sub-domains that he/she owns (as described in the tool decomposition). With a view to discovering as many CMSs as possible while keeping the cost of the process (time and resources) in balance, we have to take the depth of this search into consideration. We introduce the term “depth” because we borrow the search procedure from trees. Assuming that the root (base) folder is the root of the tree, each node growing away from the root represents a level below the root. Consequently, each node growing from another node is considered one level lower than its parent.
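
The idea of a depth-bounded search can be sketched as below. This is an illustration under stated assumptions and not the prototype's scanner: the function name find_cms_folders, its parameters and the decision to skip symbolic links are hypothetical, and the marker is the relative path of a version file such as Joomla!'s “libraries/cms/version/version.php”.

```python
import os


def find_cms_folders(root, marker, max_depth):
    """Return folders at most max_depth levels below root that contain marker.

    root is the base folder (depth 0), marker is the relative path of a
    version file, and max_depth bounds how far below root the search goes.
    """
    found = []
    stack = [(root, 0)]  # (folder, depth below the root)
    while stack:
        folder, depth = stack.pop()
        if os.path.isfile(os.path.join(folder, marker)):
            found.append(folder)
        if depth >= max_depth:
            continue  # do not descend below the allowed depth
        try:
            with os.scandir(folder) as entries:
                for entry in entries:
                    # Symbolic links are skipped to avoid cycles.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append((entry.path, depth + 1))
        except OSError:
            continue  # unreadable folders are ignored
    return sorted(found)
```

The search keeps descending below a detected CMS, since another system may be installed in one of its sub-folders; max_depth is the knob that trades coverage against the time and resources spent on the scan.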
