Visualization of graphs and trees for software analysis

DOI: 10.6100/IR642975

Document status and date: Published: 01/01/2009

Document version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


Visualization of Graphs and Trees for Software Analysis

DISSERTATION

submitted to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the rector magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the College voor Promoties on Wednesday 24 June 2009 at 16:00

by

Danny Hubertus Rosalia Holten

This dissertation has been approved by the promotors:

prof.dr.ir. J.J. van Wijk and

prof.dr. A. van Deursen

A catalogue record is available from the Eindhoven University of Technology Library.

First promotor:

prof.dr.ir. J.J. van Wijk (Technische Universiteit Eindhoven)

Second promotor:

prof.dr. A. van Deursen (Technische Universiteit Delft)

Core committee:

prof.dr. M.T. de Berg (Technische Universiteit Eindhoven)

prof.dr. J.-D. Fekete (INRIA Unité de Recherche Saclay – Île-de-France)

dr. T.M. Munzner (University of British Columbia)

This work was financially supported by the Netherlands Organisation for Scientific Research (NWO) as part of the RECONSTRUCTOR project. NWO grant number 638.001.408.

This work was carried out in the ASCI graduate school (Advanced School for Computing and Imaging). ASCI dissertation series number 181.

Published by Technische Universiteit Eindhoven

Typeset in LaTeX

Contents

1.4 Outline
1.5 Related Publications
1.5.1 Primary Papers
1.5.2 Secondary Papers
1.5.3 Tertiary Papers

2 Background
2.1 Software Engineering
2.2 Program Understanding
2.3 Data Extraction
2.3.1 Static Analysis
2.3.2 Dynamic Analysis
2.4 Visualization
2.4.1 Information Visualization
2.4.2 Software Visualization

3 HEBs – Hierarchical Edge Bundles
3.1 Overview
3.2 Introduction
3.3 Related Work
3.3.1 Tree Visualization Techniques
3.3.2 Displaying Adjacency Relations
3.3.3 Edge Aggregation Techniques
3.4 Hierarchical Edge Bundles
3.4.1 Principle
3.4.2 Spline Models
3.4.3 Rendering
3.5 Results
3.5.2 Performance
3.6 Conclusions
3.7 Future Work
3.8 Adoption
3.9 Acknowledgements

4 Trace Visualization Using HEBs Views and MSVs
4.1 Overview
4.2 Introduction
4.3 Related Work
4.4 EXTRAVIS
4.4.1 Input Data
4.4.2 HEBs View
4.4.3 MSV – Massive Sequence View
4.4.4 Synchronized Interaction
4.5 HEBs View
4.6 MSV – Massive Sequence View
4.7 EXTRAVIS Usage
4.7.1 Trace Exploration
4.7.2 Feature Location
4.7.3 Top-Down Program Comprehension with Domain Knowledge
4.8 EXTRAVIS Evaluation

5 Visual Tree Comparison Using HEBs
5.1 Overview
5.3.1 Tree Visualization
5.3.2 Visual Comparison of Trees
5.4 Visualization Design
5.4.1 Input Data
5.4.2 Visualization Layout
5.4.3 Hierarchy Sorting
5.4.4 User Interaction
5.5 Results
5.5.1 HEBs
5.5.2 Bundling at Specified Levels
5.5.3 Interacting with the Visualization
5.5.4 Comparing Larger Data Sets

6 FDEBs – Force-Directed Edge Bundles
6.1 Overview
6.3.1 Node-Link-Based Graph Visualization
6.3.2 General Edge Clutter Reduction
6.3.3 Edge Clutter Reduction using Edge Bundling
6.4 FDEBs – Force-Directed Edge Bundles
6.4.1 Main Technique
6.4.2 Edge Compatibility Measures
6.4.3 Calculation
6.5 Results
6.5.1 Rendering
6.5.2 Visualization Examples and Comparison
6.5.3 Bundle Straightening

7 Visualization of Directed Edges in Graphs
7.1 Overview
7.4 Graph Generation
7.5 First User Study
7.5.1 Single-Cue Directed-Edge Representations
7.5.2 Hypotheses
7.5.3 Study Setup
7.5.4 Statistical Analysis and Results
7.6 Follow-Up User Study
7.6.1 Multi-Cue Directed-Edge Representations
7.6.2 Hypotheses
7.6.3 Study Setup
7.6.4 Statistical Analysis and Results
7.7 Recommendations
7.8 Conclusions
7.9 Future Work
7.10 Acknowledgements

8 Conclusions
8.1 Contributions
8.2 Future Work

Bibliography
Publications
Summary
Samenvatting (summary in Dutch)
Dankwoord (acknowledgements)

Chapter 1

Introduction

Spurred by the ever-increasing demand to manipulate large amounts of information as quickly as possible, the rapid advancement in computer technology that took place in the second half of the 20th century has led to vast increases in data storage capacity and processing speed. Nowadays, personal computers can easily store hundreds of gigabytes worth of information and perform millions of complex calculations every second, something which was previously the exclusive domain of mainframes and supercomputers.

Because this advancement in computer hardware allows more complex and demanding tasks to be performed, computer software has grown larger and more complex as well. On the other hand, the development of larger and more complex software also demands more powerful hardware. This leads to a positive feedback loop between hardware and software advancement. Large software packages and modern operating systems typically consist of tens of millions of lines of source code. In structure, content, and functionality, software systems are therefore sometimes considered to be among the most complex artifacts ever created [141].

Once a software system has been created, it needs to be maintained. Typical software maintenance tasks are correcting bugs, improving performance, implementing new functionality, as well as refactoring the software to improve its internal structure and thereby its maintainability. It is estimated that software maintenance accounts for up to 90% of the software engineering costs [69]. Furthermore, software engineers involved in software maintenance are estimated to constitute up to 80% of the personnel [35]. It is therefore of vital importance that software is not only regarded as an artifact that can be executed by a computer, but also as information that needs to be understood by people [53].

Understanding a software system implies studying the source code and documentation in order to gain a level of understanding that is sufficient for a given maintenance task. However, additional information such as documentation and other design artifacts is often not available, outdated, or inappropriate for the task at hand. As a result, the process of program understanding is considered to be ill-defined and often very time-consuming [212]; it is estimated that up to 60% of a software maintenance task is spent on understanding the software [11].

amounts of source code manually, which can quickly lead to information overload. Furthermore, it does not necessarily provide them with a high-level overview of the structure of the software system. To complicate matters further, a software engineer might not directly know what he is looking for. He might be interested in unexpected function calls, for instance, but because it is hard to clearly specify what this pertains to, it is not straightforward to automatically identify such information. It would therefore be beneficial to provide software engineers with a way to quickly and interactively explore software systems at a higher level of abstraction. This allows them to get an idea of the organization of the system as well as the way in which parts of the system interact.

The research area of visualization can play an important role in realizing this. Visualization can be defined as the use of computer-supported, interactive, visual representations of data to amplify cognition [26]. Visualization enables the comprehension of large amounts of data due to the high “bandwidth” of the Human Visual System (HVS). Furthermore, visualization takes advantage of the excellent pattern recognition abilities of the HVS and its ability to perform certain tasks preattentively. Preattentive processing refers to the ability of the HVS to rapidly identify certain visual properties. An example is the fast and effortless identification of a single red circle in the presence of a large group of black circles. Finally, visualization allows for the identification of emergent properties that were not anticipated [221].

Visualization is generally split into scientific visualization and information visualization. Scientific visualization is concerned with visualizing entities that have a physical representation in the real world. Examples are the three-dimensional visualization of medical data such as MRI scans and the visualization of fluid flow around an object. Information visualization is concerned with visualizing abstract data that does not have a physical representation, such as numbers, text, or points in a high-dimensional space.

Software visualization is therefore a subfield of information visualization and pertains to the visualization of abstract artifacts related to software and its development process. In short, software visualization is concerned with visualizing the structure, behavior, and evolution of software [52].

1.1 Motivation

Software visualization has been used to visualize many different aspects of software systems and their related artifacts. A survey is presented by Diehl [52]; the following are examples regarding the aforementioned visualization of structure, behavior, and evolution.

Software metrics measure a specific software property, such as the number of lines of code or the number of bugs per line of code. The visualization of software metrics is an example of structural software visualization; the metrics pertain to the file and source code structure of a software system. Examples of the visualization of the behavior of software are animated algorithm visualization and the visualization of state transition graphs that are used to model system behavior [173]. Finally, the visualization of source code and additional data stored in software repositories is an example of software visualization concerned with visualizing changes in software systems over time [215].

Figure 1.1: The hierarchical organization (black) and call graph (red) of part of the software of a medical scanner by Philips Medical Systems, Eindhoven. The visualization was generated using the dot layout module of AT&T’s Graphviz package [66, 86]. This relatively small example consisting of 191 hierarchical elements and 334 call graph edges shows that a clutter-free visualization of both the hierarchy as well as additional relations on top of that hierarchy is not trivial.

The work in this thesis has been carried out as part of the NWO RECONSTRUCTOR project, which aims at raising the state of the art in software architecture reconstruction by taking the Symphony software architecture reconstruction process as its starting point [50]. A software architecture is a high-level abstraction of a software system, which is indispensable for many software engineering tasks. Software architecture reconstruction is concerned with obtaining architectural information from an existing software system. Interactive visualization of the obtained information is one of the issues addressed by RECONSTRUCTOR.

In this thesis, we therefore focus on the visualization of the high-level organizational (hierarchical) structure of a software system as well as additional relations between the elements composing this structure. Software systems generally possess a tree structure (also called a hierarchy or containment), e.g., source code divided into directories, files, and classes. Dependency relations and function calls are examples of additional relations between the elements of this hierarchy. These additional relations are called adjacency relations. Simultaneously visualizing a tree structure and the adjacency relations on top of that tree structure is not trivial. As is illustrated in Figure 1.1, a straightforward visualization of both the hierarchical organization of a software system as well as its function call graph leads to a cluttered visualization, even in case of a relatively small software system. Furthermore, only few techniques are available that are specifically designed to display adjacency relations on top of a tree, as is also mentioned by Neumann et al. [158].

Figure 1.2: Visual comparison of two hierarchically organized software systems. The hierarchies comprise two different versions A and B of a software system. The relations between matching elements of A and B constitute the adjacency relations. A visualization technique capable of effectively visualizing trees and adjacency relations could also be applied to the visual comparison of hierarchically organized data.

The combination of a tree and adjacency relations is not only relevant for the visualization of the hierarchical organization of a software system and the relations between its elements, but also for the structural comparison of two hierarchically organized software systems as shown in Figure 1.2. In this case, the hierarchical components comprise two different versions A and B of a software system. The relations between matching elements of A and B constitute the adjacency relations. A visualization technique capable of effectively visualizing trees and adjacency relations can therefore also be applied to the visual comparison of hierarchically organized data.
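The data described above, a tree with adjacency relations on top of it, can be represented quite compactly. The following is a minimal sketch in Python, not the representation used in this thesis; the class name `CompoundGraph`, its fields, and the example element names are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class CompoundGraph:
    """Hypothetical container for a hierarchy plus adjacency relations."""
    # Maps every element to its parent; the root maps to None.
    parent: dict
    # Adjacency relations (e.g., function calls) as (source, target) pairs.
    adjacency: list = field(default_factory=list)

    def children(self, node):
        """Return the direct children of a hierarchy element."""
        return [c for c, p in self.parent.items() if p == node]

    def leaves(self):
        """Return all elements that are not the parent of another element."""
        parents = set(self.parent.values())
        return [n for n in self.parent if n not in parents]

    def depth(self, node):
        """Return the number of steps from an element up to the root."""
        d = 0
        while self.parent[node] is not None:
            node = self.parent[node]
            d += 1
        return d


# Illustrative example: directories containing classes, with call relations.
graph = CompoundGraph(
    parent={
        "src": None,
        "src/ui": "src",
        "src/core": "src",
        "Button": "src/ui",
        "Parser": "src/core",
        "Lexer": "src/core",
    },
    adjacency=[("Button", "Parser"), ("Parser", "Lexer")],
)
print(graph.leaves(), graph.depth("Lexer"))
```

Keeping the hierarchy and the adjacency relations in separate fields mirrors the distinction made in the text: the same container could hold two versions of a system, with the adjacency list recording which elements match.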

Finally, the effective visualization of hierarchically organized data that contains additional relations between the hierarchy items is not limited to software visualization. It is applicable to other types of data as well, since data comprising a tree and additional relations on top of it does not only pertain to software systems. An example is a social network comprising individuals at the lowest level of the hierarchy and groups of individuals at higher levels. Relations could indicate if (groups of) people are acquainted. Another example is a hierarchically organized citation network consisting of publications at the lowest level of the hierarchy and departments and institutes at higher levels. Links between publications indicate one publication citing the other. Even this thesis is an example of such data: its organization into chapters, sections, and subsections constitutes the hierarchy, while the adjacency relations comprise citations and references to chapters, sections, figures, and equations.

The visual comparison of hierarchically organized data is not just limited to software visualization either, since the comparison of hierarchies is also relevant for other areas. Examples are the structural comparison of phylogenetic trees in evolutionary biology and the comparison of large organization charts in the area of human resources and personnel.

1.2 Objective

The main research question addressed in this thesis is as follows:

How can users be enabled to understand the large amounts of information relevant for program understanding using visual representations?

Throughout this thesis, we take an iterative, experimental approach in answering this question. Our approach is characterized by first determining the requirements for the visualization based on each (sub)problem that we are trying to solve. A prototype application is subsequently developed to implement the desired visualization technique. Based on our own experience during the development of the prototype, feedback gathered from potential users, or formal evaluations by means of controlled user studies, we arrive at a set of recommendations regarding the application of the developed technique.

As mentioned before, we furthermore aim at keeping each visualization technique as generally applicable as possible, since a large class of data exists that fits the pattern of hierarchical data with additional relations between the hierarchy items. The visualization techniques should therefore not only be relevant to software analysis, but to other types of visualization-supported analysis in general.

1.3 Software Visualization Requirements

In the next chapter, we discuss the problem of program understanding in more detail. We describe how visualization can be used to support program understanding and present a general overview of related work in the area of software visualization. Section 2.4.2 of the next chapter touches upon several issues that affect many of the current software visualization techniques. Based on these issues, we identified a set of requirements that visualization techniques should adhere to for an effective depiction of the structure of a software system and the relations between the elements of this structure. These requirements, as well as the rationale behind them, are presented below. The general applicability of the visualization techniques that are to be developed with regard to information visualization as a whole is taken into consideration as well.

Although the requirements presented below are important requirements for software visualization, they can nevertheless be ordered based on how specifically they apply to software visualization. As such, the first three requirements are very specific to software visualization, the fourth requirement is also applicable to the field of graph visualization, and the last two requirements apply to information visualization in general as well.

• Software visualizations should use an appropriate structural representation

Using the textual outline of the source code as a structural representation leads to visualizations that are recognizable to programmers, but these are too low-level for software architecture visualizations. UML class diagrams furthermore suffer from scalability problems and should therefore only be used for small/partial software systems, e.g., a dozen interrelated classes. Alternative high-level organizations, e.g., Java package-class-method hierarchies, are better suited for software architecture visualizations;

• Execution trace visualizations should employ techniques to improve the visibility of individual calls as well as outliers

Execution trace visualizations are often so densely packed that multiple function calls are mapped onto the same screen area. Guaranteed visibility of hundreds of thousands of calls as well as outliers on a single screen becomes a problem. Importance-based drawing techniques could be used to address this issue;

• Software visualizations should be integrated into IDEs to facilitate usage and adoption

Integrating software visualization techniques into Integrated Development Environments (IDEs) facilitates the data extraction step. IDEs can provide APIs to obtain static information such as the structural organization of the source code as well as the interrelationship between source code elements. Additional dynamic information can furthermore be obtained by executing a program in debug mode. Finally, direct integration into an IDE provides an easy way to link back to the source code from within a visualization;

• Software visualizations should provide a clutter-free depiction of the hierarchy and the adjacency relations

Straightforward visualizations of a software system hierarchy as well as its function call graph suffer from visual clutter (see Figure 1.1). Only few techniques are available that are specifically designed to display adjacency relations on top of a hierarchy [158]. A technique that is capable of depicting a hierarchy and additional adjacency relations in a clutter-free way is therefore of vital importance for software architecture visualization;

• Software visualizations should provide as much information as possible without visually overloading the user

2.5D techniques can be used to give elements a 3D appearance, e.g., by using (cushion) shading, which makes graphs and diagrams easier to interpret and remember. Texture can furthermore be combined with color to provide additional information without the need to use bivariate chromatic maps, which are difficult to read. Furthermore, using a 2.5D instead of a full 3D representation alleviates occlusion problems and addresses the issue that 3D spaces can be difficult to understand and navigate;

• The visualization techniques should be kept as generally applicable as possible

The visualization techniques developed throughout this thesis should not only be relevant to software analysis, but to other types of visualization-supported analysis in general, since a large class of data exists that fits the pattern of hierarchical data with additional relations.

1.4 Outline

In addition to the software visualization requirements presented in the previous section, Chapters 3 to 7 contain the main contributions of this thesis. In Chapters 3 to 6, four new visualization techniques are presented. Chapter 7 presents a user study on the visualization of directed edges in graphs.

Chapter 3 presents a generic method for the visualization of compound graphs, i.e., graphs comprising a hierarchical component and additional adjacency relations. It is based on visually bundling adjacency edges together. This is realized by showing the hierarchy via a standard tree visualization method and the adjacency edges as curved lines that follow the hierarchy. Using these Hierarchical Edge Bundles (HEBs) to bundle adjacency edges together reduces visual clutter. HEBs also visualize implicit adjacency edges between parent nodes that are the result of explicit adjacency edges between their respective child nodes.
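Two ideas from the paragraph above lend themselves to a small illustration: an adjacency edge that "follows the hierarchy", and implicit edges between parent nodes. The sketch below is one plausible reading under stated assumptions, not the method of Chapter 3: it assumes that following the hierarchy means routing an edge along the tree path through the least common ancestor of its endpoints, and that an implicit parent-level edge can be summarized by counting the explicit child-level edges beneath it. All function and element names are hypothetical.

```python
from collections import Counter


def ancestors(parent, node):
    """Return [node, parent(node), ..., root] for a child -> parent map."""
    chain = [node]
    while parent.get(chain[-1]) is not None:
        chain.append(parent[chain[-1]])
    return chain


def hierarchy_path(parent, a, b):
    """Tree path from a up to the least common ancestor and down to b.

    Assumption: such a path could serve as the guide for a curved edge.
    """
    up = ancestors(parent, a)
    down = ancestors(parent, b)
    down_set = set(down)
    lca = next(n for n in up if n in down_set)
    return up[:up.index(lca) + 1] + down[:down.index(lca)][::-1]


def implicit_parent_edges(parent, edges):
    """Count explicit child-level edges per pair of distinct parents."""
    counts = Counter()
    for a, b in edges:
        pa, pb = parent[a], parent[b]
        if pa is not None and pb is not None and pa != pb:
            counts[(pa, pb)] += 1
    return counts


# Illustrative hierarchy: two packages with two classes each.
parent = {
    "root": None,
    "ui": "root", "core": "root",
    "Button": "ui", "Menu": "ui",
    "Parser": "core", "Lexer": "core",
}
edges = [("Button", "Parser"), ("Menu", "Parser"), ("Parser", "Lexer")]

print(hierarchy_path(parent, "Button", "Parser"))
print(implicit_parent_edges(parent, edges))
```

In this example both edges from `ui` classes to `Parser` share the path segment `ui`, `root`, `core`, which is what makes them candidates for being drawn as one bundle.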

Chapter 4 illustrates how HEBs presented as a generic technique in Chapter 3 can be used to visualize the structure of hierarchically organized software systems as well as the call relations between the hierarchy elements. An additional Massive Sequence View (MSV) is introduced to enable analysis of sequences of call relations, i.e., function call sequences that were gathered from running programs in the form of execution traces. By using a special rendering technique, the MSV is capable of depicting hundreds of thousands of calls on a single screen.

Chapter 5 describes how HEBs can furthermore be used to support the visual comparison of hierarchically organized data. The presented technique visualizes a pair of hierarchies that are to be compared and simultaneously depicts how these hierarchies are related by explicitly visualizing the relations between matching subhierarchies. The relations between hierarchy elements are visualized using HEBs, which reduce visual clutter, visually emphasize splits, joins, and relocations of subhierarchies, and provide an intuitive way in which users can interact with the relations. The focus throughout this chapter is on the comparison of different versions of hierarchically organized software systems, but the technique is applicable to other kinds of hierarchical data as well.

The HEBs described in Chapters 3 to 5 show how edge bundling can effectively be used to significantly reduce visual clutter resulting from edge congestion. However, HEBs require a hierarchy to perform the bundling. In Chapter 6, we present a new edge bundling method that uses a self-organizing approach to bundling in which edges are modeled as flexible springs that can attract each other. In contrast to previous methods, no hierarchy is required. In the resulting bundled graphs, visual clutter is reduced and high-level edge patterns are better visible. We also present a rendering technique that can be used to further emphasize the bundling.
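The idea of edges as flexible springs that attract each other can be illustrated with a toy simulation. The sketch below is a deliberately simplified stand-in, not the force model of Chapter 6: it assumes every edge is subdivided into the same number of points, uses a linear attraction between corresponding points of different edges, applies no edge compatibility measures, and uses arbitrary constants. The function names and parameters are hypothetical.

```python
def subdivide(p, q, n):
    """Return n + 2 evenly spaced points from p to q, endpoints included."""
    return [(p[0] + (q[0] - p[0]) * i / (n + 1),
             p[1] + (q[1] - p[1]) * i / (n + 1)) for i in range(n + 2)]


def bundle(edges, n_points=7, iterations=100,
           stiffness=1.0, attraction=0.5, step=0.1):
    """Toy bundling: spring forces within an edge, attraction between edges.

    Assumption: linear attraction between corresponding subdivision points;
    endpoints stay fixed so edges remain attached to their nodes.
    """
    lines = [subdivide(p, q, n_points) for p, q in edges]
    for _ in range(iterations):
        updated = []
        for e, line in enumerate(lines):
            new_line = [line[0]]
            for i in range(1, len(line) - 1):
                x, y = line[i]
                # Spring force pulling the point toward its two neighbours.
                fx = stiffness * (line[i - 1][0] + line[i + 1][0] - 2 * x)
                fy = stiffness * (line[i - 1][1] + line[i + 1][1] - 2 * y)
                # Attraction toward the matching point on every other edge.
                for o, other in enumerate(lines):
                    if o != e:
                        fx += attraction * (other[i][0] - x)
                        fy += attraction * (other[i][1] - y)
                new_line.append((x + step * fx, y + step * fy))
            new_line.append(line[-1])
            updated.append(new_line)
        # All edges are updated from the same previous state.
        lines = updated
    return lines


# Two parallel edges one unit apart: their interiors are drawn together.
edges = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 1.0), (10.0, 1.0))]
lower, upper = bundle(edges)
mid = len(lower) // 2
print(upper[mid][1] - lower[mid][1])  # gap between the two midpoints
```

The spring term keeps each edge from stretching arbitrarily, while the attraction term pulls nearby edges together; the balance between `stiffness` and `attraction` determines how tight the resulting bundle is in this toy setting.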

In all of the visualization techniques described in the previous chapters, a clear depiction of the direction of edges is important. In a node-link-based graph visualization, a directed link (edge) running from node A to B is generally visualized using an arrow: a line with a triangular arrowhead at node B. Although the arrow is intuitive, it does not guarantee that a user is able to determine edge direction as quickly and unambiguously as possible. To investigate this, we present five additional directed-edge representations using combinations of shape and color in Chapter 7. We performed a user study in which participants performed different tasks on a collection of graphs to investigate which representation is best in terms of speed and accuracy. We present our initial hypotheses, the outcome of the user studies, and recommendations regarding directed-edge visualization.

Finally, Chapter 8 discusses the techniques developed in the previous chapters for the combined visualization of graphs and trees. Possible directions for future work are presented as well.

1.5 Related Publications

The major parts of the chapters outlined above are based on the published papers mentioned in this section. To provide a clear context for each of the papers, the papers are divided into three categories: primary, secondary, and tertiary papers.

The primary papers are all first-authored by Danny Holten and each primary paper serves as the basis for a specific chapter. The secondary papers are all co-authored by Danny Holten and provide additional information, viewpoints, or formal evaluations regarding the core, primary-paper material presented in a specific chapter. Finally, the tertiary papers are part of Danny Holten’s MSc work and are briefly discussed in Chapter 2 as background material on software visualization and the use of 2.5D techniques within information visualization.

1.5.1 Primary Papers

HOLTEN, D., Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data. In IEEE Transactions on Visualization and Computer Graphics (TVCG); Proceedings of the 12th IEEE Symposium on Information Visualization (INFOVIS’06), 12(5), pp. 741–748, 2006 (Best Paper Award, [100]).

Serves as core material for Chapter 3

HOLTEN, D., CORNELISSEN, B., AND WIJK, VAN, J. J., Trace Visualization using Hierarchical Edge Bundles and Massive Sequence Views. In Proceedings of the 4th IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT’07), pp. 47–54, 2007 ([101]).

HOLTEN, D., AND WIJK, VAN, J. J., Visual Comparison of Hierarchically Organized Data. Computer Graphics Forum; Proceedings of the 10th Eurographics/IEEE-VGTC Symposium on Visualization (EUROVIS’08), 27(3), pp. 759–766, 2008 ([104]).

HOLTEN, D., AND WIJK, VAN, J. J., Force-Directed Edge Bundling for Graph Visualization. Computer Graphics Forum; Proceedings of the 11th Eurographics/IEEE-VGTC Symposium on Visualization (EUROVIS’09), to appear, 2009 ([103]).

HOLTEN, D., AND WIJK, VAN, J. J., A User Study on Visualizing Directed Edges in Graphs. In Proceedings of the 27th SIGCHI Conference on Human Factors in Computing Systems (CHI’09), pp. 2299–2308, 2009 (Best Paper Nominee, [105]).

1.5.2 Secondary Papers

CORNELISSEN, B., ZAIDMAN, A., HOLTEN, D., MOONEN, L., DEURSEN, VAN, A., AND WIJK, VAN, J. J., Execution Trace Analysis through Massive Sequence and Circular Bundle Views. Journal of Systems and Software, 81(12), pp. 2252–2268, 2008 ([42]).

Serves as additional material for Chapter 4

CORNELISSEN, B., HOLTEN, D., ZAIDMAN, A., MOONEN, L., WIJK, VAN, J. J., AND DEURSEN, VAN, A., Understanding Execution Traces using Massive Sequence and Circular Bundle Views. In Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC’07), pp. 49–58, 2007 ([41]).

Serves as additional material for Chapter 4

ROUBTSOV, S., TELEA, A., AND HOLTEN, D., SQuAVisiT: A Software Quality Assessment and Visualisation Toolset. In Proceedings of the 7th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM’07), pp. 155–156, 2007 ([184]).

NÖLLENBURG, M., VÖLKER, M., WOLFF, A., AND HOLTEN, D., Drawing Binary Tanglegrams: An Experimental Evaluation. In Proceedings of the 11th SIAM/SIGACT Workshop on Algorithm Engineering and Experiments (ALENEX’09), pp. 106–119, 2009 ([160]).

1.5.3 Tertiary Papers

HOLTEN, D., WIJK, VAN, J. J., AND MARTENS, J.-B., A Perceptually Based Spectral Model for Isotropic Textures. ACM Transactions on Applied Perception (TAP), 3(4), pp. 376–398, 2006 ([106]).

Presented as background material in Chapter 2

HOLTEN, D., VLIEGEN, R., AND WIJK, VAN, J. J., Visual Realism for the Visualization of Software Metrics. In Proceedings of the 3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis (VISSOFT’05), pp. 27–32, 2005 ([102]).

Presented as background material in Chapter 2

Chapter 2

Background

Any fool can write code that a computer can understand. Good programmers write code that humans can understand.

(Martin Fowler et al. [75], 1999)

2.1 Software Engineering

Software engineering is defined by the IEEE Software Engineering Body of Knowledge (SWEBOK) as

...the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software, and the study of these approaches; that is, the application of engineering to software [2].

The term first appeared in 1968 during the NATO Software Engineering Conference and was used with regard to the perceived “software crisis” at that time [156]. Nowadays, large software engineering projects are generally carried out within the context of a software engineering methodology. Such a methodology is used as a framework to structure, plan, and control the process of software engineering, i.e., the so-called software lifecycle. Since the 1970s, various software lifecycle models have been developed [18, 45, 185, 189, 226]. One of the best known software lifecycle models is the traditional waterfall model presented by Royce [185]. Other software lifecycle models have been introduced since then. Examples are the prototyping model [45], the spiral model [18, 189], the Rapid Application Development (RAD) model [226], and agile software development methods [33] such as SCRUM [188] and Extreme Programming (XP) [13]. All of these models could be used to illustrate the importance of program understanding within the context of the full software lifecycle. However, the most important aspect with respect to the software lifecycle (and these models) is the presence of a maintenance phase, since program understanding comes into play most clearly during this phase. Each of the models mentioned above incorporates such a maintenance phase.


The maintenance phase is concerned with the modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment (ISO/IEC 14764 standard [108]). The maintenance phase ends when the software product is discontinued. The software can still be used after discontinuation, but no more changes are made to it and development has officially ended.

One might be tempted to conclude that software maintenance only accounts for a relatively small part of the engineering costs, since it is just one of the software development phases. Furthermore, bugs are normally corrected in the verification phase, so there are probably not many bugs left in the final implementation. However, as stated in the previous chapter, maintenance is known to account for up to 90% of the software engineering costs [69] and up to 80% of the software development personnel [35]. The program understanding process in particular is known to be very time-consuming: approximately 50% of the time allocated for maintenance tasks is spent on acquiring system knowledge [38]. An explanation for this is as follows.

In general, many bugs still remain in a large software system, even after passing the verification phase. This is because it is generally impossible to anticipate all of the specific real-life usage scenarios as a result of which certain bugs might surface. Therefore, correcting bugs is normally an ongoing process that extends beyond the verification phase into the maintenance phase.

Furthermore, software maintenance is concerned with far more than just correcting bugs; a detailed description of software maintenance as well as practical guidelines are provided by Pigoski [170]. Software maintenance can be split into four distinct categories. The first three categories presented below were identified by Swanson [204] and the last category was added as part of the ISO/IEC 14764 standard [108]:

• Corrective maintenance

Corrects discovered problems;

• Adaptive maintenance

Keeps the software usable in a changed or changing environment. Most maintenance activities fall within this category;

• Perfective maintenance

Improves performance or maintainability;

• Preventive maintenance

Detects and corrects latent faults in the software before they become effective faults. Preventive maintenance is generally performed to facilitate further development and to accommodate future changes.

Finally, software maintenance is harder than the other software development phases from a conceptual point of view because of its reverse engineering character. The initial phases of the software lifecycle are generally cases of forward software engineering; requirements are translated into a design, which is subsequently implemented as source code. Software maintenance heavily relies on reverse software engineering instead, in which the inner workings of software are determined through analysis of the source code. Source code is written in a programming language that is generally not well equipped to clearly communicate ideas or concepts. It is therefore not trivial to extract the (original) program design from the source code. In addition, source code is often obfuscated and intertwined due to performance-related optimizations, which further hinders program understanding. Issues regarding reverse engineering and program understanding are explained in more detail in the next section.

2.2 Program Understanding

Software reverse engineering and program understanding are often used interchangeably, although differences exist between both terms. Software reverse engineering is the process of identifying software components, their interrelationships, and representing these entities at a higher level of abstraction [157]. This process can be (partially) automated and does not always require a lot of mental effort during each step. Program understanding, on the other hand, is a cognitive process in which a software engineer tries to understand the entities that were identified using reverse engineering techniques [38]. Thus, reverse engineering is necessary for program understanding.

As mentioned in Chapter 1, program understanding also implies studying the documentation of a software system to gain a level of understanding that is sufficient for a given maintenance task. However, this information is often not available, outdated, or inappropriate for the task at hand [49]. As a result, the source code of a software system is often the only reliable asset to work with.

Furthermore, the amount of program understanding that is necessary differs greatly depending on the maintenance task at hand. Reverse engineering techniques can extract high-level program information such as software components and the interrelationships between them, but low-level information such as the source code itself might already be sufficient for a given task.

Debugging a function that incorrectly sorts an array is an example of a task that only requires low-level information: the only knowledge necessary to perform such a task is the low-level source code and an understanding of the sorting algorithm at hand.

On the other hand, refactoring (part of) a software system is a completely different type of task. Refactoring is a perfective maintenance task in which the internal structure and design of the source code is altered without changing its external behavior [75]. This is done to improve code understandability, to remove dead, i.e., unused, parts of the code, or to make source code easier to maintain. No bugs are fixed and no new features are added, although refactoring might precede such activities. The information necessary to perform code refactoring is more high-level than the source code itself. To determine how to reorganize the source code, a software engineer needs to have an overview of the software architecture of the program, i.e., a high-level view of the program that shows the components of the software, their hierarchical organization, and their interrelationships.

Code refactoring is a typical example of a task in which a software engineer does not immediately know what he is looking for, as mentioned in Chapter 1. Instead of having a software engineer analyze large amounts of data at the source-code level, a better alternative for high-level maintenance tasks would be to provide the engineer with a way to interactively explore the high-level software architecture.

This high-level information is generally not available directly and different types of analysis and data extraction techniques are required to extract the software architecture and possible additional information, such as software metrics, from the program source code and the compiled program itself during execution (run-time). These techniques are described in more detail in the next section.

2.3 Data Extraction

At a high level, techniques for obtaining information from an existing software system can be divided into static and dynamic program analysis techniques.

Static analysis is concerned with information that can be obtained from the source code directly. Examples are the organizational (hierarchical) structure of the code as well as additional software metrics. Static analysis is performed by parsing, i.e., analyzing, the code using fact extraction tools. Static analysis is not concerned with actually compiling and executing source code to obtain information from a running program; this is performed by means of dynamic program analysis techniques.

Dynamic analysis is used to obtain additional information that is only available during run-time to augment the information that was extracted using static analysis. An example of information that can only be obtained during run-time is object polymorphism (late binding) [42].

2.3.1 Static Analysis

Static analysis is generally performed as a first step in the data extraction process. Static analysis as performed by the majority of fact extraction tools starts with lexical analysis, which breaks the source code text into tokens corresponding to keywords, identifiers, and symbols. Lexical analysis is followed by syntactic analysis, which involves parsing the token sequence into a parse tree to identify the syntactic structure of the program.
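The two steps described above can be illustrated with a minimal sketch. It is an assumption made purely for illustration that the analyzed code is Python and that the standard-library modules `tokenize` and `ast` play the role of a fact extractor; the tools discussed in this chapter target C, C++, and Java. The sample program and the helper names `lex` and `defined_functions` are hypothetical.

```python
import ast
import io
import tokenize

# Hypothetical program to analyze.
SOURCE = """\
def area(w, h):
    return w * h

def main():
    print(area(2, 3))
"""

def lex(source):
    """Lexical analysis: split the source text into (token type, text) pairs."""
    layout_only = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                   tokenize.DEDENT, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(readline)
            if tok.type not in layout_only]

def defined_functions(source):
    """Syntactic analysis: parse the source into a tree and list function definitions."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)]

tokens = lex(SOURCE)
functions = defined_functions(SOURCE)
print(tokens[:4])
print(functions)
```

The token list corresponds to the keywords, identifiers, and symbols mentioned above, whereas the list of function definitions is a simple structural fact derived from the parse tree.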

Fact Extraction

Many fact extractors, both commercial as well as non-commercial, are available for popular programming languages such as C, C++, and Java. The best known fact extractors for these programming languages are the following.

RigiSystem [128, 200] is a complete reverse engineering system that enables users to extract, navigate, analyze and document the static structure of large software systems. The Rigi environment is free and offers parsers for C and COBOL. The SHriMP (Simple Hierarchical Multi-Perspective) views [199] that were later added to Rigi provide Java parsing capabilities as well.

PUMA [174] is a free library that allows complex transformations of C++ source code. Proprietary C++ fact extraction tools can be easily implemented using PUMA. C++ fact extraction is also provided by Columbus/CAN [37], a reverse engineering framework capable of analyzing large C/C++ projects. Columbus/CAN is free for scientific and educational purposes. CPPX [44] is another free, open source, general-purpose parser and fact extractor for C++. It relies on the preprocessing, parsing, and semantic analysis provided by the GNU C++ compiler. Among its aims are architecture recovery, source code visualization, restructuring, and refactoring.

A free fact extractor for Java is Javex [112], which extracts facts from Java class files. jCosmo [113] is a Java fact extractor that detects code smells in Java source code. These code smells can be used to review the quality of the analyzed code and indicate regions that could benefit from refactoring.

The output of the fact extractors mentioned above generally comprises plain text files. Although some extractors use a proprietary format to store the information within the text files, the majority of the output is formatted as XML [143], GXL (Graph Exchange Language) [231], or RSF (Rigi Standard Format) [232], a lightweight exchange format that uses sequences of tuples to encode graphs.
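To make the idea of encoding a graph as a sequence of tuples concrete, the following sketch writes and reads a plain-text file of space-separated (relation, source, target) triples, one per line. This is a simplified illustration in the spirit of RSF, not a complete implementation of the format; the relation names and entity names are hypothetical, and quoting of names that contain spaces is not handled.

```python
def to_triples_text(edges):
    """Serialize (relation, source, target) tuples, one space-separated triple per line."""
    return "\n".join(f"{rel} {src} {dst}" for rel, src, dst in edges)

def from_triples_text(text):
    """Parse the text back into tuples, skipping blank lines."""
    return [tuple(line.split()) for line in text.splitlines() if line.strip()]

# Hypothetical facts: a containment hierarchy plus one function call.
facts = [
    ("contain", "pkg_util", "Sorter"),
    ("contain", "Sorter", "sort"),
    ("call", "main", "sort"),
]

text = to_triples_text(facts)
print(text)
restored = from_triples_text(text)
```

Both the hierarchy ("contain") and the additional relations ("call") end up in the same flat list of tuples, which is what makes such a format lightweight to exchange between tools.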

Software Metrics

The information extracted using the fact extractors mentioned above mainly pertains to structural information such as the hierarchical organization of the source code and its function call graph. Additional, more quantitative information can be obtained from the source code in the form of software metrics. As mentioned in Section 1.1, these metrics measure a specific source code property, such as the number of lines of code or the number of bugs per line of code. The following gives an overview of common metric calculation tools for C, C++, and Java.

CCCC (C and C++ Code Counter) [27] is a tool which analyzes C, C++, and Java files and generates a report on various metrics of the code. Metrics supported include number of lines of code, McCabe’s cyclomatic complexity [149], which is used to measure the complexity of a program, and additional metrics proposed by Chidamber et al. [31] and Henry et al. [95].

Number of lines of code as Non-Commenting Source Statements (NCSS) and McCabe’s cyclomatic complexity are also measured by JavaNCSS [111], a command-line-based source measurement suite for Java. Two other Java-specific metric calculation tools are JDepend [114] and JMetric [118]. JDepend traverses Java class files and generates design-quality metrics for each Java package. It allows users to measure the quality of a design in terms of its extensibility, reusability, and maintainability. JMetric is a metric calculation tool for Java that can measure number of lines of code, number of statements, Lack of Cohesion in Methods (LCOM), and McCabe’s cyclomatic complexity.
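As an illustration of how a metric such as McCabe’s cyclomatic complexity can be derived from a parse tree, the sketch below counts decision points and adds one, which for structured, single-entry code equals the graph-based definition M = E − N + 2P. It assumes Python source and the standard-library `ast` module rather than any of the tools named above. Counting each extra operand of a Boolean operator as a decision is one common variant, and the set of node types counted here is a simplification.

```python
import ast

# Node types treated as decision points (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source):
    """Approximate McCabe's metric as (number of decision points) + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two additional branches.
            decisions += len(node.values) - 1
    return decisions + 1

# Hypothetical function to measure: if, for, if, and one 'and'.
EXAMPLE = """\
def classify(x):
    if x < 0:
        return "negative"
    for d in (2, 3):
        if x % d == 0 and x > d:
            return "composite"
    return "other"
"""

complexity = cyclomatic_complexity(EXAMPLE)
print(complexity)
```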

2.3.2 Dynamic Analysis

Although the organizational structure of the code, its function call graph, and source code metrics can be provided by static analysis, additional dynamic analysis is needed to extract information that is only available during execution of the program. Examples of information that can only be obtained during run-time are the aforementioned object polymorphism (late binding) in case of object-oriented code [42], the actual path through conditional parts of code that was executed during a certain run-time scenario, and the actual values that were passed to and/or returned by functions.

Dynamic information can be collected by instrumenting the source code in such a way that all of the statements (function calls) and the relevant variable values are stored in chronological order during execution of the program, thus obtaining an execution trace. Although there are various ways in which existing source code can be instrumented, a fairly non-obtrusive way is provided by Aspect-Oriented Programming (AOP) [127]. AOP is a paradigm that increases modularity by allowing the separation of cross-cutting concerns; obtaining an execution trace from a running program is an example of such a concern. Well-known examples of aspect-oriented extensions for C, C++, and Java are AspectC [4, 32], AspectC++ [3], and AspectJ [5], respectively.
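The idea of an execution trace can be sketched as follows. The example uses a Python decorator as a lightweight stand-in for the tracing aspect; it is not AspectC, AspectC++, or AspectJ, and the names `TRACE`, `traced`, `square`, and `sum_of_squares` are hypothetical. The tracing logic lives in one place and is attached to functions without editing their bodies, which is the separation of concerns that AOP aims for.

```python
import functools

TRACE = []  # chronological list of (event, function name, payload)

def traced(func):
    """Wrap func so that every call and return is appended to TRACE."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("call", func.__name__, args))
        result = func(*args, **kwargs)
        TRACE.append(("return", func.__name__, result))
        return result
    return wrapper

@traced
def square(x):
    return x * x

@traced
def sum_of_squares(a, b):
    return square(a) + square(b)

total = sum_of_squares(2, 3)
for event in TRACE:
    print(event)
```

The resulting list records the calls and returns in chronological order, including the values passed to and returned by each function.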

When performing dynamic analysis of Java programs, alternative ways of obtaining the execution trace without instrumenting and recompiling the source code are available as well. A collection of Java APIs (Application Programming Interfaces) with debugging-related functionality is provided in the form of Java’s JPDA (Java Platform Debugger Architecture [119]). Through JPDA it is possible to directly access the JVM (Java Virtual Machine) during execution of a Java program to extract an execution trace.
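JPDA is specific to Java. As a loose analogy only, the following sketch uses the Python interpreter’s `sys.settrace` hook to record function calls from the running program without modifying or recompiling the traced code. The helper `collect_calls` and the function `fib` are hypothetical names introduced for this illustration.

```python
import sys

def collect_calls(func, *args):
    """Run func(*args) and return the names of all functions entered, in order."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no line-level tracing inside the new frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always detach the hook again
    return calls

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

observed = collect_calls(fib, 2)
print(observed)
```

Note that the traced function `fib` contains no instrumentation; the trace is obtained entirely through the hook offered by the run-time environment.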

2.4 Visualization

Performing static and dynamic analysis on a piece of software can easily result in hundreds of megabytes of data, even for execution traces that represent only a few seconds’ worth of actual program execution. Although data abstraction techniques can be used to filter out unnecessary information, such as function calls to generic system libraries that are of little use for a specific program understanding task, interactive visualization techniques can be used in combination with these abstraction techniques to gain more insight into the massive amounts of data [237].

As mentioned in Chapter 1, the research area of visualization – pertaining to any technique for creating images, diagrams, or animations to communicate a message – is split into the more specific research areas of scientific and information visualization. Scientific visualization concerns itself with visualizing entities that have a physical representation, whereas information visualization concerns itself with visualizing abstract data that does not have a physical representation. The visualization of software-related data obtained by performing static and dynamic analysis on a piece of software is an example of such abstract data that is well suited for information visualization.

Before describing the more specific area of software visualization in Section 2.4.2, the following section first gives a brief overview of information visualization in general.

2.4.1 Information Visualization

Although various definitions of information visualization exist [6, 30, 124], a clear and concise definition already presented in Chapter 1 is “the use of computer-supported, interactive, visual representations of abstract data to amplify cognition” [26].


One of the main advantages of visualization is its ability to rapidly communicate an idea or story using an image, which is informally captured by the well-known saying that a picture is worth a thousand words.¹ The following list gives a more detailed description of the advantages of information visualization. These advantages, combined with computational data analysis, can be applied to analytic reasoning to support the sense-making process [211].

• Information visualization utilizes the high bandwidth of the Human Visual System (HVS) to rapidly comprehend huge amounts of data [221];

• Information visualization utilizes external cognition [187] to expand human working memory. External cognition refers to ways in which people augment their normal cognitive processes with external aids, one of them being visualizations [26];

• Information visualization harnesses the ability of the HVS to process visual information preattentively. Preattentive processing refers to the ability of the HVS to rapidly identify certain visual properties prior to conscious attention; it determines what visual objects are offered up to our attention. As stated in Chapter 1, an example is the fast and effortless identification of a single red circle in the presence of a large group of black circles. Additional examples of high-level visual features that are preattentively processed by the HVS are shape, color, motion, spatial position, and texture [221];

• Information visualization allows for the perception of emergent properties that were not anticipated [221]. Within the related research area of visual analytics, which is the science of analytical reasoning facilitated by interactive visual interfaces, this is often restated as “detect the expected and discover the unexpected.” [211];

• Information visualization takes advantage of the excellent pattern recognition abilities of the HVS. These abilities are described by Gestalt laws [131] of pattern perception, which are robust rules that describe how we perceive visual patterns [221];

• Information visualization facilitates understanding of large- as well as small-scale data features [221]. The ability to view data using various level-of-detail representations, each offering a different level of abstraction, is also captured by Shneiderman’s visual information seeking mantra [195]: “Overview first, zoom and filter, then details on demand.”;

• Information visualization enables problems with the data itself to become apparent, e.g., by revealing information about the way in which the data is collected. Furthermore, data errors and data artifacts often jump out in a visualization [221];

• Information visualization facilitates the formation of new hypotheses [221].

¹ The phrase is attributed to an article by Fred R. Barnard in the advertising trade journal Printers’ Ink to promote the use of images in advertisements on the sides of cars; the December 8, 1921 issue carries an ad titled “One Look is Worth A Thousand Words.” Another ad by Barnard appears in the March 10, 1927 issue with the phrase “One Picture is Worth Ten Thousand Words,” where it is labeled a Chinese proverb by Barnard so that people would take it seriously [96].

The availability of fast and relatively cheap consumer-level hardware, especially the advent of affordable hardware accelerated graphics cards in the mid-1990s, has put information visualization within reach of a large group of potential users. Even mid-range consumer-level hardware is capable of visualizing hundreds of thousands of items at megapixel resolutions using millions of colors while still attaining interactive frame rates, i.e., 25 FPS (frames per second) or more. In addition, ongoing algorithmic advances in computer graphics and the availability of current high-end hardware such as fast multi-core CPUs, multi-gigabyte internal memory, and fully programmable and massively parallel GPUs (Graphics Processing Units) have made the interactive visualization of extremely large data sets even more practical.

Furthermore, special APIs such as OpenGL [162] and DirectX [54] have enabled easy access to hardware accelerated graphics provided by modern GPUs, while high-level (information) visualization frameworks such as VTK [130], Prefuse [172], and Flare [74] have eased the development of (information) visualization applications.

Figure 2.2 depicts the information visualization pipeline with respect to software-related data: the software visualization pipeline. It is based on the description of data extraction presented in Section 2.3 and the information visualization pipeline presented in [dos Santos et al., 2004] [186].

An Illustrative Example: Treemaps

Treemaps [194] are an example of an effective information visualization technique. A treemap is a space-filling, 2D visualization that maps a tree structure into rectangles with each rectangle representing a node. Treemaps are very effective in showing node attributes using size (area) and color and they enable users to compare nodes and subtrees at varying depths in the tree. Treemaps furthermore facilitate the discovery of patterns and outliers [194].

Internal treemap rectangles correspond to child nodes and are visualized by recursively subdividing the containing rectangle. The direction of subdivision alternates per level between horizontal and vertical (slice-and-dice algorithm). A tree structure and its corresponding treemap are shown in Figure 2.1.

Figure 2.1: Construction of a treemap from the tree structure shown on the left using the slice-and-dice algorithm that alternates between horizontal and vertical subdivision.
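The recursive subdivision described above can be sketched in a few lines. The tree representation (nested dictionaries with hypothetical `name`, `size`, and `children` keys) and the function names are assumptions made for this illustration, and all sizes are assumed to be positive.

```python
def size(node):
    """Size of a node: its own value for a leaf, otherwise the sum over its children."""
    children = node.get("children")
    return sum(size(c) for c in children) if children else node["size"]

def slice_and_dice(node, x, y, w, h, horizontal=True, out=None):
    """Assign a rectangle (x, y, w, h) to every node, alternating the split direction per level."""
    if out is None:
        out = {}
    out[node["name"]] = (x, y, w, h)
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already handed out
    for child in node.get("children", []):
        fraction = size(child) / total
        if horizontal:  # children are placed side by side
            slice_and_dice(child, x + offset * w, y, fraction * w, h, False, out)
        else:           # children are stacked on top of each other
            slice_and_dice(child, x, y + offset * h, w, fraction * h, True, out)
        offset += fraction
    return out

# Hypothetical tree: leaf 'a' and subtree 'b' with two equally sized leaves.
tree = {"name": "root", "children": [
    {"name": "a", "size": 2},
    {"name": "b", "children": [{"name": "b1", "size": 3},
                               {"name": "b2", "size": 3}]},
]}

rects = slice_and_dice(tree, 0, 0, 8, 4)
for name, rect in rects.items():
    print(name, rect)
```

Each rectangle’s area is proportional to the size of its node; the alternation of the `horizontal` flag implements the per-level change in subdivision direction.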


Figure 2.2: The software visualization pipeline. Static and dynamic analysis techniques are used to obtain data from the source code and the running program, respectively. The data is then processed, filtered, and mapped to on-screen geometry based on user preferences. The resulting visualization provides the user with information that can be used to adjust the visualization to gain additional or more specific insight.


Figure 2.3: Using SequoiaView [190] to visualize a hard drive. The hierarchical structure of a 150GB hard drive (50GB used) is shown. Shading is used to highlight the hierarchical structure, node area is used to indicate file size, and color is used to indicate file type. Images, compressed archives, videos, text-based files, executables, dynamic link libraries, and the Windows page file make up the most notable file types (patterns) in this example.

Squarified cushion treemaps [23, 227] provide an extension to slice-and-dice treemaps that eliminates the thin, elongated rectangles that often arise as a result of the slice-and-dice algorithm; the rectangles generated by the squarification algorithm approximate squares, which are easier to compare and select. Furthermore, nested rectangles are represented using intuitively shaded, recursive cushions to provide better insight into the structure and depth of the hierarchy.
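The squarification idea can be sketched for a single level of the hierarchy as follows. This is a simplified, greedy variant written for illustration and is not necessarily identical to the algorithm in [23]: items are sorted by decreasing area and added to the current row along the shorter side of the remaining rectangle for as long as the worst aspect ratio in that row does not get worse. The function names are hypothetical, cushion shading is not covered, and the areas are assumed to be positive and to sum to the area of the enclosing rectangle.

```python
def worst_ratio(row, side):
    """Worst aspect ratio among the rectangles if the areas in row are laid out along side."""
    total = sum(row)
    return max(max(side * side * a / (total * total),
                   total * total / (side * side * a)) for a in row)

def squarify(areas, x, y, w, h):
    """Return (x, y, w, h) rectangles in order of decreasing area."""
    rects = []
    remaining = sorted(areas, reverse=True)
    while remaining:
        side = min(w, h)
        row = [remaining.pop(0)]
        # Extend the row while doing so does not worsen its worst aspect ratio.
        while remaining and worst_ratio(row + [remaining[0]], side) <= worst_ratio(row, side):
            row.append(remaining.pop(0))
        thickness = sum(row) / side
        offset = 0.0
        for a in row:
            length = a / thickness
            if w >= h:  # the row becomes a vertical strip on the left
                rects.append((x, y + offset, thickness, length))
            else:       # the row becomes a horizontal strip at the top
                rects.append((x + offset, y, length, thickness))
            offset += length
        # Shrink the free rectangle by the strip that was just filled.
        if w >= h:
            x, w = x + thickness, w - thickness
        else:
            y, h = y + thickness, h - thickness
    return rects

layout = squarify([6, 6, 4, 3, 2, 2, 1], 0, 0, 6, 4)
for rect in layout:
    print(rect)
```

In this example the two largest items end up as 3 × 2 rectangles instead of the thin slices that a single-direction subdivision of the 6 × 4 rectangle would produce.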

Freely available disk browsing tools that employ such squarified cushion treemaps for the interactive visualization and exploration of file systems, for instance, complete hard drives, are widely used on Windows- and Linux-based PCs. The best known examples are SequoiaView [190], KDirStat [123], and WinDirStat [230]. Figure 2.3 shows SequoiaView being used to visualize a hard drive and highlights the effectiveness of using shading, node area, and color to show hierarchical structure, file size, and file type, respectively.

2.4.2 Software Visualization

Software visualization as a research field is concerned with the visualization of the structure (and additional metrics), behavior, and evolution of software [52]; Figure 2.4 depicts its position within the more general hierarchy of computer science (sub)fields. Software architecture visualization and structural software comparison are only two specific examples of software visualization.

Figure 2.4: Software visualization as part of a hierarchy of computer science (sub)fields.

The remainder of this section provides a decomposition of software visualization and presents additional subfields besides software architecture visualization and structural software comparison. As such, this section provides a general overview of software visualization and identifies the general problems that affect many of the current software visualization techniques. A more detailed comparison between the techniques proposed in this thesis and related work in the areas of software architecture visualization and structural software comparison is provided in the relevant chapters in which each technique is presented.

Visualization of Software Structure and Metrics

Software structure and metrics visualization – of which software architecture visualization is a part – are treated simultaneously in the following because software visualization tools generally display software metrics and/or other structural information as additional visual attributes on top of the structural elements.

One of the earliest examples of this is SeeSoft [62], a tool for visualizing line-oriented software statistics. SeeSoft uses a fairly straightforward structural representation: the syntactic structure (textual outline) of the source code is directly mapped to a miniaturized, pixel-based representation. It can visualize up to 50,000 lines of code by mapping each line to a thin pixel row. Color is used to indicate a statistic of interest (such as a software metric) for each row. Figure 2.5 shows an example of this.

CSV (Code Structure Visualization) is a part of the VCN (Visual Code Navigator) tool suite [142] that expands upon the idea of line-based software visualization proposed by SeeSoft. Figure 2.6 shows how CSV maps the syntactic structure of the source code to a cushion-shaded representation. Users can interactively scale this representation to provide a continuous trade-off between a SeeSoft-like pixel-based representation and a fully readable textual representation.


Figure 2.5: SeeSoft [62] maps each line of code to a pixel row. Color is used to indicate a statistic of interest, such as a software metric.

Figure 2.6: CSV [142] provides a zoomable visualization of the syntactic structure of the source code. Depending on the zoom level, the text can be made readable using an interactive spotlight cursor.

Although the fairly straightforward representation used by SeeSoft and CSV might result in a source code representation that is fairly recognizable to programmers, such a low-level representation is not well suited for showing the hierarchical decomposition of a software system. A clear depiction of this decomposition is important when one wants to get an overview of the high-level structural abstraction of the software system, i.e., the software architecture.

A well-known technique for visualizing the hierarchical decomposition of software systems is provided by SHriMP views [199] in the context of the Rigi reverse engineering environment [128, 200]. SHriMP views employ a nested graph formalism and a fisheye view algorithm for manipulating large graphs. For Rigi purposes, the nesting of nodes conveys the parent-child relationships in a software system hierarchy. Figure 2.7 illustrates the use of SHriMP views to visualize a Bingo program written in Java.

The layout and nesting employed by the SHriMP system lead to a fairly inefficient use of screen space. Furthermore, although only a moderate number of function calls are shown, problems related to visual clutter resulting from edge congestion already become apparent. Although more space-efficient hierarchy visualizations such as treemaps [194] might be used to address the former problem, providing a clutter-free visualization of hundreds or even thousands of function calls on top of a tree structure still remains a problem. This is also illustrated by Figure 1.1 in the previous chapter.

The inefficient use of screen space, which subsequently leads to poor scalability, is especially apparent in the case of UML (Unified Modeling Language) diagrams [19]. Although UML class diagrams and message sequence charts (MSCs) are frequently used to document software architectures, on-screen UML diagrams are only capable of depicting small software systems and/or parts of systems because of their visual representation. This is illustrated in Figure 2.8, which shows the use of MetricView [208] to visualize software metrics on top of a UML class diagram.


Figure 2.7: SHriMP views [199] are used to visualize the hierarchical structure and the calling behavior of a Bingo program written in Java.

Figure 2.8: MetricView [208] showing Coupling and Dynamicity-of-Classes metrics on top of a UML class diagram using red and blue, respectively.

Figure 2.9: The matrix-based Multilevel Call Matrix [90] on the left shows the same graph as the node-link-based representation on the right, which was generated using the dot layout module of AT&T’s Graphviz package [66, 86]. Green and red denote allowed and illegal function calls, respectively.

Visual clutter resulting from edge congestion can be addressed by using a matrix- instead of a node-link-based graph representation [90, 242]. Figure 2.9 shows a visualization of a Multilevel Call Matrix [90]. The software system hierarchy is displayed along the axes of the matrix and function calls are shown within the matrix as shaded cells. Green and red cells denote allowed and illegal function calls, respectively. The node-link-based representation shows visual clutter as a result of edge congestion, whereas the matrix visualization shows a stable and clean layout. However, users are generally less familiar with the more abstract matrix representation and often find it harder to understand [1, 83, 125]. A solution to this problem might be the use of a hybrid visualization such as NodeTrix [94]. NodeTrix uses node-link diagrams to show the global structure and matrices to show dense subgraphs.

Figure 2.10: CodeCity [225] showing class-level metrics on top of the package structure of jEdit 4.3pre15. God, brain, god + brain, and data classes are indicated using blue, yellow, red, and green, respectively.

Figure 2.11: Visual clutter as a result of 3D occlusion and edge congestion in VisMOOS [77], an Eclipse plug-in for the structural visualization of Java-based software systems.

Many software visualization techniques use 3D representations in an attempt to make more efficient use of the available screen space [77, 87, 145, 165, 225]. An overview of 3D software visualization techniques is presented in [Teyseyre et al., 2009] [210].

Although 3D visualizations are generally preferable to 2D ones if the data pertains to physical, three-dimensional entities, it has been shown by Irani et al. that 3D shading can also be beneficial in case of graph and diagram visualization [109]. The use of 3D instead of 2D representations also provides a greater information density [182].

However, user interaction for changing the viewpoint is generally necessary to overcome occlusion problems. 3D spaces are often difficult to understand and navigate for users and having to change the viewpoint can aggravate this even further [21, 34, 98]. Furthermore, perspective projection can lead to difficulties when comparing element sizes. Finally, Plaisant et al. [171] report that 3D representations improve the screen-space problem only marginally.

Figure 2.10 illustrates the use of CodeCity [225] to show metrics on top of a 3D visualization of a Java package hierarchy. Occlusion problems are apparent and screen-space usage is not readily improved either by using a 3D representation. A more extreme example of occlusion as a result of using a 3D visualization is presented in Figure 2.11, which shows a screenshot of VisMOOS by Fronk et al. [77], an Eclipse plug-in for the structural visualization of Java-based software systems.

Instead of resorting to either 2D or 3D, Ware [220] advocates the use of a so-called 2.5D approach when designing visualizations. In short, this means that the layout of the visualization as well as user interaction are restricted to a 2D space, thus preventing

Figure 2.12: A multivariate visualization of software metrics using texture and color [102]. (a) Methods with high fan-in and low fan-out (permissible situation). (b) Methods with low fan-in and high fan-out (low-priority situation). (c) Two methods (top left, bottom right) with fairly high fan-in and medium fan-out, indicating a potential problem area.

occlusion- and perspective-related problems. 3D cues such as shading, texture and cast shadows are used to give visual elements in this 2D space a 3D appearance, which makes it easier to remember and analyze graphs and diagrams [109].

Although this thesis focuses on structural software visualization, the visualization of additional relations between structural elements, and the structural comparison of a pair of hierarchically organized software systems, we have also worked on visualizing software metrics. In [Holten et al., 2005] [102], we note that most software visualizations only use a limited set of graphical elements, e.g., text, simple shapes and uniform colors, to depict information. However, the HVS enables rapid processing of additional visual cues like shading and texture, which are generally not used.

Figure 2.12 shows a visualization of the structure and metrics of a hierarchically organized Java program [102] that uses squarified treemaps [23, 194], cushions [227], color, and bump-mapped textures [17]. Bump mapping results in images with realistic-looking surface wrinkles without the need to model each wrinkle as a separate surface element.

A squarified cushion treemap is used to depict the high-level package-class-method hierarchy of the Java program as efficiently as possible with respect to screen space. Furthermore, two metrics, fan-in and fan-out, are shown simultaneously at the method level using color and texture, respectively. The fan-in of a method M refers to the number of methods that call M, whereas the fan-out of M refers to the number of methods that are called by M. It is generally desirable that methods with high fan-in have low fan-out to keep such critical methods from depending on (too many) other methods.
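The fan-in and fan-out definitions above can be illustrated with a minimal Python sketch. It assumes that the call relation is available as a list of (caller, callee) pairs; the function name fan_metrics and the method names in the sample data are hypothetical and are not taken from [102].

```python
def fan_metrics(calls):
    """Return {method: (fan_in, fan_out)} for a list of (caller, callee) pairs.

    Fan-in counts the distinct methods that call a method; fan-out counts
    the distinct methods that it calls. Repeated calls are counted once.
    """
    callers = {}  # callee -> set of distinct methods that call it
    callees = {}  # caller -> set of distinct methods that it calls
    for caller, callee in calls:
        callees.setdefault(caller, set()).add(callee)
        callers.setdefault(callee, set()).add(caller)
    methods = set(callers) | set(callees)
    return {
        m: (len(callers.get(m, ())), len(callees.get(m, ())))
        for m in sorted(methods)
    }


# Hypothetical call relation of a small program.
calls = [
    ("main", "parse"),
    ("main", "log"),
    ("parse", "log"),
    ("render", "log"),
    ("main", "parse"),  # duplicate call, counted once
]
metrics = fan_metrics(calls)
print(metrics["log"])   # (fan_in, fan_out) of the frequently called method
print(metrics["main"])  # (fan_in, fan_out) of the entry point
```

In this toy relation, log has high fan-in and zero fan-out, which corresponds to the permissible situation of Figure 2.12(a), whereas main has zero fan-in and the highest fan-out.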

combination of two colors, since bivariate chromatic maps are very difficult to read [218]. Our previous work on a perceptually based model for isotropic textures is used to generate textures with varying levels of spatial frequency, regularity, and physical contrast [106].

Finally, the cushion shading and bump-mapped textures that are used in conjunction with the treemap representation emphasize the hierarchical structure and make the textures fuse better with the shaded surfaces of the cushions. Cushion shading and bump mapping are examples of using a 2.5D approach as proposed by Ware [220].

Visualization of Software Behavior

Apart from visualizing the structural organization of software systems, the interrelationships among the structural elements, and quantitative properties such as metrics, software visualization is also used to depict the behavior of programs.

An example of software behavior visualization is algorithm animation, which is the process of abstracting data, operations, and semantics of computer programs, and then creating animated graphical views of those abstractions [153, 198]. Algorithm animation, for example, animation of sorting algorithms, is frequently used for educational purposes within computer science [107, 139]. An overview of algorithm animation techniques is presented by Schaffer et al. [192].

The visualization of State Transition Graphs (STGs) [88, 173] is another example of behavioral software visualization. STGs are used to describe the behavior of systems whose states evolve over time. A node in an STG represents a state that a system can be in, while an edge represents a transition from one state to another. The construction of an STG for a given software system is not trivial; STGs cannot be extracted directly from the source code or the execution thereof by means of standard static and dynamic program analysis techniques. Instead, such STGs may be generated from process-algebraic descriptions of system behavior written in languages such as mCRL2 [89].
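As a minimal illustration of the node-and-edge structure described above, the following Python sketch stores an STG as an adjacency list and computes the states that are reachable from an initial state. The door example, the action labels, and the function names build_stg and reachable_states are hypothetical; the sketch does not reflect how mCRL2 or the tools in [88, 173] represent state spaces.

```python
from collections import deque


def build_stg(transitions):
    """Build an adjacency list {state: [(action, target), ...]}.

    Each transition is a (source, action, target) triple. States that only
    occur as targets still get an (empty) entry.
    """
    stg = {}
    for source, action, target in transitions:
        stg.setdefault(source, []).append((action, target))
        stg.setdefault(target, [])
    return stg


def reachable_states(stg, initial):
    """Return the set of states reachable from `initial` (breadth-first)."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for _action, target in stg[state]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy system: a door that can be opened, closed, and locked.
transitions = [
    ("closed", "open_door", "opened"),
    ("opened", "close_door", "closed"),
    ("closed", "lock", "locked"),
    ("locked", "unlock", "closed"),
    ("broken", "repair", "closed"),  # "broken" is never entered
]
stg = build_stg(transitions)
states = reachable_states(stg, "closed")
print(sorted(states))
```

Note that the states in this sketch are abstract labels. Nothing in the data structure links a state to a particular location in source code.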

Furthermore, STGs are software system abstractions that do not always have a clear, one-to-one correspondence to the underlying source code, since they focus on the state that a system can be in. In general, it is not possible to uniquely map a state to a specific part of the source code. In a similar vein, algorithm animation generally focuses on data transformations in a step-by-step fashion, e.g., displaying the transition from the current to the next state of an array that is being sorted. As such, the state of an array (or the state of a system’s memory in general) does not uniquely correspond to a specific part of the source code either.

Execution trace visualization, on the other hand, is a final example of software behavior visualization that does not suffer from this problem. A direct correspondence between elements of the execution trace and specific parts of the source code can easily be established. This is because an execution trace contains chronologically ordered function calls that correspond to a calling and called function in the source code. Execution trace visualization can offer support for software engineers that are concerned with feature location tasks, for example.
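The correspondence between trace elements and caller/callee pairs can be sketched as follows. The example assumes a hypothetical trace format consisting of chronologically ordered ("enter", function) and ("exit", function) events; real tracing tools use their own formats, and the function name calls_from_trace is chosen for illustration only.

```python
from collections import Counter


def calls_from_trace(events):
    """Derive chronologically ordered (caller, callee) pairs from a trace.

    A call stack is maintained while replaying the events: when a function
    is entered, the function on top of the stack (if any) is its caller.
    """
    stack = []
    calls = []
    for kind, function in events:
        if kind == "enter":
            if stack:
                calls.append((stack[-1], function))
            stack.append(function)
        elif kind == "exit":
            if not stack or stack[-1] != function:
                raise ValueError(f"unbalanced exit event for {function!r}")
            stack.pop()
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return calls


# Hypothetical trace of a small program run.
events = [
    ("enter", "main"),
    ("enter", "parse"),
    ("enter", "read"),
    ("exit", "read"),
    ("exit", "parse"),
    ("enter", "render"),
    ("enter", "read"),
    ("exit", "read"),
    ("exit", "render"),
    ("exit", "main"),
]
calls = calls_from_trace(events)
edge_weights = Counter(calls)  # number of times each caller/callee pair occurs
print(calls)
```

Each resulting pair names two functions that exist in the source code, so every element of the trace can be mapped back to a calling and a called function. The pair counts in edge_weights could serve as edge weights in a node-link representation of the trace.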

Most execution trace visualizations are based on two representations. One is a node-link graph in which edges between nodes represent function calls [56, 87, 136, 166, 167, 168]. The other is a 2D, space-reduced representation in which calls between elements are

Contents

Chapter 1 Introduction
1.1 Motivation
1.2 Objective
1.3 Software Visualization Requirements
1.4 Outline
1.5 Related Publications
1.5.1 Primary Papers
1.5.2 Secondary Papers
1.5.3 Tertiary Papers

Chapter 2 Background
2.1 Software Engineering
2.2 Program Understanding
2.3 Data Extraction
2.3.1 Static Analysis
2.3.2 Dynamic Analysis
2.4 Visualization
2.4.1 Information Visualization
2.4.2 Software Visualization