
Tilburg University

Bringing instruments to a service-oriented interactive grid

Lelli, Francesco

Publication date: 2007

Document Version

Publisher's PDF, also known as Version of record

Link to publication in Tilburg University Research Portal

Citation for published version (APA):

Lelli, F. (2007). Bringing instruments to a service-oriented interactive grid.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain.

• You may freely distribute the URL identifying the publication in the public portal

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.


Università Ca’ Foscari di Venezia

Dipartimento di Informatica

Dottorato di Ricerca in Informatica

Ph.D. Thesis: TD-2007-2

Bringing Instruments to a Service-Oriented Interactive Grid

Francesco Lelli

Supervisor Prof. Salvatore Orlando

Supervisor Dr. Gaetano Maron

PhD Coordinator Prof. Simonetta Balsamo


Author’s e-mail: lelli@dsi.unive.it or francesco.lelli@lnl.infn.it or francesco.lelli@cern.ch

Author’s address:

Dipartimento di Informatica
Università Ca’ Foscari di Venezia
Via Torino, 155
30172 Venezia Mestre – Italia
tel. +39 041 2348411

fax. +39 041 2348419


To the people who believed in me.


Abstract

Current Grid technologies offer unlimited computational power and storage capacity for scientific research and business activities in heterogeneous areas over the world. Thanks to the Grid, different Virtual Organizations can operate together in order to achieve common goals. However, concrete use cases demand a closer interaction between the various types of instruments accessible from the Grid and the classical Grid infrastructure, typically composed of Computing and Storage Elements. We cope with this open problem by proposing and realizing the first release of the Instrument Element (IE), i.e., a new Grid component that provides the Computational/Data Grid with an abstraction of real instruments, and the Grid users with a more interactive interface to control these instruments.

In this thesis we describe in detail the proposed software architecture of the IE, also discussing the functional and non-functional requirements on which its design is based. The non-functional requirements demand not only deep interaction between users and devices to control instruments, but also the adoption of highly interoperable solutions, which only Service Oriented Architecture (SOA) based technologies, like Web/Grid Services, can offer. Therefore, in order to solve the trade-off between the necessity of universality/interoperability and performance, we propose a set of solutions that improve the performance of a SOA system, in terms of both throughput and latency of service invocations. Moreover, in order to fulfill the Quality of Service (QoS) non-functional requirement, we also devise a methodology that allows remote method execution times to be predicted. This makes it possible, for example, to select an alternative service that better guarantees the fulfilment of execution deadlines.

Still according to our analysis of non-functional requirements, we also address the problem of determining fast communication methods to permit instruments to properly inter-operate. In this regard, we investigate different publish/subscribe architectures, and we show that RMM-JMS is the best candidate solution for accomplishing this task.

It is worth noting that even though all these solutions and results have been motivated, devised and exploited during the design of our IE, any application based on Web and Grid Services can benefit from them.


Acknowledgments

Desidero ringraziare:

Il Dr. Gaetano Maron, per avermi insegnato che semplici risultati pratici sono più utili e difficili da raggiungere di brillanti teorie.

Il Prof. Salvatore Orlando, per avermi insegnato come solo con brillanti teorie si possano raggiungere semplici risultati pratici.

Tutte le persone che hanno creduto in me, in loro ho trovato il conforto per finire questo lavoro.

Tutte le persone che non hanno creduto in me, in loro ho trovato la forza per finire questo lavoro.

Infine la mia famiglia, mia madre e mio padre in particolare: senza di loro non sarei mai arrivato a scrivere queste note di ringraziamento.

I would like to thank:

Dr. Gaetano Maron for having taught me that simple practical results are more useful and difficult to achieve than brilliant theories.

Prof. Salvatore Orlando for having taught me how only through brilliant theories one can achieve simple practical results.

All the people who have trusted me: in them I found the support to draw this work to a close.

All the people who haven’t trusted me: in them I found the strength to draw this work to a close.

Finally my family, especially my mother and father: without them I would never have made it to writing these acknowledgments.


Contents

Preface xi

1 Introduction 1

1.1 Instrument Element Functional and Nonfunctional Requirements . . . 3

1.2 Thesis Contributions . . . 5

1.3 Thesis Overview . . . 6

2 Instrument Element Architecture 7

2.1 Instrument Classification . . . 7

2.1.1 A Uniform Instrument Model . . . 8

2.2 Instrument Aggregation Model . . . 10

2.2.1 Instrument Aggregation Building Block . . . 12

2.2.2 The Instrument Manager (IM) Abstraction . . . 13

2.2.3 Resource Service (RS) . . . 15

2.2.4 Information and Monitor Service (IMS) . . . 17

2.2.5 Static vs Dynamic Aggregation Models . . . 17

2.2.6 Embedded Devices . . . 20

2.3 Conclusion . . . 20

3 On Improving the Web Service Interactivity 23

3.1 Problem Formalization . . . 23

3.2 Related Work . . . 25

3.3 Understanding the XML limit . . . 26

3.3.1 3-tier Test bed . . . 28

3.3.2 Service Request Time . . . 28

3.3.3 Handled Request per Second . . . 30

3.4 The Cache Parser . . . 32

3.4.1 Hashing a Document . . . 35

3.4.2 Cache Algorithm . . . 35

3.4.3 Cache Insertion/Replacement Strategy . . . 37

3.5 Lower Bound of a Parser Algorithm and Parser Benchmark . . . 38

3.6 Conclusion . . . 39

4 On Enabling the Web/Grid Service Quality of Service 41

4.1 Problem Formalization . . . 42

4.1.1 Related Work . . . 43


4.1.3 Factors that Influence a Remote Method Invocation . . . 45

4.1.4 Implementation of a Homogeneous and Consistent Data Repository for Web Services QoS Case of Study . . . 46

4.1.5 Profiling WS Invocation to Build a Dataset . . . 46

4.1.6 Data Sets Assessments . . . 47

4.1.7 Data Set Candidates Description . . . 48

4.2 Some Features of the Collected Data Set . . . 50

4.2.1 WS Invocation Time with Large Input and Output Size . . . . 50

4.2.2 Server De-Serialization Time for Different Input Size . . . 50

4.2.3 One Way WS Invocation with Different Input Size . . . 53

4.2.4 Final Remarks on the Data Set . . . 59

4.3 Deadline Estimation Methodology . . . 59

4.3.1 Empiric Function Estimation . . . 61

4.3.2 2^k Factorial Analysis Design on the Dataset . . . 62

4.3.3 Gaussian Upper Bound of a Sample Distribution . . . 66

4.3.4 Total Execution Time Estimation . . . 66

4.3.5 A Special Case: Server Deserialization . . . 66

4.3.6 Different f (x) Estimations . . . 69

4.3.7 Different g(x) Estimations . . . 71

4.3.8 Deadline Function Estimation . . . 71

4.4 On Validating the Methodology . . . 71

4.4.1 Methodology Validation with One Client . . . 74

4.4.2 Methodology Validation with Multiple Clients . . . 76

4.5 Web Service Profile Algorithm . . . 81

4.6 Possible Web Service QoS Enabled Architectures . . . 87

4.6.1 Full Client Side Logic . . . 87

4.6.2 Server and Clients Factors Break Down . . . 89

4.6.3 Third Party QoS Enabler . . . 90

4.7 Conclusion . . . 91

5 Information Dissemination 93

5.1 Problem Formalization . . . 93

5.2 Related Work . . . 95

5.3 RMM-JMS: a JMS Multicast P2P Architecture Overview . . . 95

5.3.1 RMM-JMS Architecture . . . 96

5.3.2 RMM-JMS Broker . . . 98

5.3.3 Topic Mapping . . . 100

5.4 Experimental Evaluation . . . 100

5.4.1 Existing Benchmarks . . . 100

5.4.2 Test-bed Hardware and Software . . . 100

5.4.3 Messages Rate Tests . . . 101

5.4.4 Round Trip Time (RTT) Tests . . . 103


Conclusions 109

C.1 Open Research Issues . . . 110

A Applications that Are Using the Current Release of the IE 111

A.1 The Intrusion Detection System . . . 111

A.2 The Power Grid . . . 111

A.3 The Compact Muon Solenoid Experiment . . . 111

A.4 The Synchrotron Radiation Storage Ring . . . 112

A.5 The IMAA Sensor Networks . . . 112

A.6 The Advanced Gamma-Tracking Array Experiment . . . 112

A.7 The Device Farm . . . 113

A.8 Meteorology Systems . . . 113

B A Benchmark on the Current Instrument Element Implementation 115

B.1 Command Reception Performances . . . 115

B.2 Command Distribution Performances . . . 115

C Related Work on Complex On-Line Control Systems 121

C.1 Overview of Techniques . . . 121

C.2 Expert Systems . . . 122

C.2.1 Expert Systems Advantages . . . 125

C.2.2 Expert Systems Disadvantages . . . 125

C.3 Fuzzy Logic . . . 126

C.3.1 Fuzzy Logic Advantages . . . 128

C.3.2 Fuzzy Logic Disadvantages . . . 129

C.4 Neural Networks . . . 130

C.4.1 Neural Networks Advantages . . . 132

C.4.2 Neural Networks Disadvantages . . . 132

C.5 Intelligent Agents . . . 133

C.5.1 Events-Conditions-Actions . . . 133

C.5.2 Taxonomies of Agents . . . 134

C.5.3 Agency, Intelligence, and Mobility . . . 134

C.5.4 Processing Strategies . . . 135

C.5.5 Multi-agent Systems (MAS) . . . 135

C.5.6 Communication . . . 137

C.5.7 Blackboards . . . 137

C.5.8 KQML . . . 137

C.5.9 Cooperating Agents . . . 138

C.5.10 Stream Filtering Agents . . . 138

C.5.11 Intelligent Agents Advantages . . . 138


D QoS Test Bed Architecture: Implementation Details 143

D.1 QoS Test Bed Architecture . . . 143

D.2 Implementation Details . . . 145

D.3 Experimental Error on the Sample Measures . . . 146


List of Figures

1.1 Interaction between the Instrument Element and other Grid Components . . . 2

1.2 Instrument Element Use Cases . . . 4

2.1 abstraction model of a generic instrument . . . 9

2.2 The Instrument Element Abstraction . . . 11

2.3 The Instrument Element Architecture . . . 12

2.4 VCR-IE Interactions . . . 15

2.5 Classification of the Information Contained in the Resource Service . . . 16

2.6 Publish Subscribe Architecture . . . 17

2.7 IMS Architecture . . . 18

2.8 Instrument Discovery Interaction Diagram . . . 19

2.9 Embedded Devices . . . 20

3.1 3-tier Architecture and Sequence diagram . . . 26

3.2 Service request time for servlet and WS . . . 29

3.3 Handled request per second using a storage tier and varying the query result size . . . 31

3.4 Handled request per second using a storage tier and varying the query result size . . . 31

3.5 Handled request per second using a storage tier and varying the query result size . . . 32

3.6 Handled request per sec varying the input size . . . 33

3.7 Handled request per sec varying the input size . . . 33

3.8 Handled request per sec varying the input size . . . 34

3.9 XML parsed document with pointer to the nodes . . . 36

4.1 Critical Intervals in a Web Service invocation . . . 44

4.2 Sample distribution S-CPU 0% C-CPU 0% . . . 51

4.3 Sample distribution S-CPU 0% C-CPU 80% . . . 51

4.4 Sample distribution S-CPU 80% C-CPU 0% . . . 52

4.5 Sample distribution S-CPU 80% C-CPU 80% . . . 52

4.6 Sample distribution of the Server Deserialization Time (empty method) . . . 53

4.7 Sample distribution of the Server Deserialization Time (Input 100 Double) . . . 54


4.9 Sample distribution of the Server Deserialization Time (Input 1000 Double) . . . 55

4.10 Sample distribution of the Server Deserialization Time (Input 10000 Double ) . . . 55

4.11 Sample distribution of a one way invocation of an empty method . . 56

4.12 Sample distribution of a one way invocation (Input 100 String) . . . 56

4.13 Sample distribution of a one way invocation (Input 100 String) with server CPU occupancy of 80% . . . 57

4.14 Sample distribution of a one way invocation (Input 1000 String) . . . 57

4.15 Sample distribution of a one way invocation (Input 10000 String) . . 58

4.16 Sample distribution of a one way invocation (Input and output 10000 String) . . . 58

4.17 coefficients for the linear regression estimation of different g(x) . . . . 72

4.18 % of deadline misses varying the number of clients (95th percentile) . . . 73

4.19 % of deadline misses varying the number of clients (97.5th percentile) . . . 74

4.20 One Way Estimation CPU server 0% CPU Client 80% . . . 75

4.21 End To End Estimation CPU server 0% CPU Client 80% . . . 75

4.22 One Way Estimation CPU server 80% CPU Client 80% . . . 76

4.23 End To End Estimation CPU server 80% CPU Client 80% . . . 77

4.24 End To End Estimation CPU server 80% CPU Client 80% . . . 77

4.25 One Way Estimation with 2 clients connected . . . 78

4.26 End To End Estimation with 2 clients connected . . . 78

4.27 End To End Estimation with 2 clients connected . . . 79

4.28 One Way Estimation with 3 clients connected . . . 79

4.29 End to End Estimation with 3 clients connected . . . 80

4.30 End to End Estimation with 3 clients connected . . . 80

4.31 One Way Estimation with 4 clients connected (Server Overloaded) . . 81

4.32 End to End Estimation with 4 clients connected (Server Overloaded) . . . 82

4.33 End to End Estimation with 4 clients connected (Server Overloaded) . . . 82

4.34 One Way Estimation with 6 clients connected (Server Overloaded) . . . 83

4.35 End to End Estimation with 6 clients connected (Server Overloaded) . . . 83

4.36 End to End Estimation with 6 clients connected (Server Overloaded) . . . 84

4.37 One Way Estimation with 10 clients connected (Server Overloaded) . . . 84

4.38 End to End Estimation with 10 clients connected (Server Overloaded) . . . 85

4.39 End to End Estimation with 10 clients connected (Server Overloaded) . . . 85

4.40 Full Client Side Logic Architecture description . . . 88

4.41 Server and clients factors break down Architecture description . . . . 89

4.42 Third party QoS Enabler Architecture description . . . 91

5.1 typical environment for grid of instruments. . . 94

5.2 RMM-JMS broker/bridge in a LAN-WAN-LAN setup . . . 98


5.5 messages rate for varying number of publishers. Msg size 1000 Bytes . . . 102

5.6 messages rate for varying number of publishers. Msg size 10000 Bytes . . . 103

5.7 messages rate for varying number of Subscribers. Msg size 100 Byte . . . 104

5.8 messages rate for varying number of Subscribers. Msg size 1000 Bytes . . . 104

5.9 messages rate for varying number of Subscribers. Msg size 10000 Bytes . . . 105

5.10 RTT test description . . . 105

5.11 Round Trip Time tests, experimental result. . . 106

5.12 percentage of RMM RTT anomalies in a sample of 100000 measures . . . 107

5.13 percentage of Sun MQ 3.6 RTT anomalies in a sample of 100000 measures . . . 107

5.14 percentage of Mantaray RTT anomalies in a sample of 100000 measures . . . 108

B.1 Instrument Element invocation benchmark test description . . . 116

B.2 Test 1 Results: VCR invocation capability . . . 116

B.3 Test 2 Results: IE messages handling capability . . . 117

B.4 Instrument Manager hierarchy performance tests: 1 IM . . . 117

B.5 Instrument Manager hierarchy performance tests: 3 IM . . . 118

B.6 Instrument Manager hierarchy performance tests: 3 IM in 3 machines . . . 118

B.7 Instrument Element command distribution benchmark test results . . . 119

C.1 Expert-based System . . . 123

C.2 Comparison of Traditional and Fuzzy Sets . . . 126

C.3 Example of Neural Network . . . 131

C.4 Basic structure of neuron . . . 132

C.5 Interaction between agent and environment . . . 134

C.6 Agent action flowchart . . . 139

D.1 Testbed software architecture . . . 144


List of Tables

1.1 Non Functional Requirements . . . 5

3.1 Parser Comparison . . . 38

3.2 Parser Comparison . . . 39

3.3 Document Preparation Additional Cost . . . 39

4.1 factors that influence a remote method call . . . 46

4.2 Functions break-down descriptions . . . 62

4.3 Full factorial analysis first part . . . 63

4.4 Full factorial analysis second part . . . 64

4.5 Descriptions of the linear regression coefficients . . . 65

4.6 Contrast and Sum of Square analysis for the linear regression of the total execution time (first part) . . . 67

4.7 Contrast and Sum of Square analysis for the linear regression of the total execution time (second part) . . . 68

4.8 total execution time linear regression coefficients . . . 68

4.9 Residual analysis for the linear regression of the total execution time . . . 69

4.10 Linear regression coefficient with the key factors analysis . . . 70

4.11 Linear regression coefficient with the standard linear regression model 70 4.12 coefficients for the linear regression estimation of different f(x) . . . . 71


Preface

This PhD Thesis has been realized and developed in close interaction with the following research and deployment projects:

• Grid Enabled Remote Instrumentation with Distributed Control and Computation (GridCC - EU FP6 contract 511382)

• Compact Muon Solenoid Trigger and Data Acquisition System (CMS TriDAS)

• Advanced Gamma Tracking Array (AGaTA)


1 Introduction

Grid computing refers to the coordinated and secured sharing of computing resources among dynamic collections of individuals, institutions and resources [13], [10], [32]. It involves the distribution of computing resources among geographically separated sites (creating a "grid" of resources), all of which are configured with specialized software for routing jobs, authenticating users, monitoring resources, and so on.

The operative core of the standard computational Grid is mainly composed of two important elements: the Computing Element (CE) and the Storage Element (SE). The first one provides the final user with an abstraction of the backend of the computational system; in other words, it is where the execution of an application is performed. This is a very general element that allows different computational nodes, like a single processor or a complex computing cluster, to be seen as a homogeneous set of interfaces. The second one, the Storage Element, provides storage facilities for the input and output of the applications that are executed on a CE. The storage can be as simple as a standard file-system, or a set of databases organized in a more complex structure. While remote control and data collection were part of the initial Grid concept, most recent Grid developments have concentrated on the sharing of distributed computational and storage resources.

In this scenario, applications that need computational power only have to use these Grid elements in order to access an unlimited amount of computational power and disk storage. Unfortunately, as explained in Section 1.1, concrete use cases require a strong interaction between the instrumentation and the computational Grid; in addition, instruments need to be accessed directly from any remote site in the world. For instance, in the Compact Muon Solenoid (CMS) Data Acquisition (DAQ) system [10], where the data taking phase of an experiment occurs, physicists need a single entry point to operate the experiment and to monitor detector status and data quality. In Electrical Utility Networks (or power grids [32]), the introduction of very large numbers of 'embedded' power generators, often using renewable energy sources, creates a severe challenge for utility companies. In Geo-Hazards Systems, a set of heterogeneous, geographically distributed sensors needs to be remotely accessed and monitored, while the combined instrument outputs should be automatically analyzed using the computational Grid.


Figure 1.1: Interaction between the Instrument Element and other Grid Components

The Grid Enabled Remote Instrumentation with Distributed Control and Computation (GridCC) project [77], [83], [48], launched in September 2004 by the European Union, addresses these issues. The goal of GridCC is to exploit Grid opportunities for secure and collaborative work of distributed teams to remotely operate and monitor scientific equipment, using the Grid's massive memory and computing resources for storing and processing data generated by this kind of equipment. This thesis has been developed in close interaction with the GridCC project and, as we will point out in Section 1.2, a significant part of this work contributes to the outcome of the entire project.

Our idea is to implement a software component that satisfies all the above-mentioned requirements as a new Grid component: the Instrument Element (IE). It consists of a coherent collection of services that provide all the functionality to configure and control the physical instrument behind the IE and the interactions with the rest of the Grid. Figure 1.1 gives an idea of the relationship between the IE and its users, and between the IE and other Grid components.

In order to achieve a fast and high-level interaction with the Instrument Element component, users can directly access the controlled instrument using a Virtual Control Room (VCR) [60]. Or, as a second possibility, an instrument operation can be part of a complex workflow managed by a Grid execution service that allows the IE to access and converse with the computational Grid. These ways to control the instruments are not mutually exclusive and can be performed in parallel where needed.


interactive co-operation between computing Grid resources and applications that have real-time requirements or need fast interaction with CEs and SEs. Finally, the IE can also be linked to existing instrumentation in order to provide Grid interaction and remote control to standalone resources.

The next section (1.1) details this intuitive view.

1.1 Instrument Element Functional and Nonfunctional Requirements

The term 'instrument' describes a very heterogeneous category of devices. A set of use cases [13], [10], [32], [63], [130], [67], [51], [132], [123], [77], [73], [79], [128] has been deeply analyzed in order to collect the functional and non-functional requirements of this new Grid component. From an intuitive point of view, the IE should:

• Provide uniform access to the physical device

• Allow standard Grid access to the instruments

• Allow cooperation between different instruments that belong to different VOs
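The first point, uniform access, can be sketched as a common interface that every device wrapper implements, whatever the underlying hardware. The following Python sketch is purely illustrative: the class and method names are ours, not part of the IE specification, and a dummy-instrument proxy is used as the example implementation.

```python
from abc import ABC, abstractmethod

class Instrument(ABC):
    """Hypothetical uniform wrapper around a physical device."""

    @abstractmethod
    def get_state(self) -> str:
        """Report the operational status of the device."""

    @abstractmethod
    def execute(self, command: str, **params) -> dict:
        """Send a control command to the device."""

class PetiauElectrodeProxy(Instrument):
    """A 'dummy' instrument reached through its DAQ collector, which
    acts as a proxy for the physically attached sensors."""

    def __init__(self):
        self._state = "idle"

    def get_state(self) -> str:
        return self._state

    def execute(self, command: str, **params) -> dict:
        if command == "start_acquisition":
            self._state = "acquiring"
            return {"status": "ok"}
        return {"status": "unsupported"}
```

Clients then talk to any device, smart or dummy, through the same `Instrument` surface, which is the essence of the first requirement above.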

We need to point out that an IE user does not need to be human: other software components must be able to control the instruments.

Figure 1.2 shows a use case diagram of the system. A user of the Instrument Element can have one of three roles:

• an observer: i.e., someone who has the rights to monitor the operation of the Instrument.

• an operator: i.e., someone who can instantiate an Instrument configuration, and control and monitor the Instrument.

• an administrator: i.e., someone who can create Instrument configurations that can be accessed by the observer and/or the operator users.

Monitoring operation is intended as the possibility to retrieve all information that can be used to determine the operational status and to track the operation of an Instrument System.

Control operation is intended as the possibility to act on one or more Instruments and to move acquired data to and from the computational Grid.


Figure 1.2: Instrument Element Use Cases

Instantiation operation (which is considered a Control extension) is intended as the possibility to create Instruments, if the physical Instrument provides this functionality, or to link to them otherwise.

At any moment there can be multiple observers and multiple administrators, but at most one operator that utilizes a particular Instrument. A second Operator user who tries to control an instrument already controlled by another Operator should be treated by the system as an Observer user. If the real Instrument can be partitioned into subsystems, multiple operators should be able to access different Instrument partitions via the same IE.
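The single-operator rule above can be sketched as a small arbitration routine: the first user to request the operator role on an instrument (or on one of its partitions) obtains it, while later requests are downgraded to observer. This is a hypothetical Python sketch under our own naming, not the IE's actual access-control code.

```python
class AccessArbiter:
    """Enforces: at most one operator per instrument partition;
    a second would-be operator is treated as an observer."""

    def __init__(self):
        self._operator = {}  # partition name -> user currently operating

    def request_role(self, user: str, requested: str,
                     partition: str = "whole") -> str:
        if requested != "operator":
            return requested  # observers and administrators are always granted
        holder = self._operator.get(partition)
        if holder is None or holder == user:
            self._operator[partition] = user
            return "operator"
        return "observer"  # partition already controlled: downgrade

    def release(self, user: str, partition: str = "whole") -> None:
        if self._operator.get(partition) == user:
            del self._operator[partition]
```

Partitioned instruments simply use distinct partition names, so several operators can coexist on the same IE without violating the rule.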

Close to these functional requirements, a Grid of instruments introduces a set of non-functional requirements, as summarized in Table 1.1: for example, the possibility to control about 10^4 instruments simultaneously in an interactive way. This introduces the need to provide a scalable system (marked as the Scalability requirement) equipped with certain Quality of Service (QoS) guarantees, as will be explained in future sections.

In a Grid of instruments, like the computational one, interoperability (marked as Standardization in Table 1.1) is a mandatory requirement. Therefore the only possible communication between Grid subcomponents is via Web Services (WS).

In addition, for complex systems, on-line diagnostics, error recovery and robustness of the device organization (marked as Autonomic) should be provided. Finally, we need to consider that different instruments use different technologies and protocols in order to be accessed.


Table 1.1: Non Functional Requirements

as pointed out in Sections 1.2 and 1.3, in order to solve the trade-off between the necessity of universality and performance, we propose a set of solutions that improve the capability of SOA systems in terms of throughput and service delay, allowing the prediction of remote method execution times.

1.2 Thesis Contributions

In this Thesis we first introduce a classification of instruments and a uniform model for the control of each type of instrument. Then we show the IE architecture and present this system as a solution for integrating instruments with the 'classical' Grid.

In order to fulfill the non-functional requirement of high interactivity of our IE, we evaluate the limits of XML-based protocols (SOAP) for performance and highly interactive distributed computing. One of the main results is a detailed analysis of the influence of XML parsers in the overheads associated with such communications. We thus propose a new XML parser, the Cache Parser, which uses a cache to reduce the parsing time at sender and receiver side, by reusing information related to previously parsed documents/messages similar to the one under examination.
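The caching idea can be illustrated with a toy version: fingerprint each incoming document and return the previously parsed tree on a hit. Note that this sketch only catches byte-identical repeats, whereas the Cache Parser described in Chapter 3 also reuses the parsed structure of similar documents; the names and structure here are ours, for illustration only.

```python
import hashlib
import xml.etree.ElementTree as ET

class CacheParser:
    """Toy cache-assisted parser: repeated messages skip a full parse."""

    def __init__(self):
        self._cache = {}  # document digest -> parsed tree
        self.hits = 0     # number of cache hits, for instrumentation

    def parse(self, doc: str):
        key = hashlib.sha1(doc.encode("utf-8")).hexdigest()
        if key in self._cache:
            self.hits += 1
            return self._cache[key]   # reuse previously parsed tree
        tree = ET.fromstring(doc)     # full parse only on a miss
        self._cache[key] = tree
        return tree
```

In a SOAP setting the same envelope skeleton recurs across invocations, which is exactly the regularity a cache of this kind exploits.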

Furthermore, in order to address the QoS non-functional requirements of our IE, whose first release is based on Grid/Web Service technologies, we identify the prediction of a remote method execution time as one of the main challenges. We thus propose a methodology and an algorithm, based on 2^k factorial analysis and on a Gaussian approximation of the collected data, that enable the estimation of a remote method execution time. Finally, we define three different software architectures in which the developed methodology and prediction algorithm can be exploited and integrated.
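The flavour of the methodology can be conveyed with two small routines: effect estimation for a 2^2 factorial design (the 2^k case with k = 2) and a Gaussian upper bound on a sample of measured times. Both are textbook formulas, sketched here in Python with illustrative names; the actual estimator of Chapter 4 combines a regression over the key factors with this kind of bound.

```python
import statistics

def effects_2x2(y00, y10, y01, y11):
    """Effect estimation for a 2^2 factorial design.
    y_ab is the measured response with factor A at level a and
    factor B at level b (0 = low, 1 = high)."""
    mean = (y00 + y10 + y01 + y11) / 4
    effect_a = ((y10 + y11) - (y00 + y01)) / 2       # main effect of A
    effect_b = ((y01 + y11) - (y00 + y10)) / 2       # main effect of B
    interaction_ab = ((y00 + y11) - (y10 + y01)) / 2  # AB interaction
    return mean, effect_a, effect_b, interaction_ab

def gaussian_upper_bound(samples, z=1.645):
    """Upper bound of a sample distribution under a Gaussian
    approximation: mean + z * stddev (z = 1.645 ~ 95th percentile)."""
    return statistics.mean(samples) + z * statistics.stdev(samples)
```

A deadline estimate built this way is deliberately conservative: a service invocation is predicted to finish within the bound with (approximately) the chosen percentile of confidence, which is what makes selecting an alternative service on a predicted miss meaningful.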

Still according to our analysis of the non-functional requirements of our IE, we also address the problem of determining fast communication methods to permit instruments to properly inter-operate. In this regard, we investigate different publish/subscribe architectures, and we show that RMM-JMS is the best candidate solution for accomplishing this task.
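The publish/subscribe pattern under comparison reduces to a minimal broker: subscribers register callbacks on a topic and publishers push messages to it. What distinguishes the candidates of Chapter 5 (RMM-JMS among them) is everything this toy omits: reliable multicast transport, brokers bridging LAN/WAN segments, and JMS compliance. The names below are illustrative only.

```python
from collections import defaultdict

class Broker:
    """Minimal topic-based publish/subscribe broker."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to receive every message on `topic`."""
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        """Deliver `message` to all subscribers of `topic`."""
        for cb in self._subs[topic]:
            cb(message)
```

The decoupling is the point: an instrument publishing monitoring data on a topic never needs to know how many consumers (VCRs, loggers, analysis jobs) are listening.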


It is worth pointing out that the majority of the proposed solutions can be applied not only to the Instrument Element: every application based on Web and Grid Services can benefit from these results.

Several parts of this thesis work are currently part of the outcome of the GridCC Project [77], [83], and the thesis has been developed in close interaction with the GridCC community. In particular, we have contributed to the entire software architecture [48] presented in Section 2.2.1 and the related development presented in 2.3. The rest of the scientific contribution of this thesis is currently under adoption (see Chapters 2, 4, 5 and Appendix C) or under evaluation by the project community (Chapter 3) [48], [41], [37], [40], [20], [21], [19], [22], [35], [42], [28], [49]. In addition, part of the results of this thesis have been published in conferences and journals [36], [44], [43], [45], [18], [15], [39], [16] or are under publication.

1.3 Thesis Overview

The rest of this PhD Thesis is organized as follows:

In Chapter 2 we show the Instrument Element system as a solution for integrating instruments with the 'classical' Grid.

In Chapter 3 we evaluate the limits of XML for performance and highly interactive distributed computing, and we also present the Cache Parser.

In Chapter 4 we address the QoS issues in a Web Service scenario. Then we present the methodology and the algorithm that enable the estimation of a remote method execution time. Finally, we suggest three different software architectures in which the developed algorithm can be integrated. In addition, in Appendix D we describe how the datasets used to train our predictive model and validate its estimations have been created, and we verify their consistency.

In Chapter 5 we address the problem of determining a proper fast inter-instrument communication channel. We investigate different publish/subscribe architectures, and we present RMM-JMS as our candidate solution for accomplishing this task.


2 Instrument Element Architecture

The Grid is a hardware and software infrastructure for sharing computer power and data storage capacity over the Internet. The operative core of the standard computational Grid is mainly composed of two important elements: the Computing Element (CE) and the Storage Element (SE). The Grid aims to provide reliable and secure access to widely scattered resources for authorized users located virtually anywhere in the world. As anticipated in Chapter 1, we believe that the creation of a coherent collection of services can allow the remote configuration and control of physical instruments and a better integration with the computational Grid. This new collection of services, which virtualizes a set of physical instruments and permits their easy integration into the Grid, is called the Instrument Element (IE).

The rest of this chapter is organized as follows. In Section 2.1 we present a classification of instruments and a uniform model for the control of each type of instrument. In Section 2.2 we present the IE system as the way for integrating instruments and the Grid. In Section 2.3 the technological choices for the implementation of the first release of the proposed IE model are discussed. In addition, in Appendix A we give an overview of the applications that already use the current implementation, while Appendix B shows some performance tests on the current implementation. Finally, in Appendix C we point out a set of techniques that can be plugged into the IE (see 2.2.2) in order to control complex Grids of instruments.

2.1 Instrument Classification


To summarize, from a Grid perspective, a device can belong to one of the following categories:

• Dummy Instrument

• Smart Instrument

• Smart Instrument in an ad hoc network

The first type comprises very simple hardware. The instruments in this category use data acquisition (DAQ) for remote operation; the data is collected with a set of devices that are physically linked together. The devices enable remote network organization and provide higher-level functionalities. These instruments are typically deployed in remote sites far from the base station. In addition, only the remote (local from a sensor point of view) DAQ collector can be accessed in a remote way, thus acting as a proxy for the sensors that are physically connected to it. Non-polarizable Petiau electrodes [79], used in the IMAA network, are examples of dummy sensors.

'Smart Instruments' make up a large category comprising devices that provide all the functionality needed to be remotely accessed and controlled. High-energy physics and particle physics experiments use these kinds of sensors [130], [10], [128]. For instance, a Smart Instrument can be an electronic card that acquires data from the concrete detector and dispatches it to one or more machines that perform data aggregation. Typically, these devices are close together and physically connected via a fast communication channel, like 1 Gb Ethernet or an optic fiber. Performance and scalability are key requirements of these instruments as part of a distributed system. An additional requirement imposed on these devices is autonomicity. The electronic front end and the information collector (event builder or EVB) of a high-energy experiment are composed of thousands of nodes (usually powerful PCs); thus, basic fault tolerance and dynamic instrument reconfiguration must be part of the device's functionalities.

A Smart Instrument in an ad hoc network can be seen as a specialization of the previous category. Such devices need not be in close physical proximity; rather, they can communicate remotely through specialized wireless connections. In general, batteries fuel these devices, and mobile sensors are part of this category. The challenge is to minimize the energy consumption of the communication channel. Finally, we should keep battery consumption uniform between different devices to minimize human interaction with them.

2.1.1 A Uniform Instrument Model

Figure 2.1: Abstraction model of a generic instrument

instrument hardware is changed and/or improved. A primary design goal for this chapter is to externalize the instrument description so that applications can build an operational model ”on the fly”. This approach makes it possible to preserve investments in data acquisition codes as instrument hardware evolves and to allow the same code to be used with several similar types of instruments.

The presented instrumentation model is used in order to meet the functional requirements; it can be used for each device independently of the category it belongs to. In the following Sections, we will better understand why the implementation of this model really depends on the instrument type.

Figure 2.1 shows our instrument model abstraction.

We can consider a generic device as a collection of Parameters, Attributes, and a Control Model, plus an optional description language. More in detail, Parameters are variables on which the instrument depends, like range or precision values of a measure, while Attributes refer to the effective object that the instrument is measuring.

The main difference between Parameters and Attributes is that the former are typically accessed in polling mode, while for certain types of attribute, like a webcam stream for instance, a publish/subscribe or a stream access method is more appropriate. Therefore, both push and pull access modes must be supported for attributes.
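To make the model concrete, the sketch below renders the Parameter/Attribute/Control abstraction as a hypothetical Java interface; all names (`Instrument`, `SimulatedThermometer`, the "acquire" command) are invented for illustration and do not come from the IE code base. Parameters are read in polling mode, while attributes support both pull (`readAttribute`) and push (`subscribeAttribute`) access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical rendering of the Parameter/Attribute/Control abstraction. */
interface Instrument {
    String getParameter(String name);                 // pull access to a variable
    void setParameter(String name, String value);
    String readAttribute(String name);                // pull access to a measurement
    void subscribeAttribute(String name, Consumer<String> listener); // push access
    void execute(String command);                     // control model
}

/** Toy implementation standing in for a real device driver. */
class SimulatedThermometer implements Instrument {
    private final Map<String, String> parameters = new HashMap<>();
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();
    private String lastReading = "0.0";

    SimulatedThermometer() { parameters.put("precision", "0.1"); }

    public String getParameter(String name) { return parameters.get(name); }
    public void setParameter(String name, String value) { parameters.put(name, value); }
    public String readAttribute(String name) { return lastReading; }

    public void subscribeAttribute(String name, Consumer<String> listener) {
        listeners.computeIfAbsent(name, k -> new ArrayList<>()).add(listener);
    }

    public void execute(String command) {
        if (command.equals("acquire")) {              // control triggers a measurement
            lastReading = "21.5";
            for (Consumer<String> l : listeners.getOrDefault("temperature", List.of()))
                l.accept(lastReading);                // push the new value to subscribers
        }
    }
}
```

A client polling `getParameter` and a client subscribed to the `temperature` attribute can coexist against the same device, which is exactly the dual access mode the model requires.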


[87] can be used to describe the semantics of the particular instrument.

The presented abstraction provides a uniform layer across different devices and can be used as a building block for the control of complex systems like the CMS experiment [10], which is composed of about 2 × 10^7 pieces of hardware and about 10^4 machines dedicated to the on-line elaboration phase. In order to simplify the control of these systems, instruments can be logically (or physically) grouped into hierarchies, from which data can be aggregated or for which control commands affect multiple sensors or actuators. What is needed in this case is an instrument aggregation model, like the one that we will present in the next Section, which is capable of controlling all these devices in a congruent way.

2.2 Instrument Aggregation Model

The integration of one single instrument into the Grid is a relatively simple task. Problems come when millions of different instruments need to interoperate with each other in order to achieve common goals, giving external users a well defined entry point. In order to simplify this task, some services can be created around the instrument that allow a uniform interaction, giving the illusion of controlling a single device.

We define the term Instrument Element (IE) as a set of services that provide the needed interface and implementation enabling the remote control and monitoring of physical instruments. The IE needs to be very flexible; in the simplest scenario this abstraction can represent a simple geospatial sensor or an FPGA card that performs a specific function, while in a more complex sensor network it can be used as a bridge between sensors and the computational Grid. Finally, the IE can be part of the device instrumentation, permitting the organization of the instruments in a network and allowing Grid interaction.

Unlike the CE and the SE, this Grid component is not accessed using non-interactive computational job execution, but requires a close interaction with the users sitting in the Virtual Control Room (VCR) [60].

If we see the IE as a black box that allows the interaction with instruments in a uniform way, we can identify three different communication channels. Firstly, a uniform interface that allows the control of the different system devices in a uniform and coherent way. Secondly, an output channel that allows fast instrument cooperation and permits the reception of asynchronous data and monitoring information coming from the instrument attributes. And finally, a set of services that allow the interaction between the instrument and the standard Grid system.


Figure 2.2: The Instrument Element Abstraction

therefore an IE can be part of other IEs. In addition, the IE acts also as a protocol adapter, providing a uniform way to control heterogeneous devices. We believe that Web Services are an excellent choice when there is a need to provide a common language for cross-domain collaboration and to hide the internal implementation details of accessing specific instruments. Standards like SWE [13], JMX [50], or IVI [123] and P2P systems like JINI [107] and Freenet [117], [11] have been analyzed in order to ensure that the front end (called Virtual Instrument Grid Service or VIGS) final methods are really capable of providing a generic instrument virtualization.

As a final remark, as already introduced in Section 1.1, the control of instruments demands deep interaction between users and devices. Consider that when the access is performed via the Internet using Web Service calls, the remote invocation time becomes critical in order to understand whether a service can be controlled properly or the delays introduced by the wire are unacceptable.

To summarize the mentioned requirements, we can define two different types of access with different Quality of Service:

• Strict (hard) guarantees: The response to a requested service is reliable; in this case the availability of an Agreement Service [48] that performs advance reservation is necessary and can negotiate "interaction times" with a component.

• Loose (soft) guarantees: The response to a requested service is unreliable. Therefore QoS is provided on top of a best-effort infrastructure. In this case a prediction method based on a statistical approach must be provided.
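A minimal sketch of one possible statistical predictor for the soft-guarantee case (an illustration only, not the estimation algorithms of Chapters 3 and 4): it keeps a sliding window of observed invocation times and derives a loose upper bound from the sample mean and standard deviation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Illustrative predictor: bounds the next invocation time from recent samples. */
class InvocationTimePredictor {
    private final Deque<Double> window = new ArrayDeque<>();
    private final int capacity;

    InvocationTimePredictor(int capacity) { this.capacity = capacity; }

    /** Record one observed round-trip time, keeping only the last 'capacity' samples. */
    void record(double millis) {
        if (window.size() == capacity) window.removeFirst();
        window.addLast(millis);
    }

    /** Mean + 2 standard deviations: a loose, best-effort upper bound. */
    double predictedBoundMillis() {
        double n = window.size(), sum = 0, sq = 0;
        for (double v : window) { sum += v; sq += v * v; }
        double mean = sum / n;
        double var = sq / n - mean * mean;
        return mean + 2 * Math.sqrt(Math.max(var, 0));
    }

    /** Soft guarantee check: can we expect an answer within the interactivity budget? */
    boolean interactive(double budgetMillis) { return predictedBoundMillis() <= budgetMillis; }
}
```

Such a predictor lets a client decide, before issuing a call, whether interactive control over a best-effort link is plausible or whether a hard reservation should be negotiated instead.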

Figure 2.3: The Instrument Element Architecture

Prediction techniques that estimate the remote method execution time, and that can be used in this particular context, have been presented in Chapter 3 [44] and in Chapter 4 [43].

What follows is a description of the main building blocks, with a detailed description of the most important ones.

2.2.1 Instrument Aggregation Building Block

This Section describes the main IE building blocks, presenting the set of additional services that can simplify the access to and the control of instruments. We present the mentioned services as centralized components; in Section 2.2.5 we discuss whether these services should be implemented in a centralized or in a distributed way.

Figure 2.3 represents the detailed Instrument Element architecture and what follows is a short description of each subsystem component.

Instrument Managers (IM): The instrument managers are the parts of the Instrument Element that perform the actual communication with the instruments. They act as protocol adapters that implement the instrument-specific protocols for accessing its functions. One IM can control more than one single device and is coherent with the model presented in Section 2.1.1. In other words, it can be seen by other IMs as an instrument, allowing a hierarchic partition of the controlled devices where the complexity of the system requires such a control structure.
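The observation that an IM can itself be seen as an instrument by other IMs maps naturally onto the Composite pattern. The toy classes below (all names invented, not taken from the IE code base) sketch this hierarchic partition: a command sent to the root fans out to every controlled resource, and state is aggregated back up.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal common control interface shared by devices and instrument managers. */
interface Controllable {
    void execute(String command);
    String getState();
}

/** Leaf: a single physical device (toy stand-in for a real driver). */
class Device implements Controllable {
    private String state = "idle";
    public void execute(String command) { state = command; }
    public String getState() { return state; }
}

/** Composite: an IM controls many Controllables and is itself a Controllable,
 *  so IMs can nest inside other IMs. */
class InstrumentManager implements Controllable {
    private final List<Controllable> children = new ArrayList<>();
    void add(Controllable c) { children.add(c); }

    /** Fan the command out to every controlled resource. */
    public void execute(String command) { for (Controllable c : children) c.execute(command); }

    /** Aggregate state: uniform only if all children agree. */
    public String getState() {
        String s = children.isEmpty() ? "empty" : children.get(0).getState();
        for (Controllable c : children) if (!c.getState().equals(s)) return "mixed";
        return s;
    }
}
```

The "mixed" aggregate state is a design choice of this sketch: a parent IM can detect that a command did not propagate uniformly and start a recovery procedure.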

Resource Service (RS): it organizes the resources of the system in groups, in order to facilitate the access. In this context a resource can be any hardware or software component that can be managed directly or indirectly through the network.

Information and Monitor Service (IMS): This service disseminates monitoring data to the interested partners, giving a single access point to all the information produced by the instruments.

Problem Solver (PS): It has the main task of collecting alarms, errors, warnings and messages, which in our model are instrument attributes and parameters, in order to identify error conditions [36]. We have to note that error recovery can be part of the IM control logic, but while that component is mainly involved in on-line recovery, the PS can act off-line, trying to discover unknown rules.

Access Control Manager (ACM): It is responsible for checking user credentials and deciding whether an external request should be processed by the Instrument Element [48].

Data Mover (DM): since we cannot assume that instruments are complex devices, they need an external service in order to deal with the classical computational Grid. This component provides this functionality in a centralized way. In a fully distributed scenario like a Sensor Network a decentralized approach could be more appropriate; therefore part of this functionality could be part of the IM component. This service provides the SRM [80], [81] interface to any external storage (SE) or processing elements (CE). It finds the 'best' mechanism, such as GridFTP [74], [86] or other transport protocols, to move a file from one storage resource to another. For more demanding applications Grid standards could be inappropriate due to the high bandwidth requested, and a streaming output and/or an MPI interface that allows push subscription to the instrument attributes might be needed.

2.2.2 The Instrument Manager (IM) Abstraction

key feature when working in a Grid of instruments. Instruments in the 'smart sensor' category need to exchange data without loss, as fast as they are being generated. The data collector devices, or some other dedicated set of instruments, process the data (filter and aggregate) and move it to a final location. Afterwards the data, via the Data Mover, is stored in a repository or in a Storage Element for future off-line analysis.

The main components of each Instrument Manager are:

The Communication Tools can be used by every single sub-component in order to publish/receive information like logs, errors, states, configuration, etc., and to receive messages coming from other components. This service acts also as a proxy for the higher-level Data Mover service. If the data produced by the set of controlled instruments arrives at a large rate, this low-level service makes the movement of such data more manageable. Finally, these tools represent the instrument front end of the Information and Monitoring System.

The Control Manager is the component that actually controls the instrument and/or the instrumentation. The typical use case of this sub-system is to receive inputs from the users that are controlling the devices. It can also receive inputs like states or errors that come from the physical controlled instrument, so it can react in an autonomous way to unexpected behavior of the controlled resources by allowing automatic recovery procedures to be started. This autonomic action becomes critical if the IM controls a large set of instruments, while for simpler devices such functionality becomes less important compared to the need to have a plug-in system that integrates the device drivers.

Input Manager: waits for an external input and presents it to the Event Processor. Input can come from users, other instrument managers, or from the devices themselves in the case of smart instruments. In this last case, considering that each instrument has a different way to communicate, a driver component must be provided and loaded inside the particular IM.

Resource Proxy: this component represents the instrument manager's front end towards the instrument. The control library used in order to exchange messages with the physical device must be plugged inside; this is the minimal customization action that must be performed in order to plug a generic device into the Grid.

Event Processor: in this component the commands are elaborated in order to control the instrument properly. By default, events just trigger the proper action in the proper resource proxy. At the same time, using the plug-in system, this component can be used in order to provide an aggregate and more complex control, allowing the possibility to plug in every possible algorithm: expert systems, fuzzy logic, custom if-then-else, neural networks, etc. Appendix C provides an overview of techniques that can be adopted in complex control systems. As a final remark, this component represents the basic infrastructure of the Problem Solver, allowing recovery actions and/or fault-tolerance procedures in case of sub-system failure.


Figure 2.4: VCR-IE Interactions

Finite State Machine: it defines the action that must be performed according to a particular triggered transition. It also provides the possibility of introspection by external users that want to control the particular IM.

Figure 2.4 shows in detail the interactions occurring among the IM sub-components. The dotted lines are optional actions that can be performed or not on the basis of the particular received input, while the other lines are mandatory. In addition, the first two lines are actions triggered by the users in the VCR. From the temporal point of view, events coming from user commands or instrument messages are received by the Input Manager and the information is processed in the Event Processor sub-module. Such sub-module, on the basis of the received input, can decide to perform a state transition according to its finite state machine (FSM), and/or control the real instruments via the Resource Proxy module. Once the action is successfully or unsuccessfully accomplished, a notification is sent back to the users.
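The event-driven behavior described above can be sketched as a small table-driven finite state machine; the states and events below are invented examples, not the actual IM state model.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative table-driven FSM: transitions are registered per (state, event). */
class InstrumentFsm {
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    private String state;

    InstrumentFsm(String initial) { state = initial; }

    /** Declare that 'event' received in state 'from' moves the machine to 'to'. */
    void allow(String from, String event, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    /** Returns true (and moves) if the event triggers a legal transition;
     *  false models an input that is illegal in the current state. */
    boolean fire(String event) {
        String next = transitions.getOrDefault(state, Map.of()).get(event);
        if (next == null) return false;
        state = next;
        return true;
    }

    String state() { return state; }
}
```

The boolean result of `fire` plays the role of the success/failure notification sent back to the users, while `state()` supports the introspection mentioned above.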

2.2.3 Resource Service (RS)

The complexity of the information managed by the RS is really instrument dependent, and ranges from a practically fixed configuration to the configuration and orchestration of thousands of nodes [13], [10], [32], [128]. In any case, it provides a uniform way and a single entry point of access to the information related to a particular instrument for external users.

Figure 2.5: Classification of the Information Contained in the Resource Service

It also allows authorized users to bookmark resources.

Figure 2.5 classifies the information that has to be retrieved from the Resource Service for every instrument.

From a semantic point of view we can divide the information into three different categories. (1) Information, like the physical location of configuration files or the driver type, which is only internal to the Instrument Element and is needed in order to ensure the correct instrument instantiation. (2) Information that can be modified at run time by the users and that can change the global behavior. The number and types of instruments that should be used to perform particular aggregate actions are typical information belonging to this category. The last category (3) of information identifies the instrument topology, i.e., both potential and actually performed intra-instrument connections.

The same information can be categorized from the dynamicity point of view. (a) Static information refers to data that is defined at deployment time and will never change in the future. By contrast, (b) Dynamic information consists of data that can change in an automatic way, without user intervention. In the middle, we have (c) Low Dynamic information, which corresponds, for example, to adjustments performed by the users at run time.
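Purely as an illustration of the three categories (every element and attribute name below is invented, not the actual RS schema), a resource description might separate the information as follows:

```xml
<instrument id="thermometer-01">
  <!-- (1) internal / (a) static: fixed at deployment time -->
  <driver class="org.example.ThermoDriver" configFile="/etc/ie/thermo.cfg"/>
  <!-- (2) behavioral / (c) low dynamic: adjustable by users at run time -->
  <aggregation members="3" action="average"/>
  <!-- (3) topology / (b) dynamic: discovered and updated automatically -->
  <topology>
    <link to="collector-A" status="active"/>
  </topology>
</instrument>
```

Keeping the three categories in separate subtrees makes it clear which parts of the description may be cached, which require user authorization to change, and which must be refreshed by discovery.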


Figure 2.6: Publish Subscribe Architecture

2.2.4 Information and Monitor Service (IMS)

Instruments and Instrument Managers dispatch data and monitorable information via this service. We have to point out that more demanding applications like [10] request that this service handles about 10^5 messages per second. Therefore this system cannot be implemented in a centralized way. In addition, as explained in Chapter 3, standard message formats like SOAP cannot be used in this component due to the overhead that this serialization introduces [44]. Finally, taking into account that instruments are independent and loosely coupled with each other, an information and monitoring system should preserve the mentioned properties.

As a result, architectures like the ones presented in [91], [6] and [4] appear to be the most appropriate for this task. In these systems peers publish information on a given topic to subscribers that have previously notified their interest (see Figure 2.6).

Using this particular approach we allow peers in the network to dynamically appear and disappear, preserving a certain robustness with respect to system faults [14].

Once the connection to the particular information channel has been established, publishers can start sending data. Considering that high throughput is a must in these cases, a bridged/relay [4], [109] solution appears to be the only way to preserve this characteristic.

Figure 2.7 illustrates what was mentioned before in a simplified scenario, where one publisher, which belongs to a particular private network, tries to send information to peers disseminated throughout the world. Using a smart protocol, the publisher can reach peers located in the same LAN and/or communicate with the bridge component, which is connected with other bridges via standard communication protocols, like TCP and HTTP. Once the bridges receive the messages, they first convert them into a more suitable format, and then send them to peers that are in the same LAN domain. Multicast protocols can be used in order to reduce the number of messages that publishers need to send, improving the performance of the system [8].
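The decoupling that makes this work can be sketched with a toy in-memory topic broker (a real IMS would rely on a JMS provider with bridges and multicast, not this class): publishers address a topic, never a concrete subscriber, so peers can appear and disappear independently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy in-memory topic broker illustrating publish/subscribe decoupling. */
class TopicBroker {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    /** A subscriber declares its interest in a topic before any data flows. */
    void subscribe(String topic, Consumer<String> subscriber) {
        subscribers.computeIfAbsent(topic, k -> new ArrayList<>()).add(subscriber);
    }

    /** Publishers need not know who is listening: peers stay loosely coupled. */
    void publish(String topic, String message) {
        for (Consumer<String> s : subscribers.getOrDefault(topic, List.of()))
            s.accept(message);
    }
}
```

A message published on a topic with no subscribers is simply dropped, which mirrors the best-effort, loosely coupled behavior expected of the monitoring channel.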

2.2.5 Static vs Dynamic Aggregation Models

Figure 2.7: IMS Architecture

The most appropriate aggregation model depends on the application, as shown by the following set of examples. Even if we believe that a uniform and coherent set of services can facilitate instrument aggregation and interoperability across different organizations, the approach to the implementation can be really different. In [10] for instance, the CMS data acquisition phase can start only if all the instruments of the system are in a consistent status, while in [13], [32], [51], instruments can dynamically join the system. In these cases we cannot assume that information like an index of all instruments, which is the base abstraction of the Resource Service, is static. The IM behavior needs to dynamically adapt itself to the existing instrument structure. This particular functionality is typical of P2P [66] systems, where the network can dynamically adapt to peer changes. In any case, a single and reliable entry point to this information is mandatory if we want to provide a set of instruments as a service for the computational Grid.

Considering the information categorization that we defined in Section 2.2.3, we can note that static information, and some of the low dynamic information, belong to a particular instrument instance, while instrument topology information must be a shared attribute between devices in order to avoid collisions when organizing the instruments in the proper way. In a typical P2P discovery, a peer announces itself to the network, giving other peers the possibility to perform queries and exchange data.

In this scenario, instruments can dynamically engage other existing instruments, performing a system lookup and allowing the dynamic determination of the peers’ topology.


Figure 2.8: Instrument Discovery Interaction Diagram

Figure 2.8 explains the dynamic join of an instrument into the system. After a bootstrap, the instrument sends a discovery request to other peers, and relays eventually forward this request to unreachable devices. Instruments reply to this request by announcing their presence in the network, and then the newborn instrument enquires the others in order to discover what type of device they are. Once the instrument finds the needed resources, it engages and uses them.

If an instrument disappears from the system, other devices can repeat the discovery/information enquiry phases in order to try to find the needed resources. In addition, this operation can also be repeated in case of failure in order to detect the recovery of the needed subcomponent, allowing an autonomic behavior.
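The announce/discover/enquire/engage handshake of Figure 2.8 can be sketched as follows, replacing the real relay transport with an in-memory registry; every name here is invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for the discovery protocol of Figure 2.8. */
class DiscoveryNetwork {
    private final Map<String, String> peers = new HashMap<>(); // peer id -> device type

    /** Bootstrap: a new instrument announces itself to the network. */
    void announce(String peerId, String deviceType) { peers.put(peerId, deviceType); }

    /** Discovery request: every reachable peer replies with its identity. */
    List<String> discover() { return new ArrayList<>(peers.keySet()); }

    /** Information enquiry: ask a discovered peer what kind of device it is. */
    String enquire(String peerId) { return peers.get(peerId); }

    /** A peer disappearing from the network (failure or shutdown). */
    void leave(String peerId) { peers.remove(peerId); }
}
```

Because `discover` and `enquire` can be re-run at any time, the same loop that performs the initial join also implements the recovery behavior described above: a peer that lost a resource simply repeats the handshake until a suitable device reappears.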


Figure 2.9: Embedded Devices

2.2.6 Embedded Devices

In this context we refer to an embedded device as one with limited computational and network capacity. In order to reduce the system requirements, an IM with very limited functionality can be directly installed into the electronic front end. This enables other IMs to remotely contact the device using a low-level communication channel, allowing a bridged communication that can enable a more elaborate interaction.

For complex systems like [10], [128], embedded applications that run on ad hoc electronic cards can also use the communication tools to dispatch the data just acquired to the thousands of nodes that constitute the event builder layer. Each Event Builder (EVB) machine, which can be seen as an instrument, operates as a subscriber to the messages (data) sent by the publishers, i.e., the devices on the cards. The selection of the required devices is enabled by associating a topic with a device. The incoming data messages are then sampled using the publish/subscribe selector capabilities [4], [91]. The data is further processed, and an event is generated and sent to a subset of machines that perform an additional intermediate step of "collection and aggregation". Finally, this data is sent to the last layer of machines that save the data via instrument managers directly connected to the Data Mover.

2.3 Conclusion

Several technologies have been used in order to implement the first Java-based IE release, such as Tomcat + Axis as the web service engine, which provides a WS-I compliant [99] software end point. The usage of quite mature open source software drove our choices in this field.


order to retrieve the instruments' configuration information, while the IE embedded GUI is based on JSP [92] and AJAX [47] technologies.

Finally, the standard Apache Logging System [102], namely log4j, has been adopted in order to acquire run-time information on each single software sub-component.

In order to ensure the high throughput needed by the IMS component (see Sections 1.1 and 2.2.4), we have also used a JMS [91] implementation on top of a high-performance Reliable Multicast Messaging (RMM) layer [8]. As we will better understand in Chapter 5, RMM allows hosts to reliably exchange data messages over the standard IP multicast network. It exploits the IP multicast infrastructure to ensure scalable resource conservation and timely information distribution, with reliability and traffic control added on top of the standard multicast networking. The RMM-JMS extension is a very efficient Java implementation of the JMS standard using RMM services.

RMM-JMS supports:

• Multicast transport for pub/sub messaging: supporting the JMS topic-based messaging and API, with matching done at the IP multicast level. The transport is a NACK-based reliable multicast protocol.

• Direct (broker-less) unicast for point-to-point messaging: supporting the JMS Queue-based messaging and API. The transport is the TCP protocol.

• Bridged/Brokered unicast transport for pub/sub messaging.

In order to fulfill the scalability requirement mentioned in Section 1.1, Chapter 3 evaluates the limits of XML for high-performance and highly interactive distributed computing like the presented IE. It also proposes a new parser, the Cache Parser, which uses a cache to reduce the parsing time on the sender and receiver side by reusing information related to previously parsed documents/messages similar to the one under examination.

3 On Improving the Web Service Interactivity

This chapter addresses the problem of performance degradation due to the use of Web Service (WS) communication in a distributed architecture like the Instrument Element architecture proposed in Chapter 2. As stressed in Sections 1.1 and 2.3, achieving scalability and high interactivity between users and instruments is a challenging activity. Therefore we believe that, by reducing the overhead introduced by the adoption of high-level standards like WS, we can increase the effective system performance.

In Section 3.1 we formalize the above-mentioned problem, while in Section 3.2 we present the current state of the art.

In Section 3.3 we investigate the limitations of XML for high-performance and highly interactive distributed computing. Experimental results clearly show that, by focusing on parsers, which are routinely used to deserialize the XML messages exchanged in these systems, we can improve the performance of a generic end-to-end solution based on Web Services.

In Section 3.4 we present a new parser, the Cache Parser, which uses a cache to reduce the parsing time on the sender and receiver side by reusing information related to previously parsed documents/messages similar to the one under examination.

Finally, in Section 3.5 we show how our fast parser can improve the global throughput of any application based on Web or Grid Services, or also JAX-RPC. Experimental results demonstrate that our algorithm is 25 times faster than the fastest algorithm on the market and, if used in a WS scenario, can dramatically increase the number of requests per second handled by a server (up to a 150% improvement), bringing it close to a system that does not use XML at all.

3.1 Problem Formalization

XML allows users to introduce elements and attributes, their names and their relations in the document, by specifying a particular XML syntax (DTD/XSchema). The purpose of this syntax is to define the legal building blocks, the structure and the list of legal elements of an XML document.

An XML-based set of technologies is at the basis of Web Services (WSs) [38], [76], [82], [66], by which existing legacy systems can be wrapped as WSs and made available for integration with other systems. Applications exposed as Web Services are accessible by other applications running on different hardware platforms and written in different programming languages. Using this approach, the complexity of these systems can be encapsulated behind XML/SOAP protocols.

A common trade-off in computing is between the need for universality and performance, and this is particularly true when WSs must be exploited to design a system in which both high performance and QoS requirements are mandatory. Fulfilling both such requirements is really necessary in a limit case such as scientific computing. It demands the full range of capabilities that industrial computing does: reliable transfer in distributed heterogeneous environments, parallel programs often exchanging data, self-contained modules sending events to steer other modules, and complex run-time systems designed for heterogeneous environments with dynamically varying loads, multiple communication protocols, and different Quality of Service (QoS) requirements. Unfortunately, the qualities of SOAP that make it universally usable tend to lower the communication performance. In particular, the features that make XML communication inefficient are the primarily ASCII format of XML and its verbosity, due to the need of expressing tags and attributes besides the true information content.

As we will see in Section 3.3, in a WS environment a lot of runtime activity is spent in parsing XML documents: every client or server process needs to exploit an XML parser to send/receive messages. So speeding up the parsing algorithm should have a big impact on the total communication time, by largely reducing overheads. In particular, we are interested in reducing the overheads on the receiver side, where the task of a parser is to deserialize the message by checking whether it conforms to the DTD/XSchema syntax, and extracting data from the textual XML representation.

If a document with the same structure has already been parsed, the parser can directly operate on the cached document syntactic tree. Otherwise, the document will be analyzed by using a standard parser, and a new cache entry will be created to store the syntactic tree of the new document.

Like all WSs, the VIGS, i.e. the user interface of our IE, has a static structure. In addition, the XML documents/messages that are exchanged between the IE's processes and the specific instruments are usually characterized by a "persistent" structure. Note that in our WS implementation such messages are XML-formatted ones, which are inserted in SOAP envelopes and then passed on via HTTP to a receiver that parses them to extract valuable data. Even if all the remarks about the persistence of the message structures are motivated by our specific Grid use-case, where a multitude of senders have to repeatedly send messages characterized by the same structure to a small set of receivers, a similar persistence in the exchanged XML messages can also be observed in several other distributed applications based on Web/Grid Service technologies.
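The caching idea can be illustrated with a deliberately simplified sketch (flat messages only, regex-based extraction; the real Cache Parser is presented in Section 3.4): the tag skeleton of a message serves as cache key, so a second message with the same structure skips structure derivation and only re-extracts the element values.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration of structure caching for persistent-structure messages. */
class StructureCachingParser {
    private static final Pattern ELEMENT = Pattern.compile("<(\\w+)>([^<]*)</\\1>");
    private final Map<String, List<String>> cache = new HashMap<>(); // skeleton -> tag names
    int hits = 0, misses = 0;

    /** Returns tag -> value for a flat XML message like <a><b>1</b><c>2</c></a>. */
    Map<String, String> parse(String xml) {
        String skeleton = xml.replaceAll(">[^<]+<", "><"); // structure without values
        List<String> tags = cache.get(skeleton);
        Map<String, String> values = new HashMap<>();
        Matcher m = ELEMENT.matcher(xml);
        if (tags == null) {                 // miss: derive and store the structure
            misses++;
            tags = new ArrayList<>();
            while (m.find()) { tags.add(m.group(1)); values.put(m.group(1), m.group(2)); }
            cache.put(skeleton, tags);
        } else {                            // hit: structure known, extract values only
            hits++;
            for (String tag : tags) { m.find(); values.put(tag, m.group(2)); }
        }
        return values;
    }
}
```

In the hit branch no structural work is repeated, which is where the real Cache Parser gains its speed-up when a multitude of senders keep producing messages with one shared structure.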

3.2 Related Work

Parsers break documents into pieces such as start tags, end tags, attribute/value pairs, chunks of text content, processing instructions, comments, and so on. These pieces are fed to the application using a well-defined API, implementing a particular parsing model. Four parsing models are commonly in use [98], [131], [124], [56], [27]:

One-step parsing (DOM): the parser reads the whole XML document, and generates a data structure (a parse tree) describing its entire contents.

Push parsing (SAX): the parser sends notifications to the application about the types of XML document pieces it encounters during the parsing process.

Pull parsing: the application always asks the parser for the next piece of information appearing in the document associated with a given element. It is as if the application has to "pull" the information out of the parser, hence the name of the model.

Hybrid parsing: this approach tries to combine different characteristics of the other parsing models to create efficient parsers for special scenarios.
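The first two models can be demonstrated with the standard JAXP API shipped with the JDK; the helper class below is illustrative and extracts the text of a `<value>` element first by building the whole DOM tree, then by reacting to SAX push notifications.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Minimal demonstration of the one-step (DOM) and push (SAX) parsing models. */
class ParsingModels {
    /** DOM: the whole parse tree is built first, then navigated. */
    static String domValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("value").item(0).getTextContent();
        } catch (Exception e) { throw new RuntimeException(e); }
    }

    /** SAX: the parser pushes events; the handler reacts as pieces are encountered. */
    static String saxValue(String xml) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inValue = false;
            @Override public void startElement(String uri, String local, String qName,
                                               Attributes atts) {
                inValue = qName.equals("value");
            }
            @Override public void characters(char[] ch, int start, int len) {
                if (inValue) out.append(ch, start, len);   // push notification
            }
            @Override public void endElement(String uri, String local, String qName) {
                inValue = false;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                           handler);
        } catch (Exception e) { throw new RuntimeException(e); }
        return out.toString();
    }
}
```

The contrast visible even in this toy example is the one that matters for performance: DOM materializes the entire tree regardless of how much of it the application needs, while SAX touches each piece exactly once as it streams by.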

Figure 3.1: 3-tier Architecture and Sequence Diagram

documents sharing the same XML tree but with different tag values.

3.3 Understanding the XML limit

This Section describes a high-level and general architecture, used to build a generic modern web-based application like our IE. The general architecture reviewed is applicable across technologies [53], [3], so we use it to understand the limits that an XML solution can introduce in terms of interactivity and requests handled per second. A modern web-based enterprise application has 3 layers (see Figure 3.1):

A client layer, which is responsible for interacting with the user, e.g., for rendering Web pages;

A middle tier that includes:

1. A Presentation Layer, which interprets user inputs (e.g., submitted HTML forms) and generates the outputs to be presented to the user (e.g., a Web page, including its dynamic content).

2. A Business Logic Layer, which enforces validations, and handles the interaction with the data layer.

A data layer, which stores and manages data, and offers a data-handling interface to the upper layers.



This separation of concerns reduces the cost of maintenance. Three-tier architectures also enable large-scale deployments, in which hundreds or thousands of end users can use applications that access business information. Our motivating application, the IE, follows this abstract architecture: it is just a 3-tier application, with a strong separation between the Business Layer and the Presentation Layer, and with a very simple data layer.

In business-to-consumer applications, the client layer of a web application is implemented as a web browser running on the user's client machine. Its job in a web-based application is to display data and let the user enter/update data. In a business-to-business scenario, the client layer can be a generic application compliant with the Web Service standard.

The presentation sub-layer generates (or displays) Web pages, or produces (or interprets) XML-based SOAP messages in a Web Service scenario. If necessary, it may include dynamic content in them. The dynamic content can originate from a database, and it is typically retrieved by the business logic that:

• performs all required calculations and validations;

• manages a workflow (including keeping track of session data);

• handles all the needed data access.

For smaller web applications, it may be unnecessarily complex to have two separate sub-layers in the middle tier. In addition, communication between the sub-layers typically does not use XML.

From a temporal point of view (shown in Figure 3.1), a client (Web Service, web browser, Java, C++, etc.) performs a request to the business logic, which dynamically retrieves the requested information. During this elaboration phase, the server can either perform one or more queries to a persistent storage, or interact with other sub-business units. Once the information is retrieved, the server formats it in a way that the client is able to understand, and sends it back to the client.
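The request/elaboration/format sequence just described can be sketched as follows. This is a deliberately minimal simulation: an in-memory map stands in for the persistent storage, and the class, key and record names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the sequence in Figure 3.1: the business logic receives a
// request, retrieves data during the elaboration phase (a map stands in
// for the real queries to persistent storage), and formats the reply in
// a form the client understands (XML here).
public class BusinessLogic {
    private final Map<String, String> storage = new HashMap<>();

    public BusinessLogic() {
        storage.put("42", "detector-A"); // hypothetical sample record
    }

    // Elaboration phase: query the data layer.
    private String query(String key) {
        return storage.get(key);
    }

    // Format the retrieved information and return it to the client.
    public String handleRequest(String key) {
        String value = query(key);
        return "<response><key>" + key + "</key><value>"
                + value + "</value></response>";
    }

    public static void main(String[] args) {
        System.out.println(new BusinessLogic().handleRequest("42"));
    }
}
```

The final formatting step is where XML serialization enters the critical path of every request, which motivates the measurements in the next section.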


3.3.1

3-tier Test bed

In order to reproduce the architecture presented at the beginning of this section, we set up two different server machines in the same LAN: the former hosting an Oracle DB server as the storage system, and the latter hosting a Tomcat application server as the middle tier. We then ran a set of clients on other machines in the same cluster.

The hardware used in this test is:

• Database Server: Dual Xeon 1.8 GHz, with 2 GB RAM, Ethernet 1 Gbps, OS Red Hat Linux Advanced Server 2.1.

• Application Server: Dual Xeon 1.8 GHz, with 1.5 GB RAM, Ethernet 1 Gbps, OS Red Hat Linux Advanced Server 2.1.

• Clients: 25 Pentium III 600 MHz, with 256 MB RAM, Ethernet 1 Gbps, OS Linux Red Hat 9.0.

As client/server communications we used:

• Simple HTTP

• SOAP/XML over HTTP

Within the same Tomcat server, we also set up an Axis engine, in order to evaluate the performance of a WS communication (i.e., SOAP/XML+HTTP with automatic generated stubs) between client and server.

The DB table structure and the complexity of the queries are deliberately simple (i.e., one table with 5 attributes).

Our benchmark is thus the following: the clients (Java applications) perform a request to a Tomcat servlet or to a Web Service. The business logic layer, in order to present the result to the client, performs a query on the DB server using a pre-generated JDBC connection. The same layer then formats the result and sends it to the client.

In this scenario we evaluate both the response time of a single client connection and the global application server throughput in terms of satisfied requests per second. The results are shown in the next section.
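The two metrics can be collected on the client side as sketched below. The remote call is mocked here; in the real benchmark it was an HTTP or SOAP request to the Tomcat server, and the class and interface names are illustrative only.

```java
// Sketch of client-side measurement of the two benchmark metrics:
// per-request response time and aggregate throughput (requests/second).
public class BenchmarkSketch {
    interface Request { String invoke(); } // stands in for the remote call

    // Returns { average latency in ms, throughput in requests/second }.
    public static double[] run(Request req, int n) {
        long start = System.nanoTime();
        long totalLatency = 0;
        for (int i = 0; i < n; i++) {
            long t0 = System.nanoTime();
            req.invoke();                       // the measured request
            totalLatency += System.nanoTime() - t0;
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        double avgLatencyMs = totalLatency / (double) n / 1e6;
        double throughput = n / elapsedSec;     // satisfied requests/second
        return new double[] { avgLatencyMs, throughput };
    }

    public static void main(String[] args) {
        double[] m = run(() -> "<ok/>", 1000);
        System.out.printf("avg latency %.3f ms, throughput %.0f req/s%n",
                m[0], m[1]);
    }
}
```

In the actual test the throughput figure is aggregated across all 25 client machines, while the single-client run of Section 3.3.2 isolates the response time.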

3.3.2

Service Request Time

We measured the service request time using just one client. Therefore we can assume that both the application server and the database server are not busy, as they have to serve only one request at a time.


