From Scripts Towards Provenance Inference

Academic year: 2021

(1)

From Scripts towards Provenance Inference

Rezwanul Huq*, Peter M.G. Apers, Andreas Wombacher
University of Twente, The Netherlands

Yoshihide Wada, Ludovicus P. H. van Beek
Utrecht University, The Netherlands

(2)

eScience Application

(3)

Workflow Model: Activity

[Figure: an activity with input views V1, V2, V3, their buffers, the windows σI1(V1), σI2(V2), σI3(V3), a trigger T, and an output view V′]

• Views: data
• Interval predicates → windows
• Trigger: based on windows
• Exactly one output view
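
The activity model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a view is taken to be a list of (timestamp, value) tuples, an interval predicate is a half-open time range, and the class and method names (`Activity`, `select`, `trigger`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

View = List[Tuple[int, float]]  # assumption: a view is a list of (timestamp, value)


@dataclass
class Activity:
    """One activity: one interval predicate per input view, exactly one output view."""
    windows: List[Tuple[int, int]]                     # (start, end) per input view
    operation: Callable[[List[List[float]]], float]    # combines the selected windows
    output_view: View = field(default_factory=list)

    def select(self, view: View, interval: Tuple[int, int]) -> List[float]:
        """Interval predicate: keep values whose timestamp lies in [start, end)."""
        start, end = interval
        return [value for ts, value in view if start <= ts < end]

    def trigger(self, now: int, input_views: List[View]) -> Tuple[int, float]:
        """Fire once: window every input view, then append one output tuple."""
        selected = [self.select(v, w) for v, w in zip(input_views, self.windows)]
        self.output_view.append((now, self.operation(selected)))
        return self.output_view[-1]


# Hypothetical data: two input views and an activity that sums its windows.
v1 = [(0, 1.0), (1, 2.0), (2, 3.0)]
v2 = [(0, 10.0), (1, 20.0), (2, 30.0)]
act = Activity(windows=[(1, 3), (0, 2)],
               operation=lambda ws: sum(sum(w) for w in ws))
result = act.trigger(now=3, input_views=[v1, v2])
```

Here the trigger is fired by hand; in the model on the slide the trigger itself is defined in terms of the windows.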

(4)

Workflow & Provenance

[Figure: a workflow in which processes P1, P2, P3 connect views V0, V1, V2, V3, V4]

• Provenance: the derivation history of a data product, starting from its original sources.

(5)

Workflow Provenance Capture: State of the Art

Provenance-unaware Platform
• Languages like Python
• Tools like Excel, R

Provenance-aware Platform
• Kepler, Taverna, Karma, VisTrails
• STREAM, Aurora

[Figure: bridging the gap from a provenance-unaware platform to workflow provenance. How?]
• Manually: requires building time and training

(6)

Problem Statement

How to capture workflow provenance automatically in a provenance-unaware platform?

(7)

Our Contribution: Workflow Provenance Inference

Provenance-unaware Platform
• Languages like Python
• Tools like Excel, R

Provenance-aware Platform
• Kepler, Taverna, Karma, VisTrails
• STREAM, Aurora

[Figure: Workflow Provenance Inference for Python bridges the provenance-unaware platform to workflow provenance]

(8)

Workflow Provenance Inference: Challenge

• Capturing data dependences by analyzing the script.

• Translating control dependences into data dependences.
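
Both challenges can be illustrated with a minimal sketch. It uses Python's standard `ast` module as a stand-in for the parser and handles only assignments and `if` statements. The function name `dependences` and the variable names in the sample script are hypothetical. A variable assigned inside a branch is made to depend on the variables read in the branch condition, which is one simple way to turn a control dependence into a data dependence.

```python
import ast


def dependences(source):
    """Map each assigned variable to the set of variables it depends on."""
    deps = {}

    def names_read(node):
        # All variable names that are read (loaded) somewhere inside `node`.
        return {n.id for n in ast.walk(node)
                if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}

    def visit(statements, control):
        for stmt in statements:
            if isinstance(stmt, ast.Assign):
                # Data dependences plus the enclosing conditions' variables.
                reads = names_read(stmt.value) | control
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        deps.setdefault(target.id, set()).update(reads)
            elif isinstance(stmt, ast.If):
                guard = control | names_read(stmt.test)
                visit(stmt.body, guard)
                visit(stmt.orelse, guard)

    visit(ast.parse(source).body, set())
    return deps


# Hypothetical script fragment, loosely in the spirit of the use case.
script = """
demand = area * rate
if season > threshold:
    supply = river + ground
else:
    supply = river
"""
deps = dependences(script)
```

In this sketch `supply` ends up depending on `season` and `threshold` as well as on `river` and `ground`, although neither condition variable appears on the right-hand side of an assignment to `supply`.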

(9)

Workflow Provenance Inference: Overview

Python Script → Parsing¹ → AST → Traversing → Objects → Transformation² → Initial Graph → Re-writing² → Provenance Graph

¹ Off-the-shelf grammar from the ANTLR site
² Attributed Graph Grammar (AGG)

(10)

Provenance Graph Model

• Represented as a graph
• Provenance graph
• Windows
• Trigger
• Input-output ratio
• hasOutput

(11)

Transformation Phase

• Building the initial graph
• Preserving order between statements
• Maintaining versions of variables
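
A minimal sketch of these three points, assuming straight-line assignments only. The slides use AGG for this phase; plain Python with the standard `ast` module is used here purely for illustration. The naming scheme is an assumption: statement nodes are called `s<k>`, where k is the statement's position in the script, and variable nodes are called `<name>_<version>`, with version 0 reserved for inputs that are read before being assigned.

```python
import ast


def initial_graph(source):
    """Return (nodes, edges) for the assignments in `source`."""
    version = {}            # latest version number per variable
    nodes, edges = [], []

    def current(name):
        # A variable read before any assignment is treated as an input (version 0).
        if name not in version:
            version[name] = 0
            nodes.append(f"{name}_0")
        return f"{name}_{version[name]}"

    for order, stmt in enumerate(ast.parse(source).body, start=1):
        if not isinstance(stmt, ast.Assign):
            continue        # this sketch handles assignments only
        op = f"s{order}"    # the node name preserves the statement order
        nodes.append(op)
        reads = sorted({n.id for n in ast.walk(stmt.value)
                        if isinstance(n, ast.Name)})
        for name in reads:
            edges.append((current(name), op))
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                # Each assignment creates a new version of the variable.
                version[target.id] = version.get(target.id, 0) + 1
                out = f"{target.id}_{version[target.id]}"
                nodes.append(out)
                edges.append((op, out))
    return nodes, edges


# Hypothetical three-line script in which x is assigned twice.
nodes, edges = initial_graph("x = a + b\nx = x * 2\ny = x + a\n")
```

Versioning keeps the two assignments to `x` apart: the second statement reads `x_1` and writes `x_2`, and the third statement reads `x_2`.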

(12)
(13)

Re-writing Phase

• A rule consists of a left-hand side (LHS) and a right-hand side (RHS).
• A pattern that matches the LHS is replaced by the RHS.
• Re-write rules for:
  • Translating control-flow statements
  • Maintaining persistence of views
  • Ensuring compactness of the graph
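
As an illustration of how an LHS/RHS rule can make a graph more compact, the sketch below applies one hypothetical rule until it no longer matches. It is not one of the authors' AGG rules: the node label `"copy"` and the function name `rewrite_passthrough` are assumptions made for the example.

```python
def rewrite_passthrough(nodes, edges):
    """Apply one rewrite rule repeatedly until no match remains.

    LHS: u -> p -> w, where p is labelled "copy" and has exactly one
         incoming and one outgoing edge.
    RHS: u -> w (node p and its two edges are removed).
    """
    nodes = dict(nodes)     # node id -> label
    edges = set(edges)      # (source, target) pairs
    changed = True
    while changed:
        changed = False
        for p, label in list(nodes.items()):
            if label != "copy":
                continue
            incoming = [e for e in edges if e[1] == p]
            outgoing = [e for e in edges if e[0] == p]
            if len(incoming) == 1 and len(outgoing) == 1:
                u, w = incoming[0][0], outgoing[0][1]
                if u == p or w == p:
                    continue    # ignore self-loops in this sketch
                edges -= {incoming[0], outgoing[0]}
                edges.add((u, w))
                del nodes[p]
                changed = True
    return nodes, edges


# Hypothetical graph: two pass-through nodes between a view and a process.
graph_nodes = {"in": "view", "c1": "copy", "c2": "copy",
               "op": "process", "out": "view"}
graph_edges = {("in", "c1"), ("c1", "c2"), ("c2", "op"), ("op", "out")}
new_nodes, new_edges = rewrite_passthrough(graph_nodes, graph_edges)
```

After re-writing, the five-node chain is reduced to `in -> op -> out`.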

(14)
(15)
(16)

Evaluation: Use Case

• Water scarcity modeling.

• We focus on estimating irrigation water demand.

• Several files (PCRaster maps of dimension 360 × 720) are used with different PCRaster operations.

(17)

Evaluation: Quantitative Analysis

Workflow Provenance
• Lines: 120
• Initial graph: ~450 nodes
• Final graph: ~139 nodes

Fine-grained Provenance Inference
• > 3000 maps
• ~40 GB offline data
• Inference methods

(18)

Evaluation: Qualitative Analysis (I)

• Open-ended interview with two scientists
• Debugging-friendliness
• Extensibility

(19)

Evaluation: Qualitative Analysis (II)

“I need to access library functions or functions written elsewhere.”

“This is too detailed. I want to group some elements to have an overview of the processing.”

“Sometimes, I used to spend hours finding reasons for having an unexpected value.”

Extensibility
• Only a little information needs to be entered for the very first run.

Customization
• Adaptation based on user preference is possible.

Debugging-friendliness
• Easy access to data
• Code efficiency

(20)

Conclusion & Future Plan

• Workflow provenance capture in a provenance-unaware platform
• Manual capture requires both time and training
• Workflow Provenance Inference

• Future Plan
  • Address other control-flow statements
  • Build a complete framework with GUI

(21)
