Optimizing RDF chain queries using genetic algorithms


Citation for published version (APA):

Hogenboom, A. C., Milea, D. V., Frasincar, F., & Kaymak, U. (2010). Optimizing RDF chain queries using genetic algorithms. 1-1. Paper presented at Dutch-Belgian Database Day 2010 (DBDBD 2010), November 22, 2010, Hasselt, Belgium.

Document status and date: Published: 01/01/2010

Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Optimizing RDF Chain Queries using Genetic Algorithms

Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak

{hogenboom, milea, frasincar, kaymak}@ese.eur.nl

Econometric Institute

Erasmus University Rotterdam

PO Box 1738, NL-3000 DR

Rotterdam, the Netherlands

In an Electronic Commerce environment, Semantic Web technologies are promising enablers for large-scale knowledge-based systems, as they facilitate machine-interpretability of data through effective data representation. Fast query engines are required for efficient real-time querying of large amounts of data, usually represented using the Resource Description Framework (RDF). An RDF model is a collection of facts declared as triples, each of which consists of a subject, a predicate, and an object. These triples can be visualized using an RDF graph, which is a node and directed-arc diagram in which each triple is represented as a node-arc-node link. RDF sources can be queried using SPARQL. The execution time of a query depends on the order in which parts of the query paths are executed. The query optimization challenge addressed here is to determine the right join order, thereby optimizing the overall response time.
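To make the join-order problem concrete, the following is a minimal sketch, not the cost model used in this work. It assumes a hypothetical chain query (the `ex:` predicates are made up), made-up cardinality and selectivity estimates, and a simple cost defined as the sum of intermediate result sizes. The names `plan_cost`, `best_order`, `CARDS`, and `SELS` are illustrative only.

```python
from itertools import permutations

# Hypothetical chain query: the object of each pattern is the subject of the next.
CHAIN_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?customer ?country WHERE {
  ?customer ex:placed    ?order .
  ?order    ex:contains  ?product .
  ?product  ex:madeBy    ?supplier .
  ?supplier ex:locatedIn ?country .
}
"""

def plan_cost(cards, sels, join_order):
    """Toy cost: sum of intermediate result sizes for a given join order.

    cards[i]   assumed number of matches for triple pattern i
    sels[j]    assumed selectivity of join j, which links patterns j and j + 1
    join_order a permutation of range(len(sels))
    """
    seg_of = list(range(len(cards)))           # pattern index -> segment id
    size = {i: float(c) for i, c in enumerate(cards)}
    total = 0.0
    for j in join_order:
        left, right = seg_of[j], seg_of[j + 1]
        merged = size[left] * size[right] * sels[j]
        total += merged
        size[left] = merged
        # The right-hand segment is absorbed into the left-hand one.
        seg_of = [left if s == right else s for s in seg_of]
    return total

def best_order(cards, sels):
    """Exhaustive search; only feasible for short chains, as n joins give n! orders."""
    return min(permutations(range(len(sels))),
               key=lambda order: plan_cost(cards, sels, order))

# Made-up estimates for the four patterns and three joins of CHAIN_QUERY.
CARDS = [1000, 10, 100000, 50]
SELS = [0.01, 0.01, 0.001]
```

With these made-up numbers, the cheapest and the most expensive join orders differ in cost by more than a factor of ten, even though every order produces the same final result. Exhaustive enumeration quickly becomes impractical as chains grow, which is what motivates heuristic search.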

In the context of the Semantic Web, two-phase optimization (2PO) has been proposed to optimize RDF query paths [3]. However, other algorithms have not yet been used for RDF query path determination, while genetic algorithms (GAs) have proven to be more effective than simulated annealing (SA), the second phase of 2PO, in similar problems [2]. A GA is an optimization algorithm simulating biological evolution according to the principle of survival of the fittest. A set of chromosomes, representing solutions, is exposed to evolution, consisting of selection, crossover, and mutation. The main goal we pursue consists of investigating whether an approach based on GAs outperforms 2PO in RDF query path determination. As a first step, we focus on the performance of such algorithms when optimizing a special class of SPARQL queries, RDF chain queries (where the WHERE statement only contains a set of chained RDF node-arc-node patterns), on a single source.
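The evolutionary loop described above can be sketched as follows. This is a generic permutation GA, not the encoding, operators, or parameter settings of the GA evaluated here: the chromosome is assumed to be a permutation of join indices, and tournament selection, order crossover, swap mutation, and elitism are choices made for this illustration. `toy_cost` and `WEIGHTS` are hypothetical stand-ins for an estimated plan cost.

```python
import random

def order_crossover(p1, p2, rng):
    """Copy a random slice from p1, fill the remaining positions in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    kept = set(p1[a:b + 1])
    fill = iter(g for g in p2 if g not in kept)
    for i in range(n):
        if child[i] is None:
            child[i] = next(fill)
    return child

def swap_mutation(perm, rng, rate):
    """With probability `rate`, swap two randomly chosen positions."""
    perm = perm[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def tournament(pop, cost, rng, k=3):
    """Selection: the cheapest of k randomly drawn chromosomes survives."""
    return min(rng.sample(pop, k), key=cost)

def genetic_search(cost, n_genes, rng, pop_size=30, generations=50,
                   mutation_rate=0.2):
    """Evolve permutations of range(n_genes) towards low cost."""
    pop = [rng.sample(range(n_genes), n_genes) for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        nxt = [best[:]]                        # elitism: keep the best so far
        while len(nxt) < pop_size:
            child = order_crossover(tournament(pop, cost, rng),
                                    tournament(pop, cost, rng), rng)
            nxt.append(swap_mutation(child, rng, mutation_rate))
        pop = nxt
        best = min(pop, key=cost)
    return best

# Hypothetical stand-in for a plan cost: early positions should hold heavy genes.
WEIGHTS = [5, 1, 9, 3, 7, 2, 8, 4]

def toy_cost(order):
    return sum((pos + 1) * WEIGHTS[g] for pos, g in enumerate(order))
```

A call such as `genetic_search(toy_cost, len(WEIGHTS), random.Random(42))` returns one permutation. Because the best chromosome is carried over unchanged, the best cost never increases from one generation to the next.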

We assess the performance of a GA compared to 2PO on a single source. Each algorithm is tested on chain queries varying in length from 2 to 20 predicates. Each experiment is iterated 100 times. For relatively small chain queries containing up to about 10 predicates, 2PO turns out to require the least time for query optimization. For bigger chain queries, a GA converges faster to the solution. Furthermore, a GA tends to find better solutions of more consistent quality than 2PO does, especially for larger queries. When a time limit of 1 second is set (allowing the algorithms to perform at least a couple of iterations while assuming this to be an acceptable maximum waiting time in a real-time environment), a GA tends to generate solutions of even better quality compared to 2PO. The consistency in solution quality of our GA, RCQ-GA [1], as opposed to that of 2PO, is not clearly affected by a time limit.
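Running an optimizer under a time limit can be sketched as an anytime loop that returns the best ordering found when the deadline passes. The sketch below is an assumption-laden illustration: a plain swap-based local search stands in for the optimizer (it is neither 2PO nor RCQ-GA), and `optimize_within` and its parameters are hypothetical names.

```python
import random
import time

def optimize_within(cost, n_genes, time_limit_s, rng):
    """Improve a random ordering by swaps until the deadline has passed.

    The deadline is checked after each step, so at least one step is
    always performed, even for a time limit of zero.
    """
    deadline = time.monotonic() + time_limit_s
    best = rng.sample(range(n_genes), n_genes)   # random start ordering
    best_cost = cost(best)
    iterations = 0
    while True:
        candidate = best[:]
        i, j = rng.sample(range(n_genes), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:           # accept improvements only
            best, best_cost = candidate, candidate_cost
        iterations += 1
        if time.monotonic() >= deadline:
            break
    return best, best_cost, iterations
```

In an experiment like the one described, `time_limit_s` would be 1.0 and `cost` the estimated execution cost of a query plan; recording `best_cost` over repeated runs gives a simple view of solution quality and its consistency.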

In his talk, Alexander Hogenboom (PhD student at Erasmus University Rotterdam) will present these results, as further detailed in [1]. He will show that in optimizing query paths for chain queries in a single-source RDF query execution environment, the performance of a GA compared to 2PO is positively correlated with solution space complexity and environmental restrictiveness (a time limit). The proposed GA outperforms 2PO in solution quality, execution time needed, and consistency of solution quality.

References

[1] Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak. RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms. In Tenth International Conference on E-Commerce and Web Technologies (EC-Web 2009), pages 181–192, 2009.

[2] Michael Steinbrunn, Guido Moerkotte, and Alfons Kemper. Heuristic and Randomized Optimization for the Join Ordering Problem. The VLDB Journal, 6(3):191–208, 1997.

[3] Heiner Stuckenschmidt, Richard Vdovjak, Jeen Broekstra, and Geert Jan Houben. Towards Distributed Processing of RDF Path Queries. International Journal of Web Engineering and Technology, 2(2-3):207–230, 2005.
