Topic-based Selectivity Estimation for Hybrid Queries over RDF Graphs

Wagner, Andreas; Bicer, Veli; Tran, Duc Thanh

Abstract:
The Resource Description Framework (RDF) has
become an accepted standard for describing entities on the Web.
Many such RDF descriptions are text-rich – besides structured
data, they also feature large portions of unstructured text. As a
result, RDF data is frequently queried using predicates matching
structured data, combined with string predicates for textual constraints:
hybrid queries. Evaluating hybrid queries requires accurate
means for selectivity estimation. Previous works on selectivity
estimation, however, suffer from inherent drawbacks, reflected
in efficiency and effectiveness issues. In this paper, we present a
general framework for hybrid selectivity estimation. Based on its
requirements, we study the applicability of existing approaches.
Driven by our findings, we propose a novel estimation approach,
TopGuess, exploiting topic models as data synopsis. This enables
us to capture correlations between structured and unstructured
data in a uniform and scalable manner. We study TopGuess
theoretically and show that it guarantees linear space
complexity w.r.t. text data size and a selectivity estimation time
complexity independent of its synopsis size. ...

Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Research report
Year: 2013
Language: English
Identifier: KITopen-ID: 1000091515
Publisher: KIT, Karlsruhe
Extent: 19 pp.
External relations: Abstract/Full text