On the Usefulness of SQL-Query-Similarity Measures to Find User Interests

Arzamasova, Natalia; Böhm, Klemens; Goldman, Bertrand; Saaler, Christian; Schäler, Martin

doi:10.5445/IR/1000093761

Abstract:

In the sciences and elsewhere, the use of relational databases has become ubiquitous. An important challenge is finding hot spots of user interests. In principle, one can discover user interests by clustering the queries in the query log. Such a clustering requires a notion of query similarity. This, in turn, raises the question of what features of SQL queries are meaningful. We have studied the query representations proposed in the literature and corresponding similarity functions and have identified shortcomings of all of them. To overcome these limitations, we propose new similarity functions for SQL queries. They rely on the so-called access area of a query and, more specifically, on the overlap and the closeness of the access areas. We have carried out experiments systematically to compare the various similarity functions described in this article. The first series of experiments measures the quality of clustering and compares it to a ground truth. In the second series, we focus on the query log from the well-known SkyServer database. Here, a domain expert has interpreted various clusters by hand. We conclude that clusters obtained with our new measures of similarity seem to be good indicators of user interests.

Published on 16 April 2019

Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Research report/Preprint
Publication year	2019
Language	English
Identifier	ISSN: 2190-4782; KITopen-ID: 1000093761
Publisher	Karlsruher Institut für Technologie (KIT)
Extent	18 pp.
Series	Karlsruhe Reports in Informatics ; 2019,3
Keywords	SQL log analysis, SQL query representations, similarity measures

Repository KITopen
