KIT | KIT-Bibliothek | Impressum | Datenschutz
Open Access Logo
DOI: 10.5445/IR/1000083093
Veröffentlicht am 28.05.2018

Exploiting subspace distance equalities in Highdimensional data for knn queries

Schäler, Martin; Broneske, David; Köpen, Veit; Saake, Gunter

Efficient k-nearest neighbor computation for high-dimensional data is an important, yet challenging task. The response times of stateof-the-art indexing approaches highly depend on factors like distribution of the data. For clustered data, such approaches are several factors faster than a sequential scan. However, if various dimensions contain uniform or Gaussian data they tend to be clearly outperformed by a simple sequential scan. Hence, we require for an approach generally delivering good response times, independent of the data distribution. As solution, we propose to exploit a novel concept to efficiently compute nearest neighbors. We name it sub-space distance equality, which aims at reducing the number of distance computations independent of the data distribution. We integrate knn computing algorithms into the Elf index structure allowing to study the sub-space distance equality concept in isolation and in combination with a main-memory optimized storage layout. In a large comparative study with twelve data sets, our results indicate that indexes based on sub-space distance equalities compute the least amount of distances. For ... mehr

Zugehörige Institution(en) am KIT Institut für Programmstrukturen und Datenorganisation (IPD)
Publikationstyp Forschungsbericht
Jahr 2018
Sprache Englisch
Identifikator ISSN: 2190-4782
URN: urn:nbn:de:swb:90-830934
KITopen ID: 1000083093
Verlag Karlsruhe
Umfang 11 S.
Serie Karlsruhe Reports in Informatics ; 2018,6
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft KITopen Landing Page