
Hiding Outliers in High-Dimensional Data Spaces

Steinbuß, Georg; Böhm, Klemens

Abstract: Detecting outliers in high-dimensional data is crucial in many domains. Due to the curse of dimensionality, one typically does not detect outliers in the full space, but in subspaces of it. More specifically, since the number of subspaces is huge, the detection takes place in only some subspaces. In consequence, one might miss hidden outliers, i.e., outliers only detectable in certain subspaces. In this paper, we take the opposite perspective, which is of practical relevance as well, and study how to hide outliers in high-dimensional data spaces. We formally prove characteristics of hidden outliers. We also propose an algorithm to place them in the data. It focuses on the regions close to existing data objects and is more efficient than an exhaustive approach. In experiments, we both evaluate our formal results and show the usefulness of our algorithm using different subspace selection schemes, outlier detection methods and data sets.

Affiliated institution(s) at KIT Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type Research report
Year 2017
Language English
Identifier DOI(KIT): 10.5445/IR/1000064592
ISSN: 2190-4782
URN: urn:nbn:de:swb:90-645920
KITopen ID: 1000064592
Publisher Karlsruhe
Extent 13 pp.
Series Karlsruhe Reports in Informatics ; 2017,1