Hiding Outliers in High-Dimensional Data Spaces

Steinbuß, Georg; Böhm, Klemens

Detecting outliers in high-dimensional data is crucial in many domains. Due to the curse of dimensionality, one typically does not detect outliers in the full space, but in subspaces of it. More specifically, since the number of subspaces is huge, the detection takes place in only some subspaces. In consequence, one might miss hidden outliers, i.e., outliers only detectable in certain subspaces. In this paper, we take the opposite perspective, which is of practical relevance as well, and study how to hide outliers in high-dimensional data spaces. We formally prove characteristics of hidden outliers. We also propose an algorithm to place them in the data. It focuses on the regions close to existing data objects and is more efficient than an exhaustive approach. In experiments, we both evaluate our formal results and show the usefulness of our algorithm using different subspace selection schemes, outlier detection methods and data sets.
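
To illustrate the notion of a hidden outlier, the following minimal sketch places a point that looks inconspicuous in each one-dimensional subspace but stands out once both attributes are considered together. It is not the algorithm proposed in the report. The detector (Mahalanobis distance with a cutoff of 3), the synthetic correlated data, and the names `mahalanobis`, `is_outlier`, `candidate`, and `THRESHOLD` are all assumptions made for illustration only.

```python
import numpy as np

THRESHOLD = 3.0  # assumed cutoff; not taken from the report


def mahalanobis(point, data):
    """Mahalanobis distance of `point` from the sample mean of `data`."""
    data = np.asarray(data, dtype=float).reshape(len(data), -1)
    diff = np.atleast_1d(point) - data.mean(axis=0)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def is_outlier(point, data, dims):
    """Flag `point` if it is far from the data in the subspace `dims`."""
    return mahalanobis(point[dims], data[:, dims]) > THRESHOLD


# Synthetic data: two strongly correlated attributes (assumed setup).
rng = np.random.default_rng(0)
data = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.95], [0.95, 1.0]], size=500
)

# The candidate breaks the correlation but has ordinary marginal values.
candidate = np.array([1.5, -1.5])

for dims in ([0], [1], [0, 1]):
    print(dims, is_outlier(candidate, data, dims))
```

A detector that inspects only the subspaces `[0]` and `[1]` would miss the candidate, while one that also inspects `[0, 1]` would flag it. In this toy setting, that is what makes the point hidden with respect to the first choice of subspaces.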

DOI: 10.5445/IR/1000064592
Affiliated institution(s) at KIT: Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type: Research report/Preprint
Publication year: 2017
Language: English
Identifier: ISSN: 2190-4782
KITopen-ID: 1000064592
Publisher: KIT, Karlsruhe
Extent: 13 pp.
Series: Karlsruhe Reports in Informatics ; 2017,1