Hiding Outliers in High-Dimensional Data Spaces

Steinbuß, Georg; Böhm, Klemens

doi:10.5445/IR/1000064592

Abstract:

Detecting outliers in high-dimensional data is crucial in many domains. Due to the curse of dimensionality, one typically does not detect outliers in the full space, but in subspaces of it. More specifically, since the number of subspaces is huge, the detection takes place in only some subspaces. In consequence, one might miss hidden outliers, i.e., outliers only detectable in certain subspaces. In this paper, we take the opposite perspective, which is of practical relevance as well, and study how to hide outliers in high-dimensional data spaces. We formally prove characteristics of hidden outliers. We also propose an algorithm to place them in the data. It focuses on the regions close to existing data objects and is more efficient than an exhaustive approach. In experiments, we both evaluate our formal results and show the usefulness of our algorithm using different subspace selection schemes, outlier detection methods and data sets.


Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Research report/Preprint
Publication year	2017
Language	English
Identifier	ISSN: 2190-4782 urn:nbn:de:swb:90-645920 KITopen-ID: 1000064592
Publisher	Karlsruher Institut für Technologie (KIT)
Extent	13 pp.
Series	Karlsruhe Reports in Informatics ; 2017,1

Repository KITopen
