Hiding Outliers in High-Dimensional Data Spaces
Detecting outliers in high-dimensional data is crucial in many domains. Due to the curse of dimensionality, one typically does not detect outliers in the full space, but in subspaces of it. More specifically, since the number of subspaces is huge, the detection takes place in only some subspaces. In consequence, one might miss hidden outliers, i.e., outliers only detectable in certain subspaces. In this paper, we take the opposite perspective, which is of practical relevance as well, and study how to hide outliers in high-dimensional data spaces. We formally prove characteristics of hidden outliers. We also propose an algorithm to place them in the data. It focuses on the regions close to existing data objects and is more efficient than an exhaustive approach. In experiments, we both evaluate our formal results and show the usefulness of our algorithm using different subspace selection schemes, outlier detection methods and data sets.
Affiliated institution(s) at KIT: Institut für Programmstrukturen und Datenorganisation (IPD)
KITopen ID: 1000064592
Karlsruhe Reports in Informatics ; 2017,1