Dimension-based subspace search for outlier detection

Trittenbach, Holger; Böhm, Klemens

doi:10.1007/s41060-018-0137-7

Abstract:

Scientific data are often high-dimensional. In such data, finding outliers is challenging because they are often hidden in subspaces, i.e., lower-dimensional projections of the data. With recent approaches to outlier mining, the actual detection of outliers is decoupled from the search for subspaces likely to contain outliers. However, finding such sets of subspaces that contain most or even all outliers of the given data set remains an open problem. While previous proposals use per-subspace measures such as correlation to quantify the quality of subspaces, we explicitly take the relationship between subspaces into account and propose a dimension-based measure of that quality. Based on it, we formalize the notion of an optimal set of subspaces and propose the Greedy Maximum Deviation heuristic to approximate this set. Experiments on comprehensive benchmark data show that our concept is more effective in determining the relevant set of subspaces than approaches that use per-subspace measures.

Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Journal article
Publication year	2018
Language	English
Identifier	ISSN: 2364-415X, 2364-4168; KITopen-ID: 1000083489
Published in	International Journal of Data Science and Analytics
Publisher	Springer
Project information	GRK 2153/1 (DFG, DFG KOORD, GRK 2153/1)
Published online in advance on	14.06.2018
Keywords	Outlier mining, Subspace search, High-dimensional data
Indexed in	OpenAlex, Dimensions
