
A framework for dependency estimation in heterogeneous data streams

Fouché, Edouard 1; Mazankiewicz, Alan 1; Kalinke, Florian 1; Böhm, Klemens 1
1 Karlsruher Institut für Technologie (KIT)

Abstract:

Estimating dependencies from data is a fundamental task of Knowledge Discovery. Identifying the relevant variables leads to a better understanding of data and improves both the runtime and the outcomes of downstream Data Mining tasks. Dependency estimation from static numerical data has received much attention. However, real-world data often occurs as heterogeneous data streams: On the one hand, data is collected online and is virtually infinite. On the other hand, the various components of a stream may be of different types, e.g., numerical, ordinal or categorical. For this setting, we propose Monte Carlo Dependency Estimation (MCDE), a framework that quantifies multivariate dependency as the average statistical discrepancy between marginal and conditional distributions, via Monte Carlo simulations. MCDE handles heterogeneity by leveraging three statistical tests: the Mann–Whitney U, the Kolmogorov–Smirnov and the Chi-Squared test. We demonstrate that MCDE goes beyond the state of the art regarding dependency estimation by meeting a broad set of requirements. Finally, we show with a real-world use case that MCDE can discover useful patterns in heterogeneous data streams.
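The idea described in the abstract, averaging a statistical discrepancy between marginal and conditional distributions over Monte Carlo iterations, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`ks_statistic`, `random_slice`, `mc_dependency`), the way the conditioning slice is drawn (a random contiguous rank window per dimension), the default parameters, and the use of the raw Kolmogorov-Smirnov statistic as the discrepancy are all assumptions made for illustration. It covers numerical columns only.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def random_slice(column, fraction, rng):
    """Indices of a random contiguous rank window holding `fraction` of the points."""
    order = sorted(range(len(column)), key=column.__getitem__)
    width = max(1, int(round(fraction * len(column))))
    start = rng.randint(0, len(column) - width)
    return set(order[start:start + width])


def mc_dependency(columns, iterations=200, slice_fraction=0.5, seed=0):
    """Average discrepancy between the marginal and conditional distributions.

    `columns` is a list of equally long numeric lists (one per dimension).
    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    d, n = len(columns), len(columns[0])
    # Shrink the per-dimension window so the joint slice keeps
    # roughly `slice_fraction` of the points.
    per_dim = slice_fraction ** (1.0 / (d - 1))
    total, used = 0.0, 0
    for _ in range(iterations):
        ref = rng.randrange(d)  # reference dimension for this iteration
        selected = set(range(n))
        for j in range(d):
            if j != ref:
                selected &= random_slice(columns[j], per_dim, rng)
        if not selected:
            continue  # empty slice: skip this iteration
        conditional = [columns[ref][i] for i in selected]
        total += ks_statistic(conditional, columns[ref])
        used += 1
    return total / used if used else 0.0
```

Calling `mc_dependency([x, y])` on two columns returns a score between 0 and 1. Under these assumptions, a column pair with a strong functional relationship should score noticeably higher than an independent pair, because conditioning on one column then shifts the distribution of the other.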


Publisher's version
DOI: 10.5445/IR/1000120837
Published on 2 July 2020
Original publication
DOI: 10.1007/s10619-020-07295-x
Scopus citations: 3
Web of Science citations: 3
Dimensions citations: 4
Affiliated institution(s) at KIT: Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type: Journal article
Publication year: 2021
Language: English
Identifier: ISSN: 0926-8782, 1573-7578
KITopen-ID: 1000120837
Published in: Distributed and parallel databases
Publisher: Springer
Volume: 39
Pages: 415–444
Published online in advance on 6 June 2020
Indexed in: Scopus, Dimensions, Web of Science