Estimating Dependency, Monitoring and Knowledge Discovery in High-Dimensional Data Streams

Fouché, Edouard

doi:10.5445/IR/1000127232

Abstract:

Data Mining, the process of extracting knowledge from massive data sets, has a phenomenal impact on our society and now affects nearly every aspect of our lives: from the layout of our local grocery store, to the ads and product recommendations we receive, the availability of treatments for common diseases, the prevention of crime, and the efficiency of industrial production processes.
However, Data Mining remains difficult when (1) data is high-dimensional, i.e., has many attributes, and when (2) data comes as a stream. Extracting knowledge from high-dimensional data streams is impractical because one must cope with two orthogonal sets of challenges. On the one hand, the effects of the so-called "curse of dimensionality" bog down the performance of statistical methods and lead to increasingly complex Data Mining problems. On the other hand, the statistical properties of data streams may evolve in unexpected ways, a phenomenon known in the community as "concept drift". Thus, one needs to update one's knowledge about the data over time, i.e., to monitor the stream.
While previous work addresses high-dimensional data sets and data streams to some extent, the intersection of both has received much less attention. ...

Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Thesis
Publication date	2020-12-08
Language	English
Identifier	KITopen-ID: 1000127232
Publisher	Karlsruher Institut für Technologie (KIT)
Extent	X, 164 pp.
Type of thesis	Dissertation
Faculty	Fakultät für Informatik (INFORMATIK)
Institute	Institut für Programmstrukturen und Datenorganisation (IPD)
Examination date	2020-07-15
Keywords	Data Mining, Data Stream Monitoring, Multivariate Statistics, Online Learning Algorithms, Predictive Maintenance, Anomaly Detection
Indexed in	OpenAlex
Referee/Supervisor	Böhm, K.
