
A scalable architecture for online anomaly detection of WLCG batch jobs

Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.

For data centres it is increasingly important to monitor network usage and to learn from network usage patterns. In particular, configuration issues or misbehaving batch jobs that prevent smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate a tool, BPNetMon, for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information by itself is not sufficient to detect anomalies for several reasons, e.g. the underlying job distribution on a single worker node might change or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with respect to network communication or computational cost. We therefore propose a scalable architecture based on concepts of a super-peer network.
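The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the summary format, the names `WorkerSummary`, `summarise` and `flag_outliers`, and the use of the median absolute deviation as the outlier test are all assumptions made for illustration. The idea shown is only that each worker node reduces its local measurements to a compact summary, and a super-peer compares the summaries of its group, so that raw traffic data never has to reach one central instance.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class WorkerSummary:
    """Compact per-worker summary sent to a super-peer (hypothetical format)."""
    worker: str
    mean_rate: float  # mean traffic rate of the jobs on this worker, arbitrary units


def summarise(worker: str, job_rates: list[float]) -> WorkerSummary:
    """Local step: reduce the raw per-job rates on one worker to one number."""
    return WorkerSummary(worker, sum(job_rates) / len(job_rates))


def flag_outliers(summaries: list[WorkerSummary], threshold: float = 5.0) -> list[str]:
    """Super-peer step: flag workers whose summary deviates from the group.

    The median absolute deviation (MAD) is an illustrative choice,
    not a method taken from the paper.
    """
    rates = [s.mean_rate for s in summaries]
    centre = median(rates)
    mad = median([abs(r - centre) for r in rates])
    if mad == 0:
        # All typical workers agree exactly; anything different stands out.
        return [s.worker for s in summaries if s.mean_rate != centre]
    return [s.worker for s in summaries if abs(s.mean_rate - centre) / mad > threshold]


# One super-peer's group of five worker nodes (made-up numbers).
group = [
    summarise("wn1", [10, 12]),
    summarise("wn2", [12, 14]),
    summarise("wn3", [9, 11]),
    summarise("wn4", [11, 13]),
    summarise("wn5", [100, 120]),
]
flagged = flag_outliers(group)
```

In this sketch a worker cannot judge on its own whether a mean rate of 110 is unusual, while the super-peer sees it next to four peers clustered around 12. Only one number per worker crosses the network, which is the scaling argument the abstract makes against a fully centralised detector.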

Affiliated institution(s) at KIT Steinbuch Centre for Computing (SCC)
Institut für Kernphysik (IKP)
Publication type Journal article
Year 2016
Language English
Identifier DOI: 10.1088/1742-6596/762/1/012002
ISSN: 1742-6588, 1742-6596
URN: urn:nbn:de:swb:90-634953
KITopen ID: 1000063495
HGF program 53.52.02; LK 02
Published in Journal of physics / Conference Series
Volume 762
Pages 012002
License CC BY 3.0 DE: Creative Commons Attribution 3.0 Germany