A scalable architecture for online anomaly detection of WLCG batch jobs

Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.

doi:10.1088/1742-6596/762/1/012002

Kuehn, E. ¹; Fischer, M. ¹; Giffels, M. ¹; Jung, C. ¹; Petzold, A. ¹
¹ Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)

Abstract:

For data centres it is increasingly important to monitor network usage and to learn from network usage patterns. In particular, configuration issues or misbehaving batch jobs that prevent smooth operation need to be detected as early as possible. At the GridKa data and computing centre we therefore operate a tool, BPNetMon, for monitoring traffic data and characteristics of WLCG batch jobs and pilots locally on different worker nodes. On the one hand, local information alone is not sufficient to detect anomalies for several reasons, e.g. the underlying job distribution on a single worker node might change or there might be a local misconfiguration. On the other hand, a centralised anomaly detection approach does not scale with respect to network communication or computational costs. We therefore propose a scalable architecture based on concepts of a super-peer network.

Full text (KITopen)
DOI: 10.5445/IR/1000063495

Original publication
DOI: 10.1088/1742-6596/762/1/012002


Affiliated institution(s) at KIT	Institut für Kernphysik (IKP); Scientific Computing Center (SCC); Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type	Journal article
Publication year	2016
Language	English
Identifier	ISSN: 1742-6588, 1742-6596; urn:nbn:de:swb:90-634953; KITopen-ID: 1000063495
HGF programme	53.52.02 (POF III, LK 02) GridKa
Published in	Journal of physics / Conference Series
Publisher	Institute of Physics Publishing Ltd (IOP Publishing Ltd)
Volume	762
Pages	012002
Indexed in	OpenAlex, Dimensions, Scopus