Active job monitoring in pilots

Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.

doi:10.1088/1742-6596/664/5/052019

Kuehn, E. ¹; Fischer, M. ¹; Giffels, M. ¹; Jung, C. ¹; Petzold, A. ¹
¹ Karlsruher Institut für Technologie (KIT)

Abstract:

Recent developments in high energy physics (HEP), including multi-core jobs and multi-core pilots, require data centres to gain a deep understanding of their systems in order to monitor, design, and upgrade computing clusters. Networking is a critical component. In particular, the increased usage of data federations, for example in diskless computing centres or as a fall-back solution, relies on WAN connectivity and availability. The specific demands of different experiments and communities, as well as the need to identify misbehaving batch jobs, require active monitoring. Existing monitoring tools are not capable of measuring fine-grained information at batch job level, which complicates network-aware scheduling and optimisation. In addition, pilots add another layer of abstraction: they behave like batch systems themselves by managing and executing job payloads internally. The number of real jobs being executed is unknown, as the original batch system has no access to internal information about the scheduling process inside the pilots. Therefore, the comparability of jobs and pilots for predicting run-time behaviour or network performance cannot be ensured. ...

KITopen download

Full text

DOI: 10.5445/IR/110104358

External links

Original publication
DOI: 10.1088/1742-6596/664/5/052019

Scopus
Citations: 1

Dimensions
Citations: 1

Statistics

Page views: 499
since 6 May 2018

Downloads: 341
since 13 October 2017

Affiliated institution(s) at KIT	Institut für Kernphysik (IKP); Scientific Computing Center (SCC); Universität Karlsruhe (TH) – Zentrale Einrichtungen (Zentrale Einrichtungen)
Publication type	Journal article
Publication year	2015
Language	English
Identifier	ISSN: 1742-6588, 1742-6596; urn:nbn:de:swb:90-AAA1101043589; KITopen-ID: 110104358
HGF programme	51.01.01 (POF III, LK 01) Teilchenphysik
Published in	Journal of physics / Conference Series
Publisher	Institute of Physics Publishing Ltd (IOP Publishing Ltd)
Volume	664
Issue	5
Pages	052019/1-8
Indexed in	OpenAlex; Dimensions; Scopus

Repository KITopen
